Research Article

Novel Receiver Architecture for LTE-A Downlink Physical Control Format Indicator Channel with Diversity

S. Syed Ameer Abbas (1), S. J. Thiruvengadam (2), and S. Susithra (1)

(1) Department of Electronics and Communication Engineering, Mepco Schlenk Engineering College, Sivakasi 626 005, India
(2) Department of Electronics and Communication Engineering, Thiagarajar College of Engineering, Madurai 625 015, India

Correspondence should be addressed to S. Syed Ameer Abbas; abbas_mepco@yahoo.com

Received 14 November 2013; Revised 27 April 2014; Accepted 5 May 2014; Published 5 June 2014

Academic Editor: Chien-In Henry Chen

Hindawi Publishing Corporation, VLSI Design, Volume 2014, Article ID 825183, 15 pages, http://dx.doi.org/10.1155/2014/825183

Copyright © 2014 S. Syed Ameer Abbas et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Physical control format indicator channel (PCFICH) carries the control information about the number of orthogonal frequency division multiplexing (OFDM) symbols used for transmission of control information in long term evolution-advanced (LTE-A) downlink system. In this paper, two novel low complexity receiver architectures are proposed to implement the maximum likelihood- (ML-) based algorithm which decodes the CFI value in field programmable gate array (FPGA) at user equipment (UE). The performance of the proposed architectures is analyzed in terms of the timing cycles, operational resource requirement, and resource complexity. In LTE-A, base station and UE have multiple antenna ports to provide transmit and receive diversities. The proposed architectures are implemented in Virtex-6 xc6vlx240tff1156-1 FPGA device for various antenna configurations at base station and UE. When multiple antenna ports are used at base station, transmit diversity is obtained by applying the concept of space frequency block code (SFBC). It is shown that the proposed architectures use the minimum number of operational units in FPGA compared to the traditional direct method of implementation.

1. Introduction

The goal of third generation partnership project (3GPP) long term evolution-advanced (LTE-A) wireless standard is to increase the capacity and speed of wireless data communication. The LTE-A physical layer is a highly efficient means of conveying both data and control information between an enhanced base station, popularly known as eNodeB, and mobile user equipment (UE). It supports both frequency division duplex (FDD) and time division duplex (TDD) configurations in uplink and downlink operations. Further, it provides a wide range of system bandwidths in order to operate in a large number of different spectrum allocations [1].

LTE-A standard has six physical channels for downlink. They are physical broadcast channel (PBCH), physical downlink shared channel (PDSCH), physical multicast channel (PMCH), physical downlink control channel (PDCCH), physical hybrid automatic repeat request (ARQ) indicator channel (PHICH), and physical control format indicator channel (PCFICH). PBCH carries the basic system information for the other channels to be configured and operated in the LTE-A grid. The PDSCH is the main data-bearing channel. PMCH is defined for future use. In LTE-A, the control signals are transmitted at the start of each subframe in the LTE-A grid. PDCCH is used to carry the scheduling information of different types such as downlink resource scheduling and uplink power control instructions. PHICH is used to send the acknowledgement/negative acknowledgement bit to UEs to indicate whether the uplink user data is correctly received or not. PCFICH carries the control information about the number of orthogonal frequency division multiplexing (OFDM) symbols used for transmission of downlink control information.

The high data rate in LTE-A requires high processing demands on all layers of the system, which includes high digital signal processing (DSP) hardware processing in the physical layer. Further, the hardware implementation of receiver structures of various physical channels in LTE-A becomes a challenging task as the computational complexity increases.




In [2], receivers were designed for a 2 × 2 antenna system and for quadrature phase shift keying (QPSK) modulation and quadrature amplitude modulation (16-QAM and 64-QAM). Though the successive interference cancellation (SIC) receiver meets the timing requirements in the LTE system, it is complex, and the K-best list sphere detector (K-LSD) receiver has high latency. In [3], field programmable gate array (FPGA) and application specific integrated circuit (ASIC) implementations of receivers based on the linear minimum mean-square error (LMMSE), the K-LSD, the iterative successive interference cancellation (SIC) detector, and the iterative K-LSD algorithms are carried out for spatial multiplexing based LTE-A system. The SIC algorithm is found to perform worse than the K-LSD when the MIMO channels are highly correlated, while the performance difference diminishes when the correlation decreases. The ASIC receivers are designed to meet the decoding throughput requirements in LTE, and the K-LSD is found to be the most complex receiver, although it gives the best reliable data transmission throughput. It is shown that a receiver architecture which could be reconfigured to use a simple or a more complex detector as the channel conditions change would achieve the best performance while consuming the least amount of power in the receiver.

FPGA implementation of a MIMO detector based on two typical sphere decoding algorithms, namely, the Viterbo-Boutros (VB) algorithm and the Schnorr-Euchner (SE) algorithm, is carried out in [4]. In this implementation method, three levels of parallelism are explored to improve the decoding rate: the concurrent execution of the channel matrix preprocessing on an embedded processor and the decoding functions on customized hardware modules, the parallel decoding of real/imaginary parts for complex constellation, and the concurrent execution of multiple steps during the closest lattice point search. The implementation of a low-complexity codebook searching engine is proposed to support both LTE and LTE-A operations [5]. In [6], VLSI implementation of a low-complexity multiple input multiple output (MIMO) symbol detector based on a novel MIMO detection algorithm called modified fixed-complexity soft-output (MFCSO) detection is presented. It includes a microcode-controlled channel preprocessing unit, separate channel memory, and a pipelined detection unit. A MATLAB-based downlink physical-layer simulator for LTE, only for research applications, is presented in [7].

In [8], maximum likelihood- (ML-) based receiver structures are developed for decoding the downlink control channels PCFICH and PHICH in LTE wireless standard, and the performance of the receivers has been analyzed for various configurations. The analytical results were validated against computer simulations, but hardware implementation of the structures was not coded or synthesized. In [9], direct implementation of receive algorithms was carried out in FPGA for downlink control channels in LTE. However, most of these works either propose architectures for FPGA implementation or analyze the performance of various receiver structures in a generalized manner. The objective of this paper is to propose novel architectures for FPGA implementation of transmit and receive processing of the downlink PCFICH channel in LTE-A standard in particular.

Table 1: CFI 32-bit block code.

CFI    Code word $\langle b_{31}, \ldots, b_0 \rangle$
1      01101101101101101101101101101101
2      10110110110110110110110110110110
3      11011011011011011011011011011011
4      00000000000000000000000000000000

1.1. Transmit and Receive Processing of PCFICH. In PCFICH, the control format indicator (CFI) contains a 32-bit code word that represents the value of CFI as 1, 2, 3, or 4. The CFI informs the UE about the number of OFDM symbols used for the transmission of PDCCH information in a subframe. The 32-bit code word corresponding to the value of CFI is scrambled and QPSK modulated. The resultant 16 QPSK complex symbols are mapped to the resource elements of the first OFDM symbol of every subframe after layer mapping and precoding to obtain transmit diversity when two or more antenna ports are used at eNodeB [10]. The 32-bit code words for the four possible values of CFI are given in Table 1. A general block diagram of the transmitter and receiver processing of PCFICH is shown in Figure 1.
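The step from a 32-bit code word to 16 QPSK symbols can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `qpsk_symbols` and the `scrambling_bits` parameter are hypothetical, the cell-specific scrambling sequence is not generated here (it defaults to all zeros, that is, no scrambling), the bits are consumed in the order they are printed in Table 1, and the bit-pair-to-symbol mapping (bit 0 to +1/√2, bit 1 to −1/√2 on each axis) is assumed rather than taken from the text.

```python
import math

# Code words copied from Table 1 (read left to right as printed; this
# reading order is an assumption about the table's bit ordering).
CFI_CODEWORDS = {
    1: "01101101101101101101101101101101",
    2: "10110110110110110110110110110110",
    3: "11011011011011011011011011011011",
    4: "00000000000000000000000000000000",
}


def qpsk_symbols(cfi, scrambling_bits=None):
    """Map the 32-bit code word of a CFI value to 16 unit-energy QPSK symbols.

    scrambling_bits is a hypothetical stand-in for the scrambling sequence;
    when omitted, the code word is left unscrambled.
    """
    bits = [int(b) for b in CFI_CODEWORDS[cfi]]
    if scrambling_bits is None:
        scrambling_bits = [0] * 32
    scrambled = [b ^ c for b, c in zip(bits, scrambling_bits)]
    scale = 1.0 / math.sqrt(2.0)
    # Each consecutive bit pair gives one symbol: first bit -> real part,
    # second bit -> imaginary part (assumed mapping).
    return [
        complex(1 - 2 * scrambled[2 * i], 1 - 2 * scrambled[2 * i + 1]) * scale
        for i in range(16)
    ]


symbols_cfi1 = qpsk_symbols(1)
```

In a receiver, the four resulting vectors would play the role of the precomputed data vectors, since they depend only on the CFI value and the scrambling sequence.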

The OFDM signal is transmitted through a frequency selective fading channel. It is assumed that the number of receive antenna ports at UE is $K$. At each receive antenna port of the UE, resource-element demapping follows the cyclic prefix removal and fast Fourier transformation (FFT). The $16 \times 1$ receive signal vector at each antenna port is equalized in the frequency domain at each subcarrier using the corresponding $16 \times 1$ channel frequency response vector. The outputs of the frequency domain equalizer from each antenna port are summed up. The resultant $16 \times 1$ complex vector is applied to the maximum likelihood (ML) detector for detecting the CFI value. The objective of this paper is to synthesize and implement the receiver architecture for PCFICH.

The paper is structured as follows. Section 2 explains the system model and basic implementation architectures for single input single output (SISO) and single input multiple output (SIMO) configurations. The system model and basic implementation architecture for multiple input single output (MISO) and multiple input multiple output (MIMO) configurations are described in Sections 3 and 4, respectively. The proposed implementation architectures using folding and superscalar methods are given in Section 5 for SISO, SIMO, MISO, and MIMO configurations. Section 6 analyzes the performance of the proposed architectures, and Section 7 concludes the paper with remarks on future work.

2. System Model and Implementation Architecture for SISO and SIMO Configurations

The received signal model for SISO configuration of PCFICH is given by

$$\mathbf{y} = \mathbf{h} \circ \mathbf{d}^{(m)} + \mathbf{w}, \tag{1}$$

Figure 1: Block diagram of transmitter and receiver processing.

Figure 2: Basic architecture for SISO configuration.

where $\mathbf{y} = [y_0, y_1, \ldots, y_{15}]^T$ is a $16 \times 1$ received signal vector, $\mathbf{h} = [h_0, h_1, \ldots, h_{15}]^T$ is a $16 \times 1$ channel frequency response vector, $\mathbf{d}^{(m)} = [d_0^{(m)}, d_1^{(m)}, \ldots, d_{15}^{(m)}]^T$ is a $16 \times 1$ complex QPSK symbol vector corresponding to the CFI value $m$ from the set $\{1, 2, 3, 4\}$, "$\circ$" represents element-by-element multiplication, and $\mathbf{w}$ is a $16 \times 1$ additive white noise vector whose elements are zero-mean Gaussian random numbers with unit variance. The objective is to detect the value of CFI from the received signal vector $\mathbf{y}$, assuming the channel frequency response vector $\mathbf{h}$ to be known. Using the maximum likelihood (ML) principle, CFI is detected as

$$\mathrm{CFI} = \arg\min_{m} \left\| \mathbf{y} - \mathbf{h} \circ \mathbf{d}^{(m)} \right\|^{2}. \tag{2}$$
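The decision rule in (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `rpb_metric` and `detect_cfi` are hypothetical, the candidate vectors are placeholders (one QPSK point repeated 16 times per CFI value, not the real PCFICH symbols), and the channel values are made up so that the example is self-contained.

```python
import cmath


def rpb_metric(y, h, d):
    """Sum over subcarriers of |y_i - h_i * d_i|^2 for one candidate vector."""
    return sum(abs(yi - hi * di) ** 2 for yi, hi, di in zip(y, h, d))


def detect_cfi(y, h, candidates):
    """Return the CFI value whose candidate vector minimizes the metric in (2)."""
    metrics = {m: rpb_metric(y, h, d) for m, d in candidates.items()}
    best = min(metrics, key=metrics.get)
    return best, metrics


# Placeholder candidates (assumption, NOT the real PCFICH symbol vectors).
candidates = {
    m: [cmath.exp(1j * cmath.pi * (2 * m - 1) / 4)] * 16 for m in (1, 2, 3, 4)
}

# Hypothetical known channel and a noiseless received vector for CFI = 3.
h = [complex(1.0 + 0.05 * i, -0.5 + 0.02 * i) for i in range(16)]
y = [hi * di for hi, di in zip(h, candidates[3])]

cfi_hat, metrics = detect_cfi(y, h, candidates)
```

Each call to `rpb_metric` corresponds to one of the four candidate hypotheses, so the four metrics can be computed independently and compared at the end.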

Figure 2 shows the basic architecture for estimating CFI using (2) in SISO configuration. The received signal vector $\mathbf{y}$ and the channel frequency response vector $\mathbf{h}$ are provided as input to the four receiver processing blocks (RPB) along with precomputed data vectors $\mathbf{d}^{(1)}$, $\mathbf{d}^{(2)}$, $\mathbf{d}^{(3)}$, and $\mathbf{d}^{(4)}$. The internal diagram for RPB CFI-1 is shown in Figure 3. It computes the expression $\| \mathbf{y} - \mathbf{h} \circ \mathbf{d}^{(1)} \|^{2}$ assuming that CFI = 1. In RPB-$m$, the precomputed data vector $\mathbf{d}^{(m)}$ is multiplied element by element with the channel frequency response vector. The resultant ($16 \times 1$) vector is subtracted from the ($16 \times 1$) received signal vector $\mathbf{y}$. The sum of squared magnitude of each element in the resultant vector is the output of RPB.

The inputs to the CFI detector are the 16-bit outputs of RPBs $r_1$, $r_2$, $r_3$, and $r_4$. The CFI detector determines which RPB output has minimum value. The internal diagram for the CFI detector circuit, which has 4 comparator modules (CM), is shown in Figure 4. In CM-1, input $r_2$ and one's complement of input $r_1$ are added. If carry is generated, then $r_1$ is less than $r_2$. The outputs $\mathrm{Cr}_1$ and $\mathrm{Sr}_1$ of CM-1 are defined as

$$\mathrm{Cr}_1 = \begin{cases} r_1 & \text{if } \mathrm{carry}_1 = \text{"1"} \ (r_1 < r_2) \\ 0 & \text{if } \mathrm{carry}_1 = \text{"0"} \ (r_1 > r_2), \end{cases} \qquad \mathrm{Sr}_1 = \begin{cases} r_2 & \text{if } \mathrm{carry}_1 = \text{"0"} \ (r_1 > r_2) \\ 0 & \text{if } \mathrm{carry}_1 = \text{"1"} \ (r_1 < r_2). \end{cases} \tag{3}$$

In CM-2, input $r_4$ and one's complement of input $r_3$ are added. If carry is generated, then $r_3$ is less than $r_4$. The outputs $\mathrm{Cr}_2$ and $\mathrm{Sr}_2$ of CM-2 are defined as

$$\mathrm{Cr}_2 = \begin{cases} r_3 & \text{if } \mathrm{carry}_2 = \text{"1"} \ (r_3 < r_4) \\ 0 & \text{if } \mathrm{carry}_2 = \text{"0"} \ (r_3 > r_4), \end{cases} \qquad \mathrm{Sr}_2 = \begin{cases} r_4 & \text{if } \mathrm{carry}_2 = \text{"0"} \ (r_3 > r_4) \\ 0 & \text{if } \mathrm{carry}_2 = \text{"1"} \ (r_3 < r_4). \end{cases} \tag{4}$$

Figure 3: Internal architecture of receiver processing block (RPB CFI-1).

Figure 4: Internal architecture of CFI detector.
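The comparator rule in (3) and (4) can be checked with a small bit-level model. This is a sketch under assumptions: the function name `comparator_module` is hypothetical, the inputs are treated as unsigned 16-bit integers, and the equal-input case (which (3) and (4) do not list) simply follows from the arithmetic, giving carry = 0.

```python
WIDTH = 16
MASK = (1 << WIDTH) - 1


def comparator_module(a, b):
    """Model of one comparator module: add b to the one's complement of a.

    A carry out of the 16-bit adder appears exactly when a < b.
    Returns (carry, c_out, s_out) following the pattern of (3):
    c_out = a when carry is 1, s_out = b when carry is 0, otherwise 0.
    """
    total = b + (~a & MASK)          # b + (2^16 - 1 - a)
    carry = (total >> WIDTH) & 1     # 1 iff b - a - 1 >= 0, i.e. a < b
    c_out = a if carry else 0
    s_out = 0 if carry else b
    return carry, c_out, s_out


# CM-1 compares r1 with r2; CM-2 would be comparator_module(r3, r4).
carry1, cr1, sr1 = comparator_module(5, 9)
```

Because exactly one of the two outputs is nonzero for distinct inputs, the pair (c_out, s_out) carries both the smaller value and the information about which input it came from.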

Figure 5: Basic architecture for SIMO configuration.

Figure 6: Subcarrier mapping for SFBC for 2-antenna system. Across the OFDM subcarriers, the two antenna ports carry:
A0: $d_{15}, d_{14}, d_{13}, d_{12}, d_{11}, d_{10}, d_9, d_8, d_7, d_6, d_5, d_4, d_3, d_2, d_1, d_0$
A1: $d_{14}^{*}, -d_{15}^{*}, d_{12}^{*}, -d_{13}^{*}, d_{10}^{*}, -d_{11}^{*}, d_8^{*}, -d_9^{*}, d_6^{*}, -d_7^{*}, d_4^{*}, -d_5^{*}, d_2^{*}, -d_3^{*}, d_0^{*}, -d_1^{*}$

The multiplexer control input is activated based on the outputs from CM-3 and CM-4. One of the four outputs $\mathrm{Cr}_3$, $\mathrm{Sr}_3$, $\mathrm{Cr}_4$, and $\mathrm{Sr}_4$ would be "1" based on the minimum value of the four inputs $r_1$, $r_2$, $r_3$, and $r_4$, respectively. Based on this, 00, 01, 10, or 11 in the multiplexer control unit would be activated to obtain the detected CFI value.

In SIMO, the $16 \times 1$ receive signal vector at the $k$th receive antenna is modeled as

$$\mathbf{y}^{(k)} = \mathbf{h}^{(k)} \circ \mathbf{d}^{(m)} + \mathbf{w}_k, \quad k = 0, 1, 2, \ldots, K - 1, \tag{5}$$

where $K$ represents the number of receive antennas at UE, $\mathbf{h}^{(k)}$ is the $16 \times 1$ channel frequency response vector between the transmit antenna and the $k$th receive antenna, and $\mathbf{w}_k$ is the $16 \times 1$ noise vector at the $k$th receive antenna. Now the objective is to detect the value of CFI from the received signal vectors at each receive antenna, assuming that the channel frequency response vectors at each receive antenna are known. Maximal ratio combining is carried out at the receiver. Using the maximum likelihood (ML) principle, CFI is estimated as [9]

$$\mathrm{CFI} = \arg\min_{m \in \{1, 2, 3, 4\}} \sum_{k=0}^{K-1} \left\| \mathbf{y}^{(k)} - \left( \mathbf{h}^{(k)} \circ \mathbf{d}^{(m)} \right) \right\|^{2}. \tag{6}$$

The basic architecture for estimating CFI using (6) in $1 \times 2$ SIMO configuration, shown in Figure 5, is similar to the basic architecture of SISO configuration. The received signal vector $\mathbf{y}^{(k)}$ and the channel frequency response vector $\mathbf{h}^{(k)}$ are provided as input to the four receiver processing blocks (RPB-CFI$_m^{(k)}$) at the $k$th receive antenna along with precomputed data vectors $\mathbf{d}^{(1)}$, $\mathbf{d}^{(2)}$, $\mathbf{d}^{(3)}$, and $\mathbf{d}^{(4)}$. The outputs from the $m$th RPB at the 0th receive antenna, $r_m^{(0)}$, and the 1st receive antenna, $r_m^{(1)}$, are added to get the $m$th input $r_m$ of the CFI detector circuit.
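The per-antenna structure described above (one set of four RPB outputs per receive antenna, added pairwise before the CFI detector) can be sketched as follows. The names `rpb` and `simo_detect` are hypothetical, the candidate vectors are placeholders rather than real PCFICH symbols, and the channels and the small fixed disturbance are made-up values for illustration.

```python
import cmath


def rpb(y, h, d):
    """Per-antenna receiver processing block: sum of |y_i - h_i * d_i|^2."""
    return sum(abs(yi - hi * di) ** 2 for yi, hi, di in zip(y, h, d))


def simo_detect(ys, hs, candidates):
    """ys[k], hs[k] are the vectors at receive antenna k.

    For each CFI value m, the per-antenna outputs r_m^(k) are added to
    form r_m, and the smallest r_m selects the detected CFI.
    """
    combined = {}
    for m, d in candidates.items():
        per_antenna = [rpb(y, h, d) for y, h in zip(ys, hs)]  # r_m^(k)
        combined[m] = sum(per_antenna)                        # r_m
    return min(combined, key=combined.get), combined


# Placeholder candidates (assumption, NOT the real PCFICH symbol vectors).
candidates = {
    m: [cmath.exp(1j * cmath.pi * (2 * m - 1) / 4)] * 16 for m in (1, 2, 3, 4)
}

# Two hypothetical receive antennas, CFI = 2 sent, small fixed disturbance.
hs = [
    [complex(0.9, 0.2 + 0.01 * i) for i in range(16)],
    [complex(-0.4 + 0.02 * i, 0.7) for i in range(16)],
]
ys = [[hi * di + 0.01 for hi, di in zip(h, candidates[2])] for h in hs]

cfi_hat, r = simo_detect(ys, hs, candidates)
```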

3. System Model and Implementation Architecture for MISO Configuration

In MISO and MIMO configurations, space frequency block code (SFBC) based layer mapping and precoding are carried out to obtain transmit diversity when two or more antenna ports are used at eNodeB, as per the 3GPP LTE wireless standard [1, 11]. It is assumed that 2 antenna ports are used at eNodeB. The $16 \times 1$ complex symbol vector output of the modulation mapper is applied to the layer mapper. The $8 \times 1$ symbol vectors at layer 0 and layer 1 are given by $[d_0, d_2, d_4, d_6, d_8, d_{10}, d_{12}, d_{14}]$ and $[d_1, d_3, d_5, d_7, d_9, d_{11}, d_{13}, d_{15}]$. The precoding is carried out using the SFBC in the LTE-A standard. The precoder output at antenna port 0 ($A_0$) and antenna port 1 ($A_1$) is shown in Figure 6. The notation "$*$" represents the complex conjugate of the symbol. Basically, in precoding, a symbol $d_0$ from layer 0 and a symbol $d_1$ from layer 1 are encoded such that the antenna output is formulated using the orthogonal matrix given by

$$A = \begin{bmatrix} d_0 & d_1 \\ -d_1^{*} & d_0^{*} \end{bmatrix}. \tag{7}$$

Figure 7: Proposed MISO receiver architecture for PCFICH.

Figure 8: Internal architecture of MISO receiver decoding block (RDB).
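The pairwise encoding in (7) can be sketched as a short routine that takes the 16 modulation symbols and produces the two antenna-port sequences. The function name `sfbc_precode` is hypothetical and any power normalization is left out, so this is an illustration of the mapping pattern only.

```python
def sfbc_precode(d):
    """Apply the 2-port SFBC pattern of (7) to an even-length symbol list.

    For each pair (d_i, d_{i+1}) with i even, antenna port 0 carries
    d_i, d_{i+1} and antenna port 1 carries -conj(d_{i+1}), conj(d_i)
    on subcarriers i and i+1. Normalization is omitted (assumption).
    """
    a0, a1 = [], []
    for i in range(0, len(d), 2):
        a0 += [d[i], d[i + 1]]
        a1 += [-d[i + 1].conjugate(), d[i].conjugate()]
    return a0, a1


# Hypothetical input symbols, chosen only to make the pattern visible.
d = [complex(k, k + 1) for k in range(16)]
a0, a1 = sfbc_precode(d)
```

The two rows of each 2 × 2 block are orthogonal, which is what later allows the receiver to separate the two symbols with simple conjugate multiplications.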

Figure 9: Internal architecture of MISO processing block (PB-1).

Figure 10: Proposed MIMO receiver architecture for PCFICH.

Figure 11: Internal architecture of receiver decoding block-MIMO (RDBM).

Figure 12: Multiplicands rearrangement for a single complex multiplication block.

This is repeated for all the 8 symbols in layer 0 and layer 1. Equation (7) defines the transmission format, with the row index indicating the antenna port number and the column index indicating the subcarrier index. In $2 \times 1$ MISO configuration, the receive signals at the $i$th and $(i+1)$th subcarrier are given in matrix form as

$$\begin{bmatrix} y_i \\ y_{i+1}^{*} \end{bmatrix} = \begin{bmatrix} h_i^{(0)} & -h_i^{(1)} \\ h_{i+1}^{(1)*} & h_{i+1}^{(0)*} \end{bmatrix} \begin{bmatrix} d_i \\ d_{i+1}^{*} \end{bmatrix} + \begin{bmatrix} n_i \\ n_{i+1}^{*} \end{bmatrix}, \quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{8}$$

where $h_i^{(0)}$ represents the channel frequency response of the $i$th subcarrier between the 0th transmit antenna port and the receive antenna, $d_i$ is the data symbol at the $i$th subcarrier, and $n_i$ is the noise at the $i$th subcarrier at the receive antenna. Equation (8) can simply be represented as

$$\mathbf{y}_i = \mathbf{H}_{\mathrm{eff},i}\, \mathbf{d}_i + \mathbf{n}_i, \quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{9}$$

where $\mathbf{y}_i$ is the $2 \times 1$ receive signal vector, $\mathbf{H}_{\mathrm{eff},i}$ is the $2 \times 2$ channel matrix, $\mathbf{d}_i$ is the $2 \times 1$ complex signal vector, and $\mathbf{n}_i$ is the $2 \times 1$ noise vector. The objective is to detect the elements $d_i$ and $d_{i+1}$ of the data vector $\mathbf{d}_i$. Assuming that the elements of the channel frequency response matrix $\mathbf{H}_{\mathrm{eff},i}$ are perfectly known at the receiver, the decoder output vector $\mathbf{z}_i$ is given by

$$\mathbf{z}_i = \mathbf{H}_{\mathrm{eff},i}^{H}\, \mathbf{y}_i, \quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{10}$$

where $\mathbf{H}_{\mathrm{eff},i}^{H}$ is the Hermitian of the $2 \times 2$ channel transmission matrix. Equation (10) is expanded as

$$\begin{bmatrix} z_i \\ z_{i+1}^{*} \end{bmatrix} = \begin{bmatrix} h_i^{(0)*} & h_{i+1}^{(1)} \\ -h_i^{(1)*} & h_{i+1}^{(0)} \end{bmatrix} \begin{bmatrix} y_i \\ y_{i+1}^{*} \end{bmatrix}, \quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14. \tag{11}$$

The elements of the decoder output are calculated as

$$\begin{aligned} z_i &= h_i^{(0)*} y_i + h_{i+1}^{(1)} y_{i+1}^{*}, && \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14, \\ z_{i+1}^{*} &= -h_i^{(1)*} y_i + h_{i+1}^{(0)} y_{i+1}^{*}, && \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14. \end{aligned} \tag{12}$$
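The decoder outputs in (12) can be exercised numerically for one subcarrier pair. The sketch below is an illustration under assumptions: the function name `sfbc_decode_pair` is hypothetical, the channel values are made up, noise is omitted, and each channel is taken to be equal on the two adjacent subcarriers so that the cross terms cancel and each output is the transmitted symbol scaled by $|h^{(0)}|^2 + |h^{(1)}|^2$.

```python
import math


def sfbc_decode_pair(y_i, y_i1, h0_i, h0_i1, h1_i, h1_i1):
    """Evaluate (12) for subcarriers i and i+1.

    h0_* and h1_* are the channels from antenna ports 0 and 1 at
    subcarriers i and i+1. Returns (z_i, z_{i+1}).
    """
    z_i = h0_i.conjugate() * y_i + h1_i1 * y_i1.conjugate()
    z_i1_conj = -h1_i.conjugate() * y_i + h0_i1 * y_i1.conjugate()
    return z_i, z_i1_conj.conjugate()


# Hypothetical channels, assumed flat across the subcarrier pair.
h0 = complex(0.8, 0.3)
h1 = complex(-0.2, 0.5)
d_i = complex(1, 1) / math.sqrt(2)
d_i1 = complex(1, -1) / math.sqrt(2)

# Noiseless received samples following the SFBC pattern of (7) and (8).
y_i = h0 * d_i - h1 * d_i1.conjugate()
y_i1 = h0 * d_i1 + h1 * d_i.conjugate()

z_i, z_i1 = sfbc_decode_pair(y_i, y_i1, h0, h0, h1, h1)
gain = abs(h0) ** 2 + abs(h1) ** 2
```

When the channel differs between the two subcarriers of a pair, the cross terms no longer cancel exactly, which is a general property of this kind of linear combining rather than something specific to the sketch.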

The PCFICH receive architecture for $2 \times 1$ MISO configuration is shown in Figure 7. The receiver decoding block (RDB) gets the $16 \times 1$ received signal vector $\mathbf{y}$ and computes the decoder output vector using (10), assuming that the channel frequency response vectors $\mathbf{h}^{(0)}$ and $\mathbf{h}^{(1)}$ are known. The detailed internal architecture of RDBM is shown in Figure 11. The decoder output vectors $\mathbf{z}_i$, $i = 0, 2, 4, \ldots, 14$, are stacked as the $16 \times 1$ vector $\mathbf{z} = [\mathbf{z}_0^T, \mathbf{z}_2^T, \ldots, \mathbf{z}_{14}^T]^T$. The $16 \times 1$ precomputed data vectors for CFI = 1, 2, 3, 4 are represented as $\mathbf{s}_1$, $\mathbf{s}_2$, $\mathbf{s}_3$, and $\mathbf{s}_4$, respectively.

The detailed structure of the receiver decoding blocks (RDB) is shown in Figure 8. The output vectors $\mathbf{z}^{(1)}$, $\mathbf{z}^{(2)}$, $\mathbf{z}^{(3)}$, and $\mathbf{z}^{(4)}$ from RDB-1 to RDB-4 are fed to the processing blocks (PB-1 to PB-4). The detailed architecture of PB-1 is shown in Figure 9. The sum of the squared magnitude of the elements of the difference vector between the decoded output vector $\mathbf{z}$ and the precomputed data vector $\mathbf{s}_1$ is the output $r_1$ of PB-1. Similarly, $r_2$, $r_3$, and $r_4$ are computed for CFI = 2, 3, and 4 using PB-2, PB-3, and PB-4, respectively. The processing block outputs $r_1$, $r_2$, $r_3$, and $r_4$ are applied to the CFI determination circuit shown in Figure 4 to detect the CFI value.

Figure 13: Illustration of folded architecture of RPB in SISO and SIMO.

4. System Model and Implementation Architecture for MIMO Configuration

In MIMO system, the signals at the $i$th and $(i+1)$th subcarrier in the receive array are given by

$$\begin{bmatrix} y_i^{(0)} \\ y_{i+1}^{(0)} \\ y_i^{(1)*} \\ y_{i+1}^{(1)*} \end{bmatrix} = \begin{bmatrix} h_{00} & h_{01} \\ h_{10} & h_{11} \\ h_{01}^{*} & -h_{00}^{*} \\ h_{11}^{*} & -h_{10}^{*} \end{bmatrix} \begin{bmatrix} d_i \\ d_{i+1}^{*} \end{bmatrix} + \begin{bmatrix} n_i^{(0)} \\ n_{i+1}^{(0)*} \\ n_i^{(1)} \\ n_{i+1}^{(1)*} \end{bmatrix}, \quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{13}$$

where $h_{ab}$ represents the channel frequency response vector between the $b$th transmit antenna and the $a$th receive antenna and $n_i^{(j)}$ represents the noise in the $i$th subcarrier in the $j$th receive antenna. In vector form, it is written as

$$\mathbf{y}_i = \mathbf{H}_{\mathrm{eff},i}\, \mathbf{d}_i + \mathbf{n}_i, \quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{14}$$

where $\mathbf{y}_i$ is the $4 \times 1$ receive signal vector, $\mathbf{H}_{\mathrm{eff},i}$ is the $4 \times 2$ channel frequency response vector at the $i$th and $(i+1)$th subcarrier, $\mathbf{d}_i$ is the $2 \times 1$ data vector at the $i$th and $(i+1)$th subcarrier, and $\mathbf{n}_i$ is the $4 \times 1$ noise vector. The objective is to detect the elements $d_i$ and $d_{i+1}$ of the data vector $\mathbf{d}_i$. Assuming that the elements of the channel frequency response matrix $\mathbf{H}_{\mathrm{eff},i}$ are perfectly known at the receiver, the decoder output vector $\mathbf{z}$ is given by

$$\mathbf{z}_i = \mathbf{H}_{\mathrm{eff},i}^{H}\, \mathbf{y}_i, \quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{15}$$

where $\mathbf{H}_{\mathrm{eff},i}^{H}$ is the Hermitian of the $4 \times 2$ channel transmission matrix. This can be expanded as

$$\begin{bmatrix} z_i \\ z_{i+1}^{*} \end{bmatrix} = \begin{bmatrix} h_{00}^{*} & -h_{10}^{*} & h_{01} & h_{11} \\ h_{01}^{*} & -h_{11}^{*} & -h_{00} & -h_{10} \end{bmatrix} \begin{bmatrix} y_i^{(0)} \\ y_{i+1}^{(0)} \\ y_i^{(1)*} \\ y_{i+1}^{(1)*} \end{bmatrix}, \quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14. \tag{16}$$

The decoder outputs are given by

$$\begin{aligned} z_i &= h_{00}^{*} y_i^{(0)} - h_{10}^{*} y_{i+1}^{(0)} + h_{01} y_i^{(1)*} + h_{11} y_{i+1}^{(1)*}, && \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14, \\ z_{i+1}^{*} &= h_{01}^{*} y_i^{(0)} - h_{11}^{*} y_{i+1}^{(0)} - h_{00} y_i^{(1)*} - h_{10} y_{i+1}^{(1)*}, && \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14. \end{aligned} \tag{17}$$

Figure 14: Illustration of proposed architecture for RDB and PB in MISO and MIMO. Note: the receiver decoding block (RDB) in MISO is termed RDBM in MIMO.

The PCFICH receiver architecture for the 2 × 2 MIMO configuration is shown in Figure 10.

The receiver decoding block (RDBM) takes the 16 × 1 received signal vector $\mathbf{y}$ and computes the decoder output vector using (14), assuming that the channel frequency response vectors $\mathbf{h}^{(00)}$, $\mathbf{h}^{(01)}$, $\mathbf{h}^{(10)}$, and $\mathbf{h}^{(11)}$ are known. The 16 × 1 precomputed data vectors for CFI = 1, 2, 3, and 4 are represented as $\mathbf{s}_1^{(0)}$, $\mathbf{s}_2^{(0)}$, $\mathbf{s}_3^{(0)}$, and $\mathbf{s}_4^{(0)}$, respectively, for antenna 0 and as $\mathbf{s}_1^{(1)}$, $\mathbf{s}_2^{(1)}$, $\mathbf{s}_3^{(1)}$, and $\mathbf{s}_4^{(1)}$, respectively, for antenna 1. The received signal vectors $\mathbf{y}_i^{(0)}$ and $\mathbf{y}_i^{(1)}$ are multiplied with the four channel estimation vectors to give the decoded output vector $\mathbf{z}$, which is sent to the processing block (PB) shown in Figure 9. The decoder outputs $\mathbf{z}_i$, $i = 0, 2, 4, \ldots, 14$, are stacked as the 16 × 1 vector $\mathbf{z} = [\mathbf{z}_0^T, \mathbf{z}_2^T, \ldots, \mathbf{z}_{14}^T]^T$. Similarly, RDBM1 gives the output vector $\mathbf{z}^{(1)}$ using the precomputed data vectors $\mathbf{y}_1^{(0)}$ and $\mathbf{y}_1^{(1)}$ and the channel estimation vectors. The architecture of the PBs and the CFI detection architecture are similar to those of the MISO system. The sum of the squared magnitudes of the differences between each element of the decoded output vector $\mathbf{z}$ and its precomputed data in the vector $\mathbf{z}^{(1)}$ is the output $r_1$ of PB1. Similarly, $r_2$, $r_3$, and $r_4$ are computed for the other CFI values. The outputs $r_1$, $r_2$, $r_3$, and $r_4$ are compared by the CFI detector shown in Figure 4 to determine the minimum value.

Figure 15: Illustration of superscalar method for SISO and SIMO (with no complex multiplications and operating from $T_1$ to $T_4$).

5. PCFICH Receiver Implementation Methods

The PCFICH receiver architectures can be implemented directly based on the basic architectures developed in Sections 3 and 4. However, to utilize the FPGA resources effectively, the basic architectures are implemented using modified architectures based on VLSI DSP techniques, namely, the folding and superscalar processing approaches.

5.1. Direct Implementation with Multiplicands Rearranged Method. In the receiver architecture for SISO and SIMO, the 16 × 1 received signal vector is directly subtracted from the precomputed data vector for a given CFI. This requires fewer multipliers and adders than MISO and MIMO. In MISO and MIMO configurations, complex multiplications are necessary for the multiplication of $\mathbf{H}_{\mathrm{eff}}^{H}$ with the received signal vector, which increases the number of multiplications in the CFI detection process. Hence, the terms are rearranged to minimize the number of multiplications, and the intermediate products are reused in the calculation of the real and imaginary parts. Consider the multiplication of two complex numbers $\mathrm{Re}\,h + j\,\mathrm{Im}\,h$ and $\mathrm{Re}\,y + j\,\mathrm{Im}\,y$. The real part ($e$) and imaginary part ($f$) of the output are given by

$$e = \mathrm{Re}\,h\,\mathrm{Re}\,y - \mathrm{Im}\,h\,\mathrm{Im}\,y, \qquad f = \mathrm{Re}\,h\,\mathrm{Im}\,y + \mathrm{Im}\,h\,\mathrm{Re}\,y. \tag{18}$$

This requires four multiplications and two additions. To reduce the number of multiplications, the terms in (18) are rearranged as

$$\begin{aligned} e &= [\mathrm{Re}\,y - \mathrm{Im}\,y]\,[\mathrm{Re}\,h + \mathrm{Im}\,h] - \mathrm{Re}\,y\,\mathrm{Im}\,h + \mathrm{Im}\,y\,\mathrm{Re}\,h, \\ f &= \mathrm{Re}\,y\,\mathrm{Im}\,h + \mathrm{Im}\,y\,\mathrm{Re}\,h. \end{aligned} \tag{19}$$

Since the products $\mathrm{Re}\,y\,\mathrm{Im}\,h$ and $\mathrm{Im}\,y\,\mathrm{Re}\,h$ appear in both expressions of (19), only three multiplications are required, at the cost of five additions. This rearrangement of the multiplicands is employed in the processing blocks, with the increased number of additions, as shown in Figure 12.
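The rearrangement can be checked numerically. The sketch below is a minimal Python illustration (the function name `complex_mul_3` is hypothetical) that forms the bracketed product $(\mathrm{Re}\,y - \mathrm{Im}\,y)(\mathrm{Re}\,h + \mathrm{Im}\,h)$ once and shares the two products $\mathrm{Re}\,y\,\mathrm{Im}\,h$ and $\mathrm{Im}\,y\,\mathrm{Re}\,h$ between the real and imaginary parts, giving three multiplications and five additions.

```python
def complex_mul_3(re_h, im_h, re_y, im_y):
    """Multiply (re_h + j*im_h) by (re_y + j*im_y) with three real multiplications."""
    p1 = (re_y - im_y) * (re_h + im_h)  # multiplication 1 (two additions feed it)
    p2 = re_y * im_h                    # multiplication 2, shared by e and f
    p3 = im_y * re_h                    # multiplication 3, shared by e and f
    e = p1 - p2 + p3                    # real part: two additions
    f = p2 + p3                         # imaginary part: one addition
    return e, f


# Worked example: (3 + 4j) * (1 - 2j) = 11 - 2j
e, f = complex_mul_3(3, 4, 1, -2)
print(e, f)  # 11 -2
```

The trade is deliberate: on an FPGA a multiplier typically costs far more than an adder, so exchanging one multiplication for three extra additions is usually worthwhile.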

5.2. Proposed Architecture Using Folding Method. The folding transformation systematically determines the control circuits in DSP architectures where multiple algorithmic operations are time-multiplexed onto a single functional unit [12]. It is used for the synthesis of DSP architectures that can be operated at single or multiple clocks. It reduces the number of hardware functional units (FUs) by a factor of $N$ at the expense of increased computation time.

The folding architecture is introduced in the receiver structure of the RPB in SISO and SIMO configurations and of the RPB and PB in MISO and MIMO configurations, as shown in Figures 13 and 14, respectively. For the SISO RPB, there are 16 hardware lines to calculate the value of $r_1$, each requiring two multipliers. Hence, the number of multipliers used in one RPB is 32. To reduce the number of multipliers and adders, a folding architecture is proposed. This architecture uses only two multipliers and performs the operation of a single hardware line 16 times in a sequential way. The difference between the product of the channel frequency response vector with the precomputed data vector and the received signal vector is stored in registers. At any time, one resultant signal pair is processed using the two multipliers to get the value of $z_i$. Four switches operating at the system clock speed are involved in the architecture: two switches pass the real part of the signal to one multiplier, while the other two switches pass the imaginary part of the signal to the other multiplier. The multipliers pass the products to the first adder for $z_i$. The output of the first adder is passed to the second adder with a delay to accumulate the values $z_0$ to $z_{15}$ into a register in subsequent clock cycles. This process requires 16 clock cycles, and the CFI is detected at the 17th clock cycle. Although more clock cycles are needed to obtain the output, the resources are minimized in this method.
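A behavioural model helps to see what folding trades away. The Python sketch below is an assumption-laden illustration, not the authors' RTL: `rpb_direct` models the 16 parallel hardware lines, while `rpb_folded` reuses one pair of real multipliers over 16 iterations, each iteration standing in for one clock cycle. Both function names and the sample data are hypothetical.

```python
def rpb_direct(y, h, d):
    """All 16 hardware lines at once: sum of |y_i - h_i * d_i|^2."""
    return sum(abs(yi - hi * di) ** 2 for yi, hi, di in zip(y, h, d))


def rpb_folded(y, h, d):
    """One hardware line reused sequentially; returns (r, cycles used)."""
    acc = 0.0   # accumulator register fed by the second adder
    cycles = 0
    for yi, hi, di in zip(y, h, d):
        diff = yi - hi * di             # difference held in registers
        re_sq = diff.real * diff.real   # multiplier 1: real part squared
        im_sq = diff.imag * diff.imag   # multiplier 2: imaginary part squared
        acc += re_sq + im_sq            # first adder, then accumulation
        cycles += 1
    return acc, cycles


# Arbitrary 16-element test vectors (illustrative values only).
h = [complex(1 + 0.1 * n, -0.5 + 0.05 * n) for n in range(16)]
d = [complex(1, 1) if n % 2 else complex(-1, 1) for n in range(16)]
y = [hi * di + complex(0.01 * n, -0.02) for n, (hi, di) in enumerate(zip(h, d))]

r_folded, n_cycles = rpb_folded(y, h, d)
```

The folded model produces the same metric as the parallel one while using 2 multipliers instead of 32, at the price of 16 sequential steps.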

The folded architecture of the decoding block of MISO and MIMO, which involves complex multiplication of the channel frequency response vector and the receive signal vector, is shown in Figure 14. There are 2 complex multiplications and one addition in each of the 16 hardware lines. Hence, the total resources used are 32 complex multiplications and 16 additions. The folded architecture, which reduces this to just 2 complex multiplications and one addition, requires five switches. Two switches pass the first element of the receive signal vector and its corresponding channel frequency response vector to one multiplier, and the other two switches pass the second element of the receive signal vector and its channel frequency response vector to another multiplier. These four switches operate at the system clock speed. The multipliers pass their products to the adder through the fifth switch before moving to the PB. This process requires 16 clock cycles, and the CFI is detected at the 17th clock cycle.

Figure 16: Illustration of superscalar method for MISO and MIMO (with complex multiplications and operating from $T_1$ to $T_6$).

Figure 17: Simulation waveform for PCFICH receiver.

5.3. Proposed Architecture Using Superscalar Method. The superscalar approach is another low-resource VLSI DSP technique. The superscalar processing method includes parallel processing and pipelining strategies. In this case, parallel operation of the 16 pairs of hardware lines is arranged with pipelining of the subtraction and square magnitude operations for each CFI. The SISO configuration does not have complex multiplications; it has only square magnitude operations. Hence, the RPB of SISO has 16 hardware lines, each having 2 multipliers, which results in a total of 32 multipliers. This setup requires more hardware resources than folding, but the output is obtained at every 4th clock cycle, as shown in Figure 15. The SIMO configuration, which involves signal processing for two receive antennas, requires twice the number of multiplications of SISO, and the output is obtained at every 4th clock cycle. The block "$d$" represents the delay element introduced to buffer the values and produce the outputs at the same time instant.

For the MISO configuration, the RDB has 16 hardware lines with 2 complex multiplications each. Since each complex multiplication requires four real multiplications, the RDB can be executed in two clock cycles by reusing 64 multipliers. A further 32 multipliers are required for the PB, which takes 4 clock cycles. Hence, 96 multipliers are required in the MISO configuration. For the MIMO configuration, the RDB requires reuse of 128 multipliers, taking 2 clock cycles, and an additional 32 multipliers are required for the PB, taking 4 clock cycles. Hence, 160 multipliers are required for the MIMO configuration, and the output is obtained at every 6th clock cycle, as shown in Figure 16. The block "$d$" represents the delay element introduced to buffer the values and produce the outputs at the same time instant.
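The multiplier counts quoted for the superscalar method follow from simple bookkeeping, sketched below in Python. The closed-form rule in `superscalar_multipliers` is an assumption inferred from the four stated cases (SISO, 1 × 2 SIMO, 2 × 1 MISO, 2 × 2 MIMO) and is not claimed to hold for other antenna counts; the function and constant names are hypothetical.

```python
LINES = 16                 # hardware lines, one per QPSK symbol
REAL_PER_SQ_MAG = 2        # Re^2 and Im^2 of one difference term
REAL_PER_COMPLEX_MUL = 4   # direct complex multiplication
RDB_CYCLES = 2             # the RDB multipliers are reused over two clock cycles


def superscalar_multipliers(n_tx, n_rx):
    """Real multipliers needed by the superscalar architecture (assumed rule)."""
    if n_tx == 1:
        # SISO/SIMO: only square-magnitude operations, one RPB set per antenna.
        return LINES * REAL_PER_SQ_MAG * n_rx
    # MISO/MIMO: each line needs 2 complex multiplications per receive antenna.
    complex_per_line = 2 * n_rx
    rdb = LINES * complex_per_line * REAL_PER_COMPLEX_MUL // RDB_CYCLES
    pb = LINES * REAL_PER_SQ_MAG
    return rdb + pb


for name, cfg in [("SISO", (1, 1)), ("SIMO", (1, 2)), ("MISO", (2, 1)), ("MIMO", (2, 2))]:
    print(name, superscalar_multipliers(*cfg))
# SISO 32
# SIMO 64
# MISO 96
# MIMO 160
```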

6. Results and Discussion

The proposed receiver architectures for PCFICH in SISO, SIMO, MISO, and MIMO configurations are implemented using the Xilinx PlanAhead tool on the Virtex-6 FPGA xc6vlx240tff1156-1 device board. The target Virtex-6 device has only 768 DSP elements. Table 2 compares the performance of the proposed architectures using the folding and superscalar methods with the direct implementation of the PCFICH receiver in terms of resource utilisation, speed, and power for all the SISO, SIMO, MISO, and MIMO configurations. The proposed architectures based on the folding and superscalar processing methods require fewer resource elements.

Figure 18: RTL schematic for combined PCFICH architecture with diversity.

Table 2: Performance of proposed architectures based on folding and superscalar method.

Diversity  Method            Multipliers  Adders  DSP elements  LUTs   Total delay (ns)  Speed (MHz)  Dynamic power
SISO       Direct            125          245     125           5479   39.081            25.587       -
SISO       Folding (16T)     8            81      16            2561   114.448           8.737        84
SISO       Superscalar (4T)  32           182     66            2731   69.333            14.423       93
SIMO       Direct            250          494     250           10942  40.278            24.827       -
SIMO       Folding (16T)     16           165     32            5117   130.704           7.651        159
SIMO       Superscalar (4T)  64           318     130           5623   53.873            18.562       170
MISO       Direct            224          580     594           14880  43.023            23.243       -
MISO       Folding (16T)     14           101     39            3950   255.264           3.917        173
MISO       Superscalar (6T)  96           338     196           6156   80.495            12.423       208
MIMO       Direct            320          844     675           17380  56.962            17.555       -
MIMO       Folding (16T)     20           155     46            4395   256.528           3.898        374
MIMO       Superscalar (6T)  160          465     262           6932   85.822            11.652       382

Figure 19: Resource utilization graph for generalized architecture.

In the folding approach, resource utilization is lower than in the direct and superscalar approaches at the cost of reduced speed of operation, but it remains suitable for real-time frame timings. When the LTE-A system operates at 1.4 MHz bandwidth, the maximum time available for detection at each subcarrier is 992.063 ns, since each slot of 0.5 ms duration in a frame (10 ms radio frame duration) consists of 7 OFDM symbols and there are 72 subcarriers along one OFDM symbol, that is, 0.5 ms / (7 × 72) ≈ 992.063 ns. The total delay in the receiver architecture is within this LTE time constraint. The dynamic power consumption is lower in the folding method than in the superscalar method due to the decrease in block arithmetic. The direct method does not require sequential execution and clocking, and hence its total power consumption is due to static power. Hence, it is inferred that the proposed architecture based on the folding method is more suitable for CFI detection. The simulation waveform of the proposed architecture based on the folding method is shown in Figure 17 for SISO, SIMO, MISO, and MIMO configurations.
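The timing argument can be reproduced with a few lines of arithmetic. The Python sketch below recomputes the per-subcarrier budget from the slot duration, symbols per slot, and subcarrier count stated above, and compares it with the folding-method total delays of Table 2. The delay values are entered with the decimal placement implied by the delay and speed columns (speed ≈ 1/delay), which is an assumption about the original table; the variable names are hypothetical.

```python
SLOT_DURATION_S = 0.5e-3   # one slot
OFDM_SYMBOLS_PER_SLOT = 7
SUBCARRIERS = 72           # 1.4 MHz bandwidth

# Time available per subcarrier, in nanoseconds.
budget_ns = SLOT_DURATION_S / (OFDM_SYMBOLS_PER_SLOT * SUBCARRIERS) * 1e9

# Folding-method total delays from Table 2 (ns), decimal placement assumed.
folding_delay_ns = {
    "SISO": 114.448,
    "SIMO": 130.704,
    "MISO": 255.264,
    "MIMO": 256.528,
}

for config, delay in folding_delay_ns.items():
    slack = budget_ns - delay
    print(f"{config}: delay {delay:.3f} ns, slack {slack:.3f} ns, "
          f"speed {1e3 / delay:.3f} MHz")
```

Even the slowest case (MIMO folding) uses roughly a quarter of the available time per subcarrier, which is the basis for the claim that folding meets the frame timing.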

A general architecture based on the folding method, which operates in all four SISO, SIMO, MISO, and MIMO configurations, has also been developed. In this architecture, a control variable "$e$" is used to enable or disable the submodules SISO, SIMO, MISO, or MIMO according to the selection input "diversity". The CFI is detected at every 17th clock cycle. The synthesis results of the general architecture based on folding show that it utilizes minimum resources in the XC6VLX240TFF1156-1 Virtex-6 device (768 DSPs). This is summarized in Table 3. Dynamic power consumption is due to internal switching contributed by the clock (246 mW), logic (670 mW), and the block arithmetic (103 mW).

Table 3: Resource requirements of proposed architecture using folding method.

Parameter                  Value
Multipliers                58
Adders                     518
Minimum clock period (ns)  16.033
Total delay (ns)           256.528
Speed (MHz)                3.898
Total dynamic power (mW)   1019

Figure 20: Implemented device in FPGA editor.

Figure 18 shows the RTL schematic of the 4 diversity blocks "div0", "div1", "div2", and "div3", corresponding to SISO, SIMO, MISO, and MIMO, controlled by wires named "$e$". The power consumed includes both static power and dynamic power due to internal switching.

Figure 19 shows the resource utilization graph, which gives the percentage of registers, lookup tables (LUTs), slices, DSP elements, and buffers used.

Figure 20 shows the implemented device in the FPGA editor, with the implemented components and the interconnections between the components configured into the FPGA device.

7. Conclusion

In this paper, low-complexity, low-resource CFI detection for single- and multi-antenna receiver systems has been proposed, analyzed using ModelSim, and implemented in the Virtex-6 device using the Xilinx PlanAhead tool. In the receiver, computational complexity and resource utilization are minimized by employing arithmetic operational rearrangement and a suboptimal sequential DSP algorithm called the folding approach. The proposed architecture using the folding method complies with the LTE frame timing constraint in SISO, SIMO, MISO, and MIMO configurations. It is a suitable solution for the area-optimized hardware implementation of receiver structures for PCFICH. In the future, complete hardware accommodating all the physical downlink control channels of 3GPP LTE-A with low resource utilization could be synthesized and implemented.

Conflict of Interests

The authors do not have a direct financial relation with any commercial identity mentioned in the paper or any other conflict of interests.

Acknowledgments

The authors wish to express their sincere thanks to the All India Council for Technical Education, New Delhi, for the Grant to do the Project titled Design of Testbed for the Development of Optimized Architectures of MIMO Signal Processing (no. 8023RIDRPS0391112). They are also thankful to the Managements of Mepco Schlenk Engineering College, Sivakasi, and Thiagarajar College of Engineering, Madurai, for their constant support and encouragement to carry out this research work successfully.

References

[1] 3GPP TS 36.211 Version 11.0.0 Release 11 (2012-10), Evolved Universal Terrestrial Radio Access (E-UTRA); Physical Channels and Modulation, 2012.

[2] J. Ketonen and M. Juntti, "SIC and K-best LSD receiver implementation for a MIMO-OFDM system," in Proceedings of the 16th European Signal Processing Conference (EUSIPCO '08), Lausanne, Switzerland, August 2008.

[3] J. Ketonen, M. Juntti, and J. R. Cavallaro, "Performance-complexity comparison of receivers for a LTE MIMO-OFDM system," IEEE Transactions on Signal Processing, vol. 58, no. 6, pp. 3360–3372, 2010.

[4] X. Huang, C. Liang, and J. Ma, "System architecture and implementation of MIMO sphere decoders on FPGA," IEEE Transactions on Very Large Scale Integration Systems, vol. 16, no. 2, pp. 188–196, 2008.

[5] Y. Lin, Y. Chen, C. Chu, C. Zhan, and A. Wu, "Dual-mode low-complexity codebook searching algorithm and VLSI architecture for LTE/LTE-advanced systems," IEEE Transactions on Signal Processing, vol. 61, no. 14, pp. 3545–3562, 2013.

[6] D. Wu, J. Eilert, R. Asghar, and D. Liu, "VLSI implementation of a fixed-complexity soft-output MIMO detector for high-speed wireless," EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 893184, 13 pages, 2010.

[7] C. Mehlfuhrer, M. Wrulich, J. C. Ikuno, D. Bosanska, and M. Rupp, "Simulating the long term evolution physical layer," in Proceedings of the 17th European Signal Processing Conference (EUSIPCO '09), pp. 1471–1478, Glasgow, UK, August 2009.

[8] S. J. Thiruvengadam and L. M. A. Jalloul, "Performance analysis of the 3GPP-LTE physical control channels," EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 914934, 10 pages, 2010.

[9] S. M. Alamouti, "A simple transmit diversity technique for wireless communications," IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1451–1458, 1998.

[10] S. S. A. Abbas and S. J. Thiruvengadam, "FPGA implementation of 3GPP-LTE-A physical downlink control channel using diversity techniques," International Journal of Wireless and Mobile Computing, vol. 9, no. 2, p. 84, 2013.

[11] S. Ahmadi, LTE-Advanced: A Practical Systems Approach to Understanding 3GPP LTE Releases 10 and 11 Radio Access Technologies, Academic Press, 2013.

[12] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, Wiley-Interscience, 1999.


In [2], receivers were designed for a 2 × 2 antenna system and for quadrature phase shift keying (QPSK) modulation and quadrature amplitude modulation (16-QAM and 64-QAM). Though the successive interference cancellation (SIC) receiver meets the timing requirements of the LTE system, it is complex, and the K-best list sphere detector (K-LSD) receiver has high latency. In [3], field programmable gate array (FPGA) and application specific integrated circuit (ASIC) implementations of receivers based on the linear minimum mean-square error (LMMSE), the K-LSD, the iterative SIC detector, and the iterative K-LSD algorithms are carried out for a spatial multiplexing based LTE-A system. The SIC algorithm is found to perform worse than the K-LSD when the MIMO channels are highly correlated, while the performance difference diminishes when the correlation decreases. The ASIC receivers are designed to meet the decoding throughput requirements in LTE, and the K-LSD is found to be the most complex receiver, although it gives the best reliable data transmission throughput. It is shown that a receiver architecture that could be reconfigured to use a simple or a more complex detector as the channel conditions change would achieve the best performance while consuming the least amount of power in the receiver. FPGA implementation of a MIMO detector based on two typical sphere decoding algorithms, namely, the Viterbo-Boutros (VB) algorithm and the Schnorr-Euchner (SE) algorithm, is carried out in [4]. In this implementation, three levels of parallelism are explored to improve the decoding rate: the concurrent execution of the channel matrix preprocessing on an embedded processor and the decoding functions on customized hardware modules, the parallel decoding of real/imaginary parts for complex constellations, and the concurrent execution of multiple steps during the closest lattice point search. The implementation of a low-complexity codebook searching engine is proposed in [5] to support both LTE and LTE-A operations. In [6], the VLSI implementation of a low-complexity multiple input multiple output (MIMO) symbol detector based on a novel MIMO detection algorithm called modified fixed-complexity soft-output (MFCSO) detection is presented. It includes a microcode-controlled channel preprocessing unit, separate channel memory, and a pipelined detection unit. A MATLAB-based downlink physical-layer simulator for LTE, intended only for research applications, is presented in [7].

In [8], maximum likelihood- (ML-) based receiver structures are developed for decoding the downlink control channels PCFICH and PHICH in the LTE wireless standard, and the performance of the receivers has been analyzed for various configurations. The analytical results were validated against computer simulations, but hardware implementation of the structures was not coded or synthesized. In [9], direct implementation of receive algorithms was carried out in FPGA for downlink control channels in LTE. However, most of these works either propose architectures for FPGA implementation or analyze the performance of various receiver structures in a generalized manner. The objective of this paper is to propose novel architectures for FPGA implementation of the transmit and receive processing of the downlink PCFICH channel in the LTE-A standard in particular.

Table 1: CFI 32-bit block code.

CFI  Code word ⟨b_31, ..., b_0⟩
1    01101101101101101101101101101101
2    10110110110110110110110110110110
3    11011011011011011011011011011011
4    00000000000000000000000000000000

1.1. Transmit and Receive Processing of PCFICH. In PCFICH, the control format indicator (CFI) contains a 32-bit code word that represents the value of CFI as 1, 2, 3, or 4. The CFI informs the UE about the number of OFDM symbols used for the transmission of PDCCH information in a subframe. The 32-bit code word corresponding to the value of CFI is scrambled and QPSK modulated. The resultant 16 QPSK complex symbols are mapped to the resource elements of the first OFDM symbol of every subframe after layer mapping and precoding, which provide transmit diversity when two or more antenna ports are used at the eNodeB [10]. The 32-bit code words for the four possible values of CFI are given in Table 1. A general block diagram of the transmitter and receiver processing of PCFICH is shown in Figure 1.

The OFDM signal is transmitted through a frequency selective fading channel. It is assumed that the number of receive antenna ports at the UE is $K$. At each receive antenna port of the UE, resource-element demapping follows the cyclic prefix removal and fast Fourier transformation (FFT). The 16 × 1 receive signal vector at each antenna port is equalized in the frequency domain at each subcarrier using the corresponding 16 × 1 channel frequency response vector. The outputs of the frequency domain equalizer from each antenna port are summed up. The resultant 16 × 1 complex vector is applied to the maximum likelihood (ML) detector for detecting the CFI value. The objective of this paper is to synthesize and implement the receiver architecture for PCFICH.

The paper is structured as follows. Section 2 explains the system model and basic implementation architectures for single input single output (SISO) and single input multiple output (SIMO) configurations. The system model and basic implementation architecture for multiple input single output (MISO) and multiple input multiple output (MIMO) configurations are described in Sections 3 and 4, respectively. The proposed implementation architectures using folding and superscalar methods are given in Section 5 for SISO, SIMO, MISO, and MIMO configurations. Section 6 analyzes the performance of the proposed architectures, and Section 7 concludes the paper with remarks on future work.

2. System Model and Implementation Architecture for SISO and SIMO Configurations

The received signal model for the SISO configuration of PCFICH is given by

$$\mathbf{y} = \mathbf{h} \circ \mathbf{d}^{(m)} + \mathbf{w}, \tag{1}$$

where $\mathbf{y} = [y_0, y_1, \ldots, y_{15}]^T$ is a 16 × 1 received signal vector, $\mathbf{h} = [h_0, h_1, \ldots, h_{15}]^T$ is a 16 × 1 channel frequency response vector, $\mathbf{d}^{(m)} = [d_0^{(m)}, d_1^{(m)}, \ldots, d_{15}^{(m)}]^T$ is a 16 × 1 complex QPSK symbol vector corresponding to the CFI value $m$ from the set $\{1, 2, 3, 4\}$, "$\circ$" represents element-by-element multiplication, and $\mathbf{w}$ is a 16 × 1 additive white noise vector whose elements are zero-mean Gaussian random numbers with unit variance.

Figure 1: Block diagram of transmitter and receiver processing.

Figure 2: Basic architecture for SISO configuration.

The objective is to detect the value of CFI from the received signal vector $\mathbf{y}$, assuming the channel frequency response vector $\mathbf{h}$ to be known. Using the maximum likelihood (ML) principle, CFI is detected as

$$\mathrm{CFI} = \arg\min_{m} \left\| \mathbf{y} - \mathbf{h} \circ \mathbf{d}^{(m)} \right\|^{2}. \tag{2}$$
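A minimal software sketch of the detection rule (2) is given below. It makes several assumptions that are not taken from the paper: scrambling is omitted, the code words of Table 1 are read left to right, and bit pairs are mapped to QPSK symbols as $((1-2b_{2n}) + j(1-2b_{2n+1}))/\sqrt{2}$. The names `qpsk`, `rpb`, and `detect_cfi` are hypothetical.

```python
import math

# Code words of Table 1 (scrambling omitted in this sketch).
CFI_CODEWORDS = {
    1: "01101101101101101101101101101101",
    2: "10110110110110110110110110110110",
    3: "11011011011011011011011011011011",
    4: "00000000000000000000000000000000",
}


def qpsk(bits):
    """Map a bit string to QPSK symbols, two bits per symbol (assumed mapping)."""
    s = 1 / math.sqrt(2)
    return [
        complex(s * (1 - 2 * int(bits[2 * n])), s * (1 - 2 * int(bits[2 * n + 1])))
        for n in range(len(bits) // 2)
    ]


# Precomputed 16 x 1 data vectors d(1)..d(4).
D = {m: qpsk(cw) for m, cw in CFI_CODEWORDS.items()}


def rpb(y, h, d):
    """Receiver processing block: || y - h o d ||^2."""
    return sum(abs(yi - hi * di) ** 2 for yi, hi, di in zip(y, h, d))


def detect_cfi(y, h, candidates=D):
    """ML detection (2): the candidate with the smallest RPB output wins."""
    return min(candidates, key=lambda m: rpb(y, h, candidates[m]))


# Noiseless example with an arbitrary channel.
h = [complex(0.8 + 0.05 * n, 0.3 - 0.02 * n) for n in range(16)]
y = [hi * di for hi, di in zip(h, D[3])]
print(detect_cfi(y, h))  # 3
```

Because only four hypotheses exist, the exhaustive search costs four RPB evaluations, which is what makes a direct hardware mapping with four parallel RPBs practical.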

Figure 2 shows the basic architecture for estimating CFI using (2) in the SISO configuration. The received signal vector $\mathbf{y}$ and the channel frequency response vector $\mathbf{h}$ are provided as inputs to the four receiver processing blocks (RPB), along with the precomputed data vectors $\mathbf{d}^{(1)}$, $\mathbf{d}^{(2)}$, $\mathbf{d}^{(3)}$, and $\mathbf{d}^{(4)}$. The internal diagram of RPB CFI-1 is shown in Figure 3. It computes the expression $\|\mathbf{y} - \mathbf{h} \circ \mathbf{d}^{(1)}\|^{2}$, assuming CFI = 1. In RPB-$m$, the precomputed data vector $\mathbf{d}^{(m)}$ is multiplied element by element with the channel frequency response vector. The resultant 16 × 1 vector is subtracted from the 16 × 1 received signal vector $\mathbf{y}$. The sum of the squared magnitudes of the elements of the resultant vector is the output of the RPB.

The inputs to the CFI detector are the 16-bit outputs of the RPBs, $r_1$, $r_2$, $r_3$, and $r_4$. The CFI detector determines which RPB output has the minimum value. The internal diagram of the CFI detector circuit, which has 4 comparator modules (CM), is shown in Figure 4.

Figure 3: Internal architecture of receiver processing block (RPB CFI-1).

Figure 4: Internal architecture of CFI detector.

In CM-1, input $r_2$ and the one's complement of input $r_1$ are added. If a carry is generated, then $r_1$ is less than $r_2$. The outputs $\mathrm{Cr}_1$ and $\mathrm{Sr}_1$ of CM-1 are defined as

$$\mathrm{Cr}_1 = \begin{cases} r_1, & \text{if } \mathrm{carry}_1 = \text{"1"}\ (r_1 < r_2), \\ 0, & \text{if } \mathrm{carry}_1 = \text{"0"}\ (r_1 > r_2), \end{cases} \qquad \mathrm{Sr}_1 = \begin{cases} r_2, & \text{if } \mathrm{carry}_1 = \text{"0"}\ (r_1 > r_2), \\ 0, & \text{if } \mathrm{carry}_1 = \text{"1"}\ (r_1 < r_2). \end{cases} \tag{3}$$

In CM-2, input $r_4$ and the one's complement of input $r_3$ are added. If a carry is generated, then $r_3$ is less than $r_4$. The outputs $\mathrm{Cr}_2$ and $\mathrm{Sr}_2$ of CM-2 are defined as

$$\mathrm{Cr}_2 = \begin{cases} r_3, & \text{if } \mathrm{carry}_2 = \text{"1"}\ (r_3 < r_4), \\ 0, & \text{if } \mathrm{carry}_2 = \text{"0"}\ (r_3 > r_4), \end{cases} \qquad \mathrm{Sr}_2 = \begin{cases} r_4, & \text{if } \mathrm{carry}_2 = \text{"0"}\ (r_3 > r_4), \\ 0, & \text{if } \mathrm{carry}_2 = \text{"1"}\ (r_3 < r_4). \end{cases} \tag{4}$$
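The carry test works because adding the one's complement of a 16-bit value $r_a$ to $r_b$ gives $2^{16} + (r_b - r_a - 1)$, which overflows into the 17th bit exactly when $r_a < r_b$. The Python sketch below models one comparator module under that reading; the name `comparator_module` is hypothetical, and the treatment of equal inputs (no carry, so the second input is passed on) follows from the arithmetic rather than from an explicit statement in the text.

```python
WIDTH = 16
MASK = (1 << WIDTH) - 1


def comparator_module(r_a, r_b):
    """Model of one CM: returns (carry, Cr, Sr) for 16-bit inputs r_a and r_b.

    For CM-1, r_a = r1 and r_b = r2; for CM-2, r_a = r3 and r_b = r4.
    """
    total = r_b + (~r_a & MASK)   # r_b plus the one's complement of r_a
    carry = total >> WIDTH        # 1 exactly when r_a < r_b
    cr = r_a if carry else 0      # smaller value passed on the Cr output
    sr = 0 if carry else r_b      # otherwise r_b passed on the Sr output
    return carry, cr, sr


print(comparator_module(5, 9))  # (1, 5, 0)
print(comparator_module(9, 5))  # (0, 0, 5)
```

Only an adder and an inverter are needed per comparison, so no dedicated magnitude comparator has to be synthesized.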

Figure 5: Basic architecture for SIMO configuration.

Figure 6: Subcarrier mapping for SFBC for 2-antenna system (antenna A0 carries $d_0, d_1, \ldots, d_{15}$, and antenna A1 carries $-d_1^{*}, d_0^{*}, \ldots, -d_{15}^{*}, d_{14}^{*}$ on the same OFDM subcarriers).

The multiplexer control input is activated based on the outputs from CM-3 and CM-4. One of the four outputs $\mathrm{Cr}_3$, $\mathrm{Sr}_3$, $\mathrm{Cr}_4$, and $\mathrm{Sr}_4$ would be "1" based on the minimum value of the four inputs $r_1$, $r_2$, $r_3$, and $r_4$, respectively. Based on this, 00, 01, 10, or 11 in the multiplexer control unit would be activated to obtain the detected CFI value.

In SIMO, the 16 × 1 receive signal vector at the $k$th receive antenna is modeled as

$$\mathbf{y}^{(k)} = \mathbf{h}^{(k)} \circ \mathbf{d}^{(m)} + \mathbf{w}_k, \quad k = 0, 1, 2, \ldots, K-1, \tag{5}$$

where $K$ represents the number of receive antennas at the UE, $\mathbf{h}^{(k)}$ is the 16 × 1 channel frequency response vector between the transmit antenna and the $k$th receive antenna, and $\mathbf{w}_k$ is the 16 × 1 noise vector at the $k$th receive antenna. Now the objective is to detect the value of CFI from the received signal vectors at each receive antenna, assuming that the channel frequency response vectors at each receive antenna are known. Maximal ratio combining is carried out at the receiver. Using the maximum likelihood (ML) principle, CFI is estimated as [9]

$$\mathrm{CFI} = \arg\min_{m = 1, 2, 3, 4} \sum_{k=0}^{K-1} \left\| \mathbf{y}^{(k)} - \left( \mathbf{h}^{(k)} \circ \mathbf{d}^{(m)} \right) \right\|^{2}. \tag{6}$$

The basic architecture for estimating CFI using (6) in1 times 2 SIMO configuration shown in Figure 5 is similar to thebasic architecture of SISO configuration The received signalvector y(119896) and the channel frequency response vector h(119896)are provided as input to the four receiver processing blocks(RPB-CFI(119896)

119898) at 119896th receive antenna along with precomputed

data vectors d(1) d(2) d(3) and d(4)The outputs from themthRPB at 0th receive antenna 119903(0)

119898and 1st receive antenna 119903(1)

119898are

added to get themth input 119903119898of the CFI detector circuit
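The combining rule in (6) can be illustrated with a short Python sketch. It is a floating-point model under stated assumptions, not the fixed-point FPGA design: the function names are hypothetical, and the candidate vectors in the demonstration are toy 4-element QPSK patterns rather than the actual 16-symbol CFI codewords of the standard.

```python
def rpb_metric(y, h, d):
    """Squared distance ||y - h o d||^2 for one antenna and one CFI hypothesis."""
    return sum(abs(yi - hi * di) ** 2 for yi, hi, di in zip(y, h, d))


def detect_cfi_simo(ys, hs, candidates):
    """ys[k], hs[k]: vectors at receive antenna k; candidates[m-1]: hypothesised d(m).

    Returns the detected CFI (1-based) and the list of combined metrics r_m.
    """
    metrics = []
    for d in candidates:
        # r_m is the sum of the per-antenna RPB outputs r_m^(k).
        metrics.append(sum(rpb_metric(y, h, d) for y, h in zip(ys, hs)))
    best = min(range(len(metrics)), key=metrics.__getitem__)
    return best + 1, metrics


if __name__ == "__main__":
    q = 2 ** -0.5
    # Toy candidates (assumed for illustration only).
    cands = [[q * (1 + 1j)] * 4, [q * (1 - 1j)] * 4,
             [q * (-1 + 1j)] * 4, [q * (-1 - 1j)] * 4]
    hs = [[1 + 0.5j, 0.8 - 0.2j, -0.3 + 1j, 0.6 + 0.6j],
          [0.2 - 1j, -0.7 + 0.1j, 0.9 + 0.4j, 0.5 - 0.5j]]
    # Noise-free reception of the third candidate on both antennas.
    ys = [[hi * di for hi, di in zip(h, cands[2])] for h in hs]
    print(detect_cfi_simo(ys, hs, cands)[0])
```

With $K = 1$ the same routine reduces to the SISO detector, since the sum over antennas then has a single term.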

3. System Model and Implementation Architecture for MISO Configuration

In MISO and MIMO configurations, space frequency block code (SFBC) based layer mapping and precoding are carried out to obtain transmit diversity when two or more antenna ports are used at eNodeB, as per the 3GPP LTE wireless standard [1, 11]. It is assumed that 2 antenna ports are used at eNodeB. The $16 \times 1$ complex symbol vector output of the modulation mapper is applied to the layer mapper. The $8 \times 1$ symbol vectors at layer 0 and layer 1 are given by $[d_0, d_2, d_4, d_6, d_8, d_{10}, d_{12}, d_{14}]$ and $[d_1, d_3, d_5, d_7, d_9, d_{11}, d_{13}, d_{15}]$. The precoding is carried out using the SFBC in the LTE-A standard. The precoder output at antenna port 0 ($A_0$) and antenna port 1 ($A_1$) is shown in Figure 6. The notation "$*$" represents the complex conjugate of the symbol. In precoding, a symbol $d_0$ from layer 0 and a symbol $d_1$ from layer 1 are encoded such that the antenna output is formulated using the orthogonal matrix given by

$$
A = \begin{bmatrix} d_0 & d_1 \\ -d_1^* & d_0^* \end{bmatrix}. \tag{7}
$$

Figure 7: Proposed MISO receiver architecture for PCFICH.

Figure 8: Internal architecture of MISO receiver decoding block (RDB).

Figure 9: Internal architecture of MISO processing block (PB-1).

Figure 10: Proposed MIMO receiver architecture for PCFICH.

Figure 11: Internal architecture of receiver decoding block-MIMO (RDBM).

Figure 12: Multiplicands rearrangement for a single complex multiplication block.

This is repeated for all the 8 symbols in layer 0 and layer 1. Equation (7) defines the transmission format, with the row index indicating the antenna port number and the column index indicating the subcarrier index. In the $2 \times 1$ MISO configuration, the receive signals at the $i$th and $(i+1)$th subcarriers are given in matrix form as

$$
\begin{bmatrix} y_i \\ y_{i+1}^* \end{bmatrix}
=
\begin{bmatrix} h_i^{(0)} & -h_i^{(1)} \\ h_{i+1}^{(1)*} & h_{i+1}^{(0)*} \end{bmatrix}
\begin{bmatrix} d_i \\ d_{i+1}^* \end{bmatrix}
+
\begin{bmatrix} n_i \\ n_{i+1}^* \end{bmatrix},
\quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{8}
$$

where $h_i^{(0)}$ represents the channel frequency response of the $i$th subcarrier between the 0th transmit antenna port and the receive antenna, $d_i$ is the data symbol at the $i$th subcarrier, and $n_i$ is the noise at the $i$th subcarrier at the receive antenna. Equation (8) can simply be represented as

$$
\mathbf{y}_i = \mathbf{H}_{\mathrm{eff},i}\, \mathbf{d}_i + \mathbf{n}_i, \quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{9}
$$

where $\mathbf{y}_i$ is the $2 \times 1$ receive signal vector, $\mathbf{H}_{\mathrm{eff},i}$ is the $2 \times 2$ channel matrix, $\mathbf{d}_i$ is the $2 \times 1$ complex signal vector, and $\mathbf{n}_i$ is the $2 \times 1$ noise vector. The objective is to detect the elements $d_i$ and $d_{i+1}$ of the data vector $\mathbf{d}_i$. Assuming that the elements of the channel frequency response matrix $\mathbf{H}_{\mathrm{eff},i}$ are perfectly known at the receiver, the decoder output vector $\mathbf{z}_i$ is given by

$$
\mathbf{z}_i = \mathbf{H}_{\mathrm{eff},i}^{H}\, \mathbf{y}_i, \quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{10}
$$

where $\mathbf{H}_{\mathrm{eff},i}^{H}$ is the Hermitian of the $2 \times 2$ channel transmission matrix. Equation (10) is expanded as

$$
\begin{bmatrix} z_i \\ z_{i+1}^* \end{bmatrix}
=
\begin{bmatrix} h_i^{(0)*} & h_{i+1}^{(1)} \\ -h_i^{(1)*} & h_{i+1}^{(0)} \end{bmatrix}
\begin{bmatrix} y_i \\ y_{i+1}^* \end{bmatrix},
\quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14. \tag{11}
$$

The elements of the decoder output are calculated as

$$
\begin{aligned}
z_i &= h_i^{(0)*} y_i + h_{i+1}^{(1)} y_{i+1}^*, \\
z_{i+1}^* &= -h_i^{(1)*} y_i + h_{i+1}^{(0)} y_{i+1}^*,
\end{aligned}
\quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14. \tag{12}
$$
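A small numerical check of (7), (8), and (12) for one subcarrier pair is sketched below in Python. The function names are hypothetical, noise is omitted, and the demonstration assumes that each channel is identical on the two adjacent subcarriers; under that assumption the cross terms cancel and each decoder output is the transmitted symbol scaled by $|h^{(0)}|^2 + |h^{(1)}|^2$.

```python
def sfbc_encode_pair(d0: complex, d1: complex):
    """Rows: antenna ports A0 and A1; columns: subcarriers i and i+1, as in (7)."""
    return [[d0, d1], [-d1.conjugate(), d0.conjugate()]]


def receive_pair(a, h0, h1):
    """Noise-free 2x1 MISO reception.

    h0 and h1 are (h_i, h_{i+1}) tuples for antenna ports 0 and 1.
    """
    y_i = h0[0] * a[0][0] + h1[0] * a[1][0]
    y_i1 = h0[1] * a[0][1] + h1[1] * a[1][1]
    return y_i, y_i1


def sfbc_decode_pair(y_i, y_i1, h0, h1):
    """Decoder outputs (z_i, z_{i+1}) computed from (12)."""
    z_i = h0[0].conjugate() * y_i + h1[1] * y_i1.conjugate()
    z_i1_conj = -h1[0].conjugate() * y_i + h0[1] * y_i1.conjugate()
    return z_i, z_i1_conj.conjugate()


if __name__ == "__main__":
    q = 2 ** -0.5
    d0, d1 = q * (1 + 1j), q * (-1 + 1j)
    h0 = (0.8 + 0.3j, 0.8 + 0.3j)    # assumed flat over the subcarrier pair
    h1 = (-0.4 + 0.9j, -0.4 + 0.9j)
    y = receive_pair(sfbc_encode_pair(d0, d1), h0, h1)
    gain = abs(h0[0]) ** 2 + abs(h1[0]) ** 2
    z_i, z_i1 = sfbc_decode_pair(y[0], y[1], h0, h1)
    print(z_i / gain, z_i1 / gain)   # close to d0 and d1
```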

The PCFICH receive architecture for the $2 \times 1$ MISO configuration is shown in Figure 7. The receiver decoding block (RDB) gets the $16 \times 1$ received signal vector $\mathbf{y}$ and computes the decoder output vector using (10), assuming that the channel frequency response vectors $\mathbf{h}^{(0)}$ and $\mathbf{h}^{(1)}$ are known. The detailed internal architecture of RDBM is shown in Figure 11. The decoder output vectors $\mathbf{z}_i$, $i = 0, 2, 4, \ldots, 14$, are stacked as the $16 \times 1$ vector $\mathbf{z} = [\mathbf{z}_0^T, \mathbf{z}_2^T, \ldots, \mathbf{z}_{14}^T]^T$. The $16 \times 1$ precomputed data vectors for CFI = 1, 2, 3, 4 are represented as $\mathbf{s}_1$, $\mathbf{s}_2$, $\mathbf{s}_3$, and $\mathbf{s}_4$, respectively.

The detailed structure of the receiver decoding blocks (RDB) is shown in Figure 8. The output vectors $\mathbf{z}^{(1)}$, $\mathbf{z}^{(2)}$, $\mathbf{z}^{(3)}$, and $\mathbf{z}^{(4)}$ from RDB-1 to RDB-4 are fed to the processing blocks (PB-1 to PB-4). The detailed architecture of PB-1 is shown in Figure 9. The sum of the squared magnitudes of the elements of the difference vector between the decoded output vector $\mathbf{z}$ and the precomputed data vector $\mathbf{s}_1$ is the output $r_1$ of PB-1. Similarly, $r_2$, $r_3$, and $r_4$ are computed for CFI = 2, 3, and 4 using PB-2, PB-3, and PB-4, respectively. The processing block outputs $r_1$, $r_2$, $r_3$, and $r_4$ are applied to the CFI determination circuit shown in Figure 4 to detect the CFI value.

Figure 13: Illustration of folded architecture of RPB in SISO and SIMO.

4. System Model and Implementation Architecture for MIMO Configuration

In the MIMO system, the signals at the $i$th and $(i+1)$th subcarriers in the receive array are given by

$$
\begin{bmatrix} y_i^{(0)} \\ y_{i+1}^{(0)} \\ y_i^{(1)*} \\ y_{i+1}^{(1)*} \end{bmatrix}
=
\begin{bmatrix} h_{00} & h_{01} \\ h_{10} & h_{11} \\ h_{01}^* & -h_{00}^* \\ h_{11}^* & -h_{10}^* \end{bmatrix}
\begin{bmatrix} d_i \\ d_{i+1}^* \end{bmatrix}
+
\begin{bmatrix} n_i^{(0)} \\ n_{i+1}^{(0)*} \\ n_i^{(1)} \\ n_{i+1}^{(1)*} \end{bmatrix},
\quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{13}
$$

where $h_{ab}$ represents the channel frequency response between the $b$th transmit antenna and the $a$th receive antenna and $n_i^{(j)}$ represents the noise in the $i$th subcarrier at the $j$th receive antenna. In vector form it is written as

$$
\mathbf{y}_i = \mathbf{H}_{\mathrm{eff},i}\, \mathbf{d}_i + \mathbf{n}_i, \quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{14}
$$

where $\mathbf{y}_i$ is the $4 \times 1$ receive signal vector, $\mathbf{H}_{\mathrm{eff},i}$ is the $4 \times 2$ channel frequency response matrix at the $i$th and $(i+1)$th subcarriers, $\mathbf{d}_i$ is the $2 \times 1$ data vector at the $i$th and $(i+1)$th subcarriers, and $\mathbf{n}_i$ is the $4 \times 1$ noise vector. The objective is to detect the elements $d_i$ and $d_{i+1}$ of the data vector $\mathbf{d}_i$. Assuming that the elements of the channel frequency response matrix $\mathbf{H}_{\mathrm{eff},i}$ are perfectly known at the receiver, the decoder output vector $\mathbf{z}_i$ is given by

$$
\mathbf{z}_i = \mathbf{H}_{\mathrm{eff},i}^{H}\, \mathbf{y}_i, \quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{15}
$$

where $\mathbf{H}_{\mathrm{eff},i}^{H}$ is the Hermitian of the $4 \times 2$ channel transmission matrix. This can be expanded as

$$
\begin{bmatrix} z_i \\ z_{i+1}^* \end{bmatrix}
=
\begin{bmatrix} h_{00}^* & -h_{10}^* & h_{01} & h_{11} \\ h_{01}^* & -h_{11}^* & -h_{00} & -h_{10} \end{bmatrix}
\begin{bmatrix} y_i^{(0)} \\ y_{i+1}^{(0)} \\ y_i^{(1)*} \\ y_{i+1}^{(1)*} \end{bmatrix},
\quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14. \tag{16}
$$

The decoder outputs are given by

$$
\begin{aligned}
z_i &= h_{00}^* y_i^{(0)} - h_{10}^* y_{i+1}^{(0)} + h_{01} y_i^{(1)*} + h_{11} y_{i+1}^{(1)*}, \\
z_{i+1}^* &= h_{01}^* y_i^{(0)} - h_{11}^* y_{i+1}^{(0)} - h_{00} y_i^{(1)*} - h_{10} y_{i+1}^{(1)*},
\end{aligned}
\quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14. \tag{17}
$$

Figure 14: Illustration of proposed architecture for RDB and PB in MISO and MIMO. Note: the receiver decoding block (RDB) in MISO is termed RDBM in MIMO.

The PCFICH receiver architecture of the $2 \times 2$ MIMO configuration is shown in Figure 10.

The receiver decoding block (RDBM) gets the $16 \times 1$ received signal vector $\mathbf{y}$ and computes the decoder output vector using (15), assuming that the channel frequency response vectors $\mathbf{h}^{(00)}$, $\mathbf{h}^{(01)}$, $\mathbf{h}^{(10)}$, and $\mathbf{h}^{(11)}$ are known. The $16 \times 1$ precomputed data vectors for CFI = 1, 2, 3, and 4 are represented as $\mathbf{s}_1^{(0)}$, $\mathbf{s}_2^{(0)}$, $\mathbf{s}_3^{(0)}$, and $\mathbf{s}_4^{(0)}$, respectively, for antenna 0 and as $\mathbf{s}_1^{(1)}$, $\mathbf{s}_2^{(1)}$, $\mathbf{s}_3^{(1)}$, and $\mathbf{s}_4^{(1)}$, respectively, for antenna 1. The received signal vectors $\mathbf{y}_i^{(0)}$ and $\mathbf{y}_i^{(1)}$ are multiplied with the four channel estimation vectors to give the decoded output vector $\mathbf{z}$, which is sent to the processing block (PB) shown in Figure 9. The decoder outputs $\mathbf{z}_i$, $i = 0, 2, 4, \ldots, 14$, are stacked as the $16 \times 1$ vector $\mathbf{z} = [\mathbf{z}_0^T, \mathbf{z}_2^T, \ldots, \mathbf{z}_{14}^T]^T$. Similarly, RDBM1 gives the output vector $\mathbf{z}^{(1)}$ using the precomputed data vectors $\mathbf{y}_1^{(0)}$ and $\mathbf{y}_1^{(1)}$ and the channel estimation vectors. The architecture of the PBs and the CFI detection architecture are similar to those of the MISO system. The sum of the squared magnitudes of the differences between each element in the decoded output vector $\mathbf{z}$ and its precomputed data in the vector $\mathbf{z}^{(1)}$ is the output $r_1$ of PB1. Similarly, $r_2$, $r_3$, and $r_4$ are computed for the other CFI values. The outputs $r_1$, $r_2$, $r_3$, and $r_4$ are compared by the CFI detector shown in Figure 4 to determine the minimum value.

Figure 15: Illustration of superscalar method for SISO and SIMO (with no complex multiplications and operating from $T_1$ to $T_4$).

5. PCFICH Receiver Implementation Methods

The PCFICH receiver architectures can be implemented directly based on the basic architectures developed in Sections 3 and 4. However, in order to utilize the FPGA resources effectively, the basic architectures are implemented using modified architectures based on VLSI DSP techniques, namely, the folding and superscalar processing approaches.

5.1. Direct Implementation with Multiplicands Rearranged Method. In the receiver architecture for SISO and SIMO, the $16 \times 1$ received signal vector is directly subtracted from the precomputed data vector for a given CFI. This requires fewer multipliers and adders than MISO and MIMO. In MISO and MIMO configurations, complex multiplications are necessary for the multiplication of $\mathbf{H}_{\mathrm{eff}}^{H}$ with the received signal vector, which increases the number of multiplications in the CFI detection process. Hence the terms are rearranged to minimize the number of multiplications, and the intermediate products are reused in the calculation of the real and imaginary parts. Consider the multiplication of two complex numbers $\mathrm{Re}\{h\} + j\,\mathrm{Im}\{h\}$ and $\mathrm{Re}\{y\} + j\,\mathrm{Im}\{y\}$. The real part ($e$) and imaginary part ($f$) of the output are given by

$$
e = \mathrm{Re}\{h\}\,\mathrm{Re}\{y\} - \mathrm{Im}\{h\}\,\mathrm{Im}\{y\}, \qquad
f = \mathrm{Re}\{h\}\,\mathrm{Im}\{y\} + \mathrm{Im}\{h\}\,\mathrm{Re}\{y\}. \tag{18}
$$

It requires four multiplications and two additions. To reduce the number of multiplications, the terms in (18) are rearranged as

$$
\begin{aligned}
e &= \left[\mathrm{Re}\{y\} + \mathrm{Im}\{y\}\right]\left[\mathrm{Re}\{h\} - \mathrm{Im}\{h\}\right] + \mathrm{Re}\{y\}\,\mathrm{Im}\{h\} - \mathrm{Im}\{y\}\,\mathrm{Re}\{h\}, \\
f &= \mathrm{Re}\{y\}\,\mathrm{Im}\{h\} + \mathrm{Im}\{y\}\,\mathrm{Re}\{h\}.
\end{aligned} \tag{19}
$$

Since the products $\mathrm{Re}\{y\}\,\mathrm{Im}\{h\}$ and $\mathrm{Im}\{y\}\,\mathrm{Re}\{h\}$ appear in both $e$ and $f$ in (19), only three multiplications are required, but five additions. This rearrangement of the multiplicands is employed in the processing blocks at the cost of increased additions, as shown in Figure 12.
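The saving can be checked numerically. The Python sketch below compares the four-multiplication form of (18) with one three-multiplication rearrangement, $e = (\mathrm{Re}\{y\} + \mathrm{Im}\{y\})(\mathrm{Re}\{h\} - \mathrm{Im}\{h\}) + \mathrm{Re}\{y\}\mathrm{Im}\{h\} - \mathrm{Im}\{y\}\mathrm{Re}\{h\}$; the function names are hypothetical and floating-point arithmetic stands in for the fixed-point hardware.

```python
def cmul_direct(h: complex, y: complex):
    """Four real multiplications and two additions, as in (18)."""
    e = h.real * y.real - h.imag * y.imag
    f = h.real * y.imag + h.imag * y.real
    return e, f


def cmul_three_mult(h: complex, y: complex):
    """Three real multiplications and five additions."""
    p1 = y.real * h.imag                       # shared between e and f
    p2 = y.imag * h.real                       # shared between e and f
    p3 = (y.real + y.imag) * (h.real - h.imag)
    e = p3 + p1 - p2
    f = p1 + p2
    return e, f


if __name__ == "__main__":
    h, y = 3 + 4j, 1 - 2j
    print(cmul_direct(h, y))
    print(cmul_three_mult(h, y))
```

Both functions return the real and imaginary parts of the product $h\,y$, so they can be compared directly against Python's built-in complex multiplication.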

5.2. Proposed Architecture Using Folding Method. Folding architecture systematically determines the control circuits in DSP architectures where multiple algorithmic operations are time-multiplexed to a single functional unit [12]. It is used for synthesis of DSP architectures that can be operated at single or multiple clocks. It reduces the number of hardware functional units (FUs) by a factor of $N$ at the expense of increased computation time.

The folding architecture is introduced in the receiver structure of RPB in SISO and SIMO configurations and of RDB and PB in MISO and MIMO configurations, as shown in Figures 13 and 14, respectively. For the SISO RPB, there are 16 hardware lines to calculate the value of $r_1$, each requiring two multipliers. Hence the number of multipliers used in one RPB is 32. In order to reduce the number of multipliers and adders, the folding architecture is proposed. This architecture uses only two multipliers and performs the operation of a single hardware line 16 times sequentially. The difference between the product of the channel frequency response vector with the precomputed data vector and the received signal vector is stored in registers. At any time, one resultant signal pair is involved in the computation, using two multipliers to get the value of $z_i$. Four switches operating at the system clock speed are involved in the architecture: two switches pass the real part of the signal to one multiplier, while the other two pass the imaginary part of the signal to another multiplier. The multipliers pass the products to the first adder for $z_i$. The output of the first adder is passed to the second adder with a delay to accumulate the values $z_0$ to $z_{15}$ into a register in subsequent clock cycles. This process requires 16 clock cycles, and the CFI is detected at the 17th clock cycle. Although more clock cycles are needed to obtain the output, the resources are minimized in this method.
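The time-multiplexing idea can be sketched behaviourally in Python. This is only a software analogy under stated assumptions: the function names are hypothetical, one loop iteration stands for one clock cycle, and the two products per cycle stand for the two shared real multipliers.

```python
def parallel_rpb(y, h, d):
    """Direct form: every hardware line has its own pair of multipliers."""
    return sum(abs(yi - hi * di) ** 2 for yi, hi, di in zip(y, h, d))


def folded_rpb(y, h, d):
    """Folded form: one pair of multipliers reused once per clock cycle.

    Returns the accumulated metric and the number of cycles used.
    """
    # Differences are formed up front and held in registers.
    diffs = [yi - hi * di for yi, hi, di in zip(y, h, d)]
    acc = 0.0                                  # accumulator register
    cycles = 0
    for diff in diffs:                         # one hardware line per cycle
        re_sq = diff.real * diff.real          # multiplier 1
        im_sq = diff.imag * diff.imag          # multiplier 2
        acc += re_sq + im_sq                   # first adder, then accumulation
        cycles += 1
    return acc, cycles


if __name__ == "__main__":
    y = [complex(k, -k) for k in range(16)]
    h = [complex(1, 0.5)] * 16
    d = [complex(0.5, 0.5)] * 16
    print(folded_rpb(y, h, d), parallel_rpb(y, h, d))
```

For a 16-element input the folded routine takes 16 iterations, matching the 16 clock cycles quoted above, while using a sixteenth of the multipliers of the direct form.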

The folded architecture of the decoding block of MISO and MIMO, involving complex multiplication of the channel frequency response vector and the receive signal vector, is shown in Figure 14. There are 2 complex multiplications and one addition in each of the 16 hardware lines. Hence the total resource elements used are 32 complex multiplications and 16 additions. The folded architecture, which reduces this to just 2 complex multiplications and one addition, requires five switches. Two switches are used to pass the first element of the receive signal vector and its corresponding channel frequency response vector to one multiplier, and the other two switches are used to pass the second element of the receive signal vector and its channel frequency response vector to another multiplier. These four switches operate at the system clock speed. The multipliers pass their products to the adder through the fifth switch before moving to the PB. This process requires 16 clock cycles, and the CFI is detected at the 17th clock cycle.

Figure 16: Illustration of superscalar method for MISO and MIMO (with complex multiplications and operating from $T_1$ to $T_6$).

Figure 17: Simulation waveform for PCFICH receiver.

5.3. Proposed Architecture Using Superscalar Method. The superscalar approach is another VLSI DSP technique with low resource utilization. The superscalar processing method includes parallel processing and pipelining strategies. In this case, parallel operation for the 16 pairs of hardware lines is arranged with pipelining of the subtraction and square magnitude operations for each CFI. The SISO configuration does not have complex multiplications; it has only square magnitude operations. Hence the RPB of SISO has 16 hardware lines, each having 2 multipliers, resulting in a total of 32 multipliers. This setup requires more hardware resources than folding, but the output is obtained at every 4th clock cycle, as shown in Figure 15. The SIMO configuration, which involves signal processing for two receive antennas, requires twice the number of multiplications of SISO, and the output is obtained at every 4th clock cycle. The block "$d$" represents the delay element introduced to buffer the values and produce the outputs at the same time instant.

For the MISO configuration, the RDB has 16 hardware lines with 2 complex multiplications each. Since each complex multiplication requires four real multiplications, the RDB can be executed in two clock cycles by reusing 64 multipliers. A further 32 multipliers are required for the PB, taking 4 clock cycles. Hence 96 multipliers are required in the MISO configuration. For the MIMO configuration, the RDB requires reuse of 128 multipliers, taking 2 clock cycles, and an additional 32 multipliers are required for the PB, taking 4 clock cycles. Hence 160 multipliers are required for the MIMO configuration, and the output is obtained at every 6th clock cycle, as shown in Figure 16. As in Figure 15, the block "$d$" represents the delay element that buffers the values so that the outputs are produced at the same time instant.
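The multiplier counts quoted for the superscalar method follow from simple arithmetic, reproduced in the Python sketch below. The helper name is hypothetical, and the figure of 4 complex multiplications per hardware line for MIMO is an assumption inferred from the four products in each row of (17); the text itself states only the resulting total of 128.

```python
LINES = 16                    # hardware lines in the decoding block
REAL_MULTS_PER_COMPLEX = 4    # direct complex multiplication, as in (18)
RDB_CYCLES = 2                # decoding block spread over two clock cycles
PB_MULTIPLIERS = 32           # 16 lines x 2 multipliers in the processing block


def superscalar_multipliers(complex_mults_per_line: int) -> int:
    """Total real multipliers for the decoding block plus the processing block."""
    rdb_real_mults = LINES * complex_mults_per_line * REAL_MULTS_PER_COMPLEX
    rdb_shared = rdb_real_mults // RDB_CYCLES   # reuse across the two cycles
    return rdb_shared + PB_MULTIPLIERS


if __name__ == "__main__":
    print("MISO:", superscalar_multipliers(2))  # 2 complex mults per line
    print("MIMO:", superscalar_multipliers(4))  # assumed 4 complex mults per line
```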

6. Results and Discussion

The proposed receiver architectures for PCFICH in SISO, SIMO, MISO, and MIMO configurations are implemented using the Xilinx PlanAhead tool on the Virtex-6 FPGA xc6vlx240tff1156-1 device board. The target device Virtex-6 has only 768 DSP elements. Table 2 compares the proposed architectures using the folding and superscalar methods with the direct implementation of the PCFICH receiver in terms of resource utilisation, speed, and power for all the SISO, SIMO, MISO, and MIMO configurations. The proposed architectures based on the folding and superscalar processing methods require fewer resource elements.

Figure 18: RTL schematic for combined PCFICH architecture with diversity.

Table 2: Performance of proposed architectures based on folding and superscalar method.

| Diversity | Method | Multipliers | Adders | DSP elements | LUTs | Total delay (ns) | Speed (MHz) | Dynamic power |
|---|---|---|---|---|---|---|---|---|
| SISO | Direct | 125 | 245 | 125 | 5479 | 39.081 | 25.587 | - |
| SISO | Folding (16T) | 8 | 81 | 16 | 2561 | 114.448 | 8.737 | 84 |
| SISO | Superscalar (4T) | 32 | 182 | 66 | 2731 | 69.333 | 14.423 | 93 |
| SIMO | Direct | 250 | 494 | 250 | 10942 | 40.278 | 24.827 | - |
| SIMO | Folding (16T) | 16 | 165 | 32 | 5117 | 130.704 | 7.651 | 159 |
| SIMO | Superscalar (4T) | 64 | 318 | 130 | 5623 | 53.873 | 18.562 | 170 |
| MISO | Direct | 224 | 580 | 594 | 14880 | 43.023 | 23.243 | - |
| MISO | Folding (16T) | 14 | 101 | 39 | 3950 | 255.264 | 3.917 | 173 |
| MISO | Superscalar (6T) | 96 | 338 | 196 | 6156 | 80.495 | 12.423 | 208 |
| MIMO | Direct | 320 | 844 | 675 | 17380 | 56.962 | 17.555 | - |
| MIMO | Folding (16T) | 20 | 155 | 46 | 4395 | 256.528 | 3.898 | 374 |
| MIMO | Superscalar (6T) | 160 | 465 | 262 | 6932 | 85.822 | 11.652 | 382 |

Figure 19: Resource utilization graph for generalized architecture.

In the folding approach, resource utilization is lower than in the direct and superscalar approaches at the cost of reduced speed of operation, but it is suitable for real-time frame timings. When the LTE-A system operates at 1.4 MHz bandwidth, the maximum time available for detection at each subcarrier is 992.063 ns, since each slot of 0.5 ms duration in a frame (10 ms radio frame duration) consists of 7 OFDM symbols and there are 72 subcarriers along one OFDM symbol. The total delay in the receiver architecture is within the LTE time constraint. The dynamic power consumption is lower in the folding method than in the superscalar method due to the decrease in block arithmetic. The direct method does not require sequential execution and clocking, and hence its total power consumption is due to static power. Hence it is inferred that the proposed architecture based on the folding method is more suitable for CFI detection. The simulation waveform of the proposed architecture based on the folding method is shown in Figure 17 for SISO, SIMO, MISO, and MIMO configurations.
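The timing budget can be reproduced with a few lines of Python. The sketch below recomputes the per-subcarrier time from the slot duration, the number of OFDM symbols, and the number of subcarriers, and compares it with the folding-method total delays of Table 2; the variable names are hypothetical, and the delay values are entered on the assumption that their decimal points sit where the reported speeds (the reciprocals of the delays) imply.

```python
SLOT_NS = 0.5e6            # one slot of 0.5 ms, expressed in nanoseconds
SYMBOLS_PER_SLOT = 7       # OFDM symbols per slot
SUBCARRIERS = 72           # subcarriers at 1.4 MHz bandwidth

# Time available per subcarrier within one slot.
budget_ns = SLOT_NS / (SYMBOLS_PER_SLOT * SUBCARRIERS)

# Total delays of the folding method (Table 2), in nanoseconds.
FOLDING_DELAY_NS = {
    "SISO": 114.448,
    "SIMO": 130.704,
    "MISO": 255.264,
    "MIMO": 256.528,
}

if __name__ == "__main__":
    print(f"budget per subcarrier: {budget_ns:.3f} ns")
    for name, delay in FOLDING_DELAY_NS.items():
        print(name, delay, "within budget" if delay < budget_ns else "too slow")
```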

A general architecture based on the folding method, which operates in all four SISO, SIMO, MISO, and MIMO configurations, has also been developed. In this architecture, a control variable "$e$" is used to enable or disable the submodules SISO, SIMO, MISO, or MIMO according to the selection input "diversity." CFI is detected at every 17th clock cycle. The synthesis results of the general architecture based on folding show that it utilizes minimum resources in the XC6VLX240TFF1156-1 Virtex-6 device (768 DSPs). This is summarized in Table 3. Dynamic power consumption is due to internal switching contributed by the clock (246 mW), logic (670 mW), and the block arithmetic (103 mW).

Table 3: Resource requirements of proposed architecture using folding method.

| Parameter | Multipliers | Adders | Minimum clock period (ns) | Total delay (ns) | Speed (MHz) | Total dynamic power (mW) |
|---|---|---|---|---|---|---|
| Value | 58 | 518 | 16.033 | 256.528 | 3.898 | 1019 |

Figure 20: Implemented device in FPGA editor.

Figure 18 shows the RTL schematic of the 4 diversity blocks "div0," "div1," "div2," and "div3," corresponding to SISO, SIMO, MISO, and MIMO, controlled by wires named "$e$." The power consumed includes both static power and dynamic power due to internal switching.

Figure 19 shows the resource utilization graph, which gives the percentage of registers, lookup tables (LUTs), slices, DSP elements, and buffers used.

Figure 20 shows the implemented device in the FPGA editor, with the implemented components and the interconnections between the components configured into the FPGA device.

7. Conclusion

In this paper, low complexity, low resource single or multi-antenna CFI detection at the receiver system has been proposed, analyzed using ModelSim, and implemented on the Virtex-6 device with the Xilinx PlanAhead tool. In the receiver, computational complexity and resource utilization are minimized by employing arithmetic operational rearrangement and a suboptimal sequential DSP algorithm called the folding approach. The proposed architecture using the folding method complies with the LTE frame timing constraint in SISO, SIMO, MISO, and MIMO configurations. It is a suitable solution for the area optimized hardware implementation of receiver structures for PCFICH. In future, a total hardware accommodating all the physical downlink control channels of the 3GPP-LTE-A with low resource utilization could be synthesized and implemented.

Conflict of Interests

The authors do not have a direct financial relation with any commercial identity mentioned in the paper or any other conflict of interests.

Acknowledgments

The authors wish to express their sincere thanks to All India Council for Technical Education, New Delhi, for the Grant to do the Project titled Design of Testbed for the Development of Optimized Architectures of MIMO Signal Processing (no. 8023RIDRPS0391112). They are also thankful to the Managements of Mepco Schlenk Engineering College, Sivakasi, and Thiagarajar College of Engineering, Madurai, for their constant support and encouragement to carry out this research work successfully.

References

[1] 3GPP TS 36.211 Version 11.0.0 Release 11 (2012-10), Evolved Universal Terrestrial Radio Access (E-UTRA); Physical Channels and Modulation, 2012.

[2] J. Ketonen and M. Juntti, "SIC and K-best LSD receiver implementation for a MIMO-OFDM system," in Proceedings of the 16th European Signal Processing Conference (EUSIPCO '08), Lausanne, Switzerland, August 2008.

[3] J. Ketonen, M. Juntti, and J. R. Cavallaro, "Performance-complexity comparison of receivers for a LTE MIMO-OFDM system," IEEE Transactions on Signal Processing, vol. 58, no. 6, pp. 3360–3372, 2010.

[4] X. Huang, C. Liang, and J. Ma, "System architecture and implementation of MIMO sphere decoders on FPGA," IEEE Transactions on Very Large Scale Integration Systems, vol. 16, no. 2, pp. 188–196, 2008.

[5] Y. Lin, Y. Chen, C. Chu, C. Zhan, and A. Wu, "Dual-mode low-complexity codebook searching algorithm and VLSI architecture for LTE/LTE-advanced systems," IEEE Transactions on Signal Processing, vol. 61, no. 14, pp. 3545–3562, 2013.

[6] D. Wu, J. Eilert, R. Asghar, and D. Liu, "VLSI implementation of a fixed-complexity soft-output MIMO detector for high-speed wireless," EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 893184, 13 pages, 2010.

[7] C. Mehlfuhrer, M. Wrulich, J. C. Ikuno, D. Bosanska, and M. Rupp, "Simulating the long term evolution physical layer," in Proceedings of the 17th European Signal Processing Conference (EUSIPCO '09), pp. 1471–1478, Glasgow, UK, August 2009.

[8] S. J. Thiruvengadam and L. M. A. Jalloul, "Performance analysis of the 3GPP-LTE physical control channels," EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 914934, 10 pages, 2010.

[9] S. M. Alamouti, "A simple transmit diversity technique for wireless communications," IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1451–1458, 1998.

[10] S. S. A. Abbas and S. J. Thiruvengadam, "FPGA implementation of 3GPP-LTE-A physical downlink control channel using diversity techniques," International Journal of Wireless and Mobile Computing, vol. 9, no. 2, p. 84, 2013.

[11] S. Ahmadi, LTE-Advanced: A Practical Systems Approach to Understanding 3GPP LTE Releases 10 and 11 Radio Access Technologies, Academic Press, 2013.

[12] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, Wiley-Interscience, 1999.



Figure 1: Block diagram of transmitter and receiver processing (baseband transmit processing at the base station eNodeB; receive processing at the user equipment).

Figure 2: Basic architecture for SISO configuration.

where $\mathbf{y} = [y_0, y_1, \ldots, y_{15}]^T$ is a 16 × 1 received signal vector, $\mathbf{h} = [h_0, h_1, \ldots, h_{15}]^T$ is a 16 × 1 channel frequency response vector, $\mathbf{d}^{(m)} = [d_0^{(m)}, d_1^{(m)}, \ldots, d_{15}^{(m)}]^T$ is a 16 × 1 complex QPSK symbol vector corresponding to the CFI value $m$ from the set {1, 2, 3, 4}, "∘" represents element-by-element multiplication, and $\mathbf{w}$ is a 16 × 1 additive white noise vector whose elements are zero-mean Gaussian random numbers with unit variance. The objective is to detect the value of CFI from the received signal vector $\mathbf{y}$, assuming the channel frequency response vector $\mathbf{h}$ to be known. Using the maximum likelihood (ML) principle, CFI is detected as

$$\mathrm{CFI} = \arg\min_{m} \left\| \mathbf{y} - \mathbf{h} \circ \mathbf{d}^{(m)} \right\|^{2}. \tag{2}$$

Figure 2 shows the basic architecture for estimating CFI using (2) in the SISO configuration. The received signal vector $\mathbf{y}$ and the channel frequency response vector $\mathbf{h}$ are provided as inputs to the four receiver processing blocks (RPB), along with the precomputed data vectors $\mathbf{d}^{(1)}$, $\mathbf{d}^{(2)}$, $\mathbf{d}^{(3)}$, and $\mathbf{d}^{(4)}$. The internal diagram of RPB CFI-1 is shown in Figure 3. It computes the expression $\| \mathbf{y} - \mathbf{h} \circ \mathbf{d}^{(1)} \|^{2}$, assuming CFI = 1. In RPB-$m$, the precomputed data vector $\mathbf{d}^{(m)}$ is multiplied element by element with the channel frequency response vector. The resultant 16 × 1 vector is subtracted from the 16 × 1 received signal vector $\mathbf{y}$. The sum of the squared magnitudes of the elements of the resultant vector is the output of the RPB.
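The RPB computation and the minimum search in (2) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the QPSK mapping is an assumed one, and the four 32-bit patterns are placeholders rather than the codewords (and scrambling) defined by the standard.

```python
import math

def qpsk(bits):
    """Map consecutive bit pairs to unit-energy QPSK symbols (assumed mapping)."""
    s = 1 / math.sqrt(2)
    return [complex(s * (1 - 2 * bits[i]), s * (1 - 2 * bits[i + 1]))
            for i in range(0, len(bits), 2)]

def make_codebook():
    """Four placeholder 32-bit patterns -> four 16-symbol vectors d(1)..d(4)."""
    patterns = [(0, 1, 1), (1, 0, 1), (1, 1, 0), (0, 0, 0)]
    return [qpsk([p[i % 3] for i in range(32)]) for p in patterns]

def rpb_metric(y, h, d):
    """Output of one RPB: sum of squared magnitudes of y - h∘d."""
    return sum(abs(yi - hi * di) ** 2 for yi, hi, di in zip(y, h, d))

def detect_cfi(y, h, codebook):
    """Return the CFI value (1..4) with the smallest RPB output, as in (2)."""
    metrics = [rpb_metric(y, h, d) for d in codebook]
    return 1 + metrics.index(min(metrics)), metrics

# Noise-free example with a deterministic, non-zero channel; CFI = 3 is sent.
codebook = make_codebook()
h = [complex(1.0 + 0.05 * i, -0.3 + 0.02 * i) for i in range(16)]
y = [hi * di for hi, di in zip(h, codebook[2])]
cfi, metrics = detect_cfi(y, h, codebook)
print(cfi, [round(m, 3) for m in metrics])
```

In the noise-free case the metric of the transmitted hypothesis is zero, so the detector returns the transmitted CFI; with noise, the same search returns the ML decision under the model above.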

The inputs to the CFI detector are the 16-bit outputs of the RPBs, $r_1$, $r_2$, $r_3$, and $r_4$. The CFI detector determines which RPB output has the minimum value. The internal diagram of the CFI detector circuit, which has 4 comparator modules (CM), is shown in Figure 4. In CM-1, input $r_2$ and the one's complement of input $r_1$ are added. If a carry is generated, then $r_1$ is less than $r_2$. The outputs $\mathrm{Cr}_1$ and $\mathrm{Sr}_1$ of CM-1 are defined as

$$\mathrm{Cr}_1 = \begin{cases} r_1, & \text{if } \mathrm{carry}_1 = \text{"1"} \ (r_1 < r_2), \\ 0, & \text{if } \mathrm{carry}_1 = \text{"0"} \ (r_1 > r_2), \end{cases} \qquad \mathrm{Sr}_1 = \begin{cases} r_2, & \text{if } \mathrm{carry}_1 = \text{"0"} \ (r_1 > r_2), \\ 0, & \text{if } \mathrm{carry}_1 = \text{"1"} \ (r_1 < r_2). \end{cases} \tag{3}$$

In CM-2, input $r_4$ and the one's complement of input $r_3$ are added. If a carry is generated, then $r_3$ is less than $r_4$. The outputs $\mathrm{Cr}_2$ and $\mathrm{Sr}_2$ of CM-2 are defined as

$$\mathrm{Cr}_2 = \begin{cases} r_3, & \text{if } \mathrm{carry}_2 = \text{"1"} \ (r_3 < r_4), \\ 0, & \text{if } \mathrm{carry}_2 = \text{"0"} \ (r_3 > r_4), \end{cases} \qquad \mathrm{Sr}_2 = \begin{cases} r_4, & \text{if } \mathrm{carry}_2 = \text{"0"} \ (r_3 > r_4), \\ 0, & \text{if } \mathrm{carry}_2 = \text{"1"} \ (r_3 < r_4). \end{cases} \tag{4}$$

Figure 3: Internal architecture of receiver processing block (RPB CFI-1).

Figure 4: Internal architecture of CFI detector.

Figure 5: Basic architecture for SIMO configuration.

Figure 6: Subcarrier mapping for SFBC for 2-antenna system (antenna port $A_0$ carries $d_0, d_1, \ldots, d_{15}$ on the 16 subcarriers; antenna port $A_1$ carries $-d_1^*, d_0^*, -d_3^*, d_2^*, \ldots, -d_{15}^*, d_{14}^*$).

The multiplexer control input is activated based on the outputs from CM-3 and CM-4. One of the four outputs $\mathrm{Cr}_3$, $\mathrm{Sr}_3$, $\mathrm{Cr}_4$, and $\mathrm{Sr}_4$ would be "1" based on the minimum value of the four inputs $r_1$, $r_2$, $r_3$, and $r_4$, respectively. Based on this, 00, 01, 10, or 11 in the multiplexer control unit would be activated to obtain the detected CFI value.
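The comparator rule used in the CFI detector can be checked with a short Python sketch. It models the carry-out of a 16-bit addition of one input with the one's complement of the other, which is "1" exactly when the complemented input is strictly smaller. The two-stage selection in `detect_min_index` is an assumption made for illustration (the text only specifies CM-1 and CM-2 in detail), and all function names are hypothetical.

```python
WIDTH = 16
MASK = (1 << WIDTH) - 1

def carry_of(a, b):
    """Carry-out of b + ones_complement(a) on WIDTH bits; 1 iff a < b."""
    total = b + (~a & MASK)
    return (total >> WIDTH) & 1

def compare_module(a, b):
    """One comparator module following (3): returns (C, S).
    C = a and S = 0 when a < b (carry = 1); C = 0 and S = b otherwise."""
    return (a, 0) if carry_of(a, b) else (0, b)

def detect_min_index(r):
    """Assumed two-stage tournament over r = [r1, r2, r3, r4].
    Returns the CFI value (1..4) whose metric is the smallest."""
    i = 0 if carry_of(r[0], r[1]) else 1   # winner of r1 vs r2 (CM-1)
    j = 2 if carry_of(r[2], r[3]) else 3   # winner of r3 vs r4 (CM-2)
    return (i if carry_of(r[i], r[j]) else j) + 1

print(detect_min_index([500, 120, 900, 300]))
```

Note that equal inputs produce no carry, so a tie is resolved in favour of the second input of each comparison in this sketch.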

In SIMO, the 16 × 1 received signal vector at the $k$th receive antenna is modeled as

$$\mathbf{y}^{(k)} = \mathbf{h}^{(k)} \circ \mathbf{d}^{(m)} + \mathbf{w}_k, \quad k = 0, 1, 2, \ldots, K-1, \tag{5}$$

where $K$ represents the number of receive antennas at the UE, $\mathbf{h}^{(k)}$ is the 16 × 1 channel frequency response vector between the transmit antenna and the $k$th receive antenna, and $\mathbf{w}_k$ is the 16 × 1 noise vector at the $k$th receive antenna. Now the objective is to detect the value of CFI from the received signal vectors at each receive antenna, assuming that the channel frequency response vectors at each receive antenna are known. Maximal ratio combining is carried out at the receiver. Using the maximum likelihood (ML) principle, CFI is estimated as [9]

$$\mathrm{CFI} = \arg\min_{m \in \{1,2,3,4\}} \sum_{k=0}^{K-1} \left\| \mathbf{y}^{(k)} - \left( \mathbf{h}^{(k)} \circ \mathbf{d}^{(m)} \right) \right\|^{2}. \tag{6}$$

The basic architecture for estimating CFI using (6) in the 1 × 2 SIMO configuration, shown in Figure 5, is similar to the basic architecture of the SISO configuration. The received signal vector $\mathbf{y}^{(k)}$ and the channel frequency response vector $\mathbf{h}^{(k)}$ are provided as inputs to the four receiver processing blocks (RPB-CFI$_m^{(k)}$) at the $k$th receive antenna, along with the precomputed data vectors $\mathbf{d}^{(1)}$, $\mathbf{d}^{(2)}$, $\mathbf{d}^{(3)}$, and $\mathbf{d}^{(4)}$. The outputs of the $m$th RPB at the 0th receive antenna, $r_m^{(0)}$, and at the 1st receive antenna, $r_m^{(1)}$, are added to get the $m$th input $r_m$ of the CFI detector circuit.
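The per-antenna combining in (6) amounts to adding the RPB outputs of the receive antennas before the minimum search. The Python sketch below illustrates this for two antennas; the codebook is a placeholder built from QPSK points (not the standard's codewords), the channel and the small fixed perturbation standing in for noise are arbitrary deterministic values, and the function names are hypothetical.

```python
import math

S = 1 / math.sqrt(2)
QPSK = [complex(S, S), complex(S, -S), complex(-S, S), complex(-S, -S)]

# Placeholder codebook: four distinct 16-symbol QPSK vectors d(1)..d(4).
codebook = [[QPSK[(i * m) % 4] for i in range(16)] for m in (1, 2, 3, 4)]

def rpb_metric(y, h, d):
    """Output of one RPB: sum of squared magnitudes of y - h∘d."""
    return sum(abs(yi - hi * di) ** 2 for yi, hi, di in zip(y, h, d))

def detect_cfi_simo(ys, hs, codebook):
    """Add the RPB outputs of all receive antennas, then pick the minimum (6)."""
    totals = [sum(rpb_metric(y, h, d) for y, h in zip(ys, hs)) for d in codebook]
    return 1 + totals.index(min(totals)), totals

# Two receive antennas, CFI = 2 transmitted, small fixed perturbation w.
w = complex(0.01, 0.01)
hs = [[complex(1.0 + 0.1 * i, 0.5 - 0.05 * k) for i in range(16)] for k in range(2)]
ys = [[hi * di + w for hi, di in zip(h, codebook[1])] for h in hs]
cfi, totals = detect_cfi_simo(ys, hs, codebook)
print(cfi)
```

Because the two antenna metrics are simply summed, the CFI detector circuit itself is unchanged with respect to the SISO case.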

3. System Model and Implementation Architecture for MISO Configuration

In MISO and MIMO configurations, space frequency block code (SFBC) based layer mapping and precoding are carried out to obtain transmit diversity when two or more antenna ports are used at the eNodeB, as per the 3GPP LTE wireless standard [1, 11]. It is assumed that 2 antenna ports are used at the eNodeB. The 16 × 1 complex symbol vector output of the modulation mapper is applied to the layer mapper. The 8 × 1 symbol vectors at layer 0 and layer 1 are given by $[d_0, d_2, d_4, d_6, d_8, d_{10}, d_{12}, d_{14}]$ and $[d_1, d_3, d_5, d_7, d_9, d_{11}, d_{13}, d_{15}]$. The precoding is carried out using the SFBC in the LTE-A standard. The precoder output at antenna port 0 ($A_0$) and antenna port 1 ($A_1$) is shown in Figure 6. The notation "∗" represents the complex conjugate of the symbol. Basically, in precoding, a symbol $d_0$ from layer 0 and a symbol $d_1$ from layer 1 are encoded such that the antenna output is formulated using the orthogonal matrix given by

$$A = \begin{bmatrix} d_0 & d_1 \\ -d_1^* & d_0^* \end{bmatrix}. \tag{7}$$

Figure 7: Proposed MISO receiver architecture for PCFICH.

Figure 8: Internal architecture of MISO receiver decoding block (RDB).

Figure 9: Internal architecture of MISO processing block (PB-1).

Figure 10: Proposed MIMO receiver architecture for PCFICH.

Figure 11: Internal architecture of receiver decoding block for MIMO (RDBM).

Figure 12: Multiplicands rearrangement for a single complex multiplication block.

This is repeated for all the 8 symbols in layer 0 and layer 1. Equation (7) defines the transmission format, with the row index indicating the antenna port number and the column index indicating the subcarrier index. In the 2 × 1 MISO configuration, the received signals at the $i$th and $(i+1)$th subcarriers are given in matrix form as

$$\begin{bmatrix} y_i \\ y_{i+1}^* \end{bmatrix} = \begin{bmatrix} h_i^{(0)} & -h_i^{(1)} \\ h_{i+1}^{*(1)} & h_{i+1}^{*(0)} \end{bmatrix} \begin{bmatrix} d_i \\ d_{i+1}^* \end{bmatrix} + \begin{bmatrix} n_i \\ n_{i+1}^* \end{bmatrix}, \quad i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{8}$$

where $h_i^{(0)}$ represents the channel frequency response of the $i$th subcarrier between the 0th transmit antenna port and the receive antenna, $d_i$ is the data symbol at the $i$th subcarrier, and $n_i$ is the noise at the $i$th subcarrier at the receive antenna. Equation (8) can simply be represented as

$$\mathbf{y}_i = \mathbf{H}_{\mathrm{eff},i}\, \mathbf{d}_i + \mathbf{n}_i, \quad i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{9}$$

where $\mathbf{y}_i$ is the 2 × 1 received signal vector, $\mathbf{H}_{\mathrm{eff},i}$ is the 2 × 2 channel matrix, $\mathbf{d}_i$ is the 2 × 1 complex signal vector, and $\mathbf{n}_i$ is the 2 × 1 noise vector. The objective is to detect the elements $d_i$ and $d_{i+1}$ of the data vector $\mathbf{d}_i$. Assuming that the elements of the channel frequency response matrix $\mathbf{H}_{\mathrm{eff},i}$ are perfectly known at the receiver, the decoder output vector $\mathbf{z}_i$ is given by

$$\mathbf{z}_i = \mathbf{H}_{\mathrm{eff},i}^{H}\, \mathbf{y}_i, \quad i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{10}$$

where $\mathbf{H}_{\mathrm{eff},i}^{H}$ is the Hermitian of the 2 × 2 channel transmission matrix. Equation (10) is expanded as

$$\begin{bmatrix} z_i \\ z_{i+1}^* \end{bmatrix} = \begin{bmatrix} h_i^{*(0)} & h_{i+1}^{(1)} \\ -h_i^{*(1)} & h_{i+1}^{(0)} \end{bmatrix} \begin{bmatrix} y_i \\ y_{i+1}^* \end{bmatrix}, \quad i = 0, 2, 4, 6, 8, 10, 12, 14. \tag{11}$$

The elements of the decoder output are calculated as

$$\begin{aligned} z_i &= h_i^{*(0)} y_i + h_{i+1}^{(1)} y_{i+1}^*, \\ z_{i+1}^* &= -h_i^{*(1)} y_i + h_{i+1}^{(0)} y_{i+1}^*, \end{aligned} \quad i = 0, 2, 4, 6, 8, 10, 12, 14. \tag{12}$$
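The SFBC mapping of (7) and the decoder outputs of (12) can be exercised together in a small noise-free Python sketch. The function names are hypothetical, and the example assumes that the channel is identical on the two subcarriers of each pair; under that assumption $\mathbf{H}_{\mathrm{eff},i}^{H}\mathbf{H}_{\mathrm{eff},i}$ is a scaled identity, so each decoder output equals the transmitted symbol (or its conjugate) scaled by $|h^{(0)}|^2 + |h^{(1)}|^2$.

```python
import math

def sfbc_encode(d):
    """Map 16 symbols onto two antenna ports following matrix (7):
    port 0 sends d_i, d_{i+1}; port 1 sends -d*_{i+1}, d*_i on each pair."""
    a0 = list(d)
    a1 = []
    for i in range(0, len(d), 2):
        a1 += [-d[i + 1].conjugate(), d[i].conjugate()]
    return a0, a1

def miso_receive(a0, a1, h0, h1):
    """Noise-free 2x1 reception: y_i = h0_i * a0_i + h1_i * a1_i."""
    return [h0[i] * a0[i] + h1[i] * a1[i] for i in range(len(a0))]

def sfbc_decode(y, h0, h1):
    """Decoder outputs of (12); entry i+1 holds z*_{i+1}."""
    z = [0j] * len(y)
    for i in range(0, len(y), 2):
        yc = y[i + 1].conjugate()
        z[i] = h0[i].conjugate() * y[i] + h1[i + 1] * yc
        z[i + 1] = -h1[i].conjugate() * y[i] + h0[i + 1] * yc
    return z

s = 1 / math.sqrt(2)
qpsk = [complex(s, s), complex(s, -s), complex(-s, s), complex(-s, -s)]
d = [qpsk[(3 * i + 1) % 4] for i in range(16)]            # arbitrary test symbols
# Assumed: the channel is constant over each subcarrier pair (i, i+1).
h0 = [complex(1.0 + 0.1 * (i // 2), 0.2) for i in range(16)]
h1 = [complex(0.4, -0.7 + 0.05 * (i // 2)) for i in range(16)]

a0, a1 = sfbc_encode(d)
z = sfbc_decode(miso_receive(a0, a1, h0, h1), h0, h1)
gain = [abs(h0[i]) ** 2 + abs(h1[i]) ** 2 for i in range(16)]
```

When the channel differs between the two subcarriers of a pair, the cross terms no longer cancel exactly, which is why the flat-pair condition is stated as an assumption here.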

The PCFICH receiver architecture for the 2 × 1 MISO configuration is shown in Figure 7. The receiver decoding block (RDB) gets the 16 × 1 received signal vector $\mathbf{y}$ and computes the decoder output vector using (10), assuming that the channel frequency response vectors $\mathbf{h}^{(0)}$ and $\mathbf{h}^{(1)}$ are known. The detailed internal architecture of the RDBM is shown in Figure 11. The decoder output vectors $\mathbf{z}_i$, $i = 0, 2, 4, \ldots, 14$, are stacked as the 16 × 1 vector $\mathbf{z} = [\mathbf{z}_0^T, \mathbf{z}_2^T, \ldots, \mathbf{z}_{14}^T]^T$. The 16 × 1 precomputed data vectors for CFI = 1, 2, 3, and 4 are represented as $\mathbf{s}_1$, $\mathbf{s}_2$, $\mathbf{s}_3$, and $\mathbf{s}_4$, respectively.

The detailed structure of the receiver decoding block (RDB) is shown in Figure 8. The output vectors $\mathbf{z}^{(1)}$, $\mathbf{z}^{(2)}$, $\mathbf{z}^{(3)}$, and $\mathbf{z}^{(4)}$ from RDB-1 to RDB-4 are fed to the processing blocks (PB-1 to PB-4). The detailed architecture of PB-1 is shown in Figure 9. The sum of the squared magnitudes of the elements of the difference vector between the decoded output vector $\mathbf{z}$ and the precomputed data vector $\mathbf{s}_1$ is the output $r_1$ of PB-1. Similarly, $r_2$, $r_3$, and $r_4$ are computed for CFI = 2, 3, and 4 using PB-2, PB-3, and PB-4, respectively. The processing block outputs $r_1$, $r_2$, $r_3$, and $r_4$ are applied to the CFI determination circuit shown in Figure 4 to detect the CFI value.

Figure 13: Illustration of folded architecture of RPB in SISO and SIMO.

4. System Model and Implementation Architecture for MIMO Configuration

In the MIMO system, the signals at the $i$th and $(i+1)$th subcarriers in the receive array are given by

$$\begin{bmatrix} y_i^{(0)} \\ y_{i+1}^{(0)} \\ y_i^{(1)*} \\ y_{i+1}^{(1)*} \end{bmatrix} = \begin{bmatrix} h_{00} & h_{01} \\ h_{10} & h_{11} \\ h_{01}^* & -h_{00}^* \\ h_{11}^* & -h_{10}^* \end{bmatrix} \begin{bmatrix} d_i \\ d_{i+1}^* \end{bmatrix} + \begin{bmatrix} n_i^{(0)} \\ n_{i+1}^{(0)*} \\ n_i^{(1)} \\ n_{i+1}^{(1)*} \end{bmatrix}, \quad i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{13}$$

where $h_{ab}$ represents the channel frequency response between the $b$th transmit antenna and the $a$th receive antenna and $n_i^{(j)}$ represents the noise in the $i$th subcarrier at the $j$th receive antenna. In vector form, it is written as

$$\mathbf{y}_i = \mathbf{H}_{\mathrm{eff},i}\, \mathbf{d}_i + \mathbf{n}_i, \quad i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{14}$$

where $\mathbf{y}_i$ is the 4 × 1 received signal vector, $\mathbf{H}_{\mathrm{eff},i}$ is the 4 × 2 channel frequency response matrix at the $i$th and $(i+1)$th subcarriers, $\mathbf{d}_i$ is the 2 × 1 data vector at the $i$th and $(i+1)$th subcarriers, and $\mathbf{n}_i$ is the 4 × 1 noise vector. The objective is to detect the elements $d_i$ and $d_{i+1}$ of the data vector $\mathbf{d}_i$. Assuming that the elements of the channel frequency response matrix $\mathbf{H}_{\mathrm{eff},i}$ are perfectly known at the receiver, the decoder output vector $\mathbf{z}_i$ is given by

$$\mathbf{z}_i = \mathbf{H}_{\mathrm{eff},i}^{H}\, \mathbf{y}_i, \quad i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{15}$$

where $\mathbf{H}_{\mathrm{eff},i}^{H}$ is the Hermitian of the 4 × 2 channel transmission matrix. This can be expanded as

$$\begin{bmatrix} z_i \\ z_{i+1}^* \end{bmatrix} = \begin{bmatrix} h_{00}^* & -h_{10}^* & h_{01} & h_{11} \\ h_{01}^* & -h_{11}^* & -h_{00} & -h_{10} \end{bmatrix} \begin{bmatrix} y_i^{(0)} \\ y_{i+1}^{(0)} \\ y_i^{(1)*} \\ y_{i+1}^{(1)*} \end{bmatrix}, \quad i = 0, 2, 4, 6, 8, 10, 12, 14. \tag{16}$$

The decoder outputs are given by

$$\begin{aligned} z_i &= h_{00}^* y_i^{(0)} - h_{10}^* y_{i+1}^{(0)} + h_{01} y_i^{(1)*} + h_{11} y_{i+1}^{(1)*}, \\ z_{i+1}^* &= h_{01}^* y_i^{(0)} - h_{11}^* y_{i+1}^{(0)} - h_{00} y_i^{(1)*} - h_{10} y_{i+1}^{(1)*}, \end{aligned} \quad i = 0, 2, 4, 6, 8, 10, 12, 14. \tag{17}$$

Figure 14: Illustration of proposed architecture for RDB and PB in MISO and MIMO. Note: the receiver decoding block (RDB) in MISO is termed RDBM in MIMO.

The PCFICH receiver architecture for the 2 × 2 MIMO configuration is shown in Figure 10.

The receiver decoding block (RDBM) gets the 16 × 1 received signal vectors and computes the decoder output vector using (15), assuming that the channel frequency response vectors $\mathbf{h}^{(00)}$, $\mathbf{h}^{(01)}$, $\mathbf{h}^{(10)}$, and $\mathbf{h}^{(11)}$ are known. The 16 × 1 precomputed data vectors for CFI = 1, 2, 3, and 4 are represented as $\mathbf{s}_1^{(0)}$, $\mathbf{s}_2^{(0)}$, $\mathbf{s}_3^{(0)}$, and $\mathbf{s}_4^{(0)}$, respectively, for antenna 0 and as $\mathbf{s}_1^{(1)}$, $\mathbf{s}_2^{(1)}$, $\mathbf{s}_3^{(1)}$, and $\mathbf{s}_4^{(1)}$, respectively, for antenna 1. The received signal vectors $\mathbf{y}_i^{(0)}$ and $\mathbf{y}_i^{(1)}$ are multiplied with the four channel estimation vectors to give the decoded output vector $\mathbf{z}$, which is sent to the processing block (PB) shown in Figure 9. The decoder outputs $\mathbf{z}_i$, $i = 0, 2, 4, \ldots, 14$, are stacked as the 16 × 1 vector $\mathbf{z} = [\mathbf{z}_0^T, \mathbf{z}_2^T, \ldots, \mathbf{z}_{14}^T]^T$. Similarly, RDBM-1 gives the output vector $\mathbf{z}^{(1)}$ using the precomputed data vectors $\mathbf{s}_1^{(0)}$ and $\mathbf{s}_1^{(1)}$ and the channel estimation vectors. The architectures of the PBs and of the CFI detector are similar to those of the MISO system. The sum of the squared magnitudes of the differences between each element of the decoded output vector $\mathbf{z}$ and the corresponding precomputed element of the vector $\mathbf{z}^{(1)}$ is the output $r_1$ of PB-1. Similarly, $r_2$, $r_3$, and $r_4$ are computed for the other CFI values. The values $r_1$, $r_2$, $r_3$, and $r_4$ are compared by the CFI detector shown in Figure 4 to determine the minimum.

Figure 15: Illustration of superscalar method for SISO and SIMO (with no complex multiplications and operating from $T_1$ to $T_4$).

5. PCFICH Receiver Implementation Methods

The PCFICH receiver architectures can be implemented directly based on the basic architectures developed in Sections 3 and 4. However, in order to utilize the FPGA resources effectively, the basic architectures are implemented using modified novel architectures based on VLSI DSP techniques, namely, the folding and superscalar processing approaches.

5.1. Direct Implementation with Multiplicands Rearranged Method. In the receiver architecture for SISO and SIMO, the 16 × 1 received signal vector is directly subtracted from the precomputed data vector for a given CFI. This requires fewer multipliers and adders than MISO and MIMO. In MISO and MIMO configurations, complex multiplications are necessary for the multiplication of $\mathbf{H}_{\mathrm{eff}}^{H}$ with the received signal vector, which increases the number of multiplications in the CFI detection process. Hence, the terms are rearranged to minimize the number of multiplications, and the intermediate products are reused in the calculation of the real and imaginary parts. Consider the multiplication of two complex numbers $\mathrm{Re}\{h\} + j\,\mathrm{Im}\{h\}$ and $\mathrm{Re}\{y\} + j\,\mathrm{Im}\{y\}$. The real part ($e$) and imaginary part ($f$) of the output are given by

$$\begin{aligned} e &= \mathrm{Re}\{h\}\,\mathrm{Re}\{y\} - \mathrm{Im}\{h\}\,\mathrm{Im}\{y\}, \\ f &= \mathrm{Re}\{h\}\,\mathrm{Im}\{y\} + \mathrm{Im}\{h\}\,\mathrm{Re}\{y\}. \end{aligned} \tag{18}$$

This requires four multiplications and two additions. To reduce the number of multiplications, the terms in (18) are rearranged as

$$\begin{aligned} e &= \left[\mathrm{Re}\{y\} - \mathrm{Im}\{y\}\right]\left[\mathrm{Re}\{h\} + \mathrm{Im}\{h\}\right] - \mathrm{Re}\{y\}\,\mathrm{Im}\{h\} + \mathrm{Im}\{y\}\,\mathrm{Re}\{h\}, \\ f &= \mathrm{Re}\{y\}\,\mathrm{Im}\{h\} + \mathrm{Im}\{y\}\,\mathrm{Re}\{h\}. \end{aligned} \tag{19}$$

Since the products $\mathrm{Re}\{y\}\,\mathrm{Im}\{h\}$ and $\mathrm{Im}\{y\}\,\mathrm{Re}\{h\}$ are shared between $e$ and $f$ in (19), only three multiplications are required, at the cost of five additions. This rearrangement of the multiplicands is employed in the processing blocks, as shown in Figure 12.
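The rearrangement can be verified numerically with a short Python sketch. It compares the four-multiplication form of (18) with a three-multiplication form in which the shared product term is $(\mathrm{Re}\{y\} - \mathrm{Im}\{y\})(\mathrm{Re}\{h\} + \mathrm{Im}\{h\})$, which is the form for which the identity holds; the function names are hypothetical.

```python
def cmul_direct(h, y):
    """Four real multiplications, two additions, as in (18)."""
    e = h.real * y.real - h.imag * y.imag
    f = h.real * y.imag + h.imag * y.real
    return e, f

def cmul_three_mult(h, y):
    """Three real multiplications, five additions (rearranged form)."""
    p1 = y.real * h.imag                         # shared by e and f
    p2 = y.imag * h.real                         # shared by e and f
    p3 = (y.real - y.imag) * (h.real + h.imag)   # single extra product
    e = p3 - p1 + p2
    f = p1 + p2
    return e, f

h, y = complex(3, 4), complex(1, -2)
print(cmul_direct(h, y), cmul_three_mult(h, y))
```

Both functions return the real and imaginary parts of $h \cdot y$; the second trades one multiplier for three extra adders, which is usually favourable when multipliers map to scarce DSP elements.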

5.2. Proposed Architecture Using Folding Method. Folding architecture systematically determines the control circuits in DSP architectures where multiple algorithmic operations are time-multiplexed onto a single functional unit [12]. It is used for the synthesis of DSP architectures that can be operated at single or multiple clocks. It reduces the number of hardware functional units (FUs) by a factor of $N$ at the expense of increased computation time.

The folding architecture is introduced in the receiver structure of the RPB in SISO and SIMO configurations and of the RDB and PB in MISO and MIMO configurations, as shown in Figures 13 and 14, respectively. For the SISO RPB, there are 16 hardware lines to calculate the value of $r_1$, each requiring two multipliers. Hence, the number of multipliers used in one RPB is 32. To reduce the number of multipliers and adders, a folding architecture is proposed. This architecture uses only two multipliers and performs the operation of a single hardware line 16 times in sequence. The difference between the product of the channel frequency response vector with the precomputed data vector and the received signal vector is stored in registers. At any time, one resultant signal pair is involved in the computation, using two multipliers to get the value of $z_i$. Four switches operating at the system clock speed are involved in the architecture: two switches pass the real part of the signal to one multiplier, while the other two pass the imaginary part of the signal to the other multiplier. The multipliers pass the products to the first adder for $z_i$. The output of the first adder is passed to the second adder with a delay to accumulate the values $z_0$ to $z_{15}$ into a register in subsequent clock cycles. This process requires 16 clock cycles, and the CFI is detected at the 17th clock cycle. Although more clock cycles are needed to obtain the output, the resources are minimized in this method.
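The effect of folding on the SISO RPB can be mimicked in software. The Python sketch below is a behavioural model only, with hypothetical names: one squared-magnitude unit (two real multiplications and one addition) is reused on the 16 stored differences, one per simulated clock cycle, and an accumulator register collects the sum. It does not model the switches or registers at the hardware level.

```python
def folded_rpb(diff):
    """Reuse one squared-magnitude unit over the stored differences.
    Returns (accumulated metric, number of clock cycles consumed)."""
    acc = 0.0
    cycles = 0
    for e in diff:                      # one hardware line per clock cycle
        re_sq = e.real * e.real         # multiplier 1
        im_sq = e.imag * e.imag         # multiplier 2
        acc += re_sq + im_sq            # first adder, then accumulator
        cycles += 1
    return acc, cycles

def direct_rpb(diff):
    """Reference: all 16 lines evaluated at once (32 multipliers in hardware)."""
    return sum(abs(e) ** 2 for e in diff)

# Arbitrary deterministic differences y_i - h_i * d_i held in registers.
diff = [complex(0.1 * i - 0.5, 0.3 - 0.05 * i) for i in range(16)]
metric, cycles = folded_rpb(diff)
print(cycles, abs(metric - direct_rpb(diff)) < 1e-12)
```

The folded and direct versions give the same metric; the difference is that the folded one needs 16 cycles with 2 multipliers instead of a single pass with 32.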

The folded architecture of the decoding block of MISO and MIMO, involving complex multiplication of the channel frequency response vector and the received signal vector, is shown in Figure 14. There are 2 complex multiplications and one addition in each of the 16 hardware lines. Hence, the total resources used are 32 complex multiplications and 16 additions. The folded architecture, which reduces this to just 2 complex multiplications and one addition, requires five switches. Two switches are used to pass the first element of the received signal vector and its corresponding channel frequency response value to one multiplier, and the other two switches are used to pass the second element of the received signal vector and its channel frequency response value to another multiplier. These four switches operate at the system clock speed. The multipliers pass their products to the adder through the fifth switch before moving to the PB. This process requires 16 clock cycles, and the CFI is detected at the 17th clock cycle.

Figure 16: Illustration of superscalar method for MISO and MIMO (with complex multiplications and operating from $T_1$ to $T_6$).

Figure 17: Simulation waveform for PCFICH receiver.

5.3. Proposed Architecture Using Superscalar Method. The superscalar approach is another VLSI DSP technique with low resource utilization. The superscalar processing method includes parallel processing and pipelining strategies. In this case, parallel operation of the 16 pairs of hardware lines is arranged with pipelining of the subtraction and squared magnitude operations for each CFI. The SISO configuration does not have complex multiplications; it has only squared magnitude operations. Hence, the RPB of SISO has 16 hardware lines, each having 2 multipliers, which results in a total of 32 multipliers.

This setup requires more hardware resources than folding, but the output is obtained at every 4th clock cycle, as shown in Figure 15. The SIMO configuration, which involves signal processing for two receive antennas, requires twice the number of multiplications of SISO, and the output is obtained at every 4th clock cycle. The block "$d$" represents the delay element introduced to buffer the values and produce the outputs at the same time instant.

For the MISO configuration, the RDB has 16 hardware lines with 2 complex multiplications each. Since each complex multiplication requires four real multiplications, the RDB can be executed in two clock cycles by reusing 64 multipliers. A further 32 multipliers are required for the PB, taking 4 clock cycles. Hence, 96 multipliers are required in the MISO configuration. For the MIMO configuration, the RDB requires the reuse of 128 multipliers, taking 2 clock cycles, and an additional 32 multipliers are required for the PB, taking 4 clock cycles. Hence, 160 multipliers are required for the MIMO configuration, and the output is obtained at every 6th clock cycle, as shown in Figure 16. As in Figure 15, the block "$d$" represents the delay element.
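The multiplier counts quoted for the superscalar RDB and PB follow from simple arithmetic, reproduced in the Python sketch below. The function name is hypothetical, and the figure of 4 complex multiplications per hardware line for MIMO is inferred from the four products per decoder output in (17), not stated explicitly in the text.

```python
HARDWARE_LINES = 16
REAL_MULTS_PER_COMPLEX = 4      # direct complex multiplication, as in (18)
PB_MULTIPLIERS = 32             # squared-magnitude stage, 2 per hardware line

def superscalar_multipliers(complex_mults_per_line, rdb_cycles=2):
    """Multipliers needed when the RDB work is spread over rdb_cycles cycles."""
    real_mults = HARDWARE_LINES * complex_mults_per_line * REAL_MULTS_PER_COMPLEX
    rdb_multipliers = real_mults // rdb_cycles   # multipliers reused each cycle
    return rdb_multipliers, rdb_multipliers + PB_MULTIPLIERS

print("MISO:", superscalar_multipliers(2))   # 2 complex multiplications per line
print("MIMO:", superscalar_multipliers(4))   # 4 per line (inferred from (17))
```

With 2 RDB cycles followed by 4 PB cycles, the output period of 6 clock cycles quoted for MISO and MIMO is also accounted for.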

6. Results and Discussion

The proposed receiver architectures for PCFICH in SISO, SIMO, MISO, and MIMO configurations are implemented using the Xilinx PlanAhead tool on the Virtex-6 FPGA xc6vlx240tff1156-1 device board. The target Virtex-6 device has only 768 DSP elements. Table 2 compares the performance of the proposed architectures using the folding and superscalar methods with the direct implementation of the PCFICH receiver in terms of resource utilisation, speed, and power for all the SISO, SIMO, MISO, and MIMO configurations. The proposed architectures based on the folding and superscalar processing methods require fewer resource elements.

Table 2: Performance of proposed architectures based on folding and superscalar method.

| Diversity | Method | Multipliers | Adders | DSP elements | LUTs | Total delay (ns) | Speed (MHz) | Dynamic power |
|---|---|---|---|---|---|---|---|---|
| SISO | Direct | 125 | 245 | 125 | 5479 | 39.081 | 25.587 | - |
| SISO | Folding (16T) | 8 | 81 | 16 | 2561 | 114.448 | 8.737 | 84 |
| SISO | Superscalar (4T) | 32 | 182 | 66 | 2731 | 69.333 | 14.423 | 93 |
| SIMO | Direct | 250 | 494 | 250 | 10942 | 40.278 | 24.827 | - |
| SIMO | Folding (16T) | 16 | 165 | 32 | 5117 | 130.704 | 7.651 | 159 |
| SIMO | Superscalar (4T) | 64 | 318 | 130 | 5623 | 53.873 | 18.562 | 170 |
| MISO | Direct | 224 | 580 | 594 | 14880 | 43.023 | 23.243 | - |
| MISO | Folding (16T) | 14 | 101 | 39 | 3950 | 255.264 | 3.917 | 173 |
| MISO | Superscalar (6T) | 96 | 338 | 196 | 6156 | 80.495 | 12.423 | 208 |
| MIMO | Direct | 320 | 844 | 675 | 17380 | 56.962 | 17.555 | - |
| MIMO | Folding (16T) | 20 | 155 | 46 | 4395 | 256.528 | 3.898 | 374 |
| MIMO | Superscalar (6T) | 160 | 465 | 262 | 6932 | 85.822 | 11.652 | 382 |

Figure 18: RTL schematic for combined PCFICH architecture with diversity.

Figure 19: Resource utilization graph for generalized architecture.

In the folding approach, resource utilization is lower than in the direct and superscalar approaches, at the cost of reduced speed of operation, but it is still suitable for real-time frame timings. When the LTE-A system operates at 1.4 MHz bandwidth, the maximum time available for detection at each subcarrier is 992.063 ns, since each slot of 0.5 ms duration in a frame (10 ms radio frame duration) consists of 7 OFDM symbols and there are 72 subcarriers along one OFDM symbol. The total delay in the receiver architecture is within this LTE time constraint. The dynamic power consumption is lower in the folding method than in the superscalar method due to the decrease in block arithmetic. The direct method does not require sequential execution and clocking, and hence its total power consumption is due to static power. Hence, it is inferred that the proposed architecture based on the folding method is more suitable for CFI detection. The simulation waveform of the proposed architecture based on the folding method is shown in Figure 17 for SISO, SIMO, MISO, and MIMO configurations.
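The timing budget quoted above can be reproduced with a few lines of Python. The slot duration, the number of OFDM symbols, and the number of subcarriers are taken from the text; the folding delays are the total delays of Table 2 read as 114.448 ns, 130.704 ns, 255.264 ns, and 256.528 ns; the variable names are illustrative.

```python
SLOT_DURATION_S = 0.5e-3        # one slot of the 10 ms radio frame
OFDM_SYMBOLS_PER_SLOT = 7
SUBCARRIERS = 72                # 1.4 MHz bandwidth

# Time available per subcarrier, in nanoseconds.
budget_ns = SLOT_DURATION_S / (OFDM_SYMBOLS_PER_SLOT * SUBCARRIERS) * 1e9

# Total delay of the folding architecture (16 clock cycles), from Table 2.
folding_delay_ns = {"SISO": 114.448, "SIMO": 130.704,
                    "MISO": 255.264, "MIMO": 256.528}

within_budget = {cfg: delay < budget_ns for cfg, delay in folding_delay_ns.items()}
clock_period_ns = {cfg: delay / 16 for cfg, delay in folding_delay_ns.items()}

print(round(budget_ns, 3), within_budget)
```

Even the slowest folded configuration uses roughly a quarter of the per-subcarrier budget, which is the basis for the claim that folding meets the frame timing.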

A general architecture based on the folding method, which operates in all four SISO, SIMO, MISO, and MIMO configurations, has also been developed. In this architecture, a control variable "$e$" is used to enable or disable the submodules SISO, SIMO, MISO, or MIMO according to the selection input "diversity". CFI is detected at every 17th clock cycle. The synthesis results of the general architecture based on folding show that it utilizes minimum resources in the XC6VLX240TFF1156-1 Virtex-6 device (768 DSPs). This is summarized in Table 3. Dynamic power consumption is due to internal switching contributed by the clock (246 mW), the logic (670 mW), and the block arithmetic (103 mW).

Table 3: Resource requirements of proposed architecture using folding method.

| Parameter | Multipliers | Adders | Minimum clock period (ns) | Total delay (ns) | Speed (MHz) | Total dynamic power (mW) |
|---|---|---|---|---|---|---|
| Value | 58 | 518 | 16.033 | 256.528 | 3.898 | 1019 |

Figure 20: Implemented device in FPGA editor.

Figure 18 shows the RTL schematic of the 4 diversity blocks "div0", "div1", "div2", and "div3", corresponding to SISO, SIMO, MISO, and MIMO, controlled by wires named "$e$". The power consumed includes both static power and dynamic power due to internal switching.

Figure 19 shows the resource utilization graph, which gives the percentage of registers, lookup tables (LUTs), slices, DSP elements, and buffers used.

Figure 20 shows the implemented device in the FPGA editor, with the implemented components and the interconnections between the components configured into the FPGA device.

7. Conclusion

In this paper, low-complexity, low-resource single- or multi-antenna CFI detection at the receiver has been proposed and analyzed using ModelSim and implemented in the Virtex-6 device with the Xilinx PlanAhead tool. In the receiver, the computational complexity and the resource utilization are minimized by employing arithmetic operational rearrangement and a suboptimal sequential DSP algorithm called the folding approach. The proposed architecture using the folding method complies with the LTE frame timing constraint in SISO, SIMO, MISO, and MIMO configurations. It is a suitable solution for the area-optimized hardware implementation of receiver structures for PCFICH. In future, a complete hardware design accommodating all the physical downlink control channels of 3GPP LTE-A with low resource utilization could be synthesized and implemented.

Conflict of Interests

The authors do not have a direct financial relation with any commercial identity mentioned in the paper or any other conflict of interests.

Acknowledgments

The authors wish to express their sincere thanks to the All India Council for Technical Education, New Delhi, for the Grant to do the Project titled Design of Testbed for the Development of Optimized Architectures of MIMO Signal Processing (no. 8023RIDRPS0391112). They are also thankful to the Managements of Mepco Schlenk Engineering College, Sivakasi, and Thiagarajar College of Engineering, Madurai, for their constant support and encouragement to carry out this research work successfully.

References

[1] 3GPP TS 36.211 Version 11.0.0 Release 11 (2012-10), Evolved Universal Terrestrial Radio Access (E-UTRA); Physical Channels and Modulation, 2012.

[2] J. Ketonen and M. Juntti, "SIC and K-best LSD receiver implementation for a MIMO-OFDM system," in Proceedings of the 16th European Signal Processing Conference (EUSIPCO '08), Lausanne, Switzerland, August 2008.

[3] J. Ketonen, M. Juntti, and J. R. Cavallaro, "Performance-complexity comparison of receivers for a LTE MIMO-OFDM system," IEEE Transactions on Signal Processing, vol. 58, no. 6, pp. 3360-3372, 2010.

[4] X. Huang, C. Liang, and J. Ma, "System architecture and implementation of MIMO sphere decoders on FPGA," IEEE Transactions on Very Large Scale Integration Systems, vol. 16, no. 2, pp. 188-196, 2008.

[5] Y. Lin, Y. Chen, C. Chu, C. Zhan, and A. Wu, "Dual-mode low-complexity codebook searching algorithm and VLSI architecture for LTE/LTE-advanced systems," IEEE Transactions on Signal Processing, vol. 61, no. 14, pp. 3545-3562, 2013.

[6] D. Wu, J. Eilert, R. Asghar, and D. Liu, "VLSI implementation of a fixed-complexity soft-output MIMO detector for high-speed wireless," EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 893184, 13 pages, 2010.

[7] C. Mehlfuhrer, M. Wrulich, J. C. Ikuno, D. Bosanska, and M. Rupp, "Simulating the long term evolution physical layer," in Proceedings of the 17th European Signal Processing Conference (EUSIPCO '09), pp. 1471-1478, Glasgow, UK, August 2009.

[8] S. J. Thiruvengadam and L. M. A. Jalloul, "Performance analysis of the 3GPP-LTE physical control channels," EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 914934, 10 pages, 2010.

[9] S. M. Alamouti, "A simple transmit diversity technique for wireless communications," IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1451-1458, 1998.

[10] S. S. A. Abbas and S. J. Thiruvengadam, "FPGA implementation of 3GPP-LTE-A physical downlink control channel using diversity techniques," International Journal of Wireless and Mobile Computing, vol. 9, no. 2, p. 84, 2013.

[11] S. Ahmadi, LTE-Advanced: A Practical Systems Approach to Understanding 3GPP LTE Releases 10 and 11 Radio Access Technologies, Academic Press, 2013.

[12] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, Wiley-Interscience, 1999.


Figure 3: Internal architecture of receiver processing block (RPB CFI-1). Inputs: the 16 × 1 channel frequency response vector ($h_0, h_1, \ldots, h_{15}$), the 16 × 1 received signal vector ($y_0, y_1, \ldots, y_{15}$), and the precomputed data vector ($d^{(1)}_0, d^{(1)}_1, \ldots, d^{(1)}_{15}$). The real and imaginary parts of each difference are squared and summed to give the output $r_1$.

Figure 4: Internal architecture of CFI detector. The 16-bit inputs $r_1$ to $r_4$ pass through the comparator modules CM-1 to CM-4 (carry and sum outputs $Cr_1$ to $Cr_4$ and $Sr_1$ to $Sr_4$), which generate the multiplexer control input (00, 01, 10, or 11) that selects CFI = 1, 2, 3, or 4.

of input $r_1$ are added. If carry is generated, then $r_1$ is less than $r_2$. The outputs $Cr_1$ and $Sr_1$ of the CM-1 are defined as

$$
Cr_1 = \begin{cases} r_1, & \text{if } \mathrm{carry}_1 = \text{"1"} \ (r_1 < r_2), \\ 0, & \text{if } \mathrm{carry}_1 = \text{"0"} \ (r_1 > r_2), \end{cases}
\qquad
Sr_1 = \begin{cases} r_2, & \text{if } \mathrm{carry}_1 = \text{"0"} \ (r_1 > r_2), \\ 0, & \text{if } \mathrm{carry}_1 = \text{"1"} \ (r_1 < r_2). \end{cases}
\tag{3}
$$

In CM-2, input $r_4$ and one's complement of input $r_3$ are added. If carry is generated, then $r_3$ is less than $r_4$. The outputs $Cr_2$ and $Sr_2$ of CM-2 are defined as

$$
Cr_2 = \begin{cases} r_3, & \text{if } \mathrm{carry}_2 = \text{"1"} \ (r_3 < r_4), \\ 0, & \text{if } \mathrm{carry}_2 = \text{"0"} \ (r_3 > r_4), \end{cases}
\qquad
Sr_2 = \begin{cases} r_4, & \text{if } \mathrm{carry}_2 = \text{"0"} \ (r_3 > r_4), \\ 0, & \text{if } \mathrm{carry}_2 = \text{"1"} \ (r_3 < r_4). \end{cases}
\tag{4}
$$

Figure 5: Basic architecture for SIMO configuration. For each receive antenna (antenna "0" and antenna "1"), four RPBs take the 16 × 1 channel frequency response vector $\mathbf{h}^{(k)}$, the 16 × 1 received signal vector $\mathbf{y}^{(k)}$, and the precomputed data vectors $\mathbf{d}^{(1)}$ to $\mathbf{d}^{(4)}$. Their outputs $r^{(0)}_m$ and $r^{(1)}_m$ are added to give $r_1$ to $r_4$, which feed the CFI detector.

Figure 6: Subcarrier mapping for SFBC for 2-antenna system. Across the 16 OFDM subcarriers, antenna port $A_0$ carries $d_0, d_1, \ldots, d_{15}$ and antenna port $A_1$ carries $-d_1^*, d_0^*, -d_3^*, d_2^*, \ldots, -d_{15}^*, d_{14}^*$.

The multiplexer control input is activated based on the outputs from CM-3 and CM-4. One of the four outputs $Cr_3$, $Sr_3$, $Cr_4$, and $Sr_4$ would be "1" based on the minimum value of the four inputs $r_1$, $r_2$, $r_3$, and $r_4$, respectively. Based on this, 00, 01, 10, or 11 in the multiplexer control unit would be activated to obtain the detected CFI value.
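The comparison rule used by the comparator modules can be illustrated with a short Python sketch. It models only the behaviour described above (adding one input to the one's complement of the other and reading the carry); the 16-bit word length is taken from the labels in Figure 4, and the function names `less_than` and `detect_cfi`, the tie handling, and the way the second comparison stage is modelled are assumptions made for illustration, not the authors' RTL.

```python
WIDTH = 16                      # metric word length, as labelled in Figure 4
MASK = (1 << WIDTH) - 1

def less_than(a, b):
    """Return True when a < b, using b + ones_complement(a) and its carry."""
    total = b + (~a & MASK)     # equals b - a + (2**WIDTH - 1)
    return (total >> WIDTH) == 1  # carry out is set exactly when a < b

def detect_cfi(r):
    """r = [r1, r2, r3, r4]; return the CFI (1..4) with the smallest metric."""
    # First stage (CM-1 and CM-2): compare r1 with r2 and r3 with r4.
    first = 0 if less_than(r[0], r[1]) else 1
    second = 2 if less_than(r[2], r[3]) else 3
    # Second stage (assumed role of CM-3 and CM-4): compare the survivors.
    winner = first if less_than(r[first], r[second]) else second
    return winner + 1           # multiplexer selections 00..11 map to CFI 1..4
```

For example, `detect_cfi([40, 12, 77, 30])` selects CFI = 2, since $r_2$ is the smallest metric.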

In SIMO, the 16 × 1 receive signal vector at the $k$th receive antenna is modeled as

$$
\mathbf{y}^{(k)} = \mathbf{h}^{(k)} \circ \mathbf{d}^{(m)} + \mathbf{w}_k, \quad k = 0, 1, 2, \ldots, K-1, \tag{5}
$$

where $K$ represents the number of receive antennas at the UE, $\mathbf{h}^{(k)}$ is the 16 × 1 channel frequency response vector between the transmit antenna and the $k$th receive antenna, and $\mathbf{w}_k$ is the 16 × 1 noise vector at the $k$th receive antenna. Now the objective is to detect the value of CFI from the received signal vectors at each receive antenna, assuming that the channel frequency response vectors at each receive antenna are known. Maximal ratio combining is carried out at the receiver. Using the maximum likelihood (ML) principle, CFI is estimated as [9]

$$
\widehat{\mathrm{CFI}} = \arg\min_{m=1,2,3,4} \sum_{k=0}^{K-1} \left\| \mathbf{y}^{(k)} - \left( \mathbf{h}^{(k)} \circ \mathbf{d}^{(m)} \right) \right\|^2. \tag{6}
$$

The basic architecture for estimating CFI using (6) in the 1 × 2 SIMO configuration, shown in Figure 5, is similar to the basic architecture of the SISO configuration. The received signal vector $\mathbf{y}^{(k)}$ and the channel frequency response vector $\mathbf{h}^{(k)}$ are provided as input to the four receiver processing blocks (RPB-CFI$^{(k)}_m$) at the $k$th receive antenna, along with the precomputed data vectors $\mathbf{d}^{(1)}$, $\mathbf{d}^{(2)}$, $\mathbf{d}^{(3)}$, and $\mathbf{d}^{(4)}$. The outputs from the $m$th RPB at the 0th receive antenna, $r^{(0)}_m$, and at the 1st receive antenna, $r^{(1)}_m$, are added to get the $m$th input $r_m$ of the CFI detector circuit.
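The detection rule in (6) can be sketched in a few lines of plain Python. This is a minimal behavioural model, not the hardware architecture: the names `metric` and `detect_cfi` are hypothetical, and the four-symbol codebook used in the usage note is a toy stand-in for the 16-symbol PCFICH vectors $\mathbf{d}^{(m)}$, whose actual values are not reproduced here.

```python
def metric(y, h, d):
    """Squared distance ||y - (h o d)||^2 for one receive antenna."""
    return sum(abs(yi - hi * di) ** 2 for yi, hi, di in zip(y, h, d))

def detect_cfi(ys, hs, codebook):
    """ys[k], hs[k]: received and channel vectors at receive antenna k.
    codebook[m]: candidate symbol vector d(m) for CFI = m.
    Returns the CFI with the smallest combined metric and all metrics."""
    totals = {
        m: sum(metric(y, h, d) for y, h in zip(ys, hs))  # sum over antennas
        for m, d in codebook.items()
    }
    return min(totals, key=totals.get), totals
```

With a toy codebook of cyclically shifted QPSK symbols and two receive antennas, passing the vectors $\mathbf{h}^{(k)} \circ \mathbf{d}^{(3)}$ plus a small offset as `ys` returns CFI = 3, because only the third metric stays near zero.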

3. System Model and Implementation Architecture for MISO Configuration

In MISO and MIMO configurations, space frequency block code (SFBC) based layer mapping and precoding are carried out to obtain transmit diversity when two or more antenna ports are used at the eNodeB, as per the 3GPP LTE wireless standard [1, 11]. It is assumed that 2 antenna ports are used at the eNodeB. The 16 × 1 complex symbol vector output of the modulation mapper is applied to the layer mapper. The 8 × 1 symbol vectors at layer 0 and layer 1 are given by $[d_0, d_2, d_4, d_6, d_8, d_{10}, d_{12}, d_{14}]$ and $[d_1, d_3, d_5, d_7, d_9, d_{11}, d_{13}, d_{15}]$. The precoding is carried out using the SFBC in the LTE-A standard. The precoder output at antenna port 0 ($A_0$) and antenna port 1 ($A_1$) is shown in Figure 6. The notation "*" represents the complex conjugate of the symbol. Basically, in precoding, a symbol $d_0$ from layer 0 and a symbol $d_1$ from layer 1 are encoded such that the antenna output is formulated using the orthogonal matrix given by

$$
A = \begin{bmatrix} d_0 & d_1 \\ -d_1^* & d_0^* \end{bmatrix}. \tag{7}
$$

Figure 7: Proposed MISO receiver architecture for PCFICH. The receiver decoding blocks (RDB, RDB-1 to RDB-4) take the 16 × 1 channel frequency response vectors $\mathbf{h}^{(0)}$ and $\mathbf{h}^{(1)}$, the 16 × 1 received signal vector $\mathbf{y}$, and the 16 × 1 precomputed signal vectors $\mathbf{s}_1$ to $\mathbf{s}_4$. Their outputs $\mathbf{z}$ and $\mathbf{z}^{(1)}$ to $\mathbf{z}^{(4)}$ feed PB-1 to PB-4, whose outputs $r_1$ to $r_4$ drive the CFI detector.

Figure 8: Internal architecture of MISO receiver decoding block (RDB). Inputs: the 16 × 1 channel frequency response vectors $\mathbf{h}^{(0)}$ and $\mathbf{h}^{(1)}$ and the 16 × 1 received signal vector $\mathbf{y}$; outputs: $z_0, z_1, \ldots, z_{15}$.

Figure 9: Internal architecture of MISO processing block (PB-1). Inputs: the output vector from RDB-1 ($z^{(1)}_0, z^{(1)}_1, \ldots, z^{(1)}_{15}$) as precomputed data and the decoder outputs $z_0, z_1, \ldots, z_{15}$. The real and imaginary parts of each difference are squared and summed to give the output $r_1$.

Figure 10: Proposed MIMO receiver architecture for PCFICH. The receiver decoding blocks (RDBM, RDBM-1 to RDBM-4) take the 16 × 1 channel frequency response vectors $\mathbf{h}^{(00)}$, $\mathbf{h}^{(01)}$, $\mathbf{h}^{(10)}$, and $\mathbf{h}^{(11)}$, the 16 × 1 received signal vectors $\mathbf{y}^{(0)}$ and $\mathbf{y}^{(1)}$, and the 16 × 1 precomputed signal vectors $\mathbf{s}^{(0)}_m$ and $\mathbf{s}^{(1)}_m$, $m = 1, \ldots, 4$. Their outputs $\mathbf{z}$ and $\mathbf{z}^{(1)}$ to $\mathbf{z}^{(4)}$ feed PB-1 to PB-4, whose outputs $r_1$ to $r_4$ drive the CFI detector.

Figure 11: Internal architecture of receiver decoding block-MIMO (RDBM). Inputs: the 16 × 1 channel frequency response vectors $\mathbf{h}^{(00)}$, $\mathbf{h}^{(01)}$, $\mathbf{h}^{(10)}$, and $\mathbf{h}^{(11)}$ and the 16 × 1 received signal vectors; outputs: $Z_0, Z_1, \ldots, Z_{15}$.

Figure 12: Multiplicands rearrangement for a single complex multiplication block. Inputs: Re($y$), Im($y$), Re($h$), and Im($h$); outputs: $e$ and $f$.

This is repeated for all the 8 symbols in layer 0 and layer 1. Equation (7) defines the transmission format, with the row index indicating the antenna port number and the column index indicating the subcarrier index. In the 2 × 1 MISO configuration, the receive signals at the $i$th and $(i+1)$th subcarriers are given in matrix form as

$$
\begin{bmatrix} y_i \\ y_{i+1}^* \end{bmatrix}
=
\begin{bmatrix} h_i^{(0)} & -h_i^{(1)} \\ h_{i+1}^{*(1)} & h_{i+1}^{*(0)} \end{bmatrix}
\begin{bmatrix} d_i \\ d_{i+1}^* \end{bmatrix}
+
\begin{bmatrix} n_i \\ n_{i+1}^* \end{bmatrix},
\quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{8}
$$

where $h_i^{(0)}$ represents the channel frequency response of the $i$th subcarrier between the 0th transmit antenna port and the receive antenna, $d_i$ is the data symbol at the $i$th subcarrier, and $n_i$ is the noise at the $i$th subcarrier at the receive antenna. Equation (8) can simply be represented as

$$
\mathbf{y}_i = \mathbf{H}_{\mathrm{eff},i}\,\mathbf{d}_i + \mathbf{n}_i, \quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{9}
$$

where $\mathbf{y}_i$ is the 2 × 1 receive signal vector, $\mathbf{H}_{\mathrm{eff},i}$ is the 2 × 2 channel matrix, $\mathbf{d}_i$ is the 2 × 1 complex signal vector, and $\mathbf{n}_i$ is the 2 × 1 noise vector. The objective is to detect the elements $d_i$ and $d_{i+1}$ of the data vector $\mathbf{d}_i$. Assuming that the elements of the channel frequency response matrix $\mathbf{H}_{\mathrm{eff},i}$ are perfectly known at the receiver, the decoder output vector $\mathbf{z}_i$ is given by

$$
\mathbf{z}_i = \mathbf{H}^H_{\mathrm{eff},i}\,\mathbf{y}_i, \quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{10}
$$

where $\mathbf{H}^H_{\mathrm{eff},i}$ is the Hermitian of the 2 × 2 channel transmission matrix. Equation (10) is expanded as

$$
\begin{bmatrix} z_i \\ z_{i+1}^* \end{bmatrix}
=
\begin{bmatrix} h_i^{*(0)} & h_{i+1}^{(1)} \\ -h_i^{*(1)} & h_{i+1}^{(0)} \end{bmatrix}
\begin{bmatrix} y_i \\ y_{i+1}^* \end{bmatrix},
\quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14. \tag{11}
$$

The elements of the decoder output are calculated as

$$
\begin{aligned}
z_i &= h_i^{*(0)} y_i + h_{i+1}^{(1)} y_{i+1}^*, \\
z_{i+1}^* &= -h_i^{*(1)} y_i + h_{i+1}^{(0)} y_{i+1}^*,
\end{aligned}
\quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14. \tag{12}
$$
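The signal model in (8) and the decoder in (12) can be checked numerically with the following Python sketch. It is a behavioural illustration only: the function names `sfbc_encode`, `miso_receive`, and `sfbc_decode` are hypothetical, noise is omitted, and the check described in the usage note additionally assumes that the channel is identical on the two subcarriers of each pair, which is what makes $\mathbf{H}^H_{\mathrm{eff},i}\mathbf{H}_{\mathrm{eff},i}$ diagonal.

```python
def sfbc_encode(d):
    """Map symbols to two antenna ports following (7): port 0 sends d[i], d[i+1];
    port 1 sends -conj(d[i+1]), conj(d[i]) on subcarriers i, i+1."""
    a0 = list(d)
    a1 = []
    for i in range(0, len(d), 2):
        a1 += [-d[i + 1].conjugate(), d[i].conjugate()]
    return a0, a1

def miso_receive(a0, a1, h0, h1):
    """Noise-free 2 x 1 reception: y_i = h0_i * a0_i + h1_i * a1_i."""
    return [h0[i] * a0[i] + h1[i] * a1[i] for i in range(len(a0))]

def sfbc_decode(y, h0, h1):
    """Decoder outputs of (12). Entry i holds z_i and entry i+1 holds the
    conjugated output written as z*_{i+1} in (12)."""
    z = []
    for i in range(0, len(y), 2):
        yc = y[i + 1].conjugate()
        z_i = h0[i].conjugate() * y[i] + h1[i + 1] * yc
        z_ip1_conj = -h1[i].conjugate() * y[i] + h0[i + 1] * yc
        z += [z_i, z_ip1_conj]
    return z
```

Under the flat-pair assumption, each even output equals $(|h^{(0)}|^2 + |h^{(1)}|^2)\,d_i$ and each odd output equals $(|h^{(0)}|^2 + |h^{(1)}|^2)\,d_{i+1}^*$, which matches the data vector $[d_i, d_{i+1}^*]^T$ used in (8).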

The PCFICH receive architecture for the 2 × 1 MISO configuration is shown in Figure 7. The receiver decoding block (RDB) gets the 16 × 1 received signal vector $\mathbf{y}$ and computes the decoder output vector using (10), assuming that the channel frequency response vectors $\mathbf{h}^{(0)}$ and $\mathbf{h}^{(1)}$ are known. The detailed internal architecture of RDBM is shown in Figure 11. The decoder output vectors $\mathbf{z}_i$, $i = 0, 2, 4, \ldots, 14$, are stacked as the 16 × 1 vector $\mathbf{z} = [\mathbf{z}_0^T, \mathbf{z}_2^T, \ldots, \mathbf{z}_{14}^T]^T$. The 16 × 1 precomputed data vectors for CFI = 1, 2, 3, 4 are represented as $\mathbf{s}_1$, $\mathbf{s}_2$, $\mathbf{s}_3$, and $\mathbf{s}_4$, respectively.

The detailed structure of the receiver decoding blocks (RDB) is shown in Figure 8. The output vectors $\mathbf{z}^{(1)}$, $\mathbf{z}^{(2)}$, $\mathbf{z}^{(3)}$, and $\mathbf{z}^{(4)}$ from RDB-1 to RDB-4 are fed to the processing blocks (PB-1 to PB-4). The detailed architecture of PB-1 is shown in Figure 9. The sum of the square magnitude of the elements of the difference vector between the decoded output vector $\mathbf{z}$ and the precomputed data vector $\mathbf{s}_1$ is the output $r_1$ of PB-1. Similarly, $r_2$, $r_3$, and $r_4$ are computed for CFI = 2, 3, and 4 using PB-2, PB-3, and PB-4, respectively. The processing block outputs $r_1$, $r_2$, $r_3$, and $r_4$ are applied to the CFI determination circuit shown in Figure 4 to detect the CFI value.

Figure 13: Illustration of folded architecture of RPB in SISO and SIMO.

4. System Model and Implementation Architecture for MIMO Configuration

In the MIMO system, the signals at the $i$th and $(i+1)$th subcarriers in the receive array are given by

$$
\begin{bmatrix} y_i^{(0)} \\ y_{i+1}^{(0)} \\ y_i^{(1)*} \\ y_{i+1}^{(1)*} \end{bmatrix}
=
\begin{bmatrix} h_{00} & h_{01} \\ h_{10} & h_{11} \\ h_{01}^* & -h_{00}^* \\ h_{11}^* & -h_{10}^* \end{bmatrix}
\begin{bmatrix} d_i \\ d_{i+1}^* \end{bmatrix}
+
\begin{bmatrix} n_i^{(0)} \\ n_{i+1}^{(0)*} \\ n_i^{(1)} \\ n_{i+1}^{(1)*} \end{bmatrix},
\quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{13}
$$

where $h_{ab}$ represents the channel frequency response between the $b$th transmit antenna and the $a$th receive antenna and $n_i^{(j)}$ represents the noise in the $i$th subcarrier at the $j$th receive antenna. In vector form, it is written as

$$
\mathbf{y}_i = \mathbf{H}_{\mathrm{eff},i}\,\mathbf{d}_i + \mathbf{n}_i, \quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{14}
$$

where $\mathbf{y}_i$ is the 4 × 1 receive signal vector, $\mathbf{H}_{\mathrm{eff},i}$ is the 4 × 2 channel frequency response matrix at the $i$th and $(i+1)$th subcarriers, $\mathbf{d}_i$ is the 2 × 1 data vector at the $i$th and $(i+1)$th subcarriers, and $\mathbf{n}_i$ is the 4 × 1 noise vector. The objective is to detect the elements $d_i$ and $d_{i+1}$ of the data vector $\mathbf{d}_i$. Assuming that the elements of the channel frequency response matrix $\mathbf{H}_{\mathrm{eff},i}$ are perfectly known at the receiver, the decoder output vector $\mathbf{z}_i$ is given by

$$
\mathbf{z}_i = \mathbf{H}^H_{\mathrm{eff},i}\,\mathbf{y}_i, \quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{15}
$$

where $\mathbf{H}^H_{\mathrm{eff},i}$ is the Hermitian of the 4 × 2 channel transmission matrix. This can be expanded as

$$
\begin{bmatrix} z_i \\ z_{i+1}^* \end{bmatrix}
=
\begin{bmatrix} h_{00}^* & -h_{10}^* & h_{01} & h_{11} \\ h_{01}^* & -h_{11}^* & -h_{00} & -h_{10} \end{bmatrix}
\begin{bmatrix} y_i^{(0)} \\ y_{i+1}^{(0)} \\ y_i^{(1)*} \\ y_{i+1}^{(1)*} \end{bmatrix},
\quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14. \tag{16}
$$

The decoder outputs are given by

$$
\begin{aligned}
z_i &= h_{00}^* y_i^{(0)} - h_{10}^* y_{i+1}^{(0)} + h_{01} y_i^{(1)*} + h_{11} y_{i+1}^{(1)*}, \\
z_{i+1}^* &= h_{01}^* y_i^{(0)} - h_{11}^* y_{i+1}^{(0)} - h_{00} y_i^{(1)*} - h_{10} y_{i+1}^{(1)*},
\end{aligned}
\quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14. \tag{17}
$$

Figure 14: Illustration of proposed architecture for RDB and PB in MISO and MIMO. Note: the receiver decoding block (RDB) in MISO is termed RDBM in MIMO.

The PCFICH receiver architecture of the 2 × 2 MIMO configuration is shown in Figure 10.

The receiver decoding block (RDBM) gets the 16 × 1 received signal vector $\mathbf{y}$ and computes the decoder output vector using (15), assuming that the channel frequency response vectors $\mathbf{h}^{(00)}$, $\mathbf{h}^{(01)}$, $\mathbf{h}^{(10)}$, and $\mathbf{h}^{(11)}$ are known. The 16 × 1 precomputed data vectors for CFI = 1, 2, 3, and 4 are represented as $\mathbf{s}^{(0)}_1$, $\mathbf{s}^{(0)}_2$, $\mathbf{s}^{(0)}_3$, and $\mathbf{s}^{(0)}_4$, respectively, for antenna 0 and as $\mathbf{s}^{(1)}_1$, $\mathbf{s}^{(1)}_2$, $\mathbf{s}^{(1)}_3$, and $\mathbf{s}^{(1)}_4$, respectively, for antenna 1. The received signal vectors $\mathbf{y}^{(0)}_i$ and $\mathbf{y}^{(1)}_i$ are multiplied with the four channel estimation vectors to give the decoded output vector $\mathbf{z}$, which is sent to the processing block (PB) shown in Figure 9. The decoder outputs $\mathbf{z}_i$, $i = 0, 2, 4, \ldots, 14$, are stacked as the 16 × 1 vector $\mathbf{z} = [\mathbf{z}_0^T, \mathbf{z}_2^T, \ldots, \mathbf{z}_{14}^T]^T$. Similarly, RDBM-1 gives the output vector $\mathbf{z}^{(1)}$ using the precomputed data vectors $\mathbf{s}^{(0)}_1$ and $\mathbf{s}^{(1)}_1$ and the channel estimation vectors. The architecture of the PBs and the CFI detection architecture are similar to those of the MISO system. The sum of the squared magnitude of the difference between each element in the decoded output vector $\mathbf{z}$ and its precomputed data in the vector $\mathbf{z}^{(1)}$ is the output $r_1$ of PB-1. Similarly, $r_2$, $r_3$, and $r_4$ are computed for the other CFI values. The outputs $r_1$, $r_2$, $r_3$, and $r_4$ are compared by the CFI detector shown in Figure 4 to determine the minimum value.

Figure 15: Illustration of superscalar method for SISO and SIMO (with no complex multiplications and operating from $T_1$ to $T_4$).

5. PCFICH Receiver Implementation Methods

The PCFICH receiver architectures can be implemented directly based on the basic architectures developed in Sections 3 and 4. However, in order to utilize the resources in the FPGA effectively, the basic architectures are implemented using modified architectures based on VLSI DSP techniques, namely, the folding and superscalar processing approaches.

5.1. Direct Implementation with Multiplicands Rearranged Method. In the receiver architecture for SISO and SIMO, the 16 × 1 received signal vector is directly subtracted from the precomputed data vector for a given CFI. This requires fewer multipliers and adders than MISO and MIMO. In MISO and MIMO configurations, complex multiplications are necessary for the multiplication of $\mathbf{H}^H_{\mathrm{eff}}$ with the received signal vector, which increases the number of multiplications in the CFI detection process. Hence, the terms are rearranged to minimize the number of multiplications. Further, the intermediate products are reused in the calculation of the real and imaginary parts. Consider the multiplication of two complex numbers $\mathrm{Re}\{h\} + j\,\mathrm{Im}\{h\}$ and $\mathrm{Re}\{y\} + j\,\mathrm{Im}\{y\}$. The output real part ($e$) and imaginary part ($f$) terms are given by

$$
\begin{aligned}
e &= \mathrm{Re}\{h\}\,\mathrm{Re}\{y\} - \mathrm{Im}\{h\}\,\mathrm{Im}\{y\}, \\
f &= \mathrm{Re}\{h\}\,\mathrm{Im}\{y\} + \mathrm{Im}\{h\}\,\mathrm{Re}\{y\}.
\end{aligned}
\tag{18}
$$

It requires four multiplications and two additions. To reduce the number of multiplications, the terms in (18) are rearranged as

$$
\begin{aligned}
e &= \left[\mathrm{Re}\{y\} - \mathrm{Im}\{y\}\right]\left[\mathrm{Re}\{h\} + \mathrm{Im}\{h\}\right] - \mathrm{Re}\{y\}\,\mathrm{Im}\{h\} + \mathrm{Im}\{y\}\,\mathrm{Re}\{h\}, \\
f &= \mathrm{Re}\{y\}\,\mathrm{Im}\{h\} + \mathrm{Im}\{y\}\,\mathrm{Re}\{h\}.
\end{aligned}
\tag{19}
$$

Since the terms $\mathrm{Re}\{y\}\,\mathrm{Im}\{h\}$ and $\mathrm{Im}\{y\}\,\mathrm{Re}\{h\}$ appear in both expressions of (19), it requires only three multiplications but five additions. This kind of rearrangement of the multiplicands is employed in the processing blocks at the cost of increased additions, as shown in Figure 12.
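The saving can be verified with a small Python sketch of a three-multiplication complex product. The shared products are $\mathrm{Re}\{y\}\,\mathrm{Im}\{h\}$ and $\mathrm{Im}\{y\}\,\mathrm{Re}\{h\}$, and the third product is formed so that the result agrees with the four-multiplication form in (18); the function name `complex_mult_3` is a hypothetical label, and the sketch models arithmetic only, not the wiring of Figure 12.

```python
def complex_mult_3(re_h, im_h, re_y, im_y):
    """Real part e and imaginary part f of h*y with three real multiplications."""
    p1 = re_y * im_h                      # shared product 1
    p2 = im_y * re_h                      # shared product 2
    p3 = (re_y - im_y) * (re_h + im_h)    # third product, two additions
    e = p3 - p1 + p2                      # = re_h*re_y - im_h*im_y
    f = p1 + p2                           # = re_h*im_y + im_h*re_y
    return e, f
```

For $h = 3 + 4j$ and $y = 5 - 2j$, the function returns $(23, 14)$, which equals the direct product $(3 + 4j)(5 - 2j) = 23 + 14j$. The operation count is three multiplications and five additions or subtractions.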

5.2. Proposed Architecture Using Folding Method. Folding architecture systematically determines the control circuits in DSP architectures where multiple algorithmic operations are time-multiplexed to a single functional unit [12]. It is used for synthesis of DSP architectures that can be operated at single or multiple clocks. It reduces the number of hardware functional units (FUs) by a factor of $N$ at the expense of increased computation time.

The folding architecture is introduced in the receiver structure of RPB in SISO and SIMO configurations and of RDB and PB in MISO and MIMO configurations, as shown in Figures 13 and 14, respectively. For the SISO RPB, there are 16 hardware lines to calculate the value of $r_1$, each requiring two multipliers. Hence, the number of multipliers used in one RPB is 32. In order to reduce the number of multipliers and adders, the folding architecture is proposed. This architecture uses only two multipliers and performs the operation of a single hardware line 16 times in a sequential way. The difference between the product of the channel frequency response vector with the precomputed data vector and the received signal vector is stored in registers. At a time, one resultant signal pair is involved in the computation using two multipliers to get the value of $z_i$. Four switches operating at the system clock speed are involved in the architecture: two switches are used to pass the real part of the signal to one multiplier, while the other two switches are used to pass the imaginary part of the signal to another multiplier. The multipliers pass the products to the first adder for $z_i$. The output of the first adder is passed to the second adder with a delay to accumulate the values $z_0$ to $z_{15}$ into a register in subsequent clock cycles. This process requires 16 clock cycles and the CFI is detected at the 17th clock cycle. Though more clock cycles are needed to get the output, the resources are minimized in this method.
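The folded schedule for one RPB can be modelled behaviourally as follows. This Python sketch reuses a single squaring line (two real multiplications and an accumulator) once per subcarrier; the function name `folded_rpb` and the cycle counter are assumptions made for illustration, and the differences are simply computed up front to stand in for the register contents described above.

```python
def folded_rpb(y, h, d):
    """Accumulate sum |y_i - h_i*d_i|^2 with one time-multiplexed squaring line.
    Returns the metric and the number of clock cycles used."""
    # Stand-in for the registers holding the 16 difference values.
    diff = [yi - hi * di for yi, hi, di in zip(y, h, d)]
    acc = 0.0                      # accumulator register
    cycles = 0
    for e in diff:                 # one subcarrier per clock cycle
        re_sq = e.real * e.real    # multiplier 1: real part squared
        im_sq = e.imag * e.imag    # multiplier 2: imaginary part squared
        acc += re_sq + im_sq       # first adder, then accumulation
        cycles += 1
    return acc, cycles
```

For 16-element inputs, the loop runs 16 times and the accumulated value equals the metric that the 16 parallel hardware lines of the direct implementation would produce, using 2 multipliers instead of 32.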

The folded architecture of the decoding block of MISO and MIMO, involving complex multiplication of the channel frequency response vector and the receive signal vector, is shown in Figure 14. There are 2 complex multiplications and one addition in each of the 16 hardware lines. Hence, the total resource elements used are 32 complex multiplications and 16 additions. The folded architecture, which reduces this to just 2 complex multiplications and one addition, requires five switches. Two switches are used to pass the first element of the receive signal vector and its corresponding channel frequency response vector to one multiplier, and the other two switches are used to pass the second element of the receive signal vector and its channel frequency response vector to another multiplier. These four switches operate at the system clock speed. The multipliers pass their products to the adder through the fifth switch before moving to the PB. This process requires 16 clock cycles and the CFI is detected at the 17th clock cycle.

Figure 16: Illustration of superscalar method for MISO and MIMO (with complex multiplications and operating from $T_1$ to $T_6$).

Figure 17: Simulation waveform for PCFICH receiver.

5.3. Proposed Architecture Using Superscalar Method. The superscalar approach is another low resource utilizing VLSI DSP technique. The superscalar processing method includes parallel processing and pipelining strategies. In this case, parallel operation for the 16 pairs of hardware lines is arranged with pipelining of the subtraction and square magnitude operations for each CFI. The SISO configuration does not have complex multiplications; it has only square magnitude operations. Hence, the RPB of SISO has 16 hardware lines, each having 2 multipliers, which results in a total of 32 multipliers. This setup requires more hardware resources than folding, but the output is obtained at every 4th clock cycle, as shown in Figure 15. The SIMO configuration, which involves signal processing for two receive antennas, requires twice the number of multiplications of SISO, and the output is obtained at every 4th clock cycle. The block "d" represents the delay element introduced to buffer the values and produce the outputs at the same time instant.

For the MISO configuration, the RDB has 16 hardware lines with 2 complex multiplications each. Since each complex multiplication requires four real multiplications, the RDB can be executed in two clock cycles by reusing 64 multipliers. A further 32 multipliers are required for the PB, taking 4 clock cycles. Hence, 96 multipliers are required in the MISO configuration. For the MIMO configuration, the RDB requires reuse of 128 multipliers, taking 2 clock cycles, and an additional 32 multipliers are required for the PB, taking 4 clock cycles. Hence, 160 multipliers are required for the MIMO configuration, and the output is obtained at every 6th clock cycle, as shown in Figure 16. The block "d" represents the delay element introduced to buffer the values and produce the outputs at the same time instant.
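The multiplier counts quoted for the superscalar method follow from simple arithmetic, reproduced in the Python sketch below. The helper name `superscalar_multipliers` is hypothetical. The figure of 4 complex multiplications per hardware line for MIMO is an assumption read from the four products in each line of (17); it is consistent with the 128 reused multipliers stated above.

```python
LINES = 16             # hardware lines, one per PCFICH symbol
REAL_PER_COMPLEX = 4   # real multiplications per complex multiplication
SQUARER_MULTS = 2      # one multiplier each for the squared real and imaginary parts

def superscalar_multipliers(complex_per_line, metric_streams, decode_cycles=2):
    """Return (decoder, squarer, total) real multiplier counts."""
    # Decoder multipliers are reused over decode_cycles clock cycles.
    decoder = LINES * complex_per_line * REAL_PER_COMPLEX // decode_cycles
    squarer = LINES * SQUARER_MULTS * metric_streams
    return decoder, squarer, decoder + squarer

# (complex multiplications per line, number of squared-magnitude streams)
CONFIGS = {"SISO": (0, 1), "SIMO": (0, 2), "MISO": (2, 1), "MIMO": (4, 1)}
counts = {name: superscalar_multipliers(*args) for name, args in CONFIGS.items()}
```

The totals come out as 32, 64, 96, and 160 multipliers for SISO, SIMO, MISO, and MIMO, respectively, which agrees with the counts given in the text.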

6. Results and Discussion

The proposed receiver architectures for PCFICH in SISO, SIMO, MISO, and MIMO configurations are implemented using the Xilinx PlanAhead tool on the Virtex-6 FPGA xc6vlx240tff1156-1 device board. The target device Virtex-6 has only 768 DSP elements. Table 2 shows the performance of the proposed architectures using the folding and superscalar methods, compared with the direct implementation of the PCFICH receiver, in terms of resource utilisation, speed, and power for all the SISO, SIMO, MISO, and MIMO configurations. The proposed architectures based on the folding and superscalar processing methods require fewer resource elements.

Table 2: Performance of proposed architectures based on folding and superscalar method.

Diversity  Method            Multipliers  Adders  DSP elements  LUTs   Total delay (ns)  Speed (MHz)  Dynamic power
SISO       Direct            125          245     125           5479   39.081            25.587       -
SISO       Folding (16T)     8            81      16            2561   114.448           8.737        84
SISO       Superscalar (4T)  32           182     66            2731   69.333            14.423       93
SIMO       Direct            250          494     250           10942  40.278            24.827       -
SIMO       Folding (16T)     16           165     32            5117   130.704           7.651        159
SIMO       Superscalar (4T)  64           318     130           5623   53.873            18.562       170
MISO       Direct            224          580     594           14880  43.023            23.243       -
MISO       Folding (16T)     14           101     39            3950   255.264           3.917        173
MISO       Superscalar (6T)  96           338     196           6156   80.495            12.423       208
MIMO       Direct            320          844     675           17380  56.962            17.555       -
MIMO       Folding (16T)     20           155     46            4395   256.528           3.898        374
MIMO       Superscalar (6T)  160          465     262           6932   85.822            11.652       382

Figure 18: RTL schematic for combined PCFICH architecture with diversity.

Figure 19: Resource utilization graph for generalized architecture.

In the folding approach, resource utilization is lower than in the direct and superscalar approaches at the cost of reduced speed of operation, but it is suitable for real-time frame timings. When the LTE-A system operates at 1.4 MHz bandwidth, the maximum time available for detection at each subcarrier is 992.063 ns, since each slot of 0.5 ms duration in a frame (10 ms radio frame duration) consists of 7 OFDM symbols and there are 72 subcarriers along one OFDM symbol. The total delay in the receiver architecture is within the LTE time constraint. The dynamic power consumption is lower in the folding method than in the superscalar method due to the decrease in block arithmetic. The direct method does not require sequential execution and clocking, and hence its total power consumption is due to static power. Hence, it is inferred that the proposed architecture based on the folding method is more suitable for CFI detection. The simulation waveform of the proposed architecture based on the folding method is shown in Figure 17 for SISO, SIMO, MISO, and MIMO configurations.
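The timing budget quoted above can be reproduced with a few lines of Python. The sketch divides the slot duration by the number of OFDM symbols and subcarriers and compares the result with the folding-method total delays from Table 2. The delays are assumed to be in nanoseconds with three decimal places (114.448 ns and so on), a reading that is consistent with the listed speeds; the variable names are illustrative.

```python
SLOT_S = 0.5e-3           # slot duration in seconds
SYMBOLS_PER_SLOT = 7      # OFDM symbols per slot
SUBCARRIERS = 72          # subcarriers per OFDM symbol at 1.4 MHz bandwidth

# Time available per subcarrier, in nanoseconds.
budget_ns = SLOT_S / (SYMBOLS_PER_SLOT * SUBCARRIERS) * 1e9

# Folding-method total delays from Table 2 (assumed to be in ns).
folding_delay_ns = {"SISO": 114.448, "SIMO": 130.704, "MISO": 255.264, "MIMO": 256.528}

# True for every configuration whose total delay fits within the budget.
within_budget = {name: delay < budget_ns for name, delay in folding_delay_ns.items()}
```

The budget evaluates to about 992.063 ns, and all four folding delays fall below it, with the slowest (MIMO, 256.528 ns) using roughly a quarter of the available time.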

A general architecture based on the folding method, which operates in all four SISO, SIMO, MISO, and MIMO configurations, has also been developed. In this architecture, a control variable "e" is used to enable or disable the submodules SISO, SIMO, MISO, or MIMO according to the selection input "diversity". CFI is detected at every 17th clock cycle. The synthesis results of the general architecture based on folding show that it utilizes minimum resources in the XC6VLX240TFF1156-1 Virtex-6 device (768 DSPs). This is summarized in Table 3. Dynamic power consumption is due to internal switching contributed by the clock (246 mW), logic (670 mW), and the block arithmetic (103 mW).

Table 3: Resource requirements of proposed architecture using folding method.

Parameter                   Value
Multipliers                 58
Adders                      518
Minimum clock period (ns)   16.033
Total delay (ns)            256.528
Speed (MHz)                 3.898
Total dynamic power (mW)    1019

Figure 20: Implemented device in FPGA editor.

Figure 18 shows the RTL schematic of the 4 diversity blocks "div0," "div1," "div2," and "div3," corresponding to SISO, SIMO, MISO, and MIMO, controlled by wires named "e." The power consumed includes both static power and dynamic power due to internal switching.

Figure 19 shows the resource utilization graph, which gives the percentage of registers, lookup tables (LUTs), slices, DSP elements, and buffers used.

Figure 20 shows the implemented device in the FPGA editor, with the implemented components and the interconnections between them configured into the FPGA device.

7. Conclusion

In this paper, low-complexity, low-resource single- or multi-antenna CFI detection at the receiver has been proposed, analyzed using ModelSim, and implemented in the Virtex-6 device using the Xilinx PlanAhead tool. In the receiver, the computational complexity and the resource utilization are minimized by employing arithmetic operational rearrangement and a suboptimal sequential DSP algorithm called the folding approach. The proposed architecture using the folding method complies with the LTE frame timing constraint in SISO, SIMO, MISO, and MIMO configurations. It is a suitable solution for the area-optimized hardware implementation of receiver structures for PCFICH. In future, a total hardware accommodating all the physical downlink control channels of the 3GPP-LTE-A with low resource utilization could be synthesized and implemented.

Conflict of Interests

The authors do not have a direct financial relation with any commercial identity mentioned in the paper or any other conflict of interests.

Acknowledgments

The authors wish to express their sincere thanks to the All India Council for Technical Education, New Delhi, for the grant to do the project titled "Design of Testbed for the Development of Optimized Architectures of MIMO Signal Processing" (no. 8023RIDRPS0391112). They are also thankful to the managements of Mepco Schlenk Engineering College, Sivakasi, and Thiagarajar College of Engineering, Madurai, for their constant support and encouragement to carry out this research work successfully.

References

[1] 3GPP TS 36.211 Version 11.0.0 Release 11 (2012-10), Evolved Universal Terrestrial Radio Access (E-UTRA): Physical Channels and Modulation, 2012.

[2] J. Ketonen and M. Juntti, "SIC and K-best LSD receiver implementation for a MIMO-OFDM system," in Proceedings of the 16th European Signal Processing Conference (EUSIPCO '08), Lausanne, Switzerland, August 2008.

[3] J. Ketonen, M. Juntti, and J. R. Cavallaro, "Performance-complexity comparison of receivers for a LTE MIMO-OFDM system," IEEE Transactions on Signal Processing, vol. 58, no. 6, pp. 3360-3372, 2010.

[4] X. Huang, C. Liang, and J. Ma, "System architecture and implementation of MIMO sphere decoders on FPGA," IEEE Transactions on Very Large Scale Integration Systems, vol. 16, no. 2, pp. 188-196, 2008.

[5] Y. Lin, Y. Chen, C. Chu, C. Zhan, and A. Wu, "Dual-mode low-complexity codebook searching algorithm and VLSI architecture for LTE/LTE-advanced systems," IEEE Transactions on Signal Processing, vol. 61, no. 14, pp. 3545-3562, 2013.

[6] D. Wu, J. Eilert, R. Asghar, and D. Liu, "VLSI implementation of a fixed-complexity soft-output MIMO detector for high-speed wireless," EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 893184, 13 pages, 2010.

[7] C. Mehlfuhrer, M. Wrulich, J. C. Ikuno, D. Bosanska, and M. Rupp, "Simulating the long term evolution physical layer," in Proceedings of the 17th European Signal Processing Conference (EUSIPCO '09), pp. 1471-1478, Glasgow, UK, August 2009.

[8] S. J. Thiruvengadam and L. M. A. Jalloul, "Performance analysis of the 3GPP-LTE physical control channels," EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 914934, 10 pages, 2010.

[9] S. M. Alamouti, "A simple transmit diversity technique for wireless communications," IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1451-1458, 1998.

[10] S. S. A. Abbas and S. J. Thiruvengadam, "FPGA implementation of 3GPP-LTE-A physical downlink control channel using diversity techniques," International Journal of Wireless and Mobile Computing, vol. 9, no. 2, p. 84, 2013.

[11] S. Ahmadi, LTE-Advanced: A Practical Systems Approach to Understanding 3GPP LTE Releases 10 and 11 Radio Access Technologies, Academic Press, 2013.

[12] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, Wiley-Interscience, 1999.



Figure 5: Basic architecture for SIMO configuration.

Figure 6: Subcarrier mapping for SFBC for 2-antenna system. Across the OFDM subcarriers, antenna port $A_0$ carries $d_{15}, d_{14}, d_{13}, \ldots, d_1, d_0$ and antenna port $A_1$ carries $d_{14}^{*}, -d_{15}^{*}, d_{12}^{*}, -d_{13}^{*}, \ldots, d_{0}^{*}, -d_{1}^{*}$ on the same subcarriers.

The multiplexer control input is activated based on the outputs from CM-3 and CM-4. One of the four outputs $Cr_3$, $Sr_3$, $Cr_4$, and $Sr_4$ would be "1" based on the minimum value of the four inputs $r_1$, $r_2$, $r_3$, and $r_4$, respectively. Based on this, 00, 01, 10, or 11 in the multiplexer control unit would be activated to obtain the detected CFI value.

In SIMO, the $16 \times 1$ receive signal vector at the $k$th receive antenna is modeled as

$$\mathbf{y}^{(k)} = \mathbf{h}^{(k)} \circ \mathbf{d}^{(m)} + \mathbf{w}_k, \quad k = 0, 1, 2, \ldots, K-1, \tag{5}$$

where $K$ represents the number of receive antennas at the UE, $\mathbf{h}^{(k)}$ is the $16 \times 1$ channel frequency response vector between the transmit antenna and the $k$th receive antenna, and $\mathbf{w}_k$ is the $16 \times 1$ noise vector at the $k$th receive antenna. Now the objective is to detect the value of CFI from the received signal vectors at each receive antenna, assuming that the channel frequency response vectors at each receive antenna are known. Maximal ratio combining is carried out at the receiver. Using the maximum likelihood (ML) principle, CFI is estimated as [9]

$$\mathrm{CFI} = \arg\min_{m = 1, 2, 3, 4} \sum_{k=0}^{K-1} \left\| \mathbf{y}^{(k)} - \left( \mathbf{h}^{(k)} \circ \mathbf{d}^{(m)} \right) \right\|^{2}. \tag{6}$$

The basic architecture for estimating CFI using (6) in the $1 \times 2$ SIMO configuration, shown in Figure 5, is similar to the basic architecture of the SISO configuration. The received signal vector $\mathbf{y}^{(k)}$ and the channel frequency response vector $\mathbf{h}^{(k)}$ are provided as input to the four receiver processing blocks (RPB-CFI$^{(k)}_{m}$) at the $k$th receive antenna, along with the precomputed data vectors $\mathbf{d}^{(1)}$, $\mathbf{d}^{(2)}$, $\mathbf{d}^{(3)}$, and $\mathbf{d}^{(4)}$. The outputs from the $m$th RPB at the 0th receive antenna, $r^{(0)}_{m}$, and at the 1st receive antenna, $r^{(1)}_{m}$, are added to get the $m$th input $r_m$ of the CFI detector circuit.
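The detection rule described above (one distance metric per RPB, per-antenna metrics added, minimum selected) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names are hypothetical, the four candidate vectors are placeholder QPSK sequences rather than the 3GPP PCFICH codewords, and the channel and noise values are made up for the example.

```python
import math


def rpb_metric(y, h, d):
    """One receiver processing block: squared distance between y and h*d (elementwise)."""
    return sum(abs(yn - hn * dn) ** 2 for yn, hn, dn in zip(y, h, d))


def detect_cfi(ys, hs, codewords):
    """Add the per-antenna metrics for each candidate and pick the smallest."""
    metrics = [sum(rpb_metric(y, h, d) for y, h in zip(ys, hs))
               for d in codewords]
    return metrics.index(min(metrics)) + 1, metrics


# Placeholder QPSK candidates (NOT the 3GPP PCFICH sequences), one per CFI value.
QPSK = [complex(a, b) / math.sqrt(2)
        for a, b in ((1, 1), (-1, 1), (-1, -1), (1, -1))]
codewords = [[QPSK[(m * n) % 4] for n in range(16)] for m in (1, 2, 3, 4)]

# Hypothetical known channels for K = 2 receive antennas.
hs = [[complex(1.0, 0.1 * n) for n in range(16)],
      [complex(0.5, -0.2) for _ in range(16)]]

sent = 3  # transmitted CFI in this example
ys = [[hn * dn + 0.01 * (-1) ** n
       for n, (hn, dn) in enumerate(zip(h, codewords[sent - 1]))]
      for h in hs]

cfi, metrics = detect_cfi(ys, hs, codewords)
print("detected CFI:", cfi)
```

Summing the per-antenna metrics before taking the minimum mirrors the adders that combine $r^{(0)}_m$ and $r^{(1)}_m$ ahead of the CFI detector.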

3. System Model and Implementation Architecture for MISO Configuration

In MISO and MIMO configurations, space frequency block code (SFBC) based layer mapping and precoding are carried out to obtain transmit diversity when two or more antenna ports are used at the eNodeB, as per the 3GPP LTE wireless standard [1, 11]. It is assumed that 2 antenna ports are used at the eNodeB. The $16 \times 1$ complex symbol vector output of the modulation mapper is applied to the layer mapper. The $8 \times 1$ symbol vectors at layer 0 and layer 1 are given by $[d_0, d_2, d_4, d_6, d_8, d_{10}, d_{12}, d_{14}]$ and $[d_1, d_3, d_5, d_7, d_9, d_{11}, d_{13}, d_{15}]$. The precoding is carried out using the SFBC in the LTE-A standard. The precoder output at antenna port 0 ($A_0$) and antenna port 1 ($A_1$) is shown in Figure 6. The notation "$*$" represents the complex conjugate of the symbol. Basically, in precoding, a symbol $d_0$ from layer 0 and a symbol $d_1$ from layer 1 are encoded such that the antenna output is formulated using the orthogonal matrix given by

$$A = \begin{bmatrix} d_0 & d_1 \\ -d_1^{*} & d_0^{*} \end{bmatrix}. \tag{7}$$

Figure 7: Proposed MISO receiver architecture for PCFICH.

Figure 8: Internal architecture of MISO receiver decoding block (RDB).

Figure 9: Internal architecture of MISO processing block (PB-1).

Figure 10: Proposed MIMO receiver architecture for PCFICH.

Figure 11: Internal architecture of receiver decoding block-MIMO (RDBM).

Figure 12: Multiplicands rearrangement for a single complex multiplication block.

This is repeated for all the 8 symbols in layer 0 and layer 1. Equation (7) defines the transmission format, with the row index indicating the antenna port number and the column index indicating the subcarrier index. In the $2 \times 1$ MISO configuration, the receive signals at the $i$th and $(i+1)$th subcarriers are given in matrix form as

$$\begin{bmatrix} y_i \\ y_{i+1}^{*} \end{bmatrix} = \begin{bmatrix} h_i^{(0)} & -h_i^{(1)} \\ h_{i+1}^{*(1)} & h_{i+1}^{*(0)} \end{bmatrix} \begin{bmatrix} d_i \\ d_{i+1}^{*} \end{bmatrix} + \begin{bmatrix} n_i \\ n_{i+1}^{*} \end{bmatrix}, \quad i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{8}$$

where $h_i^{(0)}$ represents the channel frequency response of the $i$th subcarrier between the 0th transmit antenna port and the receive antenna, $d_i$ is the data symbol at the $i$th subcarrier, and $n_i$ is the noise at the $i$th subcarrier at the receive antenna. Equation (8) can simply be represented as

$$\mathbf{y}_i = \mathbf{H}_{\mathrm{eff},i}\,\mathbf{d}_i + \mathbf{n}_i, \quad i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{9}$$

where $\mathbf{y}_i$ is the $2 \times 1$ receive signal vector, $\mathbf{H}_{\mathrm{eff},i}$ is the $2 \times 2$ channel matrix, $\mathbf{d}_i$ is the $2 \times 1$ complex signal vector, and $\mathbf{n}_i$ is the $2 \times 1$ noise vector. The objective is to detect the elements $d_i$ and $d_{i+1}$ of the data vector $\mathbf{d}_i$. Assuming that the elements of the channel frequency response matrix $\mathbf{H}_{\mathrm{eff},i}$ are perfectly known at the receiver, the decoder output vector $\mathbf{z}_i$ is given by

$$\mathbf{z}_i = \mathbf{H}_{\mathrm{eff},i}^{H}\,\mathbf{y}_i, \quad i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{10}$$

where $\mathbf{H}_{\mathrm{eff},i}^{H}$ is the Hermitian of the $2 \times 2$ channel transmission matrix. Equation (10) is expanded as

$$\begin{bmatrix} z_i \\ z_{i+1}^{*} \end{bmatrix} = \begin{bmatrix} h_i^{*(0)} & h_{i+1}^{(1)} \\ -h_i^{*(1)} & h_{i+1}^{(0)} \end{bmatrix} \begin{bmatrix} y_i \\ y_{i+1}^{*} \end{bmatrix}, \quad i = 0, 2, 4, 6, 8, 10, 12, 14. \tag{11}$$

The elements of the decoder output are calculated as

$$\begin{aligned} z_i &= h_i^{*(0)} y_i + h_{i+1}^{(1)} y_{i+1}^{*}, \\ z_{i+1}^{*} &= -h_i^{*(1)} y_i + h_{i+1}^{(0)} y_{i+1}^{*}, \end{aligned} \quad i = 0, 2, 4, 6, 8, 10, 12, 14. \tag{12}$$
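The encode, receive, and decode steps of (7), (8), and (12) can be checked numerically for one subcarrier pair. The sketch below is only an illustration: the function names and the channel and symbol values are hypothetical, noise is omitted, and the channel is assumed flat across the two adjacent subcarriers so that the decoder output reduces to a real gain times the transmitted symbol.

```python
import math


def sfbc_encode_pair(d0, d1):
    """Eq. (7): rows are antenna ports, columns are subcarriers i and i+1."""
    return [[d0, d1], [-d1.conjugate(), d0.conjugate()]]


def miso_receive_pair(A, h0, h1):
    """Noise-free received samples; h0 and h1 hold (h_i, h_{i+1}) for ports 0 and 1."""
    y_i = h0[0] * A[0][0] + h1[0] * A[1][0]
    y_i1 = h0[1] * A[0][1] + h1[1] * A[1][1]
    return y_i, y_i1


def sfbc_decode_pair(y_i, y_i1, h0, h1):
    """Eq. (12): returns z_i and the conjugated second output z_{i+1}^*."""
    z_i = h0[0].conjugate() * y_i + h1[1] * y_i1.conjugate()
    z_i1_conj = -h1[0].conjugate() * y_i + h0[1] * y_i1.conjugate()
    return z_i, z_i1_conj


# Hypothetical channel, assumed identical on subcarriers i and i+1.
h0 = (complex(0.8, 0.3), complex(0.8, 0.3))
h1 = (complex(-0.4, 0.5), complex(-0.4, 0.5))
d0 = complex(1, 1) / math.sqrt(2)
d1 = complex(1, -1) / math.sqrt(2)

A = sfbc_encode_pair(d0, d1)
y_i, y_i1 = miso_receive_pair(A, h0, h1)
z_i, z_i1_conj = sfbc_decode_pair(y_i, y_i1, h0, h1)

gain = abs(h0[0]) ** 2 + abs(h1[0]) ** 2  # combined channel gain of both ports
d0_hat = z_i / gain
d1_hat = z_i1_conj.conjugate() / gain
print(d0_hat, d1_hat)
```

With a flat channel over the pair, the cross terms cancel and both symbols are scaled by the same combined gain, which is the transmit diversity benefit of the SFBC.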

The PCFICH receiver architecture for the $2 \times 1$ MISO configuration is shown in Figure 7. The receiver decoding block (RDB) gets the $16 \times 1$ received signal vector $\mathbf{y}$ and computes the decoder output vector using (10), assuming that the channel frequency response vectors $\mathbf{h}^{(0)}$ and $\mathbf{h}^{(1)}$ are known. The detailed internal architecture of RDBM is shown in Figure 11. The decoder output vectors $\mathbf{z}_i$, $i = 0, 2, 4, \ldots, 14$, are stacked as the $16 \times 1$ vector $\mathbf{z} = [\mathbf{z}_0^{T}, \mathbf{z}_2^{T}, \ldots, \mathbf{z}_{14}^{T}]^{T}$. The $16 \times 1$ precomputed data vectors for CFI = 1, 2, 3, 4 are represented as $\mathbf{s}_1$, $\mathbf{s}_2$, $\mathbf{s}_3$, and $\mathbf{s}_4$, respectively.

The detailed structure of the receiver decoding blocks (RDB) is shown in Figure 8. The output vectors $\mathbf{z}^{(1)}$, $\mathbf{z}^{(2)}$, $\mathbf{z}^{(3)}$, and $\mathbf{z}^{(4)}$ from RDB-1 to RDB-4 are fed to the processing blocks (PB-1 to PB-4). The detailed architecture of PB-1 is shown in Figure 9. The sum of the square magnitude of the elements of the difference vector between the decoded output vector $\mathbf{z}$ and the precomputed data vector $\mathbf{s}_1$ is the output $r_1$ of PB-1. Similarly, $r_2$, $r_3$, and $r_4$ are computed for CFI = 2, 3, and 4 using PB-2, PB-3, and PB-4, respectively. The processing block outputs $r_1$, $r_2$, $r_3$, and $r_4$ are applied to the CFI determination circuit shown in Figure 4 to detect the CFI value.

Figure 13: Illustration of folded architecture of RPB in SISO and SIMO.

4. System Model and Implementation Architecture for MIMO Configuration

In the MIMO system, the signals at the $i$th and $(i+1)$th subcarriers in the receive array are given by

$$\begin{bmatrix} y_i^{(0)} \\ y_{i+1}^{(0)} \\ y_i^{(1)*} \\ y_{i+1}^{(1)*} \end{bmatrix} = \begin{bmatrix} h_{00} & h_{01} \\ h_{10} & h_{11} \\ h_{01}^{*} & -h_{00}^{*} \\ h_{11}^{*} & -h_{10}^{*} \end{bmatrix} \begin{bmatrix} d_i \\ d_{i+1}^{*} \end{bmatrix} + \begin{bmatrix} n_i^{(0)} \\ n_{i+1}^{(0)*} \\ n_i^{(1)} \\ n_{i+1}^{(1)*} \end{bmatrix}, \quad i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{13}$$

where $h_{ab}$ represents the channel frequency response between the $b$th transmit antenna and the $a$th receive antenna and $n_i^{(j)}$ represents the noise in the $i$th subcarrier at the $j$th receive antenna. In vector form, it is written as

$$\mathbf{y}_i = \mathbf{H}_{\mathrm{eff},i}\,\mathbf{d}_i + \mathbf{n}_i, \quad i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{14}$$

where $\mathbf{y}_i$ is the $4 \times 1$ receive signal vector, $\mathbf{H}_{\mathrm{eff},i}$ is the $4 \times 2$ channel frequency response matrix at the $i$th and $(i+1)$th subcarriers, $\mathbf{d}_i$ is the $2 \times 1$ data vector at the $i$th and $(i+1)$th subcarriers, and $\mathbf{n}_i$ is the $4 \times 1$ noise vector. The objective is to detect the elements $d_i$ and $d_{i+1}$ of the data vector $\mathbf{d}_i$. Assuming that the elements of the channel frequency response matrix $\mathbf{H}_{\mathrm{eff},i}$ are perfectly known at the receiver, the decoder output vector $\mathbf{z}_i$ is given by

$$\mathbf{z}_i = \mathbf{H}_{\mathrm{eff},i}^{H}\,\mathbf{y}_i, \quad i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{15}$$

where $\mathbf{H}_{\mathrm{eff},i}^{H}$ is the Hermitian of the $4 \times 2$ channel transmission matrix. This can be expanded as

$$\begin{bmatrix} z_i \\ z_{i+1}^{*} \end{bmatrix} = \begin{bmatrix} h_{00}^{*} & -h_{10}^{*} & h_{01} & h_{11} \\ h_{01}^{*} & -h_{11}^{*} & -h_{00} & -h_{10} \end{bmatrix} \begin{bmatrix} y_i^{(0)} \\ y_{i+1}^{(0)} \\ y_i^{(1)*} \\ y_{i+1}^{(1)*} \end{bmatrix}, \quad i = 0, 2, 4, 6, 8, 10, 12, 14. \tag{16}$$

The decoder outputs are given by

$$\begin{aligned} z_i &= h_{00}^{*} y_i^{(0)} - h_{10}^{*} y_{i+1}^{(0)} + h_{01} y_i^{(1)*} + h_{11} y_{i+1}^{(1)*}, \\ z_{i+1}^{*} &= h_{01}^{*} y_i^{(0)} - h_{11}^{*} y_{i+1}^{(0)} - h_{00} y_i^{(1)*} - h_{10} y_{i+1}^{(1)*}, \end{aligned} \quad i = 0, 2, 4, 6, 8, 10, 12, 14. \tag{17}$$

Figure 14: Illustration of proposed architecture for RDB and PB in MISO and MIMO. Note: the receiver decoding block (RDB) in MISO is termed RDBM in MIMO.

The PCFICH receiver architecture of the $2 \times 2$ MIMO configuration is shown in Figure 10.

The receiver decoding block (RDBM) gets the $16 \times 1$ received signal vector $\mathbf{y}$ and computes the decoder output vector using (15), assuming that the channel frequency response vectors $\mathbf{h}^{(00)}$, $\mathbf{h}^{(01)}$, $\mathbf{h}^{(10)}$, and $\mathbf{h}^{(11)}$ are known. The $16 \times 1$ precomputed data vectors for CFI = 1, 2, 3, and 4 are represented as $\mathbf{s}^{(0)}_1$, $\mathbf{s}^{(0)}_2$, $\mathbf{s}^{(0)}_3$, and $\mathbf{s}^{(0)}_4$, respectively, for antenna 0 and as $\mathbf{s}^{(1)}_1$, $\mathbf{s}^{(1)}_2$, $\mathbf{s}^{(1)}_3$, and $\mathbf{s}^{(1)}_4$, respectively, for antenna 1. The received signal vectors $\mathbf{y}^{(0)}_i$ and $\mathbf{y}^{(1)}_i$ are multiplied with the four channel estimation vectors to give the decoded output vector $\mathbf{z}$, which is sent to the processing block (PB) shown in Figure 9. The decoder outputs $\mathbf{z}_i$, $i = 0, 2, 4, \ldots, 14$, are stacked as the $16 \times 1$ vector $\mathbf{z} = [\mathbf{z}_0^{T}, \mathbf{z}_2^{T}, \ldots, \mathbf{z}_{14}^{T}]^{T}$. Similarly, RDBM-1 gives the output vector $\mathbf{z}^{(1)}$ using the precomputed data vectors $\mathbf{y}^{(0)}_1$ and $\mathbf{y}^{(1)}_1$ and the channel estimation vectors. The architecture of the PBs and the CFI detection architecture are similar to those of the MISO system. The sum of the squared magnitude of the difference between each element in the decoded output vector $\mathbf{z}$ and its precomputed data in the vector $\mathbf{z}^{(1)}$ is the output $r_1$ of PB-1. Similarly, $r_2$, $r_3$, and $r_4$ are computed for the other CFI values. The values $r_1$, $r_2$, $r_3$, and $r_4$ are compared by the CFI detector shown in Figure 4 to determine the minimum value.

Figure 15: Illustration of superscalar method for SISO and SIMO (with no complex multiplications and operating from $T_1$ to $T_4$).

5. PCFICH Receiver Implementation Methods

The PCFICH receiver architectures can be implemented directly based on the basic architectures developed in Sections 3 and 4. However, in order to utilize the FPGA resources effectively, the basic architectures are implemented using modified novel architectures based on VLSI DSP techniques, namely, the folding and superscalar processing approaches.

5.1. Direct Implementation with Multiplicands Rearranged Method. In the receiver architecture for SISO and SIMO, the $16 \times 1$ received signal vector is directly subtracted from the precomputed data vector for a given CFI. This requires fewer multipliers and adders than MISO and MIMO. In MISO and MIMO configurations, complex multiplications are necessary for the multiplication of $\mathbf{H}_{\mathrm{eff}}^{H}$ with the received signal vector. This increases the number of multiplications in the CFI detection process. Hence, an optimum rearrangement of the terms is carried out to minimize the number of multiplications. Further, the intermediate products are reused in the calculation of the real and imaginary parts. Consider the multiplication of two complex numbers $\operatorname{Re}\{h\} + j\operatorname{Im}\{h\}$ and $\operatorname{Re}\{y\} + j\operatorname{Im}\{y\}$. The output real part ($e$) and imaginary part ($f$) terms are given by

$$e = \operatorname{Re}\{h\}\operatorname{Re}\{y\} - \operatorname{Im}\{h\}\operatorname{Im}\{y\}, \qquad f = \operatorname{Re}\{h\}\operatorname{Im}\{y\} + \operatorname{Im}\{h\}\operatorname{Re}\{y\}. \tag{18}$$

This requires four multiplications and two additions. To reduce the number of multiplications, the terms in (18) are rearranged as

$$\begin{aligned} e &= [\operatorname{Re}\{y\} - \operatorname{Im}\{y\}][\operatorname{Re}\{h\} + \operatorname{Im}\{h\}] - \operatorname{Re}\{y\}\operatorname{Im}\{h\} + \operatorname{Im}\{y\}\operatorname{Re}\{h\}, \\ f &= \operatorname{Re}\{y\}\operatorname{Im}\{h\} + \operatorname{Im}\{y\}\operatorname{Re}\{h\}. \end{aligned} \tag{19}$$

Since the terms $\operatorname{Re}\{y\}\operatorname{Im}\{h\}$ and $\operatorname{Im}\{y\}\operatorname{Re}\{h\}$ appear in both $e$ and $f$ in (19), it requires only three multiplications but five additions. This kind of rearrangement of the multiplicands is employed in the processing blocks, at the cost of increased additions, as shown in Figure 12.
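The saving can be checked with a small numerical comparison of the four-multiplier form and a three-multiplier rearrangement. The sketch below is illustrative and its function names are hypothetical; the shared product it uses is $(\operatorname{Re}\{y\} - \operatorname{Im}\{y\})(\operatorname{Re}\{h\} + \operatorname{Im}\{h\})$, which is the grouping for which the real part works out algebraically.

```python
def complex_mul_4m(h, y):
    """Direct form: 4 real multiplications and 2 additions."""
    e = h.real * y.real - h.imag * y.imag
    f = h.real * y.imag + h.imag * y.real
    return complex(e, f)


def complex_mul_3m(h, y):
    """Rearranged form: 3 real multiplications and 5 additions."""
    p1 = y.real * h.imag                          # multiplication 1, shared by e and f
    p2 = y.imag * h.real                          # multiplication 2, shared by e and f
    p3 = (y.real - y.imag) * (h.real + h.imag)    # multiplication 3 (two pre-additions)
    e = p3 - p1 + p2                              # two additions
    f = p1 + p2                                   # one addition
    return complex(e, f)


h = complex(3, 4)
y = complex(1, -2)
direct = complex_mul_4m(h, y)
rearranged = complex_mul_3m(h, y)
print(direct, rearranged)  # both equal (3+4j)*(1-2j) = 11-2j
```

Trading one multiplier for three extra adders is attractive on an FPGA where DSP multipliers are the scarcer resource.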

5.2. Proposed Architecture Using Folding Method. The folding technique systematically determines the control circuits in DSP architectures where multiple algorithmic operations are time-multiplexed onto a single functional unit [12]. It is used for the synthesis of DSP architectures that can be operated with single or multiple clocks. It reduces the number of hardware functional units (FUs) by a factor of $N$ at the expense of increased computation time.

The folding architecture is introduced in the receiver structure of the RPB in SISO and SIMO configurations and of the RDB and PB in MISO and MIMO configurations, as shown in Figures 13 and 14, respectively. For the SISO RPB, there are 16 hardware lines to calculate the value of $r_1$, each requiring two multipliers. Hence, the number of multipliers used in one RPB is 32. In order to reduce the number of multipliers and adders, the folding architecture is proposed. This architecture uses only two multipliers and performs the operation of a single hardware line 16 times in a sequential way. The difference between the product of the channel frequency response vector with the precomputed data vector and the received signal vector is stored in registers. At a time, one resultant signal pair is involved in the computation, using two multipliers to get the value of $z_i$. Four switches operating at the system clock speed are involved in the architecture: two switches are used to pass the real part of the signal to one multiplier, while the other two switches are used to pass the imaginary part of the signal to another multiplier. The multipliers pass the products to the first adder for $z_i$. The output of the first adder is passed to the second adder with a delay to accumulate the values $z_0$ to $z_{15}$ into a register in subsequent clock cycles. This process requires 16 clock cycles, and the CFI is detected at the 17th clock cycle. Although more clock cycles are needed to get the output, the resources are minimized in this method.
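The effect of folding can be illustrated in software by comparing a fully parallel evaluation with a loop that reuses one "hardware line" per clock cycle. This is a behavioral sketch only, with hypothetical function names and made-up input values; it models the two multipliers and the accumulating adder as ordinary arithmetic and does not represent the actual RTL.

```python
def rpb_direct(y, h, d):
    """Unfolded RPB: all 16 hardware lines evaluated together (32 multipliers)."""
    return sum(abs(yn - hn * dn) ** 2 for yn, hn, dn in zip(y, h, d))


def rpb_folded(y, h, d):
    """Folded RPB: one hardware line (2 multipliers) reused once per clock cycle."""
    acc = 0.0      # accumulator register
    cycles = 0
    for yn, hn, dn in zip(y, h, d):
        diff = yn - hn * dn              # difference held in the input registers
        re_sq = diff.real * diff.real    # multiplier 1: real part squared
        im_sq = diff.imag * diff.imag    # multiplier 2: imaginary part squared
        acc += re_sq + im_sq             # first adder, then accumulating adder
        cycles += 1
    return acc, cycles


# Made-up example data for one candidate CFI vector.
h = [complex(1.0, 0.05 * n) for n in range(16)]
d = [complex(1, 1) if n % 2 == 0 else complex(-1, 1) for n in range(16)]
y = [hn * dn + complex(0.02, -0.01) for hn, dn in zip(h, d)]

direct = rpb_direct(y, h, d)
folded, cycles = rpb_folded(y, h, d)
print(direct, folded, cycles)
```

Both versions produce the same metric; the folded one simply spreads the work over 16 cycles, so the result is available for comparison on the following cycle.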

The folded architecture of the decoding block of MISO and MIMO, involving complex multiplication of the channel frequency response vector and the receive signal vector, is shown in Figure 14. There are 2 complex multiplications and one addition in each of the 16 hardware lines. Hence, the total resource elements used are 32 complex multiplications and 16 additions. The folded architecture, which reduces this to just 2 complex multiplications and one addition, requires five switches. Two switches are used to pass the first element of the receive signal vector and its corresponding channel frequency response vector to one multiplier, and the other two switches are used to pass the second element of the receive signal vector and its channel frequency response vector to another multiplier. These four switches operate at the system clock speed. The multipliers pass their products to the adder through the fifth switch before moving to the PB. This process requires 16 clock cycles, and the CFI is detected at the 17th clock cycle.

Figure 16: Illustration of superscalar method for MISO and MIMO (with complex multiplications and operating from $T_1$ to $T_6$).

Figure 17: Simulation waveform for PCFICH receiver.

5.3. Proposed Architecture Using Superscalar Method. The superscalar approach is another low-resource VLSI DSP technique. The superscalar processing method includes parallel processing and pipelining strategies. In this case, parallel operation for the 16 pairs of hardware lines is arranged with pipelining of the subtraction and square magnitude operations for each CFI. The SISO configuration does not have complex multiplications; it has only square magnitude operations. Hence, the RPB of SISO has 16 hardware lines, each having 2 multipliers, which results in a total of 32 multipliers. This setup requires more hardware resources than folding, but the output is obtained at every 4th clock cycle, as shown in Figure 15. The SIMO configuration, which involves signal processing for two receive antennas, requires twice the number of multiplications as SISO, and the output is obtained at every 4th clock cycle. The block "d" represents the delay element introduced to buffer the values and produce the outputs at the same time instant.

For the MISO configuration, the RDB has 16 hardware lines with 2 complex multiplications each. Since each complex multiplication requires four real multiplications, the RDB can be executed in two clock cycles by reusing 64 multipliers. A further 32 multipliers are required for the PB, taking 4 clock cycles. Hence, 96 multipliers are required in the MISO configuration. For the MIMO configuration, the RDB requires reuse of 128 multipliers, taking 2 clock cycles, and an additional 32 multipliers are required for the PB, taking 4 clock cycles. Hence, 160 multipliers are required for the MIMO configuration, and the output is obtained at every 6th clock cycle, as shown in Figure 16. The block "d" represents the delay element introduced to buffer the values and produce the outputs at the same time instant.
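The multiplier counts quoted for the superscalar method follow a simple pattern, which the sketch below reproduces. It is an illustrative tally, not the authors' sizing procedure: the function name is hypothetical, and the rule that each hardware line needs two complex multiplications per receive antenna in the decoding block is an assumption inferred from the MISO and MIMO decoder equations.

```python
def superscalar_multipliers(tx_ports, rx_antennas, lines=16):
    """Return (real multipliers, clock cycles per output) for one configuration."""
    squaring = 2 * lines  # 32: real and imaginary squares on each hardware line
    if tx_ports == 1:
        # SISO/SIMO: no decoding block, one RPB bank per receive antenna.
        return squaring * rx_antennas, 4
    # MISO/MIMO: assumed 2 complex multiplications per line per receive antenna,
    # each costing 4 real multiplications, spread over 2 clock cycles by reuse.
    real_mults = lines * (2 * rx_antennas) * 4
    decoding = real_mults // 2
    return decoding + squaring, 6


counts = {
    "SISO": superscalar_multipliers(1, 1),
    "SIMO": superscalar_multipliers(1, 2),
    "MISO": superscalar_multipliers(2, 1),
    "MIMO": superscalar_multipliers(2, 2),
}
for name, (mults, cycles) in counts.items():
    print(f"{name}: {mults} multipliers, output every {cycles} cycles")
```

The resulting totals of 32, 64, 96, and 160 match the multiplier counts stated in the text for the four configurations.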

6. Results and Discussion

The proposed receiver architectures for PCFICH in SISO, SIMO, MISO, and MIMO configurations are implemented using the Xilinx PlanAhead tool on the Virtex-6 FPGA xc6vlx240tff1156-1 device board. The target Virtex-6 device has only 768 DSP elements. Table 2 compares the performance of the proposed architectures using the folding and superscalar methods with the direct implementation of the PCFICH receiver in terms of resource utilisation, speed, and power for all the SISO, SIMO, MISO, and MIMO configurations. The proposed architectures based on the folding and superscalar processing methods require fewer resource elements than the direct implementation.

Figure 18: RTL schematic for combined PCFICH architecture with diversity.

Table 2: Performance of proposed architectures based on folding and superscalar method.

| Diversity | Method | Multipliers | Adders | DSP elements | LUTs | Total delay (ns) | Speed (MHz) | Dynamic power |
|---|---|---|---|---|---|---|---|---|
| SISO | Direct | 125 | 245 | 125 | 5479 | 39.081 | 25.587 | - |
| SISO | Folding (16T) | 8 | 81 | 16 | 2561 | 114.448 | 8.737 | 84 |
| SISO | Superscalar (4T) | 32 | 182 | 66 | 2731 | 69.333 | 14.423 | 93 |
| SIMO | Direct | 250 | 494 | 250 | 10942 | 40.278 | 24.827 | - |
| SIMO | Folding (16T) | 16 | 165 | 32 | 5117 | 130.704 | 7.651 | 159 |
| SIMO | Superscalar (4T) | 64 | 318 | 130 | 5623 | 53.873 | 18.562 | 170 |
| MISO | Direct | 224 | 580 | 594 | 14880 | 43.023 | 23.243 | - |
| MISO | Folding (16T) | 14 | 101 | 39 | 3950 | 255.264 | 3.917 | 173 |
| MISO | Superscalar (6T) | 96 | 338 | 196 | 6156 | 80.495 | 12.423 | 208 |
| MIMO | Direct | 320 | 844 | 675 | 17380 | 56.962 | 17.555 | - |
| MIMO | Folding (16T) | 20 | 155 | 46 | 4395 | 256.528 | 3.898 | 374 |
| MIMO | Superscalar (6T) | 160 | 465 | 262 | 6932 | 85.822 | 11.652 | 382 |

Figure 19: Resource utilization graph for generalized architecture.

In the folding approach resource utilization is less com-pared to the direct and superscalar approach at the cost ofreduced speed of operation but it is suitable for real-time

frame timings When the LTE-A system operates at 14MHzbandwidth maximum time available for detection at eachsubcarrier is 992063 ns since each slot of 05ms duration in aframe (10ms radio frame duration) consists of 7 OFDM sym-bols and there are 72 subcarriers along one OFDM symbolThe total delay in the receiver architecture is within the LTEtime constraint The dynamic power consumption is less inthe folding method compared to superscalar method due todecrease in block arithmetic Direct method does not requiresequential execution and clocking and hence total powerconsumption is due to static power Hence it is inferredthat the proposed architecture based on folding method ismore suitable for CFI detection The simulation waveformof the proposed architecture based on folding method isshown in Figure 17 for SISO SIMO MISO and MIMOconfigurations

A general architecture based on the folding method, which operates in all four SISO, SIMO, MISO, and MIMO configurations, has also been developed. In this architecture, a control variable "e" is used to enable or disable the submodules SISO, SIMO, MISO, or MIMO according to the selection input "diversity". The CFI is detected at every 17th clock cycle. The synthesis results of the general architecture based on folding show that it utilizes minimum resources in the XC6VLX240TFF1156-1 Virtex-6 device (768 DSPs). This is summarized in Table 3. The dynamic power consumption is due to internal switching contributed by the clock (246 mW), the logic (670 mW), and the block arithmetic (103 mW).

Table 3: Resource requirements of proposed architecture using folding method.

| Parameter | Multipliers | Adders | Minimum clock period (ns) | Total delay (ns) | Speed (MHz) | Total dynamic power (mW) |
|---|---|---|---|---|---|---|
| Value | 58 | 518 | 16.033 | 256.528 | 3.898 | 1019 |

Figure 18 shows the RTL schematic of the 4 diversity blocks "div0", "div1", "div2", and "div3", corresponding to SISO, SIMO, MISO, and MIMO, controlled by wires named "e". The power consumed includes both static power and dynamic power due to internal switching.

Figure 19 shows the resource utilization graph, which gives the percentage of registers, lookup tables (LUTs), slices, DSP elements, and buffers used.

Figure 20 shows the implemented device in the FPGA editor, with the implemented components and the interconnections between the components configured into the FPGA device.

Figure 20: Implemented device in FPGA editor.

7. Conclusion

In this paper, low complexity, low resource, single- or multi-antenna CFI detection at the receiver has been proposed, analyzed using ModelSim, and implemented in the Virtex-6 device with the Xilinx PlanAhead tool. In the receiver, the computational complexity and the resource utilization are minimized by employing arithmetic operational rearrangement and a suboptimal sequential DSP algorithm called the folding approach. The proposed architecture using the folding method complies with the LTE frame timing constraint in SISO, SIMO, MISO, and MIMO configurations. It is a suitable solution for the area-optimized hardware implementation of receiver structures for PCFICH. In future, a complete hardware design accommodating all the physical downlink control channels of 3GPP LTE-A with low resource utilization could be synthesized and implemented.

Conflict of Interests

The authors do not have a direct financial relation with any commercial identity mentioned in the paper or any other conflict of interests.

Acknowledgments

The authors wish to express their sincere thanks to the All India Council for Technical Education, New Delhi, for the Grant to do the Project titled "Design of Testbed for the Development of Optimized Architectures of MIMO Signal Processing" (no 8023RIDRPS0391112). They are also thankful to the Managements of Mepco Schlenk Engineering College, Sivakasi, and Thiagarajar College of Engineering, Madurai, for their constant support and encouragement to carry out this research work successfully.

References

[1] 3GPP TS 36.211 Version 11.0.0 Release 11 (2012-10), Evolved Universal Terrestrial Radio Access (E-UTRA): Physical Channels and Modulation, 2012.

[2] J. Ketonen and M. Juntti, "SIC and K-best LSD receiver implementation for a MIMO-OFDM system," in Proceedings of the 16th European Signal Processing Conference (EUSIPCO '08), Lausanne, Switzerland, August 2008.

[3] J. Ketonen, M. Juntti, and J. R. Cavallaro, "Performance-complexity comparison of receivers for a LTE MIMO-OFDM system," IEEE Transactions on Signal Processing, vol. 58, no. 6, pp. 3360–3372, 2010.

[4] X. Huang, C. Liang, and J. Ma, "System architecture and implementation of MIMO sphere decoders on FPGA," IEEE Transactions on Very Large Scale Integration Systems, vol. 16, no. 2, pp. 188–196, 2008.

[5] Y. Lin, Y. Chen, C. Chu, C. Zhan, and A. Wu, "Dual-mode low-complexity codebook searching algorithm and VLSI architecture for LTE/LTE-advanced systems," IEEE Transactions on Signal Processing, vol. 61, no. 14, pp. 3545–3562, 2013.

[6] D. Wu, J. Eilert, R. Asghar, and D. Liu, "VLSI implementation of a fixed-complexity soft-output MIMO detector for high-speed wireless," EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 893184, 13 pages, 2010.

[7] C. Mehlfuhrer, M. Wrulich, J. C. Ikuno, D. Bosanska, and M. Rupp, "Simulating the long term evolution physical layer," in Proceedings of the 17th European Signal Processing Conference (EUSIPCO '09), pp. 1471–1478, Glasgow, UK, August 2009.

[8] S. J. Thiruvengadam and L. M. A. Jalloul, "Performance analysis of the 3GPP-LTE physical control channels," EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 914934, 10 pages, 2010.

[9] S. M. Alamouti, "A simple transmit diversity technique for wireless communications," IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1451–1458, 1998.

[10] S. S. A. Abbas and S. J. Thiruvengadam, "FPGA implementation of 3GPP-LTE-A physical downlink control channel using diversity techniques," International Journal of Wireless and Mobile Computing, vol. 9, no. 2, p. 84, 2013.

[11] S. Ahmadi, LTE-Advanced: A Practical Systems Approach to Understanding 3GPP LTE Releases 10 and 11 Radio Access Technologies, Academic Press, 2013.

[12] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, Wiley-Interscience, 1999.


Figure 7: Proposed MISO receiver architecture for PCFICH.

Figure 8: Internal architecture of MISO receiver decoding block (RDB).

the SFBC in the LTE-A standard. The precoder output at antenna port 0 ($A_0$) and antenna port 1 ($A_1$) is shown in Figure 6. The notation "*" represents the complex conjugate of the symbol. Basically, in precoding, a symbol $d_0$ from layer 0 and a symbol $d_1$ from layer 1 are encoded such that the antenna output is formulated using the orthogonal matrix given by

$$A = \begin{bmatrix} d_0 & d_1 \\ -d_1^{*} & d_0^{*} \end{bmatrix} \tag{7}$$

Figure 9: Internal architecture of MISO processing block (PB-1).

Figure 10: Proposed MIMO receiver architecture for PCFICH.

Figure 11: Internal architecture of receiver decoding block-MIMO (RDBM).

Figure 12: Multiplicands rearrangement for a single complex multiplication block.

This is repeated for all the 8 symbols in layer 0 and layer 1. Equation (7) defines the transmission format, with the row index indicating the antenna port number and the column index indicating the subcarrier index. In the 2 × 1 MISO configuration, the receive signals at the $i$th and $(i+1)$th subcarriers are given in matrix form as

$$\begin{bmatrix} y_i \\ y_{i+1}^{*} \end{bmatrix} = \begin{bmatrix} h_i^{(0)} & -h_i^{(1)} \\ h_{i+1}^{*(1)} & h_{i+1}^{*(0)} \end{bmatrix} \begin{bmatrix} d_i \\ d_{i+1}^{*} \end{bmatrix} + \begin{bmatrix} n_i \\ n_{i+1}^{*} \end{bmatrix}, \quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14 \tag{8}$$

where $h_i^{(0)}$ represents the channel frequency response of the $i$th subcarrier between the 0th transmit antenna port and the receive antenna, $d_i$ is the data symbol at the $i$th subcarrier, and $n_i$ is the noise at the $i$th subcarrier at the receive antenna. Equation (8) can simply be represented as

$$\mathbf{y}_i = \mathbf{H}_{\mathrm{eff},i}\,\mathbf{d}_i + \mathbf{n}_i, \quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14 \tag{9}$$

where $\mathbf{y}_i$ is the 2 × 1 receive signal vector, $\mathbf{H}_{\mathrm{eff},i}$ is the 2 × 2 channel matrix, $\mathbf{d}_i$ is the 2 × 1 complex signal vector, and $\mathbf{n}_i$ is the 2 × 1 noise vector. The objective is to detect the elements $d_i$ and $d_{i+1}$ of the data vector $\mathbf{d}_i$. Assuming that the elements of the channel frequency response matrix $\mathbf{H}_{\mathrm{eff},i}$ are perfectly known at the receiver, the decoder output vector $\mathbf{z}_i$ is given by

$$\mathbf{z}_i = \mathbf{H}_{\mathrm{eff},i}^{H}\,\mathbf{y}_i, \quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14 \tag{10}$$

where $\mathbf{H}_{\mathrm{eff},i}^{H}$ is the Hermitian of the 2 × 2 channel transmission matrix. Equation (10) is expanded as

$$\begin{bmatrix} z_i \\ z_{i+1}^{*} \end{bmatrix} = \begin{bmatrix} h_i^{*(0)} & h_{i+1}^{(1)} \\ -h_i^{*(1)} & h_{i+1}^{(0)} \end{bmatrix} \begin{bmatrix} y_i \\ y_{i+1}^{*} \end{bmatrix}, \quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14 \tag{11}$$

The elements of the decoder output are calculated as

$$\begin{aligned} z_i &= h_i^{*(0)} y_i + h_{i+1}^{(1)} y_{i+1}^{*}, \quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14 \\ z_{i+1}^{*} &= -h_i^{*(1)} y_i + h_{i+1}^{(0)} y_{i+1}^{*}, \quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14 \end{aligned} \tag{12}$$

The PCFICH receiver architecture for the 2 × 1 MISO configuration is shown in Figure 7. The receiver decoding block (RDB) gets the 16 × 1 received signal vector $\mathbf{y}$ and computes the decoder output vector using (10), assuming that the channel frequency response vectors $\mathbf{h}^{(0)}$ and $\mathbf{h}^{(1)}$ are known. The detailed internal architecture of RDBM is shown in Figure 11. The decoder output vectors $\mathbf{z}_i$, $i = 0, 2, 4, \ldots, 14$, are stacked as the 16 × 1 vector $\mathbf{z} = [\mathbf{z}_0^T\ \mathbf{z}_2^T\ \cdots\ \mathbf{z}_{14}^T]^T$. The 16 × 1 precomputed data vectors for CFI = 1, 2, 3, 4 are represented as $\mathbf{s}_1$, $\mathbf{s}_2$, $\mathbf{s}_3$, and $\mathbf{s}_4$, respectively.

The detailed structure of the receiver decoding blocks (RDB) is shown in Figure 8. The output vectors $\mathbf{z}^{(1)}$, $\mathbf{z}^{(2)}$, $\mathbf{z}^{(3)}$, and $\mathbf{z}^{(4)}$ from RDB-1 to RDB-4 are fed to the processing blocks (PB-1 to PB-4). The detailed architecture of PB-1 is shown in Figure 9. The sum of the squared magnitudes of the elements of the difference vector between the decoded output vector $\mathbf{z}$ and the precomputed data vector $\mathbf{s}_1$ is the output $r_1$ of PB-1. Similarly, $r_2$, $r_3$, and $r_4$ are computed for CFI = 2, 3, and 4 using PB-2, PB-3, and PB-4, respectively. The processing block outputs $r_1$, $r_2$, $r_3$, and $r_4$ are applied to the CFI determination circuit shown in Figure 4 to detect the CFI value.

Figure 13: Illustration of folded architecture of RPB in SISO and SIMO.
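The MISO signal model, the decoder of equation (12), and the minimum-distance CFI decision can be exercised together in software. The sketch below is an illustrative model under stated assumptions, not the authors' RTL: the four codewords are hypothetical QPSK sequences (not the 3GPP CFI codewords), the channel is assumed constant over each subcarrier pair, and each reference vector z^(k) is taken to be the decoder output for the noise-free codeword s_k, which is one reading of the RDB-1 to RDB-4 description. All function names are invented for the example.

```python
import random

N = 16  # symbols per PCFICH codeword (the 16 x 1 vectors in the text)


def sfbc_receive(d, h0, h1):
    """Noise-free received vector for the 2x1 MISO model of equation (8)."""
    y = [0j] * N
    for i in range(0, N, 2):
        y[i] = h0[i] * d[i] - h1[i] * d[i + 1].conjugate()
        # Equation (8) specifies conj(y[i+1]); conjugate once more to store y[i+1].
        y_next_conj = (h1[i + 1].conjugate() * d[i]
                       + h0[i + 1].conjugate() * d[i + 1].conjugate())
        y[i + 1] = y_next_conj.conjugate()
    return y


def rdb_decode(y, h0, h1):
    """Decoder outputs of equation (12): slot i holds z_i, slot i+1 holds conj(z_{i+1})."""
    z = [0j] * N
    for i in range(0, N, 2):
        y_next_conj = y[i + 1].conjugate()
        z[i] = h0[i].conjugate() * y[i] + h1[i + 1] * y_next_conj
        z[i + 1] = -h1[i].conjugate() * y[i] + h0[i + 1] * y_next_conj
    return z


def detect_cfi(y, h0, h1, codewords):
    """Return (CFI, metrics), where CFI minimises r_k = sum |z - z^(k)|^2."""
    z = rdb_decode(y, h0, h1)
    metrics = []
    for s in codewords:
        z_ref = rdb_decode(sfbc_receive(s, h0, h1), h0, h1)  # assumed form of z^(k)
        metrics.append(sum(abs(a - b) ** 2 for a, b in zip(z, z_ref)))
    return metrics.index(min(metrics)) + 1, metrics


QPSK = [complex(a, b) / 2 ** 0.5 for a in (1, -1) for b in (1, -1)]


def placeholder_codewords():
    """Four distinct QPSK sequences, hypothetical stand-ins for the CFI codewords."""
    return [[QPSK[(k + n * (k + 1)) % 4] for n in range(N)] for k in range(4)]


def pairwise_flat_channel(rng):
    """Random channel that is constant over each subcarrier pair (i, i+1)."""
    h = []
    for _ in range(N // 2):
        g = complex(rng.gauss(0, 1), rng.gauss(0, 1))
        h += [g, g]
    return h


if __name__ == "__main__":
    rng = random.Random(1)
    h0, h1 = pairwise_flat_channel(rng), pairwise_flat_channel(rng)
    cws = placeholder_codewords()
    noisy = [v + complex(rng.gauss(0, 0.05), rng.gauss(0, 0.05))
             for v in sfbc_receive(cws[2], h0, h1)]
    print("detected CFI:", detect_cfi(noisy, h0, h1, cws)[0])
```

When the channel is flat over a subcarrier pair, the decoder output reduces to the transmitted symbol scaled by |h^(0)|^2 + |h^(1)|^2, which is the transmit diversity gain that the SFBC scheme is meant to provide.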

4. System Model and Implementation Architecture for MIMO Configuration

In the MIMO system, the signals at the $i$th and $(i+1)$th subcarriers in the receive array are given by

$$\begin{bmatrix} y_i^{(0)} \\ y_{i+1}^{(0)} \\ y_i^{(1)*} \\ y_{i+1}^{(1)*} \end{bmatrix} = \begin{bmatrix} h_{00} & h_{01} \\ h_{10} & h_{11} \\ h_{01}^{*} & -h_{00}^{*} \\ h_{11}^{*} & -h_{10}^{*} \end{bmatrix} \begin{bmatrix} d_i \\ d_{i+1}^{*} \end{bmatrix} + \begin{bmatrix} n_i^{(0)} \\ n_{i+1}^{(0)*} \\ n_i^{(1)} \\ n_{i+1}^{(1)*} \end{bmatrix}, \quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14 \tag{13}$$

where $h_{ab}$ represents the channel frequency response between the $b$th transmit antenna and the $a$th receive antenna, and $n_i^{(j)}$ represents the noise in the $i$th subcarrier at the $j$th receive antenna. In vector form, it is written as

$$\mathbf{y}_i = \mathbf{H}_{\mathrm{eff},i}\,\mathbf{d}_i + \mathbf{n}_i, \quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14 \tag{14}$$

where $\mathbf{y}_i$ is the 4 × 1 receive signal vector, $\mathbf{H}_{\mathrm{eff},i}$ is the 4 × 2 channel frequency response matrix at the $i$th and $(i+1)$th subcarriers, $\mathbf{d}_i$ is the 2 × 1 data vector at the $i$th and $(i+1)$th subcarriers, and $\mathbf{n}_i$ is the 4 × 1 noise vector. The objective is to detect the elements $d_i$ and $d_{i+1}$ of the data vector $\mathbf{d}_i$. Assuming that the elements of the channel frequency response matrix $\mathbf{H}_{\mathrm{eff},i}$ are perfectly known at the receiver, the decoder output vector $\mathbf{z}_i$ is given by

$$\mathbf{z}_i = \mathbf{H}_{\mathrm{eff},i}^{H}\,\mathbf{y}_i, \quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14 \tag{15}$$

where $\mathbf{H}_{\mathrm{eff},i}^{H}$ is the Hermitian of the 4 × 2 channel transmission matrix. This can be expanded as

$$\begin{bmatrix} z_i \\ z_{i+1}^{*} \end{bmatrix} = \begin{bmatrix} h_{00}^{*} & -h_{10}^{*} & h_{01} & h_{11} \\ h_{01}^{*} & -h_{11}^{*} & -h_{00} & -h_{10} \end{bmatrix} \begin{bmatrix} y_i^{(0)} \\ y_{i+1}^{(0)} \\ y_i^{(1)*} \\ y_{i+1}^{(1)*} \end{bmatrix}, \quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14 \tag{16}$$

The decoder outputs are given by

$$\begin{aligned} z_i &= h_{00}^{*} y_i^{(0)} - h_{10}^{*} y_{i+1}^{(0)} + h_{01} y_i^{(1)*} + h_{11} y_{i+1}^{(1)*}, \quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14 \\ z_{i+1}^{*} &= h_{01}^{*} y_i^{(0)} - h_{11}^{*} y_{i+1}^{(0)} - h_{00} y_i^{(1)*} - h_{10} y_{i+1}^{(1)*}, \quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14 \end{aligned} \tag{17}$$

Figure 14: Illustration of proposed architecture for RDB and PB in MISO and MIMO. Note: receiver decoding block (RDB) in MISO is termed as RDBM in MIMO.

The PCFICH receiver architecture of the 2 × 2 MIMO configuration is shown in Figure 10.

The receiver decoding block (RDBM) gets the 16 × 1 received signal vector $\mathbf{y}$ and computes the decoder output vector using (15), assuming that the channel frequency response vectors $\mathbf{h}^{(00)}$, $\mathbf{h}^{(01)}$, $\mathbf{h}^{(10)}$, and $\mathbf{h}^{(11)}$ are known. The 16 × 1 precomputed data vectors for CFI = 1, 2, 3, and 4 are represented as $\mathbf{s}_1^{(0)}$, $\mathbf{s}_2^{(0)}$, $\mathbf{s}_3^{(0)}$, and $\mathbf{s}_4^{(0)}$, respectively, for antenna 0 and as $\mathbf{s}_1^{(1)}$, $\mathbf{s}_2^{(1)}$, $\mathbf{s}_3^{(1)}$, and $\mathbf{s}_4^{(1)}$, respectively, for antenna 1. The received signal vectors $\mathbf{y}_i^{(0)}$ and $\mathbf{y}_i^{(1)}$ are multiplied with the four channel estimation vectors to give the decoded output vector $\mathbf{z}$, which is sent to the processing block (PB) shown in Figure 9. The decoder outputs $\mathbf{z}_i$, $i = 0, 2, 4, \ldots, 14$, are stacked as the 16 × 1 vector $\mathbf{z} = [\mathbf{z}_0^T\ \mathbf{z}_2^T\ \cdots\ \mathbf{z}_{14}^T]^T$. Similarly, RDBM-1 gives the output vector $\mathbf{z}^{(1)}$ using the precomputed data vectors $\mathbf{s}_1^{(0)}$ and $\mathbf{s}_1^{(1)}$ and the channel estimation vectors. The architecture of the PBs and the CFI detection architecture are similar to those of the MISO system. The sum of the squared magnitudes of the differences between each element in the decoded output vector $\mathbf{z}$ and its precomputed counterpart in the vector $\mathbf{z}^{(1)}$ is the output $r_1$ of PB-1. Similarly, $r_2$, $r_3$, and $r_4$ are computed for the other CFI values. The outputs $r_1$, $r_2$, $r_3$, and $r_4$ are compared to determine the minimum value by the CFI detector shown in Figure 4.

Figure 15: Illustration of superscalar method for SISO and SIMO (with no complex multiplications and operating from $T_1$ to $T_4$).

5. PCFICH Receiver Implementation Methods

The PCFICH receiver architectures can be implemented directly based on the basic architectures developed in Sections 3 and 4. However, in order to utilize the FPGA resources effectively, the basic architectures are implemented using modified architectures based on VLSI DSP techniques, namely, the folding and superscalar processing approaches.

5.1. Direct Implementation with Multiplicands Rearranged Method. In the receiver architecture for SISO and SIMO, the 16 × 1 received signal vector is directly subtracted from the precomputed data vector for a given CFI. This requires fewer multipliers and adders than MISO and MIMO. In MISO and MIMO configurations, complex multiplications are necessary for the multiplication of $\mathbf{H}_{\mathrm{eff}}^{H}$ with the received signal vector. This increases the number of multiplications in the CFI detection process. Hence, the terms are rearranged to minimize the number of multiplications. Further, the intermediate products are reused in the calculation of the real and imaginary parts. Consider the multiplication of two complex numbers $\mathrm{Re}\{h\} + j\,\mathrm{Im}\{h\}$ and $\mathrm{Re}\{y\} + j\,\mathrm{Im}\{y\}$. The real part ($e$) and imaginary part ($f$) of the output are given by

$$\begin{aligned} e &= \mathrm{Re}\{h\}\,\mathrm{Re}\{y\} - \mathrm{Im}\{h\}\,\mathrm{Im}\{y\}, \\ f &= \mathrm{Re}\{h\}\,\mathrm{Im}\{y\} + \mathrm{Im}\{h\}\,\mathrm{Re}\{y\}. \end{aligned} \tag{18}$$

This requires four multiplications and two additions. To reduce the number of multiplications, the terms in (18) are rearranged as

$$\begin{aligned} e &= [\mathrm{Re}\{y\} - \mathrm{Im}\{y\}]\,[\mathrm{Re}\{h\} + \mathrm{Im}\{h\}] - \mathrm{Re}\{y\}\,\mathrm{Im}\{h\} + \mathrm{Im}\{y\}\,\mathrm{Re}\{h\}, \\ f &= \mathrm{Re}\{y\}\,\mathrm{Im}\{h\} + \mathrm{Im}\{y\}\,\mathrm{Re}\{h\}. \end{aligned} \tag{19}$$

Since the terms $\mathrm{Re}\{y\}\,\mathrm{Im}\{h\}$ and $\mathrm{Im}\{y\}\,\mathrm{Re}\{h\}$ appear in both expressions of (19), only three multiplications but five additions are required. This rearrangement of the multiplicands is employed in the processing blocks at the cost of increased additions, as shown in Figure 12.
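The trade of one multiplication for three extra additions can be verified numerically. The following sketch compares the four-multiplication form with a three-multiplication form that shares the two cross products and uses the single extra product (Re y − Im y)(Re h + Im h), the combination for which the identity with the four-multiplication result holds. The function names are illustrative and are not taken from the paper.

```python
def complex_mul_4(h_re, h_im, y_re, y_im):
    """Reference form: four real multiplications and two additions."""
    e = h_re * y_re - h_im * y_im
    f = h_re * y_im + h_im * y_re
    return e, f


def complex_mul_3(h_re, h_im, y_re, y_im):
    """Rearranged form: three real multiplications and five additions."""
    p1 = y_re * h_im                    # shared by e and f
    p2 = y_im * h_re                    # shared by e and f
    p3 = (y_re - y_im) * (h_re + h_im)  # the only extra product
    e = p3 - p1 + p2
    f = p1 + p2
    return e, f


if __name__ == "__main__":
    print(complex_mul_4(3, -2, 5, 7))
    print(complex_mul_3(3, -2, 5, 7))
```

On an FPGA the saving matters because multipliers map to scarce DSP slices, whereas the additional adders can be built from ordinary logic.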

5.2. Proposed Architecture Using Folding Method. Folding systematically determines the control circuits in DSP architectures where multiple algorithmic operations are time-multiplexed onto a single functional unit [12]. It is used for the synthesis of DSP architectures that can be operated at single or multiple clocks. It reduces the number of hardware functional units (FUs) by a factor of $N$ at the expense of increased computation time.

The folding architecture is introduced in the receiver structure of RPB in SISO and SIMO configurations and of RDB and PB in MISO and MIMO configurations, as shown in Figures 13 and 14, respectively. For the SISO RPB, there are 16 hardware lines to calculate the value of $r_1$, each requiring two multipliers. Hence, the number of multipliers used in one RPB is 32. In order to reduce the number of multipliers and adders, the folding architecture is proposed. This architecture uses only two multipliers and performs the operation of a single hardware line 16 times sequentially. The difference between the product of the channel frequency response vector with the precomputed data vector and the received signal vector is stored in registers. At any time, one resultant signal pair is involved in the computation, using two multipliers to get the value of $z_i$. Four switches operating at the system clock speed are involved in the architecture: two switches pass the real part of the signal to one multiplier, while the other two pass the imaginary part of the signal to the other multiplier. The multipliers pass the products to the first adder for $z_i$. The output of the first adder is passed to the second adder with a delay to accumulate the values $z_0$ to $z_{15}$ into a register in subsequent clock cycles. This process requires 16 clock cycles, and the CFI is detected at the 17th clock cycle. Although more clock cycles are needed to obtain the output, the resources are minimized in this method.

The folded architecture of the decoding block of MISO and MIMO, involving complex multiplication of the channel frequency response vector and the receive signal vector, is shown in Figure 14. There are 2 complex multiplications and one addition in each of the 16 hardware lines. Hence, the total resource elements used are 32 complex multiplications and 16 additions. The folded architecture, which reduces this to just 2 complex multiplications and one addition, requires five switches. Two switches are used to pass the first element of the receive signal vector and its corresponding channel frequency response vector to one multiplier, and the other two switches are used to pass the second element of the receive signal vector and its channel frequency response vector to another multiplier. These four switches operate at the system clock speed. The multipliers pass their products to the adder through the fifth switch before moving to the PB. This process requires 16 clock cycles, and the CFI is detected at the 17th clock cycle.

Figure 16: Illustration of superscalar method for MISO and MIMO (with complex multiplications and operating from $T_1$ to $T_6$).

Figure 17: Simulation waveform for PCFICH receiver.
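The effect of folding on the processing block can be modelled in software as a single hardware line that is clocked 16 times. This is a behavioural sketch under assumptions, not a description of the synthesized design: the class and function names are hypothetical, switches and register delays are abstracted into a loop, and each loop iteration stands for one clock cycle.

```python
HARDWARE_LINES = 16


def direct_metric(z, z_ref):
    """Direct form: 16 parallel lines, 2 real multipliers each (32 in total)."""
    return sum((a - b).real ** 2 + (a - b).imag ** 2 for a, b in zip(z, z_ref))


class FoldedLine:
    """One hardware line (2 real multipliers) reused on every clock cycle."""

    MULTIPLIERS = 2

    def __init__(self):
        self.accumulator = 0.0  # register fed by the accumulating adder
        self.cycles = 0

    def clock(self, diff):
        re_sq = diff.real * diff.real      # multiplier 1
        im_sq = diff.imag * diff.imag      # multiplier 2
        self.accumulator += re_sq + im_sq  # first adder, then accumulating adder
        self.cycles += 1


def folded_metric(z, z_ref):
    """Accumulate the squared distance one element pair per clock cycle."""
    line = FoldedLine()
    for a, b in zip(z, z_ref):
        line.clock(a - b)
    return line.accumulator, line.cycles


if __name__ == "__main__":
    z = [complex(n, -n) for n in range(HARDWARE_LINES)]
    z_ref = [complex(1, 2)] * HARDWARE_LINES
    value, cycles = folded_metric(z, z_ref)
    print(value, cycles, direct_metric(z, z_ref))
```

The folded unit returns the same metric as the direct form after 16 cycles while using 2 multipliers instead of 32, so the comparison of the four metrics can take place on the following (17th) cycle.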

5.3. Proposed Architecture Using Superscalar Method. The superscalar approach is another VLSI DSP technique with low resource utilization. The superscalar processing method includes parallel processing and pipelining strategies. In this case, parallel operation for the 16 pairs of hardware lines is arranged with pipelining of the subtraction and square magnitude operations for each CFI. The SISO configuration does not have complex multiplications; it has only square magnitude operations. Hence, the RPB of SISO has 16 hardware lines, each having 2 multipliers, which results in a total of 32 multipliers.

This setup requires more hardware resources than folding, but the output is obtained at every 4th clock cycle, as shown in Figure 15. The SIMO configuration, which involves signal processing for two receive antennas, requires twice the number of multiplications of SISO, and the output is obtained at every 4th clock cycle. The block "d" represents the delay element introduced to buffer the values and produce the outputs at the same time instant.

For the MISO configuration, the RDB has 16 hardware lines with 2 complex multiplications each. Since each complex multiplication requires four real multiplications, the RDB can be executed in two clock cycles by reusing 64 multipliers. A further 32 multipliers are required for the PB, which takes 4 clock cycles. Hence, 96 multipliers are required in the MISO configuration. For the MIMO configuration, the RDB requires the reuse of 128 multipliers, taking 2 clock cycles, and an additional 32 multipliers are required for the PB, taking 4 clock cycles. Hence, 160 multipliers are required for the MIMO configuration, and the output is obtained at every 6th clock cycle, as shown in Figure 16. The block "d" represents the delay element introduced to buffer the values and produce the outputs at the same time instant.
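The multiplier counts quoted for the superscalar method follow from simple bookkeeping, which the sketch below reproduces. It is an accounting aid under stated assumptions rather than a model of the hardware: the figure of 4 complex multiplications per MIMO hardware line is inferred from the four products in each decoder output and from the quoted reuse of 128 multipliers, and the SIMO processing block is assumed to be a doubled copy of the SISO one. All names are illustrative.

```python
HARDWARE_LINES = 16
REAL_MULTS_PER_COMPLEX_MULT = 4  # before any rearrangement of multiplicands
PB_MULTS_PER_LINE = 2            # squares of the real and imaginary parts
RDB_CYCLES = 2                   # decoding stage is time-shared over 2 cycles
PB_CYCLES = 4                    # subtraction and square-magnitude pipeline

CONFIGS = {
    # name: (complex multiplications per RDB line, processing-block copies)
    "SISO": (0, 1),
    "SIMO": (0, 2),
    "MISO": (2, 1),
    "MIMO": (4, 1),  # assumption inferred from the quoted 128 reused multipliers
}


def superscalar_budget(name):
    """Return (multipliers, clock cycles per output) for one configuration."""
    complex_per_line, pb_copies = CONFIGS[name]
    rdb_real_mults = HARDWARE_LINES * complex_per_line * REAL_MULTS_PER_COMPLEX_MULT
    rdb_multipliers = rdb_real_mults // RDB_CYCLES  # hardware reused across 2 cycles
    pb_multipliers = HARDWARE_LINES * PB_MULTS_PER_LINE * pb_copies
    cycles = PB_CYCLES + (RDB_CYCLES if complex_per_line else 0)
    return rdb_multipliers + pb_multipliers, cycles


if __name__ == "__main__":
    for name in CONFIGS:
        multipliers, cycles = superscalar_budget(name)
        print(f"{name}: {multipliers} multipliers, output every {cycles} cycles")
```

The resulting counts of 32, 64, 96, and 160 multipliers agree with the superscalar rows of Table 2, and the 4-cycle and 6-cycle latencies correspond to the (4T) and (6T) labels there.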

6. Results and Discussion

The proposed receiver architectures for PCFICH in SISO, SIMO, MISO, and MIMO configurations are implemented using the Xilinx PlanAhead tool on the Virtex-6 FPGA xc6vlx240tff1156-1 device board. The target Virtex-6 device has only 768 DSP elements. Table 2 compares the performance of the proposed architectures using the folding and superscalar methods with the direct implementation of the PCFICH receiver in terms of resource utilisation, speed, and power for all the SISO, SIMO, MISO, and MIMO configurations.

Figure 18: RTL schematic for combined PCFICH architecture with diversity.

Table 2: Performance of proposed architectures based on folding and superscalar method.


[11] S Ahmadi LTE-Advanced A Practical Systems Approach toUnderstanding 3GPP LTE Releases 10 and 11 Radio AccessTechnologies Sassan Ahmadi Academic Press 2013

[12] K K Parhi VLSI Digital Signal Processing SystemsmdashDesign andImplementation Wiley-Interscience 1999

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of


Figure 9: Internal architecture of MISO processing block (PB-1). The block takes the output vector $z^{(1)}_0, \ldots, z^{(1)}_{15}$ from RDB-1 and the precomputed data, squares the real and imaginary parts of each element-wise difference, and sums them to give the output $r_1$.

Figure 10: Proposed MIMO receiver architecture for PCFICH. Inputs: the 16 × 1 channel frequency response vectors $\mathbf{h}^{(00)}$, $\mathbf{h}^{(01)}$, $\mathbf{h}^{(10)}$, $\mathbf{h}^{(11)}$, the 16 × 1 received signal vectors $\mathbf{y}^{(0)}$, $\mathbf{y}^{(1)}$, and the 16 × 1 precomputed signal vectors $\mathbf{s}^{(0)}_k$, $\mathbf{s}^{(1)}_k$ for $k = 1, \ldots, 4$. RDBM produces $\mathbf{z}$; RDBM-1 to RDBM-4 produce $\mathbf{z}^{(1)}$ to $\mathbf{z}^{(4)}$; PB-1 to PB-4 produce $r_1$ to $r_4$; the CFI detector outputs the detected CFI.

Figure 11: Internal architecture of the receiver decoding block for MIMO (RDBM). Inputs: the elements of the 16 × 1 received signal vectors and of the 16 × 1 channel frequency response vectors $\mathbf{h}^{(00)}$, $\mathbf{h}^{(01)}$, $\mathbf{h}^{(10)}$, and $\mathbf{h}^{(11)}$; outputs: $Z_0, Z_1, \ldots, Z_{14}, Z_{15}$.

Figure 12: Multiplicands rearrangement for a single complex multiplication block. Inputs: Re(y), Im(y), Re(h), Im(h); outputs: $e$ and $f$.

This is repeated for all the 8 symbols in layer 0 and layer 1. Equation (7) defines the transmission format, with the row index indicating the antenna port number and the column index indicating the subcarrier index. In the 2 × 1 MISO configuration, the received signals at the $i$th and $(i+1)$th subcarriers are given in matrix form as

$$
\begin{bmatrix} y_i \\ y^{*}_{i+1} \end{bmatrix}
=
\begin{bmatrix} h^{(0)}_{i} & -h^{(1)}_{i} \\ h^{*(1)}_{i+1} & h^{*(0)}_{i+1} \end{bmatrix}
\begin{bmatrix} d_i \\ d^{*}_{i+1} \end{bmatrix}
+
\begin{bmatrix} n_i \\ n^{*}_{i+1} \end{bmatrix},
\quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14,
\tag{8}
$$

where $h^{(0)}_i$ represents the channel frequency response of the $i$th subcarrier between the 0th transmit antenna port and the receive antenna, $d_i$ is the data symbol at the $i$th subcarrier, and $n_i$ is the noise at the $i$th subcarrier at the receive antenna. Equation (8) can simply be represented as

$$
\mathbf{y}_i = \mathbf{H}_{\mathrm{eff},i}\,\mathbf{d}_i + \mathbf{n}_i, \quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14,
\tag{9}
$$

where $\mathbf{y}_i$ is the 2 × 1 receive signal vector, $\mathbf{H}_{\mathrm{eff},i}$ is the 2 × 2 channel matrix, $\mathbf{d}_i$ is the 2 × 1 complex signal vector, and $\mathbf{n}_i$ is the 2 × 1 noise vector. The objective is to detect the elements $d_i$ and $d_{i+1}$ of the data vector $\mathbf{d}_i$. Assuming that the elements of the channel frequency response matrix $\mathbf{H}_{\mathrm{eff},i}$ are perfectly known at the receiver, the decoder output vector $\mathbf{z}_i$ is given by

$$
\mathbf{z}_i = \mathbf{H}^{H}_{\mathrm{eff},i}\,\mathbf{y}_i, \quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14,
\tag{10}
$$

where $\mathbf{H}^{H}_{\mathrm{eff},i}$ is the Hermitian of the 2 × 2 channel transmission matrix. Equation (10) is expanded as

$$
\begin{bmatrix} z_i \\ z^{*}_{i+1} \end{bmatrix}
=
\begin{bmatrix} h^{*(0)}_{i} & h^{(1)}_{i+1} \\ -h^{*(1)}_{i} & h^{(0)}_{i+1} \end{bmatrix}
\begin{bmatrix} y_i \\ y^{*}_{i+1} \end{bmatrix},
\quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14.
\tag{11}
$$

The elements of the decoder output are calculated as

$$
\begin{aligned}
z_i &= h^{*(0)}_{i}\, y_i + h^{(1)}_{i+1}\, y^{*}_{i+1}, \\
z^{*}_{i+1} &= -h^{*(1)}_{i}\, y_i + h^{(0)}_{i+1}\, y^{*}_{i+1},
\end{aligned}
\quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14.
\tag{12}
$$
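As a minimal sketch of the decoding step in (12), the Python fragment below combines one pair of received samples with the channel estimates. The function names `sfbc_receive_pair` (a noise-free model of (8)) and `sfbc_decode_pair` are illustrative and are not taken from the paper's RTL. The sketch assumes the channel is constant across the two subcarriers of a pair, so that the combining gain is easy to check; the channel and data values are hypothetical.

```python
def sfbc_receive_pair(h0, h1, d_i, d_next):
    """Noise-free received pair (y_i, y_{i+1}) following (8).

    Assumption for this sketch: the channel is flat over the pair,
    so h0 and h1 are used at both subcarriers i and i+1.
    """
    y_i = h0 * d_i - h1 * d_next.conjugate()
    # (8) gives conj(y_{i+1}); conjugate once more to obtain y_{i+1} itself.
    y_next = (h1.conjugate() * d_i + h0.conjugate() * d_next.conjugate()).conjugate()
    return y_i, y_next


def sfbc_decode_pair(h0_i, h1_i, h0_next, h1_next, y_i, y_next):
    """Decoder outputs (z_i, conj(z_{i+1})) following (12)."""
    y_next_conj = y_next.conjugate()
    z_i = h0_i.conjugate() * y_i + h1_next * y_next_conj
    z_next_conj = -h1_i.conjugate() * y_i + h0_next * y_next_conj
    return z_i, z_next_conj


h0, h1 = 0.8 + 0.3j, -0.2 + 0.5j   # hypothetical channel estimates
d_i, d_next = 1 + 1j, 1 - 1j       # hypothetical data symbols
y_i, y_next = sfbc_receive_pair(h0, h1, d_i, d_next)
z_i, z_next_conj = sfbc_decode_pair(h0, h1, h0, h1, y_i, y_next)
gain = abs(h0) ** 2 + abs(h1) ** 2  # combining gain under the flat-channel assumption
```

Under the flat-channel assumption the cross terms cancel, so `z_i` equals `gain * d_i` and `z_next_conj` equals `gain * d_next.conjugate()` up to rounding.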

Figure 13: Illustration of folded architecture of RPB in SISO and SIMO.

The PCFICH receiver architecture for the 2 × 1 MISO configuration is shown in Figure 7. The receiver decoding block (RDB) gets the 16 × 1 received signal vector $\mathbf{y}$ and computes the decoder output vector using (10), assuming that the channel frequency response vectors $\mathbf{h}^{(0)}$ and $\mathbf{h}^{(1)}$ are known. The detailed internal architecture of RDBM is shown in Figure 11. The decoder output vectors $\mathbf{z}_i$, $i = 0, 2, 4, \ldots, 14$, are stacked as the 16 × 1 vector $\mathbf{z} = [\mathbf{z}^{T}_{0}, \mathbf{z}^{T}_{2}, \ldots, \mathbf{z}^{T}_{14}]^{T}$. The 16 × 1 precomputed data vectors for CFI = 1, 2, 3, 4 are represented as $\mathbf{s}_1$, $\mathbf{s}_2$, $\mathbf{s}_3$, and $\mathbf{s}_4$, respectively.

The detailed structure of the receiver decoding blocks (RDB) is shown in Figure 8. The output vectors $\mathbf{z}^{(1)}$, $\mathbf{z}^{(2)}$, $\mathbf{z}^{(3)}$, $\mathbf{z}^{(4)}$ from RDB-1 to RDB-4 are fed to the processing blocks (PB-1 to PB-4). The detailed architecture of PB-1 is shown in Figure 9. The sum of the squared magnitudes of the elements of the difference vector between the decoded output vector $\mathbf{z}$ and the precomputed data vector $\mathbf{s}_1$ is the output $r_1$ of PB-1. Similarly, $r_2$, $r_3$, and $r_4$ are computed for CFI = 2, 3, and 4 using PB-2, PB-3, and PB-4, respectively. The processing block outputs $r_1$, $r_2$, $r_3$, and $r_4$ are applied to the CFI determination circuit shown in Figure 4 to detect the CFI value.
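The processing-block metric and the final minimum search can be sketched as follows. This is only an illustration of the sum-of-squared-differences rule described above: the names `pb_metric` and `detect_cfi` are hypothetical, and the four candidate vectors are arbitrary placeholders generated from a QPSK alphabet, not the CFI codewords defined by the LTE specification.

```python
import math


def pb_metric(z, s):
    """Sum of squared magnitudes of the element-wise difference z - s."""
    return sum(abs(z_k - s_k) ** 2 for z_k, s_k in zip(z, s))


def detect_cfi(z, candidates):
    """Return (cfi, metrics); cfi is the 1-based index of the smallest metric."""
    metrics = [pb_metric(z, s) for s in candidates]
    return metrics.index(min(metrics)) + 1, metrics


# Placeholder candidate vectors (NOT the real CFI codewords), 16 symbols each.
a = 1 / math.sqrt(2)
qpsk = [complex(a, a), complex(a, -a), complex(-a, a), complex(-a, -a)]
candidates = [[qpsk[(n * k) % 4] for n in range(16)] for k in (1, 2, 3, 4)]

# Decoder output: the third candidate plus a small fixed perturbation.
z = [s_n + complex(0.05, -0.05) for s_n in candidates[2]]
cfi, metrics = detect_cfi(z, candidates)
```

Here `metrics` plays the role of $r_1, \ldots, r_4$, and the comparison corresponds to the CFI determination circuit; with the perturbed third candidate as input, the smallest metric is the third one.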

4. System Model and Implementation Architecture for MIMO Configuration

In the MIMO system, the signals at the $i$th and $(i+1)$th subcarriers in the receive array are given by

$$
\begin{bmatrix} y^{(0)}_{i} \\ y^{(0)}_{i+1} \\ y^{(1)*}_{i} \\ y^{(1)*}_{i+1} \end{bmatrix}
=
\begin{bmatrix} h_{00} & h_{01} \\ h_{10} & h_{11} \\ h^{*}_{01} & -h^{*}_{00} \\ h^{*}_{11} & -h^{*}_{10} \end{bmatrix}
\begin{bmatrix} d_i \\ d^{*}_{i+1} \end{bmatrix}
+
\begin{bmatrix} n^{(0)}_{i} \\ n^{(0)*}_{i+1} \\ n^{(1)}_{i} \\ n^{(1)*}_{i+1} \end{bmatrix},
\quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14,
\tag{13}
$$

where $h_{ab}$ represents the channel frequency response between the $b$th transmit antenna and the $a$th receive antenna, and $n^{(j)}_{i}$ represents the noise at the $i$th subcarrier at the $j$th receive antenna. In vector form, it is written as

$$
\mathbf{y}_i = \mathbf{H}_{\mathrm{eff},i}\,\mathbf{d}_i + \mathbf{n}_i, \quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14,
\tag{14}
$$

where $\mathbf{y}_i$ is the 4 × 1 receive signal vector, $\mathbf{H}_{\mathrm{eff},i}$ is the 4 × 2 channel frequency response matrix at the $i$th and $(i+1)$th subcarriers, $\mathbf{d}_i$ is the 2 × 1 data vector at the $i$th and $(i+1)$th subcarriers, and $\mathbf{n}_i$ is the 4 × 1 noise vector. The objective is to detect the elements $d_i$ and $d_{i+1}$ of the data vector $\mathbf{d}_i$. Assuming that the elements of the channel frequency response matrix $\mathbf{H}_{\mathrm{eff},i}$ are perfectly known at the receiver, the decoder output vector $\mathbf{z}_i$ is given by

$$
\mathbf{z}_i = \mathbf{H}^{H}_{\mathrm{eff},i}\,\mathbf{y}_i, \quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14,
\tag{15}
$$

where $\mathbf{H}^{H}_{\mathrm{eff},i}$ is the Hermitian of the 4 × 2 channel transmission matrix. This can be expanded as

$$
\begin{bmatrix} z_i \\ z^{*}_{i+1} \end{bmatrix}
=
\begin{bmatrix} h^{*}_{00} & -h^{*}_{10} & h_{01} & h_{11} \\ h^{*}_{01} & -h^{*}_{11} & -h_{00} & -h_{10} \end{bmatrix}
\begin{bmatrix} y^{(0)}_{i} \\ y^{(0)}_{i+1} \\ y^{(1)*}_{i} \\ y^{(1)*}_{i+1} \end{bmatrix},
\quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14.
\tag{16}
$$

The decoder outputs are given by

$$
\begin{aligned}
z_i &= h^{*}_{00}\, y^{(0)}_{i} - h^{*}_{10}\, y^{(0)}_{i+1} + h_{01}\, y^{(1)*}_{i} + h_{11}\, y^{(1)*}_{i+1}, \\
z^{*}_{i+1} &= h^{*}_{01}\, y^{(0)}_{i} - h^{*}_{11}\, y^{(0)}_{i+1} - h_{00}\, y^{(1)*}_{i} - h_{10}\, y^{(1)*}_{i+1},
\end{aligned}
\quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14.
\tag{17}
$$
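The short check below evaluates the two expressions in (17) exactly as written and compares them with the matrix-vector product in (16). It is a sketch for one subcarrier pair only; the function names and the numerical values are hypothetical, and the channel coefficients are treated as scalars for that pair.

```python
def decode_mimo_pair(h00, h01, h10, h11, y0_i, y0_next, y1_i, y1_next):
    """Decoder outputs (z_i, conj(z_{i+1})) as written in (17)."""
    y1_i_c, y1_next_c = y1_i.conjugate(), y1_next.conjugate()
    z_i = (h00.conjugate() * y0_i - h10.conjugate() * y0_next
           + h01 * y1_i_c + h11 * y1_next_c)
    z_next_conj = (h01.conjugate() * y0_i - h11.conjugate() * y0_next
                   - h00 * y1_i_c - h10 * y1_next_c)
    return z_i, z_next_conj


def decode_mimo_pair_matrix(h00, h01, h10, h11, y0_i, y0_next, y1_i, y1_next):
    """Same outputs computed as the 2 x 4 matrix times 4 x 1 vector of (16)."""
    rows = [
        [h00.conjugate(), -h10.conjugate(), h01, h11],
        [h01.conjugate(), -h11.conjugate(), -h00, -h10],
    ]
    y = [y0_i, y0_next, y1_i.conjugate(), y1_next.conjugate()]
    return tuple(sum(a * b for a, b in zip(row, y)) for row in rows)


# Hypothetical channel estimates and received samples for one subcarrier pair.
args = (0.7 + 0.1j, -0.3 + 0.4j, 0.2 - 0.6j, 0.5 + 0.5j,
        1.0 - 0.5j, -0.25 + 0.75j, 0.5 + 0.5j, -1.0 + 0.25j)
expanded = decode_mimo_pair(*args)
as_matrix = decode_mimo_pair_matrix(*args)
```

Each output is the sum of four complex products, which is the per-line workload that the implementation sections count when sizing the MIMO decoding block.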

Figure 14: Illustration of proposed architecture for RDB and PB in MISO and MIMO. Note: the receiver decoding block (RDB) in MISO is termed RDBM in MIMO.

The PCFICH receiver architecture of the 2 × 2 MIMO configuration is shown in Figure 10.

The receiver decoding block (RDBM) gets the 16 × 1 received signal vector $\mathbf{y}$ and computes the decoder output vector using (15), assuming that the channel frequency response vectors $\mathbf{h}^{(00)}$, $\mathbf{h}^{(01)}$, $\mathbf{h}^{(10)}$, and $\mathbf{h}^{(11)}$ are known. The 16 × 1 precomputed data vectors for CFI = 1, 2, 3, and 4 are represented as $\mathbf{s}^{(0)}_1$, $\mathbf{s}^{(0)}_2$, $\mathbf{s}^{(0)}_3$, and $\mathbf{s}^{(0)}_4$, respectively, for antenna 0 and as $\mathbf{s}^{(1)}_1$, $\mathbf{s}^{(1)}_2$, $\mathbf{s}^{(1)}_3$, and $\mathbf{s}^{(1)}_4$, respectively, for antenna 1. The received signal vectors $\mathbf{y}^{(0)}_i$ and $\mathbf{y}^{(1)}_i$ are multiplied by the four channel estimation vectors to give the decoded output vector $\mathbf{z}$, which is sent to the processing block (PB) shown in Figure 9. The decoder outputs $\mathbf{z}_i$, $i = 0, 2, 4, \ldots, 14$, are stacked as the 16 × 1 vector $\mathbf{z} = [\mathbf{z}^{T}_{0}, \mathbf{z}^{T}_{2}, \ldots, \mathbf{z}^{T}_{14}]^{T}$. Similarly, RDBM-1 gives the output vector $\mathbf{z}^{(1)}$ using the precomputed data vectors $\mathbf{s}^{(0)}_1$ and $\mathbf{s}^{(1)}_1$ and the channel estimation vectors. The architecture of the PBs and the CFI detection architecture are similar to those of the MISO system. The sum of the squared magnitudes of the differences between each element in the decoded output vector $\mathbf{z}$ and its precomputed data in the vector $\mathbf{z}^{(1)}$ is the output $r_1$ of PB-1. Similarly, $r_2$, $r_3$, and $r_4$ are computed for the other CFI values. The outputs $r_1$, $r_2$, $r_3$, and $r_4$ are compared by the CFI detector shown in Figure 4 to determine the minimum value.

Figure 15: Illustration of superscalar method for SISO and SIMO (with no complex multiplications and operating from $T_1$ to $T_4$).

5. PCFICH Receiver Implementation Methods

The PCFICH receiver architectures can be implemented directly from the basic architectures developed in Sections 3 and 4. However, in order to utilize the FPGA resources effectively, the basic architectures are implemented using modified architectures based on two VLSI DSP techniques, namely, the folding and superscalar processing approaches.

5.1. Direct Implementation with Multiplicands Rearranged Method. In the receiver architecture for SISO and SIMO, the 16 × 1 received signal vector is directly subtracted from the precomputed data vector for a given CFI. This requires fewer multipliers and adders than MISO and MIMO. In the MISO and MIMO configurations, complex multiplications are necessary to multiply $\mathbf{H}^{H}_{\mathrm{eff}}$ with the received signal vector, which increases the number of multiplications in the CFI detection process. Hence, the terms are rearranged to minimize the number of multiplications, and the intermediate products are reused in the calculation of the real and imaginary parts. Consider the multiplication of two complex numbers $\operatorname{Re}\{h\} + j\operatorname{Im}\{h\}$ and $\operatorname{Re}\{y\} + j\operatorname{Im}\{y\}$. The real part ($e$) and imaginary part ($f$) of the product are given by

$$
\begin{aligned}
e &= \operatorname{Re}\{h\}\operatorname{Re}\{y\} - \operatorname{Im}\{h\}\operatorname{Im}\{y\}, \\
f &= \operatorname{Re}\{h\}\operatorname{Im}\{y\} + \operatorname{Im}\{h\}\operatorname{Re}\{y\}.
\end{aligned}
\tag{18}
$$

This requires four multiplications and two additions. To reduce the number of multiplications, the terms in (18) are rearranged as

$$
\begin{aligned}
e &= \left[\operatorname{Re}\{y\} - \operatorname{Im}\{y\}\right]\left[\operatorname{Re}\{h\} + \operatorname{Im}\{h\}\right] - \operatorname{Re}\{y\}\operatorname{Im}\{h\} + \operatorname{Im}\{y\}\operatorname{Re}\{h\}, \\
f &= \operatorname{Re}\{y\}\operatorname{Im}\{h\} + \operatorname{Im}\{y\}\operatorname{Re}\{h\}.
\end{aligned}
\tag{19}
$$

Since the products $\operatorname{Re}\{y\}\operatorname{Im}\{h\}$ and $\operatorname{Im}\{y\}\operatorname{Re}\{h\}$ appear in both expressions of (19), only three multiplications are required, at the cost of five additions. This rearrangement of the multiplicands is employed in the processing blocks, as shown in Figure 12.
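A three-multiplier complex product of this kind can be checked numerically. The sketch below uses one algebraically valid grouping, $e = (\operatorname{Re}\{y\} - \operatorname{Im}\{y\})(\operatorname{Re}\{h\} + \operatorname{Im}\{h\}) - \operatorname{Re}\{y\}\operatorname{Im}\{h\} + \operatorname{Im}\{y\}\operatorname{Re}\{h\}$, with the two cross products shared between $e$ and $f$. The function name `complex_mul_3` is illustrative, and integer inputs are used so that the comparison is exact.

```python
def complex_mul_3(re_h, im_h, re_y, im_y):
    """Complex product h*y using three real multiplications and five additions."""
    p1 = re_y * im_h                      # shared cross product
    p2 = im_y * re_h                      # shared cross product
    p3 = (re_y - im_y) * (re_h + im_h)    # third (and last) multiplication
    e = p3 - p1 + p2                      # real part: Re{h}Re{y} - Im{h}Im{y}
    f = p1 + p2                           # imaginary part
    return e, f


# (3 + 4j) * (5 - 2j) = 23 + 14j
e, f = complex_mul_3(3, 4, 5, -2)
reference = complex(3, 4) * complex(5, -2)
```

The saving is one multiplier per complex product, paid for with three extra additions, which is a favourable trade when multipliers map to scarce DSP elements.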

5.2. Proposed Architecture Using Folding Method. The folding transformation systematically determines the control circuits in DSP architectures where multiple algorithmic operations are time-multiplexed onto a single functional unit [12]. It is used for the synthesis of DSP architectures that can be operated at single or multiple clocks. It reduces the number of hardware functional units (FUs) by a factor of $N$ at the expense of increased computation time.

The folding architecture is introduced in the receiver structure of the RPB in the SISO and SIMO configurations and of the RDB and PB in the MISO and MIMO configurations, as shown in Figures 13 and 14, respectively. For the SISO RPB, there are 16 hardware lines to calculate the value of $r_1$, each requiring two multipliers. Hence, the number of multipliers used in one RPB is 32. To reduce the number of multipliers and adders, the folding architecture is proposed. This architecture uses only two multipliers and performs the operation of a single hardware line 16 times sequentially. The difference between the product of the channel frequency response vector with the precomputed data vector and the received signal vector is stored in registers. At any one time, one resultant signal pair is processed using the two multipliers to get the value of $z_i$. Four switches operating at the system clock speed are involved in the architecture: two switches pass the real part of the signal to one multiplier, while the other two pass the imaginary part of the signal to the other multiplier. The multipliers pass the products to the first adder for $z_i$. The output of the first adder is passed to the second adder with a delay to accumulate the values $z_0$ to $z_{15}$ into a register in subsequent clock cycles. This process requires 16 clock cycles, and the CFI is detected at the 17th clock cycle. Although more clock cycles are needed to get the output, the resources are minimized in this method.
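The folded processing can be modelled in software as a loop that reuses one pair of multipliers for all 16 hardware lines. This is a behavioural sketch only, not the paper's RTL: `folded_rpb` is a hypothetical name, each loop iteration stands for one clock cycle, and the test vectors are arbitrary.

```python
def folded_rpb(received, reference):
    """Accumulate the squared-magnitude metric one hardware line per cycle."""
    acc = 0.0
    cycles = 0
    for y_k, s_k in zip(received, reference):
        diff = y_k - s_k
        re_sq = diff.real * diff.real   # multiplier 1 (real path)
        im_sq = diff.imag * diff.imag   # multiplier 2 (imaginary path)
        acc += re_sq + im_sq            # first adder, then the accumulator
        cycles += 1
    return acc, cycles


# Arbitrary test data: differences n - jn for n = 0..15.
received = [complex(n, -n) for n in range(16)]
reference = [0j] * 16
metric, cycles = folded_rpb(received, reference)
```

The metric is identical to that of the fully parallel structure; only the schedule changes, with 16 accumulation cycles followed by the comparison in the 17th.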

The folded architecture of the decoding block of MISO and MIMO, which involves complex multiplication of the channel frequency response vector and the received signal vector, is shown in Figure 14. There are 2 complex multiplications and one addition in each of the 16 hardware lines. Hence, the total resource elements used are 32 complex multiplications and 16 additions. The folded architecture, which reduces this to just 2 complex multiplications and one addition, requires five switches. Two switches are used to pass the first element of the received signal vector and its corresponding channel frequency response vector to one multiplier, and the other two switches are used to pass the second element of the received signal vector and its channel frequency response vector to another multiplier. These four switches operate at the system clock speed. The multipliers pass their products to the adder through the fifth switch before moving to the PB. This process requires 16 clock cycles, and the CFI is detected at the 17th clock cycle.

Figure 16: Illustration of superscalar method for MISO and MIMO (with complex multiplications and operating from $T_1$ to $T_6$).

Figure 17: Simulation waveform for PCFICH receiver.

5.3. Proposed Architecture Using Superscalar Method. The superscalar approach is another VLSI DSP technique with low resource utilization. The superscalar processing method combines parallel processing and pipelining strategies. In this case, parallel operation of the 16 pairs of hardware lines is arranged with pipelining of the subtraction and square magnitude operations for each CFI. The SISO configuration does not have complex multiplications; it has only square magnitude operations. Hence, the RPB of SISO has 16 hardware lines, each having 2 multipliers, which results in a total of 32 multipliers. This setup requires more hardware resources than folding, but the output is obtained at every 4th clock cycle, as shown in Figure 15. The SIMO configuration, which involves processing the signals of two receive antennas, requires twice the number of multiplications of SISO, and the output is obtained at every 4th clock cycle. The block "d" represents the delay element introduced to buffer the values and produce the outputs at the same time instant.

For the MISO configuration, the RDB has 16 hardware lines with 2 complex multiplications each. Since each complex multiplication requires four real multiplications, the RDB can be executed in two clock cycles by reusing 64 multipliers. A further 32 multipliers are required for the PB, taking 4 clock cycles. Hence, 96 multipliers are required in the MISO configuration. For the MIMO configuration, the RDB requires reuse of 128 multipliers, taking 2 clock cycles, and an additional 32 multipliers are required for the PB, taking 4 clock cycles. Hence, 160 multipliers are required for the MIMO configuration, and the output is obtained at every 6th clock cycle, as shown in Figure 16. As in Figure 15, the block "d" represents the delay element introduced to buffer the values and produce the outputs at the same time instant.
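The multiplier counts quoted above follow from a short piece of arithmetic, sketched below. The function `superscalar_budget` and its parameters are hypothetical. The figure of four complex multiplications per hardware line for MIMO is an assumption read off the four products in each expression of (17); the text itself only states the resulting total of 128 reused multipliers.

```python
LINES = 16               # hardware lines per block
PB_MULTS_PER_LINE = 2    # squares of the real and imaginary parts
REAL_PER_COMPLEX = 4     # real multiplications per complex multiplication
RDB_CYCLES = 2           # the RDB multipliers are reused over two clock cycles
PB_CYCLES = 4            # the processing block takes four clock cycles


def superscalar_budget(complex_per_line, receive_paths=1):
    """Return (multipliers, clock cycles per output) for one configuration."""
    pb = LINES * PB_MULTS_PER_LINE * receive_paths
    if complex_per_line:
        rdb = LINES * complex_per_line * REAL_PER_COMPLEX // RDB_CYCLES
        cycles = RDB_CYCLES + PB_CYCLES
    else:
        rdb = 0                  # SISO and SIMO have no decoding block
        cycles = PB_CYCLES
    return pb + rdb, cycles


budget = {
    "SISO": superscalar_budget(0),
    "SIMO": superscalar_budget(0, receive_paths=2),
    "MISO": superscalar_budget(2),
    "MIMO": superscalar_budget(4),   # assumption: four complex products per line
}
```

Under these assumptions the totals come out as 32, 64, 96, and 160 multipliers, with an output every 4 clock cycles for SISO and SIMO and every 6 for MISO and MIMO.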

6. Results and Discussion

The proposed receiver architectures for PCFICH in the SISO, SIMO, MISO, and MIMO configurations are implemented using the Xilinx PlanAhead tool on the Virtex-6 FPGA xc6vlx240tff1156-1 device board. The target Virtex-6 device has only 768 DSP elements. Table 2 compares the proposed architectures using the folding and superscalar methods with the direct implementation of the PCFICH receiver in terms of resource utilisation, speed, and power for all the SISO, SIMO, MISO, and MIMO configurations. The proposed architectures based on the folding and superscalar processing methods require fewer resource elements.

Figure 18: RTL schematic for combined PCFICH architecture with diversity.

Table 2: Performance of proposed architectures based on folding and superscalar method.

| Diversity | Method | Multipliers | Adders | DSP elements | LUTs | Total delay (ns) | Speed (MHz) | Dynamic power |
|---|---|---|---|---|---|---|---|---|
| SISO | Direct | 125 | 245 | 125 | 5479 | 39.081 | 25.587 | - |
| SISO | Folding (16T) | 8 | 81 | 16 | 2561 | 114.448 | 8.737 | 84 |
| SISO | Superscalar (4T) | 32 | 182 | 66 | 2731 | 69.333 | 14.423 | 93 |
| SIMO | Direct | 250 | 494 | 250 | 10942 | 40.278 | 24.827 | - |
| SIMO | Folding (16T) | 16 | 165 | 32 | 5117 | 130.704 | 7.651 | 159 |
| SIMO | Superscalar (4T) | 64 | 318 | 130 | 5623 | 53.873 | 18.562 | 170 |
| MISO | Direct | 224 | 580 | 594 | 14880 | 43.023 | 23.243 | - |
| MISO | Folding (16T) | 14 | 101 | 39 | 3950 | 255.264 | 3.917 | 173 |
| MISO | Superscalar (6T) | 96 | 338 | 196 | 6156 | 80.495 | 12.423 | 208 |
| MIMO | Direct | 320 | 844 | 675 | 17380 | 56.962 | 17.555 | - |
| MIMO | Folding (16T) | 20 | 155 | 46 | 4395 | 256.528 | 3.898 | 374 |
| MIMO | Superscalar (6T) | 160 | 465 | 262 | 6932 | 85.822 | 11.652 | 382 |

Figure 19: Resource utilization graph for generalized architecture.

In the folding approach, resource utilization is lower than in the direct and superscalar approaches, at the cost of reduced speed of operation, but it remains suitable for real-time frame timings. When the LTE-A system operates at 1.4 MHz bandwidth, the maximum time available for detection at each subcarrier is 992.063 ns, since each slot of 0.5 ms duration in a frame (10 ms radio frame duration) consists of 7 OFDM symbols and there are 72 subcarriers along one OFDM symbol. The total delay in the receiver architecture is within the LTE time constraint. The dynamic power consumption is lower in the folding method than in the superscalar method due to the decrease in block arithmetic. The direct method does not require sequential execution and clocking, and hence its total power consumption is due to static power. It is therefore inferred that the proposed architecture based on the folding method is more suitable for CFI detection. The simulation waveform of the proposed architecture based on the folding method is shown in Figure 17 for the SISO, SIMO, MISO, and MIMO configurations.
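The timing budget quoted above can be reproduced with a few lines of arithmetic. In this sketch the folding delays are the Table 2 values in nanoseconds, with decimal points placed so that each delay is consistent with the listed speed (an assumption about the extracted figures); the variable and function names are illustrative.

```python
SLOT_NS = 0.5e6              # one slot lasts 0.5 ms
OFDM_SYMBOLS_PER_SLOT = 7
SUBCARRIERS = 72             # subcarriers per OFDM symbol at 1.4 MHz bandwidth

# Time available per subcarrier within one slot.
budget_ns = SLOT_NS / OFDM_SYMBOLS_PER_SLOT / SUBCARRIERS

# Total delay of the folding architecture per configuration (Table 2, ns).
folding_delay_ns = {"SISO": 114.448, "SIMO": 130.704,
                    "MISO": 255.264, "MIMO": 256.528}


def speed_mhz(delay_ns):
    """Rate at which results are produced, in MHz, for a given total delay."""
    return 1e3 / delay_ns


meets_budget = {cfg: delay < budget_ns for cfg, delay in folding_delay_ns.items()}
```

Even the slowest case, the MIMO folding architecture at roughly 256.5 ns, uses only about a quarter of the available 992 ns.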

A general architecture based on the folding method, which operates in all four SISO, SIMO, MISO, and MIMO configurations, has also been developed. In this architecture, a control variable "e" is used to enable or disable the submodules SISO, SIMO, MISO, or MIMO according to the selection input "diversity". The CFI is detected at every 17th clock cycle. The synthesis results of the general architecture based on folding show that it utilizes minimum resources in the XC6VLX240TFF1156-1 Virtex-6 device (768 DSPs). This is summarized in Table 3. Dynamic power consumption is due to internal switching contributed by the clock (246 mW), the logic (670 mW), and the block arithmetic (103 mW).

Table 3: Resource requirements of proposed architecture using folding method.

| Parameter | Multipliers | Adders | Minimum clock period (ns) | Total delay (ns) | Speed (MHz) | Total dynamic power (mW) |
|---|---|---|---|---|---|---|
| Value | 58 | 518 | 16.033 | 256.528 | 3.898 | 1019 |

Figure 20: Implemented device in FPGA editor.

Figure 18 shows the RTL schematic of the 4 diversity blocks "div0", "div1", "div2", and "div3", corresponding to SISO, SIMO, MISO, and MIMO, controlled by wires named "e". The power consumed includes both static power and dynamic power due to internal switching.

Figure 19 shows the resource utilization graph, which gives the percentage of registers, lookup tables (LUTs), slices, DSP elements, and buffers used.

Figure 20 shows the implemented device in the FPGA editor, with the implemented components and the interconnections between the components configured into the FPGA device.

7. Conclusion

In this paper, low complexity, low resource CFI detection for single or multiple antennas at the receiver has been proposed, analyzed using ModelSim, and implemented in the Virtex-6 device using the Xilinx PlanAhead tool. In the receiver, computational complexity and resource utilization are minimized by employing arithmetic operational rearrangement and a suboptimal sequential DSP algorithm called the folding approach. The proposed architecture using the folding method complies with the LTE frame timing constraint in the SISO, SIMO, MISO, and MIMO configurations. It is a suitable solution for the area-optimized hardware implementation of receiver structures for PCFICH. In future, a complete hardware design accommodating all the physical downlink control channels of 3GPP LTE-A with low resource utilization could be synthesized and implemented.

Conflict of Interests

The authors do not have a direct financial relation with any commercial identity mentioned in the paper or any other conflict of interests.

Acknowledgments

The authors wish to express their sincere thanks to the All India Council for Technical Education, New Delhi, for the Grant to do the Project titled "Design of Testbed for the Development of Optimized Architectures of MIMO Signal Processing" (no. 8023RIDRPS0391112). They are also thankful to the Managements of Mepco Schlenk Engineering College, Sivakasi, and Thiagarajar College of Engineering, Madurai, for their constant support and encouragement to carry out this research work successfully.

References

[1] 3GPP TS 36.211 Version 11.0.0 Release 11 (2012-10), Evolved Universal Terrestrial Radio Access (E-UTRA): Physical Channels and Modulation, 2012.

[2] J. Ketonen and M. Juntti, "SIC and K-best LSD receiver implementation for a MIMO-OFDM system," in Proceedings of the 16th European Signal Processing Conference (EUSIPCO '08), Lausanne, Switzerland, August 2008.

[3] J. Ketonen, M. Juntti, and J. R. Cavallaro, "Performance-complexity comparison of receivers for a LTE MIMO-OFDM system," IEEE Transactions on Signal Processing, vol. 58, no. 6, pp. 3360–3372, 2010.

[4] X. Huang, C. Liang, and J. Ma, "System architecture and implementation of MIMO sphere decoders on FPGA," IEEE Transactions on Very Large Scale Integration Systems, vol. 16, no. 2, pp. 188–196, 2008.

[5] Y. Lin, Y. Chen, C. Chu, C. Zhan, and A. Wu, "Dual-mode low-complexity codebook searching algorithm and VLSI architecture for LTE/LTE-advanced systems," IEEE Transactions on Signal Processing, vol. 61, no. 14, pp. 3545–3562, 2013.

[6] D. Wu, J. Eilert, R. Asghar, and D. Liu, "VLSI implementation of a fixed-complexity soft-output MIMO detector for high-speed wireless," EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 893184, 13 pages, 2010.

[7] C. Mehlführer, M. Wrulich, J. C. Ikuno, D. Bosanska, and M. Rupp, "Simulating the long term evolution physical layer," in Proceedings of the 17th European Signal Processing Conference (EUSIPCO '09), pp. 1471–1478, Glasgow, UK, August 2009.

[8] S. J. Thiruvengadam and L. M. A. Jalloul, "Performance analysis of the 3GPP-LTE physical control channels," EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 914934, 10 pages, 2010.

[9] S. M. Alamouti, "A simple transmit diversity technique for wireless communications," IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1451–1458, 1998.

[10] S. S. A. Abbas and S. J. Thiruvengadam, "FPGA implementation of 3GPP-LTE-A physical downlink control channel using diversity techniques," International Journal of Wireless and Mobile Computing, vol. 9, no. 2, p. 84, 2013.

[11] S. Ahmadi, LTE-Advanced: A Practical Systems Approach to Understanding 3GPP LTE Releases 10 and 11 Radio Access Technologies, Academic Press, 2013.

[12] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, Wiley-Interscience, 1999.


Figure 11: Internal architecture of the receiver decoding block for MIMO (RDBM). Inputs: the 16 × 1 received signal vector and the 16 × 1 channel frequency response vectors h(00), h(01), h(10), and h(11). Outputs: Z0 to Z15.

Figure 12: Multiplicands rearrangement for a single complex multiplication block. Inputs: Re(y), Im(y), Re(h), Im(h). Outputs: e and f.

This is repeated for all the 8 symbols in layer 0 and layer 1. Equation (7) defines the transmission format, with the row index indicating the antenna port number and the column index indicating the subcarrier index. In the 2 × 1 MISO configuration, the received signals at the $i$th and $(i+1)$th subcarriers are given in matrix form as

$$
\begin{bmatrix} y_i \\ y_{i+1}^{*} \end{bmatrix}
=
\begin{bmatrix} h_i^{(0)} & -h_i^{(1)} \\ h_{i+1}^{(1)*} & h_{i+1}^{(0)*} \end{bmatrix}
\begin{bmatrix} d_i \\ d_{i+1}^{*} \end{bmatrix}
+
\begin{bmatrix} n_i \\ n_{i+1}^{*} \end{bmatrix},
\quad i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{8}
$$

where $h_i^{(0)}$ represents the channel frequency response of the $i$th subcarrier between the 0th transmit antenna port and the receive antenna, $d_i$ is the data symbol at the $i$th subcarrier, and $n_i$ is the noise at the $i$th subcarrier at the receive antenna. Equation (8) can simply be represented as

$$
\mathbf{y}_i = \mathbf{H}_{\mathrm{eff},i}\,\mathbf{d}_i + \mathbf{n}_i,
\quad i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{9}
$$

where $\mathbf{y}_i$ is the 2 × 1 receive signal vector, $\mathbf{H}_{\mathrm{eff},i}$ is the 2 × 2 channel matrix, $\mathbf{d}_i$ is the 2 × 1 complex signal vector, and $\mathbf{n}_i$ is the 2 × 1 noise vector. The objective is to detect the elements $d_i$ and $d_{i+1}$ of the data vector $\mathbf{d}_i$. Assuming that the elements of the channel frequency response matrix $\mathbf{H}_{\mathrm{eff},i}$ are perfectly known at the receiver, the decoder output vector $\mathbf{z}_i$ is given by

$$
\mathbf{z}_i = \mathbf{H}_{\mathrm{eff},i}^{H}\,\mathbf{y}_i,
\quad i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{10}
$$

where $\mathbf{H}_{\mathrm{eff},i}^{H}$ is the Hermitian of the 2 × 2 channel transmission matrix. Equation (10) is expanded as

$$
\begin{bmatrix} z_i \\ z_{i+1}^{*} \end{bmatrix}
=
\begin{bmatrix} h_i^{(0)*} & h_{i+1}^{(1)} \\ -h_i^{(1)*} & h_{i+1}^{(0)} \end{bmatrix}
\begin{bmatrix} y_i \\ y_{i+1}^{*} \end{bmatrix},
\quad i = 0, 2, 4, 6, 8, 10, 12, 14. \tag{11}
$$

The elements of the decoder output are calculated as

$$
\begin{aligned}
z_i &= h_i^{(0)*}\,y_i + h_{i+1}^{(1)}\,y_{i+1}^{*}, \\
z_{i+1}^{*} &= -h_i^{(1)*}\,y_i + h_{i+1}^{(0)}\,y_{i+1}^{*},
\end{aligned}
\quad i = 0, 2, 4, 6, 8, 10, 12, 14. \tag{12}
$$
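The combining step in (12) can be illustrated with a minimal Python sketch. The function names `sfbc_channel_pair` and `sfbc_decode_pair` are hypothetical and are not part of the implemented architecture; the first function simply generates noise-free samples according to (8) so that the combiner can be exercised. When the channel is assumed equal on both subcarriers of a pair, the outputs reduce to the data symbols scaled by $|h^{(0)}|^2 + |h^{(1)}|^2$.

```python
def sfbc_channel_pair(d_i, d_i1, h0_i, h1_i, h0_i1, h1_i1):
    """Noise-free received samples (y_i, y_{i+1}) following the model in (8).

    d_i, d_i1    : data symbols d_i and d_{i+1}
    h0_*, h1_*   : channel responses from transmit ports 0 and 1
                   on subcarriers i (suffix _i) and i+1 (suffix _i1)
    """
    y_i = h0_i * d_i - h1_i * d_i1.conjugate()
    y_i1_conj = h1_i1.conjugate() * d_i + h0_i1.conjugate() * d_i1.conjugate()
    return y_i, y_i1_conj.conjugate()


def sfbc_decode_pair(y_i, y_i1, h0_i, h1_i, h0_i1, h1_i1):
    """Apply (12) to one subcarrier pair; returns (z_i, conj(z_{i+1}))."""
    y_i1_conj = y_i1.conjugate()
    z_i = h0_i.conjugate() * y_i + h1_i1 * y_i1_conj
    z_i1_conj = -h1_i.conjugate() * y_i + h0_i1 * y_i1_conj
    return z_i, z_i1_conj


if __name__ == "__main__":
    # Illustrative values only; the channel is assumed flat across the pair.
    h0, h1 = 1 + 2j, 3 - 1j
    d_i, d_i1 = 1 + 1j, 1 - 1j
    y_i, y_i1 = sfbc_channel_pair(d_i, d_i1, h0, h1, h0, h1)
    z_i, z_i1_conj = sfbc_decode_pair(y_i, y_i1, h0, h1, h0, h1)
    gain = (h0.real ** 2 + h0.imag ** 2) + (h1.real ** 2 + h1.imag ** 2)
    print(z_i / gain, z_i1_conj.conjugate() / gain)
```

Each pair costs four complex products, which is the quantity the rearrangement and folding techniques of Section 5 aim to reduce.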

The PCFICH receiver architecture for the 2 × 1 MISO configuration is shown in Figure 7. The receiver decoding block (RDB) gets the 16 × 1 received signal vector $\mathbf{y}$ and computes the decoder output vector using (10), assuming that the channel frequency response vectors $\mathbf{h}^{(0)}$ and $\mathbf{h}^{(1)}$ are known. The detailed internal architecture of RDBM is shown in Figure 11. The decoder output vectors $\mathbf{z}_i$, $i = 0, 2, 4, \ldots, 14$, are stacked as the 16 × 1 vector $\mathbf{z} = [\mathbf{z}_0^T, \mathbf{z}_2^T, \ldots, \mathbf{z}_{14}^T]^T$. The 16 × 1 precomputed data vectors for CFI = 1, 2, 3, 4 are represented as $\mathbf{s}_1$, $\mathbf{s}_2$, $\mathbf{s}_3$, and $\mathbf{s}_4$, respectively.

Figure 13: Illustration of folded architecture of RPB in SISO and SIMO. Inputs: channel frequency response $h_i$, received signal vector $y_i$, and precomputed data $s_i$. Output: $r_1$.

The detailed structure of the receiver decoding blocks (RDB) is shown in Figure 8. The output vectors $\mathbf{z}^{(1)}$, $\mathbf{z}^{(2)}$, $\mathbf{z}^{(3)}$, $\mathbf{z}^{(4)}$ from RDB-1 to RDB-4 are fed to the processing blocks (PB-1 to PB-4). The detailed architecture of PB-1 is shown in Figure 9. The sum of the squared magnitudes of the elements of the difference vector between the decoded output vector $\mathbf{z}$ and the precomputed data vector $\mathbf{s}_1$ is the output $r_1$ of PB-1. Similarly, $r_2$, $r_3$, and $r_4$ are computed for CFI = 2, 3, and 4 using PB-2, PB-3, and PB-4, respectively. The processing block outputs $r_1$, $r_2$, $r_3$, and $r_4$ are applied to the CFI determination circuit shown in Figure 4 to detect the CFI value.
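The processing-block metric and the minimum search can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `pb_metric` and `detect_cfi` are hypothetical, a single decoded vector is compared against all four candidates, and the short candidate vectors in the usage example are toy values rather than the actual precomputed PCFICH data vectors.

```python
def pb_metric(z, s):
    """Sum of squared magnitudes of the element-wise difference z - s."""
    return sum(abs(zk - sk) ** 2 for zk, sk in zip(z, s))


def detect_cfi(z, candidates):
    """Return (cfi, metrics), where cfi in 1..len(candidates) minimises r_k.

    z          : decoded output vector
    candidates : list of precomputed data vectors [s_1, s_2, s_3, s_4]
    """
    metrics = [pb_metric(z, s) for s in candidates]
    best = min(range(len(metrics)), key=metrics.__getitem__)
    return best + 1, metrics


if __name__ == "__main__":
    # Toy length-4 candidates (hypothetical, for illustration only).
    candidates = [
        [1, 1, 1, 1],
        [1, -1, 1, -1],
        [-1, -1, 1, 1],
        [-1, 1, 1, -1],
    ]
    z = [-0.9 + 0.1j, -1.1, 0.8, 1.2 - 0.1j]  # noisy copy of candidate 3
    cfi, metrics = detect_cfi(z, candidates)
    print(cfi, metrics)
```

In hardware the four metrics are produced by PB-1 to PB-4 and the comparison is performed by the CFI determination circuit; the list comprehension here stands in for those parallel blocks.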

4. System Model and Implementation Architecture for MIMO Configuration

In the MIMO system, the signals at the $i$th and $(i+1)$th subcarriers in the receive array are given by

$$
\begin{bmatrix} y_i^{(0)} \\ y_{i+1}^{(0)} \\ y_i^{(1)*} \\ y_{i+1}^{(1)*} \end{bmatrix}
=
\begin{bmatrix} h_{00} & h_{01} \\ h_{10} & h_{11} \\ h_{01}^{*} & -h_{00}^{*} \\ h_{11}^{*} & -h_{10}^{*} \end{bmatrix}
\begin{bmatrix} d_i \\ d_{i+1}^{*} \end{bmatrix}
+
\begin{bmatrix} n_i^{(0)} \\ n_{i+1}^{(0)*} \\ n_i^{(1)} \\ n_{i+1}^{(1)*} \end{bmatrix},
\quad i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{13}
$$

where $h_{ab}$ represents the channel frequency response between the $b$th transmit antenna and the $a$th receive antenna, and $n_i^{(j)}$ represents the noise in the $i$th subcarrier at the $j$th receive antenna. In vector form, it is written as

$$
\mathbf{y}_i = \mathbf{H}_{\mathrm{eff},i}\,\mathbf{d}_i + \mathbf{n}_i,
\quad i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{14}
$$

where $\mathbf{y}_i$ is the 4 × 1 receive signal vector, $\mathbf{H}_{\mathrm{eff},i}$ is the 4 × 2 channel frequency response matrix at the $i$th and $(i+1)$th subcarriers, $\mathbf{d}_i$ is the 2 × 1 data vector at the $i$th and $(i+1)$th subcarriers, and $\mathbf{n}_i$ is the 4 × 1 noise vector. The objective is to detect the elements $d_i$ and $d_{i+1}$ of the data vector $\mathbf{d}_i$. Assuming that the elements of the channel frequency response matrix $\mathbf{H}_{\mathrm{eff},i}$ are perfectly known at the receiver, the decoder output vector $\mathbf{z}_i$ is given by

$$
\mathbf{z}_i = \mathbf{H}_{\mathrm{eff},i}^{H}\,\mathbf{y}_i,
\quad i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{15}
$$

where $\mathbf{H}_{\mathrm{eff},i}^{H}$ is the Hermitian of the 4 × 2 channel transmission matrix. This can be expanded as

$$
\begin{bmatrix} z_i \\ z_{i+1}^{*} \end{bmatrix}
=
\begin{bmatrix} h_{00}^{*} & -h_{10}^{*} & h_{01} & h_{11} \\ h_{01}^{*} & -h_{11}^{*} & -h_{00} & -h_{10} \end{bmatrix}
\begin{bmatrix} y_i^{(0)} \\ y_{i+1}^{(0)} \\ y_i^{(1)*} \\ y_{i+1}^{(1)*} \end{bmatrix},
\quad i = 0, 2, 4, 6, 8, 10, 12, 14. \tag{16}
$$

The decoder outputs are given by

$$
\begin{aligned}
z_i &= h_{00}^{*}\,y_i^{(0)} - h_{10}^{*}\,y_{i+1}^{(0)} + h_{01}\,y_i^{(1)*} + h_{11}\,y_{i+1}^{(1)*}, \\
z_{i+1}^{*} &= h_{01}^{*}\,y_i^{(0)} - h_{11}^{*}\,y_{i+1}^{(0)} - h_{00}\,y_i^{(1)*} - h_{10}\,y_{i+1}^{(1)*},
\end{aligned}
\quad i = 0, 2, 4, 6, 8, 10, 12, 14. \tag{17}
$$
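As a rough illustration, the two expressions in (17) can be transcribed directly into Python. The function name `mimo_decode_pair` and its argument order are assumptions made for this sketch; the code only evaluates (17) term by term for one subcarrier pair and makes no claim about the fixed-point behaviour of the hardware.

```python
def mimo_decode_pair(y0_i, y0_i1, y1_i, y1_i1, h00, h01, h10, h11):
    """Evaluate (17) for one subcarrier pair; returns (z_i, conj(z_{i+1})).

    y0_*, y1_* : received samples at receive antennas 0 and 1
                 on subcarriers i (suffix _i) and i+1 (suffix _i1)
    h_ab       : channel response from transmit antenna b to receive antenna a
    """
    y1c_i = y1_i.conjugate()
    y1c_i1 = y1_i1.conjugate()
    z_i = (h00.conjugate() * y0_i - h10.conjugate() * y0_i1
           + h01 * y1c_i + h11 * y1c_i1)
    z_i1_conj = (h01.conjugate() * y0_i - h11.conjugate() * y0_i1
                 - h00 * y1c_i - h10 * y1c_i1)
    return z_i, z_i1_conj


if __name__ == "__main__":
    # Illustrative inputs only.
    out = mimo_decode_pair(1 + 0j, 1j, 2j, 3 + 0j,
                           h00=1 + 0j, h01=2 + 0j, h10=1j, h11=-1 + 0j)
    print(out)
```

Each output needs four complex products, so one subcarrier pair costs eight complex multiplications, twice the MISO count.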

Figure 14: Illustration of proposed architecture for RDB and PB in MISO and MIMO, showing the folded RDB unit and the folded PB unit. Note: the receiver decoding block (RDB) in MISO is termed RDBM in MIMO.

The PCFICH receiver architecture of the 2 × 2 MIMO configuration is shown in Figure 10.

The receiver decoding block (RDBM) gets the 16 × 1 received signal vector $\mathbf{y}$ and computes the decoder output vector using (15), assuming that the channel frequency response vectors $\mathbf{h}^{(00)}$, $\mathbf{h}^{(01)}$, $\mathbf{h}^{(10)}$, and $\mathbf{h}^{(11)}$ are known. The 16 × 1 precomputed data vectors for CFI = 1, 2, 3, and 4 are represented as $\mathbf{s}_1^{(0)}$, $\mathbf{s}_2^{(0)}$, $\mathbf{s}_3^{(0)}$, and $\mathbf{s}_4^{(0)}$, respectively, for antenna 0 and as $\mathbf{s}_1^{(1)}$, $\mathbf{s}_2^{(1)}$, $\mathbf{s}_3^{(1)}$, and $\mathbf{s}_4^{(1)}$, respectively, for antenna 1. The received signal vectors $\mathbf{y}_i^{(0)}$ and $\mathbf{y}_i^{(1)}$ are multiplied with the four channel estimation vectors to give the decoded output vector $\mathbf{z}$, which is sent to the processing block (PB) shown in Figure 9. The decoder outputs $\mathbf{z}_i$, $i = 0, 2, 4, \ldots, 14$, are stacked as the 16 × 1 vector $\mathbf{z} = [\mathbf{z}_0^T, \mathbf{z}_2^T, \ldots, \mathbf{z}_{14}^T]^T$. Similarly, RDBM1 gives the output vector $\mathbf{z}^{(1)}$ using the precomputed data vectors $\mathbf{y}_1^{(0)}$ and $\mathbf{y}_1^{(1)}$ and the channel estimation vectors. The architecture of the PBs and the CFI detection architecture are similar to those of the MISO system. The sum of the squared magnitudes of the difference between each element in the decoded output vector $\mathbf{z}$ and its precomputed data in the vector $\mathbf{z}^{(1)}$ is the output $r_1$ of PB1. Similarly, $r_2$, $r_3$, and $r_4$ are computed for the other CFI values. The values $r_1$, $r_2$, $r_3$, and $r_4$ are compared by the CFI detector shown in Figure 4 to determine the minimum.

Figure 15: Illustration of superscalar method for SISO and SIMO (with no complex multiplications and operating from $T_1$ to $T_4$).

5. PCFICH Receiver Implementation Methods

The PCFICH receiver architectures can be implemented directly from the basic architectures developed in Sections 3 and 4. However, in order to utilize the FPGA resources effectively, the basic architectures are implemented using modified architectures based on VLSI DSP techniques, namely, the folding and superscalar processing approaches.

5.1. Direct Implementation with Multiplicands Rearranged Method. In the receiver architecture for SISO and SIMO, the 16 × 1 received signal vector is directly subtracted from the precomputed data vector for a given CFI. This requires fewer multipliers and adders than MISO and MIMO. In MISO and MIMO configurations, complex multiplications are necessary for the multiplication of $\mathbf{H}_{\mathrm{eff}}^{H}$ with the received signal vector, which increases the number of multiplications in the CFI detection process. Hence, the terms are rearranged to minimize the number of multiplications, and the intermediate products are reused in the calculation of the real and imaginary parts. Consider the multiplication of two complex numbers $\mathrm{Re}\{h\} + j\,\mathrm{Im}\{h\}$ and $\mathrm{Re}\{y\} + j\,\mathrm{Im}\{y\}$. The real part ($e$) and imaginary part ($f$) of the output are given by

$$
\begin{aligned}
e &= \mathrm{Re}\{h\}\,\mathrm{Re}\{y\} - \mathrm{Im}\{h\}\,\mathrm{Im}\{y\}, \\
f &= \mathrm{Re}\{h\}\,\mathrm{Im}\{y\} + \mathrm{Im}\{h\}\,\mathrm{Re}\{y\}.
\end{aligned} \tag{18}
$$

This requires four multiplications and two additions. To reduce the number of multiplications, the terms in (18) are rearranged as

$$
\begin{aligned}
e &= \left[\mathrm{Re}\{y\} - \mathrm{Im}\{y\}\right]\left[\mathrm{Re}\{h\} + \mathrm{Im}\{h\}\right] - \mathrm{Re}\{y\}\,\mathrm{Im}\{h\} + \mathrm{Im}\{y\}\,\mathrm{Re}\{h\}, \\
f &= \mathrm{Re}\{y\}\,\mathrm{Im}\{h\} + \mathrm{Im}\{y\}\,\mathrm{Re}\{h\}.
\end{aligned} \tag{19}
$$

Since the products $\mathrm{Re}\{y\}\,\mathrm{Im}\{h\}$ and $\mathrm{Im}\{y\}\,\mathrm{Re}\{h\}$ appear in both $e$ and $f$ in (19), they are computed once and reused, so only three multiplications are required, at the cost of five additions. This rearrangement of the multiplicands is employed in the processing blocks, as shown in Figure 12.
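The three-multiplier rearrangement can be checked numerically with a small Python sketch. The function name `complex_mul_3` is hypothetical. The shared product pairs $(\mathrm{Re}\{y\} - \mathrm{Im}\{y\})$ with $(\mathrm{Re}\{h\} + \mathrm{Im}\{h\})$, which is the form that reproduces the four-multiplier result of (18) exactly.

```python
def complex_mul_3(y_re, y_im, h_re, h_im):
    """Complex product (h_re + j*h_im) * (y_re + j*y_im) with 3 multiplications.

    Returns (e, f): the real and imaginary parts of the product.
    Uses three multiplications and five additions/subtractions.
    """
    p1 = (y_re - y_im) * (h_re + h_im)   # multiplication 1 (two additions)
    p2 = y_re * h_im                     # multiplication 2
    p3 = y_im * h_re                     # multiplication 3
    e = p1 - p2 + p3                     # real part (two additions)
    f = p2 + p3                          # imaginary part (one addition)
    return e, f


if __name__ == "__main__":
    y, h = 3 + 4j, 1 - 2j
    e, f = complex_mul_3(y.real, y.imag, h.real, h.imag)
    reference = h * y                    # four-multiplier reference
    print((e, f), (reference.real, reference.imag))
```

Trading one multiplier for three extra adders is attractive on an FPGA because multipliers map to scarce DSP elements, whereas adders map to general logic.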

5.2. Proposed Architecture Using Folding Method. Folding systematically determines the control circuits in DSP architectures where multiple algorithmic operations are time-multiplexed onto a single functional unit [12]. It is used for the synthesis of DSP architectures that can be operated at single or multiple clocks. It reduces the number of hardware functional units (FUs) by a factor of $N$ at the expense of increased computation time.

The folding architecture is introduced in the receiver structure of the RPB in SISO and SIMO configurations and of the RDB and PB in MISO and MIMO configurations, as shown in Figures 13 and 14, respectively. For the SISO RPB, there are 16 hardware lines to calculate the value of $r_1$, each requiring two multipliers. Hence, the number of multipliers used in one RPB is 32. In order to reduce the number of multipliers and adders, a folding architecture is proposed. This architecture uses only two multipliers and performs the operation of a single hardware line 16 times sequentially. The difference between the product of the channel frequency response vector with the precomputed data vector and the received signal vector is stored in registers. At any one time, one resultant signal pair is involved in the computation, using two multipliers to get the value of $z_i$. Four switches operating at the system clock speed are involved in the architecture: two switches pass the real part of the signal to one multiplier, while the other two pass the imaginary part of the signal to the other multiplier. The multipliers pass the products to the first adder for $z_i$. The output of the first adder is passed to the second adder with a delay to accumulate the values $z_0$ to $z_{15}$ into a register in subsequent clock cycles. This process requires 16 clock cycles, and the CFI is detected at the 17th clock cycle. Although more clock cycles are needed to obtain the output, the resources are minimized in this method.

The folded architecture of the decoding block of MISO and MIMO, involving complex multiplication of the channel frequency response vector and the received signal vector, is shown in Figure 14. There are 2 complex multiplications and one addition in each of the 16 hardware lines. Hence, the total resource elements used are 32 complex multiplications and 16 additions. The folded architecture, which reduces this to just 2 complex multiplications and one addition, requires five switches. Two switches pass the first element of the received signal vector and its corresponding channel frequency response to one multiplier, and the other two switches pass the second element of the received signal vector and its channel frequency response to the other multiplier. These four switches operate at the system clock speed. The multipliers pass their products to the adder through the fifth switch before moving to the PB. This process requires 16 clock cycles, and the CFI is detected at the 17th clock cycle.

Figure 16: Illustration of superscalar method for MISO and MIMO (with complex multiplications and operating from $T_1$ to $T_6$).

Figure 17: Simulation waveform for PCFICH receiver.
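The time-multiplexing idea behind the folded SISO RPB can be sketched behaviourally in Python. This is a software analogy under stated assumptions, not the RTL: the function name `folded_rpb` is hypothetical, the loop iteration stands in for one clock cycle, and the two products inside the loop stand in for the two physical multipliers that are reused on every cycle.

```python
def folded_rpb(y, h, s):
    """Behavioural model of a folded RPB: returns (r, cycles).

    y : received signal vector (16 complex samples)
    h : channel frequency response vector
    s : precomputed data vector for one CFI value
    """
    assert len(y) == len(h) == len(s)
    # Register bank: differences h[k]*s[k] - y[k], stored before folding starts.
    diff = [hk * sk - yk for yk, hk, sk in zip(y, h, s)]
    r = 0.0        # accumulator register
    cycles = 0
    for d in diff:                   # switches select line k in clock cycle k
        re_sq = d.real * d.real      # multiplier 1 (real part)
        im_sq = d.imag * d.imag      # multiplier 2 (imaginary part)
        z_k = re_sq + im_sq          # first adder
        r += z_k                     # second adder accumulates into the register
        cycles += 1
    return r, cycles


if __name__ == "__main__":
    # Illustrative data: every difference equals 1, so r is the vector length.
    y = [complex(k, -k) for k in range(16)]
    h = [1 + 0j] * 16
    s = [complex(k + 1, -k) for k in range(16)]
    print(folded_rpb(y, h, s))
```

The result matches what 16 parallel hardware lines would compute in one step; folding simply spreads the same arithmetic over 16 cycles with one sixteenth of the multipliers.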

5.3. Proposed Architecture Using Superscalar Method. The superscalar approach is another VLSI DSP technique with low resource utilization. The superscalar processing method combines parallel processing and pipelining strategies. In this case, parallel operation of the 16 pairs of hardware lines is arranged with pipelining of the subtraction and squared magnitude operations for each CFI. The SISO configuration does not have complex multiplications; it has only squared magnitude operations. Hence, the RPB of SISO has 16 hardware lines, each having 2 multipliers, which results in a total of 32 multipliers. This setup requires more hardware resources than folding, but the output is obtained at every 4th clock cycle, as shown in Figure 15. The SIMO configuration, which involves signal processing for two receive antennas, requires twice the number of multiplications of SISO, and the output is obtained at every 4th clock cycle. The block "d" represents the delay element introduced to buffer the values and produce the outputs at the same time instant.

For the MISO configuration, the RDB has 16 hardware lines with 2 complex multiplications each. Since each complex multiplication requires four real multiplications, the RDB can be executed in two clock cycles by reusing 64 multipliers. A further 32 multipliers are required for the PB, taking 4 clock cycles. Hence, 96 multipliers are required in the MISO configuration. For the MIMO configuration, the RDB requires reuse of 128 multipliers, taking 2 clock cycles, and an additional 32 multipliers are required for the PB, taking 4 clock cycles. Hence, 160 multipliers are required for the MIMO configuration, and the output is obtained at every 6th clock cycle, as shown in Figure 16. The block "d" represents the delay element introduced to buffer the values and produce the outputs at the same time instant.
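The multiplier counts quoted for the superscalar method follow from simple arithmetic, reproduced here as a short Python sketch. The function name `superscalar_multipliers` is hypothetical, and the figure of four complex multiplications per hardware line for MIMO is inferred from the four products per output in (17) rather than stated explicitly in the text.

```python
LINES = 16                      # hardware lines in the RDB
REAL_MULTS_PER_COMPLEX = 4      # direct complex multiplication
PB_MULTIPLIERS = 32             # multipliers in the processing block


def superscalar_multipliers(complex_mults_per_line, rdb_cycles=2):
    """Return (RDB multipliers, total multipliers) for the superscalar method.

    The RDB reuses its multipliers over `rdb_cycles` clock cycles, so the
    number of physical multipliers is the real-multiplication count divided
    by the number of cycles.
    """
    real_mults = LINES * complex_mults_per_line * REAL_MULTS_PER_COMPLEX
    rdb = real_mults // rdb_cycles
    return rdb, rdb + PB_MULTIPLIERS


if __name__ == "__main__":
    print("MISO:", superscalar_multipliers(2))   # 2 complex mults per line
    print("MIMO:", superscalar_multipliers(4))   # assumed 4 per line, from (17)
```

Under these assumptions the sketch returns 64 and 96 multipliers for MISO and 128 and 160 for MIMO, in line with the counts given above.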

6. Results and Discussion

The proposed receiver architectures for PCFICH in SISO, SIMO, MISO, and MIMO configurations are implemented using the Xilinx PlanAhead tool on the Virtex-6 FPGA xc6vlx240tff1156-1 device board. The target Virtex-6 device has only 768 DSP elements. Table 2 compares the proposed architectures using the folding and superscalar methods with the direct implementation of the PCFICH receiver in terms of resource utilisation, speed, and power for all the SISO, SIMO, MISO, and MIMO configurations. The proposed architectures based on the folding and superscalar processing methods require fewer resource elements.

Table 2: Performance of proposed architectures based on folding and superscalar method.

| Diversity | Method | Multipliers | Adders | DSP elements | LUTs | Total delay (ns) | Speed (MHz) | Dynamic power |
|---|---|---|---|---|---|---|---|---|
| SISO | Direct | 125 | 245 | 125 | 5479 | 3.9081 | 255.87 | - |
| SISO | Folding (16T) | 8 | 81 | 16 | 2561 | 11.4448 | 87.37 | 84 |
| SISO | Superscalar (4T) | 32 | 182 | 66 | 2731 | 6.9333 | 144.23 | 93 |
| SIMO | Direct | 250 | 494 | 250 | 10942 | 4.0278 | 248.27 | - |
| SIMO | Folding (16T) | 16 | 165 | 32 | 5117 | 13.0704 | 76.51 | 159 |
| SIMO | Superscalar (4T) | 64 | 318 | 130 | 5623 | 5.3873 | 185.62 | 170 |
| MISO | Direct | 224 | 580 | 594 | 14880 | 4.3023 | 232.43 | - |
| MISO | Folding (16T) | 14 | 101 | 39 | 3950 | 25.5264 | 39.17 | 173 |
| MISO | Superscalar (6T) | 96 | 338 | 196 | 6156 | 8.0495 | 124.23 | 208 |
| MIMO | Direct | 320 | 844 | 675 | 17380 | 5.6962 | 175.55 | - |
| MIMO | Folding (16T) | 20 | 155 | 46 | 4395 | 25.6528 | 38.98 | 374 |
| MIMO | Superscalar (6T) | 160 | 465 | 262 | 6932 | 8.5822 | 116.52 | 382 |

Figure 18: RTL schematic for combined PCFICH architecture with diversity. The blocks Div0, Div1, Div2, and Div3 correspond to SISO, SIMO, MISO, and MIMO.

Figure 19: Resource utilization graph for generalized architecture.

In the folding approach, resource utilization is lower than in the direct and superscalar approaches at the cost of reduced speed of operation, but it is still suitable for real-time frame timings. When the LTE-A system operates at 1.4 MHz bandwidth, the maximum time available for detection at each subcarrier is 992.063 ns, since each slot of 0.5 ms duration in a frame (10 ms radio frame duration) consists of 7 OFDM symbols and there are 72 subcarriers along one OFDM symbol. The total delay in the receiver architecture is within the LTE time constraint. The dynamic power consumption is lower in the folding method than in the superscalar method due to the decrease in block arithmetic. The direct method does not require sequential execution and clocking, and hence its total power consumption is due to static power. Hence, it is inferred that the proposed architecture based on the folding method is more suitable for CFI detection. The simulation waveform of the proposed architecture based on the folding method is shown in Figure 17 for SISO, SIMO, MISO, and MIMO configurations.
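The per-subcarrier time budget can be reproduced with a few lines of Python. The helper `fits_budget` and its 17-cycle default are assumptions made for illustration: it compares the number of clock cycles multiplied by the clock period against the budget, which is one plausible reading of the timing constraint, and the clock periods in the usage example are hypothetical values rather than measured results.

```python
SLOT_DURATION_S = 0.5e-3        # one slot
OFDM_SYMBOLS_PER_SLOT = 7
SUBCARRIERS = 72                # 1.4 MHz bandwidth


def per_subcarrier_budget_ns():
    """Time available per subcarrier within one slot, in nanoseconds."""
    resource_elements = OFDM_SYMBOLS_PER_SLOT * SUBCARRIERS
    return SLOT_DURATION_S / resource_elements * 1e9


def fits_budget(clock_period_ns, cycles=17):
    """True if `cycles` clock periods fit within the per-subcarrier budget."""
    return clock_period_ns * cycles <= per_subcarrier_budget_ns()


if __name__ == "__main__":
    print(round(per_subcarrier_budget_ns(), 3))
    print(fits_budget(25.0))    # hypothetical 25 ns clock period
    print(fits_budget(100.0))   # hypothetical 100 ns clock period
```

The budget is 0.5 ms divided by 7 × 72 = 504 resource elements, which gives the 992.063 ns figure quoted above.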

A general architecture based on the folding method, which operates in all four SISO, SIMO, MISO, and MIMO configurations, has also been developed. In this architecture, a control variable "e" is used to enable or disable the submodules SISO, SIMO, MISO, or MIMO according to the selection input "diversity." The CFI is detected at every 17th clock cycle. The synthesis results of the general architecture based on folding show that it utilizes minimum resources in the XC6VLX240TFF1156-1 Virtex-6 device (768 DSPs). This is summarized in Table 3. Dynamic power consumption is due to internal switching contributed by the clock (246 mW), logic (670 mW), and the block arithmetic (103 mW).

Table 3: Resource requirements of proposed architecture using folding method.

| Parameter | Multipliers | Adders | Minimum clock period (ns) | Total delay (ns) | Speed (MHz) | Total dynamic power (mW) |
|---|---|---|---|---|---|---|
| Value | 58 | 518 | 16.033 | 25.6528 | 38.98 | 1019 |

Figure 20: Implemented device in FPGA editor.

Figure 18 shows the RTL schematic of the 4 diversity blocks "div0," "div1," "div2," and "div3," corresponding to SISO, SIMO, MISO, and MIMO, controlled by wires named "e." The power consumed includes both static power and dynamic power due to internal switching.

Figure 19 shows the resource utilization graph, which gives the percentage of registers, lookup tables (LUTs), slices, DSP elements, and buffers used.

Figure 20 shows the implemented device in the FPGA editor, with the implemented components and the interconnections between the components configured into the FPGA device.

7. Conclusion

In this paper, low complexity, low resource CFI detection for single or multiple antennas at the receiver has been proposed, analyzed using ModelSim, and implemented in the Virtex-6 device using the Xilinx PlanAhead tool. In the receiver, computational complexity and resource utilization are minimized by employing arithmetic operational rearrangement and a suboptimal sequential DSP algorithm called the folding approach. The proposed architecture using the folding method complies with the LTE frame timing constraint in SISO, SIMO, MISO, and MIMO configurations. It is a suitable solution for the area-optimized hardware implementation of receiver structures for PCFICH. In future, a complete hardware design accommodating all the physical downlink control channels of 3GPP-LTE-A with low resource utilization could be synthesized and implemented.

Conflict of Interests

The authors do not have a direct financial relation with any commercial identity mentioned in the paper or any other conflict of interests.

Acknowledgments

The authors wish to express their sincere thanks to the All India Council for Technical Education, New Delhi, for the Grant to do the Project titled "Design of Testbed for the Development of Optimized Architectures of MIMO Signal Processing" (no. 8023RIDRPS0391112). They are also thankful to the Managements of Mepco Schlenk Engineering College, Sivakasi, and Thiagarajar College of Engineering, Madurai, for their constant support and encouragement to carry out this research work successfully.

References

[1] 3GPP TS 36.211 Version 11.0.0 Release 11 (2012-10), Evolved Universal Terrestrial Radio Access (E-UTRA); Physical Channels and Modulation, 2012.

[2] J. Ketonen and M. Juntti, "SIC and K-best LSD receiver implementation for a MIMO-OFDM system," in Proceedings of the 16th European Signal Processing Conference (EUSIPCO '08), Lausanne, Switzerland, August 2008.

[3] J. Ketonen, M. Juntti, and J. R. Cavallaro, "Performance-complexity comparison of receivers for a LTE MIMO-OFDM system," IEEE Transactions on Signal Processing, vol. 58, no. 6, pp. 3360-3372, 2010.

[4] X. Huang, C. Liang, and J. Ma, "System architecture and implementation of MIMO sphere decoders on FPGA," IEEE Transactions on Very Large Scale Integration Systems, vol. 16, no. 2, pp. 188-196, 2008.

[5] Y. Lin, Y. Chen, C. Chu, C. Zhan, and A. Wu, "Dual-mode low-complexity codebook searching algorithm and VLSI architecture for LTE/LTE-advanced systems," IEEE Transactions on Signal Processing, vol. 61, no. 14, pp. 3545-3562, 2013.

[6] D. Wu, J. Eilert, R. Asghar, and D. Liu, "VLSI implementation of a fixed-complexity soft-output MIMO detector for high-speed wireless," EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 893184, 13 pages, 2010.

[7] C. Mehlfuhrer, M. Wrulich, J. C. Ikuno, D. Bosanska, and M. Rupp, "Simulating the long term evolution physical layer," in Proceedings of the 17th European Signal Processing Conference (EUSIPCO '09), pp. 1471-1478, Glasgow, UK, August 2009.

[8] S. J. Thiruvengadam and L. M. A. Jalloul, "Performance analysis of the 3GPP-LTE physical control channels," EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 914934, 10 pages, 2010.

[9] S. M. Alamouti, "A simple transmit diversity technique for wireless communications," IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1451-1458, 1998.

[10] S. S. A. Abbas and S. J. Thiruvengadam, "FPGA implementation of 3GPP-LTE-A physical downlink control channel using diversity techniques," International Journal of Wireless and Mobile Computing, vol. 9, no. 2, p. 84, 2013.

[11] S. Ahmadi, LTE-Advanced: A Practical Systems Approach to Understanding 3GPP LTE Releases 10 and 11 Radio Access Technologies, Academic Press, 2013.

[12] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, Wiley-Interscience, 1999.


VLSI Design 9

2l + 0 2 4 6 8 10 12 14

2l + 0 2 4 6 8 10 12 14

2l + 0 2 4 6 8 10 12 14

2l + 0 2 4 6 8 10 12 14 2l + 1 3 5 7 9 11 13 15

2l + 1 3 5 7 9 11 13 15

2l + 1 3 5 7 9 11 13 15Folded unit

DRei

Imi

Rei

Rei

Rei+1

Rei+1

Imi

Imi+1

Imi Im+1

r1

Channel frequency response hi

Received signalvector y i

Precomputeddata si

minus

2l + 1 3 5 7 9 11 13 15

Figure 13 Illustration of folded architecture of RPB in SISO and SIMO

gets the 16 times 1 received signal vector y and computes thedecoder output vector using (10) assuming that the channelfrequency response vectors h(0) and h(1) are known Thedetailed internal architecture of RDBM is shown in Figure 11The decoder output vectors z

119894 119894 = 0 2 4 14 are stacked as

16 times 1 vector z = [1199111198790 119911119879

2 119911

119879

14]119879 The 16 times 1 precomputed

data vectors forCFI= 1234 are represented as s1 s2 s3and s4

respectivelyThe detailed structure of receiver decoding blocks (RDB)

is shown in Figure 8 The output vectors z(1) z(2) z(3) z(4)from RDB-1 to RDB-4 are fed to the processing blocks (PB-1 to PB-4) The detailed architecture of PB-1 is shown inFigure 9 The sum of the square magnitude of the elementsof difference vector between decoded output vector z and theprecomputed data vector s

1is the output 119903

1of PB-1 Similarly

1199032 1199033 and 119903

4are computed for CFI = 2 3 and 4 using PB-

2 PB-3 and PB-4 respectively The processing block outputs1199031 1199032 1199033 and 119903

4are applied to the CFI determination circuit

shown in Figure 4 to detect the CFI value

4 System Model and ImplementationArchitecture for MIMO Configuration

In MIMO system the signals at 119894th and (119894 + 1)th subcarrier inthe receive array are given by

[[[[[[

[

119910(0)

119894

119910(0)

119894+1

119910(1)lowast

119894

119910(1)lowast

119894+1

]]]]]]

]

=

[[[

[

ℎ00

ℎ01ℎ10 ℎ11ℎlowast

01 minusℎlowast

00ℎlowast

11 minusℎlowast

10

]]]

]

[119889119894

119889lowast

119894+1

] +

[[[[[[[[

[

119899(0)

119894

119899(0)lowast

119894+1

119899(1)

119894

119899(1)lowast

119894+1

]]]]]]]]

]

for 119894 = 0 2 4 6 8 10 12 14(13)

where ℎ119886119887

represents the channel frequency response vectorbetween 119887th transmit antenna and 119886th receive antenna and119899(119895)

119894represents the noise in 119894th subcarrier in 119895th receive

antenna In vector form it is written as

y119894= Heff119894d119894 + n

119894 for 119894 = 0 2 4 6 8 10 12 14 (14)

where y119894is 4times1 receive signal vectorHeff119894 is the 4times2 channel

frequency response vector at 119894th and (119894 + 1)th subcarrier d119894

is 2 times 1 data vector at 119894th and (119894 + 1)th subcarrier and n119894is

4 times 1 noise vector The objective is to detect the elements 119889119894

and 119889119894+1

of the data vector d119894 Assuming that the elements of

channel frequency response matrixHeff119894 are perfectly knownat the receiver the decoder output vector z is given by

z119894= H119867eff119894 y119894 for 119894 = 0 2 4 6 8 10 12 14 (15)

whereH119867eff119894 is theHermitian of the 4times2 channel transmissionmatrix This can be expanded as

[119911119894

119911lowast

+1

] = [ℎlowast

00minusℎlowast

10

ℎlowast

01minusℎlowast

11

ℎ01

ℎ11

minusℎ00minusℎ10

]

[[[[[

[

119910(0)

119894

119910(0)

119894+1

119910(1)lowast

119894

119910(1)lowast

119894+1

]]]]]

]

for 119894 = 0 2 4 6 8 10 12 14

(16)

The decoder outputs are given by

119911119894= ℎlowast

00119910(0)

119894minus ℎlowast

10119910(0)

119894+1+ ℎ01119910(1)lowast

119894+ ℎ11119910(1)lowast

119894+1

for 119894 = 0 2 4 6 8 10 12 14

119911lowast

119894+1= ℎlowast

01119910(0)

119894minus ℎlowast

11119910(0)

119894+1minusℎ00119910(1)lowast

119894minusℎ10119910(1)lowast

119894+1

for 119894 = 0 2 4 6 8 10 12 14

(17)

Figure 14: Illustration of proposed architecture for RDB and PB in MISO and MIMO (folded RDB unit and folded PB unit). Note: the receiver decoding block (RDB) in MISO is termed RDBM in MIMO.

The PCFICH receiver architecture of the $2 \times 2$ MIMO configuration is shown in Figure 10.

The receiver decoding block (RDBM) takes the $16 \times 1$ received signal vector $\mathbf{y}$ and computes the decoder output vector using (14), assuming that the channel frequency response vectors $\mathbf{h}^{(00)}$, $\mathbf{h}^{(01)}$, $\mathbf{h}^{(10)}$, and $\mathbf{h}^{(11)}$ are known. The $16 \times 1$ precomputed data vectors for CFI = 1, 2, 3, and 4 are represented as $\mathbf{s}_1^{(0)}$, $\mathbf{s}_2^{(0)}$, $\mathbf{s}_3^{(0)}$, and $\mathbf{s}_4^{(0)}$, respectively, for antenna 0 and as $\mathbf{s}_1^{(1)}$, $\mathbf{s}_2^{(1)}$, $\mathbf{s}_3^{(1)}$, and $\mathbf{s}_4^{(1)}$, respectively, for antenna 1. The received signal vectors $\mathbf{y}_i^{(0)}$ and $\mathbf{y}_i^{(1)}$ are multiplied with the four channel estimation vectors to give the decoded output vector $\mathbf{z}$, which is sent to the processing block (PB) shown in Figure 9. The decoder outputs $\mathbf{z}_i$, $i = 0, 2, 4, \ldots, 14$, are stacked as the $16 \times 1$ vector $\mathbf{z} = [\mathbf{z}_0^T, \mathbf{z}_2^T, \ldots, \mathbf{z}_{14}^T]^T$. Similarly, RDBM1 gives the output vector $\mathbf{z}^{(1)}$ using the precomputed data vectors $\mathbf{s}_1^{(0)}$ and $\mathbf{s}_1^{(1)}$ and the channel estimation vectors. The architecture of the PBs and the CFI detection architecture are similar to those of the MISO system. The sum of the squared magnitudes of the differences between each element of the decoded output vector $\mathbf{z}$ and its precomputed counterpart in the vector $\mathbf{z}^{(1)}$ is the output $r_1$ of PB1. Similarly, $r_2$, $r_3$, and $r_4$ are computed for the other CFI values. The values $r_1$, $r_2$, $r_3$, and $r_4$ are compared by the CFI detector shown in Figure 4 to determine the minimum.

Figure 15: Illustration of superscalar method for SISO and SIMO (with no complex multiplications and operating from $T_1$ to $T_4$).
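The minimum-distance decision performed by the processing blocks and the CFI detector can be sketched in a few lines. This is an illustrative software model only, assuming that the decoded vector and the four precomputed vectors are available as equal-length lists of complex numbers; the names `detect_cfi`, `z`, and `z_ref` are hypothetical.

```python
def detect_cfi(z, z_ref):
    """Return (cfi, metrics) for a decoded vector z.

    z_ref[k] is the precomputed decoded vector for CFI = k + 1.
    metrics[k] is r_{k+1} = sum_i |z[i] - z_ref[k][i]|^2, and the detected
    CFI is the index of the smallest metric, counted from 1.
    """
    metrics = [sum(abs(a - b) ** 2 for a, b in zip(z, ref)) for ref in z_ref]
    best = min(range(len(metrics)), key=metrics.__getitem__)
    return best + 1, metrics
```

For example, with toy reference vectors whose entries are the constants 0, 1, 2, and 3, a decoded vector with every entry equal to 2.1 is closest to the third reference, so the sketch returns CFI = 3.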

5. PCFICH Receiver Implementation Methods

The PCFICH receiver architectures can be implemented directly based on the basic architectures developed in Sections 3 and 4. However, in order to utilize the FPGA resources effectively, the basic architectures are implemented using modified novel architectures based on two VLSI DSP techniques, namely, folding and superscalar processing.

5.1. Direct Implementation with Multiplicands Rearranged Method. In the receiver architecture for SISO and SIMO, the $16 \times 1$ received signal vector is directly subtracted from the precomputed data vector for a given CFI. This requires fewer multipliers and adders than MISO and MIMO. In MISO and MIMO configurations, complex multiplications are necessary for the multiplication of $\mathbf{H}_{\mathrm{eff}}^{H}$ with the received signal vector, which increases the number of multiplications in the CFI detection process. Hence, the terms are rearranged to minimize the number of multiplications. Further, the intermediate products are reused in the calculation of the real and imaginary parts. Consider the multiplication of two complex numbers $\mathrm{Re}\{h\} + j\,\mathrm{Im}\{h\}$ and $\mathrm{Re}\{y\} + j\,\mathrm{Im}\{y\}$. The real part ($e$) and imaginary part ($f$) of the product are given by

$$e = \mathrm{Re}\{h\}\,\mathrm{Re}\{y\} - \mathrm{Im}\{h\}\,\mathrm{Im}\{y\}, \qquad f = \mathrm{Re}\{h\}\,\mathrm{Im}\{y\} + \mathrm{Im}\{h\}\,\mathrm{Re}\{y\}. \tag{18}$$

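This requires four multiplications and two additions. To reduce the number of multiplications, the terms in (18) are rearranged as

$$\begin{aligned} e &= [\mathrm{Re}\{y\} - \mathrm{Im}\{y\}]\,[\mathrm{Re}\{h\} + \mathrm{Im}\{h\}] - \mathrm{Re}\{y\}\,\mathrm{Im}\{h\} + \mathrm{Im}\{y\}\,\mathrm{Re}\{h\}, \\ f &= \mathrm{Re}\{y\}\,\mathrm{Im}\{h\} + \mathrm{Im}\{y\}\,\mathrm{Re}\{h\}. \end{aligned} \tag{19}$$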

Since the products $\mathrm{Re}\{y\}\,\mathrm{Im}\{h\}$ and $\mathrm{Im}\{y\}\,\mathrm{Re}\{h\}$ appear in both $e$ and $f$ in (19), each needs to be computed only once, so (19) requires only three multiplications but five additions. This rearrangement of the multiplicands is employed in the processing blocks, at the cost of increased additions, as shown in Figure 12.
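The three-multiplier form can be verified with a short sketch. The function name and argument order below are hypothetical. The identity reproduces (18) when the bracketed factors are $(\mathrm{Re}\{y\} - \mathrm{Im}\{y\})$ and $(\mathrm{Re}\{h\} + \mathrm{Im}\{h\})$, since the cross terms of their product cancel against the two shared products.

```python
def complex_mul_3(re_h, im_h, re_y, im_y):
    """Product (re_h + j*im_h) * (re_y + j*im_y) using three real multiplications.

    p2 and p3 are shared by the real and imaginary outputs. Five additions or
    subtractions are used in total: two to form the factors of p1, two for e,
    and one for f.
    """
    p1 = (re_y - im_y) * (re_h + im_h)  # multiplication 1
    p2 = re_y * im_h                    # multiplication 2
    p3 = im_y * re_h                    # multiplication 3
    e = p1 - p2 + p3                    # real part: re_h*re_y - im_h*im_y
    f = p2 + p3                         # imaginary part: re_y*im_h + im_y*re_h
    return e, f
```

For $h = 3 + 4j$ and $y = 1 + 2j$, the three products are $-7$, $4$, and $6$, giving $e = -5$ and $f = 10$, which matches the direct product $-5 + 10j$.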

5.2. Proposed Architecture Using Folding Method. Folding systematically determines the control circuits in DSP architectures where multiple algorithmic operations are time-multiplexed onto a single functional unit [12]. It is used for the synthesis of DSP architectures that can be operated at single or multiple clocks. It reduces the number of hardware functional units (FUs) by a factor of $N$ at the expense of increased computation time.

The folding architecture is introduced in the receiver structure of the RPB in SISO and SIMO configurations and of the RDB and PB in MISO and MIMO configurations, as shown in Figures 13 and 14, respectively. For the SISO RPB, there are 16 hardware lines to calculate the value of $r_1$, each requiring two multipliers. Hence, the number of multipliers used in one RPB is 32. In order to reduce the number of multipliers and adders, the folding architecture is proposed. This architecture uses only two multipliers and performs the operation of a single hardware line 16 times in sequence. The difference between the product of the channel frequency response vector with the precomputed data vector and the received signal vector is stored in registers. At a time, one resultant signal pair is involved in the computation, using two multipliers to get the value of $z_i$. Four switches operating at system clock speed are involved in the architecture: two switches pass the real part of the signal to one multiplier, while the other two switches pass the imaginary part of the signal to another multiplier. The multipliers pass the products to the first adder for $z_i$. The output of the first adder is passed to the second adder with a delay to accumulate the values $z_0$ to $z_{15}$ into a register in subsequent clock cycles. This process requires 16 clock cycles, and the CFI is detected at the 17th clock cycle. Although more clock cycles are needed to obtain the output, the resources are minimized in this method.
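The time-multiplexing of one hardware line can be modelled behaviourally. The sketch below is a software analogy under stated assumptions, not the RTL: each loop iteration stands for one clock cycle, the `precomputed` input is assumed to already include the channel scaling, and the names are hypothetical.

```python
def folded_rpb(received, precomputed):
    """Accumulate r = sum_i |received[i] - precomputed[i]|^2, one element per cycle.

    Models a single folded hardware line: two real multipliers (one squares the
    real part, the other the imaginary part), one adder that sums the two
    squares, and one accumulating adder. Returns (r, cycles_used).
    """
    acc = 0.0
    cycles = 0
    for y, s in zip(received, precomputed):
        diff = y - s
        sq_re = diff.real * diff.real  # multiplier 1
        sq_im = diff.imag * diff.imag  # multiplier 2
        acc += sq_re + sq_im           # first adder, then accumulator
        cycles += 1                    # one folded operation per clock cycle
    return acc, cycles
```

With 16 input elements the loop runs 16 times, so the accumulated metric is available for comparison in the following cycle, in line with the 17th-cycle detection described above. The trade-off is the usual one for folding: 2 multipliers instead of 32, at 16 times the latency.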

The folded architecture of the decoding block for MISO and MIMO, involving complex multiplication of the channel frequency response vector and the received signal vector, is shown in Figure 14. There are 2 complex multiplications and one addition in each of the 16 hardware lines. Hence, the total resources used are 32 complex multiplications and 16 additions. The folded architecture, which reduces this to just 2 complex multiplications and one addition, requires five switches. Two switches are used to pass the first element of the received signal vector and its corresponding channel frequency response vector to one multiplier, and the other two switches are used to pass the second element of the received signal vector and its channel frequency response vector to another multiplier. These four switches operate at system clock speed. The multipliers pass their products to the adder through the fifth switch before moving to the PB. This process requires 16 clock cycles, and the CFI is detected at the 17th clock cycle.

Figure 16: Illustration of superscalar method for MISO and MIMO (with complex multiplications and operating from $T_1$ to $T_6$).

Figure 17: Simulation waveform for PCFICH receiver.

5.3. Proposed Architecture Using Superscalar Method. The superscalar approach is another low-resource VLSI DSP technique. It combines parallel processing and pipelining strategies. In this case, the 16 pairs of hardware lines operate in parallel, and the subtraction and squared-magnitude operations for each CFI are pipelined. The SISO configuration has no complex multiplications, only squared-magnitude operations. Hence, the RPB of SISO has 16 hardware lines, each having 2 multipliers, which results in a total of 32 multipliers. This setup requires more hardware resources than folding, but the output is obtained at every 4th clock cycle, as shown in Figure 15. The SIMO configuration, which involves signal processing for two receive antennas, requires twice the number of multiplications as SISO, and the output is also obtained at every 4th clock cycle. The block “d” represents the delay element introduced to buffer the values and produce the outputs at the same time instant.

For the MISO configuration, the RDB has 16 hardware lines with 2 complex multiplications each. Since each complex multiplication requires four real multiplications, the RDB can be executed in two clock cycles by reusing 64 multipliers. A further 32 multipliers are required for the PB, which takes 4 clock cycles. Hence, 96 multipliers are required in the MISO configuration. For the MIMO configuration, the RDB requires reuse of 128 multipliers, taking 2 clock cycles, and an additional 32 multipliers are required for the PB, taking 4 clock cycles. Hence, 160 multipliers are required for the MIMO configuration. In both cases, the output is obtained at every 6th clock cycle, as shown in Figure 16. The block “d” again represents the delay element introduced to buffer the values and produce the outputs at the same time instant.
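The multiplier counts quoted for the superscalar method follow from simple arithmetic, sketched below. The constant and function names are hypothetical. The figure of 4 complex multiplications per hardware line for MIMO is an assumption inferred from the four products per output in (17); the text states only the resulting 128 reused multipliers.

```python
HARDWARE_LINES = 16          # hardware lines in the RDB
REAL_MULTS_PER_COMPLEX = 4   # direct form of a complex product, as in (18)
PB_MULTIPLIERS = 32          # 16 lines x 2 multipliers in the processing block


def superscalar_multipliers(complex_mults_per_line, rdb_cycles=2):
    """Return (rdb_multipliers, total_multipliers) for the superscalar method.

    The RDB multipliers are reused over rdb_cycles clock cycles, so only
    1/rdb_cycles of the real multiplications need physical multipliers.
    """
    total_real = HARDWARE_LINES * complex_mults_per_line * REAL_MULTS_PER_COMPLEX
    rdb = total_real // rdb_cycles
    return rdb, rdb + PB_MULTIPLIERS


miso = superscalar_multipliers(2)  # 2 complex multiplications per line
mimo = superscalar_multipliers(4)  # assumed 4 complex multiplications per line
```

Under these assumptions, `miso` evaluates to (64, 96) and `mimo` to (128, 160), matching the totals stated for the two configurations.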

6. Results and Discussion

The proposed receiver architectures for PCFICH in SISO, SIMO, MISO, and MIMO configurations are implemented using the Xilinx PlanAhead tool on the Virtex-6 xc6vlx240tff1156-1 FPGA device. The target Virtex-6 device has only 768 DSP elements. Table 2 compares the proposed architectures using the folding and superscalar methods with the direct implementation of the PCFICH receiver in terms of resource utilization, speed, and power for all the SISO, SIMO, MISO, and MIMO configurations. The proposed architectures based on the folding and superscalar processing methods require fewer resource elements than the direct implementation.

Figure 18: RTL schematic for combined PCFICH architecture with diversity.

Table 2: Performance of proposed architectures based on folding and superscalar method.

| Diversity | Method | Multipliers | Adders | DSP elements | LUTs | Total delay (ns) | Speed (MHz) | Dynamic power |
|---|---|---|---|---|---|---|---|---|
| SISO | Direct | 125 | 245 | 125 | 5479 | 39.081 | 25.587 | - |
| SISO | Folding (16T) | 8 | 81 | 16 | 2561 | 114.448 | 8.737 | 84 |
| SISO | Superscalar (4T) | 32 | 182 | 66 | 2731 | 69.333 | 14.423 | 93 |
| SIMO | Direct | 250 | 494 | 250 | 10942 | 40.278 | 24.827 | - |
| SIMO | Folding (16T) | 16 | 165 | 32 | 5117 | 130.704 | 7.651 | 159 |
| SIMO | Superscalar (4T) | 64 | 318 | 130 | 5623 | 53.873 | 18.562 | 170 |
| MISO | Direct | 224 | 580 | 594 | 14880 | 43.023 | 23.243 | - |
| MISO | Folding (16T) | 14 | 101 | 39 | 3950 | 255.264 | 3.917 | 173 |
| MISO | Superscalar (6T) | 96 | 338 | 196 | 6156 | 80.495 | 12.423 | 208 |
| MIMO | Direct | 320 | 844 | 675 | 17380 | 56.962 | 17.555 | - |
| MIMO | Folding (16T) | 20 | 155 | 46 | 4395 | 256.528 | 3.898 | 374 |
| MIMO | Superscalar (6T) | 160 | 465 | 262 | 6932 | 85.822 | 11.652 | 382 |

Figure 19: Resource utilization graph for generalized architecture.

In the folding approach, resource utilization is lower than in the direct and superscalar approaches, at the cost of reduced speed of operation, but it remains suitable for real-time frame timings. When the LTE-A system operates at 1.4 MHz bandwidth, the maximum time available for detection at each subcarrier is 992.063 ns, since each slot of 0.5 ms duration in a frame (10 ms radio frame duration) consists of 7 OFDM symbols and there are 72 subcarriers along one OFDM symbol. The total delay in the receiver architecture is within this LTE time constraint. The dynamic power consumption is lower in the folding method than in the superscalar method due to the decrease in block arithmetic. The direct method does not require sequential execution and clocking, and hence its total power consumption is due to static power. Hence, it is inferred that the proposed architecture based on the folding method is more suitable for CFI detection. The simulation waveform of the proposed architecture based on the folding method is shown in Figure 17 for SISO, SIMO, MISO, and MIMO configurations.
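The per-subcarrier time budget and the comparison with the folding delays can be reproduced as follows. This is a minimal sketch: the variable names are hypothetical, and the delay values are the Table 2 entries for the folding method read as nanoseconds with three decimal places, a reading that is consistent with the listed speeds being the reciprocals of the delays.

```python
SLOT_DURATION_S = 0.5e-3   # duration of one slot
OFDM_SYMBOLS_PER_SLOT = 7  # OFDM symbols in one slot
SUBCARRIERS = 72           # subcarriers per OFDM symbol at 1.4 MHz bandwidth

# Time available per subcarrier, in nanoseconds.
budget_ns = SLOT_DURATION_S / OFDM_SYMBOLS_PER_SLOT / SUBCARRIERS * 1e9

# Total delay (ns) of the folding architecture for each configuration.
folding_delay_ns = {
    "SISO": 114.448,
    "SIMO": 130.704,
    "MISO": 255.264,
    "MIMO": 256.528,
}

# True where the folding architecture meets the per-subcarrier budget.
within_budget = {cfg: delay < budget_ns for cfg, delay in folding_delay_ns.items()}
```

The budget evaluates to about 992.06 ns, and every entry of `within_budget` is true, so even the slowest folded configuration uses roughly a quarter of the available time.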

A general architecture based on the folding method, which operates in all four SISO, SIMO, MISO, and MIMO configurations, has also been developed. In this architecture, a control variable “e” is used to enable or disable the submodules SISO, SIMO, MISO, or MIMO according to the selection input “diversity”. The CFI is detected at every 17th clock cycle. The synthesis results of the general architecture based on folding show that it utilizes minimum resources in the XC6VLX240TFF1156-1 Virtex-6 device (768 DSPs). This is summarized in Table 3. Dynamic power consumption is due to internal switching, contributed by the clock (246 mW), logic (670 mW), and the block arithmetic (103 mW).

Table 3: Resource requirements of proposed architecture using folding method.

| Parameter | Multipliers | Adders | Minimum clock period (ns) | Total delay (ns) | Speed (MHz) | Total dynamic power (mW) |
|---|---|---|---|---|---|---|
| Value | 58 | 518 | 16.033 | 256.528 | 3.898 | 1019 |

Figure 20: Implemented device in FPGA editor.

Figure 18 shows the RTL schematic of the 4 diversity blocks “div0”, “div1”, “div2”, and “div3”, corresponding to SISO, SIMO, MISO, and MIMO, controlled by wires named “e”. The power consumed includes both static power and dynamic power due to internal switching.
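The selection of one submodule by the enable wires can be illustrated with a small behavioural sketch. The one-hot encoding of “e” and the numbering of the “diversity” input (0 for SISO up to 3 for MIMO) are assumptions made for illustration; only the correspondence of div0 to div3 with SISO, SIMO, MISO, and MIMO is taken from the description of Figure 18.

```python
CONFIGS = ("SISO", "SIMO", "MISO", "MIMO")  # div0, div1, div2, div3


def enable_lines(diversity):
    """Map the 2-bit selection input to a 4-bit enable word e (assumed one-hot)."""
    if not 0 <= diversity < len(CONFIGS):
        raise ValueError("diversity must be a 2-bit value (0..3)")
    return [int(k == diversity) for k in range(len(CONFIGS))]


def active_submodule(diversity):
    """Name of the single submodule enabled for the given selection input."""
    e = enable_lines(diversity)
    return CONFIGS[e.index(1)]
```

For instance, a selection input of 2 enables only the third block, so `active_submodule(2)` returns "MISO" while the other three submodules stay disabled.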

Figure 19 shows the resource utilization graph, which gives the percentage of registers, lookup tables (LUTs), slices, DSP elements, and buffers used.

Figure 20 shows the implemented device in the FPGA editor, with the implemented components and the interconnections between the components configured into the FPGA device.

7. Conclusion

In this paper, a low-complexity, low-resource CFI detection receiver for single- and multi-antenna configurations has been proposed, analyzed using ModelSim, and implemented on the Virtex-6 device using the Xilinx PlanAhead tool. In the receiver, the computational complexity and the resource utilization are minimized by rearranging the arithmetic operations and by employing a suboptimal sequential DSP algorithm called the folding approach. The proposed architecture using the folding method complies with the LTE frame timing constraint in SISO, SIMO, MISO, and MIMO configurations. It is a suitable solution for the area-optimized hardware implementation of receiver structures for PCFICH. In future, hardware accommodating all the physical downlink control channels of 3GPP LTE-A with low resource utilization could be synthesized and implemented.

Conflict of Interests

The authors do not have a direct financial relation with any commercial identity mentioned in the paper or any other conflict of interests.

Acknowledgments

The authors wish to express their sincere thanks to the All India Council for Technical Education, New Delhi, for the grant to do the project titled “Design of Testbed for the Development of Optimized Architectures of MIMO Signal Processing” (no. 8023RIDRPS0391112). They are also thankful to the managements of Mepco Schlenk Engineering College, Sivakasi, and Thiagarajar College of Engineering, Madurai, for their constant support and encouragement to carry out this research work successfully.

References

[1] 3GPP TS 36.211, Version 11.0.0, Release 11 (2012-10), Evolved Universal Terrestrial Radio Access (E-UTRA); Physical Channels and Modulation, 2012.

[2] J. Ketonen and M. Juntti, “SIC and K-best LSD receiver implementation for a MIMO-OFDM system,” in Proceedings of the 16th European Signal Processing Conference (EUSIPCO ’08), Lausanne, Switzerland, August 2008.

[3] J. Ketonen, M. Juntti, and J. R. Cavallaro, “Performance-complexity comparison of receivers for a LTE MIMO-OFDM system,” IEEE Transactions on Signal Processing, vol. 58, no. 6, pp. 3360–3372, 2010.

[4] X. Huang, C. Liang, and J. Ma, “System architecture and implementation of MIMO sphere decoders on FPGA,” IEEE Transactions on Very Large Scale Integration Systems, vol. 16, no. 2, pp. 188–196, 2008.

[5] Y. Lin, Y. Chen, C. Chu, C. Zhan, and A. Wu, “Dual-mode low-complexity codebook searching algorithm and VLSI architecture for LTE/LTE-advanced systems,” IEEE Transactions on Signal Processing, vol. 61, no. 14, pp. 3545–3562, 2013.

[6] D. Wu, J. Eilert, R. Asghar, and D. Liu, “VLSI implementation of a fixed-complexity soft-output MIMO detector for high-speed wireless,” EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 893184, 13 pages, 2010.

[7] C. Mehlfuhrer, M. Wrulich, J. C. Ikuno, D. Bosanska, and M. Rupp, “Simulating the long term evolution physical layer,” in Proceedings of the 17th European Signal Processing Conference (EUSIPCO ’09), pp. 1471–1478, Glasgow, UK, August 2009.

[8] S. J. Thiruvengadam and L. M. A. Jalloul, “Performance analysis of the 3GPP-LTE physical control channels,” EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 914934, 10 pages, 2010.

[9] S. M. Alamouti, “A simple transmit diversity technique for wireless communications,” IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1451–1458, 1998.

[10] S. S. A. Abbas and S. J. Thiruvengadam, “FPGA implementation of 3GPP-LTE-A physical downlink control channel using diversity techniques,” International Journal of Wireless and Mobile Computing, vol. 9, no. 2, p. 84, 2013.

[11] S. Ahmadi, LTE-Advanced: A Practical Systems Approach to Understanding 3GPP LTE Releases 10 and 11 Radio Access Technologies, Academic Press, 2013.

[12] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, Wiley-Interscience, 1999.


Page 10: Research Article Novel Receiver Architecture for LTE …downloads.hindawi.com/archive/2014/825183.pdfResearch Article Novel Receiver Architecture for LTE-A Downlink Physical Control

10 VLSI Design

Channel frequencyresponse vector

Channel frequencyresponse vector

Received signalvector

Received signalvector

Precomputeddata vector z(k)i

from RDB-1using s1i

y i

y i

y i+1

y i+1

hi+1 hi+1

hihi

zi

Folded RDB unit

Folded PB unit

Conjugate

DRei

Imi

Rei

Rei

Rei+1

Rei+1

Imi

Imi

Imi+1

Im+1

minus

r1

2l + 1 3 5 7 9 11 13 15

2l + 1 3 5 7 9 11 13 15

2l + 1 3 5 7 9 11 13 15

2l + 1 3 5 7 9 11 13 15

2l + 0 2 4 6 8 10 12 14

2l + 0 2 4 6 8 10 12 14

2l + 0 2 4 6 8 10 12 14

2l + 0 2 4 6 8 10 12 14

2l + 1 3 5 15

2l + 1 3 5 15

2l + 1 3 5 15

2l + 1 3 5 15

2l + 1 3 5 15

2l + 0 2 4 14

2l + 0 2 4 14

2l + 0 2 4 14

2l + 0 2 4 14

2l + 0 2 4 14

Figure 14 Illustration of proposed architecture for RDB and PB in MISO and MIMO Note receiver decoding block (RDB) in MISO istermed as RDBM in MIMO

The PCFICH receiver architecture of 2 times 2MIMO configura-tions is shown in Figure 10

Receiver decoding block (RDBM) gets the 16times1 receivedsignal vector y and computes the decoder output vector using(14) assuming that the channel frequency response vectorsh(00) h(01) h(10) and h(11) are knownThe 16times1 precomputeddata vectors for CFI = 1 2 3 and 4 are represented as s(0)

1

s(0)2 s(0)3 and s(0)

4 respectively for antenna 0 and as s(1)

1

s(1)2 s(1)3 and s(1)

4 respectively for antenna 1 The received

signal vectors y(0)119894

and y(1)119894multiply with the four channel

estimation vectors to give decoded output vector z that is sentto the processing block (PB) which is shown in Figure 9 Thedecoder outputs 119911

119894 119894 = 0 2 4 14 are stacked as 16 times 1

vector z = [1199111198790 119911119879

2 119911

119879

14]119879 Similarly RDBM1 gives output

vector z(1) using the precomputed data vectors y(0)1

and y(1)1

and channel estimation vectors The architecture of PBs and

VLSI Design 11

clk

T1 T2 T3 T4

RPB

RPB

RPB

RPB

d

d

dd

d

d

r4

r3

r2

r1

Figure 15 Illustration of superscalar method for SISO and SIMO(with no complex multiplications and operating from 119879

1to 1198794)

the CFI detection architecture are similar to that of theMISOsystem The sum of the squared magnitude of the differencebetween each element in the decoded output vector z and itsprecomputed data in the vector z(1) is the output 119903

1of PB1

Similarly 1199032 1199033 and 119903

4are computed for other CFIThe 119903

1 1199032

1199033 and 119903

4are compared to determine the minimum value by

the CFI detector shown in Figure 4

5 PCFICH Receiver Implementation Methods

The PCFICH receiver architectures can be implementeddirectly based on the basic architectures developed inSections 3 and 4 But in order to effectively utilize theresources in FPGA the implementation of basic architecturesis done using the modified novel architectures based on VLSIDSP techniques namely folding and superscalar processingapproach

51 Direct Implementation with Multiplicands RearrangedMethod In the receiver architecture for SISO and SIMOthe 16 times 1 received signal vector is directly subtracted fromthe precomputed data vector for a given CFI This requireslesser number of multipliers and adders when comparedto MISO and MIMO In MISO and MIMO configurationscomplex multiplications are necessary for the multiplicationof H119867119890119891119891

with the received signal vector It increases thenumber of multiplications in the CFI detection processHence optimum rearrangement of the terms is carried outto minimize the number of multiplications Further theintermediate products are reused in the calculation of real andimaginary parts Consider the multiplication of two complexnumbers Reℎ+119895 Imℎ and Re119910+119895 Im119910The output realpart (119890) and imaginary part (119891) terms are given by

119890 = Re ℎRe 119910 minus Im ℎ Im 119910 119891 = Re ℎ Im 119910 + Im ℎRe 119910

(18)

It requires four multiplications and two additions To reducethe number of multiplications the terms in (18) are rear-ranged as

119890 = [Re 119910 minus Im 119910] [Re ℎ minus Im ℎ]

minus Re 119910 Im ℎ + Im 119910Re ℎ

119891 = Re 119910 Im ℎ + Im 119910Re ℎ

(19)

Since the terms Re119910 Imℎ and Im119910Reℎ are in (19) itrequires only three multiplications but five additions Thiskind of rearrangement of themultiplicands is employed in theprocessing blocks at the cost of increased additions as shownin Figure 12

52 Proposed Architecture Using Folding Method Foldingarchitecture systematically determines the control circuits inDSP architectures where multiple algorithmic operations aretime-multiplexed to a single functional unit [12] It is usedfor synthesis of DSP architectures that can be operated atsingle or multiple clocks It reduces the number of hardwarefunctional units (FUs) by a factor of 119873 at the expense ofincreased computation time

The folding architecture is introduced in the receiverstructure of RPB in SISO and SIMO configurations and ofRPB and PB in MISO and MIMO configurations as shownin Figures 13 and 14 respectively For SISO RPB there are16 hardware lines to calculate the value of 119903

1each requiring

twomultipliers Hence the number of multipliers used in oneRPB is 32 In order to reduce the number of multipliers andadders folding architecture is proposed This architectureuses only two multipliers and performs the operation of asingle hardware line 16 times in sequential wayThe differencebetween the product of channel frequency response vectorwith the precomputed data vector and the received signalvector is stored in registers At a time one resultant signalpair involves in computation using two multipliers to get thevalue of 119911

119894 Four switches operating in system clock speed

are involved in the architecture where two switches are usedto pass the real part of the signal to one multiplier whilethe other two switches are used to pass the imaginary partof the signal to another multiplier The multipliers pass theproducts to the first adder for 119911

119894 The output of the first adder

is passed to the second adder with a delay to accumulate thevalues 119911

0to 11991115into a register in subsequent clock cyclesThis

process requires 16 clock cycles and the CFI is detected at the17th clock cycle Though it takes longer time for the clockcycles to get the output the resources are minimized in thismethod

The folded architecture of decoding block of MISO andMIMO involving complex multiplication of the channelfrequency response vector and the receive signal vector isshown in Figure 14 There are 2 complex multiplications andone addition in each of the 16 hardware lines Hence totalresource elements used are 32 complex multiplications and16 additions The folded architecture which reduces to just2 complex multiplications and one addition requires fiveswitches Two switches are used to pass the first elementof the receive signal vector and its corresponding channel

12 VLSI Design

clk

T1 T2 T3 T4 T5 T6

PB

PB

PB

PB

d

d

dd

d

d

dr4

r3

r2

r1

From other hardware lines

From other hardware lines

Complexmultiplications(real part) of a

hardware line from

Complex

multiplications

(imaginary part) of

a hardware line from RDB

RDB

Figure 16 Illustration of superscalar method for MISO and MIMO (with complex multiplications and operating from 1198791to 1198796)

Figure 17 Simulation waveform for PCFICH receiver

frequency response vector to one multiplier and other twoswitches are used to pass the second element of receive signalvector and its channel frequency response vector to anothermultiplierThese four switches operate in system clock speedThe multipliers pass their products to the adder through thefifth switch before moving to PB This process requires 16clock cycles and the CFI is detected at the 17th clock cycle

53 Proposed Architecture Using Superscalar Method Super-scalar approach is another low resource utilizing VLSI DSPtechnique The superscalar processing method includes par-allel processing and pipelining strategies In this case paralleloperation for the 16 pairs of hardware lines is arrangedwith pipelining of the subtraction and square magnitudeoperations for each CFI SISO configuration does not havecomplex multiplications and it has only square magnitudeoperations Hence the RPB of SISOhas 16 hardware lines eachhaving 2 multipliers which results to a total of 32 multipliers

This setup requires more hardware resources than foldingbut the output is obtained at every 4th clock cycle as shownin Figure 15 SIMO configuration which involves two receiveantenna signal processing requires twice the number ofmultiplications as that of SISO and the output is obtainedat every 4th clock cycle The block ldquo119889rdquo represents the delayelement introduced to buffer the values and produce theoutputs at the same time instant

For MISO configuration the RDB has 16 hardware lineswith 2 complex multiplications each Since each complexmultiplication requires four real multiplications RDB can beexecuted in two clock cycles by reusing 64 multipliers 32multipliers are required for PB taking 4 clock cyclesHence 96multipliers are required in MISO configuration For MIMOconfiguration the RDB requires reuse of 128 multiplierstaking 2 clock cycles and an additional 32 multipliers arerequired for the PB taking 4 clock cycles Hence 160 multi-pliers are required for MISO configuration and the output isobtained at every 6th clock cycle as shown in the Figure 16The block ldquo119889rdquo represents the delay element introduced tobuffer the values and produce the outputs at the same timeinstant

6 Results and Discussion

The proposed receiver architectures for PCFICH in SISO, SIMO, MISO, and MIMO configurations are implemented using the Xilinx PlanAhead tool on the Virtex-6 FPGA xc6vlx240tff1156-1 device board. The target Virtex-6 device has only 768 DSP elements. Table 2 compares the performance of the proposed architectures using the folding and superscalar methods with the direct implementation of the PCFICH receiver in terms of resource utilisation, speed, and power for all the SISO, SIMO, MISO, and MIMO configurations. The proposed architectures based on the folding and superscalar processing methods require fewer resource elements.

Figure 18: RTL schematic for combined PCFICH architecture with diversity (blocks Div0 to Div3 for SISO, SIMO, MISO, and MIMO, each gated by an enable line e).

Table 2: Performance of proposed architectures based on folding and superscalar method.

| Diversity | Method | Multipliers | Adders | DSP elements | LUTs | Total delay (ns) | Speed (MHz) | Dynamic power |
|---|---|---|---|---|---|---|---|---|
| SISO | Direct | 125 | 245 | 125 | 5479 | 39.081 | 25.587 | - |
| SISO | Folding (16T) | 8 | 81 | 16 | 2561 | 114.448 | 8.737 | 84 |
| SISO | Superscalar (4T) | 32 | 182 | 66 | 2731 | 69.333 | 14.423 | 93 |
| SIMO | Direct | 250 | 494 | 250 | 10942 | 40.278 | 24.827 | - |
| SIMO | Folding (16T) | 16 | 165 | 32 | 5117 | 130.704 | 7.651 | 159 |
| SIMO | Superscalar (4T) | 64 | 318 | 130 | 5623 | 53.873 | 18.562 | 170 |
| MISO | Direct | 224 | 580 | 594 | 14880 | 43.023 | 23.243 | - |
| MISO | Folding (16T) | 14 | 101 | 39 | 3950 | 255.264 | 3.917 | 173 |
| MISO | Superscalar (6T) | 96 | 338 | 196 | 6156 | 80.495 | 12.423 | 208 |
| MIMO | Direct | 320 | 844 | 675 | 17380 | 56.962 | 17.555 | - |
| MIMO | Folding (16T) | 20 | 155 | 46 | 4395 | 256.528 | 3.898 | 374 |
| MIMO | Superscalar (6T) | 160 | 465 | 262 | 6932 | 85.822 | 11.652 | 382 |

Figure 19: Resource utilization graph for generalized architecture.

In the folding approach, resource utilization is lower than in the direct and superscalar approaches at the cost of reduced speed of operation, but it remains suitable for real-time frame timings. When the LTE-A system operates at 1.4 MHz bandwidth, the maximum time available for detection at each subcarrier is 992.063 ns, since each slot of 0.5 ms duration in a frame (10 ms radio frame duration) consists of 7 OFDM symbols and there are 72 subcarriers along one OFDM symbol. The total delay in the receiver architecture is within this LTE time constraint. The dynamic power consumption is lower in the folding method than in the superscalar method due to the decrease in block arithmetic. The direct method does not require sequential execution and clocking, and hence its total power consumption is due to static power. Hence it is inferred that the proposed architecture based on the folding method is more suitable for CFI detection. The simulation waveform of the proposed architecture based on the folding method is shown in Figure 17 for SISO, SIMO, MISO, and MIMO configurations.
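The timing budget quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the total delays of the folding-based receivers in Table 2 are read in nanoseconds with three decimal places; the function and variable names are illustrative and do not come from the original design.

```python
# Per-subcarrier detection budget for the 1.4 MHz LTE-A configuration.
SLOT_DURATION_NS = 0.5e6      # one slot lasts 0.5 ms
OFDM_SYMBOLS_PER_SLOT = 7
SUBCARRIERS_PER_SYMBOL = 72   # 6 resource blocks x 12 subcarriers

def per_subcarrier_budget_ns():
    """Time available per subcarrier within one slot, in nanoseconds."""
    return SLOT_DURATION_NS / (OFDM_SYMBOLS_PER_SLOT * SUBCARRIERS_PER_SYMBOL)

# Total delays (ns) of the folding-based receivers, assumed from Table 2.
FOLDING_DELAY_NS = {
    "SISO": 114.448,
    "SIMO": 130.704,
    "MISO": 255.264,
    "MIMO": 256.528,
}

budget = per_subcarrier_budget_ns()
for config, delay in FOLDING_DELAY_NS.items():
    print(f"{config}: {delay:.3f} ns of {budget:.3f} ns, within budget: {delay < budget}")
```

Under these assumptions, even the slowest folded configuration (MIMO) uses roughly a quarter of the available time per subcarrier.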

A general architecture based on the folding method, which operates in all four SISO, SIMO, MISO, and MIMO configurations, has also been developed. In this architecture, a control variable "e" is used to enable or disable the submodules SISO, SIMO, MISO, or MIMO according to the selection input "diversity." CFI is detected at every 17th clock cycle. The synthesis results of the general architecture based on folding show that it utilizes minimum resources in the XC6VLX240TFF1156-1 Virtex-6 device (768 DSPs). This is summarized in Table 3. Dynamic power consumption is due to internal switching contributed by the clock (2.46 mW), the logic (6.70 mW), and the block arithmetic (1.03 mW).

Table 3: Resource requirements of proposed architecture using folding method.

| Parameter | Multipliers | Adders | Minimum clock period (ns) | Total delay (ns) | Speed (MHz) | Total dynamic power (mW) |
|---|---|---|---|---|---|---|
| Value | 58 | 518 | 16.033 | 256.528 | 3.898 | 10.19 |

Figure 20: Implemented device in FPGA editor.

Figure 18 shows the RTL schematic of the four diversity blocks "div0," "div1," "div2," and "div3," corresponding to SISO, SIMO, MISO, and MIMO, controlled by wires named "e." The power consumed includes both static power and dynamic power due to internal switching.
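The enable mechanism can be pictured as a one-hot decoder driven by the selection input. The sketch below is a behavioural illustration only: the 2-bit encoding of "diversity" (0 to 3 selecting div0 to div3 in the order SISO, SIMO, MISO, MIMO) and the function names are assumptions, not details taken from the RTL.

```python
# Assumed ordering: diversity value k enables block div<k>.
CONFIGS = ("SISO", "SIMO", "MISO", "MIMO")

def enable_lines(diversity):
    """Return a one-hot tuple e[0..3] for a 2-bit selection input."""
    if diversity not in range(4):
        raise ValueError("diversity must be a 2-bit value (0..3)")
    return tuple(int(i == diversity) for i in range(4))

def active_block(diversity):
    """Name of the single submodule enabled by the selection input."""
    e = enable_lines(diversity)
    return CONFIGS[e.index(1)]

print(enable_lines(2), active_block(2))
```

Only one submodule is active at a time in this model, which is consistent with the general architecture sharing one CFI output across the four configurations.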

Figure 19 shows the resource utilization graph, which gives the percentage of registers, lookup tables (LUTs), slices, DSP elements, and buffers used.

Figure 20 shows the implemented device in the FPGA editor, with the implemented components and the interconnections between them configured into the FPGA device.

7. Conclusion

In this paper, a low complexity, low resource, single or multiantenna CFI detection at the receiver system has been proposed, analyzed using ModelSim, and implemented in the Virtex-6 device using the Xilinx PlanAhead tool. In the receiver, computational complexity and resource utilization are minimized by employing arithmetic operational rearrangement and a suboptimal sequential DSP algorithm called the folding approach. The proposed architecture using the folding method complies with the LTE frame timing constraint in SISO, SIMO, MISO, and MIMO configurations. It is a suitable solution for the area optimized hardware implementation of receiver structures for PCFICH. In future, a complete hardware design accommodating all the physical downlink control channels of 3GPP LTE-A with low resource utilization could be synthesized and implemented.

Conflict of Interests

The authors do not have any direct financial relation with any commercial entity mentioned in the paper or any other conflict of interests.

Acknowledgments

The authors wish to express their sincere thanks to the All India Council for Technical Education, New Delhi, for the Grant to do the Project titled "Design of Testbed for the Development of Optimized Architectures of MIMO Signal Processing" (no. 8023RIDRPS0391112). They are also thankful to the Managements of Mepco Schlenk Engineering College, Sivakasi, and Thiagarajar College of Engineering, Madurai, for their constant support and encouragement to carry out this research work successfully.

References

[1] 3GPP TS 36.211 Version 11.0.0 Release 11 (2012-10), Evolved Universal Terrestrial Radio Access (E-UTRA); Physical Channels and Modulation, 2012.

[2] J. Ketonen and M. Juntti, "SIC and K-best LSD receiver implementation for a MIMO-OFDM system," in Proceedings of the 16th European Signal Processing Conference (EUSIPCO '08), Lausanne, Switzerland, August 2008.

[3] J. Ketonen, M. Juntti, and J. R. Cavallaro, "Performance-complexity comparison of receivers for a LTE MIMO-OFDM system," IEEE Transactions on Signal Processing, vol. 58, no. 6, pp. 3360-3372, 2010.

[4] X. Huang, C. Liang, and J. Ma, "System architecture and implementation of MIMO sphere decoders on FPGA," IEEE Transactions on Very Large Scale Integration Systems, vol. 16, no. 2, pp. 188-196, 2008.

[5] Y. Lin, Y. Chen, C. Chu, C. Zhan, and A. Wu, "Dual-mode low-complexity codebook searching algorithm and VLSI architecture for LTE/LTE-advanced systems," IEEE Transactions on Signal Processing, vol. 61, no. 14, pp. 3545-3562, 2013.

[6] D. Wu, J. Eilert, R. Asghar, and D. Liu, "VLSI implementation of a fixed-complexity soft-output MIMO detector for high-speed wireless," EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 893184, 13 pages, 2010.

[7] C. Mehlführer, M. Wrulich, J. C. Ikuno, D. Bosanska, and M. Rupp, "Simulating the long term evolution physical layer," in Proceedings of the 17th European Signal Processing Conference (EUSIPCO '09), pp. 1471-1478, Glasgow, UK, August 2009.

[8] S. J. Thiruvengadam and L. M. A. Jalloul, "Performance analysis of the 3GPP-LTE physical control channels," EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 914934, 10 pages, 2010.

[9] S. M. Alamouti, "A simple transmit diversity technique for wireless communications," IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1451-1458, 1998.

[10] S. S. A. Abbas and S. J. Thiruvengadam, "FPGA implementation of 3GPP-LTE-A physical downlink control channel using diversity techniques," International Journal of Wireless and Mobile Computing, vol. 9, no. 2, p. 84, 2013.

[11] S. Ahmadi, LTE-Advanced: A Practical Systems Approach to Understanding 3GPP LTE Releases 10 and 11 Radio Access Technologies, Academic Press, 2013.

[12] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, Wiley-Interscience, 1999.


Figure 15: Illustration of superscalar method for SISO and SIMO (with no complex multiplications and operating from $T_1$ to $T_4$).

the CFI detection architecture are similar to that of the MISO system. The sum of the squared magnitude of the difference between each element in the decoded output vector $\mathbf{z}$ and its precomputed data in the vector $\mathbf{z}^{(1)}$ is the output $r_1$ of PB1. Similarly, $r_2$, $r_3$, and $r_4$ are computed for the other CFI values. The values $r_1$, $r_2$, $r_3$, and $r_4$ are compared by the CFI detector shown in Figure 4 to determine the minimum value.
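The minimum-distance decision described above can be sketched in software as follows. This is an illustration under stated assumptions: the reference vectors are short toy sequences chosen for readability, not the actual 16-element precomputed PCFICH vectors, and the function names are hypothetical.

```python
def distance(z, z_ref):
    """Sum of squared magnitudes of the element-wise difference (one PB output)."""
    return sum(abs(a - b) ** 2 for a, b in zip(z, z_ref))

def detect_cfi(z, references):
    """Return the 1-based CFI index with the smallest distance, plus all distances."""
    r = [distance(z, ref) for ref in references]
    return r.index(min(r)) + 1, r

# Toy reference vectors standing in for the precomputed data of CFI 1..4.
TOY_REFS = [
    [1 + 1j, 1 + 1j, 1 + 1j, 1 + 1j],
    [1 + 1j, -1 - 1j, 1 + 1j, -1 - 1j],
    [-1 - 1j, -1 - 1j, 1 + 1j, 1 + 1j],
    [-1 - 1j, 1 + 1j, 1 + 1j, -1 - 1j],
]

# A noisy observation lying close to the third reference vector.
z = [-0.9 - 1.1j, -1.2 - 0.8j, 0.7 + 1.1j, 1.0 + 0.9j]
cfi, r = detect_cfi(z, TOY_REFS)
print("detected CFI:", cfi)
```

The four distances play the role of $r_1$ to $r_4$, and the final `min` corresponds to the comparison performed by the CFI detector.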

5. PCFICH Receiver Implementation Methods

The PCFICH receiver architectures can be implemented directly based on the basic architectures developed in Sections 3 and 4. However, in order to utilize the FPGA resources effectively, the basic architectures are implemented using modified novel architectures based on VLSI DSP techniques, namely, the folding and superscalar processing approaches.

5.1. Direct Implementation with Multiplicands Rearranged Method. In the receiver architecture for SISO and SIMO, the $16 \times 1$ received signal vector is directly subtracted from the precomputed data vector for a given CFI. This requires fewer multipliers and adders than MISO and MIMO. In MISO and MIMO configurations, complex multiplications are necessary for the multiplication of $\mathbf{H}_{\mathrm{eff}}^{H}$ with the received signal vector, which increases the number of multiplications in the CFI detection process. Hence, the terms are rearranged to minimize the number of multiplications, and the intermediate products are reused in the calculation of the real and imaginary parts. Consider the multiplication of two complex numbers $\mathrm{Re}\{h\} + j\,\mathrm{Im}\{h\}$ and $\mathrm{Re}\{y\} + j\,\mathrm{Im}\{y\}$. The real part ($e$) and imaginary part ($f$) of the output are given by

$$
\begin{aligned}
e &= \mathrm{Re}\{h\}\,\mathrm{Re}\{y\} - \mathrm{Im}\{h\}\,\mathrm{Im}\{y\}, \\
f &= \mathrm{Re}\{h\}\,\mathrm{Im}\{y\} + \mathrm{Im}\{h\}\,\mathrm{Re}\{y\}.
\end{aligned}
\tag{18}
$$

This requires four multiplications and two additions. To reduce the number of multiplications, the terms in (18) are rearranged as

$$
\begin{aligned}
e &= [\mathrm{Re}\{y\} - \mathrm{Im}\{y\}]\,[\mathrm{Re}\{h\} + \mathrm{Im}\{h\}] - \mathrm{Re}\{y\}\,\mathrm{Im}\{h\} + \mathrm{Im}\{y\}\,\mathrm{Re}\{h\}, \\
f &= \mathrm{Re}\{y\}\,\mathrm{Im}\{h\} + \mathrm{Im}\{y\}\,\mathrm{Re}\{h\}.
\end{aligned}
\tag{19}
$$

Since the products $\mathrm{Re}\{y\}\,\mathrm{Im}\{h\}$ and $\mathrm{Im}\{y\}\,\mathrm{Re}\{h\}$ appear in both $e$ and $f$ in (19), they are computed once and reused, so that only three multiplications but five additions are required. This kind of rearrangement of the multiplicands is employed in the processing blocks at the cost of increased additions, as shown in Figure 12.
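A short numerical check makes the saving concrete. The sketch below compares the four-multiplication form with a three-multiplication form that uses $(\mathrm{Re}\{y\} - \mathrm{Im}\{y\})(\mathrm{Re}\{h\} + \mathrm{Im}\{h\})$ as its third product; the function names are illustrative, and the code models only the arithmetic, not the fixed-point hardware.

```python
def complex_mul_direct(h, y):
    """Four real multiplications and two additions."""
    e = h.real * y.real - h.imag * y.imag
    f = h.real * y.imag + h.imag * y.real
    return complex(e, f)

def complex_mul_rearranged(h, y):
    """Three real multiplications and five additions, reusing two products."""
    p1 = y.real * h.imag                          # product shared by e and f
    p2 = y.imag * h.real                          # product shared by e and f
    p3 = (y.real - y.imag) * (h.real + h.imag)    # third and last multiplication
    e = p3 - p1 + p2
    f = p1 + p2
    return complex(e, f)

h, y = complex(3, 4), complex(1, -2)
print(complex_mul_direct(h, y), complex_mul_rearranged(h, y))
```

Both functions return the same product for these integer-valued inputs; with general floating-point inputs the two forms can differ by rounding error, since the operations are ordered differently.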

5.2. Proposed Architecture Using Folding Method. Folding systematically determines the control circuits in DSP architectures where multiple algorithmic operations are time-multiplexed onto a single functional unit [12]. It is used for the synthesis of DSP architectures that can be operated with single or multiple clocks. It reduces the number of hardware functional units (FUs) by a factor of $N$ at the expense of increased computation time.

The folding architecture is introduced in the receiver structure of RPB in SISO and SIMO configurations and of RPB and PB in MISO and MIMO configurations, as shown in Figures 13 and 14, respectively. For the SISO RPB, there are 16 hardware lines to calculate the value of $r_1$, each requiring two multipliers. Hence the number of multipliers used in one RPB is 32. In order to reduce the number of multipliers and adders, the folding architecture is proposed. This architecture uses only two multipliers and performs the operation of a single hardware line 16 times in a sequential way. The difference between the product of the channel frequency response vector with the precomputed data vector and the received signal vector is stored in registers. At any one time, one resultant signal pair is involved in the computation, using two multipliers to get the value of $z_i$. Four switches operating at system clock speed are involved in the architecture: two switches pass the real part of the signal to one multiplier, while the other two switches pass the imaginary part of the signal to the other multiplier. The multipliers pass the products to the first adder to form $z_i$. The output of the first adder is passed to the second adder with a delay to accumulate the values $z_0$ to $z_{15}$ into a register in subsequent clock cycles. This process requires 16 clock cycles and the CFI is detected at the 17th clock cycle. Though more clock cycles are needed to obtain the output, the resources are minimized in this method.
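The trade between hardware and time in the folded RPB can be mimicked with a small behavioural model. This is a sketch under assumptions: the difference vector is taken as already computed and stored, each loop iteration stands for one clock cycle, and the names are illustrative rather than taken from the design.

```python
def rpb_parallel(diff):
    """Direct form: one hardware line (two multipliers) per vector element."""
    return sum(d.real * d.real + d.imag * d.imag for d in diff)

def rpb_folded(diff):
    """Folded form: a single pair of multipliers reused once per clock cycle."""
    accumulator = 0.0
    cycles = 0
    for d in diff:                  # one stored element is switched in per cycle
        re_sq = d.real * d.real     # multiplier 1 (real part)
        im_sq = d.imag * d.imag     # multiplier 2 (imaginary part)
        z_i = re_sq + im_sq         # first adder forms z_i
        accumulator += z_i          # second adder accumulates into a register
        cycles += 1
    return accumulator, cycles

# 16 stored differences, one per hardware line of the unfolded design.
diff = [complex(k, -k) for k in range(16)]
result, cycles = rpb_folded(diff)
print(result, cycles)
```

The folded model produces the same sum as the parallel one while using two multipliers instead of 32, at the price of 16 cycles of latency before the result is available.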

The folded architecture of the decoding block of MISO and MIMO, involving complex multiplication of the channel frequency response vector and the receive signal vector, is shown in Figure 14. There are 2 complex multiplications and one addition in each of the 16 hardware lines. Hence the total resource elements used are 32 complex multiplications and 16 additions. The folded architecture, which reduces this to just 2 complex multiplications and one addition, requires five switches. Two switches are used to pass the first element of the receive signal vector and its corresponding channel frequency response vector to one multiplier, and the other two switches are used to pass the second element of the receive signal vector and its channel frequency response vector to another multiplier. These four switches operate at system clock speed. The multipliers pass their products to the adder through the fifth switch before moving to PB. This process requires 16 clock cycles and the CFI is detected at the 17th clock cycle.

Figure 16: Illustration of superscalar method for MISO and MIMO (with complex multiplications and operating from $T_1$ to $T_6$).

Figure 17: Simulation waveform for PCFICH receiver.

5.3. Proposed Architecture Using Superscalar Method. The superscalar approach is another low resource utilizing VLSI DSP technique. The superscalar processing method includes parallel processing and pipelining strategies. In this case, parallel operation for the 16 pairs of hardware lines is arranged with pipelining of the subtraction and square magnitude operations for each CFI. The SISO configuration does not have complex multiplications; it has only square magnitude operations. Hence the RPB of SISO has 16 hardware lines, each having 2 multipliers, which results in a total of 32 multipliers. This setup requires more hardware resources than folding, but the output is obtained at every 4th clock cycle, as shown in Figure 15. The SIMO configuration, which involves signal processing for two receive antennas, requires twice the number of multiplications as SISO, and the output is obtained at every 4th clock cycle. The block "d" represents the delay element introduced to buffer the values and produce the outputs at the same time instant.

For the MISO configuration, the RDB has 16 hardware lines with 2 complex multiplications each. Since each complex multiplication requires four real multiplications, the RDB can be executed in two clock cycles by reusing 64 multipliers. A further 32 multipliers are required for the PB, taking 4 clock cycles. Hence 96 multipliers are required in the MISO configuration. For the MIMO configuration, the RDB requires reuse of 128 multipliers, taking 2 clock cycles, and an additional 32 multipliers are required for the PB, taking 4 clock cycles. Hence 160 multipliers are required for the MIMO configuration, and the output is obtained at every 6th clock cycle, as shown in Figure 16. The block "d" represents the delay element introduced to buffer the values and produce the outputs at the same time instant.
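The multiplier counts quoted for the superscalar method follow from simple bookkeeping, sketched below. The helper name is hypothetical, and the figure of 4 complex multiplications per hardware line for MIMO is an inference from the stated reuse of 128 multipliers over 2 clock cycles rather than a number given in the text.

```python
HARDWARE_LINES = 16
REAL_MULTS_PER_COMPLEX = 4   # direct complex multiplication, four real products

def superscalar_multipliers(complex_mults_per_line, pb_multipliers, rdb_cycles=2):
    """Physical multipliers needed when the RDB work is spread over rdb_cycles."""
    rdb_real_mults = HARDWARE_LINES * complex_mults_per_line * REAL_MULTS_PER_COMPLEX
    rdb_multipliers = rdb_real_mults // rdb_cycles   # reuse across clock cycles
    return rdb_multipliers + pb_multipliers

counts = {
    "SISO": superscalar_multipliers(0, 32),   # squared magnitude only
    "SIMO": superscalar_multipliers(0, 64),   # twice the SISO multiplications
    "MISO": superscalar_multipliers(2, 32),
    "MIMO": superscalar_multipliers(4, 32),   # assumed 4 complex mults per line
}
print(counts)
```

Under these assumptions the totals agree with the multiplier column reported for the superscalar rows of Table 2.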

6 Results and Discussion

The proposed receiver architectures for PCFICH in SISOSIMO MISO and MIMO configurations are implementedusing the Xilinx PlanAhead tool on the Virtex-6 FPGAxc6vlx240tff1156-1 device board The target device Virtex-6has only 768 DSP elements Table 2 shows the performanceof the proposed architectures using folding and superscalarmethods being compared with the direct implementationof PCFICH receiver in terms of resource utilisation speedand power for all the SISO SIMO MISO and MIMO

VLSI Design 13

clk clk

clk

clk

clk

clkDiversity[10]

Diversity[1] GND 1 o wide mux 1Address[10] Output[30]

RTL rom 2 4

3

2

1

0

e

e

e

e

Div0

Div1

Div2

Div3

SISO

SIMO

MISO

MIMO

Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310] Op

Choose

Count CFI11[490]Count CFI12[490]Count CFI13[490]Count CFI14[490]Count CFI21[490]Count CFI22[490]Count CFI23[490]Count CFI24[490]Count CFI31[490]Count CFI32[490]Count CFI33[490]Count CFI34[490]Count CFI41[490]Count CFI42[490]Count CFI43[490]Count CFI44[490]

e[30]Out1[310]Out2[310]Out3[310]Out4[310]

Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]

Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]

Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]

Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]

Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]

Figure 18 RTL schematic for combined PCFICH architecture with diversity

Table 2 Performance of proposed architectures based on folding and superscalar method

Diversity Method Multipliers Adders DSP elements LUTs Total delay (ns) Speed (MHz) Dynamic power

SISODirect 125 245 125 5479 39081 25587 mdash

Folding (16T) 8 81 16 2561 114448 8737 84Superscalar (4T) 32 182 66 2731 69333 14423 93

SIMODirect 250 494 250 10942 40278 24827 mdash

Folding (16T) 16 165 32 5117 130704 7651 159Superscalar (4T) 64 318 130 5623 53873 18562 170

MISODirect 224 580 594 14880 43023 23243 mdash

Folding (16T) 14 101 39 3950 255264 3917 173Superscalar (6T) 96 338 196 6156 80495 12423 208

MIMODirect 320 844 675 17380 56962 17555 mdash

Folding (16T) 20 155 46 4395 256528 3898 374Superscalar (6T) 160 465 262 6932 85822 11652 382

Figure 19 Resource utilization graph for generalized architecture

configurations The proposed architectures based on foldingand superscalar processing methods require less number ofresource elements

In the folding approach resource utilization is less com-pared to the direct and superscalar approach at the cost ofreduced speed of operation but it is suitable for real-time

frame timings When the LTE-A system operates at 14MHzbandwidth maximum time available for detection at eachsubcarrier is 992063 ns since each slot of 05ms duration in aframe (10ms radio frame duration) consists of 7 OFDM sym-bols and there are 72 subcarriers along one OFDM symbolThe total delay in the receiver architecture is within the LTEtime constraint The dynamic power consumption is less inthe folding method compared to superscalar method due todecrease in block arithmetic Direct method does not requiresequential execution and clocking and hence total powerconsumption is due to static power Hence it is inferredthat the proposed architecture based on folding method ismore suitable for CFI detection The simulation waveformof the proposed architecture based on folding method isshown in Figure 17 for SISO SIMO MISO and MIMOconfigurations

A general architecture based on folding method whichoperates at all the four SISO SIMO MISO and MIMOconfigurations has also been developed In this architecture

14 VLSI Design

Table 3 Resource requirements of proposed architecture using folding method

Parameter Multipliers Adders Minimum clock period (ns) Total delay (ns) Speed (MHz) Total dynamic power (mW)Value 58 518 16033 256528 3898 1019

Figure 20 Implemented device in FPGA editor

a control variable ldquo119890rdquo is used to enable or disable thesubmodules SISO SIMO MISO or MIMO according tothe selection input ldquodiversityrdquo CFI is detected at every 17thclock cycle The synthesis results of a general architecturebased on folding show that it utilizes minimum resourcesin XC6VLX240TFF1156-1 Virtex 6 device (768 DSPs) Thisis summarized in Table 3 Dynamic power consumption isdue to internal switching contributed by the clock (246mW)logic (670mW) and the block arithmetic (103mW)

Figure 18 shows the RTL schematic of 4 diversity blocksldquodiv0rdquo ldquodiv1rdquo ldquodiv2rdquo and ldquodiv3rdquo corresponding to SISO SIMOMISO and MIMO controlled by wires named ldquo119890rdquo Powerconsumed includes both static power and dynamic power dueto internal switching

Figure 19 shows the resource utilization graph whichshows the percentage of registers lookup tables (LUTs) slicesDSP elements and buffers used

Figure 20 shows the implemented device in FPGA editorwith the implemented components and interconnectionsbetween the components configured into the FPGA device

7 Conclusion

In this paper low complexity low resource single or multi-antenna CFI detection at the receiver system has been pro-posed and analyzed using modelsim and implementation inthe Virtex-6 device in Xilinx PlanAhead tool In the receivercomputational complexity and the resource utilization areminimized by employing arithmetic operational rearrange-ment and suboptimal sequential DSP algorithm called thefolding approach The proposed architecture using folding

method complies with the LTE frame timing constraint inSISO SIMOMISO andMIMO configurations It is a suitablesolution for the area optimized hardware implementation ofreceiver structures for PCFICH In future a total hardwareaccommodating all the physical downlink control channelsof the 3GPP-LTE-A with low resource utilization could besynthesized and implemented

Conflict of Interests

The authors do not have direct financial relation with anycommercial identity mentioned in the paper or any otherconflict of interests

Acknowledgments

The authors wish to express their sincere thanks to All IndiaCouncil for Technical Education NewDelhi for the Grant todo the Project titled Design of Testbed for the Developmentof Optimized Architectures of MIMO Signal Processing(no 8023RIDRPS0391112) They are also thankful tothe Managements of Mepco Schlenk Engineering CollegeSivakasi and Thiagarajar College of Engineering Maduraifor their constant support and encouragement to carry outthis research work successfully

References

[1] 3GPP TS 36 211 Version 11 0 0 Release 11 (2012-10) EvolvedUniversal Terrestrial Radio Access (E-UTRA) Physical Chan-nels and Modulation 2012

[2] J Ketonen and M Juntti ldquoSIC and K-best LSD receiverimplementation for a MIMO-OFDM systemrdquo in Proceedings ofthe 16th European Signal Processing Conference (EUSIPCO rsquo08)Lausanne Switzerland August 2008

[3] J Ketonen M Juntti and J R Cavallaro ldquoPerformancemdashcomplexity comparison of receivers for a LTE MIMOOFDMsystemrdquo IEEE Transactions on Signal Processing vol 58 no 6pp 3360ndash3372 2010

[4] X Huang C Liang and J Ma ldquoSystem architecture andimplementation of MIMO sphere decoders on FPGArdquo IEEETransactions on Very Large Scale Integration Systems vol 16 no2 pp 188ndash196 2008

[5] Y Lin Y Chen C Chu C Zhan and A Wu ldquoDual-modelow-complexity codebook searching algorithm and VLSI archi-tecture for LTELTE-advanced systemsrdquo IEEE Transactions onSignal Processing vol 61 no 14 pp 3545ndash3562 2013

[6] DWu J Eilert R Asghar and D Liu ldquoVLSI implementation ofa fixed-complexity soft-output MIMO detector for high-speedwirelessrdquo EURASIP Journal on Wireless Communications andNetworking vol 2010 Article ID 893184 13 pages 2010

[7] C Mehlfuhrer M Wrulich J C Ikuno D Bosanska and MRupp ldquoSimulating the long term evolution physical layerrdquo in

VLSI Design 15

Proceedings of the 17th European Signal Processing Conference(EUSIPCO rsquo09) pp 1471ndash1478 Glasgow UK August 2009

[8] S J Thiruvengadam and L M A Jalloul ldquoPerformance anal-ysisof the 3GPP-LTE physical control channelsrdquo EURASIPJournalon Wireless Communications and Networking vol 2010Article ID 914934 10 pages 2010

[9] S M Alamouti ldquoA simple transmit diversity technique forwireless communicationsrdquo IEEE Journal on Selected Areas inCommunications vol 16 no 8 pp 1451ndash1458 1998

[10] S S A Abbas and S JThiruvengadam ldquoFPGA implementationof 3GPP-LTE-A physical downlink control channel using diver-sity techniquesrdquo International Journal of Wireless and MobileComputing vol 9 no 2 p 84 2013

[11] S Ahmadi LTE-Advanced A Practical Systems Approach toUnderstanding 3GPP LTE Releases 10 and 11 Radio AccessTechnologies Sassan Ahmadi Academic Press 2013

[12] K K Parhi VLSI Digital Signal Processing SystemsmdashDesign andImplementation Wiley-Interscience 1999

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 12: Research Article Novel Receiver Architecture for LTE …downloads.hindawi.com/archive/2014/825183.pdfResearch Article Novel Receiver Architecture for LTE-A Downlink Physical Control

12 VLSI Design

clk

T1 T2 T3 T4 T5 T6

PB

PB

PB

PB

d

d

dd

d

d

dr4

r3

r2

r1

From other hardware lines

From other hardware lines

Complexmultiplications(real part) of a

hardware line from

Complex

multiplications

(imaginary part) of

a hardware line from RDB

RDB

Figure 16 Illustration of superscalar method for MISO and MIMO (with complex multiplications and operating from 1198791to 1198796)

Figure 17 Simulation waveform for PCFICH receiver

frequency response vector to one multiplier and other twoswitches are used to pass the second element of receive signalvector and its channel frequency response vector to anothermultiplierThese four switches operate in system clock speedThe multipliers pass their products to the adder through thefifth switch before moving to PB This process requires 16clock cycles and the CFI is detected at the 17th clock cycle

53 Proposed Architecture Using Superscalar Method Super-scalar approach is another low resource utilizing VLSI DSPtechnique The superscalar processing method includes par-allel processing and pipelining strategies In this case paralleloperation for the 16 pairs of hardware lines is arrangedwith pipelining of the subtraction and square magnitudeoperations for each CFI SISO configuration does not havecomplex multiplications and it has only square magnitudeoperations Hence the RPB of SISOhas 16 hardware lines eachhaving 2 multipliers which results to a total of 32 multipliers

This setup requires more hardware resources than foldingbut the output is obtained at every 4th clock cycle as shownin Figure 15 SIMO configuration which involves two receiveantenna signal processing requires twice the number ofmultiplications as that of SISO and the output is obtainedat every 4th clock cycle The block ldquo119889rdquo represents the delayelement introduced to buffer the values and produce theoutputs at the same time instant

For MISO configuration the RDB has 16 hardware lineswith 2 complex multiplications each Since each complexmultiplication requires four real multiplications RDB can beexecuted in two clock cycles by reusing 64 multipliers 32multipliers are required for PB taking 4 clock cyclesHence 96multipliers are required in MISO configuration For MIMOconfiguration the RDB requires reuse of 128 multiplierstaking 2 clock cycles and an additional 32 multipliers arerequired for the PB taking 4 clock cycles Hence 160 multi-pliers are required for MISO configuration and the output isobtained at every 6th clock cycle as shown in the Figure 16The block ldquo119889rdquo represents the delay element introduced tobuffer the values and produce the outputs at the same timeinstant

6 Results and Discussion

The proposed receiver architectures for PCFICH in SISO, SIMO, MISO, and MIMO configurations are implemented using the Xilinx PlanAhead tool on the Virtex-6 FPGA xc6vlx240tff1156-1 device board. The target device, Virtex-6, has only 768 DSP elements. Table 2 compares the proposed architectures, based on the folding and superscalar methods, with the direct implementation of the PCFICH receiver in terms of resource utilization, speed, and power for all of the SISO, SIMO, MISO, and MIMO configurations. The proposed architectures based on the folding and superscalar processing methods require fewer resource elements than the direct implementation.

Figure 18: RTL schematic for combined PCFICH architecture with diversity.

Table 2: Performance of proposed architectures based on folding and superscalar method.

| Diversity | Method | Multipliers | Adders | DSP elements | LUTs | Total delay (ns) | Speed (MHz) | Dynamic power |
|---|---|---|---|---|---|---|---|---|
| SISO | Direct | 125 | 245 | 125 | 5479 | 39.081 | 25.587 | — |
| SISO | Folding (16T) | 8 | 81 | 16 | 2561 | 114.448 | 8.737 | 84 |
| SISO | Superscalar (4T) | 32 | 182 | 66 | 2731 | 69.333 | 14.423 | 93 |
| SIMO | Direct | 250 | 494 | 250 | 10942 | 40.278 | 24.827 | — |
| SIMO | Folding (16T) | 16 | 165 | 32 | 5117 | 130.704 | 7.651 | 159 |
| SIMO | Superscalar (4T) | 64 | 318 | 130 | 5623 | 53.873 | 18.562 | 170 |
| MISO | Direct | 224 | 580 | 594 | 14880 | 43.023 | 23.243 | — |
| MISO | Folding (16T) | 14 | 101 | 39 | 3950 | 255.264 | 3.917 | 173 |
| MISO | Superscalar (6T) | 96 | 338 | 196 | 6156 | 80.495 | 12.423 | 208 |
| MIMO | Direct | 320 | 844 | 675 | 17380 | 56.962 | 17.555 | — |
| MIMO | Folding (16T) | 20 | 155 | 46 | 4395 | 256.528 | 3.898 | 374 |
| MIMO | Superscalar (6T) | 160 | 465 | 262 | 6932 | 85.822 | 11.652 | 382 |

Figure 19: Resource utilization graph for generalized architecture.

In the folding approach, resource utilization is lower than in the direct and superscalar approaches, at the cost of reduced speed of operation, but the design remains suitable for real-time frame timings. When the LTE-A system operates at 1.4 MHz bandwidth, the maximum time available for detection at each subcarrier is 992.063 ns, since each slot of 0.5 ms duration in a frame (10 ms radio frame duration) consists of 7 OFDM symbols and there are 72 subcarriers along one OFDM symbol, that is, 0.5 ms/(7 × 72) ≈ 992.063 ns. The total delay in the receiver architecture is within the LTE time constraint. The dynamic power consumption is lower in the folding method than in the superscalar method due to the decrease in block arithmetic. The direct method does not require sequential execution and clocking, and hence its total power consumption is due to static power. Hence, it is inferred that the proposed architecture based on the folding method is more suitable for CFI detection. The simulation waveform of the proposed architecture based on the folding method is shown in Figure 17 for SISO, SIMO, MISO, and MIMO configurations.
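The timing argument above can be checked numerically. The following Python sketch recomputes the per-subcarrier detection budget from the slot duration, the number of OFDM symbols per slot, and the number of subcarriers, and compares it with the folding-method delays of Table 2 read in nanoseconds. The variable names are illustrative and not part of the original design.

```python
# Values from the text and Table 2; variable names are illustrative.
SLOT_DURATION_NS = 0.5e6     # one slot lasts 0.5 ms
OFDM_SYMBOLS_PER_SLOT = 7
SUBCARRIERS = 72             # subcarriers per OFDM symbol at 1.4 MHz bandwidth

# Maximum time available for detection at each subcarrier.
budget_ns = SLOT_DURATION_NS / (OFDM_SYMBOLS_PER_SLOT * SUBCARRIERS)

# Total delay of the folding (16T) architecture for each configuration.
folding_delay_ns = {
    "SISO": 114.448,
    "SIMO": 130.704,
    "MISO": 255.264,
    "MIMO": 256.528,
}

print(f"budget per subcarrier: {budget_ns:.3f} ns")
for config, delay in folding_delay_ns.items():
    margin = budget_ns - delay
    print(f"{config}: delay {delay:.3f} ns, margin {margin:.3f} ns, "
          f"within budget: {delay <= budget_ns}")
```

Even the slowest folded configuration (MIMO) uses roughly a quarter of the available budget, which is why the reduced clock rate of the folding method is acceptable here.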

Table 3: Resource requirements of proposed architecture using folding method.

| Parameter | Multipliers | Adders | Minimum clock period (ns) | Total delay (ns) | Speed (MHz) | Total dynamic power (mW) |
|---|---|---|---|---|---|---|
| Value | 58 | 518 | 16.033 | 256.528 | 3.898 | 1019 |

Figure 20: Implemented device in FPGA editor.

A general architecture based on the folding method, which operates in all four of the SISO, SIMO, MISO, and MIMO configurations, has also been developed. In this architecture, a control variable “e” is used to enable or disable the submodules SISO, SIMO, MISO, or MIMO according to the selection input “diversity.” CFI is detected at every 17th clock cycle. The synthesis results of the general architecture based on folding show that it utilizes minimum resources in the XC6VLX240TFF1156-1 Virtex-6 device (768 DSPs). This is summarized in Table 3. Dynamic power consumption is due to internal switching, contributed by the clock (246 mW), logic (670 mW), and the block arithmetic (103 mW).
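The selection mechanism described above can be modelled behaviourally as follows. This is only a sketch: the one-hot encoding of “e” and the mapping of the “diversity” values 0 to 3 onto SISO, SIMO, MISO, and MIMO are assumptions made for illustration, and the function name `decode_enable` is hypothetical.

```python
# Hypothetical model of the enable logic; the one-hot encoding and the
# mapping of selector values to configurations are assumptions.
CONFIGS = ("SISO", "SIMO", "MISO", "MIMO")  # submodules div0..div3


def decode_enable(diversity):
    """Return enable bits e[0..3] with only the selected submodule enabled."""
    if not 0 <= diversity < len(CONFIGS):
        raise ValueError("diversity must be a 2-bit value (0..3)")
    return [1 if i == diversity else 0 for i in range(len(CONFIGS))]


for sel in range(len(CONFIGS)):
    e = decode_enable(sel)
    active = [name for name, bit in zip(CONFIGS, e) if bit]
    print(sel, e, active)
```

Enabling one submodule at a time keeps the other three idle, which is consistent with the aim of limiting switching activity in the combined design.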

Figure 18 shows the RTL schematic of the 4 diversity blocks “div0,” “div1,” “div2,” and “div3,” corresponding to SISO, SIMO, MISO, and MIMO, controlled by wires named “e.” The power consumed includes both static power and dynamic power due to internal switching.

Figure 19 shows the resource utilization graph, which gives the percentage of registers, lookup tables (LUTs), slices, DSP elements, and buffers used.

Figure 20 shows the implemented device in the FPGA editor, with the implemented components and the interconnections between them configured into the FPGA device.

7. Conclusion

In this paper, low-complexity, low-resource single- or multi-antenna CFI detection at the receiver has been proposed, analyzed using ModelSim, and implemented on the Virtex-6 device using the Xilinx PlanAhead tool. In the receiver, computational complexity and resource utilization are minimized by employing arithmetic operational rearrangement and a suboptimal sequential DSP algorithm called the folding approach. The proposed architecture using the folding method complies with the LTE frame timing constraint in SISO, SIMO, MISO, and MIMO configurations. It is a suitable solution for the area-optimized hardware implementation of receiver structures for PCFICH. In future, a complete hardware design accommodating all the physical downlink control channels of 3GPP LTE-A with low resource utilization could be synthesized and implemented.

Conflict of Interests

The authors do not have a direct financial relation with any commercial identity mentioned in the paper or any other conflict of interests.

Acknowledgments

The authors wish to express their sincere thanks to the All India Council for Technical Education, New Delhi, for the Grant to do the Project titled “Design of Testbed for the Development of Optimized Architectures of MIMO Signal Processing” (no. 8023RIDRPS0391112). They are also thankful to the Managements of Mepco Schlenk Engineering College, Sivakasi, and Thiagarajar College of Engineering, Madurai, for their constant support and encouragement to carry out this research work successfully.

References

[1] 3GPP TS 36.211 Version 11.0.0 Release 11 (2012-10), Evolved Universal Terrestrial Radio Access (E-UTRA); Physical Channels and Modulation, 2012.

[2] J. Ketonen and M. Juntti, “SIC and K-best LSD receiver implementation for a MIMO-OFDM system,” in Proceedings of the 16th European Signal Processing Conference (EUSIPCO ’08), Lausanne, Switzerland, August 2008.

[3] J. Ketonen, M. Juntti, and J. R. Cavallaro, “Performance—complexity comparison of receivers for a LTE MIMO-OFDM system,” IEEE Transactions on Signal Processing, vol. 58, no. 6, pp. 3360–3372, 2010.

[4] X. Huang, C. Liang, and J. Ma, “System architecture and implementation of MIMO sphere decoders on FPGA,” IEEE Transactions on Very Large Scale Integration Systems, vol. 16, no. 2, pp. 188–196, 2008.

[5] Y. Lin, Y. Chen, C. Chu, C. Zhan, and A. Wu, “Dual-mode low-complexity codebook searching algorithm and VLSI architecture for LTE/LTE-advanced systems,” IEEE Transactions on Signal Processing, vol. 61, no. 14, pp. 3545–3562, 2013.

[6] D. Wu, J. Eilert, R. Asghar, and D. Liu, “VLSI implementation of a fixed-complexity soft-output MIMO detector for high-speed wireless,” EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 893184, 13 pages, 2010.

[7] C. Mehlfuhrer, M. Wrulich, J. C. Ikuno, D. Bosanska, and M. Rupp, “Simulating the long term evolution physical layer,” in Proceedings of the 17th European Signal Processing Conference (EUSIPCO ’09), pp. 1471–1478, Glasgow, UK, August 2009.

[8] S. J. Thiruvengadam and L. M. A. Jalloul, “Performance analysis of the 3GPP-LTE physical control channels,” EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 914934, 10 pages, 2010.

[9] S. M. Alamouti, “A simple transmit diversity technique for wireless communications,” IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1451–1458, 1998.

[10] S. S. A. Abbas and S. J. Thiruvengadam, “FPGA implementation of 3GPP-LTE-A physical downlink control channel using diversity techniques,” International Journal of Wireless and Mobile Computing, vol. 9, no. 2, p. 84, 2013.

[11] S. Ahmadi, LTE-Advanced: A Practical Systems Approach to Understanding 3GPP LTE Releases 10 and 11 Radio Access Technologies, Academic Press, 2013.

[12] K. K. Parhi, VLSI Digital Signal Processing Systems—Design and Implementation, Wiley-Interscience, 1999.
