Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
Research ArticleNovel Receiver Architecture for LTE-A Downlink PhysicalControl Format Indicator Channel with Diversity
S Syed Ameer Abbas1 S J Thiruvengadam2 and S Susithra1
1 Department of Electronics and Communication Engineering Mepco Schlenk Engineering College Sivakasi 626 005 India2Department of Electronics and Communication Engineering Thiagarajar College of Engineering Madurai 625 015 India
Correspondence should be addressed to S Syed Ameer Abbas abbas mepcoyahoocom
Received 14 November 2013 Revised 27 April 2014 Accepted 5 May 2014 Published 5 June 2014
Academic Editor Chien-In Henry Chen
Copyright copy 2014 S Syed Ameer Abbas et al This is an open access article distributed under the Creative Commons AttributionLicense which permits unrestricted use distribution and reproduction in any medium provided the original work is properlycited
Physical control format indicator channel (PCFICH) carries the control information about the number of orthogonal frequencydivision multiplexing (OFDM) symbols used for transmission of control information in long term evolution-advanced (LTE-A) downlink system In this paper two novel low complexity receiver architectures are proposed to implement the maximumlikelihood- (ML-) based algorithm which decodes the CFI value in field programmable gate array (FPGA) at user equipment (UE)The performance of the proposed architectures is analyzed in terms of the timing cycles operational resource requirement andresource complexity In LTE-A base station and UE have multiple antenna ports to provide transmit and receive diversities Theproposed architectures are implemented in Virtex-6 xc6vlx240tff1156-1 FPGA device for various antenna configurations at basestation and UE When multiple antenna ports are used at base station transmit diversity is obtained by applying the concept ofspace frequency block code (SFBC) It is shown that the proposed architectures use minimum number of operational units inFPGA compared to the traditional direct method of implementation
1 Introduction
The goal of third generation partnership project (3GPP) longterm evolution-advanced (LTE-A) wireless standard is toincrease the capacity and speed of wireless data communi-cation The LTE-A physical layer is a highly efficient meansof conveying both data and control information betweenan enhanced base station popularly known as eNodeB andmobile user equipment (UE) It supports both frequencydivision duplex (FDD) and time division duplex (TDD)configurations in uplink and downlink operations Furtherit provides a wide range of system bandwidths in order tooperate in a large number of different spectrum allocations[1]
LTE-A standard has six physical channels for downlinkThey are physical broadcast channel (PBCH) physical down-link shared channel (PDSCH) physical multicast channel(PMCH) physical downlink control channel (PDCCH)physical hybrid automatic repeat request (ARQ) indicatorchannel (PHICH) and physical control format indicator
channel (PCFICH) PBCH carries the basic system informa-tion for the other channels to be configured and operatedin the LTE-A grid The PDSCH is the main data-bearingchannel PMCH is defined for future use In LTE-A thecontrol signals are transmitted at the start of each subframein the LTE-A grid PDCCH is used to carry the schedulinginformation of different types such as downlink resourcescheduling and uplink power control instructions PHICHis used to send the acknowledgementnegative acknowledge-ment bit to UEs to indicate whether the uplink user datais correctly received or not PCFICH carries the controlinformation about the number of orthogonal frequency divi-sion multiplexing (OFDM) symbols used for transmissionof downlink control information The high data rate inLTE-A requires high processing demands on all layers ofthe system which includes high digital signal processing(DSP) hardware processing in the physical layer Further thehardware implementation of receiver structures of variousphysical channels in LTE-A becomes a challenging task as thecomputational complexity increases
Hindawi Publishing CorporationVLSI DesignVolume 2014 Article ID 825183 15 pageshttpdxdoiorg1011552014825183
2 VLSI Design
In [2] receivers were designed for a 2 times 2 antenna systemand for quadrature phase shift keying (QPSK) modulationand quadrature amplitude modulation (16-QAM and 64-QAM) Though successive interference cancellation (SIC)receiver meets the timing requirements in the LTE systemit is complex and the K-best list sphere detector (K-LSD)receiver has high latency In [3] field programmable gatearray (FPGA) and application specific integrated circuit(ASIC) implementations of receivers based on the linearminimummean-square error (LMMSE) the K-LSD iterativesuccessive interference cancellation (SIC) detector and theiterative K-LSD algorithms are carried out for spatial multi-plexing based LTE-A system The SIC algorithm is found toperformworse than the K-LSDwhen theMIMO channels arehighly correlated while the performance difference dimin-ishes when the correlation decreases The ASIC receiversare designed to meet the decoding throughput requirementsin LTE and the K-LSD is found to be the most complexreceiver although it gives the best reliable data transmissionthroughput It is shown that the receiver architecture whichcould be reconfigured to use a simple or a more complexdetector as the channel conditions change would achieve thebest performancewhile consuming the least amount of powerin the receiver FPGA implementation of MIMO detectorbased on two typical sphere decoding algorithms namely theViterbo-Boutros (VB) algorithm and the Schnorr-Euchner(SE) algorithm is carried out in [4] In this implementationmethod three levels of parallelism are explored to improvethe decoding rate the concurrent execution of the channelmatrix preprocessing on an embedded processor and thedecoding functions on customized hardware modules theparallel decoding of realimaginary parts for complex con-stellation and the concurrent execution of multiple stepsduring the closest lattice point search The implementationof low-complexity codebook searching engine is proposedto support both LTE and LTE-A operations [5] In [6]VLSI implementation of a low-complexity multiple inputmultiple output (MIMO) symbol detector based on a novelMIMOdetection algorithm calledmodified fixed-complexitysoft-output (MFCSO) detection is presented It includes amicrocode-controlled channel preprocessing unit separatechannel memory and a pipelined detection unit MATLAB-based downlink physical-layer simulator for LTE only forresearch applications is presented [7] In [8] maximumlikelihood- (ML-) based receiver structures are developedfor decoding the downlink control channels PCFICH andPHICH in LTE wireless standard and the performance ofthe receivers has been analyzed for various configurationsThe analytical results were validated against computer sim-ulations but hardware implementation of the structures wasnot coded or synthesized In [9] direct implementation ofreceive algorithms was carried out in FPGA for downlinkcontrol channels in LTE However most of these workseither propose architectures for FPGA implementation oranalyze the performance of various receiver structures in ageneralized manner The objective of this paper is to proposenovel architectures for FPGA implementation of transmit andreceive processing of downlink PCFICH channel in LTE-Astandard in particular
Table 1 CFI 32-bit block code
CFI ⟨11988731 119887
0⟩
1 01101101101101101101101101101101
2 10110110110110110110110110110110
3 11011011011011011011011011011011
4 00000000000000000000000000000000
11 Transmit and Receive Processing of PCFICH In PCFICHthe control format indicator (CFI) contains a 32-bit codeword that represents the value of CFI as 1 2 3 or 4 The CFIinforms the UE about the number of OFDM symbols usedfor the transmission of PDCCH information in a subframeThe 32-bit code word corresponding to the value of CFIis scrambled and QPSK modulated The resultant 16 QPSKcomplex symbols are mapped to the resource elements of thefirst OFDM symbol of every subframe after layer mappingand precoding to obtain transmit diversity when two ormore antenna ports are used at eNodeB [10] The 32-bit codewords for the four possible values of CFI are given in Table 1A general block diagram of the transmitter and receiverprocessing of PCFICH is shown in Figure 1
The OFDM signal is transmitted through a frequencyselective fading channel It is assumed that the number ofreceive antenna ports at UE is 119870 At each receive antennaport of theUE resource-element demapping follows the cyclicprefix removal and fast fourier transformation (FFT) The16times1 receive signal vector at each antenna port is equalized infrequency domain at each subcarrier using the corresponding16 times 1 channel frequency response vector The outputs offrequency domain equalizer from each antenna port aresummed up The resultant 16 times 1 complex vector is appliedto the maximum likelihood (ML) detector for detecting theCFI value The objective of this paper is to synthesize andimplement the receiver architecture for PCFICH
The paper is structured as follows Section 2 explains thesystem model and basic implementation architectures forsingle input single output (SISO) and single input multipleoutput (SIMO) configurations The system model and basicimplementation architecture for multiple input single output(MISO) and multiple input multiple output (MIMO) con-figurations are described in Sections 3 and 4 respectivelyThe proposed implementation architectures using foldingand superscalar methods are given in Section 5 for SISOSIMO MISO and MIMO configurations Section 6 analyzesthe performance of the proposed architectures and Section 7concludes the paper with remarks on future work
2 System Model and ImplementationArchitecture for SISO and SIMOConfigurations
The received signal model for SISO configuration of PCFICHis given by
y = h ∘ d (119898) + w (1)
VLSI Design 3
Maximum likelihood detector
CFI (1234)
Detected PCFICH information
signal vector
Mapping to 32 bits
Resource elementmapper
QPSK modulator
Baseband transmit processing at base station eNodeB
Receive processing at user equipment
OFDM symbol generator
Transmission channels
Input signal at antenna port 0
Input signal at Resource element demapper
Frequency domain equalization of each subcarrier
received signal vector
response vectorvectors for CFI 1 2 3 4
Resource element demapper
Frequency domain equalization of each subcarrier
received signal vector
Layermapping Precoding
Layers
Transmit antenna ports
middotmiddotmiddot
16 times 1
16 times 1 channel frequency
middotmiddotmiddot
middotmiddotmiddot
response vector16 times 1 channel frequency
16 times 1 precomputed data
sum
16 times 1 transmit
d(1) d(2) d(3) d(4)antenna port K minus 1
middot middot middot
16 times 1
Figure 1 Block diagram of transmitter and receiver processing
Precomputed data vectors
Detected CFI
RPB
CFI-1
RPB
CFI-2
RPB
CFI-3
RPB
CFI-4
CFI detector
d(1)
d(2)
d(3)
d(4)
16 times 1 channel frequency response vector h16 times 1 received signal vector y
r1
r2
r3
r4
Figure 2 Basic architecture for SISO configuration
where y = [1199100 1199101 119910
15]119879 is a 16 times 1 received signal
vector h = [ℎ0 ℎ1 ℎ
15]119879 is a 16 times 1 channel frequency
response vector d(119898) = [119889(119898)
0 119889(119898)
1 119889
(119898)
15]119879
is a 16 times 1complex QPSK symbol vector corresponding to CFI value
from the set 1 2 3 4 ldquo∘rdquo represents the element by elementmultiplication and w is a 16 times 1 additive white noise vectorand its elements are zero mean Gaussian random numberswith unit variance The objective is to detect the value ofCFI from the received signal vector y assuming the channelfrequency response vector h to be known Using maximumlikelihood (ML) principle CFI is detected as
CFI = argmin119898
10038171003817100381710038171003817y minus h ∘ d(119898)10038171003817100381710038171003817
2
(2)
Figure 2 shows the basic architecture for estimating CFIusing (2) in SISO configurationThe received signal vector yand the channel frequency response vector h are provided asinput to the four receiver processing blocks (RPB) along withprecomputed data vectorsd(1)d(2)d(3) andd(4)The internaldiagram for RPB CFI-1 is shown in Figure 3 It computesthe expression y minus h ∘ d(1)
2
assuming the CFI = 1 In RPB-m the precomputed data vector d(119898) is multiplied elementby element with the channel frequency response vector Theresultant (16 times 1) vector is subtracted from the (16 times 1)received signal vector y The sum of squared magnitude ofeach element in the resultant vector is the output of RPB
The inputs to the CFI detector are the 16-bit outputs ofRPBs 119903
1 1199032 1199033 and 119903
4 The CFI detector determines which
RPB output has minimum value The internal diagram forCFI detector circuit which has 4 comparator modules (CM)is shown in Figure 4 In CM-1 input 119903
2and onersquos complement
4 VLSI Design
d(1)0
d(1)1
d(1)15
minus
minus
minus
16 times 1 channel frequency
16 times 1 received signal vector
Output r1
response vector
[middot]2
[middot]2
[middot]2
h0 y0h1 middot middot middot h15 y1 middot middot middot y15
Re
Im
Re
Im
Re
Im
Precomputed data vectors
Figure 3 Internal architecture of receiver processing block (RPB CFI-1)
CFI
Sum
Sum
Carry
Sum
Carry
Sum
00011011
Generate
Multiplexer control input
CM-1
CM-3
CM-2
CM-4
16
16
1616
16
16
16
16
r1
r2
r3
r4
Carry1
Carry2
Cr1
Cr2
Cr3
Cr4
Sr1
Sr2
Sr3
Sr4
CFI = 1
CFI = 2
CFI = 3
CFI = 4
Figure 4 Internal architecture of CFI detector
of input 1199031are added If carry is generated then 119903
1is less than
1199032 The outputs Cr
1and Sr
1of the CM-1 are defined as
C1199031=
1199031
if carry1= ldquo1rdquo (119903
1lt 1199032)
0 if carry1= ldquo0rdquo (119903
1gt 1199032)
S1199031=
1199032
if carry1= ldquo0rdquo (119903
1gt 1199032)
0 if carry1= ldquo1rdquo (119903
1lt 1199032)
(3)
InCM-2 input 1199034and onersquos complement of input 119903
3are added
If carry is generated then 1199033is less than 119903
4 The outputs Cr
2
and Sr2of CM-2 are defined as
C1199032=
1199033
if carry2= ldquo1rdquo (119903
3lt 1199034)
0 if carry2= ldquo0rdquo (119903
3gt 1199034)
S1199032=
1199034
if carry2= ldquo0rdquo (119903
3gt 1199034)
0 if carry2= ldquo1rdquo (119903
3lt 1199034)
(4)
VLSI Design 5
Antenna ldquo0rdquo Antenna ldquo1rdquo
Precomputed data vectorsRPB
RPB
RPB
RPB
RPB
RPB
RPB
RPB
CFI detector Detected CFI
Precomputed data vectors
16 times 1 channel frequencyresponse vector h(0)16 times 1 received signalvector y (0)
16 times 1 channel frequencyresponse vector h(1)16 times 1 received signalvector y (1)
r(0)1
r(0)2
r(0)3
r(0)4
r1 r2 r3 r4
CFI(1)1
CFI(1)2
CFI(1)3
CFI(1)4
d(1)
d(2)
d(3)
d(4)
r(1)1
r(1)2
r(1)3
r(1)4
CFI(0)1
CFI(0)2
CFI(0)3
CFI(0)4
d(1)
d(2)
d(3)
d(4)
Figure 5 Basic architecture for SIMO configuration
A0
A1
OFDM subcarriers
d15 d14 d13 d12 d11 d10 d9 d8 d7 d6 d5 d4 d3 d2 d1 d0
dlowast14 minusdlowast15 dlowast12 minusdlowast13 dlowast10 minusdlowast11 dlowast8 minusdlowast9 dlowast6 minusdlowast7 dlowast4 minusdlowast5 dlowast2 minusdlowast3 dlowast0 minusdlowast1
Figure 6 Subcarrier mapping for SFBC for 2-antenna system
The multiplexer control input is activated based on theoutputs from CM-3 and CM-4 One of the four outputs Cr
3
Sr3 Cr4 and Sr
4would be ldquo1rdquo based on the minimum value
of four inputs 1199031 1199032 1199033 and 119903
4 respectively Based on this 00
01 10 or 11 in the multiplexer control unit would be activatedto obtain the detected CFI value
In SIMO the 16times1 receive signal vector at the 119896th receiveantenna is modeled as
y(119896) = h(119896) ∘ d(119898) + w119896 119896 = 0 1 2 119870 minus 1 (5)
where ldquo119870rdquo represents the number of receive antennas at UEh(119896) is 16 times 1 channel frequency response vector between thetransmit antenna and 119896th receive antenna and w
119896is 16 times 1
noise vector at 119896th receive antenna Now the objective is todetect the value ofCFI from the received signal vectors at eachreceive antenna assuming the channel frequency responsevectors at each receive antenna are knownThemaximal ratiocombining is carried out at the receiver Using maximumlikelihood (ML) principle CFI is estimated as [9]
CFI = min119898=1234
119870
sum
119896=1
10038171003817100381710038171003817y(119896) minus (h(119896) ∘ d(119898))10038171003817100381710038171003817
2
(6)
The basic architecture for estimating CFI using (6) in1 times 2 SIMO configuration shown in Figure 5 is similar to thebasic architecture of SISO configuration The received signalvector y(119896) and the channel frequency response vector h(119896)are provided as input to the four receiver processing blocks(RPB-CFI(119896)
119898) at 119896th receive antenna along with precomputed
data vectors d(1) d(2) d(3) and d(4)The outputs from themthRPB at 0th receive antenna 119903(0)
119898and 1st receive antenna 119903(1)
119898are
added to get themth input 119903119898of the CFI detector circuit
3 System Model and ImplementationArchitecture for MISO Configuration
In MISO and MIMO configurations space frequency blockcode (SFBC) based layer mapping and precoding are carriedout to obtain transmit diversity when two or more antennaports are used at eNodeB as per the 3GPP LTE wirelessstandard [1 11] It is assumed that 2 antenna ports are usedat eNodeB The 16 times 1 complex symbol vector output ofthe modulation mapper is applied to the layer mapper The8 times 1 symbol vectors at layer 0 and layer 1 are given by[1198890 1198892 1198894 11988961198898 11988910 11988912 and 119889
14] and [119889
1 1198893 1198895 1198897
1198899 11988911 11988913 and 119889
15] The precoding is carried out using
6 VLSI Design
PB-4
Detected CFI
PB-1
PB-2
PB-3
RDB-1
CFI detector
RDB
RDB-2
RDB-3
RDB-4
16 times 1 channel frequencyresponse vector
16 times 1 received signalvector y
16 times 1 precomputed signalvector s1
16 times 1 precomputed signalvector s2
16 times 1 precomputed signalvector s3
16 times 1 precomputed signalvector s4
h(0)h(1)
z
z(1)
z(2)
z(3)
z(4)
r1
r2
r3r4
Figure 7 Proposed MISO receiver architecture for PCFICH
16 times 1 channel frequency response vectors h(0) and h(1)
16 times 1 received signalvector y
h(0)lowast0 h(0)1 middot middot middot h(0)lowast14 h(0)15 h(1)lowast0 h(1)1 middot middot middot h(1)lowast14 h(1)15
C
C
y0Ylowast1
y4
Ylowast15
z0
z1
z14
z15
Figure 8 Internal architecture of MISO receiver decoding block (RDB)
the SFBC in the LTE-A standard The precoder output atantenna port 0 (119860
0) and antenna port 1 (119860
1) is shown in
Figure 6The notation ldquolowastrdquo represents the complex conjugate of the
symbol Basically in precoding a symbol 1198890from layer 0 and
a symbol 1198891from layer 1 are encoded such that the antenna
output is formulated using the orthogonal matrix given by
119860 = [11988901198891
minus119889lowast
1119889lowast
0
] (7)
VLSI Design 7
Precomputed data
Output vector from RDB1
Im
Re
Im
Re
Im
Re
Im
Re
[middot]2
[middot]2
[middot]2
[middot]2 Output r1
minus
minus
minus
minus
z(1)0
z(1)1
z(1)14
z(1)15
z0 z15z1 middot middot middot z14
Figure 9 Internal architecture of MISO processing block (PB-1)
16 times 1 channel frequency response vectorsh(00)h(01)h(10)h(11)
16 times 1 received signalvector y (0)
y (1)
16 times 1 precomputedsignal vector s(0)1
s(1)1
16 times 1 precomputedsignal vector s(0)2
s(1)2
16 times 1 precomputedsignal vector s(0)3
s(1)3
16 times 1 precomputedsignal vector s(0)4
s(1)4
r1
r2
r3
r4
z
z
(1)
z(2)
z(3)
z(4)
RDBM
RDBM-1
RDBM-2
RDBM-3
RDBM-4
PB-1
PB-2
PB-3
PB-4
CFIdetector
Detected CFI
Figure 10 Proposed MIMO receiver architecture for PCFICH
8 VLSI Design
16 times 1 receivedsignal vector
Y(0)0
Y(0)1
Y(1)lowast0
Y(1)lowast1
Y(0)14
Y(0)15
Y(1)lowast14
Y(1)lowast15
h(00)lowast0 minush(00)1 h(00)lowast14 minush(00)15 h(01)0 h(01)lowast15 h(10)0 minush(10)lowast1 h(10)14 minush(10)lowast15 minush(11)lowast0 h(11)1 minush(11)lowast14 h(11)15
Z0
Z1
Z14
Z15C
C
16 times 1 channel frequency response vectors h(00) h(01) h(10) and h(11)
h(01)lowast1 middot middot middot h(01)
14
Figure 11 Internal architecture receiver decoding block-MIMO (RDBM)
Re(y )
Im(y )
Re(h)
Im(h)
e
f
Figure 12 Multiplicands rearrangement for a single complexmultiplication block
This is repeated for all the 8 symbols in layer 0 andlayer 1 Equation (7) defines the transmission format withthe row index indicating the antenna port number and thecolumn index indicating the subcarrier index In 2 times 1MISOconfiguration the receive signals at 119894th and (119894+1)th subcarrierare given in matrix form as
[119910119894
119910lowast
119894+1
] = [ℎ(0)
119894minusℎ(1)
119894
ℎlowast(1)
119894+1ℎlowast(0)
119894+1
] [119889119894
119889lowast
119894+1
] + [119899119894
119899lowast
119894+1
]
for 119894 = 0 2 4 6 8 10 12 14
(8)
where ℎ(0)119894
represents the channel frequency response of 119894thsubcarrier between 0th transmit antenna port and receive
antenna 119889119894is data symbol at 119894th subcarrier and 119899
119894is the
noise at 119894th subcarrier at the receive antenna Equation (8)can simply be represented as
y119894= Heff119894d119894 + n
119894 for 119894 = 0 2 4 6 8 10 12 14 (9)
where y119894is 2 times 1 receive signal vector Heff119894 is the 2 times 2
channel matrix d119894is 2 times 1 complex signal vector and n
119894is
2 times 1 noise vector The objective is to detect the elements119889119894and 119889
119894+1of the data vector d
119894 Assuming that the elements
of channel frequency response matrix Heff119894 are perfectlyknown at the receiver the decoder output vector z
119894is given
byz119894= H119867eff119894 y119894 for 119894 = 0 2 4 6 8 10 12 14 (10)
whereH119867eff119894 is theHermitian of the 2times2 channel transmissionmatrix Equation (10) is expanded as
[119911119894
119911lowast
119894+1
] = [ℎlowast(0)
119894ℎ(1)
119894+1
minusℎlowast(1)
119894ℎ(0)
119894+1
] [119910119894
119910lowast
119894+1
]
for 119894 = 0 2 4 6 8 10 12 14(11)
The elements of decoder output are calculated as
119911119894= ℎlowast(0)
119894119910119894+ ℎ(1)
119894+1119910lowast
119894+1 for 119894 = 0 2 4 6 8 10 12 14
119911lowast
i+1 = minusℎlowast(1)
119894119910119894+ ℎ(0)
119894+1119910lowast
119894+1 for 119894 = 0 2 4 6 8 10 12 14
(12)
The PCFICH receive architecture for 2 times 1MISO config-uration is shown in Figure 7 Receiver decoding block (RDB)
VLSI Design 9
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14 2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15Folded unit
DRei
Imi
Rei
Rei
Rei+1
Rei+1
Imi
Imi+1
Imi Im+1
r1
Channel frequency response hi
Received signalvector y i
Precomputeddata si
minus
2l + 1 3 5 7 9 11 13 15
Figure 13 Illustration of folded architecture of RPB in SISO and SIMO
gets the 16 times 1 received signal vector y and computes thedecoder output vector using (10) assuming that the channelfrequency response vectors h(0) and h(1) are known Thedetailed internal architecture of RDBM is shown in Figure 11The decoder output vectors z
119894 119894 = 0 2 4 14 are stacked as
16 times 1 vector z = [1199111198790 119911119879
2 119911
119879
14]119879 The 16 times 1 precomputed
data vectors forCFI= 1234 are represented as s1 s2 s3and s4
respectivelyThe detailed structure of receiver decoding blocks (RDB)
is shown in Figure 8 The output vectors z(1) z(2) z(3) z(4)from RDB-1 to RDB-4 are fed to the processing blocks (PB-1 to PB-4) The detailed architecture of PB-1 is shown inFigure 9 The sum of the square magnitude of the elementsof difference vector between decoded output vector z and theprecomputed data vector s
1is the output 119903
1of PB-1 Similarly
1199032 1199033 and 119903
4are computed for CFI = 2 3 and 4 using PB-
2 PB-3 and PB-4 respectively The processing block outputs1199031 1199032 1199033 and 119903
4are applied to the CFI determination circuit
shown in Figure 4 to detect the CFI value
4 System Model and ImplementationArchitecture for MIMO Configuration
In MIMO system the signals at 119894th and (119894 + 1)th subcarrier inthe receive array are given by
[[[[[[
[
119910(0)
119894
119910(0)
119894+1
119910(1)lowast
119894
119910(1)lowast
119894+1
]]]]]]
]
=
[[[
[
ℎ00
ℎ01ℎ10 ℎ11ℎlowast
01 minusℎlowast
00ℎlowast
11 minusℎlowast
10
]]]
]
[119889119894
119889lowast
119894+1
] +
[[[[[[[[
[
119899(0)
119894
119899(0)lowast
119894+1
119899(1)
119894
119899(1)lowast
119894+1
]]]]]]]]
]
for 119894 = 0 2 4 6 8 10 12 14(13)
where ℎ119886119887
represents the channel frequency response vectorbetween 119887th transmit antenna and 119886th receive antenna and119899(119895)
119894represents the noise in 119894th subcarrier in 119895th receive
antenna In vector form it is written as
y119894= Heff119894d119894 + n
119894 for 119894 = 0 2 4 6 8 10 12 14 (14)
where y119894is 4times1 receive signal vectorHeff119894 is the 4times2 channel
frequency response vector at 119894th and (119894 + 1)th subcarrier d119894
is 2 times 1 data vector at 119894th and (119894 + 1)th subcarrier and n119894is
4 times 1 noise vector The objective is to detect the elements 119889119894
and 119889119894+1
of the data vector d119894 Assuming that the elements of
channel frequency response matrixHeff119894 are perfectly knownat the receiver the decoder output vector z is given by
z119894= H119867eff119894 y119894 for 119894 = 0 2 4 6 8 10 12 14 (15)
whereH119867eff119894 is theHermitian of the 4times2 channel transmissionmatrix This can be expanded as
[119911119894
119911lowast
+1
] = [ℎlowast
00minusℎlowast
10
ℎlowast
01minusℎlowast
11
ℎ01
ℎ11
minusℎ00minusℎ10
]
[[[[[
[
119910(0)
119894
119910(0)
119894+1
119910(1)lowast
119894
119910(1)lowast
119894+1
]]]]]
]
for 119894 = 0 2 4 6 8 10 12 14
(16)
The decoder outputs are given by
119911119894= ℎlowast
00119910(0)
119894minus ℎlowast
10119910(0)
119894+1+ ℎ01119910(1)lowast
119894+ ℎ11119910(1)lowast
119894+1
for 119894 = 0 2 4 6 8 10 12 14
119911lowast
119894+1= ℎlowast
01119910(0)
119894minus ℎlowast
11119910(0)
119894+1minusℎ00119910(1)lowast
119894minusℎ10119910(1)lowast
119894+1
for 119894 = 0 2 4 6 8 10 12 14
(17)
10 VLSI Design
Channel frequencyresponse vector
Channel frequencyresponse vector
Received signalvector
Received signalvector
Precomputeddata vector z(k)i
from RDB-1using s1i
y i
y i
y i+1
y i+1
hi+1 hi+1
hihi
zi
Folded RDB unit
Folded PB unit
Conjugate
DRei
Imi
Rei
Rei
Rei+1
Rei+1
Imi
Imi
Imi+1
Im+1
minus
r1
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 1 3 5 15
2l + 1 3 5 15
2l + 1 3 5 15
2l + 1 3 5 15
2l + 1 3 5 15
2l + 0 2 4 14
2l + 0 2 4 14
2l + 0 2 4 14
2l + 0 2 4 14
2l + 0 2 4 14
Figure 14 Illustration of proposed architecture for RDB and PB in MISO and MIMO Note receiver decoding block (RDB) in MISO istermed as RDBM in MIMO
The PCFICH receiver architecture of 2 times 2MIMO configura-tions is shown in Figure 10
Receiver decoding block (RDBM) gets the 16times1 receivedsignal vector y and computes the decoder output vector using(14) assuming that the channel frequency response vectorsh(00) h(01) h(10) and h(11) are knownThe 16times1 precomputeddata vectors for CFI = 1 2 3 and 4 are represented as s(0)
1
s(0)2 s(0)3 and s(0)
4 respectively for antenna 0 and as s(1)
1
s(1)2 s(1)3 and s(1)
4 respectively for antenna 1 The received
signal vectors y(0)119894
and y(1)119894multiply with the four channel
estimation vectors to give decoded output vector z that is sentto the processing block (PB) which is shown in Figure 9 Thedecoder outputs 119911
119894 119894 = 0 2 4 14 are stacked as 16 times 1
vector z = [1199111198790 119911119879
2 119911
119879
14]119879 Similarly RDBM1 gives output
vector z(1) using the precomputed data vectors y(0)1
and y(1)1
and channel estimation vectors The architecture of PBs and
VLSI Design 11
clk
T1 T2 T3 T4
RPB
RPB
RPB
RPB
d
d
dd
d
d
r4
r3
r2
r1
Figure 15 Illustration of superscalar method for SISO and SIMO(with no complex multiplications and operating from 119879
1to 1198794)
the CFI detection architecture are similar to that of theMISOsystem The sum of the squared magnitude of the differencebetween each element in the decoded output vector z and itsprecomputed data in the vector z(1) is the output 119903
1of PB1
Similarly 1199032 1199033 and 119903
4are computed for other CFIThe 119903
1 1199032
1199033 and 119903
4are compared to determine the minimum value by
the CFI detector shown in Figure 4
5 PCFICH Receiver Implementation Methods
The PCFICH receiver architectures can be implementeddirectly based on the basic architectures developed inSections 3 and 4 But in order to effectively utilize theresources in FPGA the implementation of basic architecturesis done using the modified novel architectures based on VLSIDSP techniques namely folding and superscalar processingapproach
51 Direct Implementation with Multiplicands RearrangedMethod In the receiver architecture for SISO and SIMOthe 16 times 1 received signal vector is directly subtracted fromthe precomputed data vector for a given CFI This requireslesser number of multipliers and adders when comparedto MISO and MIMO In MISO and MIMO configurationscomplex multiplications are necessary for the multiplicationof H119867119890119891119891
with the received signal vector It increases thenumber of multiplications in the CFI detection processHence optimum rearrangement of the terms is carried outto minimize the number of multiplications Further theintermediate products are reused in the calculation of real andimaginary parts Consider the multiplication of two complexnumbers Reℎ+119895 Imℎ and Re119910+119895 Im119910The output realpart (119890) and imaginary part (119891) terms are given by
119890 = Re ℎRe 119910 minus Im ℎ Im 119910 119891 = Re ℎ Im 119910 + Im ℎRe 119910
(18)
It requires four multiplications and two additions To reducethe number of multiplications the terms in (18) are rear-ranged as
119890 = [Re 119910 minus Im 119910] [Re ℎ minus Im ℎ]
minus Re 119910 Im ℎ + Im 119910Re ℎ
119891 = Re 119910 Im ℎ + Im 119910Re ℎ
(19)
Since the terms Re119910 Imℎ and Im119910Reℎ are in (19) itrequires only three multiplications but five additions Thiskind of rearrangement of themultiplicands is employed in theprocessing blocks at the cost of increased additions as shownin Figure 12
52 Proposed Architecture Using Folding Method Foldingarchitecture systematically determines the control circuits inDSP architectures where multiple algorithmic operations aretime-multiplexed to a single functional unit [12] It is usedfor synthesis of DSP architectures that can be operated atsingle or multiple clocks It reduces the number of hardwarefunctional units (FUs) by a factor of 119873 at the expense ofincreased computation time
The folding architecture is introduced in the receiverstructure of RPB in SISO and SIMO configurations and ofRPB and PB in MISO and MIMO configurations as shownin Figures 13 and 14 respectively For SISO RPB there are16 hardware lines to calculate the value of 119903
1each requiring
twomultipliers Hence the number of multipliers used in oneRPB is 32 In order to reduce the number of multipliers andadders folding architecture is proposed This architectureuses only two multipliers and performs the operation of asingle hardware line 16 times in sequential wayThe differencebetween the product of channel frequency response vectorwith the precomputed data vector and the received signalvector is stored in registers At a time one resultant signalpair involves in computation using two multipliers to get thevalue of 119911
119894 Four switches operating in system clock speed
are involved in the architecture where two switches are usedto pass the real part of the signal to one multiplier whilethe other two switches are used to pass the imaginary partof the signal to another multiplier The multipliers pass theproducts to the first adder for 119911
119894 The output of the first adder
is passed to the second adder with a delay to accumulate thevalues 119911
0to 11991115into a register in subsequent clock cyclesThis
process requires 16 clock cycles and the CFI is detected at the17th clock cycle Though it takes longer time for the clockcycles to get the output the resources are minimized in thismethod
The folded architecture of decoding block of MISO andMIMO involving complex multiplication of the channelfrequency response vector and the receive signal vector isshown in Figure 14 There are 2 complex multiplications andone addition in each of the 16 hardware lines Hence totalresource elements used are 32 complex multiplications and16 additions The folded architecture which reduces to just2 complex multiplications and one addition requires fiveswitches Two switches are used to pass the first elementof the receive signal vector and its corresponding channel
12 VLSI Design
clk
T1 T2 T3 T4 T5 T6
PB
PB
PB
PB
d
d
dd
d
d
dr4
r3
r2
r1
From other hardware lines
From other hardware lines
Complexmultiplications(real part) of a
hardware line from
Complex
multiplications
(imaginary part) of
a hardware line from RDB
RDB
Figure 16 Illustration of superscalar method for MISO and MIMO (with complex multiplications and operating from 1198791to 1198796)
Figure 17 Simulation waveform for PCFICH receiver
frequency response vector to one multiplier and other twoswitches are used to pass the second element of receive signalvector and its channel frequency response vector to anothermultiplierThese four switches operate in system clock speedThe multipliers pass their products to the adder through thefifth switch before moving to PB This process requires 16clock cycles and the CFI is detected at the 17th clock cycle
53 Proposed Architecture Using Superscalar Method Super-scalar approach is another low resource utilizing VLSI DSPtechnique The superscalar processing method includes par-allel processing and pipelining strategies In this case paralleloperation for the 16 pairs of hardware lines is arrangedwith pipelining of the subtraction and square magnitudeoperations for each CFI SISO configuration does not havecomplex multiplications and it has only square magnitudeoperations Hence the RPB of SISOhas 16 hardware lines eachhaving 2 multipliers which results to a total of 32 multipliers
This setup requires more hardware resources than foldingbut the output is obtained at every 4th clock cycle as shownin Figure 15 SIMO configuration which involves two receiveantenna signal processing requires twice the number ofmultiplications as that of SISO and the output is obtainedat every 4th clock cycle The block ldquo119889rdquo represents the delayelement introduced to buffer the values and produce theoutputs at the same time instant
For MISO configuration the RDB has 16 hardware lineswith 2 complex multiplications each Since each complexmultiplication requires four real multiplications RDB can beexecuted in two clock cycles by reusing 64 multipliers 32multipliers are required for PB taking 4 clock cyclesHence 96multipliers are required in MISO configuration For MIMOconfiguration the RDB requires reuse of 128 multiplierstaking 2 clock cycles and an additional 32 multipliers arerequired for the PB taking 4 clock cycles Hence 160 multi-pliers are required for MISO configuration and the output isobtained at every 6th clock cycle as shown in the Figure 16The block ldquo119889rdquo represents the delay element introduced tobuffer the values and produce the outputs at the same timeinstant
6 Results and Discussion
The proposed receiver architectures for PCFICH in SISOSIMO MISO and MIMO configurations are implementedusing the Xilinx PlanAhead tool on the Virtex-6 FPGAxc6vlx240tff1156-1 device board The target device Virtex-6has only 768 DSP elements Table 2 shows the performanceof the proposed architectures using folding and superscalarmethods being compared with the direct implementationof PCFICH receiver in terms of resource utilisation speedand power for all the SISO SIMO MISO and MIMO
VLSI Design 13
clk clk
clk
clk
clk
clkDiversity[10]
Diversity[1] GND 1 o wide mux 1Address[10] Output[30]
RTL rom 2 4
3
2
1
0
e
e
e
e
Div0
Div1
Div2
Div3
SISO
SIMO
MISO
MIMO
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310] Op
Choose
Count CFI11[490]Count CFI12[490]Count CFI13[490]Count CFI14[490]Count CFI21[490]Count CFI22[490]Count CFI23[490]Count CFI24[490]Count CFI31[490]Count CFI32[490]Count CFI33[490]Count CFI34[490]Count CFI41[490]Count CFI42[490]Count CFI43[490]Count CFI44[490]
e[30]Out1[310]Out2[310]Out3[310]Out4[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Figure 18 RTL schematic for combined PCFICH architecture with diversity
Table 2 Performance of proposed architectures based on folding and superscalar method
Diversity Method Multipliers Adders DSP elements LUTs Total delay (ns) Speed (MHz) Dynamic power
SISODirect 125 245 125 5479 39081 25587 mdash
Folding (16T) 8 81 16 2561 114448 8737 84Superscalar (4T) 32 182 66 2731 69333 14423 93
SIMODirect 250 494 250 10942 40278 24827 mdash
Folding (16T) 16 165 32 5117 130704 7651 159Superscalar (4T) 64 318 130 5623 53873 18562 170
MISODirect 224 580 594 14880 43023 23243 mdash
Folding (16T) 14 101 39 3950 255264 3917 173Superscalar (6T) 96 338 196 6156 80495 12423 208
MIMODirect 320 844 675 17380 56962 17555 mdash
Folding (16T) 20 155 46 4395 256528 3898 374Superscalar (6T) 160 465 262 6932 85822 11652 382
Figure 19 Resource utilization graph for generalized architecture
configurations The proposed architectures based on foldingand superscalar processing methods require less number ofresource elements
In the folding approach resource utilization is less com-pared to the direct and superscalar approach at the cost ofreduced speed of operation but it is suitable for real-time
frame timings When the LTE-A system operates at 14MHzbandwidth maximum time available for detection at eachsubcarrier is 992063 ns since each slot of 05ms duration in aframe (10ms radio frame duration) consists of 7 OFDM sym-bols and there are 72 subcarriers along one OFDM symbolThe total delay in the receiver architecture is within the LTEtime constraint The dynamic power consumption is less inthe folding method compared to superscalar method due todecrease in block arithmetic Direct method does not requiresequential execution and clocking and hence total powerconsumption is due to static power Hence it is inferredthat the proposed architecture based on folding method ismore suitable for CFI detection The simulation waveformof the proposed architecture based on folding method isshown in Figure 17 for SISO SIMO MISO and MIMOconfigurations
A general architecture based on folding method whichoperates at all the four SISO SIMO MISO and MIMOconfigurations has also been developed In this architecture
14 VLSI Design
Table 3 Resource requirements of proposed architecture using folding method
Parameter Multipliers Adders Minimum clock period (ns) Total delay (ns) Speed (MHz) Total dynamic power (mW)Value 58 518 16033 256528 3898 1019
Figure 20 Implemented device in FPGA editor
a control variable ldquo119890rdquo is used to enable or disable thesubmodules SISO SIMO MISO or MIMO according tothe selection input ldquodiversityrdquo CFI is detected at every 17thclock cycle The synthesis results of a general architecturebased on folding show that it utilizes minimum resourcesin XC6VLX240TFF1156-1 Virtex 6 device (768 DSPs) Thisis summarized in Table 3 Dynamic power consumption isdue to internal switching contributed by the clock (246mW)logic (670mW) and the block arithmetic (103mW)
Figure 18 shows the RTL schematic of 4 diversity blocksldquodiv0rdquo ldquodiv1rdquo ldquodiv2rdquo and ldquodiv3rdquo corresponding to SISO SIMOMISO and MIMO controlled by wires named ldquo119890rdquo Powerconsumed includes both static power and dynamic power dueto internal switching
Figure 19 shows the resource utilization graph whichshows the percentage of registers lookup tables (LUTs) slicesDSP elements and buffers used
Figure 20 shows the implemented device in FPGA editorwith the implemented components and interconnectionsbetween the components configured into the FPGA device
7 Conclusion
In this paper low complexity low resource single or multi-antenna CFI detection at the receiver system has been pro-posed and analyzed using modelsim and implementation inthe Virtex-6 device in Xilinx PlanAhead tool In the receivercomputational complexity and the resource utilization areminimized by employing arithmetic operational rearrange-ment and suboptimal sequential DSP algorithm called thefolding approach The proposed architecture using folding
method complies with the LTE frame timing constraint inSISO SIMOMISO andMIMO configurations It is a suitablesolution for the area optimized hardware implementation ofreceiver structures for PCFICH In future a total hardwareaccommodating all the physical downlink control channelsof the 3GPP-LTE-A with low resource utilization could besynthesized and implemented
Conflict of Interests
The authors do not have direct financial relation with anycommercial identity mentioned in the paper or any otherconflict of interests
Acknowledgments
The authors wish to express their sincere thanks to All IndiaCouncil for Technical Education NewDelhi for the Grant todo the Project titled Design of Testbed for the Developmentof Optimized Architectures of MIMO Signal Processing(no 8023RIDRPS0391112) They are also thankful tothe Managements of Mepco Schlenk Engineering CollegeSivakasi and Thiagarajar College of Engineering Maduraifor their constant support and encouragement to carry outthis research work successfully
References
[1] 3GPP TS 36 211 Version 11 0 0 Release 11 (2012-10) EvolvedUniversal Terrestrial Radio Access (E-UTRA) Physical Chan-nels and Modulation 2012
[2] J Ketonen and M Juntti ldquoSIC and K-best LSD receiverimplementation for a MIMO-OFDM systemrdquo in Proceedings ofthe 16th European Signal Processing Conference (EUSIPCO rsquo08)Lausanne Switzerland August 2008
[3] J Ketonen M Juntti and J R Cavallaro ldquoPerformancemdashcomplexity comparison of receivers for a LTE MIMOOFDMsystemrdquo IEEE Transactions on Signal Processing vol 58 no 6pp 3360ndash3372 2010
[4] X Huang C Liang and J Ma ldquoSystem architecture andimplementation of MIMO sphere decoders on FPGArdquo IEEETransactions on Very Large Scale Integration Systems vol 16 no2 pp 188ndash196 2008
[5] Y Lin Y Chen C Chu C Zhan and A Wu ldquoDual-modelow-complexity codebook searching algorithm and VLSI archi-tecture for LTELTE-advanced systemsrdquo IEEE Transactions onSignal Processing vol 61 no 14 pp 3545ndash3562 2013
[6] DWu J Eilert R Asghar and D Liu ldquoVLSI implementation ofa fixed-complexity soft-output MIMO detector for high-speedwirelessrdquo EURASIP Journal on Wireless Communications andNetworking vol 2010 Article ID 893184 13 pages 2010
[7] C Mehlfuhrer M Wrulich J C Ikuno D Bosanska and MRupp ldquoSimulating the long term evolution physical layerrdquo in
VLSI Design 15
Proceedings of the 17th European Signal Processing Conference(EUSIPCO rsquo09) pp 1471ndash1478 Glasgow UK August 2009
[8] S J Thiruvengadam and L M A Jalloul ldquoPerformance anal-ysisof the 3GPP-LTE physical control channelsrdquo EURASIPJournalon Wireless Communications and Networking vol 2010Article ID 914934 10 pages 2010
[9] S M Alamouti ldquoA simple transmit diversity technique forwireless communicationsrdquo IEEE Journal on Selected Areas inCommunications vol 16 no 8 pp 1451ndash1458 1998
[10] S S A Abbas and S JThiruvengadam ldquoFPGA implementationof 3GPP-LTE-A physical downlink control channel using diver-sity techniquesrdquo International Journal of Wireless and MobileComputing vol 9 no 2 p 84 2013
[11] S Ahmadi LTE-Advanced A Practical Systems Approach toUnderstanding 3GPP LTE Releases 10 and 11 Radio AccessTechnologies Sassan Ahmadi Academic Press 2013
[12] K K Parhi VLSI Digital Signal Processing SystemsmdashDesign andImplementation Wiley-Interscience 1999
International Journal of
AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal ofEngineeringVolume 2014
Submit your manuscripts athttpwwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
2 VLSI Design
In [2] receivers were designed for a 2 times 2 antenna systemand for quadrature phase shift keying (QPSK) modulationand quadrature amplitude modulation (16-QAM and 64-QAM) Though successive interference cancellation (SIC)receiver meets the timing requirements in the LTE systemit is complex and the K-best list sphere detector (K-LSD)receiver has high latency In [3] field programmable gatearray (FPGA) and application specific integrated circuit(ASIC) implementations of receivers based on the linearminimummean-square error (LMMSE) the K-LSD iterativesuccessive interference cancellation (SIC) detector and theiterative K-LSD algorithms are carried out for spatial multi-plexing based LTE-A system The SIC algorithm is found toperformworse than the K-LSDwhen theMIMO channels arehighly correlated while the performance difference dimin-ishes when the correlation decreases The ASIC receiversare designed to meet the decoding throughput requirementsin LTE and the K-LSD is found to be the most complexreceiver although it gives the best reliable data transmissionthroughput It is shown that the receiver architecture whichcould be reconfigured to use a simple or a more complexdetector as the channel conditions change would achieve thebest performancewhile consuming the least amount of powerin the receiver FPGA implementation of MIMO detectorbased on two typical sphere decoding algorithms namely theViterbo-Boutros (VB) algorithm and the Schnorr-Euchner(SE) algorithm is carried out in [4] In this implementationmethod three levels of parallelism are explored to improvethe decoding rate the concurrent execution of the channelmatrix preprocessing on an embedded processor and thedecoding functions on customized hardware modules theparallel decoding of realimaginary parts for complex con-stellation and the concurrent execution of multiple stepsduring the closest lattice point search The implementationof low-complexity codebook searching engine is proposedto support both LTE and LTE-A operations [5] In [6]VLSI implementation of a low-complexity multiple inputmultiple output (MIMO) symbol detector based on a novelMIMOdetection algorithm calledmodified fixed-complexitysoft-output (MFCSO) detection is presented It includes amicrocode-controlled channel preprocessing unit separatechannel memory and a pipelined detection unit MATLAB-based downlink physical-layer simulator for LTE only forresearch applications is presented [7] In [8] maximumlikelihood- (ML-) based receiver structures are developedfor decoding the downlink control channels PCFICH andPHICH in LTE wireless standard and the performance ofthe receivers has been analyzed for various configurationsThe analytical results were validated against computer sim-ulations but hardware implementation of the structures wasnot coded or synthesized In [9] direct implementation ofreceive algorithms was carried out in FPGA for downlinkcontrol channels in LTE However most of these workseither propose architectures for FPGA implementation oranalyze the performance of various receiver structures in ageneralized manner The objective of this paper is to proposenovel architectures for FPGA implementation of transmit andreceive processing of downlink PCFICH channel in LTE-Astandard in particular
Table 1 CFI 32-bit block code
CFI ⟨11988731 119887
0⟩
1 01101101101101101101101101101101
2 10110110110110110110110110110110
3 11011011011011011011011011011011
4 00000000000000000000000000000000
11 Transmit and Receive Processing of PCFICH In PCFICHthe control format indicator (CFI) contains a 32-bit codeword that represents the value of CFI as 1 2 3 or 4 The CFIinforms the UE about the number of OFDM symbols usedfor the transmission of PDCCH information in a subframeThe 32-bit code word corresponding to the value of CFIis scrambled and QPSK modulated The resultant 16 QPSKcomplex symbols are mapped to the resource elements of thefirst OFDM symbol of every subframe after layer mappingand precoding to obtain transmit diversity when two ormore antenna ports are used at eNodeB [10] The 32-bit codewords for the four possible values of CFI are given in Table 1A general block diagram of the transmitter and receiverprocessing of PCFICH is shown in Figure 1
The OFDM signal is transmitted through a frequencyselective fading channel It is assumed that the number ofreceive antenna ports at UE is 119870 At each receive antennaport of theUE resource-element demapping follows the cyclicprefix removal and fast fourier transformation (FFT) The16times1 receive signal vector at each antenna port is equalized infrequency domain at each subcarrier using the corresponding16 times 1 channel frequency response vector The outputs offrequency domain equalizer from each antenna port aresummed up The resultant 16 times 1 complex vector is appliedto the maximum likelihood (ML) detector for detecting theCFI value The objective of this paper is to synthesize andimplement the receiver architecture for PCFICH
The paper is structured as follows Section 2 explains thesystem model and basic implementation architectures forsingle input single output (SISO) and single input multipleoutput (SIMO) configurations The system model and basicimplementation architecture for multiple input single output(MISO) and multiple input multiple output (MIMO) con-figurations are described in Sections 3 and 4 respectivelyThe proposed implementation architectures using foldingand superscalar methods are given in Section 5 for SISOSIMO MISO and MIMO configurations Section 6 analyzesthe performance of the proposed architectures and Section 7concludes the paper with remarks on future work
2 System Model and ImplementationArchitecture for SISO and SIMOConfigurations
The received signal model for SISO configuration of PCFICHis given by
y = h ∘ d (119898) + w (1)
VLSI Design 3
Maximum likelihood detector
CFI (1234)
Detected PCFICH information
signal vector
Mapping to 32 bits
Resource elementmapper
QPSK modulator
Baseband transmit processing at base station eNodeB
Receive processing at user equipment
OFDM symbol generator
Transmission channels
Input signal at antenna port 0
Input signal at Resource element demapper
Frequency domain equalization of each subcarrier
received signal vector
response vectorvectors for CFI 1 2 3 4
Resource element demapper
Frequency domain equalization of each subcarrier
received signal vector
Layermapping Precoding
Layers
Transmit antenna ports
middotmiddotmiddot
16 times 1
16 times 1 channel frequency
middotmiddotmiddot
middotmiddotmiddot
response vector16 times 1 channel frequency
16 times 1 precomputed data
sum
16 times 1 transmit
d(1) d(2) d(3) d(4)antenna port K minus 1
middot middot middot
16 times 1
Figure 1 Block diagram of transmitter and receiver processing
Precomputed data vectors
Detected CFI
RPB
CFI-1
RPB
CFI-2
RPB
CFI-3
RPB
CFI-4
CFI detector
d(1)
d(2)
d(3)
d(4)
16 times 1 channel frequency response vector h16 times 1 received signal vector y
r1
r2
r3
r4
Figure 2 Basic architecture for SISO configuration
where y = [1199100 1199101 119910
15]119879 is a 16 times 1 received signal
vector h = [ℎ0 ℎ1 ℎ
15]119879 is a 16 times 1 channel frequency
response vector d(119898) = [119889(119898)
0 119889(119898)
1 119889
(119898)
15]119879
is a 16 times 1complex QPSK symbol vector corresponding to CFI value
from the set 1 2 3 4 ldquo∘rdquo represents the element by elementmultiplication and w is a 16 times 1 additive white noise vectorand its elements are zero mean Gaussian random numberswith unit variance The objective is to detect the value ofCFI from the received signal vector y assuming the channelfrequency response vector h to be known Using maximumlikelihood (ML) principle CFI is detected as
CFI = argmin119898
10038171003817100381710038171003817y minus h ∘ d(119898)10038171003817100381710038171003817
2
(2)
Figure 2 shows the basic architecture for estimating CFIusing (2) in SISO configurationThe received signal vector yand the channel frequency response vector h are provided asinput to the four receiver processing blocks (RPB) along withprecomputed data vectorsd(1)d(2)d(3) andd(4)The internaldiagram for RPB CFI-1 is shown in Figure 3 It computesthe expression y minus h ∘ d(1)
2
assuming the CFI = 1 In RPB-m the precomputed data vector d(119898) is multiplied elementby element with the channel frequency response vector Theresultant (16 times 1) vector is subtracted from the (16 times 1)received signal vector y The sum of squared magnitude ofeach element in the resultant vector is the output of RPB
The inputs to the CFI detector are the 16-bit outputs ofRPBs 119903
1 1199032 1199033 and 119903
4 The CFI detector determines which
RPB output has minimum value The internal diagram forCFI detector circuit which has 4 comparator modules (CM)is shown in Figure 4 In CM-1 input 119903
2and onersquos complement
4 VLSI Design
d(1)0
d(1)1
d(1)15
minus
minus
minus
16 times 1 channel frequency
16 times 1 received signal vector
Output r1
response vector
[middot]2
[middot]2
[middot]2
h0 y0h1 middot middot middot h15 y1 middot middot middot y15
Re
Im
Re
Im
Re
Im
Precomputed data vectors
Figure 3 Internal architecture of receiver processing block (RPB CFI-1)
CFI
Sum
Sum
Carry
Sum
Carry
Sum
00011011
Generate
Multiplexer control input
CM-1
CM-3
CM-2
CM-4
16
16
1616
16
16
16
16
r1
r2
r3
r4
Carry1
Carry2
Cr1
Cr2
Cr3
Cr4
Sr1
Sr2
Sr3
Sr4
CFI = 1
CFI = 2
CFI = 3
CFI = 4
Figure 4 Internal architecture of CFI detector
of input 1199031are added If carry is generated then 119903
1is less than
1199032 The outputs Cr
1and Sr
1of the CM-1 are defined as
C1199031=
1199031
if carry1= ldquo1rdquo (119903
1lt 1199032)
0 if carry1= ldquo0rdquo (119903
1gt 1199032)
S1199031=
1199032
if carry1= ldquo0rdquo (119903
1gt 1199032)
0 if carry1= ldquo1rdquo (119903
1lt 1199032)
(3)
InCM-2 input 1199034and onersquos complement of input 119903
3are added
If carry is generated then 1199033is less than 119903
4 The outputs Cr
2
and Sr2of CM-2 are defined as
C1199032=
1199033
if carry2= ldquo1rdquo (119903
3lt 1199034)
0 if carry2= ldquo0rdquo (119903
3gt 1199034)
S1199032=
1199034
if carry2= ldquo0rdquo (119903
3gt 1199034)
0 if carry2= ldquo1rdquo (119903
3lt 1199034)
(4)
VLSI Design 5
Antenna ldquo0rdquo Antenna ldquo1rdquo
Precomputed data vectorsRPB
RPB
RPB
RPB
RPB
RPB
RPB
RPB
CFI detector Detected CFI
Precomputed data vectors
16 times 1 channel frequencyresponse vector h(0)16 times 1 received signalvector y (0)
16 times 1 channel frequencyresponse vector h(1)16 times 1 received signalvector y (1)
r(0)1
r(0)2
r(0)3
r(0)4
r1 r2 r3 r4
CFI(1)1
CFI(1)2
CFI(1)3
CFI(1)4
d(1)
d(2)
d(3)
d(4)
r(1)1
r(1)2
r(1)3
r(1)4
CFI(0)1
CFI(0)2
CFI(0)3
CFI(0)4
d(1)
d(2)
d(3)
d(4)
Figure 5 Basic architecture for SIMO configuration
A0
A1
OFDM subcarriers
d15 d14 d13 d12 d11 d10 d9 d8 d7 d6 d5 d4 d3 d2 d1 d0
dlowast14 minusdlowast15 dlowast12 minusdlowast13 dlowast10 minusdlowast11 dlowast8 minusdlowast9 dlowast6 minusdlowast7 dlowast4 minusdlowast5 dlowast2 minusdlowast3 dlowast0 minusdlowast1
Figure 6 Subcarrier mapping for SFBC for 2-antenna system
The multiplexer control input is activated based on theoutputs from CM-3 and CM-4 One of the four outputs Cr
3
Sr3 Cr4 and Sr
4would be ldquo1rdquo based on the minimum value
of four inputs 1199031 1199032 1199033 and 119903
4 respectively Based on this 00
01 10 or 11 in the multiplexer control unit would be activatedto obtain the detected CFI value
In SIMO the 16times1 receive signal vector at the 119896th receiveantenna is modeled as
y(119896) = h(119896) ∘ d(119898) + w119896 119896 = 0 1 2 119870 minus 1 (5)
where ldquo119870rdquo represents the number of receive antennas at UEh(119896) is 16 times 1 channel frequency response vector between thetransmit antenna and 119896th receive antenna and w
119896is 16 times 1
noise vector at 119896th receive antenna Now the objective is todetect the value ofCFI from the received signal vectors at eachreceive antenna assuming the channel frequency responsevectors at each receive antenna are knownThemaximal ratiocombining is carried out at the receiver Using maximumlikelihood (ML) principle CFI is estimated as [9]
CFI = min119898=1234
119870
sum
119896=1
10038171003817100381710038171003817y(119896) minus (h(119896) ∘ d(119898))10038171003817100381710038171003817
2
(6)
The basic architecture for estimating CFI using (6) in1 times 2 SIMO configuration shown in Figure 5 is similar to thebasic architecture of SISO configuration The received signalvector y(119896) and the channel frequency response vector h(119896)are provided as input to the four receiver processing blocks(RPB-CFI(119896)
119898) at 119896th receive antenna along with precomputed
data vectors d(1) d(2) d(3) and d(4)The outputs from themthRPB at 0th receive antenna 119903(0)
119898and 1st receive antenna 119903(1)
119898are
added to get themth input 119903119898of the CFI detector circuit
3 System Model and ImplementationArchitecture for MISO Configuration
In MISO and MIMO configurations space frequency blockcode (SFBC) based layer mapping and precoding are carriedout to obtain transmit diversity when two or more antennaports are used at eNodeB as per the 3GPP LTE wirelessstandard [1 11] It is assumed that 2 antenna ports are usedat eNodeB The 16 times 1 complex symbol vector output ofthe modulation mapper is applied to the layer mapper The8 times 1 symbol vectors at layer 0 and layer 1 are given by[1198890 1198892 1198894 11988961198898 11988910 11988912 and 119889
14] and [119889
1 1198893 1198895 1198897
1198899 11988911 11988913 and 119889
15] The precoding is carried out using
6 VLSI Design
PB-4
Detected CFI
PB-1
PB-2
PB-3
RDB-1
CFI detector
RDB
RDB-2
RDB-3
RDB-4
16 times 1 channel frequencyresponse vector
16 times 1 received signalvector y
16 times 1 precomputed signalvector s1
16 times 1 precomputed signalvector s2
16 times 1 precomputed signalvector s3
16 times 1 precomputed signalvector s4
h(0)h(1)
z
z(1)
z(2)
z(3)
z(4)
r1
r2
r3r4
Figure 7 Proposed MISO receiver architecture for PCFICH
16 times 1 channel frequency response vectors h(0) and h(1)
16 times 1 received signalvector y
h(0)lowast0 h(0)1 middot middot middot h(0)lowast14 h(0)15 h(1)lowast0 h(1)1 middot middot middot h(1)lowast14 h(1)15
C
C
y0Ylowast1
y4
Ylowast15
z0
z1
z14
z15
Figure 8 Internal architecture of MISO receiver decoding block (RDB)
the SFBC in the LTE-A standard The precoder output atantenna port 0 (119860
0) and antenna port 1 (119860
1) is shown in
Figure 6The notation ldquolowastrdquo represents the complex conjugate of the
symbol Basically in precoding a symbol 1198890from layer 0 and
a symbol 1198891from layer 1 are encoded such that the antenna
output is formulated using the orthogonal matrix given by
119860 = [11988901198891
minus119889lowast
1119889lowast
0
] (7)
VLSI Design 7
Precomputed data
Output vector from RDB1
Im
Re
Im
Re
Im
Re
Im
Re
[middot]2
[middot]2
[middot]2
[middot]2 Output r1
minus
minus
minus
minus
z(1)0
z(1)1
z(1)14
z(1)15
z0 z15z1 middot middot middot z14
Figure 9 Internal architecture of MISO processing block (PB-1)
16 times 1 channel frequency response vectorsh(00)h(01)h(10)h(11)
16 times 1 received signalvector y (0)
y (1)
16 times 1 precomputedsignal vector s(0)1
s(1)1
16 times 1 precomputedsignal vector s(0)2
s(1)2
16 times 1 precomputedsignal vector s(0)3
s(1)3
16 times 1 precomputedsignal vector s(0)4
s(1)4
r1
r2
r3
r4
z
z
(1)
z(2)
z(3)
z(4)
RDBM
RDBM-1
RDBM-2
RDBM-3
RDBM-4
PB-1
PB-2
PB-3
PB-4
CFIdetector
Detected CFI
Figure 10 Proposed MIMO receiver architecture for PCFICH
8 VLSI Design
16 times 1 receivedsignal vector
Y(0)0
Y(0)1
Y(1)lowast0
Y(1)lowast1
Y(0)14
Y(0)15
Y(1)lowast14
Y(1)lowast15
h(00)lowast0 minush(00)1 h(00)lowast14 minush(00)15 h(01)0 h(01)lowast15 h(10)0 minush(10)lowast1 h(10)14 minush(10)lowast15 minush(11)lowast0 h(11)1 minush(11)lowast14 h(11)15
Z0
Z1
Z14
Z15C
C
16 times 1 channel frequency response vectors h(00) h(01) h(10) and h(11)
h(01)lowast1 middot middot middot h(01)
14
Figure 11 Internal architecture receiver decoding block-MIMO (RDBM)
Re(y )
Im(y )
Re(h)
Im(h)
e
f
Figure 12 Multiplicands rearrangement for a single complexmultiplication block
This is repeated for all the 8 symbols in layer 0 andlayer 1 Equation (7) defines the transmission format withthe row index indicating the antenna port number and thecolumn index indicating the subcarrier index In 2 times 1MISOconfiguration the receive signals at 119894th and (119894+1)th subcarrierare given in matrix form as
[119910119894
119910lowast
119894+1
] = [ℎ(0)
119894minusℎ(1)
119894
ℎlowast(1)
119894+1ℎlowast(0)
119894+1
] [119889119894
119889lowast
119894+1
] + [119899119894
119899lowast
119894+1
]
for 119894 = 0 2 4 6 8 10 12 14
(8)
where ℎ(0)119894
represents the channel frequency response of 119894thsubcarrier between 0th transmit antenna port and receive
antenna 119889119894is data symbol at 119894th subcarrier and 119899
119894is the
noise at 119894th subcarrier at the receive antenna Equation (8)can simply be represented as
y119894= Heff119894d119894 + n
119894 for 119894 = 0 2 4 6 8 10 12 14 (9)
where y119894is 2 times 1 receive signal vector Heff119894 is the 2 times 2
channel matrix d119894is 2 times 1 complex signal vector and n
119894is
2 times 1 noise vector The objective is to detect the elements119889119894and 119889
119894+1of the data vector d
119894 Assuming that the elements
of channel frequency response matrix Heff119894 are perfectlyknown at the receiver the decoder output vector z
119894is given
byz119894= H119867eff119894 y119894 for 119894 = 0 2 4 6 8 10 12 14 (10)
whereH119867eff119894 is theHermitian of the 2times2 channel transmissionmatrix Equation (10) is expanded as
[119911119894
119911lowast
119894+1
] = [ℎlowast(0)
119894ℎ(1)
119894+1
minusℎlowast(1)
119894ℎ(0)
119894+1
] [119910119894
119910lowast
119894+1
]
for 119894 = 0 2 4 6 8 10 12 14(11)
The elements of decoder output are calculated as
119911119894= ℎlowast(0)
119894119910119894+ ℎ(1)
119894+1119910lowast
119894+1 for 119894 = 0 2 4 6 8 10 12 14
119911lowast
i+1 = minusℎlowast(1)
119894119910119894+ ℎ(0)
119894+1119910lowast
119894+1 for 119894 = 0 2 4 6 8 10 12 14
(12)
The PCFICH receive architecture for 2 times 1MISO config-uration is shown in Figure 7 Receiver decoding block (RDB)
VLSI Design 9
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14 2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15Folded unit
DRei
Imi
Rei
Rei
Rei+1
Rei+1
Imi
Imi+1
Imi Im+1
r1
Channel frequency response hi
Received signalvector y i
Precomputeddata si
minus
2l + 1 3 5 7 9 11 13 15
Figure 13 Illustration of folded architecture of RPB in SISO and SIMO
gets the 16 times 1 received signal vector y and computes thedecoder output vector using (10) assuming that the channelfrequency response vectors h(0) and h(1) are known Thedetailed internal architecture of RDBM is shown in Figure 11The decoder output vectors z
119894 119894 = 0 2 4 14 are stacked as
16 times 1 vector z = [1199111198790 119911119879
2 119911
119879
14]119879 The 16 times 1 precomputed
data vectors forCFI= 1234 are represented as s1 s2 s3and s4
respectivelyThe detailed structure of receiver decoding blocks (RDB)
is shown in Figure 8 The output vectors z(1) z(2) z(3) z(4)from RDB-1 to RDB-4 are fed to the processing blocks (PB-1 to PB-4) The detailed architecture of PB-1 is shown inFigure 9 The sum of the square magnitude of the elementsof difference vector between decoded output vector z and theprecomputed data vector s
1is the output 119903
1of PB-1 Similarly
1199032 1199033 and 119903
4are computed for CFI = 2 3 and 4 using PB-
2 PB-3 and PB-4 respectively The processing block outputs1199031 1199032 1199033 and 119903
4are applied to the CFI determination circuit
shown in Figure 4 to detect the CFI value
4 System Model and ImplementationArchitecture for MIMO Configuration
In MIMO system the signals at 119894th and (119894 + 1)th subcarrier inthe receive array are given by
[[[[[[
[
119910(0)
119894
119910(0)
119894+1
119910(1)lowast
119894
119910(1)lowast
119894+1
]]]]]]
]
=
[[[
[
ℎ00
ℎ01ℎ10 ℎ11ℎlowast
01 minusℎlowast
00ℎlowast
11 minusℎlowast
10
]]]
]
[119889119894
119889lowast
119894+1
] +
[[[[[[[[
[
119899(0)
119894
119899(0)lowast
119894+1
119899(1)
119894
119899(1)lowast
119894+1
]]]]]]]]
]
for 119894 = 0 2 4 6 8 10 12 14(13)
where ℎ119886119887
represents the channel frequency response vectorbetween 119887th transmit antenna and 119886th receive antenna and119899(119895)
119894represents the noise in 119894th subcarrier in 119895th receive
antenna In vector form it is written as
y119894= Heff119894d119894 + n
119894 for 119894 = 0 2 4 6 8 10 12 14 (14)
where y119894is 4times1 receive signal vectorHeff119894 is the 4times2 channel
frequency response vector at 119894th and (119894 + 1)th subcarrier d119894
is 2 times 1 data vector at 119894th and (119894 + 1)th subcarrier and n119894is
4 times 1 noise vector The objective is to detect the elements 119889119894
and 119889119894+1
of the data vector d119894 Assuming that the elements of
channel frequency response matrixHeff119894 are perfectly knownat the receiver the decoder output vector z is given by
z119894= H119867eff119894 y119894 for 119894 = 0 2 4 6 8 10 12 14 (15)
whereH119867eff119894 is theHermitian of the 4times2 channel transmissionmatrix This can be expanded as
[119911119894
119911lowast
+1
] = [ℎlowast
00minusℎlowast
10
ℎlowast
01minusℎlowast
11
ℎ01
ℎ11
minusℎ00minusℎ10
]
[[[[[
[
119910(0)
119894
119910(0)
119894+1
119910(1)lowast
119894
119910(1)lowast
119894+1
]]]]]
]
for 119894 = 0 2 4 6 8 10 12 14
(16)
The decoder outputs are given by
119911119894= ℎlowast
00119910(0)
119894minus ℎlowast
10119910(0)
119894+1+ ℎ01119910(1)lowast
119894+ ℎ11119910(1)lowast
119894+1
for 119894 = 0 2 4 6 8 10 12 14
119911lowast
119894+1= ℎlowast
01119910(0)
119894minus ℎlowast
11119910(0)
119894+1minusℎ00119910(1)lowast
119894minusℎ10119910(1)lowast
119894+1
for 119894 = 0 2 4 6 8 10 12 14
(17)
10 VLSI Design
Channel frequencyresponse vector
Channel frequencyresponse vector
Received signalvector
Received signalvector
Precomputeddata vector z(k)i
from RDB-1using s1i
y i
y i
y i+1
y i+1
hi+1 hi+1
hihi
zi
Folded RDB unit
Folded PB unit
Conjugate
DRei
Imi
Rei
Rei
Rei+1
Rei+1
Imi
Imi
Imi+1
Im+1
minus
r1
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 1 3 5 15
2l + 1 3 5 15
2l + 1 3 5 15
2l + 1 3 5 15
2l + 1 3 5 15
2l + 0 2 4 14
2l + 0 2 4 14
2l + 0 2 4 14
2l + 0 2 4 14
2l + 0 2 4 14
Figure 14 Illustration of proposed architecture for RDB and PB in MISO and MIMO Note receiver decoding block (RDB) in MISO istermed as RDBM in MIMO
The PCFICH receiver architecture of 2 times 2MIMO configura-tions is shown in Figure 10
Receiver decoding block (RDBM) gets the 16times1 receivedsignal vector y and computes the decoder output vector using(14) assuming that the channel frequency response vectorsh(00) h(01) h(10) and h(11) are knownThe 16times1 precomputeddata vectors for CFI = 1 2 3 and 4 are represented as s(0)
1
s(0)2 s(0)3 and s(0)
4 respectively for antenna 0 and as s(1)
1
s(1)2 s(1)3 and s(1)
4 respectively for antenna 1 The received
signal vectors y(0)119894
and y(1)119894multiply with the four channel
estimation vectors to give decoded output vector z that is sentto the processing block (PB) which is shown in Figure 9 Thedecoder outputs 119911
119894 119894 = 0 2 4 14 are stacked as 16 times 1
vector z = [1199111198790 119911119879
2 119911
119879
14]119879 Similarly RDBM1 gives output
vector z(1) using the precomputed data vectors y(0)1
and y(1)1
and channel estimation vectors The architecture of PBs and
VLSI Design 11
clk
T1 T2 T3 T4
RPB
RPB
RPB
RPB
d
d
dd
d
d
r4
r3
r2
r1
Figure 15 Illustration of superscalar method for SISO and SIMO(with no complex multiplications and operating from 119879
1to 1198794)
the CFI detection architecture are similar to that of theMISOsystem The sum of the squared magnitude of the differencebetween each element in the decoded output vector z and itsprecomputed data in the vector z(1) is the output 119903
1of PB1
Similarly 1199032 1199033 and 119903
4are computed for other CFIThe 119903
1 1199032
1199033 and 119903
4are compared to determine the minimum value by
the CFI detector shown in Figure 4
5 PCFICH Receiver Implementation Methods
The PCFICH receiver architectures can be implementeddirectly based on the basic architectures developed inSections 3 and 4 But in order to effectively utilize theresources in FPGA the implementation of basic architecturesis done using the modified novel architectures based on VLSIDSP techniques namely folding and superscalar processingapproach
51 Direct Implementation with Multiplicands RearrangedMethod In the receiver architecture for SISO and SIMOthe 16 times 1 received signal vector is directly subtracted fromthe precomputed data vector for a given CFI This requireslesser number of multipliers and adders when comparedto MISO and MIMO In MISO and MIMO configurationscomplex multiplications are necessary for the multiplicationof H119867119890119891119891
with the received signal vector It increases thenumber of multiplications in the CFI detection processHence optimum rearrangement of the terms is carried outto minimize the number of multiplications Further theintermediate products are reused in the calculation of real andimaginary parts Consider the multiplication of two complexnumbers Reℎ+119895 Imℎ and Re119910+119895 Im119910The output realpart (119890) and imaginary part (119891) terms are given by
119890 = Re ℎRe 119910 minus Im ℎ Im 119910 119891 = Re ℎ Im 119910 + Im ℎRe 119910
(18)
It requires four multiplications and two additions To reducethe number of multiplications the terms in (18) are rear-ranged as
119890 = [Re 119910 minus Im 119910] [Re ℎ minus Im ℎ]
minus Re 119910 Im ℎ + Im 119910Re ℎ
119891 = Re 119910 Im ℎ + Im 119910Re ℎ
(19)
Since the terms Re119910 Imℎ and Im119910Reℎ are in (19) itrequires only three multiplications but five additions Thiskind of rearrangement of themultiplicands is employed in theprocessing blocks at the cost of increased additions as shownin Figure 12
52 Proposed Architecture Using Folding Method Foldingarchitecture systematically determines the control circuits inDSP architectures where multiple algorithmic operations aretime-multiplexed to a single functional unit [12] It is usedfor synthesis of DSP architectures that can be operated atsingle or multiple clocks It reduces the number of hardwarefunctional units (FUs) by a factor of 119873 at the expense ofincreased computation time
The folding architecture is introduced in the receiverstructure of RPB in SISO and SIMO configurations and ofRPB and PB in MISO and MIMO configurations as shownin Figures 13 and 14 respectively For SISO RPB there are16 hardware lines to calculate the value of 119903
1each requiring
twomultipliers Hence the number of multipliers used in oneRPB is 32 In order to reduce the number of multipliers andadders folding architecture is proposed This architectureuses only two multipliers and performs the operation of asingle hardware line 16 times in sequential wayThe differencebetween the product of channel frequency response vectorwith the precomputed data vector and the received signalvector is stored in registers At a time one resultant signalpair involves in computation using two multipliers to get thevalue of 119911
119894 Four switches operating in system clock speed
are involved in the architecture where two switches are usedto pass the real part of the signal to one multiplier whilethe other two switches are used to pass the imaginary partof the signal to another multiplier The multipliers pass theproducts to the first adder for 119911
119894 The output of the first adder
is passed to the second adder with a delay to accumulate thevalues 119911
0to 11991115into a register in subsequent clock cyclesThis
process requires 16 clock cycles and the CFI is detected at the17th clock cycle Though it takes longer time for the clockcycles to get the output the resources are minimized in thismethod
The folded architecture of decoding block of MISO andMIMO involving complex multiplication of the channelfrequency response vector and the receive signal vector isshown in Figure 14 There are 2 complex multiplications andone addition in each of the 16 hardware lines Hence totalresource elements used are 32 complex multiplications and16 additions The folded architecture which reduces to just2 complex multiplications and one addition requires fiveswitches Two switches are used to pass the first elementof the receive signal vector and its corresponding channel
12 VLSI Design
clk
T1 T2 T3 T4 T5 T6
PB
PB
PB
PB
d
d
dd
d
d
dr4
r3
r2
r1
From other hardware lines
From other hardware lines
Complexmultiplications(real part) of a
hardware line from
Complex
multiplications
(imaginary part) of
a hardware line from RDB
RDB
Figure 16 Illustration of superscalar method for MISO and MIMO (with complex multiplications and operating from 1198791to 1198796)
Figure 17 Simulation waveform for PCFICH receiver
frequency response vector to one multiplier and other twoswitches are used to pass the second element of receive signalvector and its channel frequency response vector to anothermultiplierThese four switches operate in system clock speedThe multipliers pass their products to the adder through thefifth switch before moving to PB This process requires 16clock cycles and the CFI is detected at the 17th clock cycle
53 Proposed Architecture Using Superscalar Method Super-scalar approach is another low resource utilizing VLSI DSPtechnique The superscalar processing method includes par-allel processing and pipelining strategies In this case paralleloperation for the 16 pairs of hardware lines is arrangedwith pipelining of the subtraction and square magnitudeoperations for each CFI SISO configuration does not havecomplex multiplications and it has only square magnitudeoperations Hence the RPB of SISOhas 16 hardware lines eachhaving 2 multipliers which results to a total of 32 multipliers
This setup requires more hardware resources than foldingbut the output is obtained at every 4th clock cycle as shownin Figure 15 SIMO configuration which involves two receiveantenna signal processing requires twice the number ofmultiplications as that of SISO and the output is obtainedat every 4th clock cycle The block ldquo119889rdquo represents the delayelement introduced to buffer the values and produce theoutputs at the same time instant
For MISO configuration the RDB has 16 hardware lineswith 2 complex multiplications each Since each complexmultiplication requires four real multiplications RDB can beexecuted in two clock cycles by reusing 64 multipliers 32multipliers are required for PB taking 4 clock cyclesHence 96multipliers are required in MISO configuration For MIMOconfiguration the RDB requires reuse of 128 multiplierstaking 2 clock cycles and an additional 32 multipliers arerequired for the PB taking 4 clock cycles Hence 160 multi-pliers are required for MISO configuration and the output isobtained at every 6th clock cycle as shown in the Figure 16The block ldquo119889rdquo represents the delay element introduced tobuffer the values and produce the outputs at the same timeinstant
6 Results and Discussion
The proposed receiver architectures for PCFICH in SISOSIMO MISO and MIMO configurations are implementedusing the Xilinx PlanAhead tool on the Virtex-6 FPGAxc6vlx240tff1156-1 device board The target device Virtex-6has only 768 DSP elements Table 2 shows the performanceof the proposed architectures using folding and superscalarmethods being compared with the direct implementationof PCFICH receiver in terms of resource utilisation speedand power for all the SISO SIMO MISO and MIMO
VLSI Design 13
clk clk
clk
clk
clk
clkDiversity[10]
Diversity[1] GND 1 o wide mux 1Address[10] Output[30]
RTL rom 2 4
3
2
1
0
e
e
e
e
Div0
Div1
Div2
Div3
SISO
SIMO
MISO
MIMO
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310] Op
Choose
Count CFI11[490]Count CFI12[490]Count CFI13[490]Count CFI14[490]Count CFI21[490]Count CFI22[490]Count CFI23[490]Count CFI24[490]Count CFI31[490]Count CFI32[490]Count CFI33[490]Count CFI34[490]Count CFI41[490]Count CFI42[490]Count CFI43[490]Count CFI44[490]
e[30]Out1[310]Out2[310]Out3[310]Out4[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Figure 18 RTL schematic for combined PCFICH architecture with diversity
Table 2 Performance of proposed architectures based on folding and superscalar method
Diversity Method Multipliers Adders DSP elements LUTs Total delay (ns) Speed (MHz) Dynamic power
SISODirect 125 245 125 5479 39081 25587 mdash
Folding (16T) 8 81 16 2561 114448 8737 84Superscalar (4T) 32 182 66 2731 69333 14423 93
SIMODirect 250 494 250 10942 40278 24827 mdash
Folding (16T) 16 165 32 5117 130704 7651 159Superscalar (4T) 64 318 130 5623 53873 18562 170
MISODirect 224 580 594 14880 43023 23243 mdash
Folding (16T) 14 101 39 3950 255264 3917 173Superscalar (6T) 96 338 196 6156 80495 12423 208
MIMODirect 320 844 675 17380 56962 17555 mdash
Folding (16T) 20 155 46 4395 256528 3898 374Superscalar (6T) 160 465 262 6932 85822 11652 382
Figure 19 Resource utilization graph for generalized architecture
configurations The proposed architectures based on foldingand superscalar processing methods require less number ofresource elements
In the folding approach resource utilization is less com-pared to the direct and superscalar approach at the cost ofreduced speed of operation but it is suitable for real-time
frame timings When the LTE-A system operates at 14MHzbandwidth maximum time available for detection at eachsubcarrier is 992063 ns since each slot of 05ms duration in aframe (10ms radio frame duration) consists of 7 OFDM sym-bols and there are 72 subcarriers along one OFDM symbolThe total delay in the receiver architecture is within the LTEtime constraint The dynamic power consumption is less inthe folding method compared to superscalar method due todecrease in block arithmetic Direct method does not requiresequential execution and clocking and hence total powerconsumption is due to static power Hence it is inferredthat the proposed architecture based on folding method ismore suitable for CFI detection The simulation waveformof the proposed architecture based on folding method isshown in Figure 17 for SISO SIMO MISO and MIMOconfigurations
A general architecture based on folding method whichoperates at all the four SISO SIMO MISO and MIMOconfigurations has also been developed In this architecture
14 VLSI Design
Table 3 Resource requirements of proposed architecture using folding method
Parameter Multipliers Adders Minimum clock period (ns) Total delay (ns) Speed (MHz) Total dynamic power (mW)Value 58 518 16033 256528 3898 1019
Figure 20 Implemented device in FPGA editor
a control variable ldquo119890rdquo is used to enable or disable thesubmodules SISO SIMO MISO or MIMO according tothe selection input ldquodiversityrdquo CFI is detected at every 17thclock cycle The synthesis results of a general architecturebased on folding show that it utilizes minimum resourcesin XC6VLX240TFF1156-1 Virtex 6 device (768 DSPs) Thisis summarized in Table 3 Dynamic power consumption isdue to internal switching contributed by the clock (246mW)logic (670mW) and the block arithmetic (103mW)
Figure 18 shows the RTL schematic of 4 diversity blocksldquodiv0rdquo ldquodiv1rdquo ldquodiv2rdquo and ldquodiv3rdquo corresponding to SISO SIMOMISO and MIMO controlled by wires named ldquo119890rdquo Powerconsumed includes both static power and dynamic power dueto internal switching
Figure 19 shows the resource utilization graph whichshows the percentage of registers lookup tables (LUTs) slicesDSP elements and buffers used
Figure 20 shows the implemented device in FPGA editorwith the implemented components and interconnectionsbetween the components configured into the FPGA device
7 Conclusion
In this paper low complexity low resource single or multi-antenna CFI detection at the receiver system has been pro-posed and analyzed using modelsim and implementation inthe Virtex-6 device in Xilinx PlanAhead tool In the receivercomputational complexity and the resource utilization areminimized by employing arithmetic operational rearrange-ment and suboptimal sequential DSP algorithm called thefolding approach The proposed architecture using folding
method complies with the LTE frame timing constraint inSISO SIMOMISO andMIMO configurations It is a suitablesolution for the area optimized hardware implementation ofreceiver structures for PCFICH In future a total hardwareaccommodating all the physical downlink control channelsof the 3GPP-LTE-A with low resource utilization could besynthesized and implemented
Conflict of Interests
The authors do not have direct financial relation with anycommercial identity mentioned in the paper or any otherconflict of interests
Acknowledgments
The authors wish to express their sincere thanks to All IndiaCouncil for Technical Education NewDelhi for the Grant todo the Project titled Design of Testbed for the Developmentof Optimized Architectures of MIMO Signal Processing(no 8023RIDRPS0391112) They are also thankful tothe Managements of Mepco Schlenk Engineering CollegeSivakasi and Thiagarajar College of Engineering Maduraifor their constant support and encouragement to carry outthis research work successfully
References
[1] 3GPP TS 36 211 Version 11 0 0 Release 11 (2012-10) EvolvedUniversal Terrestrial Radio Access (E-UTRA) Physical Chan-nels and Modulation 2012
[2] J Ketonen and M Juntti ldquoSIC and K-best LSD receiverimplementation for a MIMO-OFDM systemrdquo in Proceedings ofthe 16th European Signal Processing Conference (EUSIPCO rsquo08)Lausanne Switzerland August 2008
[3] J Ketonen M Juntti and J R Cavallaro ldquoPerformancemdashcomplexity comparison of receivers for a LTE MIMOOFDMsystemrdquo IEEE Transactions on Signal Processing vol 58 no 6pp 3360ndash3372 2010
[4] X Huang C Liang and J Ma ldquoSystem architecture andimplementation of MIMO sphere decoders on FPGArdquo IEEETransactions on Very Large Scale Integration Systems vol 16 no2 pp 188ndash196 2008
[5] Y Lin Y Chen C Chu C Zhan and A Wu ldquoDual-modelow-complexity codebook searching algorithm and VLSI archi-tecture for LTELTE-advanced systemsrdquo IEEE Transactions onSignal Processing vol 61 no 14 pp 3545ndash3562 2013
[6] DWu J Eilert R Asghar and D Liu ldquoVLSI implementation ofa fixed-complexity soft-output MIMO detector for high-speedwirelessrdquo EURASIP Journal on Wireless Communications andNetworking vol 2010 Article ID 893184 13 pages 2010
[7] C Mehlfuhrer M Wrulich J C Ikuno D Bosanska and MRupp ldquoSimulating the long term evolution physical layerrdquo in
VLSI Design 15
Proceedings of the 17th European Signal Processing Conference(EUSIPCO rsquo09) pp 1471ndash1478 Glasgow UK August 2009
[8] S J Thiruvengadam and L M A Jalloul ldquoPerformance anal-ysisof the 3GPP-LTE physical control channelsrdquo EURASIPJournalon Wireless Communications and Networking vol 2010Article ID 914934 10 pages 2010
[9] S M Alamouti ldquoA simple transmit diversity technique forwireless communicationsrdquo IEEE Journal on Selected Areas inCommunications vol 16 no 8 pp 1451ndash1458 1998
[10] S S A Abbas and S JThiruvengadam ldquoFPGA implementationof 3GPP-LTE-A physical downlink control channel using diver-sity techniquesrdquo International Journal of Wireless and MobileComputing vol 9 no 2 p 84 2013
[11] S Ahmadi LTE-Advanced A Practical Systems Approach toUnderstanding 3GPP LTE Releases 10 and 11 Radio AccessTechnologies Sassan Ahmadi Academic Press 2013
[12] K K Parhi VLSI Digital Signal Processing SystemsmdashDesign andImplementation Wiley-Interscience 1999
International Journal of
AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal ofEngineeringVolume 2014
Submit your manuscripts athttpwwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
VLSI Design 3
Maximum likelihood detector
CFI (1234)
Detected PCFICH information
signal vector
Mapping to 32 bits
Resource elementmapper
QPSK modulator
Baseband transmit processing at base station eNodeB
Receive processing at user equipment
OFDM symbol generator
Transmission channels
Input signal at antenna port 0
Input signal at Resource element demapper
Frequency domain equalization of each subcarrier
received signal vector
response vectorvectors for CFI 1 2 3 4
Resource element demapper
Frequency domain equalization of each subcarrier
received signal vector
Layermapping Precoding
Layers
Transmit antenna ports
middotmiddotmiddot
16 times 1
16 times 1 channel frequency
middotmiddotmiddot
middotmiddotmiddot
response vector16 times 1 channel frequency
16 times 1 precomputed data
sum
16 times 1 transmit
d(1) d(2) d(3) d(4)antenna port K minus 1
middot middot middot
16 times 1
Figure 1 Block diagram of transmitter and receiver processing
Precomputed data vectors
Detected CFI
RPB
CFI-1
RPB
CFI-2
RPB
CFI-3
RPB
CFI-4
CFI detector
d(1)
d(2)
d(3)
d(4)
16 times 1 channel frequency response vector h16 times 1 received signal vector y
r1
r2
r3
r4
Figure 2 Basic architecture for SISO configuration
where y = [1199100 1199101 119910
15]119879 is a 16 times 1 received signal
vector h = [ℎ0 ℎ1 ℎ
15]119879 is a 16 times 1 channel frequency
response vector d(119898) = [119889(119898)
0 119889(119898)
1 119889
(119898)
15]119879
is a 16 times 1complex QPSK symbol vector corresponding to CFI value
from the set 1 2 3 4 ldquo∘rdquo represents the element by elementmultiplication and w is a 16 times 1 additive white noise vectorand its elements are zero mean Gaussian random numberswith unit variance The objective is to detect the value ofCFI from the received signal vector y assuming the channelfrequency response vector h to be known Using maximumlikelihood (ML) principle CFI is detected as
CFI = argmin119898
10038171003817100381710038171003817y minus h ∘ d(119898)10038171003817100381710038171003817
2
(2)
Figure 2 shows the basic architecture for estimating CFIusing (2) in SISO configurationThe received signal vector yand the channel frequency response vector h are provided asinput to the four receiver processing blocks (RPB) along withprecomputed data vectorsd(1)d(2)d(3) andd(4)The internaldiagram for RPB CFI-1 is shown in Figure 3 It computesthe expression y minus h ∘ d(1)
2
assuming the CFI = 1 In RPB-m the precomputed data vector d(119898) is multiplied elementby element with the channel frequency response vector Theresultant (16 times 1) vector is subtracted from the (16 times 1)received signal vector y The sum of squared magnitude ofeach element in the resultant vector is the output of RPB
The inputs to the CFI detector are the 16-bit outputs ofRPBs 119903
1 1199032 1199033 and 119903
4 The CFI detector determines which
RPB output has minimum value The internal diagram forCFI detector circuit which has 4 comparator modules (CM)is shown in Figure 4 In CM-1 input 119903
2and onersquos complement
4 VLSI Design
d(1)0
d(1)1
d(1)15
minus
minus
minus
16 times 1 channel frequency
16 times 1 received signal vector
Output r1
response vector
[middot]2
[middot]2
[middot]2
h0 y0h1 middot middot middot h15 y1 middot middot middot y15
Re
Im
Re
Im
Re
Im
Precomputed data vectors
Figure 3 Internal architecture of receiver processing block (RPB CFI-1)
CFI
Sum
Sum
Carry
Sum
Carry
Sum
00011011
Generate
Multiplexer control input
CM-1
CM-3
CM-2
CM-4
16
16
1616
16
16
16
16
r1
r2
r3
r4
Carry1
Carry2
Cr1
Cr2
Cr3
Cr4
Sr1
Sr2
Sr3
Sr4
CFI = 1
CFI = 2
CFI = 3
CFI = 4
Figure 4 Internal architecture of CFI detector
of input 1199031are added If carry is generated then 119903
1is less than
1199032 The outputs Cr
1and Sr
1of the CM-1 are defined as
C1199031=
1199031
if carry1= ldquo1rdquo (119903
1lt 1199032)
0 if carry1= ldquo0rdquo (119903
1gt 1199032)
S1199031=
1199032
if carry1= ldquo0rdquo (119903
1gt 1199032)
0 if carry1= ldquo1rdquo (119903
1lt 1199032)
(3)
InCM-2 input 1199034and onersquos complement of input 119903
3are added
If carry is generated then 1199033is less than 119903
4 The outputs Cr
2
and Sr2of CM-2 are defined as
C1199032=
1199033
if carry2= ldquo1rdquo (119903
3lt 1199034)
0 if carry2= ldquo0rdquo (119903
3gt 1199034)
S1199032=
1199034
if carry2= ldquo0rdquo (119903
3gt 1199034)
0 if carry2= ldquo1rdquo (119903
3lt 1199034)
(4)
VLSI Design 5
Antenna ldquo0rdquo Antenna ldquo1rdquo
Precomputed data vectorsRPB
RPB
RPB
RPB
RPB
RPB
RPB
RPB
CFI detector Detected CFI
Precomputed data vectors
16 times 1 channel frequencyresponse vector h(0)16 times 1 received signalvector y (0)
16 times 1 channel frequencyresponse vector h(1)16 times 1 received signalvector y (1)
r(0)1
r(0)2
r(0)3
r(0)4
r1 r2 r3 r4
CFI(1)1
CFI(1)2
CFI(1)3
CFI(1)4
d(1)
d(2)
d(3)
d(4)
r(1)1
r(1)2
r(1)3
r(1)4
CFI(0)1
CFI(0)2
CFI(0)3
CFI(0)4
d(1)
d(2)
d(3)
d(4)
Figure 5 Basic architecture for SIMO configuration
A0
A1
OFDM subcarriers
d15 d14 d13 d12 d11 d10 d9 d8 d7 d6 d5 d4 d3 d2 d1 d0
dlowast14 minusdlowast15 dlowast12 minusdlowast13 dlowast10 minusdlowast11 dlowast8 minusdlowast9 dlowast6 minusdlowast7 dlowast4 minusdlowast5 dlowast2 minusdlowast3 dlowast0 minusdlowast1
Figure 6 Subcarrier mapping for SFBC for 2-antenna system
The multiplexer control input is activated based on theoutputs from CM-3 and CM-4 One of the four outputs Cr
3
Sr3 Cr4 and Sr
4would be ldquo1rdquo based on the minimum value
of four inputs 1199031 1199032 1199033 and 119903
4 respectively Based on this 00
01 10 or 11 in the multiplexer control unit would be activatedto obtain the detected CFI value
In SIMO the 16times1 receive signal vector at the 119896th receiveantenna is modeled as
y(119896) = h(119896) ∘ d(119898) + w119896 119896 = 0 1 2 119870 minus 1 (5)
where ldquo119870rdquo represents the number of receive antennas at UEh(119896) is 16 times 1 channel frequency response vector between thetransmit antenna and 119896th receive antenna and w
119896is 16 times 1
noise vector at 119896th receive antenna Now the objective is todetect the value ofCFI from the received signal vectors at eachreceive antenna assuming the channel frequency responsevectors at each receive antenna are knownThemaximal ratiocombining is carried out at the receiver Using maximumlikelihood (ML) principle CFI is estimated as [9]
CFI = min119898=1234
119870
sum
119896=1
10038171003817100381710038171003817y(119896) minus (h(119896) ∘ d(119898))10038171003817100381710038171003817
2
(6)
The basic architecture for estimating CFI using (6) in1 times 2 SIMO configuration shown in Figure 5 is similar to thebasic architecture of SISO configuration The received signalvector y(119896) and the channel frequency response vector h(119896)are provided as input to the four receiver processing blocks(RPB-CFI(119896)
119898) at 119896th receive antenna along with precomputed
data vectors d(1) d(2) d(3) and d(4)The outputs from themthRPB at 0th receive antenna 119903(0)
119898and 1st receive antenna 119903(1)
119898are
added to get themth input 119903119898of the CFI detector circuit
3 System Model and ImplementationArchitecture for MISO Configuration
In MISO and MIMO configurations space frequency blockcode (SFBC) based layer mapping and precoding are carriedout to obtain transmit diversity when two or more antennaports are used at eNodeB as per the 3GPP LTE wirelessstandard [1 11] It is assumed that 2 antenna ports are usedat eNodeB The 16 times 1 complex symbol vector output ofthe modulation mapper is applied to the layer mapper The8 times 1 symbol vectors at layer 0 and layer 1 are given by[1198890 1198892 1198894 11988961198898 11988910 11988912 and 119889
14] and [119889
1 1198893 1198895 1198897
1198899 11988911 11988913 and 119889
15] The precoding is carried out using
6 VLSI Design
PB-4
Detected CFI
PB-1
PB-2
PB-3
RDB-1
CFI detector
RDB
RDB-2
RDB-3
RDB-4
16 times 1 channel frequencyresponse vector
16 times 1 received signalvector y
16 times 1 precomputed signalvector s1
16 times 1 precomputed signalvector s2
16 times 1 precomputed signalvector s3
16 times 1 precomputed signalvector s4
h(0)h(1)
z
z(1)
z(2)
z(3)
z(4)
r1
r2
r3r4
Figure 7 Proposed MISO receiver architecture for PCFICH
16 times 1 channel frequency response vectors h(0) and h(1)
16 times 1 received signalvector y
h(0)lowast0 h(0)1 middot middot middot h(0)lowast14 h(0)15 h(1)lowast0 h(1)1 middot middot middot h(1)lowast14 h(1)15
C
C
y0Ylowast1
y4
Ylowast15
z0
z1
z14
z15
Figure 8 Internal architecture of MISO receiver decoding block (RDB)
the SFBC in the LTE-A standard The precoder output atantenna port 0 (119860
0) and antenna port 1 (119860
1) is shown in
Figure 6The notation ldquolowastrdquo represents the complex conjugate of the
symbol Basically in precoding a symbol 1198890from layer 0 and
a symbol 1198891from layer 1 are encoded such that the antenna
output is formulated using the orthogonal matrix given by
119860 = [11988901198891
minus119889lowast
1119889lowast
0
] (7)
VLSI Design 7
Precomputed data
Output vector from RDB1
Im
Re
Im
Re
Im
Re
Im
Re
[middot]2
[middot]2
[middot]2
[middot]2 Output r1
minus
minus
minus
minus
z(1)0
z(1)1
z(1)14
z(1)15
z0 z15z1 middot middot middot z14
Figure 9 Internal architecture of MISO processing block (PB-1)
16 times 1 channel frequency response vectorsh(00)h(01)h(10)h(11)
16 times 1 received signalvector y (0)
y (1)
16 times 1 precomputedsignal vector s(0)1
s(1)1
16 times 1 precomputedsignal vector s(0)2
s(1)2
16 times 1 precomputedsignal vector s(0)3
s(1)3
16 times 1 precomputedsignal vector s(0)4
s(1)4
r1
r2
r3
r4
z
z
(1)
z(2)
z(3)
z(4)
RDBM
RDBM-1
RDBM-2
RDBM-3
RDBM-4
PB-1
PB-2
PB-3
PB-4
CFIdetector
Detected CFI
Figure 10 Proposed MIMO receiver architecture for PCFICH
8 VLSI Design
16 times 1 receivedsignal vector
Y(0)0
Y(0)1
Y(1)lowast0
Y(1)lowast1
Y(0)14
Y(0)15
Y(1)lowast14
Y(1)lowast15
h(00)lowast0 minush(00)1 h(00)lowast14 minush(00)15 h(01)0 h(01)lowast15 h(10)0 minush(10)lowast1 h(10)14 minush(10)lowast15 minush(11)lowast0 h(11)1 minush(11)lowast14 h(11)15
Z0
Z1
Z14
Z15C
C
16 times 1 channel frequency response vectors h(00) h(01) h(10) and h(11)
h(01)lowast1 middot middot middot h(01)
14
Figure 11 Internal architecture receiver decoding block-MIMO (RDBM)
Re(y )
Im(y )
Re(h)
Im(h)
e
f
Figure 12 Multiplicands rearrangement for a single complexmultiplication block
This is repeated for all the 8 symbols in layer 0 andlayer 1 Equation (7) defines the transmission format withthe row index indicating the antenna port number and thecolumn index indicating the subcarrier index In 2 times 1MISOconfiguration the receive signals at 119894th and (119894+1)th subcarrierare given in matrix form as
[119910119894
119910lowast
119894+1
] = [ℎ(0)
119894minusℎ(1)
119894
ℎlowast(1)
119894+1ℎlowast(0)
119894+1
] [119889119894
119889lowast
119894+1
] + [119899119894
119899lowast
119894+1
]
for 119894 = 0 2 4 6 8 10 12 14
(8)
where ℎ(0)119894
represents the channel frequency response of 119894thsubcarrier between 0th transmit antenna port and receive
antenna 119889119894is data symbol at 119894th subcarrier and 119899
119894is the
noise at 119894th subcarrier at the receive antenna Equation (8)can simply be represented as
y119894= Heff119894d119894 + n
119894 for 119894 = 0 2 4 6 8 10 12 14 (9)
where y119894is 2 times 1 receive signal vector Heff119894 is the 2 times 2
channel matrix d119894is 2 times 1 complex signal vector and n
119894is
2 times 1 noise vector The objective is to detect the elements119889119894and 119889
119894+1of the data vector d
119894 Assuming that the elements
of channel frequency response matrix Heff119894 are perfectlyknown at the receiver the decoder output vector z
119894is given
byz119894= H119867eff119894 y119894 for 119894 = 0 2 4 6 8 10 12 14 (10)
whereH119867eff119894 is theHermitian of the 2times2 channel transmissionmatrix Equation (10) is expanded as
[119911119894
119911lowast
119894+1
] = [ℎlowast(0)
119894ℎ(1)
119894+1
minusℎlowast(1)
119894ℎ(0)
119894+1
] [119910119894
119910lowast
119894+1
]
for 119894 = 0 2 4 6 8 10 12 14(11)
The elements of decoder output are calculated as
119911119894= ℎlowast(0)
119894119910119894+ ℎ(1)
119894+1119910lowast
119894+1 for 119894 = 0 2 4 6 8 10 12 14
119911lowast
i+1 = minusℎlowast(1)
119894119910119894+ ℎ(0)
119894+1119910lowast
119894+1 for 119894 = 0 2 4 6 8 10 12 14
(12)
The PCFICH receive architecture for 2 times 1MISO config-uration is shown in Figure 7 Receiver decoding block (RDB)
VLSI Design 9
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14 2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15Folded unit
DRei
Imi
Rei
Rei
Rei+1
Rei+1
Imi
Imi+1
Imi Im+1
r1
Channel frequency response hi
Received signalvector y i
Precomputeddata si
minus
2l + 1 3 5 7 9 11 13 15
Figure 13 Illustration of folded architecture of RPB in SISO and SIMO
gets the 16 times 1 received signal vector y and computes thedecoder output vector using (10) assuming that the channelfrequency response vectors h(0) and h(1) are known Thedetailed internal architecture of RDBM is shown in Figure 11The decoder output vectors z
119894 119894 = 0 2 4 14 are stacked as
16 times 1 vector z = [1199111198790 119911119879
2 119911
119879
14]119879 The 16 times 1 precomputed
data vectors forCFI= 1234 are represented as s1 s2 s3and s4
respectivelyThe detailed structure of receiver decoding blocks (RDB)
is shown in Figure 8 The output vectors z(1) z(2) z(3) z(4)from RDB-1 to RDB-4 are fed to the processing blocks (PB-1 to PB-4) The detailed architecture of PB-1 is shown inFigure 9 The sum of the square magnitude of the elementsof difference vector between decoded output vector z and theprecomputed data vector s
1is the output 119903
1of PB-1 Similarly
1199032 1199033 and 119903
4are computed for CFI = 2 3 and 4 using PB-
2 PB-3 and PB-4 respectively The processing block outputs1199031 1199032 1199033 and 119903
4are applied to the CFI determination circuit
shown in Figure 4 to detect the CFI value
4 System Model and ImplementationArchitecture for MIMO Configuration
In MIMO system the signals at 119894th and (119894 + 1)th subcarrier inthe receive array are given by
[[[[[[
[
119910(0)
119894
119910(0)
119894+1
119910(1)lowast
119894
119910(1)lowast
119894+1
]]]]]]
]
=
[[[
[
ℎ00
ℎ01ℎ10 ℎ11ℎlowast
01 minusℎlowast
00ℎlowast
11 minusℎlowast
10
]]]
]
[119889119894
119889lowast
119894+1
] +
[[[[[[[[
[
119899(0)
119894
119899(0)lowast
119894+1
119899(1)
119894
119899(1)lowast
119894+1
]]]]]]]]
]
for 119894 = 0 2 4 6 8 10 12 14(13)
where ℎ119886119887
represents the channel frequency response vectorbetween 119887th transmit antenna and 119886th receive antenna and119899(119895)
119894represents the noise in 119894th subcarrier in 119895th receive
antenna In vector form it is written as
y119894= Heff119894d119894 + n
119894 for 119894 = 0 2 4 6 8 10 12 14 (14)
where y119894is 4times1 receive signal vectorHeff119894 is the 4times2 channel
frequency response vector at 119894th and (119894 + 1)th subcarrier d119894
is 2 times 1 data vector at 119894th and (119894 + 1)th subcarrier and n119894is
4 times 1 noise vector The objective is to detect the elements 119889119894
and 119889119894+1
of the data vector d119894 Assuming that the elements of
channel frequency response matrixHeff119894 are perfectly knownat the receiver the decoder output vector z is given by
z119894= H119867eff119894 y119894 for 119894 = 0 2 4 6 8 10 12 14 (15)
whereH119867eff119894 is theHermitian of the 4times2 channel transmissionmatrix This can be expanded as
[119911119894
119911lowast
+1
] = [ℎlowast
00minusℎlowast
10
ℎlowast
01minusℎlowast
11
ℎ01
ℎ11
minusℎ00minusℎ10
]
[[[[[
[
119910(0)
119894
119910(0)
119894+1
119910(1)lowast
119894
119910(1)lowast
119894+1
]]]]]
]
for 119894 = 0 2 4 6 8 10 12 14
(16)
The decoder outputs are given by
119911119894= ℎlowast
00119910(0)
119894minus ℎlowast
10119910(0)
119894+1+ ℎ01119910(1)lowast
119894+ ℎ11119910(1)lowast
119894+1
for 119894 = 0 2 4 6 8 10 12 14
119911lowast
119894+1= ℎlowast
01119910(0)
119894minus ℎlowast
11119910(0)
119894+1minusℎ00119910(1)lowast
119894minusℎ10119910(1)lowast
119894+1
for 119894 = 0 2 4 6 8 10 12 14
(17)
10 VLSI Design
Channel frequencyresponse vector
Channel frequencyresponse vector
Received signalvector
Received signalvector
Precomputeddata vector z(k)i
from RDB-1using s1i
y i
y i
y i+1
y i+1
hi+1 hi+1
hihi
zi
Folded RDB unit
Folded PB unit
Conjugate
DRei
Imi
Rei
Rei
Rei+1
Rei+1
Imi
Imi
Imi+1
Im+1
minus
r1
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 1 3 5 15
2l + 1 3 5 15
2l + 1 3 5 15
2l + 1 3 5 15
2l + 1 3 5 15
2l + 0 2 4 14
2l + 0 2 4 14
2l + 0 2 4 14
2l + 0 2 4 14
2l + 0 2 4 14
Figure 14 Illustration of proposed architecture for RDB and PB in MISO and MIMO Note receiver decoding block (RDB) in MISO istermed as RDBM in MIMO
The PCFICH receiver architecture of 2 times 2MIMO configura-tions is shown in Figure 10
Receiver decoding block (RDBM) gets the 16times1 receivedsignal vector y and computes the decoder output vector using(14) assuming that the channel frequency response vectorsh(00) h(01) h(10) and h(11) are knownThe 16times1 precomputeddata vectors for CFI = 1 2 3 and 4 are represented as s(0)
1
s(0)2 s(0)3 and s(0)
4 respectively for antenna 0 and as s(1)
1
s(1)2 s(1)3 and s(1)
4 respectively for antenna 1 The received
signal vectors y(0)119894
and y(1)119894multiply with the four channel
estimation vectors to give decoded output vector z that is sentto the processing block (PB) which is shown in Figure 9 Thedecoder outputs 119911
119894 119894 = 0 2 4 14 are stacked as 16 times 1
vector z = [1199111198790 119911119879
2 119911
119879
14]119879 Similarly RDBM1 gives output
vector z(1) using the precomputed data vectors y(0)1
and y(1)1
and channel estimation vectors The architecture of PBs and
VLSI Design 11
clk
T1 T2 T3 T4
RPB
RPB
RPB
RPB
d
d
dd
d
d
r4
r3
r2
r1
Figure 15 Illustration of superscalar method for SISO and SIMO(with no complex multiplications and operating from 119879
1to 1198794)
the CFI detection architecture are similar to that of theMISOsystem The sum of the squared magnitude of the differencebetween each element in the decoded output vector z and itsprecomputed data in the vector z(1) is the output 119903
1of PB1
Similarly 1199032 1199033 and 119903
4are computed for other CFIThe 119903
1 1199032
1199033 and 119903
4are compared to determine the minimum value by
the CFI detector shown in Figure 4
5 PCFICH Receiver Implementation Methods
The PCFICH receiver architectures can be implementeddirectly based on the basic architectures developed inSections 3 and 4 But in order to effectively utilize theresources in FPGA the implementation of basic architecturesis done using the modified novel architectures based on VLSIDSP techniques namely folding and superscalar processingapproach
51 Direct Implementation with Multiplicands RearrangedMethod In the receiver architecture for SISO and SIMOthe 16 times 1 received signal vector is directly subtracted fromthe precomputed data vector for a given CFI This requireslesser number of multipliers and adders when comparedto MISO and MIMO In MISO and MIMO configurationscomplex multiplications are necessary for the multiplicationof H119867119890119891119891
with the received signal vector It increases thenumber of multiplications in the CFI detection processHence optimum rearrangement of the terms is carried outto minimize the number of multiplications Further theintermediate products are reused in the calculation of real andimaginary parts Consider the multiplication of two complexnumbers Reℎ+119895 Imℎ and Re119910+119895 Im119910The output realpart (119890) and imaginary part (119891) terms are given by
119890 = Re ℎRe 119910 minus Im ℎ Im 119910 119891 = Re ℎ Im 119910 + Im ℎRe 119910
(18)
It requires four multiplications and two additions To reducethe number of multiplications the terms in (18) are rear-ranged as
119890 = [Re 119910 minus Im 119910] [Re ℎ minus Im ℎ]
minus Re 119910 Im ℎ + Im 119910Re ℎ
119891 = Re 119910 Im ℎ + Im 119910Re ℎ
(19)
Since the terms Re119910 Imℎ and Im119910Reℎ are in (19) itrequires only three multiplications but five additions Thiskind of rearrangement of themultiplicands is employed in theprocessing blocks at the cost of increased additions as shownin Figure 12
52 Proposed Architecture Using Folding Method Foldingarchitecture systematically determines the control circuits inDSP architectures where multiple algorithmic operations aretime-multiplexed to a single functional unit [12] It is usedfor synthesis of DSP architectures that can be operated atsingle or multiple clocks It reduces the number of hardwarefunctional units (FUs) by a factor of 119873 at the expense ofincreased computation time
The folding architecture is introduced in the receiverstructure of RPB in SISO and SIMO configurations and ofRPB and PB in MISO and MIMO configurations as shownin Figures 13 and 14 respectively For SISO RPB there are16 hardware lines to calculate the value of 119903
1each requiring
twomultipliers Hence the number of multipliers used in oneRPB is 32 In order to reduce the number of multipliers andadders folding architecture is proposed This architectureuses only two multipliers and performs the operation of asingle hardware line 16 times in sequential wayThe differencebetween the product of channel frequency response vectorwith the precomputed data vector and the received signalvector is stored in registers At a time one resultant signalpair involves in computation using two multipliers to get thevalue of 119911
119894 Four switches operating in system clock speed
are involved in the architecture where two switches are usedto pass the real part of the signal to one multiplier whilethe other two switches are used to pass the imaginary partof the signal to another multiplier The multipliers pass theproducts to the first adder for 119911
119894 The output of the first adder
is passed to the second adder with a delay to accumulate thevalues 119911
0to 11991115into a register in subsequent clock cyclesThis
process requires 16 clock cycles and the CFI is detected at the17th clock cycle Though it takes longer time for the clockcycles to get the output the resources are minimized in thismethod
The folded architecture of decoding block of MISO andMIMO involving complex multiplication of the channelfrequency response vector and the receive signal vector isshown in Figure 14 There are 2 complex multiplications andone addition in each of the 16 hardware lines Hence totalresource elements used are 32 complex multiplications and16 additions The folded architecture which reduces to just2 complex multiplications and one addition requires fiveswitches Two switches are used to pass the first elementof the receive signal vector and its corresponding channel
12 VLSI Design
clk
T1 T2 T3 T4 T5 T6
PB
PB
PB
PB
d
d
dd
d
d
dr4
r3
r2
r1
From other hardware lines
From other hardware lines
Complexmultiplications(real part) of a
hardware line from
Complex
multiplications
(imaginary part) of
a hardware line from RDB
RDB
Figure 16 Illustration of superscalar method for MISO and MIMO (with complex multiplications and operating from 1198791to 1198796)
Figure 17 Simulation waveform for PCFICH receiver
frequency response vector to one multiplier and other twoswitches are used to pass the second element of receive signalvector and its channel frequency response vector to anothermultiplierThese four switches operate in system clock speedThe multipliers pass their products to the adder through thefifth switch before moving to PB This process requires 16clock cycles and the CFI is detected at the 17th clock cycle
53 Proposed Architecture Using Superscalar Method Super-scalar approach is another low resource utilizing VLSI DSPtechnique The superscalar processing method includes par-allel processing and pipelining strategies In this case paralleloperation for the 16 pairs of hardware lines is arrangedwith pipelining of the subtraction and square magnitudeoperations for each CFI SISO configuration does not havecomplex multiplications and it has only square magnitudeoperations Hence the RPB of SISOhas 16 hardware lines eachhaving 2 multipliers which results to a total of 32 multipliers
This setup requires more hardware resources than foldingbut the output is obtained at every 4th clock cycle as shownin Figure 15 SIMO configuration which involves two receiveantenna signal processing requires twice the number ofmultiplications as that of SISO and the output is obtainedat every 4th clock cycle The block ldquo119889rdquo represents the delayelement introduced to buffer the values and produce theoutputs at the same time instant
For MISO configuration the RDB has 16 hardware lineswith 2 complex multiplications each Since each complexmultiplication requires four real multiplications RDB can beexecuted in two clock cycles by reusing 64 multipliers 32multipliers are required for PB taking 4 clock cyclesHence 96multipliers are required in MISO configuration For MIMOconfiguration the RDB requires reuse of 128 multiplierstaking 2 clock cycles and an additional 32 multipliers arerequired for the PB taking 4 clock cycles Hence 160 multi-pliers are required for MISO configuration and the output isobtained at every 6th clock cycle as shown in the Figure 16The block ldquo119889rdquo represents the delay element introduced tobuffer the values and produce the outputs at the same timeinstant
6 Results and Discussion
The proposed receiver architectures for PCFICH in SISOSIMO MISO and MIMO configurations are implementedusing the Xilinx PlanAhead tool on the Virtex-6 FPGAxc6vlx240tff1156-1 device board The target device Virtex-6has only 768 DSP elements Table 2 shows the performanceof the proposed architectures using folding and superscalarmethods being compared with the direct implementationof PCFICH receiver in terms of resource utilisation speedand power for all the SISO SIMO MISO and MIMO
VLSI Design 13
clk clk
clk
clk
clk
clkDiversity[10]
Diversity[1] GND 1 o wide mux 1Address[10] Output[30]
RTL rom 2 4
3
2
1
0
e
e
e
e
Div0
Div1
Div2
Div3
SISO
SIMO
MISO
MIMO
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310] Op
Choose
Count CFI11[490]Count CFI12[490]Count CFI13[490]Count CFI14[490]Count CFI21[490]Count CFI22[490]Count CFI23[490]Count CFI24[490]Count CFI31[490]Count CFI32[490]Count CFI33[490]Count CFI34[490]Count CFI41[490]Count CFI42[490]Count CFI43[490]Count CFI44[490]
e[30]Out1[310]Out2[310]Out3[310]Out4[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Figure 18 RTL schematic for combined PCFICH architecture with diversity
Table 2 Performance of proposed architectures based on folding and superscalar method
Diversity Method Multipliers Adders DSP elements LUTs Total delay (ns) Speed (MHz) Dynamic power
SISODirect 125 245 125 5479 39081 25587 mdash
Folding (16T) 8 81 16 2561 114448 8737 84Superscalar (4T) 32 182 66 2731 69333 14423 93
SIMODirect 250 494 250 10942 40278 24827 mdash
Folding (16T) 16 165 32 5117 130704 7651 159Superscalar (4T) 64 318 130 5623 53873 18562 170
MISODirect 224 580 594 14880 43023 23243 mdash
Folding (16T) 14 101 39 3950 255264 3917 173Superscalar (6T) 96 338 196 6156 80495 12423 208
MIMODirect 320 844 675 17380 56962 17555 mdash
Folding (16T) 20 155 46 4395 256528 3898 374Superscalar (6T) 160 465 262 6932 85822 11652 382
Figure 19 Resource utilization graph for generalized architecture
configurations The proposed architectures based on foldingand superscalar processing methods require less number ofresource elements
In the folding approach resource utilization is less com-pared to the direct and superscalar approach at the cost ofreduced speed of operation but it is suitable for real-time
frame timings When the LTE-A system operates at 14MHzbandwidth maximum time available for detection at eachsubcarrier is 992063 ns since each slot of 05ms duration in aframe (10ms radio frame duration) consists of 7 OFDM sym-bols and there are 72 subcarriers along one OFDM symbolThe total delay in the receiver architecture is within the LTEtime constraint The dynamic power consumption is less inthe folding method compared to superscalar method due todecrease in block arithmetic Direct method does not requiresequential execution and clocking and hence total powerconsumption is due to static power Hence it is inferredthat the proposed architecture based on folding method ismore suitable for CFI detection The simulation waveformof the proposed architecture based on folding method isshown in Figure 17 for SISO SIMO MISO and MIMOconfigurations
A general architecture based on folding method whichoperates at all the four SISO SIMO MISO and MIMOconfigurations has also been developed In this architecture
14 VLSI Design
Table 3 Resource requirements of proposed architecture using folding method
Parameter Multipliers Adders Minimum clock period (ns) Total delay (ns) Speed (MHz) Total dynamic power (mW)Value 58 518 16033 256528 3898 1019
Figure 20 Implemented device in FPGA editor
a control variable ldquo119890rdquo is used to enable or disable thesubmodules SISO SIMO MISO or MIMO according tothe selection input ldquodiversityrdquo CFI is detected at every 17thclock cycle The synthesis results of a general architecturebased on folding show that it utilizes minimum resourcesin XC6VLX240TFF1156-1 Virtex 6 device (768 DSPs) Thisis summarized in Table 3 Dynamic power consumption isdue to internal switching contributed by the clock (246mW)logic (670mW) and the block arithmetic (103mW)
Figure 18 shows the RTL schematic of 4 diversity blocksldquodiv0rdquo ldquodiv1rdquo ldquodiv2rdquo and ldquodiv3rdquo corresponding to SISO SIMOMISO and MIMO controlled by wires named ldquo119890rdquo Powerconsumed includes both static power and dynamic power dueto internal switching
Figure 19 shows the resource utilization graph whichshows the percentage of registers lookup tables (LUTs) slicesDSP elements and buffers used
Figure 20 shows the implemented device in FPGA editorwith the implemented components and interconnectionsbetween the components configured into the FPGA device
7 Conclusion
In this paper low complexity low resource single or multi-antenna CFI detection at the receiver system has been pro-posed and analyzed using modelsim and implementation inthe Virtex-6 device in Xilinx PlanAhead tool In the receivercomputational complexity and the resource utilization areminimized by employing arithmetic operational rearrange-ment and suboptimal sequential DSP algorithm called thefolding approach The proposed architecture using folding
method complies with the LTE frame timing constraint inSISO SIMOMISO andMIMO configurations It is a suitablesolution for the area optimized hardware implementation ofreceiver structures for PCFICH In future a total hardwareaccommodating all the physical downlink control channelsof the 3GPP-LTE-A with low resource utilization could besynthesized and implemented
Conflict of Interests
The authors do not have direct financial relation with anycommercial identity mentioned in the paper or any otherconflict of interests
Acknowledgments
The authors wish to express their sincere thanks to All IndiaCouncil for Technical Education NewDelhi for the Grant todo the Project titled Design of Testbed for the Developmentof Optimized Architectures of MIMO Signal Processing(no 8023RIDRPS0391112) They are also thankful tothe Managements of Mepco Schlenk Engineering CollegeSivakasi and Thiagarajar College of Engineering Maduraifor their constant support and encouragement to carry outthis research work successfully
References
[1] 3GPP TS 36 211 Version 11 0 0 Release 11 (2012-10) EvolvedUniversal Terrestrial Radio Access (E-UTRA) Physical Chan-nels and Modulation 2012
[2] J Ketonen and M Juntti ldquoSIC and K-best LSD receiverimplementation for a MIMO-OFDM systemrdquo in Proceedings ofthe 16th European Signal Processing Conference (EUSIPCO rsquo08)Lausanne Switzerland August 2008
[3] J Ketonen M Juntti and J R Cavallaro ldquoPerformancemdashcomplexity comparison of receivers for a LTE MIMOOFDMsystemrdquo IEEE Transactions on Signal Processing vol 58 no 6pp 3360ndash3372 2010
[4] X Huang C Liang and J Ma ldquoSystem architecture andimplementation of MIMO sphere decoders on FPGArdquo IEEETransactions on Very Large Scale Integration Systems vol 16 no2 pp 188ndash196 2008
[5] Y Lin Y Chen C Chu C Zhan and A Wu ldquoDual-modelow-complexity codebook searching algorithm and VLSI archi-tecture for LTELTE-advanced systemsrdquo IEEE Transactions onSignal Processing vol 61 no 14 pp 3545ndash3562 2013
[6] DWu J Eilert R Asghar and D Liu ldquoVLSI implementation ofa fixed-complexity soft-output MIMO detector for high-speedwirelessrdquo EURASIP Journal on Wireless Communications andNetworking vol 2010 Article ID 893184 13 pages 2010
[7] C Mehlfuhrer M Wrulich J C Ikuno D Bosanska and MRupp ldquoSimulating the long term evolution physical layerrdquo in
VLSI Design 15
Proceedings of the 17th European Signal Processing Conference(EUSIPCO rsquo09) pp 1471ndash1478 Glasgow UK August 2009
[8] S J Thiruvengadam and L M A Jalloul ldquoPerformance anal-ysisof the 3GPP-LTE physical control channelsrdquo EURASIPJournalon Wireless Communications and Networking vol 2010Article ID 914934 10 pages 2010
[9] S M Alamouti ldquoA simple transmit diversity technique forwireless communicationsrdquo IEEE Journal on Selected Areas inCommunications vol 16 no 8 pp 1451ndash1458 1998
[10] S S A Abbas and S JThiruvengadam ldquoFPGA implementationof 3GPP-LTE-A physical downlink control channel using diver-sity techniquesrdquo International Journal of Wireless and MobileComputing vol 9 no 2 p 84 2013
[11] S Ahmadi LTE-Advanced A Practical Systems Approach toUnderstanding 3GPP LTE Releases 10 and 11 Radio AccessTechnologies Sassan Ahmadi Academic Press 2013
[12] K K Parhi VLSI Digital Signal Processing SystemsmdashDesign andImplementation Wiley-Interscience 1999
International Journal of
AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal ofEngineeringVolume 2014
Submit your manuscripts athttpwwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
4 VLSI Design
d(1)0
d(1)1
d(1)15
minus
minus
minus
16 times 1 channel frequency
16 times 1 received signal vector
Output r1
response vector
[middot]2
[middot]2
[middot]2
h0 y0h1 middot middot middot h15 y1 middot middot middot y15
Re
Im
Re
Im
Re
Im
Precomputed data vectors
Figure 3 Internal architecture of receiver processing block (RPB CFI-1)
CFI
Sum
Sum
Carry
Sum
Carry
Sum
00011011
Generate
Multiplexer control input
CM-1
CM-3
CM-2
CM-4
16
16
1616
16
16
16
16
r1
r2
r3
r4
Carry1
Carry2
Cr1
Cr2
Cr3
Cr4
Sr1
Sr2
Sr3
Sr4
CFI = 1
CFI = 2
CFI = 3
CFI = 4
Figure 4 Internal architecture of CFI detector
of input 1199031are added If carry is generated then 119903
1is less than
1199032 The outputs Cr
1and Sr
1of the CM-1 are defined as
C1199031=
1199031
if carry1= ldquo1rdquo (119903
1lt 1199032)
0 if carry1= ldquo0rdquo (119903
1gt 1199032)
S1199031=
1199032
if carry1= ldquo0rdquo (119903
1gt 1199032)
0 if carry1= ldquo1rdquo (119903
1lt 1199032)
(3)
InCM-2 input 1199034and onersquos complement of input 119903
3are added
If carry is generated then 1199033is less than 119903
4 The outputs Cr
2
and Sr2of CM-2 are defined as
C1199032=
1199033
if carry2= ldquo1rdquo (119903
3lt 1199034)
0 if carry2= ldquo0rdquo (119903
3gt 1199034)
S1199032=
1199034
if carry2= ldquo0rdquo (119903
3gt 1199034)
0 if carry2= ldquo1rdquo (119903
3lt 1199034)
(4)
VLSI Design 5
Antenna ldquo0rdquo Antenna ldquo1rdquo
Precomputed data vectorsRPB
RPB
RPB
RPB
RPB
RPB
RPB
RPB
CFI detector Detected CFI
Precomputed data vectors
16 times 1 channel frequencyresponse vector h(0)16 times 1 received signalvector y (0)
16 times 1 channel frequencyresponse vector h(1)16 times 1 received signalvector y (1)
r(0)1
r(0)2
r(0)3
r(0)4
r1 r2 r3 r4
CFI(1)1
CFI(1)2
CFI(1)3
CFI(1)4
d(1)
d(2)
d(3)
d(4)
r(1)1
r(1)2
r(1)3
r(1)4
CFI(0)1
CFI(0)2
CFI(0)3
CFI(0)4
d(1)
d(2)
d(3)
d(4)
Figure 5 Basic architecture for SIMO configuration
A0
A1
OFDM subcarriers
d15 d14 d13 d12 d11 d10 d9 d8 d7 d6 d5 d4 d3 d2 d1 d0
dlowast14 minusdlowast15 dlowast12 minusdlowast13 dlowast10 minusdlowast11 dlowast8 minusdlowast9 dlowast6 minusdlowast7 dlowast4 minusdlowast5 dlowast2 minusdlowast3 dlowast0 minusdlowast1
Figure 6 Subcarrier mapping for SFBC for 2-antenna system
The multiplexer control input is activated based on theoutputs from CM-3 and CM-4 One of the four outputs Cr
3
Sr3 Cr4 and Sr
4would be ldquo1rdquo based on the minimum value
of four inputs 1199031 1199032 1199033 and 119903
4 respectively Based on this 00
01 10 or 11 in the multiplexer control unit would be activatedto obtain the detected CFI value
In SIMO the 16times1 receive signal vector at the 119896th receiveantenna is modeled as
y(119896) = h(119896) ∘ d(119898) + w119896 119896 = 0 1 2 119870 minus 1 (5)
where ldquo119870rdquo represents the number of receive antennas at UEh(119896) is 16 times 1 channel frequency response vector between thetransmit antenna and 119896th receive antenna and w
119896is 16 times 1
noise vector at 119896th receive antenna Now the objective is todetect the value ofCFI from the received signal vectors at eachreceive antenna assuming the channel frequency responsevectors at each receive antenna are knownThemaximal ratiocombining is carried out at the receiver Using maximumlikelihood (ML) principle CFI is estimated as [9]
CFI = min119898=1234
119870
sum
119896=1
10038171003817100381710038171003817y(119896) minus (h(119896) ∘ d(119898))10038171003817100381710038171003817
2
(6)
The basic architecture for estimating CFI using (6) in1 times 2 SIMO configuration shown in Figure 5 is similar to thebasic architecture of SISO configuration The received signalvector y(119896) and the channel frequency response vector h(119896)are provided as input to the four receiver processing blocks(RPB-CFI(119896)
119898) at 119896th receive antenna along with precomputed
data vectors d(1) d(2) d(3) and d(4)The outputs from themthRPB at 0th receive antenna 119903(0)
119898and 1st receive antenna 119903(1)
119898are
added to get themth input 119903119898of the CFI detector circuit
3 System Model and ImplementationArchitecture for MISO Configuration
In MISO and MIMO configurations space frequency blockcode (SFBC) based layer mapping and precoding are carriedout to obtain transmit diversity when two or more antennaports are used at eNodeB as per the 3GPP LTE wirelessstandard [1 11] It is assumed that 2 antenna ports are usedat eNodeB The 16 times 1 complex symbol vector output ofthe modulation mapper is applied to the layer mapper The8 times 1 symbol vectors at layer 0 and layer 1 are given by[1198890 1198892 1198894 11988961198898 11988910 11988912 and 119889
14] and [119889
1 1198893 1198895 1198897
1198899 11988911 11988913 and 119889
15] The precoding is carried out using
6 VLSI Design
PB-4
Detected CFI
PB-1
PB-2
PB-3
RDB-1
CFI detector
RDB
RDB-2
RDB-3
RDB-4
16 times 1 channel frequencyresponse vector
16 times 1 received signalvector y
16 times 1 precomputed signalvector s1
16 times 1 precomputed signalvector s2
16 times 1 precomputed signalvector s3
16 times 1 precomputed signalvector s4
h(0)h(1)
z
z(1)
z(2)
z(3)
z(4)
r1
r2
r3r4
Figure 7 Proposed MISO receiver architecture for PCFICH
16 times 1 channel frequency response vectors h(0) and h(1)
16 times 1 received signalvector y
h(0)lowast0 h(0)1 middot middot middot h(0)lowast14 h(0)15 h(1)lowast0 h(1)1 middot middot middot h(1)lowast14 h(1)15
C
C
y0Ylowast1
y4
Ylowast15
z0
z1
z14
z15
Figure 8 Internal architecture of MISO receiver decoding block (RDB)
the SFBC in the LTE-A standard The precoder output atantenna port 0 (119860
0) and antenna port 1 (119860
1) is shown in
Figure 6The notation ldquolowastrdquo represents the complex conjugate of the
symbol Basically in precoding a symbol 1198890from layer 0 and
a symbol 1198891from layer 1 are encoded such that the antenna
output is formulated using the orthogonal matrix given by
119860 = [11988901198891
minus119889lowast
1119889lowast
0
] (7)
VLSI Design 7
Precomputed data
Output vector from RDB1
Im
Re
Im
Re
Im
Re
Im
Re
[middot]2
[middot]2
[middot]2
[middot]2 Output r1
minus
minus
minus
minus
z(1)0
z(1)1
z(1)14
z(1)15
z0 z15z1 middot middot middot z14
Figure 9 Internal architecture of MISO processing block (PB-1)
16 times 1 channel frequency response vectorsh(00)h(01)h(10)h(11)
16 times 1 received signalvector y (0)
y (1)
16 times 1 precomputedsignal vector s(0)1
s(1)1
16 times 1 precomputedsignal vector s(0)2
s(1)2
16 times 1 precomputedsignal vector s(0)3
s(1)3
16 times 1 precomputedsignal vector s(0)4
s(1)4
r1
r2
r3
r4
z
z
(1)
z(2)
z(3)
z(4)
RDBM
RDBM-1
RDBM-2
RDBM-3
RDBM-4
PB-1
PB-2
PB-3
PB-4
CFIdetector
Detected CFI
Figure 10 Proposed MIMO receiver architecture for PCFICH
8 VLSI Design
16 times 1 receivedsignal vector
Y(0)0
Y(0)1
Y(1)lowast0
Y(1)lowast1
Y(0)14
Y(0)15
Y(1)lowast14
Y(1)lowast15
h(00)lowast0 minush(00)1 h(00)lowast14 minush(00)15 h(01)0 h(01)lowast15 h(10)0 minush(10)lowast1 h(10)14 minush(10)lowast15 minush(11)lowast0 h(11)1 minush(11)lowast14 h(11)15
Z0
Z1
Z14
Z15C
C
16 times 1 channel frequency response vectors h(00) h(01) h(10) and h(11)
h(01)lowast1 middot middot middot h(01)
14
Figure 11 Internal architecture receiver decoding block-MIMO (RDBM)
Re(y )
Im(y )
Re(h)
Im(h)
e
f
Figure 12 Multiplicands rearrangement for a single complexmultiplication block
This is repeated for all the 8 symbols in layer 0 andlayer 1 Equation (7) defines the transmission format withthe row index indicating the antenna port number and thecolumn index indicating the subcarrier index In 2 times 1MISOconfiguration the receive signals at 119894th and (119894+1)th subcarrierare given in matrix form as
[119910119894
119910lowast
119894+1
] = [ℎ(0)
119894minusℎ(1)
119894
ℎlowast(1)
119894+1ℎlowast(0)
119894+1
] [119889119894
119889lowast
119894+1
] + [119899119894
119899lowast
119894+1
]
for 119894 = 0 2 4 6 8 10 12 14
(8)
where ℎ(0)119894
represents the channel frequency response of 119894thsubcarrier between 0th transmit antenna port and receive
antenna 119889119894is data symbol at 119894th subcarrier and 119899
119894is the
noise at 119894th subcarrier at the receive antenna Equation (8)can simply be represented as
y119894= Heff119894d119894 + n
119894 for 119894 = 0 2 4 6 8 10 12 14 (9)
where y119894is 2 times 1 receive signal vector Heff119894 is the 2 times 2
channel matrix d119894is 2 times 1 complex signal vector and n
119894is
2 times 1 noise vector The objective is to detect the elements119889119894and 119889
119894+1of the data vector d
119894 Assuming that the elements
of channel frequency response matrix Heff119894 are perfectlyknown at the receiver the decoder output vector z
119894is given
byz119894= H119867eff119894 y119894 for 119894 = 0 2 4 6 8 10 12 14 (10)
whereH119867eff119894 is theHermitian of the 2times2 channel transmissionmatrix Equation (10) is expanded as
[119911119894
119911lowast
119894+1
] = [ℎlowast(0)
119894ℎ(1)
119894+1
minusℎlowast(1)
119894ℎ(0)
119894+1
] [119910119894
119910lowast
119894+1
]
for 119894 = 0 2 4 6 8 10 12 14(11)
The elements of decoder output are calculated as
119911119894= ℎlowast(0)
119894119910119894+ ℎ(1)
119894+1119910lowast
119894+1 for 119894 = 0 2 4 6 8 10 12 14
119911lowast
i+1 = minusℎlowast(1)
119894119910119894+ ℎ(0)
119894+1119910lowast
119894+1 for 119894 = 0 2 4 6 8 10 12 14
(12)
The PCFICH receive architecture for 2 times 1MISO config-uration is shown in Figure 7 Receiver decoding block (RDB)
VLSI Design 9
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14 2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15Folded unit
DRei
Imi
Rei
Rei
Rei+1
Rei+1
Imi
Imi+1
Imi Im+1
r1
Channel frequency response hi
Received signalvector y i
Precomputeddata si
minus
2l + 1 3 5 7 9 11 13 15
Figure 13 Illustration of folded architecture of RPB in SISO and SIMO
gets the 16 times 1 received signal vector y and computes thedecoder output vector using (10) assuming that the channelfrequency response vectors h(0) and h(1) are known Thedetailed internal architecture of RDBM is shown in Figure 11The decoder output vectors z
119894 119894 = 0 2 4 14 are stacked as
16 times 1 vector z = [1199111198790 119911119879
2 119911
119879
14]119879 The 16 times 1 precomputed
data vectors forCFI= 1234 are represented as s1 s2 s3and s4
respectivelyThe detailed structure of receiver decoding blocks (RDB)
is shown in Figure 8 The output vectors z(1) z(2) z(3) z(4)from RDB-1 to RDB-4 are fed to the processing blocks (PB-1 to PB-4) The detailed architecture of PB-1 is shown inFigure 9 The sum of the square magnitude of the elementsof difference vector between decoded output vector z and theprecomputed data vector s
1is the output 119903
1of PB-1 Similarly
1199032 1199033 and 119903
4are computed for CFI = 2 3 and 4 using PB-
2 PB-3 and PB-4 respectively The processing block outputs1199031 1199032 1199033 and 119903
4are applied to the CFI determination circuit
shown in Figure 4 to detect the CFI value
4 System Model and ImplementationArchitecture for MIMO Configuration
In MIMO system the signals at 119894th and (119894 + 1)th subcarrier inthe receive array are given by
[[[[[[
[
119910(0)
119894
119910(0)
119894+1
119910(1)lowast
119894
119910(1)lowast
119894+1
]]]]]]
]
=
[[[
[
ℎ00
ℎ01ℎ10 ℎ11ℎlowast
01 minusℎlowast
00ℎlowast
11 minusℎlowast
10
]]]
]
[119889119894
119889lowast
119894+1
] +
[[[[[[[[
[
119899(0)
119894
119899(0)lowast
119894+1
119899(1)
119894
119899(1)lowast
119894+1
]]]]]]]]
]
for 119894 = 0 2 4 6 8 10 12 14(13)
where ℎ119886119887
represents the channel frequency response vectorbetween 119887th transmit antenna and 119886th receive antenna and119899(119895)
119894represents the noise in 119894th subcarrier in 119895th receive
antenna In vector form it is written as
y119894= Heff119894d119894 + n
119894 for 119894 = 0 2 4 6 8 10 12 14 (14)
where y119894is 4times1 receive signal vectorHeff119894 is the 4times2 channel
frequency response vector at 119894th and (119894 + 1)th subcarrier d119894
is 2 times 1 data vector at 119894th and (119894 + 1)th subcarrier and n119894is
4 times 1 noise vector The objective is to detect the elements 119889119894
and 119889119894+1
of the data vector d119894 Assuming that the elements of
channel frequency response matrixHeff119894 are perfectly knownat the receiver the decoder output vector z is given by
z119894= H119867eff119894 y119894 for 119894 = 0 2 4 6 8 10 12 14 (15)
whereH119867eff119894 is theHermitian of the 4times2 channel transmissionmatrix This can be expanded as
[119911119894
119911lowast
+1
] = [ℎlowast
00minusℎlowast
10
ℎlowast
01minusℎlowast
11
ℎ01
ℎ11
minusℎ00minusℎ10
]
[[[[[
[
119910(0)
119894
119910(0)
119894+1
119910(1)lowast
119894
119910(1)lowast
119894+1
]]]]]
]
for 119894 = 0 2 4 6 8 10 12 14
(16)
The decoder outputs are given by
119911119894= ℎlowast
00119910(0)
119894minus ℎlowast
10119910(0)
119894+1+ ℎ01119910(1)lowast
119894+ ℎ11119910(1)lowast
119894+1
for 119894 = 0 2 4 6 8 10 12 14
119911lowast
119894+1= ℎlowast
01119910(0)
119894minus ℎlowast
11119910(0)
119894+1minusℎ00119910(1)lowast
119894minusℎ10119910(1)lowast
119894+1
for 119894 = 0 2 4 6 8 10 12 14
(17)
10 VLSI Design
Channel frequencyresponse vector
Channel frequencyresponse vector
Received signalvector
Received signalvector
Precomputeddata vector z(k)i
from RDB-1using s1i
y i
y i
y i+1
y i+1
hi+1 hi+1
hihi
zi
Folded RDB unit
Folded PB unit
Conjugate
DRei
Imi
Rei
Rei
Rei+1
Rei+1
Imi
Imi
Imi+1
Im+1
minus
r1
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 1 3 5 15
2l + 1 3 5 15
2l + 1 3 5 15
2l + 1 3 5 15
2l + 1 3 5 15
2l + 0 2 4 14
2l + 0 2 4 14
2l + 0 2 4 14
2l + 0 2 4 14
2l + 0 2 4 14
Figure 14 Illustration of proposed architecture for RDB and PB in MISO and MIMO Note receiver decoding block (RDB) in MISO istermed as RDBM in MIMO
The PCFICH receiver architecture of 2 times 2MIMO configura-tions is shown in Figure 10
Receiver decoding block (RDBM) gets the 16times1 receivedsignal vector y and computes the decoder output vector using(14) assuming that the channel frequency response vectorsh(00) h(01) h(10) and h(11) are knownThe 16times1 precomputeddata vectors for CFI = 1 2 3 and 4 are represented as s(0)
1
s(0)2 s(0)3 and s(0)
4 respectively for antenna 0 and as s(1)
1
s(1)2 s(1)3 and s(1)
4 respectively for antenna 1 The received
signal vectors y(0)119894
and y(1)119894multiply with the four channel
estimation vectors to give decoded output vector z that is sentto the processing block (PB) which is shown in Figure 9 Thedecoder outputs 119911
119894 119894 = 0 2 4 14 are stacked as 16 times 1
vector z = [1199111198790 119911119879
2 119911
119879
14]119879 Similarly RDBM1 gives output
vector z(1) using the precomputed data vectors y(0)1
and y(1)1
and channel estimation vectors The architecture of PBs and
VLSI Design 11
clk
T1 T2 T3 T4
RPB
RPB
RPB
RPB
d
d
dd
d
d
r4
r3
r2
r1
Figure 15 Illustration of superscalar method for SISO and SIMO(with no complex multiplications and operating from 119879
1to 1198794)
the CFI detection architecture are similar to that of theMISOsystem The sum of the squared magnitude of the differencebetween each element in the decoded output vector z and itsprecomputed data in the vector z(1) is the output 119903
1of PB1
Similarly 1199032 1199033 and 119903
4are computed for other CFIThe 119903
1 1199032
1199033 and 119903
4are compared to determine the minimum value by
the CFI detector shown in Figure 4
5 PCFICH Receiver Implementation Methods
The PCFICH receiver architectures can be implementeddirectly based on the basic architectures developed inSections 3 and 4 But in order to effectively utilize theresources in FPGA the implementation of basic architecturesis done using the modified novel architectures based on VLSIDSP techniques namely folding and superscalar processingapproach
51 Direct Implementation with Multiplicands RearrangedMethod In the receiver architecture for SISO and SIMOthe 16 times 1 received signal vector is directly subtracted fromthe precomputed data vector for a given CFI This requireslesser number of multipliers and adders when comparedto MISO and MIMO In MISO and MIMO configurationscomplex multiplications are necessary for the multiplicationof H119867119890119891119891
with the received signal vector It increases thenumber of multiplications in the CFI detection processHence optimum rearrangement of the terms is carried outto minimize the number of multiplications Further theintermediate products are reused in the calculation of real andimaginary parts Consider the multiplication of two complexnumbers Reℎ+119895 Imℎ and Re119910+119895 Im119910The output realpart (119890) and imaginary part (119891) terms are given by
119890 = Re ℎRe 119910 minus Im ℎ Im 119910 119891 = Re ℎ Im 119910 + Im ℎRe 119910
(18)
It requires four multiplications and two additions To reducethe number of multiplications the terms in (18) are rear-ranged as
119890 = [Re 119910 minus Im 119910] [Re ℎ minus Im ℎ]
minus Re 119910 Im ℎ + Im 119910Re ℎ
119891 = Re 119910 Im ℎ + Im 119910Re ℎ
(19)
Since the terms Re119910 Imℎ and Im119910Reℎ are in (19) itrequires only three multiplications but five additions Thiskind of rearrangement of themultiplicands is employed in theprocessing blocks at the cost of increased additions as shownin Figure 12
52 Proposed Architecture Using Folding Method Foldingarchitecture systematically determines the control circuits inDSP architectures where multiple algorithmic operations aretime-multiplexed to a single functional unit [12] It is usedfor synthesis of DSP architectures that can be operated atsingle or multiple clocks It reduces the number of hardwarefunctional units (FUs) by a factor of 119873 at the expense ofincreased computation time
The folding architecture is introduced in the receiverstructure of RPB in SISO and SIMO configurations and ofRPB and PB in MISO and MIMO configurations as shownin Figures 13 and 14 respectively For SISO RPB there are16 hardware lines to calculate the value of 119903
1each requiring
twomultipliers Hence the number of multipliers used in oneRPB is 32 In order to reduce the number of multipliers andadders folding architecture is proposed This architectureuses only two multipliers and performs the operation of asingle hardware line 16 times in sequential wayThe differencebetween the product of channel frequency response vectorwith the precomputed data vector and the received signalvector is stored in registers At a time one resultant signalpair involves in computation using two multipliers to get thevalue of 119911
119894 Four switches operating in system clock speed
are involved in the architecture where two switches are usedto pass the real part of the signal to one multiplier whilethe other two switches are used to pass the imaginary partof the signal to another multiplier The multipliers pass theproducts to the first adder for 119911
119894 The output of the first adder
is passed to the second adder with a delay to accumulate thevalues 119911
0to 11991115into a register in subsequent clock cyclesThis
process requires 16 clock cycles and the CFI is detected at the17th clock cycle Though it takes longer time for the clockcycles to get the output the resources are minimized in thismethod
The folded architecture of decoding block of MISO andMIMO involving complex multiplication of the channelfrequency response vector and the receive signal vector isshown in Figure 14 There are 2 complex multiplications andone addition in each of the 16 hardware lines Hence totalresource elements used are 32 complex multiplications and16 additions The folded architecture which reduces to just2 complex multiplications and one addition requires fiveswitches Two switches are used to pass the first elementof the receive signal vector and its corresponding channel
12 VLSI Design
clk
T1 T2 T3 T4 T5 T6
PB
PB
PB
PB
d
d
dd
d
d
dr4
r3
r2
r1
From other hardware lines
From other hardware lines
Complexmultiplications(real part) of a
hardware line from
Complex
multiplications
(imaginary part) of
a hardware line from RDB
RDB
Figure 16 Illustration of superscalar method for MISO and MIMO (with complex multiplications and operating from 1198791to 1198796)
Figure 17 Simulation waveform for PCFICH receiver
frequency response vector to one multiplier and other twoswitches are used to pass the second element of receive signalvector and its channel frequency response vector to anothermultiplierThese four switches operate in system clock speedThe multipliers pass their products to the adder through thefifth switch before moving to PB This process requires 16clock cycles and the CFI is detected at the 17th clock cycle
53 Proposed Architecture Using Superscalar Method Super-scalar approach is another low resource utilizing VLSI DSPtechnique The superscalar processing method includes par-allel processing and pipelining strategies In this case paralleloperation for the 16 pairs of hardware lines is arrangedwith pipelining of the subtraction and square magnitudeoperations for each CFI SISO configuration does not havecomplex multiplications and it has only square magnitudeoperations Hence the RPB of SISOhas 16 hardware lines eachhaving 2 multipliers which results to a total of 32 multipliers
This setup requires more hardware resources than foldingbut the output is obtained at every 4th clock cycle as shownin Figure 15 SIMO configuration which involves two receiveantenna signal processing requires twice the number ofmultiplications as that of SISO and the output is obtainedat every 4th clock cycle The block ldquo119889rdquo represents the delayelement introduced to buffer the values and produce theoutputs at the same time instant
For MISO configuration the RDB has 16 hardware lineswith 2 complex multiplications each Since each complexmultiplication requires four real multiplications RDB can beexecuted in two clock cycles by reusing 64 multipliers 32multipliers are required for PB taking 4 clock cyclesHence 96multipliers are required in MISO configuration For MIMOconfiguration the RDB requires reuse of 128 multiplierstaking 2 clock cycles and an additional 32 multipliers arerequired for the PB taking 4 clock cycles Hence 160 multi-pliers are required for MISO configuration and the output isobtained at every 6th clock cycle as shown in the Figure 16The block ldquo119889rdquo represents the delay element introduced tobuffer the values and produce the outputs at the same timeinstant
6 Results and Discussion
The proposed receiver architectures for PCFICH in SISOSIMO MISO and MIMO configurations are implementedusing the Xilinx PlanAhead tool on the Virtex-6 FPGAxc6vlx240tff1156-1 device board The target device Virtex-6has only 768 DSP elements Table 2 shows the performanceof the proposed architectures using folding and superscalarmethods being compared with the direct implementationof PCFICH receiver in terms of resource utilisation speedand power for all the SISO SIMO MISO and MIMO
VLSI Design 13
clk clk
clk
clk
clk
clkDiversity[10]
Diversity[1] GND 1 o wide mux 1Address[10] Output[30]
RTL rom 2 4
3
2
1
0
e
e
e
e
Div0
Div1
Div2
Div3
SISO
SIMO
MISO
MIMO
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310] Op
Choose
Count CFI11[490]Count CFI12[490]Count CFI13[490]Count CFI14[490]Count CFI21[490]Count CFI22[490]Count CFI23[490]Count CFI24[490]Count CFI31[490]Count CFI32[490]Count CFI33[490]Count CFI34[490]Count CFI41[490]Count CFI42[490]Count CFI43[490]Count CFI44[490]
e[30]Out1[310]Out2[310]Out3[310]Out4[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Figure 18 RTL schematic for combined PCFICH architecture with diversity
Table 2 Performance of proposed architectures based on folding and superscalar method
Diversity Method Multipliers Adders DSP elements LUTs Total delay (ns) Speed (MHz) Dynamic power
SISODirect 125 245 125 5479 39081 25587 mdash
Folding (16T) 8 81 16 2561 114448 8737 84Superscalar (4T) 32 182 66 2731 69333 14423 93
SIMODirect 250 494 250 10942 40278 24827 mdash
Folding (16T) 16 165 32 5117 130704 7651 159Superscalar (4T) 64 318 130 5623 53873 18562 170
MISODirect 224 580 594 14880 43023 23243 mdash
Folding (16T) 14 101 39 3950 255264 3917 173Superscalar (6T) 96 338 196 6156 80495 12423 208
MIMODirect 320 844 675 17380 56962 17555 mdash
Folding (16T) 20 155 46 4395 256528 3898 374Superscalar (6T) 160 465 262 6932 85822 11652 382
Figure 19 Resource utilization graph for generalized architecture
configurations The proposed architectures based on foldingand superscalar processing methods require less number ofresource elements
In the folding approach resource utilization is less com-pared to the direct and superscalar approach at the cost ofreduced speed of operation but it is suitable for real-time
frame timings When the LTE-A system operates at 14MHzbandwidth maximum time available for detection at eachsubcarrier is 992063 ns since each slot of 05ms duration in aframe (10ms radio frame duration) consists of 7 OFDM sym-bols and there are 72 subcarriers along one OFDM symbolThe total delay in the receiver architecture is within the LTEtime constraint The dynamic power consumption is less inthe folding method compared to superscalar method due todecrease in block arithmetic Direct method does not requiresequential execution and clocking and hence total powerconsumption is due to static power Hence it is inferredthat the proposed architecture based on folding method ismore suitable for CFI detection The simulation waveformof the proposed architecture based on folding method isshown in Figure 17 for SISO SIMO MISO and MIMOconfigurations
A general architecture based on folding method whichoperates at all the four SISO SIMO MISO and MIMOconfigurations has also been developed In this architecture
14 VLSI Design
Table 3 Resource requirements of proposed architecture using folding method
Parameter Multipliers Adders Minimum clock period (ns) Total delay (ns) Speed (MHz) Total dynamic power (mW)Value 58 518 16033 256528 3898 1019
Figure 20 Implemented device in FPGA editor
a control variable ldquo119890rdquo is used to enable or disable thesubmodules SISO SIMO MISO or MIMO according tothe selection input ldquodiversityrdquo CFI is detected at every 17thclock cycle The synthesis results of a general architecturebased on folding show that it utilizes minimum resourcesin XC6VLX240TFF1156-1 Virtex 6 device (768 DSPs) Thisis summarized in Table 3 Dynamic power consumption isdue to internal switching contributed by the clock (246mW)logic (670mW) and the block arithmetic (103mW)
Figure 18 shows the RTL schematic of 4 diversity blocksldquodiv0rdquo ldquodiv1rdquo ldquodiv2rdquo and ldquodiv3rdquo corresponding to SISO SIMOMISO and MIMO controlled by wires named ldquo119890rdquo Powerconsumed includes both static power and dynamic power dueto internal switching
Figure 19 shows the resource utilization graph whichshows the percentage of registers lookup tables (LUTs) slicesDSP elements and buffers used
Figure 20 shows the implemented device in FPGA editorwith the implemented components and interconnectionsbetween the components configured into the FPGA device
7 Conclusion
In this paper low complexity low resource single or multi-antenna CFI detection at the receiver system has been pro-posed and analyzed using modelsim and implementation inthe Virtex-6 device in Xilinx PlanAhead tool In the receivercomputational complexity and the resource utilization areminimized by employing arithmetic operational rearrange-ment and suboptimal sequential DSP algorithm called thefolding approach The proposed architecture using folding
method complies with the LTE frame timing constraint inSISO SIMOMISO andMIMO configurations It is a suitablesolution for the area optimized hardware implementation ofreceiver structures for PCFICH In future a total hardwareaccommodating all the physical downlink control channelsof the 3GPP-LTE-A with low resource utilization could besynthesized and implemented
Conflict of Interests
The authors do not have direct financial relation with anycommercial identity mentioned in the paper or any otherconflict of interests
Acknowledgments
The authors wish to express their sincere thanks to All IndiaCouncil for Technical Education NewDelhi for the Grant todo the Project titled Design of Testbed for the Developmentof Optimized Architectures of MIMO Signal Processing(no 8023RIDRPS0391112) They are also thankful tothe Managements of Mepco Schlenk Engineering CollegeSivakasi and Thiagarajar College of Engineering Maduraifor their constant support and encouragement to carry outthis research work successfully
References
[1] 3GPP TS 36 211 Version 11 0 0 Release 11 (2012-10) EvolvedUniversal Terrestrial Radio Access (E-UTRA) Physical Chan-nels and Modulation 2012
[2] J Ketonen and M Juntti ldquoSIC and K-best LSD receiverimplementation for a MIMO-OFDM systemrdquo in Proceedings ofthe 16th European Signal Processing Conference (EUSIPCO rsquo08)Lausanne Switzerland August 2008
[3] J Ketonen M Juntti and J R Cavallaro ldquoPerformancemdashcomplexity comparison of receivers for a LTE MIMOOFDMsystemrdquo IEEE Transactions on Signal Processing vol 58 no 6pp 3360ndash3372 2010
[4] X Huang C Liang and J Ma ldquoSystem architecture andimplementation of MIMO sphere decoders on FPGArdquo IEEETransactions on Very Large Scale Integration Systems vol 16 no2 pp 188ndash196 2008
[5] Y Lin Y Chen C Chu C Zhan and A Wu ldquoDual-modelow-complexity codebook searching algorithm and VLSI archi-tecture for LTELTE-advanced systemsrdquo IEEE Transactions onSignal Processing vol 61 no 14 pp 3545ndash3562 2013
[6] DWu J Eilert R Asghar and D Liu ldquoVLSI implementation ofa fixed-complexity soft-output MIMO detector for high-speedwirelessrdquo EURASIP Journal on Wireless Communications andNetworking vol 2010 Article ID 893184 13 pages 2010
[7] C Mehlfuhrer M Wrulich J C Ikuno D Bosanska and MRupp ldquoSimulating the long term evolution physical layerrdquo in
VLSI Design 15
Proceedings of the 17th European Signal Processing Conference(EUSIPCO rsquo09) pp 1471ndash1478 Glasgow UK August 2009
[8] S J Thiruvengadam and L M A Jalloul ldquoPerformance anal-ysisof the 3GPP-LTE physical control channelsrdquo EURASIPJournalon Wireless Communications and Networking vol 2010Article ID 914934 10 pages 2010
[9] S M Alamouti ldquoA simple transmit diversity technique forwireless communicationsrdquo IEEE Journal on Selected Areas inCommunications vol 16 no 8 pp 1451ndash1458 1998
[10] S S A Abbas and S JThiruvengadam ldquoFPGA implementationof 3GPP-LTE-A physical downlink control channel using diver-sity techniquesrdquo International Journal of Wireless and MobileComputing vol 9 no 2 p 84 2013
[11] S Ahmadi LTE-Advanced A Practical Systems Approach toUnderstanding 3GPP LTE Releases 10 and 11 Radio AccessTechnologies Sassan Ahmadi Academic Press 2013
[12] K K Parhi VLSI Digital Signal Processing SystemsmdashDesign andImplementation Wiley-Interscience 1999
International Journal of
AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal ofEngineeringVolume 2014
Submit your manuscripts athttpwwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
VLSI Design 5
Antenna ldquo0rdquo Antenna ldquo1rdquo
Precomputed data vectorsRPB
RPB
RPB
RPB
RPB
RPB
RPB
RPB
CFI detector Detected CFI
Precomputed data vectors
16 times 1 channel frequencyresponse vector h(0)16 times 1 received signalvector y (0)
16 times 1 channel frequencyresponse vector h(1)16 times 1 received signalvector y (1)
r(0)1
r(0)2
r(0)3
r(0)4
r1 r2 r3 r4
CFI(1)1
CFI(1)2
CFI(1)3
CFI(1)4
d(1)
d(2)
d(3)
d(4)
r(1)1
r(1)2
r(1)3
r(1)4
CFI(0)1
CFI(0)2
CFI(0)3
CFI(0)4
d(1)
d(2)
d(3)
d(4)
Figure 5 Basic architecture for SIMO configuration
A0
A1
OFDM subcarriers
d15 d14 d13 d12 d11 d10 d9 d8 d7 d6 d5 d4 d3 d2 d1 d0
dlowast14 minusdlowast15 dlowast12 minusdlowast13 dlowast10 minusdlowast11 dlowast8 minusdlowast9 dlowast6 minusdlowast7 dlowast4 minusdlowast5 dlowast2 minusdlowast3 dlowast0 minusdlowast1
Figure 6 Subcarrier mapping for SFBC for 2-antenna system
The multiplexer control input is activated based on theoutputs from CM-3 and CM-4 One of the four outputs Cr
3
Sr3 Cr4 and Sr
4would be ldquo1rdquo based on the minimum value
of four inputs 1199031 1199032 1199033 and 119903
4 respectively Based on this 00
01 10 or 11 in the multiplexer control unit would be activatedto obtain the detected CFI value
In SIMO the 16times1 receive signal vector at the 119896th receiveantenna is modeled as
y(119896) = h(119896) ∘ d(119898) + w119896 119896 = 0 1 2 119870 minus 1 (5)
where ldquo119870rdquo represents the number of receive antennas at UEh(119896) is 16 times 1 channel frequency response vector between thetransmit antenna and 119896th receive antenna and w
119896is 16 times 1
noise vector at 119896th receive antenna Now the objective is todetect the value ofCFI from the received signal vectors at eachreceive antenna assuming the channel frequency responsevectors at each receive antenna are knownThemaximal ratiocombining is carried out at the receiver Using maximumlikelihood (ML) principle CFI is estimated as [9]
CFI = min119898=1234
119870
sum
119896=1
10038171003817100381710038171003817y(119896) minus (h(119896) ∘ d(119898))10038171003817100381710038171003817
2
(6)
The basic architecture for estimating CFI using (6) in1 times 2 SIMO configuration shown in Figure 5 is similar to thebasic architecture of SISO configuration The received signalvector y(119896) and the channel frequency response vector h(119896)are provided as input to the four receiver processing blocks(RPB-CFI(119896)
119898) at 119896th receive antenna along with precomputed
data vectors d(1) d(2) d(3) and d(4)The outputs from themthRPB at 0th receive antenna 119903(0)
119898and 1st receive antenna 119903(1)
119898are
added to get themth input 119903119898of the CFI detector circuit
3 System Model and ImplementationArchitecture for MISO Configuration
In MISO and MIMO configurations space frequency blockcode (SFBC) based layer mapping and precoding are carriedout to obtain transmit diversity when two or more antennaports are used at eNodeB as per the 3GPP LTE wirelessstandard [1 11] It is assumed that 2 antenna ports are usedat eNodeB The 16 times 1 complex symbol vector output ofthe modulation mapper is applied to the layer mapper The8 times 1 symbol vectors at layer 0 and layer 1 are given by[1198890 1198892 1198894 11988961198898 11988910 11988912 and 119889
14] and [119889
1 1198893 1198895 1198897
1198899 11988911 11988913 and 119889
15] The precoding is carried out using
6 VLSI Design
PB-4
Detected CFI
PB-1
PB-2
PB-3
RDB-1
CFI detector
RDB
RDB-2
RDB-3
RDB-4
16 times 1 channel frequencyresponse vector
16 times 1 received signalvector y
16 times 1 precomputed signalvector s1
16 times 1 precomputed signalvector s2
16 times 1 precomputed signalvector s3
16 times 1 precomputed signalvector s4
h(0)h(1)
z
z(1)
z(2)
z(3)
z(4)
r1
r2
r3r4
Figure 7 Proposed MISO receiver architecture for PCFICH
16 times 1 channel frequency response vectors h(0) and h(1)
16 times 1 received signalvector y
h(0)lowast0 h(0)1 middot middot middot h(0)lowast14 h(0)15 h(1)lowast0 h(1)1 middot middot middot h(1)lowast14 h(1)15
C
C
y0Ylowast1
y4
Ylowast15
z0
z1
z14
z15
Figure 8 Internal architecture of MISO receiver decoding block (RDB)
the SFBC in the LTE-A standard The precoder output atantenna port 0 (119860
0) and antenna port 1 (119860
1) is shown in
Figure 6The notation ldquolowastrdquo represents the complex conjugate of the
symbol Basically in precoding a symbol 1198890from layer 0 and
a symbol 1198891from layer 1 are encoded such that the antenna
output is formulated using the orthogonal matrix given by
119860 = [11988901198891
minus119889lowast
1119889lowast
0
] (7)
VLSI Design 7
Precomputed data
Output vector from RDB1
Im
Re
Im
Re
Im
Re
Im
Re
[middot]2
[middot]2
[middot]2
[middot]2 Output r1
minus
minus
minus
minus
z(1)0
z(1)1
z(1)14
z(1)15
z0 z15z1 middot middot middot z14
Figure 9 Internal architecture of MISO processing block (PB-1)
16 times 1 channel frequency response vectorsh(00)h(01)h(10)h(11)
16 times 1 received signalvector y (0)
y (1)
16 times 1 precomputedsignal vector s(0)1
s(1)1
16 times 1 precomputedsignal vector s(0)2
s(1)2
16 times 1 precomputedsignal vector s(0)3
s(1)3
16 times 1 precomputedsignal vector s(0)4
s(1)4
r1
r2
r3
r4
z
z
(1)
z(2)
z(3)
z(4)
RDBM
RDBM-1
RDBM-2
RDBM-3
RDBM-4
PB-1
PB-2
PB-3
PB-4
CFIdetector
Detected CFI
Figure 10 Proposed MIMO receiver architecture for PCFICH
8 VLSI Design
16 times 1 receivedsignal vector
Y(0)0
Y(0)1
Y(1)lowast0
Y(1)lowast1
Y(0)14
Y(0)15
Y(1)lowast14
Y(1)lowast15
h(00)lowast0 minush(00)1 h(00)lowast14 minush(00)15 h(01)0 h(01)lowast15 h(10)0 minush(10)lowast1 h(10)14 minush(10)lowast15 minush(11)lowast0 h(11)1 minush(11)lowast14 h(11)15
Z0
Z1
Z14
Z15C
C
16 times 1 channel frequency response vectors h(00) h(01) h(10) and h(11)
h(01)lowast1 middot middot middot h(01)
14
Figure 11 Internal architecture receiver decoding block-MIMO (RDBM)
Re(y )
Im(y )
Re(h)
Im(h)
e
f
Figure 12 Multiplicands rearrangement for a single complexmultiplication block
This is repeated for all the 8 symbols in layer 0 andlayer 1 Equation (7) defines the transmission format withthe row index indicating the antenna port number and thecolumn index indicating the subcarrier index In 2 times 1MISOconfiguration the receive signals at 119894th and (119894+1)th subcarrierare given in matrix form as
[119910119894
119910lowast
119894+1
] = [ℎ(0)
119894minusℎ(1)
119894
ℎlowast(1)
119894+1ℎlowast(0)
119894+1
] [119889119894
119889lowast
119894+1
] + [119899119894
119899lowast
119894+1
]
for 119894 = 0 2 4 6 8 10 12 14
(8)
where ℎ(0)119894
represents the channel frequency response of 119894thsubcarrier between 0th transmit antenna port and receive
antenna 119889119894is data symbol at 119894th subcarrier and 119899
119894is the
noise at 119894th subcarrier at the receive antenna Equation (8)can simply be represented as
y119894= Heff119894d119894 + n
119894 for 119894 = 0 2 4 6 8 10 12 14 (9)
where y119894is 2 times 1 receive signal vector Heff119894 is the 2 times 2
channel matrix d119894is 2 times 1 complex signal vector and n
119894is
2 times 1 noise vector The objective is to detect the elements119889119894and 119889
119894+1of the data vector d
119894 Assuming that the elements
of channel frequency response matrix Heff119894 are perfectlyknown at the receiver the decoder output vector z
119894is given
byz119894= H119867eff119894 y119894 for 119894 = 0 2 4 6 8 10 12 14 (10)
whereH119867eff119894 is theHermitian of the 2times2 channel transmissionmatrix Equation (10) is expanded as
[119911119894
119911lowast
119894+1
] = [ℎlowast(0)
119894ℎ(1)
119894+1
minusℎlowast(1)
119894ℎ(0)
119894+1
] [119910119894
119910lowast
119894+1
]
for 119894 = 0 2 4 6 8 10 12 14(11)
The elements of decoder output are calculated as
119911119894= ℎlowast(0)
119894119910119894+ ℎ(1)
119894+1119910lowast
119894+1 for 119894 = 0 2 4 6 8 10 12 14
119911lowast
i+1 = minusℎlowast(1)
119894119910119894+ ℎ(0)
119894+1119910lowast
119894+1 for 119894 = 0 2 4 6 8 10 12 14
(12)
The PCFICH receive architecture for 2 times 1MISO config-uration is shown in Figure 7 Receiver decoding block (RDB)
VLSI Design 9
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14 2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15Folded unit
DRei
Imi
Rei
Rei
Rei+1
Rei+1
Imi
Imi+1
Imi Im+1
r1
Channel frequency response hi
Received signalvector y i
Precomputeddata si
minus
2l + 1 3 5 7 9 11 13 15
Figure 13 Illustration of folded architecture of RPB in SISO and SIMO
gets the 16 times 1 received signal vector y and computes thedecoder output vector using (10) assuming that the channelfrequency response vectors h(0) and h(1) are known Thedetailed internal architecture of RDBM is shown in Figure 11The decoder output vectors z
119894 119894 = 0 2 4 14 are stacked as
16 times 1 vector z = [1199111198790 119911119879
2 119911
119879
14]119879 The 16 times 1 precomputed
data vectors forCFI= 1234 are represented as s1 s2 s3and s4
respectivelyThe detailed structure of receiver decoding blocks (RDB)
is shown in Figure 8 The output vectors z(1) z(2) z(3) z(4)from RDB-1 to RDB-4 are fed to the processing blocks (PB-1 to PB-4) The detailed architecture of PB-1 is shown inFigure 9 The sum of the square magnitude of the elementsof difference vector between decoded output vector z and theprecomputed data vector s
1is the output 119903
1of PB-1 Similarly
1199032 1199033 and 119903
4are computed for CFI = 2 3 and 4 using PB-
2 PB-3 and PB-4 respectively The processing block outputs1199031 1199032 1199033 and 119903
4are applied to the CFI determination circuit
shown in Figure 4 to detect the CFI value
4 System Model and ImplementationArchitecture for MIMO Configuration
In MIMO system the signals at 119894th and (119894 + 1)th subcarrier inthe receive array are given by
[[[[[[
[
119910(0)
119894
119910(0)
119894+1
119910(1)lowast
119894
119910(1)lowast
119894+1
]]]]]]
]
=
[[[
[
ℎ00
ℎ01ℎ10 ℎ11ℎlowast
01 minusℎlowast
00ℎlowast
11 minusℎlowast
10
]]]
]
[119889119894
119889lowast
119894+1
] +
[[[[[[[[
[
119899(0)
119894
119899(0)lowast
119894+1
119899(1)
119894
119899(1)lowast
119894+1
]]]]]]]]
]
for 119894 = 0 2 4 6 8 10 12 14(13)
where ℎ119886119887
represents the channel frequency response vectorbetween 119887th transmit antenna and 119886th receive antenna and119899(119895)
119894represents the noise in 119894th subcarrier in 119895th receive
antenna In vector form it is written as
y119894= Heff119894d119894 + n
119894 for 119894 = 0 2 4 6 8 10 12 14 (14)
where y119894is 4times1 receive signal vectorHeff119894 is the 4times2 channel
frequency response vector at 119894th and (119894 + 1)th subcarrier d119894
is 2 times 1 data vector at 119894th and (119894 + 1)th subcarrier and n119894is
4 times 1 noise vector The objective is to detect the elements 119889119894
and 119889119894+1
of the data vector d119894 Assuming that the elements of
channel frequency response matrixHeff119894 are perfectly knownat the receiver the decoder output vector z is given by
z119894= H119867eff119894 y119894 for 119894 = 0 2 4 6 8 10 12 14 (15)
whereH119867eff119894 is theHermitian of the 4times2 channel transmissionmatrix This can be expanded as
[119911119894
119911lowast
+1
] = [ℎlowast
00minusℎlowast
10
ℎlowast
01minusℎlowast
11
ℎ01
ℎ11
minusℎ00minusℎ10
]
[[[[[
[
119910(0)
119894
119910(0)
119894+1
119910(1)lowast
119894
119910(1)lowast
119894+1
]]]]]
]
for 119894 = 0 2 4 6 8 10 12 14
(16)
The decoder outputs are given by
119911119894= ℎlowast
00119910(0)
119894minus ℎlowast
10119910(0)
119894+1+ ℎ01119910(1)lowast
119894+ ℎ11119910(1)lowast
119894+1
for 119894 = 0 2 4 6 8 10 12 14
119911lowast
119894+1= ℎlowast
01119910(0)
119894minus ℎlowast
11119910(0)
119894+1minusℎ00119910(1)lowast
119894minusℎ10119910(1)lowast
119894+1
for 119894 = 0 2 4 6 8 10 12 14
(17)
10 VLSI Design
Channel frequencyresponse vector
Channel frequencyresponse vector
Received signalvector
Received signalvector
Precomputeddata vector z(k)i
from RDB-1using s1i
y i
y i
y i+1
y i+1
hi+1 hi+1
hihi
zi
Folded RDB unit
Folded PB unit
Conjugate
DRei
Imi
Rei
Rei
Rei+1
Rei+1
Imi
Imi
Imi+1
Im+1
minus
r1
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 1 3 5 15
2l + 1 3 5 15
2l + 1 3 5 15
2l + 1 3 5 15
2l + 1 3 5 15
2l + 0 2 4 14
2l + 0 2 4 14
2l + 0 2 4 14
2l + 0 2 4 14
2l + 0 2 4 14
Figure 14 Illustration of proposed architecture for RDB and PB in MISO and MIMO Note receiver decoding block (RDB) in MISO istermed as RDBM in MIMO
The PCFICH receiver architecture of 2 times 2MIMO configura-tions is shown in Figure 10
Receiver decoding block (RDBM) gets the 16times1 receivedsignal vector y and computes the decoder output vector using(14) assuming that the channel frequency response vectorsh(00) h(01) h(10) and h(11) are knownThe 16times1 precomputeddata vectors for CFI = 1 2 3 and 4 are represented as s(0)
1
s(0)2 s(0)3 and s(0)
4 respectively for antenna 0 and as s(1)
1
s(1)2 s(1)3 and s(1)
4 respectively for antenna 1 The received
signal vectors y(0)119894
and y(1)119894multiply with the four channel
estimation vectors to give decoded output vector z that is sentto the processing block (PB) which is shown in Figure 9 Thedecoder outputs 119911
119894 119894 = 0 2 4 14 are stacked as 16 times 1
vector z = [1199111198790 119911119879
2 119911
119879
14]119879 Similarly RDBM1 gives output
vector z(1) using the precomputed data vectors y(0)1
and y(1)1
and channel estimation vectors The architecture of PBs and
VLSI Design 11
clk
T1 T2 T3 T4
RPB
RPB
RPB
RPB
d
d
dd
d
d
r4
r3
r2
r1
Figure 15 Illustration of superscalar method for SISO and SIMO(with no complex multiplications and operating from 119879
1to 1198794)
the CFI detection architecture are similar to that of theMISOsystem The sum of the squared magnitude of the differencebetween each element in the decoded output vector z and itsprecomputed data in the vector z(1) is the output 119903
1of PB1
Similarly 1199032 1199033 and 119903
4are computed for other CFIThe 119903
1 1199032
1199033 and 119903
4are compared to determine the minimum value by
the CFI detector shown in Figure 4
5 PCFICH Receiver Implementation Methods
The PCFICH receiver architectures can be implementeddirectly based on the basic architectures developed inSections 3 and 4 But in order to effectively utilize theresources in FPGA the implementation of basic architecturesis done using the modified novel architectures based on VLSIDSP techniques namely folding and superscalar processingapproach
51 Direct Implementation with Multiplicands RearrangedMethod In the receiver architecture for SISO and SIMOthe 16 times 1 received signal vector is directly subtracted fromthe precomputed data vector for a given CFI This requireslesser number of multipliers and adders when comparedto MISO and MIMO In MISO and MIMO configurationscomplex multiplications are necessary for the multiplicationof H119867119890119891119891
with the received signal vector It increases thenumber of multiplications in the CFI detection processHence optimum rearrangement of the terms is carried outto minimize the number of multiplications Further theintermediate products are reused in the calculation of real andimaginary parts Consider the multiplication of two complexnumbers Reℎ+119895 Imℎ and Re119910+119895 Im119910The output realpart (119890) and imaginary part (119891) terms are given by
119890 = Re ℎRe 119910 minus Im ℎ Im 119910 119891 = Re ℎ Im 119910 + Im ℎRe 119910
(18)
It requires four multiplications and two additions To reducethe number of multiplications the terms in (18) are rear-ranged as
119890 = [Re 119910 minus Im 119910] [Re ℎ minus Im ℎ]
minus Re 119910 Im ℎ + Im 119910Re ℎ
119891 = Re 119910 Im ℎ + Im 119910Re ℎ
(19)
Since the terms Re119910 Imℎ and Im119910Reℎ are in (19) itrequires only three multiplications but five additions Thiskind of rearrangement of themultiplicands is employed in theprocessing blocks at the cost of increased additions as shownin Figure 12
52 Proposed Architecture Using Folding Method Foldingarchitecture systematically determines the control circuits inDSP architectures where multiple algorithmic operations aretime-multiplexed to a single functional unit [12] It is usedfor synthesis of DSP architectures that can be operated atsingle or multiple clocks It reduces the number of hardwarefunctional units (FUs) by a factor of 119873 at the expense ofincreased computation time
The folding architecture is introduced in the receiverstructure of RPB in SISO and SIMO configurations and ofRPB and PB in MISO and MIMO configurations as shownin Figures 13 and 14 respectively For SISO RPB there are16 hardware lines to calculate the value of 119903
1each requiring
twomultipliers Hence the number of multipliers used in oneRPB is 32 In order to reduce the number of multipliers andadders folding architecture is proposed This architectureuses only two multipliers and performs the operation of asingle hardware line 16 times in sequential wayThe differencebetween the product of channel frequency response vectorwith the precomputed data vector and the received signalvector is stored in registers At a time one resultant signalpair involves in computation using two multipliers to get thevalue of 119911
119894 Four switches operating in system clock speed
are involved in the architecture where two switches are usedto pass the real part of the signal to one multiplier whilethe other two switches are used to pass the imaginary partof the signal to another multiplier The multipliers pass theproducts to the first adder for 119911
119894 The output of the first adder
is passed to the second adder with a delay to accumulate thevalues 119911
0to 11991115into a register in subsequent clock cyclesThis
process requires 16 clock cycles and the CFI is detected at the17th clock cycle Though it takes longer time for the clockcycles to get the output the resources are minimized in thismethod
The folded architecture of decoding block of MISO andMIMO involving complex multiplication of the channelfrequency response vector and the receive signal vector isshown in Figure 14 There are 2 complex multiplications andone addition in each of the 16 hardware lines Hence totalresource elements used are 32 complex multiplications and16 additions The folded architecture which reduces to just2 complex multiplications and one addition requires fiveswitches Two switches are used to pass the first elementof the receive signal vector and its corresponding channel
12 VLSI Design
clk
T1 T2 T3 T4 T5 T6
PB
PB
PB
PB
d
d
dd
d
d
dr4
r3
r2
r1
From other hardware lines
From other hardware lines
Complexmultiplications(real part) of a
hardware line from
Complex
multiplications
(imaginary part) of
a hardware line from RDB
RDB
Figure 16 Illustration of superscalar method for MISO and MIMO (with complex multiplications and operating from 1198791to 1198796)
Figure 17 Simulation waveform for PCFICH receiver
frequency response vector to one multiplier and other twoswitches are used to pass the second element of receive signalvector and its channel frequency response vector to anothermultiplierThese four switches operate in system clock speedThe multipliers pass their products to the adder through thefifth switch before moving to PB This process requires 16clock cycles and the CFI is detected at the 17th clock cycle
53 Proposed Architecture Using Superscalar Method Super-scalar approach is another low resource utilizing VLSI DSPtechnique The superscalar processing method includes par-allel processing and pipelining strategies In this case paralleloperation for the 16 pairs of hardware lines is arrangedwith pipelining of the subtraction and square magnitudeoperations for each CFI SISO configuration does not havecomplex multiplications and it has only square magnitudeoperations Hence the RPB of SISOhas 16 hardware lines eachhaving 2 multipliers which results to a total of 32 multipliers
This setup requires more hardware resources than foldingbut the output is obtained at every 4th clock cycle as shownin Figure 15 SIMO configuration which involves two receiveantenna signal processing requires twice the number ofmultiplications as that of SISO and the output is obtainedat every 4th clock cycle The block ldquo119889rdquo represents the delayelement introduced to buffer the values and produce theoutputs at the same time instant
For MISO configuration the RDB has 16 hardware lineswith 2 complex multiplications each Since each complexmultiplication requires four real multiplications RDB can beexecuted in two clock cycles by reusing 64 multipliers 32multipliers are required for PB taking 4 clock cyclesHence 96multipliers are required in MISO configuration For MIMOconfiguration the RDB requires reuse of 128 multiplierstaking 2 clock cycles and an additional 32 multipliers arerequired for the PB taking 4 clock cycles Hence 160 multi-pliers are required for MISO configuration and the output isobtained at every 6th clock cycle as shown in the Figure 16The block ldquo119889rdquo represents the delay element introduced tobuffer the values and produce the outputs at the same timeinstant
6 Results and Discussion
The proposed receiver architectures for PCFICH in SISOSIMO MISO and MIMO configurations are implementedusing the Xilinx PlanAhead tool on the Virtex-6 FPGAxc6vlx240tff1156-1 device board The target device Virtex-6has only 768 DSP elements Table 2 shows the performanceof the proposed architectures using folding and superscalarmethods being compared with the direct implementationof PCFICH receiver in terms of resource utilisation speedand power for all the SISO SIMO MISO and MIMO
VLSI Design 13
clk clk
clk
clk
clk
clkDiversity[10]
Diversity[1] GND 1 o wide mux 1Address[10] Output[30]
RTL rom 2 4
3
2
1
0
e
e
e
e
Div0
Div1
Div2
Div3
SISO
SIMO
MISO
MIMO
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310] Op
Choose
Count CFI11[490]Count CFI12[490]Count CFI13[490]Count CFI14[490]Count CFI21[490]Count CFI22[490]Count CFI23[490]Count CFI24[490]Count CFI31[490]Count CFI32[490]Count CFI33[490]Count CFI34[490]Count CFI41[490]Count CFI42[490]Count CFI43[490]Count CFI44[490]
e[30]Out1[310]Out2[310]Out3[310]Out4[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Figure 18 RTL schematic for combined PCFICH architecture with diversity
Table 2 Performance of proposed architectures based on folding and superscalar method
Diversity Method Multipliers Adders DSP elements LUTs Total delay (ns) Speed (MHz) Dynamic power
SISODirect 125 245 125 5479 39081 25587 mdash
Folding (16T) 8 81 16 2561 114448 8737 84Superscalar (4T) 32 182 66 2731 69333 14423 93
SIMODirect 250 494 250 10942 40278 24827 mdash
Folding (16T) 16 165 32 5117 130704 7651 159Superscalar (4T) 64 318 130 5623 53873 18562 170
MISODirect 224 580 594 14880 43023 23243 mdash
Folding (16T) 14 101 39 3950 255264 3917 173Superscalar (6T) 96 338 196 6156 80495 12423 208
MIMODirect 320 844 675 17380 56962 17555 mdash
Folding (16T) 20 155 46 4395 256528 3898 374Superscalar (6T) 160 465 262 6932 85822 11652 382
Figure 19 Resource utilization graph for generalized architecture
configurations The proposed architectures based on foldingand superscalar processing methods require less number ofresource elements
In the folding approach resource utilization is less com-pared to the direct and superscalar approach at the cost ofreduced speed of operation but it is suitable for real-time
frame timings When the LTE-A system operates at 14MHzbandwidth maximum time available for detection at eachsubcarrier is 992063 ns since each slot of 05ms duration in aframe (10ms radio frame duration) consists of 7 OFDM sym-bols and there are 72 subcarriers along one OFDM symbolThe total delay in the receiver architecture is within the LTEtime constraint The dynamic power consumption is less inthe folding method compared to superscalar method due todecrease in block arithmetic Direct method does not requiresequential execution and clocking and hence total powerconsumption is due to static power Hence it is inferredthat the proposed architecture based on folding method ismore suitable for CFI detection The simulation waveformof the proposed architecture based on folding method isshown in Figure 17 for SISO SIMO MISO and MIMOconfigurations
A general architecture based on folding method whichoperates at all the four SISO SIMO MISO and MIMOconfigurations has also been developed In this architecture
14 VLSI Design
Table 3 Resource requirements of proposed architecture using folding method
Parameter Multipliers Adders Minimum clock period (ns) Total delay (ns) Speed (MHz) Total dynamic power (mW)Value 58 518 16033 256528 3898 1019
Figure 20 Implemented device in FPGA editor
a control variable ldquo119890rdquo is used to enable or disable thesubmodules SISO SIMO MISO or MIMO according tothe selection input ldquodiversityrdquo CFI is detected at every 17thclock cycle The synthesis results of a general architecturebased on folding show that it utilizes minimum resourcesin XC6VLX240TFF1156-1 Virtex 6 device (768 DSPs) Thisis summarized in Table 3 Dynamic power consumption isdue to internal switching contributed by the clock (246mW)logic (670mW) and the block arithmetic (103mW)
Figure 18 shows the RTL schematic of 4 diversity blocksldquodiv0rdquo ldquodiv1rdquo ldquodiv2rdquo and ldquodiv3rdquo corresponding to SISO SIMOMISO and MIMO controlled by wires named ldquo119890rdquo Powerconsumed includes both static power and dynamic power dueto internal switching
Figure 19 shows the resource utilization graph whichshows the percentage of registers lookup tables (LUTs) slicesDSP elements and buffers used
Figure 20 shows the implemented device in FPGA editorwith the implemented components and interconnectionsbetween the components configured into the FPGA device
7 Conclusion
In this paper low complexity low resource single or multi-antenna CFI detection at the receiver system has been pro-posed and analyzed using modelsim and implementation inthe Virtex-6 device in Xilinx PlanAhead tool In the receivercomputational complexity and the resource utilization areminimized by employing arithmetic operational rearrange-ment and suboptimal sequential DSP algorithm called thefolding approach The proposed architecture using folding
method complies with the LTE frame timing constraint inSISO SIMOMISO andMIMO configurations It is a suitablesolution for the area optimized hardware implementation ofreceiver structures for PCFICH In future a total hardwareaccommodating all the physical downlink control channelsof the 3GPP-LTE-A with low resource utilization could besynthesized and implemented
Conflict of Interests
The authors do not have direct financial relation with anycommercial identity mentioned in the paper or any otherconflict of interests
Acknowledgments
The authors wish to express their sincere thanks to All IndiaCouncil for Technical Education NewDelhi for the Grant todo the Project titled Design of Testbed for the Developmentof Optimized Architectures of MIMO Signal Processing(no 8023RIDRPS0391112) They are also thankful tothe Managements of Mepco Schlenk Engineering CollegeSivakasi and Thiagarajar College of Engineering Maduraifor their constant support and encouragement to carry outthis research work successfully
References
[1] 3GPP TS 36 211 Version 11 0 0 Release 11 (2012-10) EvolvedUniversal Terrestrial Radio Access (E-UTRA) Physical Chan-nels and Modulation 2012
[2] J Ketonen and M Juntti ldquoSIC and K-best LSD receiverimplementation for a MIMO-OFDM systemrdquo in Proceedings ofthe 16th European Signal Processing Conference (EUSIPCO rsquo08)Lausanne Switzerland August 2008
[3] J Ketonen M Juntti and J R Cavallaro ldquoPerformancemdashcomplexity comparison of receivers for a LTE MIMOOFDMsystemrdquo IEEE Transactions on Signal Processing vol 58 no 6pp 3360ndash3372 2010
[4] X Huang C Liang and J Ma ldquoSystem architecture andimplementation of MIMO sphere decoders on FPGArdquo IEEETransactions on Very Large Scale Integration Systems vol 16 no2 pp 188ndash196 2008
[5] Y Lin Y Chen C Chu C Zhan and A Wu ldquoDual-modelow-complexity codebook searching algorithm and VLSI archi-tecture for LTELTE-advanced systemsrdquo IEEE Transactions onSignal Processing vol 61 no 14 pp 3545ndash3562 2013
[6] DWu J Eilert R Asghar and D Liu ldquoVLSI implementation ofa fixed-complexity soft-output MIMO detector for high-speedwirelessrdquo EURASIP Journal on Wireless Communications andNetworking vol 2010 Article ID 893184 13 pages 2010
[7] C Mehlfuhrer M Wrulich J C Ikuno D Bosanska and MRupp ldquoSimulating the long term evolution physical layerrdquo in
VLSI Design 15
Proceedings of the 17th European Signal Processing Conference(EUSIPCO rsquo09) pp 1471ndash1478 Glasgow UK August 2009
[8] S J Thiruvengadam and L M A Jalloul ldquoPerformance anal-ysisof the 3GPP-LTE physical control channelsrdquo EURASIPJournalon Wireless Communications and Networking vol 2010Article ID 914934 10 pages 2010
[9] S M Alamouti ldquoA simple transmit diversity technique forwireless communicationsrdquo IEEE Journal on Selected Areas inCommunications vol 16 no 8 pp 1451ndash1458 1998
[10] S S A Abbas and S JThiruvengadam ldquoFPGA implementationof 3GPP-LTE-A physical downlink control channel using diver-sity techniquesrdquo International Journal of Wireless and MobileComputing vol 9 no 2 p 84 2013
[11] S Ahmadi LTE-Advanced A Practical Systems Approach toUnderstanding 3GPP LTE Releases 10 and 11 Radio AccessTechnologies Sassan Ahmadi Academic Press 2013
[12] K K Parhi VLSI Digital Signal Processing SystemsmdashDesign andImplementation Wiley-Interscience 1999
International Journal of
AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal ofEngineeringVolume 2014
Submit your manuscripts athttpwwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
6 VLSI Design
PB-4
Detected CFI
PB-1
PB-2
PB-3
RDB-1
CFI detector
RDB
RDB-2
RDB-3
RDB-4
16 times 1 channel frequencyresponse vector
16 times 1 received signalvector y
16 times 1 precomputed signalvector s1
16 times 1 precomputed signalvector s2
16 times 1 precomputed signalvector s3
16 times 1 precomputed signalvector s4
h(0)h(1)
z
z(1)
z(2)
z(3)
z(4)
r1
r2
r3r4
Figure 7 Proposed MISO receiver architecture for PCFICH
16 times 1 channel frequency response vectors h(0) and h(1)
16 times 1 received signalvector y
h(0)lowast0 h(0)1 middot middot middot h(0)lowast14 h(0)15 h(1)lowast0 h(1)1 middot middot middot h(1)lowast14 h(1)15
C
C
y0Ylowast1
y4
Ylowast15
z0
z1
z14
z15
Figure 8 Internal architecture of MISO receiver decoding block (RDB)
the SFBC in the LTE-A standard The precoder output atantenna port 0 (119860
0) and antenna port 1 (119860
1) is shown in
Figure 6The notation ldquolowastrdquo represents the complex conjugate of the
symbol Basically in precoding a symbol 1198890from layer 0 and
a symbol 1198891from layer 1 are encoded such that the antenna
output is formulated using the orthogonal matrix given by
119860 = [11988901198891
minus119889lowast
1119889lowast
0
] (7)
VLSI Design 7
Precomputed data
Output vector from RDB1
Im
Re
Im
Re
Im
Re
Im
Re
[middot]2
[middot]2
[middot]2
[middot]2 Output r1
minus
minus
minus
minus
z(1)0
z(1)1
z(1)14
z(1)15
z0 z15z1 middot middot middot z14
Figure 9 Internal architecture of MISO processing block (PB-1)
16 times 1 channel frequency response vectorsh(00)h(01)h(10)h(11)
16 times 1 received signalvector y (0)
y (1)
16 times 1 precomputedsignal vector s(0)1
s(1)1
16 times 1 precomputedsignal vector s(0)2
s(1)2
16 times 1 precomputedsignal vector s(0)3
s(1)3
16 times 1 precomputedsignal vector s(0)4
s(1)4
r1
r2
r3
r4
z
z
(1)
z(2)
z(3)
z(4)
RDBM
RDBM-1
RDBM-2
RDBM-3
RDBM-4
PB-1
PB-2
PB-3
PB-4
CFIdetector
Detected CFI
Figure 10 Proposed MIMO receiver architecture for PCFICH
8 VLSI Design
16 times 1 receivedsignal vector
Y(0)0
Y(0)1
Y(1)lowast0
Y(1)lowast1
Y(0)14
Y(0)15
Y(1)lowast14
Y(1)lowast15
h(00)lowast0 minush(00)1 h(00)lowast14 minush(00)15 h(01)0 h(01)lowast15 h(10)0 minush(10)lowast1 h(10)14 minush(10)lowast15 minush(11)lowast0 h(11)1 minush(11)lowast14 h(11)15
Z0
Z1
Z14
Z15C
C
16 times 1 channel frequency response vectors h(00) h(01) h(10) and h(11)
h(01)lowast1 middot middot middot h(01)
14
Figure 11 Internal architecture receiver decoding block-MIMO (RDBM)
Re(y )
Im(y )
Re(h)
Im(h)
e
f
Figure 12 Multiplicands rearrangement for a single complexmultiplication block
This is repeated for all the 8 symbols in layer 0 andlayer 1 Equation (7) defines the transmission format withthe row index indicating the antenna port number and thecolumn index indicating the subcarrier index In 2 times 1MISOconfiguration the receive signals at 119894th and (119894+1)th subcarrierare given in matrix form as
[119910119894
119910lowast
119894+1
] = [ℎ(0)
119894minusℎ(1)
119894
ℎlowast(1)
119894+1ℎlowast(0)
119894+1
] [119889119894
119889lowast
119894+1
] + [119899119894
119899lowast
119894+1
]
for 119894 = 0 2 4 6 8 10 12 14
(8)
where ℎ(0)119894
represents the channel frequency response of 119894thsubcarrier between 0th transmit antenna port and receive
antenna 119889119894is data symbol at 119894th subcarrier and 119899
119894is the
noise at 119894th subcarrier at the receive antenna Equation (8)can simply be represented as
y119894= Heff119894d119894 + n
119894 for 119894 = 0 2 4 6 8 10 12 14 (9)
where y119894is 2 times 1 receive signal vector Heff119894 is the 2 times 2
channel matrix d119894is 2 times 1 complex signal vector and n
119894is
2 times 1 noise vector The objective is to detect the elements119889119894and 119889
119894+1of the data vector d
119894 Assuming that the elements
of channel frequency response matrix Heff119894 are perfectlyknown at the receiver the decoder output vector z
119894is given
byz119894= H119867eff119894 y119894 for 119894 = 0 2 4 6 8 10 12 14 (10)
whereH119867eff119894 is theHermitian of the 2times2 channel transmissionmatrix Equation (10) is expanded as
[119911119894
119911lowast
119894+1
] = [ℎlowast(0)
119894ℎ(1)
119894+1
minusℎlowast(1)
119894ℎ(0)
119894+1
] [119910119894
119910lowast
119894+1
]
for 119894 = 0 2 4 6 8 10 12 14(11)
The elements of decoder output are calculated as
119911119894= ℎlowast(0)
119894119910119894+ ℎ(1)
119894+1119910lowast
119894+1 for 119894 = 0 2 4 6 8 10 12 14
119911lowast
i+1 = minusℎlowast(1)
119894119910119894+ ℎ(0)
119894+1119910lowast
119894+1 for 119894 = 0 2 4 6 8 10 12 14
(12)
The PCFICH receive architecture for 2 times 1MISO config-uration is shown in Figure 7 Receiver decoding block (RDB)
VLSI Design 9
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14 2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15Folded unit
DRei
Imi
Rei
Rei
Rei+1
Rei+1
Imi
Imi+1
Imi Im+1
r1
Channel frequency response hi
Received signalvector y i
Precomputeddata si
minus
2l + 1 3 5 7 9 11 13 15
Figure 13 Illustration of folded architecture of RPB in SISO and SIMO
gets the 16 times 1 received signal vector y and computes thedecoder output vector using (10) assuming that the channelfrequency response vectors h(0) and h(1) are known Thedetailed internal architecture of RDBM is shown in Figure 11The decoder output vectors z
119894 119894 = 0 2 4 14 are stacked as
16 times 1 vector z = [1199111198790 119911119879
2 119911
119879
14]119879 The 16 times 1 precomputed
data vectors forCFI= 1234 are represented as s1 s2 s3and s4
respectivelyThe detailed structure of receiver decoding blocks (RDB)
is shown in Figure 8 The output vectors z(1) z(2) z(3) z(4)from RDB-1 to RDB-4 are fed to the processing blocks (PB-1 to PB-4) The detailed architecture of PB-1 is shown inFigure 9 The sum of the square magnitude of the elementsof difference vector between decoded output vector z and theprecomputed data vector s
1is the output 119903
1of PB-1 Similarly
1199032 1199033 and 119903
4are computed for CFI = 2 3 and 4 using PB-
2 PB-3 and PB-4 respectively The processing block outputs1199031 1199032 1199033 and 119903
4are applied to the CFI determination circuit
shown in Figure 4 to detect the CFI value
4 System Model and ImplementationArchitecture for MIMO Configuration
In MIMO system the signals at 119894th and (119894 + 1)th subcarrier inthe receive array are given by
[[[[[[
[
119910(0)
119894
119910(0)
119894+1
119910(1)lowast
119894
119910(1)lowast
119894+1
]]]]]]
]
=
[[[
[
ℎ00
ℎ01ℎ10 ℎ11ℎlowast
01 minusℎlowast
00ℎlowast
11 minusℎlowast
10
]]]
]
[119889119894
119889lowast
119894+1
] +
[[[[[[[[
[
119899(0)
119894
119899(0)lowast
119894+1
119899(1)
119894
119899(1)lowast
119894+1
]]]]]]]]
]
for 119894 = 0 2 4 6 8 10 12 14(13)
where ℎ119886119887
represents the channel frequency response vectorbetween 119887th transmit antenna and 119886th receive antenna and119899(119895)
119894represents the noise in 119894th subcarrier in 119895th receive
antenna In vector form it is written as
y119894= Heff119894d119894 + n
119894 for 119894 = 0 2 4 6 8 10 12 14 (14)
where y119894is 4times1 receive signal vectorHeff119894 is the 4times2 channel
frequency response vector at 119894th and (119894 + 1)th subcarrier d119894
is 2 times 1 data vector at 119894th and (119894 + 1)th subcarrier and n119894is
4 times 1 noise vector The objective is to detect the elements 119889119894
and 119889119894+1
of the data vector d119894 Assuming that the elements of
channel frequency response matrixHeff119894 are perfectly knownat the receiver the decoder output vector z is given by
z119894= H119867eff119894 y119894 for 119894 = 0 2 4 6 8 10 12 14 (15)
whereH119867eff119894 is theHermitian of the 4times2 channel transmissionmatrix This can be expanded as
[119911119894
119911lowast
+1
] = [ℎlowast
00minusℎlowast
10
ℎlowast
01minusℎlowast
11
ℎ01
ℎ11
minusℎ00minusℎ10
]
[[[[[
[
119910(0)
119894
119910(0)
119894+1
119910(1)lowast
119894
119910(1)lowast
119894+1
]]]]]
]
for 119894 = 0 2 4 6 8 10 12 14
(16)
The decoder outputs are given by
119911119894= ℎlowast
00119910(0)
119894minus ℎlowast
10119910(0)
119894+1+ ℎ01119910(1)lowast
119894+ ℎ11119910(1)lowast
119894+1
for 119894 = 0 2 4 6 8 10 12 14
119911lowast
119894+1= ℎlowast
01119910(0)
119894minus ℎlowast
11119910(0)
119894+1minusℎ00119910(1)lowast
119894minusℎ10119910(1)lowast
119894+1
for 119894 = 0 2 4 6 8 10 12 14
(17)
10 VLSI Design
Channel frequencyresponse vector
Channel frequencyresponse vector
Received signalvector
Received signalvector
Precomputeddata vector z(k)i
from RDB-1using s1i
y i
y i
y i+1
y i+1
hi+1 hi+1
hihi
zi
Folded RDB unit
Folded PB unit
Conjugate
DRei
Imi
Rei
Rei
Rei+1
Rei+1
Imi
Imi
Imi+1
Im+1
minus
r1
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 1 3 5 15
2l + 1 3 5 15
2l + 1 3 5 15
2l + 1 3 5 15
2l + 1 3 5 15
2l + 0 2 4 14
2l + 0 2 4 14
2l + 0 2 4 14
2l + 0 2 4 14
2l + 0 2 4 14
Figure 14 Illustration of proposed architecture for RDB and PB in MISO and MIMO Note receiver decoding block (RDB) in MISO istermed as RDBM in MIMO
The PCFICH receiver architecture of 2 times 2MIMO configura-tions is shown in Figure 10
Receiver decoding block (RDBM) gets the 16times1 receivedsignal vector y and computes the decoder output vector using(14) assuming that the channel frequency response vectorsh(00) h(01) h(10) and h(11) are knownThe 16times1 precomputeddata vectors for CFI = 1 2 3 and 4 are represented as s(0)
1
s(0)2 s(0)3 and s(0)
4 respectively for antenna 0 and as s(1)
1
s(1)2 s(1)3 and s(1)
4 respectively for antenna 1 The received
signal vectors y(0)119894
and y(1)119894multiply with the four channel
estimation vectors to give decoded output vector z that is sentto the processing block (PB) which is shown in Figure 9 Thedecoder outputs 119911
119894 119894 = 0 2 4 14 are stacked as 16 times 1
vector z = [1199111198790 119911119879
2 119911
119879
14]119879 Similarly RDBM1 gives output
vector z(1) using the precomputed data vectors y(0)1
and y(1)1
and channel estimation vectors The architecture of PBs and
VLSI Design 11
clk
T1 T2 T3 T4
RPB
RPB
RPB
RPB
d
d
dd
d
d
r4
r3
r2
r1
Figure 15 Illustration of superscalar method for SISO and SIMO(with no complex multiplications and operating from 119879
1to 1198794)
the CFI detection architecture are similar to that of theMISOsystem The sum of the squared magnitude of the differencebetween each element in the decoded output vector z and itsprecomputed data in the vector z(1) is the output 119903
1of PB1
Similarly 1199032 1199033 and 119903
4are computed for other CFIThe 119903
1 1199032
1199033 and 119903
4are compared to determine the minimum value by
the CFI detector shown in Figure 4
5 PCFICH Receiver Implementation Methods
The PCFICH receiver architectures can be implementeddirectly based on the basic architectures developed inSections 3 and 4 But in order to effectively utilize theresources in FPGA the implementation of basic architecturesis done using the modified novel architectures based on VLSIDSP techniques namely folding and superscalar processingapproach
51 Direct Implementation with Multiplicands RearrangedMethod In the receiver architecture for SISO and SIMOthe 16 times 1 received signal vector is directly subtracted fromthe precomputed data vector for a given CFI This requireslesser number of multipliers and adders when comparedto MISO and MIMO In MISO and MIMO configurationscomplex multiplications are necessary for the multiplicationof H119867119890119891119891
with the received signal vector It increases thenumber of multiplications in the CFI detection processHence optimum rearrangement of the terms is carried outto minimize the number of multiplications Further theintermediate products are reused in the calculation of real andimaginary parts Consider the multiplication of two complexnumbers Reℎ+119895 Imℎ and Re119910+119895 Im119910The output realpart (119890) and imaginary part (119891) terms are given by
119890 = Re ℎRe 119910 minus Im ℎ Im 119910 119891 = Re ℎ Im 119910 + Im ℎRe 119910
(18)
It requires four multiplications and two additions To reducethe number of multiplications the terms in (18) are rear-ranged as
119890 = [Re 119910 minus Im 119910] [Re ℎ minus Im ℎ]
minus Re 119910 Im ℎ + Im 119910Re ℎ
119891 = Re 119910 Im ℎ + Im 119910Re ℎ
(19)
Since the terms Re119910 Imℎ and Im119910Reℎ are in (19) itrequires only three multiplications but five additions Thiskind of rearrangement of themultiplicands is employed in theprocessing blocks at the cost of increased additions as shownin Figure 12
52 Proposed Architecture Using Folding Method Foldingarchitecture systematically determines the control circuits inDSP architectures where multiple algorithmic operations aretime-multiplexed to a single functional unit [12] It is usedfor synthesis of DSP architectures that can be operated atsingle or multiple clocks It reduces the number of hardwarefunctional units (FUs) by a factor of 119873 at the expense ofincreased computation time
The folding architecture is introduced in the receiverstructure of RPB in SISO and SIMO configurations and ofRPB and PB in MISO and MIMO configurations as shownin Figures 13 and 14 respectively For SISO RPB there are16 hardware lines to calculate the value of 119903
1each requiring
twomultipliers Hence the number of multipliers used in oneRPB is 32 In order to reduce the number of multipliers andadders folding architecture is proposed This architectureuses only two multipliers and performs the operation of asingle hardware line 16 times in sequential wayThe differencebetween the product of channel frequency response vectorwith the precomputed data vector and the received signalvector is stored in registers At a time one resultant signalpair involves in computation using two multipliers to get thevalue of 119911
119894 Four switches operating in system clock speed
are involved in the architecture where two switches are usedto pass the real part of the signal to one multiplier whilethe other two switches are used to pass the imaginary partof the signal to another multiplier The multipliers pass theproducts to the first adder for 119911
119894 The output of the first adder
is passed to the second adder with a delay to accumulate thevalues 119911
0to 11991115into a register in subsequent clock cyclesThis
process requires 16 clock cycles and the CFI is detected at the17th clock cycle Though it takes longer time for the clockcycles to get the output the resources are minimized in thismethod
The folded architecture of decoding block of MISO andMIMO involving complex multiplication of the channelfrequency response vector and the receive signal vector isshown in Figure 14 There are 2 complex multiplications andone addition in each of the 16 hardware lines Hence totalresource elements used are 32 complex multiplications and16 additions The folded architecture which reduces to just2 complex multiplications and one addition requires fiveswitches Two switches are used to pass the first elementof the receive signal vector and its corresponding channel
12 VLSI Design
clk
T1 T2 T3 T4 T5 T6
PB
PB
PB
PB
d
d
dd
d
d
dr4
r3
r2
r1
From other hardware lines
From other hardware lines
Complexmultiplications(real part) of a
hardware line from
Complex
multiplications
(imaginary part) of
a hardware line from RDB
RDB
Figure 16 Illustration of superscalar method for MISO and MIMO (with complex multiplications and operating from 1198791to 1198796)
Figure 17 Simulation waveform for PCFICH receiver
frequency response vector to one multiplier and other twoswitches are used to pass the second element of receive signalvector and its channel frequency response vector to anothermultiplierThese four switches operate in system clock speedThe multipliers pass their products to the adder through thefifth switch before moving to PB This process requires 16clock cycles and the CFI is detected at the 17th clock cycle
53 Proposed Architecture Using Superscalar Method Super-scalar approach is another low resource utilizing VLSI DSPtechnique The superscalar processing method includes par-allel processing and pipelining strategies In this case paralleloperation for the 16 pairs of hardware lines is arrangedwith pipelining of the subtraction and square magnitudeoperations for each CFI SISO configuration does not havecomplex multiplications and it has only square magnitudeoperations Hence the RPB of SISOhas 16 hardware lines eachhaving 2 multipliers which results to a total of 32 multipliers
This setup requires more hardware resources than foldingbut the output is obtained at every 4th clock cycle as shownin Figure 15 SIMO configuration which involves two receiveantenna signal processing requires twice the number ofmultiplications as that of SISO and the output is obtainedat every 4th clock cycle The block ldquo119889rdquo represents the delayelement introduced to buffer the values and produce theoutputs at the same time instant
For MISO configuration the RDB has 16 hardware lineswith 2 complex multiplications each Since each complexmultiplication requires four real multiplications RDB can beexecuted in two clock cycles by reusing 64 multipliers 32multipliers are required for PB taking 4 clock cyclesHence 96multipliers are required in MISO configuration For MIMOconfiguration the RDB requires reuse of 128 multiplierstaking 2 clock cycles and an additional 32 multipliers arerequired for the PB taking 4 clock cycles Hence 160 multi-pliers are required for MISO configuration and the output isobtained at every 6th clock cycle as shown in the Figure 16The block ldquo119889rdquo represents the delay element introduced tobuffer the values and produce the outputs at the same timeinstant
6 Results and Discussion
The proposed receiver architectures for PCFICH in SISOSIMO MISO and MIMO configurations are implementedusing the Xilinx PlanAhead tool on the Virtex-6 FPGAxc6vlx240tff1156-1 device board The target device Virtex-6has only 768 DSP elements Table 2 shows the performanceof the proposed architectures using folding and superscalarmethods being compared with the direct implementationof PCFICH receiver in terms of resource utilisation speedand power for all the SISO SIMO MISO and MIMO
VLSI Design 13
clk clk
clk
clk
clk
clkDiversity[10]
Diversity[1] GND 1 o wide mux 1Address[10] Output[30]
RTL rom 2 4
3
2
1
0
e
e
e
e
Div0
Div1
Div2
Div3
SISO
SIMO
MISO
MIMO
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310] Op
Choose
Count CFI11[490]Count CFI12[490]Count CFI13[490]Count CFI14[490]Count CFI21[490]Count CFI22[490]Count CFI23[490]Count CFI24[490]Count CFI31[490]Count CFI32[490]Count CFI33[490]Count CFI34[490]Count CFI41[490]Count CFI42[490]Count CFI43[490]Count CFI44[490]
e[30]Out1[310]Out2[310]Out3[310]Out4[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Figure 18 RTL schematic for combined PCFICH architecture with diversity
Table 2 Performance of proposed architectures based on folding and superscalar method
Diversity Method Multipliers Adders DSP elements LUTs Total delay (ns) Speed (MHz) Dynamic power
SISODirect 125 245 125 5479 39081 25587 mdash
Folding (16T) 8 81 16 2561 114448 8737 84Superscalar (4T) 32 182 66 2731 69333 14423 93
SIMODirect 250 494 250 10942 40278 24827 mdash
Folding (16T) 16 165 32 5117 130704 7651 159Superscalar (4T) 64 318 130 5623 53873 18562 170
MISODirect 224 580 594 14880 43023 23243 mdash
Folding (16T) 14 101 39 3950 255264 3917 173Superscalar (6T) 96 338 196 6156 80495 12423 208
MIMODirect 320 844 675 17380 56962 17555 mdash
Folding (16T) 20 155 46 4395 256528 3898 374Superscalar (6T) 160 465 262 6932 85822 11652 382
Figure 19 Resource utilization graph for generalized architecture
configurations The proposed architectures based on foldingand superscalar processing methods require less number ofresource elements
In the folding approach resource utilization is less com-pared to the direct and superscalar approach at the cost ofreduced speed of operation but it is suitable for real-time
frame timings When the LTE-A system operates at 14MHzbandwidth maximum time available for detection at eachsubcarrier is 992063 ns since each slot of 05ms duration in aframe (10ms radio frame duration) consists of 7 OFDM sym-bols and there are 72 subcarriers along one OFDM symbolThe total delay in the receiver architecture is within the LTEtime constraint The dynamic power consumption is less inthe folding method compared to superscalar method due todecrease in block arithmetic Direct method does not requiresequential execution and clocking and hence total powerconsumption is due to static power Hence it is inferredthat the proposed architecture based on folding method ismore suitable for CFI detection The simulation waveformof the proposed architecture based on folding method isshown in Figure 17 for SISO SIMO MISO and MIMOconfigurations
A general architecture based on folding method whichoperates at all the four SISO SIMO MISO and MIMOconfigurations has also been developed In this architecture
14 VLSI Design
Table 3 Resource requirements of proposed architecture using folding method
Parameter Multipliers Adders Minimum clock period (ns) Total delay (ns) Speed (MHz) Total dynamic power (mW)Value 58 518 16033 256528 3898 1019
Figure 20 Implemented device in FPGA editor
a control variable ldquo119890rdquo is used to enable or disable thesubmodules SISO SIMO MISO or MIMO according tothe selection input ldquodiversityrdquo CFI is detected at every 17thclock cycle The synthesis results of a general architecturebased on folding show that it utilizes minimum resourcesin XC6VLX240TFF1156-1 Virtex 6 device (768 DSPs) Thisis summarized in Table 3 Dynamic power consumption isdue to internal switching contributed by the clock (246mW)logic (670mW) and the block arithmetic (103mW)
Figure 18 shows the RTL schematic of 4 diversity blocksldquodiv0rdquo ldquodiv1rdquo ldquodiv2rdquo and ldquodiv3rdquo corresponding to SISO SIMOMISO and MIMO controlled by wires named ldquo119890rdquo Powerconsumed includes both static power and dynamic power dueto internal switching
Figure 19 shows the resource utilization graph whichshows the percentage of registers lookup tables (LUTs) slicesDSP elements and buffers used
Figure 20 shows the implemented device in FPGA editorwith the implemented components and interconnectionsbetween the components configured into the FPGA device
7 Conclusion
In this paper low complexity low resource single or multi-antenna CFI detection at the receiver system has been pro-posed and analyzed using modelsim and implementation inthe Virtex-6 device in Xilinx PlanAhead tool In the receivercomputational complexity and the resource utilization areminimized by employing arithmetic operational rearrange-ment and suboptimal sequential DSP algorithm called thefolding approach The proposed architecture using folding
method complies with the LTE frame timing constraint inSISO SIMOMISO andMIMO configurations It is a suitablesolution for the area optimized hardware implementation ofreceiver structures for PCFICH In future a total hardwareaccommodating all the physical downlink control channelsof the 3GPP-LTE-A with low resource utilization could besynthesized and implemented
Conflict of Interests
The authors do not have direct financial relation with anycommercial identity mentioned in the paper or any otherconflict of interests
Acknowledgments
The authors wish to express their sincere thanks to All IndiaCouncil for Technical Education NewDelhi for the Grant todo the Project titled Design of Testbed for the Developmentof Optimized Architectures of MIMO Signal Processing(no 8023RIDRPS0391112) They are also thankful tothe Managements of Mepco Schlenk Engineering CollegeSivakasi and Thiagarajar College of Engineering Maduraifor their constant support and encouragement to carry outthis research work successfully
References
[1] 3GPP TS 36 211 Version 11 0 0 Release 11 (2012-10) EvolvedUniversal Terrestrial Radio Access (E-UTRA) Physical Chan-nels and Modulation 2012
[2] J Ketonen and M Juntti ldquoSIC and K-best LSD receiverimplementation for a MIMO-OFDM systemrdquo in Proceedings ofthe 16th European Signal Processing Conference (EUSIPCO rsquo08)Lausanne Switzerland August 2008
[3] J Ketonen M Juntti and J R Cavallaro ldquoPerformancemdashcomplexity comparison of receivers for a LTE MIMOOFDMsystemrdquo IEEE Transactions on Signal Processing vol 58 no 6pp 3360ndash3372 2010
[4] X Huang C Liang and J Ma ldquoSystem architecture andimplementation of MIMO sphere decoders on FPGArdquo IEEETransactions on Very Large Scale Integration Systems vol 16 no2 pp 188ndash196 2008
[5] Y Lin Y Chen C Chu C Zhan and A Wu ldquoDual-modelow-complexity codebook searching algorithm and VLSI archi-tecture for LTELTE-advanced systemsrdquo IEEE Transactions onSignal Processing vol 61 no 14 pp 3545ndash3562 2013
[6] DWu J Eilert R Asghar and D Liu ldquoVLSI implementation ofa fixed-complexity soft-output MIMO detector for high-speedwirelessrdquo EURASIP Journal on Wireless Communications andNetworking vol 2010 Article ID 893184 13 pages 2010
[7] C Mehlfuhrer M Wrulich J C Ikuno D Bosanska and MRupp ldquoSimulating the long term evolution physical layerrdquo in
VLSI Design 15
Proceedings of the 17th European Signal Processing Conference(EUSIPCO rsquo09) pp 1471ndash1478 Glasgow UK August 2009
[8] S J Thiruvengadam and L M A Jalloul ldquoPerformance anal-ysisof the 3GPP-LTE physical control channelsrdquo EURASIPJournalon Wireless Communications and Networking vol 2010Article ID 914934 10 pages 2010
[9] S M Alamouti ldquoA simple transmit diversity technique forwireless communicationsrdquo IEEE Journal on Selected Areas inCommunications vol 16 no 8 pp 1451ndash1458 1998
[10] S S A Abbas and S JThiruvengadam ldquoFPGA implementationof 3GPP-LTE-A physical downlink control channel using diver-sity techniquesrdquo International Journal of Wireless and MobileComputing vol 9 no 2 p 84 2013
[11] S Ahmadi LTE-Advanced A Practical Systems Approach toUnderstanding 3GPP LTE Releases 10 and 11 Radio AccessTechnologies Sassan Ahmadi Academic Press 2013
[12] K K Parhi VLSI Digital Signal Processing SystemsmdashDesign andImplementation Wiley-Interscience 1999
International Journal of
AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal ofEngineeringVolume 2014
Submit your manuscripts athttpwwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
VLSI Design 7
Precomputed data
Output vector from RDB1
Im
Re
Im
Re
Im
Re
Im
Re
[middot]2
[middot]2
[middot]2
[middot]2 Output r1
minus
minus
minus
minus
z(1)0
z(1)1
z(1)14
z(1)15
z0 z15z1 middot middot middot z14
Figure 9 Internal architecture of MISO processing block (PB-1)
16 times 1 channel frequency response vectorsh(00)h(01)h(10)h(11)
16 times 1 received signalvector y (0)
y (1)
16 times 1 precomputedsignal vector s(0)1
s(1)1
16 times 1 precomputedsignal vector s(0)2
s(1)2
16 times 1 precomputedsignal vector s(0)3
s(1)3
16 times 1 precomputedsignal vector s(0)4
s(1)4
r1
r2
r3
r4
z
z
(1)
z(2)
z(3)
z(4)
RDBM
RDBM-1
RDBM-2
RDBM-3
RDBM-4
PB-1
PB-2
PB-3
PB-4
CFIdetector
Detected CFI
Figure 10 Proposed MIMO receiver architecture for PCFICH
8 VLSI Design
16 times 1 receivedsignal vector
Y(0)0
Y(0)1
Y(1)lowast0
Y(1)lowast1
Y(0)14
Y(0)15
Y(1)lowast14
Y(1)lowast15
h(00)lowast0 minush(00)1 h(00)lowast14 minush(00)15 h(01)0 h(01)lowast15 h(10)0 minush(10)lowast1 h(10)14 minush(10)lowast15 minush(11)lowast0 h(11)1 minush(11)lowast14 h(11)15
Z0
Z1
Z14
Z15C
C
16 times 1 channel frequency response vectors h(00) h(01) h(10) and h(11)
h(01)lowast1 middot middot middot h(01)
14
Figure 11 Internal architecture receiver decoding block-MIMO (RDBM)
Re(y )
Im(y )
Re(h)
Im(h)
e
f
Figure 12 Multiplicands rearrangement for a single complexmultiplication block
This is repeated for all the 8 symbols in layer 0 andlayer 1 Equation (7) defines the transmission format withthe row index indicating the antenna port number and thecolumn index indicating the subcarrier index In 2 times 1MISOconfiguration the receive signals at 119894th and (119894+1)th subcarrierare given in matrix form as
[119910119894
119910lowast
119894+1
] = [ℎ(0)
119894minusℎ(1)
119894
ℎlowast(1)
119894+1ℎlowast(0)
119894+1
] [119889119894
119889lowast
119894+1
] + [119899119894
119899lowast
119894+1
]
for 119894 = 0 2 4 6 8 10 12 14
(8)
where ℎ(0)119894
represents the channel frequency response of 119894thsubcarrier between 0th transmit antenna port and receive
antenna 119889119894is data symbol at 119894th subcarrier and 119899
119894is the
noise at 119894th subcarrier at the receive antenna Equation (8)can simply be represented as
y119894= Heff119894d119894 + n
119894 for 119894 = 0 2 4 6 8 10 12 14 (9)
where y119894is 2 times 1 receive signal vector Heff119894 is the 2 times 2
channel matrix d119894is 2 times 1 complex signal vector and n
119894is
2 times 1 noise vector The objective is to detect the elements119889119894and 119889
119894+1of the data vector d
119894 Assuming that the elements
of channel frequency response matrix Heff119894 are perfectlyknown at the receiver the decoder output vector z
119894is given
byz119894= H119867eff119894 y119894 for 119894 = 0 2 4 6 8 10 12 14 (10)
whereH119867eff119894 is theHermitian of the 2times2 channel transmissionmatrix Equation (10) is expanded as
[119911119894
119911lowast
119894+1
] = [ℎlowast(0)
119894ℎ(1)
119894+1
minusℎlowast(1)
119894ℎ(0)
119894+1
] [119910119894
119910lowast
119894+1
]
for 119894 = 0 2 4 6 8 10 12 14(11)
The elements of decoder output are calculated as
119911119894= ℎlowast(0)
119894119910119894+ ℎ(1)
119894+1119910lowast
119894+1 for 119894 = 0 2 4 6 8 10 12 14
119911lowast
i+1 = minusℎlowast(1)
119894119910119894+ ℎ(0)
119894+1119910lowast
119894+1 for 119894 = 0 2 4 6 8 10 12 14
(12)
The PCFICH receive architecture for 2 times 1MISO config-uration is shown in Figure 7 Receiver decoding block (RDB)
VLSI Design 9
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14 2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15Folded unit
DRei
Imi
Rei
Rei
Rei+1
Rei+1
Imi
Imi+1
Imi Im+1
r1
Channel frequency response hi
Received signalvector y i
Precomputeddata si
minus
2l + 1 3 5 7 9 11 13 15
Figure 13 Illustration of folded architecture of RPB in SISO and SIMO
gets the 16 times 1 received signal vector y and computes thedecoder output vector using (10) assuming that the channelfrequency response vectors h(0) and h(1) are known Thedetailed internal architecture of RDBM is shown in Figure 11The decoder output vectors z
119894 119894 = 0 2 4 14 are stacked as
16 times 1 vector z = [1199111198790 119911119879
2 119911
119879
14]119879 The 16 times 1 precomputed
data vectors forCFI= 1234 are represented as s1 s2 s3and s4
respectivelyThe detailed structure of receiver decoding blocks (RDB)
is shown in Figure 8 The output vectors z(1) z(2) z(3) z(4)from RDB-1 to RDB-4 are fed to the processing blocks (PB-1 to PB-4) The detailed architecture of PB-1 is shown inFigure 9 The sum of the square magnitude of the elementsof difference vector between decoded output vector z and theprecomputed data vector s
1is the output 119903
1of PB-1 Similarly
1199032 1199033 and 119903
4are computed for CFI = 2 3 and 4 using PB-
2 PB-3 and PB-4 respectively The processing block outputs1199031 1199032 1199033 and 119903
4are applied to the CFI determination circuit
shown in Figure 4 to detect the CFI value
4 System Model and ImplementationArchitecture for MIMO Configuration
In MIMO system the signals at 119894th and (119894 + 1)th subcarrier inthe receive array are given by
[[[[[[
[
119910(0)
119894
119910(0)
119894+1
119910(1)lowast
119894
119910(1)lowast
119894+1
]]]]]]
]
=
[[[
[
ℎ00
ℎ01ℎ10 ℎ11ℎlowast
01 minusℎlowast
00ℎlowast
11 minusℎlowast
10
]]]
]
[119889119894
119889lowast
119894+1
] +
[[[[[[[[
[
119899(0)
119894
119899(0)lowast
119894+1
119899(1)
119894
119899(1)lowast
119894+1
]]]]]]]]
]
for 119894 = 0 2 4 6 8 10 12 14(13)
where ℎ119886119887
represents the channel frequency response vectorbetween 119887th transmit antenna and 119886th receive antenna and119899(119895)
119894represents the noise in 119894th subcarrier in 119895th receive
antenna In vector form it is written as
y119894= Heff119894d119894 + n
119894 for 119894 = 0 2 4 6 8 10 12 14 (14)
where y119894is 4times1 receive signal vectorHeff119894 is the 4times2 channel
frequency response vector at 119894th and (119894 + 1)th subcarrier d119894
is 2 times 1 data vector at 119894th and (119894 + 1)th subcarrier and n119894is
4 times 1 noise vector The objective is to detect the elements 119889119894
and 119889119894+1
of the data vector d119894 Assuming that the elements of
channel frequency response matrixHeff119894 are perfectly knownat the receiver the decoder output vector z is given by
z119894= H119867eff119894 y119894 for 119894 = 0 2 4 6 8 10 12 14 (15)
whereH119867eff119894 is theHermitian of the 4times2 channel transmissionmatrix This can be expanded as
[119911119894
119911lowast
+1
] = [ℎlowast
00minusℎlowast
10
ℎlowast
01minusℎlowast
11
ℎ01
ℎ11
minusℎ00minusℎ10
]
[[[[[
[
119910(0)
119894
119910(0)
119894+1
119910(1)lowast
119894
119910(1)lowast
119894+1
]]]]]
]
for 119894 = 0 2 4 6 8 10 12 14
(16)
The decoder outputs are given by
119911119894= ℎlowast
00119910(0)
119894minus ℎlowast
10119910(0)
119894+1+ ℎ01119910(1)lowast
119894+ ℎ11119910(1)lowast
119894+1
for 119894 = 0 2 4 6 8 10 12 14
119911lowast
119894+1= ℎlowast
01119910(0)
119894minus ℎlowast
11119910(0)
119894+1minusℎ00119910(1)lowast
119894minusℎ10119910(1)lowast
119894+1
for 119894 = 0 2 4 6 8 10 12 14
(17)
10 VLSI Design
Channel frequencyresponse vector
Channel frequencyresponse vector
Received signalvector
Received signalvector
Precomputeddata vector z(k)i
from RDB-1using s1i
y i
y i
y i+1
y i+1
hi+1 hi+1
hihi
zi
Folded RDB unit
Folded PB unit
Conjugate
DRei
Imi
Rei
Rei
Rei+1
Rei+1
Imi
Imi
Imi+1
Im+1
minus
r1
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 1 3 5 15
2l + 1 3 5 15
2l + 1 3 5 15
2l + 1 3 5 15
2l + 1 3 5 15
2l + 0 2 4 14
2l + 0 2 4 14
2l + 0 2 4 14
2l + 0 2 4 14
2l + 0 2 4 14
Figure 14 Illustration of proposed architecture for RDB and PB in MISO and MIMO Note receiver decoding block (RDB) in MISO istermed as RDBM in MIMO
The PCFICH receiver architecture of 2 times 2MIMO configura-tions is shown in Figure 10
Receiver decoding block (RDBM) gets the 16times1 receivedsignal vector y and computes the decoder output vector using(14) assuming that the channel frequency response vectorsh(00) h(01) h(10) and h(11) are knownThe 16times1 precomputeddata vectors for CFI = 1 2 3 and 4 are represented as s(0)
1
s(0)2 s(0)3 and s(0)
4 respectively for antenna 0 and as s(1)
1
s(1)2 s(1)3 and s(1)
4 respectively for antenna 1 The received
signal vectors y(0)119894
and y(1)119894multiply with the four channel
estimation vectors to give decoded output vector z that is sentto the processing block (PB) which is shown in Figure 9 Thedecoder outputs 119911
119894 119894 = 0 2 4 14 are stacked as 16 times 1
vector z = [1199111198790 119911119879
2 119911
119879
14]119879 Similarly RDBM1 gives output
vector z(1) using the precomputed data vectors y(0)1
and y(1)1
and channel estimation vectors The architecture of PBs and
VLSI Design 11
clk
T1 T2 T3 T4
RPB
RPB
RPB
RPB
d
d
dd
d
d
r4
r3
r2
r1
Figure 15 Illustration of superscalar method for SISO and SIMO(with no complex multiplications and operating from 119879
1to 1198794)
the CFI detection architecture are similar to that of theMISOsystem The sum of the squared magnitude of the differencebetween each element in the decoded output vector z and itsprecomputed data in the vector z(1) is the output 119903
1of PB1
Similarly 1199032 1199033 and 119903
4are computed for other CFIThe 119903
1 1199032
1199033 and 119903
4are compared to determine the minimum value by
the CFI detector shown in Figure 4
5 PCFICH Receiver Implementation Methods
The PCFICH receiver architectures can be implementeddirectly based on the basic architectures developed inSections 3 and 4 But in order to effectively utilize theresources in FPGA the implementation of basic architecturesis done using the modified novel architectures based on VLSIDSP techniques namely folding and superscalar processingapproach
51 Direct Implementation with Multiplicands RearrangedMethod In the receiver architecture for SISO and SIMOthe 16 times 1 received signal vector is directly subtracted fromthe precomputed data vector for a given CFI This requireslesser number of multipliers and adders when comparedto MISO and MIMO In MISO and MIMO configurationscomplex multiplications are necessary for the multiplicationof H119867119890119891119891
with the received signal vector It increases thenumber of multiplications in the CFI detection processHence optimum rearrangement of the terms is carried outto minimize the number of multiplications Further theintermediate products are reused in the calculation of real andimaginary parts Consider the multiplication of two complexnumbers Reℎ+119895 Imℎ and Re119910+119895 Im119910The output realpart (119890) and imaginary part (119891) terms are given by
119890 = Re ℎRe 119910 minus Im ℎ Im 119910 119891 = Re ℎ Im 119910 + Im ℎRe 119910
(18)
It requires four multiplications and two additions To reducethe number of multiplications the terms in (18) are rear-ranged as
119890 = [Re 119910 minus Im 119910] [Re ℎ minus Im ℎ]
minus Re 119910 Im ℎ + Im 119910Re ℎ
119891 = Re 119910 Im ℎ + Im 119910Re ℎ
(19)
Since the terms Re119910 Imℎ and Im119910Reℎ are in (19) itrequires only three multiplications but five additions Thiskind of rearrangement of themultiplicands is employed in theprocessing blocks at the cost of increased additions as shownin Figure 12
52 Proposed Architecture Using Folding Method Foldingarchitecture systematically determines the control circuits inDSP architectures where multiple algorithmic operations aretime-multiplexed to a single functional unit [12] It is usedfor synthesis of DSP architectures that can be operated atsingle or multiple clocks It reduces the number of hardwarefunctional units (FUs) by a factor of 119873 at the expense ofincreased computation time
The folding architecture is introduced in the receiverstructure of RPB in SISO and SIMO configurations and ofRPB and PB in MISO and MIMO configurations as shownin Figures 13 and 14 respectively For SISO RPB there are16 hardware lines to calculate the value of 119903
1each requiring
twomultipliers Hence the number of multipliers used in oneRPB is 32 In order to reduce the number of multipliers andadders folding architecture is proposed This architectureuses only two multipliers and performs the operation of asingle hardware line 16 times in sequential wayThe differencebetween the product of channel frequency response vectorwith the precomputed data vector and the received signalvector is stored in registers At a time one resultant signalpair involves in computation using two multipliers to get thevalue of 119911
119894 Four switches operating in system clock speed
are involved in the architecture where two switches are usedto pass the real part of the signal to one multiplier whilethe other two switches are used to pass the imaginary partof the signal to another multiplier The multipliers pass theproducts to the first adder for 119911
119894 The output of the first adder
is passed to the second adder with a delay to accumulate thevalues 119911
0to 11991115into a register in subsequent clock cyclesThis
process requires 16 clock cycles and the CFI is detected at the17th clock cycle Though it takes longer time for the clockcycles to get the output the resources are minimized in thismethod
The folded architecture of decoding block of MISO andMIMO involving complex multiplication of the channelfrequency response vector and the receive signal vector isshown in Figure 14 There are 2 complex multiplications andone addition in each of the 16 hardware lines Hence totalresource elements used are 32 complex multiplications and16 additions The folded architecture which reduces to just2 complex multiplications and one addition requires fiveswitches Two switches are used to pass the first elementof the receive signal vector and its corresponding channel
12 VLSI Design
clk
T1 T2 T3 T4 T5 T6
PB
PB
PB
PB
d
d
dd
d
d
dr4
r3
r2
r1
From other hardware lines
From other hardware lines
Complexmultiplications(real part) of a
hardware line from
Complex
multiplications
(imaginary part) of
a hardware line from RDB
RDB
Figure 16 Illustration of superscalar method for MISO and MIMO (with complex multiplications and operating from 1198791to 1198796)
Figure 17 Simulation waveform for PCFICH receiver
frequency response vector to one multiplier and other twoswitches are used to pass the second element of receive signalvector and its channel frequency response vector to anothermultiplierThese four switches operate in system clock speedThe multipliers pass their products to the adder through thefifth switch before moving to PB This process requires 16clock cycles and the CFI is detected at the 17th clock cycle
53 Proposed Architecture Using Superscalar Method Super-scalar approach is another low resource utilizing VLSI DSPtechnique The superscalar processing method includes par-allel processing and pipelining strategies In this case paralleloperation for the 16 pairs of hardware lines is arrangedwith pipelining of the subtraction and square magnitudeoperations for each CFI SISO configuration does not havecomplex multiplications and it has only square magnitudeoperations Hence the RPB of SISOhas 16 hardware lines eachhaving 2 multipliers which results to a total of 32 multipliers
This setup requires more hardware resources than foldingbut the output is obtained at every 4th clock cycle as shownin Figure 15 SIMO configuration which involves two receiveantenna signal processing requires twice the number ofmultiplications as that of SISO and the output is obtainedat every 4th clock cycle The block ldquo119889rdquo represents the delayelement introduced to buffer the values and produce theoutputs at the same time instant
For MISO configuration the RDB has 16 hardware lineswith 2 complex multiplications each Since each complexmultiplication requires four real multiplications RDB can beexecuted in two clock cycles by reusing 64 multipliers 32multipliers are required for PB taking 4 clock cyclesHence 96multipliers are required in MISO configuration For MIMOconfiguration the RDB requires reuse of 128 multiplierstaking 2 clock cycles and an additional 32 multipliers arerequired for the PB taking 4 clock cycles Hence 160 multi-pliers are required for MISO configuration and the output isobtained at every 6th clock cycle as shown in the Figure 16The block ldquo119889rdquo represents the delay element introduced tobuffer the values and produce the outputs at the same timeinstant
6 Results and Discussion
The proposed receiver architectures for PCFICH in SISOSIMO MISO and MIMO configurations are implementedusing the Xilinx PlanAhead tool on the Virtex-6 FPGAxc6vlx240tff1156-1 device board The target device Virtex-6has only 768 DSP elements Table 2 shows the performanceof the proposed architectures using folding and superscalarmethods being compared with the direct implementationof PCFICH receiver in terms of resource utilisation speedand power for all the SISO SIMO MISO and MIMO
VLSI Design 13
clk clk
clk
clk
clk
clkDiversity[10]
Diversity[1] GND 1 o wide mux 1Address[10] Output[30]
RTL rom 2 4
3
2
1
0
e
e
e
e
Div0
Div1
Div2
Div3
SISO
SIMO
MISO
MIMO
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310] Op
Choose
Count CFI11[490]Count CFI12[490]Count CFI13[490]Count CFI14[490]Count CFI21[490]Count CFI22[490]Count CFI23[490]Count CFI24[490]Count CFI31[490]Count CFI32[490]Count CFI33[490]Count CFI34[490]Count CFI41[490]Count CFI42[490]Count CFI43[490]Count CFI44[490]
e[30]Out1[310]Out2[310]Out3[310]Out4[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Figure 18 RTL schematic for combined PCFICH architecture with diversity
Table 2 Performance of proposed architectures based on folding and superscalar method
Diversity Method Multipliers Adders DSP elements LUTs Total delay (ns) Speed (MHz) Dynamic power
SISODirect 125 245 125 5479 39081 25587 mdash
Folding (16T) 8 81 16 2561 114448 8737 84Superscalar (4T) 32 182 66 2731 69333 14423 93
SIMODirect 250 494 250 10942 40278 24827 mdash
Folding (16T) 16 165 32 5117 130704 7651 159Superscalar (4T) 64 318 130 5623 53873 18562 170
MISODirect 224 580 594 14880 43023 23243 mdash
Folding (16T) 14 101 39 3950 255264 3917 173Superscalar (6T) 96 338 196 6156 80495 12423 208
MIMODirect 320 844 675 17380 56962 17555 mdash
Folding (16T) 20 155 46 4395 256528 3898 374Superscalar (6T) 160 465 262 6932 85822 11652 382
Figure 19 Resource utilization graph for generalized architecture
configurations The proposed architectures based on foldingand superscalar processing methods require less number ofresource elements
In the folding approach resource utilization is less com-pared to the direct and superscalar approach at the cost ofreduced speed of operation but it is suitable for real-time
frame timings When the LTE-A system operates at 14MHzbandwidth maximum time available for detection at eachsubcarrier is 992063 ns since each slot of 05ms duration in aframe (10ms radio frame duration) consists of 7 OFDM sym-bols and there are 72 subcarriers along one OFDM symbolThe total delay in the receiver architecture is within the LTEtime constraint The dynamic power consumption is less inthe folding method compared to superscalar method due todecrease in block arithmetic Direct method does not requiresequential execution and clocking and hence total powerconsumption is due to static power Hence it is inferredthat the proposed architecture based on folding method ismore suitable for CFI detection The simulation waveformof the proposed architecture based on folding method isshown in Figure 17 for SISO SIMO MISO and MIMOconfigurations
A general architecture based on folding method whichoperates at all the four SISO SIMO MISO and MIMOconfigurations has also been developed In this architecture
14 VLSI Design
Table 3 Resource requirements of proposed architecture using folding method
Parameter Multipliers Adders Minimum clock period (ns) Total delay (ns) Speed (MHz) Total dynamic power (mW)Value 58 518 16033 256528 3898 1019
Figure 20 Implemented device in FPGA editor
a control variable ldquo119890rdquo is used to enable or disable thesubmodules SISO SIMO MISO or MIMO according tothe selection input ldquodiversityrdquo CFI is detected at every 17thclock cycle The synthesis results of a general architecturebased on folding show that it utilizes minimum resourcesin XC6VLX240TFF1156-1 Virtex 6 device (768 DSPs) Thisis summarized in Table 3 Dynamic power consumption isdue to internal switching contributed by the clock (246mW)logic (670mW) and the block arithmetic (103mW)
Figure 18 shows the RTL schematic of 4 diversity blocksldquodiv0rdquo ldquodiv1rdquo ldquodiv2rdquo and ldquodiv3rdquo corresponding to SISO SIMOMISO and MIMO controlled by wires named ldquo119890rdquo Powerconsumed includes both static power and dynamic power dueto internal switching
Figure 19 shows the resource utilization graph whichshows the percentage of registers lookup tables (LUTs) slicesDSP elements and buffers used
Figure 20 shows the implemented device in FPGA editorwith the implemented components and interconnectionsbetween the components configured into the FPGA device
7 Conclusion
In this paper low complexity low resource single or multi-antenna CFI detection at the receiver system has been pro-posed and analyzed using modelsim and implementation inthe Virtex-6 device in Xilinx PlanAhead tool In the receivercomputational complexity and the resource utilization areminimized by employing arithmetic operational rearrange-ment and suboptimal sequential DSP algorithm called thefolding approach The proposed architecture using folding
method complies with the LTE frame timing constraint inSISO SIMOMISO andMIMO configurations It is a suitablesolution for the area optimized hardware implementation ofreceiver structures for PCFICH In future a total hardwareaccommodating all the physical downlink control channelsof the 3GPP-LTE-A with low resource utilization could besynthesized and implemented
Conflict of Interests
The authors do not have direct financial relation with anycommercial identity mentioned in the paper or any otherconflict of interests
Acknowledgments
The authors wish to express their sincere thanks to All IndiaCouncil for Technical Education NewDelhi for the Grant todo the Project titled Design of Testbed for the Developmentof Optimized Architectures of MIMO Signal Processing(no 8023RIDRPS0391112) They are also thankful tothe Managements of Mepco Schlenk Engineering CollegeSivakasi and Thiagarajar College of Engineering Maduraifor their constant support and encouragement to carry outthis research work successfully
References
[1] 3GPP TS 36 211 Version 11 0 0 Release 11 (2012-10) EvolvedUniversal Terrestrial Radio Access (E-UTRA) Physical Chan-nels and Modulation 2012
[2] J Ketonen and M Juntti ldquoSIC and K-best LSD receiverimplementation for a MIMO-OFDM systemrdquo in Proceedings ofthe 16th European Signal Processing Conference (EUSIPCO rsquo08)Lausanne Switzerland August 2008
[3] J Ketonen M Juntti and J R Cavallaro ldquoPerformancemdashcomplexity comparison of receivers for a LTE MIMOOFDMsystemrdquo IEEE Transactions on Signal Processing vol 58 no 6pp 3360ndash3372 2010
[4] X Huang C Liang and J Ma ldquoSystem architecture andimplementation of MIMO sphere decoders on FPGArdquo IEEETransactions on Very Large Scale Integration Systems vol 16 no2 pp 188ndash196 2008
[5] Y Lin Y Chen C Chu C Zhan and A Wu ldquoDual-modelow-complexity codebook searching algorithm and VLSI archi-tecture for LTELTE-advanced systemsrdquo IEEE Transactions onSignal Processing vol 61 no 14 pp 3545ndash3562 2013
[6] DWu J Eilert R Asghar and D Liu ldquoVLSI implementation ofa fixed-complexity soft-output MIMO detector for high-speedwirelessrdquo EURASIP Journal on Wireless Communications andNetworking vol 2010 Article ID 893184 13 pages 2010
[7] C Mehlfuhrer M Wrulich J C Ikuno D Bosanska and MRupp ldquoSimulating the long term evolution physical layerrdquo in
VLSI Design 15
Proceedings of the 17th European Signal Processing Conference(EUSIPCO rsquo09) pp 1471ndash1478 Glasgow UK August 2009
[8] S J Thiruvengadam and L M A Jalloul ldquoPerformance anal-ysisof the 3GPP-LTE physical control channelsrdquo EURASIPJournalon Wireless Communications and Networking vol 2010Article ID 914934 10 pages 2010
[9] S M Alamouti ldquoA simple transmit diversity technique forwireless communicationsrdquo IEEE Journal on Selected Areas inCommunications vol 16 no 8 pp 1451ndash1458 1998
[10] S S A Abbas and S JThiruvengadam ldquoFPGA implementationof 3GPP-LTE-A physical downlink control channel using diver-sity techniquesrdquo International Journal of Wireless and MobileComputing vol 9 no 2 p 84 2013
[11] S Ahmadi LTE-Advanced A Practical Systems Approach toUnderstanding 3GPP LTE Releases 10 and 11 Radio AccessTechnologies Sassan Ahmadi Academic Press 2013
[12] K K Parhi VLSI Digital Signal Processing SystemsmdashDesign andImplementation Wiley-Interscience 1999
International Journal of
AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal ofEngineeringVolume 2014
Submit your manuscripts athttpwwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
8 VLSI Design
16 times 1 receivedsignal vector
Y(0)0
Y(0)1
Y(1)lowast0
Y(1)lowast1
Y(0)14
Y(0)15
Y(1)lowast14
Y(1)lowast15
h(00)lowast0 minush(00)1 h(00)lowast14 minush(00)15 h(01)0 h(01)lowast15 h(10)0 minush(10)lowast1 h(10)14 minush(10)lowast15 minush(11)lowast0 h(11)1 minush(11)lowast14 h(11)15
Z0
Z1
Z14
Z15C
C
16 times 1 channel frequency response vectors h(00) h(01) h(10) and h(11)
h(01)lowast1 middot middot middot h(01)
14
Figure 11 Internal architecture receiver decoding block-MIMO (RDBM)
Re(y )
Im(y )
Re(h)
Im(h)
e
f
Figure 12 Multiplicands rearrangement for a single complexmultiplication block
This is repeated for all the 8 symbols in layer 0 andlayer 1 Equation (7) defines the transmission format withthe row index indicating the antenna port number and thecolumn index indicating the subcarrier index In 2 times 1MISOconfiguration the receive signals at 119894th and (119894+1)th subcarrierare given in matrix form as
[119910119894
119910lowast
119894+1
] = [ℎ(0)
119894minusℎ(1)
119894
ℎlowast(1)
119894+1ℎlowast(0)
119894+1
] [119889119894
119889lowast
119894+1
] + [119899119894
119899lowast
119894+1
]
for 119894 = 0 2 4 6 8 10 12 14
(8)
where ℎ(0)119894
represents the channel frequency response of 119894thsubcarrier between 0th transmit antenna port and receive
antenna 119889119894is data symbol at 119894th subcarrier and 119899
119894is the
noise at 119894th subcarrier at the receive antenna Equation (8)can simply be represented as
y119894= Heff119894d119894 + n
119894 for 119894 = 0 2 4 6 8 10 12 14 (9)
where y119894is 2 times 1 receive signal vector Heff119894 is the 2 times 2
channel matrix d119894is 2 times 1 complex signal vector and n
119894is
2 times 1 noise vector The objective is to detect the elements119889119894and 119889
119894+1of the data vector d
119894 Assuming that the elements
of channel frequency response matrix Heff119894 are perfectlyknown at the receiver the decoder output vector z
119894is given
byz119894= H119867eff119894 y119894 for 119894 = 0 2 4 6 8 10 12 14 (10)
whereH119867eff119894 is theHermitian of the 2times2 channel transmissionmatrix Equation (10) is expanded as
[119911119894
119911lowast
119894+1
] = [ℎlowast(0)
119894ℎ(1)
119894+1
minusℎlowast(1)
119894ℎ(0)
119894+1
] [119910119894
119910lowast
119894+1
]
for 119894 = 0 2 4 6 8 10 12 14(11)
The elements of decoder output are calculated as
119911119894= ℎlowast(0)
119894119910119894+ ℎ(1)
119894+1119910lowast
119894+1 for 119894 = 0 2 4 6 8 10 12 14
119911lowast
i+1 = minusℎlowast(1)
119894119910119894+ ℎ(0)
119894+1119910lowast
119894+1 for 119894 = 0 2 4 6 8 10 12 14
(12)
The PCFICH receive architecture for 2 times 1MISO config-uration is shown in Figure 7 Receiver decoding block (RDB)
VLSI Design 9
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14 2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15Folded unit
DRei
Imi
Rei
Rei
Rei+1
Rei+1
Imi
Imi+1
Imi Im+1
r1
Channel frequency response hi
Received signalvector y i
Precomputeddata si
minus
2l + 1 3 5 7 9 11 13 15
Figure 13 Illustration of folded architecture of RPB in SISO and SIMO
gets the 16 times 1 received signal vector y and computes thedecoder output vector using (10) assuming that the channelfrequency response vectors h(0) and h(1) are known Thedetailed internal architecture of RDBM is shown in Figure 11The decoder output vectors z
119894 119894 = 0 2 4 14 are stacked as
16 times 1 vector z = [1199111198790 119911119879
2 119911
119879
14]119879 The 16 times 1 precomputed
data vectors forCFI= 1234 are represented as s1 s2 s3and s4
respectivelyThe detailed structure of receiver decoding blocks (RDB)
is shown in Figure 8 The output vectors z(1) z(2) z(3) z(4)from RDB-1 to RDB-4 are fed to the processing blocks (PB-1 to PB-4) The detailed architecture of PB-1 is shown inFigure 9 The sum of the square magnitude of the elementsof difference vector between decoded output vector z and theprecomputed data vector s
1is the output 119903
1of PB-1 Similarly
1199032 1199033 and 119903
4are computed for CFI = 2 3 and 4 using PB-
2 PB-3 and PB-4 respectively The processing block outputs1199031 1199032 1199033 and 119903
4are applied to the CFI determination circuit
shown in Figure 4 to detect the CFI value
4 System Model and ImplementationArchitecture for MIMO Configuration
In MIMO system the signals at 119894th and (119894 + 1)th subcarrier inthe receive array are given by
[[[[[[
[
119910(0)
119894
119910(0)
119894+1
119910(1)lowast
119894
119910(1)lowast
119894+1
]]]]]]
]
=
[[[
[
ℎ00
ℎ01ℎ10 ℎ11ℎlowast
01 minusℎlowast
00ℎlowast
11 minusℎlowast
10
]]]
]
[119889119894
119889lowast
119894+1
] +
[[[[[[[[
[
119899(0)
119894
119899(0)lowast
119894+1
119899(1)
119894
119899(1)lowast
119894+1
]]]]]]]]
]
for 119894 = 0 2 4 6 8 10 12 14(13)
where ℎ119886119887
represents the channel frequency response vectorbetween 119887th transmit antenna and 119886th receive antenna and119899(119895)
119894represents the noise in 119894th subcarrier in 119895th receive
antenna In vector form it is written as
y119894= Heff119894d119894 + n
119894 for 119894 = 0 2 4 6 8 10 12 14 (14)
where y119894is 4times1 receive signal vectorHeff119894 is the 4times2 channel
frequency response vector at 119894th and (119894 + 1)th subcarrier d119894
is 2 times 1 data vector at 119894th and (119894 + 1)th subcarrier and n119894is
4 times 1 noise vector The objective is to detect the elements 119889119894
and 119889119894+1
of the data vector d119894 Assuming that the elements of
channel frequency response matrixHeff119894 are perfectly knownat the receiver the decoder output vector z is given by
z119894= H119867eff119894 y119894 for 119894 = 0 2 4 6 8 10 12 14 (15)
whereH119867eff119894 is theHermitian of the 4times2 channel transmissionmatrix This can be expanded as
[119911119894
119911lowast
+1
] = [ℎlowast
00minusℎlowast
10
ℎlowast
01minusℎlowast
11
ℎ01
ℎ11
minusℎ00minusℎ10
]
[[[[[
[
119910(0)
119894
119910(0)
119894+1
119910(1)lowast
119894
119910(1)lowast
119894+1
]]]]]
]
for 119894 = 0 2 4 6 8 10 12 14
(16)
The decoder outputs are given by
119911119894= ℎlowast
00119910(0)
119894minus ℎlowast
10119910(0)
119894+1+ ℎ01119910(1)lowast
119894+ ℎ11119910(1)lowast
119894+1
for 119894 = 0 2 4 6 8 10 12 14
119911lowast
119894+1= ℎlowast
01119910(0)
119894minus ℎlowast
11119910(0)
119894+1minusℎ00119910(1)lowast
119894minusℎ10119910(1)lowast
119894+1
for 119894 = 0 2 4 6 8 10 12 14
(17)
10 VLSI Design
Channel frequencyresponse vector
Channel frequencyresponse vector
Received signalvector
Received signalvector
Precomputeddata vector z(k)i
from RDB-1using s1i
y i
y i
y i+1
y i+1
hi+1 hi+1
hihi
zi
Folded RDB unit
Folded PB unit
Conjugate
DRei
Imi
Rei
Rei
Rei+1
Rei+1
Imi
Imi
Imi+1
Im+1
minus
r1
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 1 3 5 15
2l + 1 3 5 15
2l + 1 3 5 15
2l + 1 3 5 15
2l + 1 3 5 15
2l + 0 2 4 14
2l + 0 2 4 14
2l + 0 2 4 14
2l + 0 2 4 14
2l + 0 2 4 14
Figure 14 Illustration of proposed architecture for RDB and PB in MISO and MIMO Note receiver decoding block (RDB) in MISO istermed as RDBM in MIMO
The PCFICH receiver architecture of 2 times 2MIMO configura-tions is shown in Figure 10
Receiver decoding block (RDBM) gets the 16times1 receivedsignal vector y and computes the decoder output vector using(14) assuming that the channel frequency response vectorsh(00) h(01) h(10) and h(11) are knownThe 16times1 precomputeddata vectors for CFI = 1 2 3 and 4 are represented as s(0)
1
s(0)2 s(0)3 and s(0)
4 respectively for antenna 0 and as s(1)
1
s(1)2 s(1)3 and s(1)
4 respectively for antenna 1 The received
signal vectors y(0)119894
and y(1)119894multiply with the four channel
estimation vectors to give decoded output vector z that is sentto the processing block (PB) which is shown in Figure 9 Thedecoder outputs 119911
119894 119894 = 0 2 4 14 are stacked as 16 times 1
vector z = [1199111198790 119911119879
2 119911
119879
14]119879 Similarly RDBM1 gives output
vector z(1) using the precomputed data vectors y(0)1
and y(1)1
and channel estimation vectors The architecture of PBs and
VLSI Design 11
clk
T1 T2 T3 T4
RPB
RPB
RPB
RPB
d
d
dd
d
d
r4
r3
r2
r1
Figure 15 Illustration of superscalar method for SISO and SIMO(with no complex multiplications and operating from 119879
1to 1198794)
the CFI detection architecture are similar to that of theMISOsystem The sum of the squared magnitude of the differencebetween each element in the decoded output vector z and itsprecomputed data in the vector z(1) is the output 119903
1of PB1
Similarly 1199032 1199033 and 119903
4are computed for other CFIThe 119903
1 1199032
1199033 and 119903
4are compared to determine the minimum value by
the CFI detector shown in Figure 4
5 PCFICH Receiver Implementation Methods
The PCFICH receiver architectures can be implementeddirectly based on the basic architectures developed inSections 3 and 4 But in order to effectively utilize theresources in FPGA the implementation of basic architecturesis done using the modified novel architectures based on VLSIDSP techniques namely folding and superscalar processingapproach
51 Direct Implementation with Multiplicands RearrangedMethod In the receiver architecture for SISO and SIMOthe 16 times 1 received signal vector is directly subtracted fromthe precomputed data vector for a given CFI This requireslesser number of multipliers and adders when comparedto MISO and MIMO In MISO and MIMO configurationscomplex multiplications are necessary for the multiplicationof H119867119890119891119891
with the received signal vector It increases thenumber of multiplications in the CFI detection processHence optimum rearrangement of the terms is carried outto minimize the number of multiplications Further theintermediate products are reused in the calculation of real andimaginary parts Consider the multiplication of two complexnumbers Reℎ+119895 Imℎ and Re119910+119895 Im119910The output realpart (119890) and imaginary part (119891) terms are given by
119890 = Re ℎRe 119910 minus Im ℎ Im 119910 119891 = Re ℎ Im 119910 + Im ℎRe 119910
(18)
It requires four multiplications and two additions To reducethe number of multiplications the terms in (18) are rear-ranged as
119890 = [Re 119910 minus Im 119910] [Re ℎ minus Im ℎ]
minus Re 119910 Im ℎ + Im 119910Re ℎ
119891 = Re 119910 Im ℎ + Im 119910Re ℎ
(19)
Since the terms Re119910 Imℎ and Im119910Reℎ are in (19) itrequires only three multiplications but five additions Thiskind of rearrangement of themultiplicands is employed in theprocessing blocks at the cost of increased additions as shownin Figure 12
52 Proposed Architecture Using Folding Method Foldingarchitecture systematically determines the control circuits inDSP architectures where multiple algorithmic operations aretime-multiplexed to a single functional unit [12] It is usedfor synthesis of DSP architectures that can be operated atsingle or multiple clocks It reduces the number of hardwarefunctional units (FUs) by a factor of 119873 at the expense ofincreased computation time
The folding architecture is introduced in the receiverstructure of RPB in SISO and SIMO configurations and ofRPB and PB in MISO and MIMO configurations as shownin Figures 13 and 14 respectively For SISO RPB there are16 hardware lines to calculate the value of 119903
1each requiring
twomultipliers Hence the number of multipliers used in oneRPB is 32 In order to reduce the number of multipliers andadders folding architecture is proposed This architectureuses only two multipliers and performs the operation of asingle hardware line 16 times in sequential wayThe differencebetween the product of channel frequency response vectorwith the precomputed data vector and the received signalvector is stored in registers At a time one resultant signalpair involves in computation using two multipliers to get thevalue of 119911
119894 Four switches operating in system clock speed
are involved in the architecture where two switches are usedto pass the real part of the signal to one multiplier whilethe other two switches are used to pass the imaginary partof the signal to another multiplier The multipliers pass theproducts to the first adder for 119911
119894 The output of the first adder
is passed to the second adder with a delay to accumulate thevalues 119911
0to 11991115into a register in subsequent clock cyclesThis
process requires 16 clock cycles and the CFI is detected at the17th clock cycle Though it takes longer time for the clockcycles to get the output the resources are minimized in thismethod
The folded architecture of decoding block of MISO andMIMO involving complex multiplication of the channelfrequency response vector and the receive signal vector isshown in Figure 14 There are 2 complex multiplications andone addition in each of the 16 hardware lines Hence totalresource elements used are 32 complex multiplications and16 additions The folded architecture which reduces to just2 complex multiplications and one addition requires fiveswitches Two switches are used to pass the first elementof the receive signal vector and its corresponding channel
12 VLSI Design
clk
T1 T2 T3 T4 T5 T6
PB
PB
PB
PB
d
d
dd
d
d
dr4
r3
r2
r1
From other hardware lines
From other hardware lines
Complexmultiplications(real part) of a
hardware line from
Complex
multiplications
(imaginary part) of
a hardware line from RDB
RDB
Figure 16 Illustration of superscalar method for MISO and MIMO (with complex multiplications and operating from 1198791to 1198796)
Figure 17 Simulation waveform for PCFICH receiver
frequency response vector to one multiplier and other twoswitches are used to pass the second element of receive signalvector and its channel frequency response vector to anothermultiplierThese four switches operate in system clock speedThe multipliers pass their products to the adder through thefifth switch before moving to PB This process requires 16clock cycles and the CFI is detected at the 17th clock cycle
53 Proposed Architecture Using Superscalar Method Super-scalar approach is another low resource utilizing VLSI DSPtechnique The superscalar processing method includes par-allel processing and pipelining strategies In this case paralleloperation for the 16 pairs of hardware lines is arrangedwith pipelining of the subtraction and square magnitudeoperations for each CFI SISO configuration does not havecomplex multiplications and it has only square magnitudeoperations Hence the RPB of SISOhas 16 hardware lines eachhaving 2 multipliers which results to a total of 32 multipliers
This setup requires more hardware resources than foldingbut the output is obtained at every 4th clock cycle as shownin Figure 15 SIMO configuration which involves two receiveantenna signal processing requires twice the number ofmultiplications as that of SISO and the output is obtainedat every 4th clock cycle The block ldquo119889rdquo represents the delayelement introduced to buffer the values and produce theoutputs at the same time instant
For MISO configuration the RDB has 16 hardware lineswith 2 complex multiplications each Since each complexmultiplication requires four real multiplications RDB can beexecuted in two clock cycles by reusing 64 multipliers 32multipliers are required for PB taking 4 clock cyclesHence 96multipliers are required in MISO configuration For MIMOconfiguration the RDB requires reuse of 128 multiplierstaking 2 clock cycles and an additional 32 multipliers arerequired for the PB taking 4 clock cycles Hence 160 multi-pliers are required for MISO configuration and the output isobtained at every 6th clock cycle as shown in the Figure 16The block ldquo119889rdquo represents the delay element introduced tobuffer the values and produce the outputs at the same timeinstant
6 Results and Discussion
The proposed receiver architectures for PCFICH in SISOSIMO MISO and MIMO configurations are implementedusing the Xilinx PlanAhead tool on the Virtex-6 FPGAxc6vlx240tff1156-1 device board The target device Virtex-6has only 768 DSP elements Table 2 shows the performanceof the proposed architectures using folding and superscalarmethods being compared with the direct implementationof PCFICH receiver in terms of resource utilisation speedand power for all the SISO SIMO MISO and MIMO
VLSI Design 13
clk clk
clk
clk
clk
clkDiversity[10]
Diversity[1] GND 1 o wide mux 1Address[10] Output[30]
RTL rom 2 4
3
2
1
0
e
e
e
e
Div0
Div1
Div2
Div3
SISO
SIMO
MISO
MIMO
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310] Op
Choose
Count CFI11[490]Count CFI12[490]Count CFI13[490]Count CFI14[490]Count CFI21[490]Count CFI22[490]Count CFI23[490]Count CFI24[490]Count CFI31[490]Count CFI32[490]Count CFI33[490]Count CFI34[490]Count CFI41[490]Count CFI42[490]Count CFI43[490]Count CFI44[490]
e[30]Out1[310]Out2[310]Out3[310]Out4[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Figure 18 RTL schematic for combined PCFICH architecture with diversity
Table 2 Performance of proposed architectures based on folding and superscalar method
Diversity Method Multipliers Adders DSP elements LUTs Total delay (ns) Speed (MHz) Dynamic power
SISODirect 125 245 125 5479 39081 25587 mdash
Folding (16T) 8 81 16 2561 114448 8737 84Superscalar (4T) 32 182 66 2731 69333 14423 93
SIMODirect 250 494 250 10942 40278 24827 mdash
Folding (16T) 16 165 32 5117 130704 7651 159Superscalar (4T) 64 318 130 5623 53873 18562 170
MISODirect 224 580 594 14880 43023 23243 mdash
Folding (16T) 14 101 39 3950 255264 3917 173Superscalar (6T) 96 338 196 6156 80495 12423 208
MIMODirect 320 844 675 17380 56962 17555 mdash
Folding (16T) 20 155 46 4395 256528 3898 374Superscalar (6T) 160 465 262 6932 85822 11652 382
Figure 19 Resource utilization graph for generalized architecture
configurations The proposed architectures based on foldingand superscalar processing methods require less number ofresource elements
In the folding approach resource utilization is less com-pared to the direct and superscalar approach at the cost ofreduced speed of operation but it is suitable for real-time
frame timings When the LTE-A system operates at 14MHzbandwidth maximum time available for detection at eachsubcarrier is 992063 ns since each slot of 05ms duration in aframe (10ms radio frame duration) consists of 7 OFDM sym-bols and there are 72 subcarriers along one OFDM symbolThe total delay in the receiver architecture is within the LTEtime constraint The dynamic power consumption is less inthe folding method compared to superscalar method due todecrease in block arithmetic Direct method does not requiresequential execution and clocking and hence total powerconsumption is due to static power Hence it is inferredthat the proposed architecture based on folding method ismore suitable for CFI detection The simulation waveformof the proposed architecture based on folding method isshown in Figure 17 for SISO SIMO MISO and MIMOconfigurations
A general architecture based on folding method whichoperates at all the four SISO SIMO MISO and MIMOconfigurations has also been developed In this architecture
14 VLSI Design
Table 3 Resource requirements of proposed architecture using folding method
Parameter Multipliers Adders Minimum clock period (ns) Total delay (ns) Speed (MHz) Total dynamic power (mW)Value 58 518 16033 256528 3898 1019
Figure 20 Implemented device in FPGA editor
a control variable ldquo119890rdquo is used to enable or disable thesubmodules SISO SIMO MISO or MIMO according tothe selection input ldquodiversityrdquo CFI is detected at every 17thclock cycle The synthesis results of a general architecturebased on folding show that it utilizes minimum resourcesin XC6VLX240TFF1156-1 Virtex 6 device (768 DSPs) Thisis summarized in Table 3 Dynamic power consumption isdue to internal switching contributed by the clock (246mW)logic (670mW) and the block arithmetic (103mW)
Figure 18 shows the RTL schematic of 4 diversity blocksldquodiv0rdquo ldquodiv1rdquo ldquodiv2rdquo and ldquodiv3rdquo corresponding to SISO SIMOMISO and MIMO controlled by wires named ldquo119890rdquo Powerconsumed includes both static power and dynamic power dueto internal switching
Figure 19 shows the resource utilization graph whichshows the percentage of registers lookup tables (LUTs) slicesDSP elements and buffers used
Figure 20 shows the implemented device in FPGA editorwith the implemented components and interconnectionsbetween the components configured into the FPGA device
7 Conclusion
In this paper low complexity low resource single or multi-antenna CFI detection at the receiver system has been pro-posed and analyzed using modelsim and implementation inthe Virtex-6 device in Xilinx PlanAhead tool In the receivercomputational complexity and the resource utilization areminimized by employing arithmetic operational rearrange-ment and suboptimal sequential DSP algorithm called thefolding approach The proposed architecture using folding
method complies with the LTE frame timing constraint inSISO SIMOMISO andMIMO configurations It is a suitablesolution for the area optimized hardware implementation ofreceiver structures for PCFICH In future a total hardwareaccommodating all the physical downlink control channelsof the 3GPP-LTE-A with low resource utilization could besynthesized and implemented
Conflict of Interests
The authors do not have direct financial relation with anycommercial identity mentioned in the paper or any otherconflict of interests
Acknowledgments
The authors wish to express their sincere thanks to All IndiaCouncil for Technical Education NewDelhi for the Grant todo the Project titled Design of Testbed for the Developmentof Optimized Architectures of MIMO Signal Processing(no 8023RIDRPS0391112) They are also thankful tothe Managements of Mepco Schlenk Engineering CollegeSivakasi and Thiagarajar College of Engineering Maduraifor their constant support and encouragement to carry outthis research work successfully
References
[1] 3GPP TS 36 211 Version 11 0 0 Release 11 (2012-10) EvolvedUniversal Terrestrial Radio Access (E-UTRA) Physical Chan-nels and Modulation 2012
[2] J Ketonen and M Juntti ldquoSIC and K-best LSD receiverimplementation for a MIMO-OFDM systemrdquo in Proceedings ofthe 16th European Signal Processing Conference (EUSIPCO rsquo08)Lausanne Switzerland August 2008
[3] J Ketonen M Juntti and J R Cavallaro ldquoPerformancemdashcomplexity comparison of receivers for a LTE MIMOOFDMsystemrdquo IEEE Transactions on Signal Processing vol 58 no 6pp 3360ndash3372 2010
[4] X Huang C Liang and J Ma ldquoSystem architecture andimplementation of MIMO sphere decoders on FPGArdquo IEEETransactions on Very Large Scale Integration Systems vol 16 no2 pp 188ndash196 2008
[5] Y Lin Y Chen C Chu C Zhan and A Wu ldquoDual-modelow-complexity codebook searching algorithm and VLSI archi-tecture for LTELTE-advanced systemsrdquo IEEE Transactions onSignal Processing vol 61 no 14 pp 3545ndash3562 2013
[6] DWu J Eilert R Asghar and D Liu ldquoVLSI implementation ofa fixed-complexity soft-output MIMO detector for high-speedwirelessrdquo EURASIP Journal on Wireless Communications andNetworking vol 2010 Article ID 893184 13 pages 2010
[7] C Mehlfuhrer M Wrulich J C Ikuno D Bosanska and MRupp ldquoSimulating the long term evolution physical layerrdquo in
VLSI Design 15
Proceedings of the 17th European Signal Processing Conference(EUSIPCO rsquo09) pp 1471ndash1478 Glasgow UK August 2009
[8] S J Thiruvengadam and L M A Jalloul ldquoPerformance anal-ysisof the 3GPP-LTE physical control channelsrdquo EURASIPJournalon Wireless Communications and Networking vol 2010Article ID 914934 10 pages 2010
[9] S M Alamouti ldquoA simple transmit diversity technique forwireless communicationsrdquo IEEE Journal on Selected Areas inCommunications vol 16 no 8 pp 1451ndash1458 1998
[10] S S A Abbas and S JThiruvengadam ldquoFPGA implementationof 3GPP-LTE-A physical downlink control channel using diver-sity techniquesrdquo International Journal of Wireless and MobileComputing vol 9 no 2 p 84 2013
[11] S Ahmadi LTE-Advanced A Practical Systems Approach toUnderstanding 3GPP LTE Releases 10 and 11 Radio AccessTechnologies Sassan Ahmadi Academic Press 2013
[12] K K Parhi VLSI Digital Signal Processing SystemsmdashDesign andImplementation Wiley-Interscience 1999
International Journal of
AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal ofEngineeringVolume 2014
Submit your manuscripts athttpwwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
VLSI Design 9
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14 2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15Folded unit
DRei
Imi
Rei
Rei
Rei+1
Rei+1
Imi
Imi+1
Imi Im+1
r1
Channel frequency response hi
Received signalvector y i
Precomputeddata si
minus
2l + 1 3 5 7 9 11 13 15
Figure 13 Illustration of folded architecture of RPB in SISO and SIMO
gets the 16 times 1 received signal vector y and computes thedecoder output vector using (10) assuming that the channelfrequency response vectors h(0) and h(1) are known Thedetailed internal architecture of RDBM is shown in Figure 11The decoder output vectors z
119894 119894 = 0 2 4 14 are stacked as
16 times 1 vector z = [1199111198790 119911119879
2 119911
119879
14]119879 The 16 times 1 precomputed
data vectors forCFI= 1234 are represented as s1 s2 s3and s4
respectivelyThe detailed structure of receiver decoding blocks (RDB)
is shown in Figure 8 The output vectors z(1) z(2) z(3) z(4)from RDB-1 to RDB-4 are fed to the processing blocks (PB-1 to PB-4) The detailed architecture of PB-1 is shown inFigure 9 The sum of the square magnitude of the elementsof difference vector between decoded output vector z and theprecomputed data vector s
1is the output 119903
1of PB-1 Similarly
1199032 1199033 and 119903
4are computed for CFI = 2 3 and 4 using PB-
2 PB-3 and PB-4 respectively The processing block outputs1199031 1199032 1199033 and 119903
4are applied to the CFI determination circuit
shown in Figure 4 to detect the CFI value
4 System Model and ImplementationArchitecture for MIMO Configuration
In MIMO system the signals at 119894th and (119894 + 1)th subcarrier inthe receive array are given by
[[[[[[
[
119910(0)
119894
119910(0)
119894+1
119910(1)lowast
119894
119910(1)lowast
119894+1
]]]]]]
]
=
[[[
[
ℎ00
ℎ01ℎ10 ℎ11ℎlowast
01 minusℎlowast
00ℎlowast
11 minusℎlowast
10
]]]
]
[119889119894
119889lowast
119894+1
] +
[[[[[[[[
[
119899(0)
119894
119899(0)lowast
119894+1
119899(1)
119894
119899(1)lowast
119894+1
]]]]]]]]
]
for 119894 = 0 2 4 6 8 10 12 14(13)
where ℎ119886119887
represents the channel frequency response vectorbetween 119887th transmit antenna and 119886th receive antenna and119899(119895)
119894represents the noise in 119894th subcarrier in 119895th receive
antenna In vector form it is written as
y119894= Heff119894d119894 + n
119894 for 119894 = 0 2 4 6 8 10 12 14 (14)
where y119894is 4times1 receive signal vectorHeff119894 is the 4times2 channel
frequency response vector at 119894th and (119894 + 1)th subcarrier d119894
is 2 times 1 data vector at 119894th and (119894 + 1)th subcarrier and n119894is
4 times 1 noise vector The objective is to detect the elements 119889119894
and 119889119894+1
of the data vector d119894 Assuming that the elements of
channel frequency response matrixHeff119894 are perfectly knownat the receiver the decoder output vector z is given by
z119894= H119867eff119894 y119894 for 119894 = 0 2 4 6 8 10 12 14 (15)
whereH119867eff119894 is theHermitian of the 4times2 channel transmissionmatrix This can be expanded as
[119911119894
119911lowast
+1
] = [ℎlowast
00minusℎlowast
10
ℎlowast
01minusℎlowast
11
ℎ01
ℎ11
minusℎ00minusℎ10
]
[[[[[
[
119910(0)
119894
119910(0)
119894+1
119910(1)lowast
119894
119910(1)lowast
119894+1
]]]]]
]
for 119894 = 0 2 4 6 8 10 12 14
(16)
The decoder outputs are given by
119911119894= ℎlowast
00119910(0)
119894minus ℎlowast
10119910(0)
119894+1+ ℎ01119910(1)lowast
119894+ ℎ11119910(1)lowast
119894+1
for 119894 = 0 2 4 6 8 10 12 14
119911lowast
119894+1= ℎlowast
01119910(0)
119894minus ℎlowast
11119910(0)
119894+1minusℎ00119910(1)lowast
119894minusℎ10119910(1)lowast
119894+1
for 119894 = 0 2 4 6 8 10 12 14
(17)
10 VLSI Design
Channel frequencyresponse vector
Channel frequencyresponse vector
Received signalvector
Received signalvector
Precomputeddata vector z(k)i
from RDB-1using s1i
y i
y i
y i+1
y i+1
hi+1 hi+1
hihi
zi
Folded RDB unit
Folded PB unit
Conjugate
DRei
Imi
Rei
Rei
Rei+1
Rei+1
Imi
Imi
Imi+1
Im+1
minus
r1
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 1 3 5 15
2l + 1 3 5 15
2l + 1 3 5 15
2l + 1 3 5 15
2l + 1 3 5 15
2l + 0 2 4 14
2l + 0 2 4 14
2l + 0 2 4 14
2l + 0 2 4 14
2l + 0 2 4 14
Figure 14 Illustration of proposed architecture for RDB and PB in MISO and MIMO Note receiver decoding block (RDB) in MISO istermed as RDBM in MIMO
The PCFICH receiver architecture of 2 times 2MIMO configura-tions is shown in Figure 10
Receiver decoding block (RDBM) gets the 16times1 receivedsignal vector y and computes the decoder output vector using(14) assuming that the channel frequency response vectorsh(00) h(01) h(10) and h(11) are knownThe 16times1 precomputeddata vectors for CFI = 1 2 3 and 4 are represented as s(0)
1
s(0)2 s(0)3 and s(0)
4 respectively for antenna 0 and as s(1)
1
s(1)2 s(1)3 and s(1)
4 respectively for antenna 1 The received
signal vectors y(0)119894
and y(1)119894multiply with the four channel
estimation vectors to give decoded output vector z that is sentto the processing block (PB) which is shown in Figure 9 Thedecoder outputs 119911
119894 119894 = 0 2 4 14 are stacked as 16 times 1
vector z = [1199111198790 119911119879
2 119911
119879
14]119879 Similarly RDBM1 gives output
vector z(1) using the precomputed data vectors y(0)1
and y(1)1
and channel estimation vectors The architecture of PBs and
VLSI Design 11
clk
T1 T2 T3 T4
RPB
RPB
RPB
RPB
d
d
dd
d
d
r4
r3
r2
r1
Figure 15 Illustration of superscalar method for SISO and SIMO(with no complex multiplications and operating from 119879
1to 1198794)
the CFI detection architecture are similar to that of theMISOsystem The sum of the squared magnitude of the differencebetween each element in the decoded output vector z and itsprecomputed data in the vector z(1) is the output 119903
1of PB1
Similarly 1199032 1199033 and 119903
4are computed for other CFIThe 119903
1 1199032
1199033 and 119903
4are compared to determine the minimum value by
the CFI detector shown in Figure 4
5 PCFICH Receiver Implementation Methods
The PCFICH receiver architectures can be implementeddirectly based on the basic architectures developed inSections 3 and 4 But in order to effectively utilize theresources in FPGA the implementation of basic architecturesis done using the modified novel architectures based on VLSIDSP techniques namely folding and superscalar processingapproach
51 Direct Implementation with Multiplicands RearrangedMethod In the receiver architecture for SISO and SIMOthe 16 times 1 received signal vector is directly subtracted fromthe precomputed data vector for a given CFI This requireslesser number of multipliers and adders when comparedto MISO and MIMO In MISO and MIMO configurationscomplex multiplications are necessary for the multiplicationof H119867119890119891119891
with the received signal vector It increases thenumber of multiplications in the CFI detection processHence optimum rearrangement of the terms is carried outto minimize the number of multiplications Further theintermediate products are reused in the calculation of real andimaginary parts Consider the multiplication of two complexnumbers Reℎ+119895 Imℎ and Re119910+119895 Im119910The output realpart (119890) and imaginary part (119891) terms are given by
119890 = Re ℎRe 119910 minus Im ℎ Im 119910 119891 = Re ℎ Im 119910 + Im ℎRe 119910
(18)
It requires four multiplications and two additions To reducethe number of multiplications the terms in (18) are rear-ranged as
119890 = [Re 119910 minus Im 119910] [Re ℎ minus Im ℎ]
minus Re 119910 Im ℎ + Im 119910Re ℎ
119891 = Re 119910 Im ℎ + Im 119910Re ℎ
(19)
Since the terms Re119910 Imℎ and Im119910Reℎ are in (19) itrequires only three multiplications but five additions Thiskind of rearrangement of themultiplicands is employed in theprocessing blocks at the cost of increased additions as shownin Figure 12
52 Proposed Architecture Using Folding Method Foldingarchitecture systematically determines the control circuits inDSP architectures where multiple algorithmic operations aretime-multiplexed to a single functional unit [12] It is usedfor synthesis of DSP architectures that can be operated atsingle or multiple clocks It reduces the number of hardwarefunctional units (FUs) by a factor of 119873 at the expense ofincreased computation time
The folding architecture is introduced in the receiverstructure of RPB in SISO and SIMO configurations and ofRPB and PB in MISO and MIMO configurations as shownin Figures 13 and 14 respectively For SISO RPB there are16 hardware lines to calculate the value of 119903
1each requiring
twomultipliers Hence the number of multipliers used in oneRPB is 32 In order to reduce the number of multipliers andadders folding architecture is proposed This architectureuses only two multipliers and performs the operation of asingle hardware line 16 times in sequential wayThe differencebetween the product of channel frequency response vectorwith the precomputed data vector and the received signalvector is stored in registers At a time one resultant signalpair involves in computation using two multipliers to get thevalue of 119911
119894 Four switches operating in system clock speed
are involved in the architecture where two switches are usedto pass the real part of the signal to one multiplier whilethe other two switches are used to pass the imaginary partof the signal to another multiplier The multipliers pass theproducts to the first adder for 119911
119894 The output of the first adder
is passed to the second adder with a delay to accumulate thevalues 119911
0to 11991115into a register in subsequent clock cyclesThis
process requires 16 clock cycles and the CFI is detected at the17th clock cycle Though it takes longer time for the clockcycles to get the output the resources are minimized in thismethod
The folded architecture of decoding block of MISO andMIMO involving complex multiplication of the channelfrequency response vector and the receive signal vector isshown in Figure 14 There are 2 complex multiplications andone addition in each of the 16 hardware lines Hence totalresource elements used are 32 complex multiplications and16 additions The folded architecture which reduces to just2 complex multiplications and one addition requires fiveswitches Two switches are used to pass the first elementof the receive signal vector and its corresponding channel
12 VLSI Design
clk
T1 T2 T3 T4 T5 T6
PB
PB
PB
PB
d
d
dd
d
d
dr4
r3
r2
r1
From other hardware lines
From other hardware lines
Complexmultiplications(real part) of a
hardware line from
Complex
multiplications
(imaginary part) of
a hardware line from RDB
RDB
Figure 16 Illustration of superscalar method for MISO and MIMO (with complex multiplications and operating from 1198791to 1198796)
Figure 17 Simulation waveform for PCFICH receiver
frequency response vector to one multiplier and other twoswitches are used to pass the second element of receive signalvector and its channel frequency response vector to anothermultiplierThese four switches operate in system clock speedThe multipliers pass their products to the adder through thefifth switch before moving to PB This process requires 16clock cycles and the CFI is detected at the 17th clock cycle
53 Proposed Architecture Using Superscalar Method Super-scalar approach is another low resource utilizing VLSI DSPtechnique The superscalar processing method includes par-allel processing and pipelining strategies In this case paralleloperation for the 16 pairs of hardware lines is arrangedwith pipelining of the subtraction and square magnitudeoperations for each CFI SISO configuration does not havecomplex multiplications and it has only square magnitudeoperations Hence the RPB of SISOhas 16 hardware lines eachhaving 2 multipliers which results to a total of 32 multipliers
This setup requires more hardware resources than foldingbut the output is obtained at every 4th clock cycle as shownin Figure 15 SIMO configuration which involves two receiveantenna signal processing requires twice the number ofmultiplications as that of SISO and the output is obtainedat every 4th clock cycle The block ldquo119889rdquo represents the delayelement introduced to buffer the values and produce theoutputs at the same time instant
For MISO configuration the RDB has 16 hardware lineswith 2 complex multiplications each Since each complexmultiplication requires four real multiplications RDB can beexecuted in two clock cycles by reusing 64 multipliers 32multipliers are required for PB taking 4 clock cyclesHence 96multipliers are required in MISO configuration For MIMOconfiguration the RDB requires reuse of 128 multiplierstaking 2 clock cycles and an additional 32 multipliers arerequired for the PB taking 4 clock cycles Hence 160 multi-pliers are required for MISO configuration and the output isobtained at every 6th clock cycle as shown in the Figure 16The block ldquo119889rdquo represents the delay element introduced tobuffer the values and produce the outputs at the same timeinstant
6 Results and Discussion
The proposed receiver architectures for PCFICH in SISOSIMO MISO and MIMO configurations are implementedusing the Xilinx PlanAhead tool on the Virtex-6 FPGAxc6vlx240tff1156-1 device board The target device Virtex-6has only 768 DSP elements Table 2 shows the performanceof the proposed architectures using folding and superscalarmethods being compared with the direct implementationof PCFICH receiver in terms of resource utilisation speedand power for all the SISO SIMO MISO and MIMO
VLSI Design 13
clk clk
clk
clk
clk
clkDiversity[10]
Diversity[1] GND 1 o wide mux 1Address[10] Output[30]
RTL rom 2 4
3
2
1
0
e
e
e
e
Div0
Div1
Div2
Div3
SISO
SIMO
MISO
MIMO
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310] Op
Choose
Count CFI11[490]Count CFI12[490]Count CFI13[490]Count CFI14[490]Count CFI21[490]Count CFI22[490]Count CFI23[490]Count CFI24[490]Count CFI31[490]Count CFI32[490]Count CFI33[490]Count CFI34[490]Count CFI41[490]Count CFI42[490]Count CFI43[490]Count CFI44[490]
e[30]Out1[310]Out2[310]Out3[310]Out4[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Figure 18 RTL schematic for combined PCFICH architecture with diversity
Table 2 Performance of proposed architectures based on folding and superscalar method
Diversity Method Multipliers Adders DSP elements LUTs Total delay (ns) Speed (MHz) Dynamic power
SISODirect 125 245 125 5479 39081 25587 mdash
Folding (16T) 8 81 16 2561 114448 8737 84Superscalar (4T) 32 182 66 2731 69333 14423 93
SIMODirect 250 494 250 10942 40278 24827 mdash
Folding (16T) 16 165 32 5117 130704 7651 159Superscalar (4T) 64 318 130 5623 53873 18562 170
MISODirect 224 580 594 14880 43023 23243 mdash
Folding (16T) 14 101 39 3950 255264 3917 173Superscalar (6T) 96 338 196 6156 80495 12423 208
MIMODirect 320 844 675 17380 56962 17555 mdash
Folding (16T) 20 155 46 4395 256528 3898 374Superscalar (6T) 160 465 262 6932 85822 11652 382
Figure 19 Resource utilization graph for generalized architecture
configurations The proposed architectures based on foldingand superscalar processing methods require less number ofresource elements
In the folding approach resource utilization is less com-pared to the direct and superscalar approach at the cost ofreduced speed of operation but it is suitable for real-time
frame timings When the LTE-A system operates at 14MHzbandwidth maximum time available for detection at eachsubcarrier is 992063 ns since each slot of 05ms duration in aframe (10ms radio frame duration) consists of 7 OFDM sym-bols and there are 72 subcarriers along one OFDM symbolThe total delay in the receiver architecture is within the LTEtime constraint The dynamic power consumption is less inthe folding method compared to superscalar method due todecrease in block arithmetic Direct method does not requiresequential execution and clocking and hence total powerconsumption is due to static power Hence it is inferredthat the proposed architecture based on folding method ismore suitable for CFI detection The simulation waveformof the proposed architecture based on folding method isshown in Figure 17 for SISO SIMO MISO and MIMOconfigurations
A general architecture based on folding method whichoperates at all the four SISO SIMO MISO and MIMOconfigurations has also been developed In this architecture
14 VLSI Design
Table 3 Resource requirements of proposed architecture using folding method
Parameter Multipliers Adders Minimum clock period (ns) Total delay (ns) Speed (MHz) Total dynamic power (mW)Value 58 518 16033 256528 3898 1019
Figure 20 Implemented device in FPGA editor
a control variable ldquo119890rdquo is used to enable or disable thesubmodules SISO SIMO MISO or MIMO according tothe selection input ldquodiversityrdquo CFI is detected at every 17thclock cycle The synthesis results of a general architecturebased on folding show that it utilizes minimum resourcesin XC6VLX240TFF1156-1 Virtex 6 device (768 DSPs) Thisis summarized in Table 3 Dynamic power consumption isdue to internal switching contributed by the clock (246mW)logic (670mW) and the block arithmetic (103mW)
Figure 18 shows the RTL schematic of 4 diversity blocksldquodiv0rdquo ldquodiv1rdquo ldquodiv2rdquo and ldquodiv3rdquo corresponding to SISO SIMOMISO and MIMO controlled by wires named ldquo119890rdquo Powerconsumed includes both static power and dynamic power dueto internal switching
Figure 19 shows the resource utilization graph whichshows the percentage of registers lookup tables (LUTs) slicesDSP elements and buffers used
Figure 20 shows the implemented device in FPGA editorwith the implemented components and interconnectionsbetween the components configured into the FPGA device
7 Conclusion
In this paper low complexity low resource single or multi-antenna CFI detection at the receiver system has been pro-posed and analyzed using modelsim and implementation inthe Virtex-6 device in Xilinx PlanAhead tool In the receivercomputational complexity and the resource utilization areminimized by employing arithmetic operational rearrange-ment and suboptimal sequential DSP algorithm called thefolding approach The proposed architecture using folding
method complies with the LTE frame timing constraint inSISO SIMOMISO andMIMO configurations It is a suitablesolution for the area optimized hardware implementation ofreceiver structures for PCFICH In future a total hardwareaccommodating all the physical downlink control channelsof the 3GPP-LTE-A with low resource utilization could besynthesized and implemented
Conflict of Interests
The authors do not have direct financial relation with anycommercial identity mentioned in the paper or any otherconflict of interests
Acknowledgments
The authors wish to express their sincere thanks to All IndiaCouncil for Technical Education NewDelhi for the Grant todo the Project titled Design of Testbed for the Developmentof Optimized Architectures of MIMO Signal Processing(no 8023RIDRPS0391112) They are also thankful tothe Managements of Mepco Schlenk Engineering CollegeSivakasi and Thiagarajar College of Engineering Maduraifor their constant support and encouragement to carry outthis research work successfully
References
[1] 3GPP TS 36 211 Version 11 0 0 Release 11 (2012-10) EvolvedUniversal Terrestrial Radio Access (E-UTRA) Physical Chan-nels and Modulation 2012
[2] J Ketonen and M Juntti ldquoSIC and K-best LSD receiverimplementation for a MIMO-OFDM systemrdquo in Proceedings ofthe 16th European Signal Processing Conference (EUSIPCO rsquo08)Lausanne Switzerland August 2008
[3] J Ketonen M Juntti and J R Cavallaro ldquoPerformancemdashcomplexity comparison of receivers for a LTE MIMOOFDMsystemrdquo IEEE Transactions on Signal Processing vol 58 no 6pp 3360ndash3372 2010
[4] X Huang C Liang and J Ma ldquoSystem architecture andimplementation of MIMO sphere decoders on FPGArdquo IEEETransactions on Very Large Scale Integration Systems vol 16 no2 pp 188ndash196 2008
[5] Y Lin Y Chen C Chu C Zhan and A Wu ldquoDual-modelow-complexity codebook searching algorithm and VLSI archi-tecture for LTELTE-advanced systemsrdquo IEEE Transactions onSignal Processing vol 61 no 14 pp 3545ndash3562 2013
[6] DWu J Eilert R Asghar and D Liu ldquoVLSI implementation ofa fixed-complexity soft-output MIMO detector for high-speedwirelessrdquo EURASIP Journal on Wireless Communications andNetworking vol 2010 Article ID 893184 13 pages 2010
[7] C Mehlfuhrer M Wrulich J C Ikuno D Bosanska and MRupp ldquoSimulating the long term evolution physical layerrdquo in
VLSI Design 15
Proceedings of the 17th European Signal Processing Conference(EUSIPCO rsquo09) pp 1471ndash1478 Glasgow UK August 2009
[8] S J Thiruvengadam and L M A Jalloul ldquoPerformance anal-ysisof the 3GPP-LTE physical control channelsrdquo EURASIPJournalon Wireless Communications and Networking vol 2010Article ID 914934 10 pages 2010
[9] S M Alamouti ldquoA simple transmit diversity technique forwireless communicationsrdquo IEEE Journal on Selected Areas inCommunications vol 16 no 8 pp 1451ndash1458 1998
[10] S S A Abbas and S JThiruvengadam ldquoFPGA implementationof 3GPP-LTE-A physical downlink control channel using diver-sity techniquesrdquo International Journal of Wireless and MobileComputing vol 9 no 2 p 84 2013
[11] S Ahmadi LTE-Advanced A Practical Systems Approach toUnderstanding 3GPP LTE Releases 10 and 11 Radio AccessTechnologies Sassan Ahmadi Academic Press 2013
[12] K K Parhi VLSI Digital Signal Processing SystemsmdashDesign andImplementation Wiley-Interscience 1999
International Journal of
AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal ofEngineeringVolume 2014
Submit your manuscripts athttpwwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
10 VLSI Design
Channel frequencyresponse vector
Channel frequencyresponse vector
Received signalvector
Received signalvector
Precomputeddata vector z(k)i
from RDB-1using s1i
y i
y i
y i+1
y i+1
hi+1 hi+1
hihi
zi
Folded RDB unit
Folded PB unit
Conjugate
DRei
Imi
Rei
Rei
Rei+1
Rei+1
Imi
Imi
Imi+1
Im+1
minus
r1
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 1 3 5 15
2l + 1 3 5 15
2l + 1 3 5 15
2l + 1 3 5 15
2l + 1 3 5 15
2l + 0 2 4 14
2l + 0 2 4 14
2l + 0 2 4 14
2l + 0 2 4 14
2l + 0 2 4 14
Figure 14 Illustration of proposed architecture for RDB and PB in MISO and MIMO Note receiver decoding block (RDB) in MISO istermed as RDBM in MIMO
The PCFICH receiver architecture of 2 times 2MIMO configura-tions is shown in Figure 10
Receiver decoding block (RDBM) gets the 16times1 receivedsignal vector y and computes the decoder output vector using(14) assuming that the channel frequency response vectorsh(00) h(01) h(10) and h(11) are knownThe 16times1 precomputeddata vectors for CFI = 1 2 3 and 4 are represented as s(0)
1
s(0)2 s(0)3 and s(0)
4 respectively for antenna 0 and as s(1)
1
s(1)2 s(1)3 and s(1)
4 respectively for antenna 1 The received
signal vectors y(0)119894
and y(1)119894multiply with the four channel
estimation vectors to give decoded output vector z that is sentto the processing block (PB) which is shown in Figure 9 Thedecoder outputs 119911
119894 119894 = 0 2 4 14 are stacked as 16 times 1
vector z = [1199111198790 119911119879
2 119911
119879
14]119879 Similarly RDBM1 gives output
vector z(1) using the precomputed data vectors y(0)1
and y(1)1
and channel estimation vectors The architecture of PBs and
VLSI Design 11
clk
T1 T2 T3 T4
RPB
RPB
RPB
RPB
d
d
dd
d
d
r4
r3
r2
r1
Figure 15 Illustration of superscalar method for SISO and SIMO(with no complex multiplications and operating from 119879
1to 1198794)
the CFI detection architecture are similar to that of theMISOsystem The sum of the squared magnitude of the differencebetween each element in the decoded output vector z and itsprecomputed data in the vector z(1) is the output 119903
1of PB1
Similarly 1199032 1199033 and 119903
4are computed for other CFIThe 119903
1 1199032
1199033 and 119903
4are compared to determine the minimum value by
the CFI detector shown in Figure 4
5 PCFICH Receiver Implementation Methods
The PCFICH receiver architectures can be implementeddirectly based on the basic architectures developed inSections 3 and 4 But in order to effectively utilize theresources in FPGA the implementation of basic architecturesis done using the modified novel architectures based on VLSIDSP techniques namely folding and superscalar processingapproach
51 Direct Implementation with Multiplicands RearrangedMethod In the receiver architecture for SISO and SIMOthe 16 times 1 received signal vector is directly subtracted fromthe precomputed data vector for a given CFI This requireslesser number of multipliers and adders when comparedto MISO and MIMO In MISO and MIMO configurationscomplex multiplications are necessary for the multiplicationof H119867119890119891119891
with the received signal vector It increases thenumber of multiplications in the CFI detection processHence optimum rearrangement of the terms is carried outto minimize the number of multiplications Further theintermediate products are reused in the calculation of real andimaginary parts Consider the multiplication of two complexnumbers Reℎ+119895 Imℎ and Re119910+119895 Im119910The output realpart (119890) and imaginary part (119891) terms are given by
119890 = Re ℎRe 119910 minus Im ℎ Im 119910 119891 = Re ℎ Im 119910 + Im ℎRe 119910
(18)
It requires four multiplications and two additions To reducethe number of multiplications the terms in (18) are rear-ranged as
119890 = [Re 119910 minus Im 119910] [Re ℎ minus Im ℎ]
minus Re 119910 Im ℎ + Im 119910Re ℎ
119891 = Re 119910 Im ℎ + Im 119910Re ℎ
(19)
Since the terms Re119910 Imℎ and Im119910Reℎ are in (19) itrequires only three multiplications but five additions Thiskind of rearrangement of themultiplicands is employed in theprocessing blocks at the cost of increased additions as shownin Figure 12
52 Proposed Architecture Using Folding Method Foldingarchitecture systematically determines the control circuits inDSP architectures where multiple algorithmic operations aretime-multiplexed to a single functional unit [12] It is usedfor synthesis of DSP architectures that can be operated atsingle or multiple clocks It reduces the number of hardwarefunctional units (FUs) by a factor of 119873 at the expense ofincreased computation time
The folding architecture is introduced in the receiverstructure of RPB in SISO and SIMO configurations and ofRPB and PB in MISO and MIMO configurations as shownin Figures 13 and 14 respectively For SISO RPB there are16 hardware lines to calculate the value of 119903
1each requiring
twomultipliers Hence the number of multipliers used in oneRPB is 32 In order to reduce the number of multipliers andadders folding architecture is proposed This architectureuses only two multipliers and performs the operation of asingle hardware line 16 times in sequential wayThe differencebetween the product of channel frequency response vectorwith the precomputed data vector and the received signalvector is stored in registers At a time one resultant signalpair involves in computation using two multipliers to get thevalue of 119911
119894 Four switches operating in system clock speed
are involved in the architecture where two switches are usedto pass the real part of the signal to one multiplier whilethe other two switches are used to pass the imaginary partof the signal to another multiplier The multipliers pass theproducts to the first adder for 119911
119894 The output of the first adder
is passed to the second adder with a delay to accumulate thevalues 119911
0to 11991115into a register in subsequent clock cyclesThis
process requires 16 clock cycles and the CFI is detected at the17th clock cycle Though it takes longer time for the clockcycles to get the output the resources are minimized in thismethod
The folded architecture of decoding block of MISO andMIMO involving complex multiplication of the channelfrequency response vector and the receive signal vector isshown in Figure 14 There are 2 complex multiplications andone addition in each of the 16 hardware lines Hence totalresource elements used are 32 complex multiplications and16 additions The folded architecture which reduces to just2 complex multiplications and one addition requires fiveswitches Two switches are used to pass the first elementof the receive signal vector and its corresponding channel
12 VLSI Design
clk
T1 T2 T3 T4 T5 T6
PB
PB
PB
PB
d
d
dd
d
d
dr4
r3
r2
r1
From other hardware lines
From other hardware lines
Complexmultiplications(real part) of a
hardware line from
Complex
multiplications
(imaginary part) of
a hardware line from RDB
RDB
Figure 16 Illustration of superscalar method for MISO and MIMO (with complex multiplications and operating from 1198791to 1198796)
Figure 17 Simulation waveform for PCFICH receiver
frequency response vector to one multiplier and other twoswitches are used to pass the second element of receive signalvector and its channel frequency response vector to anothermultiplierThese four switches operate in system clock speedThe multipliers pass their products to the adder through thefifth switch before moving to PB This process requires 16clock cycles and the CFI is detected at the 17th clock cycle
53 Proposed Architecture Using Superscalar Method Super-scalar approach is another low resource utilizing VLSI DSPtechnique The superscalar processing method includes par-allel processing and pipelining strategies In this case paralleloperation for the 16 pairs of hardware lines is arrangedwith pipelining of the subtraction and square magnitudeoperations for each CFI SISO configuration does not havecomplex multiplications and it has only square magnitudeoperations Hence the RPB of SISOhas 16 hardware lines eachhaving 2 multipliers which results to a total of 32 multipliers
This setup requires more hardware resources than foldingbut the output is obtained at every 4th clock cycle as shownin Figure 15 SIMO configuration which involves two receiveantenna signal processing requires twice the number ofmultiplications as that of SISO and the output is obtainedat every 4th clock cycle The block ldquo119889rdquo represents the delayelement introduced to buffer the values and produce theoutputs at the same time instant
For MISO configuration the RDB has 16 hardware lineswith 2 complex multiplications each Since each complexmultiplication requires four real multiplications RDB can beexecuted in two clock cycles by reusing 64 multipliers 32multipliers are required for PB taking 4 clock cyclesHence 96multipliers are required in MISO configuration For MIMOconfiguration the RDB requires reuse of 128 multiplierstaking 2 clock cycles and an additional 32 multipliers arerequired for the PB taking 4 clock cycles Hence 160 multi-pliers are required for MISO configuration and the output isobtained at every 6th clock cycle as shown in the Figure 16The block ldquo119889rdquo represents the delay element introduced tobuffer the values and produce the outputs at the same timeinstant
6 Results and Discussion
The proposed receiver architectures for PCFICH in SISOSIMO MISO and MIMO configurations are implementedusing the Xilinx PlanAhead tool on the Virtex-6 FPGAxc6vlx240tff1156-1 device board The target device Virtex-6has only 768 DSP elements Table 2 shows the performanceof the proposed architectures using folding and superscalarmethods being compared with the direct implementationof PCFICH receiver in terms of resource utilisation speedand power for all the SISO SIMO MISO and MIMO
VLSI Design 13
clk clk
clk
clk
clk
clkDiversity[10]
Diversity[1] GND 1 o wide mux 1Address[10] Output[30]
RTL rom 2 4
3
2
1
0
e
e
e
e
Div0
Div1
Div2
Div3
SISO
SIMO
MISO
MIMO
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310] Op
Choose
Count CFI11[490]Count CFI12[490]Count CFI13[490]Count CFI14[490]Count CFI21[490]Count CFI22[490]Count CFI23[490]Count CFI24[490]Count CFI31[490]Count CFI32[490]Count CFI33[490]Count CFI34[490]Count CFI41[490]Count CFI42[490]Count CFI43[490]Count CFI44[490]
e[30]Out1[310]Out2[310]Out3[310]Out4[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Figure 18 RTL schematic for combined PCFICH architecture with diversity
Table 2 Performance of proposed architectures based on folding and superscalar method
Diversity Method Multipliers Adders DSP elements LUTs Total delay (ns) Speed (MHz) Dynamic power
SISODirect 125 245 125 5479 39081 25587 mdash
Folding (16T) 8 81 16 2561 114448 8737 84Superscalar (4T) 32 182 66 2731 69333 14423 93
SIMODirect 250 494 250 10942 40278 24827 mdash
Folding (16T) 16 165 32 5117 130704 7651 159Superscalar (4T) 64 318 130 5623 53873 18562 170
MISODirect 224 580 594 14880 43023 23243 mdash
Folding (16T) 14 101 39 3950 255264 3917 173Superscalar (6T) 96 338 196 6156 80495 12423 208
MIMODirect 320 844 675 17380 56962 17555 mdash
Folding (16T) 20 155 46 4395 256528 3898 374Superscalar (6T) 160 465 262 6932 85822 11652 382
Figure 19 Resource utilization graph for generalized architecture
configurations The proposed architectures based on foldingand superscalar processing methods require less number ofresource elements
In the folding approach resource utilization is less com-pared to the direct and superscalar approach at the cost ofreduced speed of operation but it is suitable for real-time
frame timings When the LTE-A system operates at 14MHzbandwidth maximum time available for detection at eachsubcarrier is 992063 ns since each slot of 05ms duration in aframe (10ms radio frame duration) consists of 7 OFDM sym-bols and there are 72 subcarriers along one OFDM symbolThe total delay in the receiver architecture is within the LTEtime constraint The dynamic power consumption is less inthe folding method compared to superscalar method due todecrease in block arithmetic Direct method does not requiresequential execution and clocking and hence total powerconsumption is due to static power Hence it is inferredthat the proposed architecture based on folding method ismore suitable for CFI detection The simulation waveformof the proposed architecture based on folding method isshown in Figure 17 for SISO SIMO MISO and MIMOconfigurations
A general architecture based on folding method whichoperates at all the four SISO SIMO MISO and MIMOconfigurations has also been developed In this architecture
14 VLSI Design
Table 3 Resource requirements of proposed architecture using folding method
Parameter Multipliers Adders Minimum clock period (ns) Total delay (ns) Speed (MHz) Total dynamic power (mW)Value 58 518 16033 256528 3898 1019
Figure 20 Implemented device in FPGA editor
a control variable ldquo119890rdquo is used to enable or disable thesubmodules SISO SIMO MISO or MIMO according tothe selection input ldquodiversityrdquo CFI is detected at every 17thclock cycle The synthesis results of a general architecturebased on folding show that it utilizes minimum resourcesin XC6VLX240TFF1156-1 Virtex 6 device (768 DSPs) Thisis summarized in Table 3 Dynamic power consumption isdue to internal switching contributed by the clock (246mW)logic (670mW) and the block arithmetic (103mW)
Figure 18 shows the RTL schematic of 4 diversity blocksldquodiv0rdquo ldquodiv1rdquo ldquodiv2rdquo and ldquodiv3rdquo corresponding to SISO SIMOMISO and MIMO controlled by wires named ldquo119890rdquo Powerconsumed includes both static power and dynamic power dueto internal switching
Figure 19 shows the resource utilization graph whichshows the percentage of registers lookup tables (LUTs) slicesDSP elements and buffers used
Figure 20 shows the implemented device in FPGA editorwith the implemented components and interconnectionsbetween the components configured into the FPGA device
7 Conclusion
In this paper low complexity low resource single or multi-antenna CFI detection at the receiver system has been pro-posed and analyzed using modelsim and implementation inthe Virtex-6 device in Xilinx PlanAhead tool In the receivercomputational complexity and the resource utilization areminimized by employing arithmetic operational rearrange-ment and suboptimal sequential DSP algorithm called thefolding approach The proposed architecture using folding
method complies with the LTE frame timing constraint inSISO SIMOMISO andMIMO configurations It is a suitablesolution for the area optimized hardware implementation ofreceiver structures for PCFICH In future a total hardwareaccommodating all the physical downlink control channelsof the 3GPP-LTE-A with low resource utilization could besynthesized and implemented
Conflict of Interests
The authors do not have direct financial relation with anycommercial identity mentioned in the paper or any otherconflict of interests
Acknowledgments
The authors wish to express their sincere thanks to All IndiaCouncil for Technical Education NewDelhi for the Grant todo the Project titled Design of Testbed for the Developmentof Optimized Architectures of MIMO Signal Processing(no 8023RIDRPS0391112) They are also thankful tothe Managements of Mepco Schlenk Engineering CollegeSivakasi and Thiagarajar College of Engineering Maduraifor their constant support and encouragement to carry outthis research work successfully
References
[1] 3GPP TS 36 211 Version 11 0 0 Release 11 (2012-10) EvolvedUniversal Terrestrial Radio Access (E-UTRA) Physical Chan-nels and Modulation 2012
[2] J Ketonen and M Juntti ldquoSIC and K-best LSD receiverimplementation for a MIMO-OFDM systemrdquo in Proceedings ofthe 16th European Signal Processing Conference (EUSIPCO rsquo08)Lausanne Switzerland August 2008
[3] J Ketonen M Juntti and J R Cavallaro ldquoPerformancemdashcomplexity comparison of receivers for a LTE MIMOOFDMsystemrdquo IEEE Transactions on Signal Processing vol 58 no 6pp 3360ndash3372 2010
[4] X Huang C Liang and J Ma ldquoSystem architecture andimplementation of MIMO sphere decoders on FPGArdquo IEEETransactions on Very Large Scale Integration Systems vol 16 no2 pp 188ndash196 2008
[5] Y Lin Y Chen C Chu C Zhan and A Wu ldquoDual-modelow-complexity codebook searching algorithm and VLSI archi-tecture for LTELTE-advanced systemsrdquo IEEE Transactions onSignal Processing vol 61 no 14 pp 3545ndash3562 2013
[6] DWu J Eilert R Asghar and D Liu ldquoVLSI implementation ofa fixed-complexity soft-output MIMO detector for high-speedwirelessrdquo EURASIP Journal on Wireless Communications andNetworking vol 2010 Article ID 893184 13 pages 2010
[7] C Mehlfuhrer M Wrulich J C Ikuno D Bosanska and MRupp ldquoSimulating the long term evolution physical layerrdquo in
VLSI Design 15
Proceedings of the 17th European Signal Processing Conference(EUSIPCO rsquo09) pp 1471ndash1478 Glasgow UK August 2009
[8] S J Thiruvengadam and L M A Jalloul ldquoPerformance anal-ysisof the 3GPP-LTE physical control channelsrdquo EURASIPJournalon Wireless Communications and Networking vol 2010Article ID 914934 10 pages 2010
[9] S M Alamouti ldquoA simple transmit diversity technique forwireless communicationsrdquo IEEE Journal on Selected Areas inCommunications vol 16 no 8 pp 1451ndash1458 1998
[10] S S A Abbas and S JThiruvengadam ldquoFPGA implementationof 3GPP-LTE-A physical downlink control channel using diver-sity techniquesrdquo International Journal of Wireless and MobileComputing vol 9 no 2 p 84 2013
[11] S Ahmadi LTE-Advanced A Practical Systems Approach toUnderstanding 3GPP LTE Releases 10 and 11 Radio AccessTechnologies Sassan Ahmadi Academic Press 2013
[12] K K Parhi VLSI Digital Signal Processing SystemsmdashDesign andImplementation Wiley-Interscience 1999
International Journal of
AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal ofEngineeringVolume 2014
Submit your manuscripts athttpwwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
VLSI Design 11
clk
T1 T2 T3 T4
RPB
RPB
RPB
RPB
d
d
dd
d
d
r4
r3
r2
r1
Figure 15 Illustration of superscalar method for SISO and SIMO(with no complex multiplications and operating from 119879
1to 1198794)
the CFI detection architecture are similar to that of theMISOsystem The sum of the squared magnitude of the differencebetween each element in the decoded output vector z and itsprecomputed data in the vector z(1) is the output 119903
1of PB1
Similarly 1199032 1199033 and 119903
4are computed for other CFIThe 119903
1 1199032
1199033 and 119903
4are compared to determine the minimum value by
the CFI detector shown in Figure 4
5 PCFICH Receiver Implementation Methods
The PCFICH receiver architectures can be implementeddirectly based on the basic architectures developed inSections 3 and 4 But in order to effectively utilize theresources in FPGA the implementation of basic architecturesis done using the modified novel architectures based on VLSIDSP techniques namely folding and superscalar processingapproach
51 Direct Implementation with Multiplicands RearrangedMethod In the receiver architecture for SISO and SIMOthe 16 times 1 received signal vector is directly subtracted fromthe precomputed data vector for a given CFI This requireslesser number of multipliers and adders when comparedto MISO and MIMO In MISO and MIMO configurationscomplex multiplications are necessary for the multiplicationof H119867119890119891119891
with the received signal vector It increases thenumber of multiplications in the CFI detection processHence optimum rearrangement of the terms is carried outto minimize the number of multiplications Further theintermediate products are reused in the calculation of real andimaginary parts Consider the multiplication of two complexnumbers Reℎ+119895 Imℎ and Re119910+119895 Im119910The output realpart (119890) and imaginary part (119891) terms are given by
119890 = Re ℎRe 119910 minus Im ℎ Im 119910 119891 = Re ℎ Im 119910 + Im ℎRe 119910
(18)
It requires four multiplications and two additions To reducethe number of multiplications the terms in (18) are rear-ranged as
119890 = [Re 119910 minus Im 119910] [Re ℎ minus Im ℎ]
minus Re 119910 Im ℎ + Im 119910Re ℎ
119891 = Re 119910 Im ℎ + Im 119910Re ℎ
(19)
Since the terms Re119910 Imℎ and Im119910Reℎ are in (19) itrequires only three multiplications but five additions Thiskind of rearrangement of themultiplicands is employed in theprocessing blocks at the cost of increased additions as shownin Figure 12
52 Proposed Architecture Using Folding Method Foldingarchitecture systematically determines the control circuits inDSP architectures where multiple algorithmic operations aretime-multiplexed to a single functional unit [12] It is usedfor synthesis of DSP architectures that can be operated atsingle or multiple clocks It reduces the number of hardwarefunctional units (FUs) by a factor of 119873 at the expense ofincreased computation time
The folding architecture is introduced in the receiverstructure of RPB in SISO and SIMO configurations and ofRPB and PB in MISO and MIMO configurations as shownin Figures 13 and 14 respectively For SISO RPB there are16 hardware lines to calculate the value of 119903
1each requiring
twomultipliers Hence the number of multipliers used in oneRPB is 32 In order to reduce the number of multipliers andadders folding architecture is proposed This architectureuses only two multipliers and performs the operation of asingle hardware line 16 times in sequential wayThe differencebetween the product of channel frequency response vectorwith the precomputed data vector and the received signalvector is stored in registers At a time one resultant signalpair involves in computation using two multipliers to get thevalue of 119911
119894 Four switches operating in system clock speed
are involved in the architecture where two switches are usedto pass the real part of the signal to one multiplier whilethe other two switches are used to pass the imaginary partof the signal to another multiplier The multipliers pass theproducts to the first adder for 119911
119894 The output of the first adder
is passed to the second adder with a delay to accumulate thevalues 119911
0to 11991115into a register in subsequent clock cyclesThis
process requires 16 clock cycles and the CFI is detected at the17th clock cycle Though it takes longer time for the clockcycles to get the output the resources are minimized in thismethod
The folded architecture of decoding block of MISO andMIMO involving complex multiplication of the channelfrequency response vector and the receive signal vector isshown in Figure 14 There are 2 complex multiplications andone addition in each of the 16 hardware lines Hence totalresource elements used are 32 complex multiplications and16 additions The folded architecture which reduces to just2 complex multiplications and one addition requires fiveswitches Two switches are used to pass the first elementof the receive signal vector and its corresponding channel
12 VLSI Design
clk
T1 T2 T3 T4 T5 T6
PB
PB
PB
PB
d
d
dd
d
d
dr4
r3
r2
r1
From other hardware lines
From other hardware lines
Complexmultiplications(real part) of a
hardware line from
Complex
multiplications
(imaginary part) of
a hardware line from RDB
RDB
Figure 16 Illustration of superscalar method for MISO and MIMO (with complex multiplications and operating from 1198791to 1198796)
Figure 17 Simulation waveform for PCFICH receiver
frequency response vector to one multiplier and other twoswitches are used to pass the second element of receive signalvector and its channel frequency response vector to anothermultiplierThese four switches operate in system clock speedThe multipliers pass their products to the adder through thefifth switch before moving to PB This process requires 16clock cycles and the CFI is detected at the 17th clock cycle
53 Proposed Architecture Using Superscalar Method Super-scalar approach is another low resource utilizing VLSI DSPtechnique The superscalar processing method includes par-allel processing and pipelining strategies In this case paralleloperation for the 16 pairs of hardware lines is arrangedwith pipelining of the subtraction and square magnitudeoperations for each CFI SISO configuration does not havecomplex multiplications and it has only square magnitudeoperations Hence the RPB of SISOhas 16 hardware lines eachhaving 2 multipliers which results to a total of 32 multipliers
This setup requires more hardware resources than foldingbut the output is obtained at every 4th clock cycle as shownin Figure 15 SIMO configuration which involves two receiveantenna signal processing requires twice the number ofmultiplications as that of SISO and the output is obtainedat every 4th clock cycle The block ldquo119889rdquo represents the delayelement introduced to buffer the values and produce theoutputs at the same time instant
For MISO configuration the RDB has 16 hardware lineswith 2 complex multiplications each Since each complexmultiplication requires four real multiplications RDB can beexecuted in two clock cycles by reusing 64 multipliers 32multipliers are required for PB taking 4 clock cyclesHence 96multipliers are required in MISO configuration For MIMOconfiguration the RDB requires reuse of 128 multiplierstaking 2 clock cycles and an additional 32 multipliers arerequired for the PB taking 4 clock cycles Hence 160 multi-pliers are required for MISO configuration and the output isobtained at every 6th clock cycle as shown in the Figure 16The block ldquo119889rdquo represents the delay element introduced tobuffer the values and produce the outputs at the same timeinstant
6 Results and Discussion
The proposed receiver architectures for PCFICH in SISOSIMO MISO and MIMO configurations are implementedusing the Xilinx PlanAhead tool on the Virtex-6 FPGAxc6vlx240tff1156-1 device board The target device Virtex-6has only 768 DSP elements Table 2 shows the performanceof the proposed architectures using folding and superscalarmethods being compared with the direct implementationof PCFICH receiver in terms of resource utilisation speedand power for all the SISO SIMO MISO and MIMO
VLSI Design 13
clk clk
clk
clk
clk
clkDiversity[10]
Diversity[1] GND 1 o wide mux 1Address[10] Output[30]
RTL rom 2 4
3
2
1
0
e
e
e
e
Div0
Div1
Div2
Div3
SISO
SIMO
MISO
MIMO
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310] Op
Choose
Count CFI11[490]Count CFI12[490]Count CFI13[490]Count CFI14[490]Count CFI21[490]Count CFI22[490]Count CFI23[490]Count CFI24[490]Count CFI31[490]Count CFI32[490]Count CFI33[490]Count CFI34[490]Count CFI41[490]Count CFI42[490]Count CFI43[490]Count CFI44[490]
e[30]Out1[310]Out2[310]Out3[310]Out4[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Figure 18 RTL schematic for combined PCFICH architecture with diversity
Table 2 Performance of proposed architectures based on folding and superscalar method
Diversity Method Multipliers Adders DSP elements LUTs Total delay (ns) Speed (MHz) Dynamic power
SISODirect 125 245 125 5479 39081 25587 mdash
Folding (16T) 8 81 16 2561 114448 8737 84Superscalar (4T) 32 182 66 2731 69333 14423 93
SIMODirect 250 494 250 10942 40278 24827 mdash
Folding (16T) 16 165 32 5117 130704 7651 159Superscalar (4T) 64 318 130 5623 53873 18562 170
MISODirect 224 580 594 14880 43023 23243 mdash
Folding (16T) 14 101 39 3950 255264 3917 173Superscalar (6T) 96 338 196 6156 80495 12423 208
MIMODirect 320 844 675 17380 56962 17555 mdash
Folding (16T) 20 155 46 4395 256528 3898 374Superscalar (6T) 160 465 262 6932 85822 11652 382
Figure 19 Resource utilization graph for generalized architecture
configurations The proposed architectures based on foldingand superscalar processing methods require less number ofresource elements
In the folding approach resource utilization is less com-pared to the direct and superscalar approach at the cost ofreduced speed of operation but it is suitable for real-time
frame timings When the LTE-A system operates at 14MHzbandwidth maximum time available for detection at eachsubcarrier is 992063 ns since each slot of 05ms duration in aframe (10ms radio frame duration) consists of 7 OFDM sym-bols and there are 72 subcarriers along one OFDM symbolThe total delay in the receiver architecture is within the LTEtime constraint The dynamic power consumption is less inthe folding method compared to superscalar method due todecrease in block arithmetic Direct method does not requiresequential execution and clocking and hence total powerconsumption is due to static power Hence it is inferredthat the proposed architecture based on folding method ismore suitable for CFI detection The simulation waveformof the proposed architecture based on folding method isshown in Figure 17 for SISO SIMO MISO and MIMOconfigurations
A general architecture based on folding method whichoperates at all the four SISO SIMO MISO and MIMOconfigurations has also been developed In this architecture
14 VLSI Design
Table 3 Resource requirements of proposed architecture using folding method
Parameter Multipliers Adders Minimum clock period (ns) Total delay (ns) Speed (MHz) Total dynamic power (mW)Value 58 518 16033 256528 3898 1019
Figure 20 Implemented device in FPGA editor
a control variable ldquo119890rdquo is used to enable or disable thesubmodules SISO SIMO MISO or MIMO according tothe selection input ldquodiversityrdquo CFI is detected at every 17thclock cycle The synthesis results of a general architecturebased on folding show that it utilizes minimum resourcesin XC6VLX240TFF1156-1 Virtex 6 device (768 DSPs) Thisis summarized in Table 3 Dynamic power consumption isdue to internal switching contributed by the clock (246mW)logic (670mW) and the block arithmetic (103mW)
Figure 18 shows the RTL schematic of 4 diversity blocksldquodiv0rdquo ldquodiv1rdquo ldquodiv2rdquo and ldquodiv3rdquo corresponding to SISO SIMOMISO and MIMO controlled by wires named ldquo119890rdquo Powerconsumed includes both static power and dynamic power dueto internal switching
Figure 19 shows the resource utilization graph whichshows the percentage of registers lookup tables (LUTs) slicesDSP elements and buffers used
Figure 20 shows the implemented device in FPGA editorwith the implemented components and interconnectionsbetween the components configured into the FPGA device
7 Conclusion
In this paper low complexity low resource single or multi-antenna CFI detection at the receiver system has been pro-posed and analyzed using modelsim and implementation inthe Virtex-6 device in Xilinx PlanAhead tool In the receivercomputational complexity and the resource utilization areminimized by employing arithmetic operational rearrange-ment and suboptimal sequential DSP algorithm called thefolding approach The proposed architecture using folding
method complies with the LTE frame timing constraint inSISO SIMOMISO andMIMO configurations It is a suitablesolution for the area optimized hardware implementation ofreceiver structures for PCFICH In future a total hardwareaccommodating all the physical downlink control channelsof the 3GPP-LTE-A with low resource utilization could besynthesized and implemented
Conflict of Interests
The authors do not have direct financial relation with anycommercial identity mentioned in the paper or any otherconflict of interests
Acknowledgments
The authors wish to express their sincere thanks to All IndiaCouncil for Technical Education NewDelhi for the Grant todo the Project titled Design of Testbed for the Developmentof Optimized Architectures of MIMO Signal Processing(no 8023RIDRPS0391112) They are also thankful tothe Managements of Mepco Schlenk Engineering CollegeSivakasi and Thiagarajar College of Engineering Maduraifor their constant support and encouragement to carry outthis research work successfully
References
[1] 3GPP TS 36 211 Version 11 0 0 Release 11 (2012-10) EvolvedUniversal Terrestrial Radio Access (E-UTRA) Physical Chan-nels and Modulation 2012
[2] J Ketonen and M Juntti ldquoSIC and K-best LSD receiverimplementation for a MIMO-OFDM systemrdquo in Proceedings ofthe 16th European Signal Processing Conference (EUSIPCO rsquo08)Lausanne Switzerland August 2008
[3] J Ketonen M Juntti and J R Cavallaro ldquoPerformancemdashcomplexity comparison of receivers for a LTE MIMOOFDMsystemrdquo IEEE Transactions on Signal Processing vol 58 no 6pp 3360ndash3372 2010
[4] X Huang C Liang and J Ma ldquoSystem architecture andimplementation of MIMO sphere decoders on FPGArdquo IEEETransactions on Very Large Scale Integration Systems vol 16 no2 pp 188ndash196 2008
[5] Y Lin Y Chen C Chu C Zhan and A Wu ldquoDual-modelow-complexity codebook searching algorithm and VLSI archi-tecture for LTELTE-advanced systemsrdquo IEEE Transactions onSignal Processing vol 61 no 14 pp 3545ndash3562 2013
[6] DWu J Eilert R Asghar and D Liu ldquoVLSI implementation ofa fixed-complexity soft-output MIMO detector for high-speedwirelessrdquo EURASIP Journal on Wireless Communications andNetworking vol 2010 Article ID 893184 13 pages 2010
[7] C Mehlfuhrer M Wrulich J C Ikuno D Bosanska and MRupp ldquoSimulating the long term evolution physical layerrdquo in
VLSI Design 15
Proceedings of the 17th European Signal Processing Conference(EUSIPCO rsquo09) pp 1471ndash1478 Glasgow UK August 2009
[8] S J Thiruvengadam and L M A Jalloul ldquoPerformance anal-ysisof the 3GPP-LTE physical control channelsrdquo EURASIPJournalon Wireless Communications and Networking vol 2010Article ID 914934 10 pages 2010
[9] S M Alamouti ldquoA simple transmit diversity technique forwireless communicationsrdquo IEEE Journal on Selected Areas inCommunications vol 16 no 8 pp 1451ndash1458 1998
[10] S S A Abbas and S JThiruvengadam ldquoFPGA implementationof 3GPP-LTE-A physical downlink control channel using diver-sity techniquesrdquo International Journal of Wireless and MobileComputing vol 9 no 2 p 84 2013
[11] S Ahmadi LTE-Advanced A Practical Systems Approach toUnderstanding 3GPP LTE Releases 10 and 11 Radio AccessTechnologies Sassan Ahmadi Academic Press 2013
[12] K K Parhi VLSI Digital Signal Processing SystemsmdashDesign andImplementation Wiley-Interscience 1999
International Journal of
AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal ofEngineeringVolume 2014
Submit your manuscripts athttpwwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
12 VLSI Design
clk
T1 T2 T3 T4 T5 T6
PB
PB
PB
PB
d
d
dd
d
d
dr4
r3
r2
r1
From other hardware lines
From other hardware lines
Complexmultiplications(real part) of a
hardware line from
Complex
multiplications
(imaginary part) of
a hardware line from RDB
RDB
Figure 16 Illustration of superscalar method for MISO and MIMO (with complex multiplications and operating from 1198791to 1198796)
Figure 17 Simulation waveform for PCFICH receiver
frequency response vector to one multiplier and other twoswitches are used to pass the second element of receive signalvector and its channel frequency response vector to anothermultiplierThese four switches operate in system clock speedThe multipliers pass their products to the adder through thefifth switch before moving to PB This process requires 16clock cycles and the CFI is detected at the 17th clock cycle
53 Proposed Architecture Using Superscalar Method Super-scalar approach is another low resource utilizing VLSI DSPtechnique The superscalar processing method includes par-allel processing and pipelining strategies In this case paralleloperation for the 16 pairs of hardware lines is arrangedwith pipelining of the subtraction and square magnitudeoperations for each CFI SISO configuration does not havecomplex multiplications and it has only square magnitudeoperations Hence the RPB of SISOhas 16 hardware lines eachhaving 2 multipliers which results to a total of 32 multipliers
This setup requires more hardware resources than foldingbut the output is obtained at every 4th clock cycle as shownin Figure 15 SIMO configuration which involves two receiveantenna signal processing requires twice the number ofmultiplications as that of SISO and the output is obtainedat every 4th clock cycle The block ldquo119889rdquo represents the delayelement introduced to buffer the values and produce theoutputs at the same time instant
For MISO configuration the RDB has 16 hardware lineswith 2 complex multiplications each Since each complexmultiplication requires four real multiplications RDB can beexecuted in two clock cycles by reusing 64 multipliers 32multipliers are required for PB taking 4 clock cyclesHence 96multipliers are required in MISO configuration For MIMOconfiguration the RDB requires reuse of 128 multiplierstaking 2 clock cycles and an additional 32 multipliers arerequired for the PB taking 4 clock cycles Hence 160 multi-pliers are required for MISO configuration and the output isobtained at every 6th clock cycle as shown in the Figure 16The block ldquo119889rdquo represents the delay element introduced tobuffer the values and produce the outputs at the same timeinstant
6 Results and Discussion
The proposed receiver architectures for PCFICH in SISOSIMO MISO and MIMO configurations are implementedusing the Xilinx PlanAhead tool on the Virtex-6 FPGAxc6vlx240tff1156-1 device board The target device Virtex-6has only 768 DSP elements Table 2 shows the performanceof the proposed architectures using folding and superscalarmethods being compared with the direct implementationof PCFICH receiver in terms of resource utilisation speedand power for all the SISO SIMO MISO and MIMO
VLSI Design 13
clk clk
clk
clk
clk
clkDiversity[10]
Diversity[1] GND 1 o wide mux 1Address[10] Output[30]
RTL rom 2 4
3
2
1
0
e
e
e
e
Div0
Div1
Div2
Div3
SISO
SIMO
MISO
MIMO
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310] Op
Choose
Count CFI11[490]Count CFI12[490]Count CFI13[490]Count CFI14[490]Count CFI21[490]Count CFI22[490]Count CFI23[490]Count CFI24[490]Count CFI31[490]Count CFI32[490]Count CFI33[490]Count CFI34[490]Count CFI41[490]Count CFI42[490]Count CFI43[490]Count CFI44[490]
e[30]Out1[310]Out2[310]Out3[310]Out4[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Figure 18 RTL schematic for combined PCFICH architecture with diversity
Table 2 Performance of proposed architectures based on folding and superscalar method
Diversity Method Multipliers Adders DSP elements LUTs Total delay (ns) Speed (MHz) Dynamic power
SISODirect 125 245 125 5479 39081 25587 mdash
Folding (16T) 8 81 16 2561 114448 8737 84Superscalar (4T) 32 182 66 2731 69333 14423 93
SIMODirect 250 494 250 10942 40278 24827 mdash
Folding (16T) 16 165 32 5117 130704 7651 159Superscalar (4T) 64 318 130 5623 53873 18562 170
MISODirect 224 580 594 14880 43023 23243 mdash
Folding (16T) 14 101 39 3950 255264 3917 173Superscalar (6T) 96 338 196 6156 80495 12423 208
MIMODirect 320 844 675 17380 56962 17555 mdash
Folding (16T) 20 155 46 4395 256528 3898 374Superscalar (6T) 160 465 262 6932 85822 11652 382
Figure 19 Resource utilization graph for generalized architecture
configurations The proposed architectures based on foldingand superscalar processing methods require less number ofresource elements
In the folding approach resource utilization is less com-pared to the direct and superscalar approach at the cost ofreduced speed of operation but it is suitable for real-time
frame timings When the LTE-A system operates at 14MHzbandwidth maximum time available for detection at eachsubcarrier is 992063 ns since each slot of 05ms duration in aframe (10ms radio frame duration) consists of 7 OFDM sym-bols and there are 72 subcarriers along one OFDM symbolThe total delay in the receiver architecture is within the LTEtime constraint The dynamic power consumption is less inthe folding method compared to superscalar method due todecrease in block arithmetic Direct method does not requiresequential execution and clocking and hence total powerconsumption is due to static power Hence it is inferredthat the proposed architecture based on folding method ismore suitable for CFI detection The simulation waveformof the proposed architecture based on folding method isshown in Figure 17 for SISO SIMO MISO and MIMOconfigurations
A general architecture based on folding method whichoperates at all the four SISO SIMO MISO and MIMOconfigurations has also been developed In this architecture
14 VLSI Design
Table 3 Resource requirements of proposed architecture using folding method
Parameter Multipliers Adders Minimum clock period (ns) Total delay (ns) Speed (MHz) Total dynamic power (mW)Value 58 518 16033 256528 3898 1019
Figure 20 Implemented device in FPGA editor
a control variable ldquo119890rdquo is used to enable or disable thesubmodules SISO SIMO MISO or MIMO according tothe selection input ldquodiversityrdquo CFI is detected at every 17thclock cycle The synthesis results of a general architecturebased on folding show that it utilizes minimum resourcesin XC6VLX240TFF1156-1 Virtex 6 device (768 DSPs) Thisis summarized in Table 3 Dynamic power consumption isdue to internal switching contributed by the clock (246mW)logic (670mW) and the block arithmetic (103mW)
Figure 18 shows the RTL schematic of 4 diversity blocksldquodiv0rdquo ldquodiv1rdquo ldquodiv2rdquo and ldquodiv3rdquo corresponding to SISO SIMOMISO and MIMO controlled by wires named ldquo119890rdquo Powerconsumed includes both static power and dynamic power dueto internal switching
Figure 19 shows the resource utilization graph whichshows the percentage of registers lookup tables (LUTs) slicesDSP elements and buffers used
Figure 20 shows the implemented device in FPGA editorwith the implemented components and interconnectionsbetween the components configured into the FPGA device
7 Conclusion
In this paper low complexity low resource single or multi-antenna CFI detection at the receiver system has been pro-posed and analyzed using modelsim and implementation inthe Virtex-6 device in Xilinx PlanAhead tool In the receivercomputational complexity and the resource utilization areminimized by employing arithmetic operational rearrange-ment and suboptimal sequential DSP algorithm called thefolding approach The proposed architecture using folding
method complies with the LTE frame timing constraint inSISO SIMOMISO andMIMO configurations It is a suitablesolution for the area optimized hardware implementation ofreceiver structures for PCFICH In future a total hardwareaccommodating all the physical downlink control channelsof the 3GPP-LTE-A with low resource utilization could besynthesized and implemented
Conflict of Interests
The authors do not have direct financial relation with anycommercial identity mentioned in the paper or any otherconflict of interests
Acknowledgments
The authors wish to express their sincere thanks to All IndiaCouncil for Technical Education NewDelhi for the Grant todo the Project titled Design of Testbed for the Developmentof Optimized Architectures of MIMO Signal Processing(no 8023RIDRPS0391112) They are also thankful tothe Managements of Mepco Schlenk Engineering CollegeSivakasi and Thiagarajar College of Engineering Maduraifor their constant support and encouragement to carry outthis research work successfully
References
[1] 3GPP TS 36 211 Version 11 0 0 Release 11 (2012-10) EvolvedUniversal Terrestrial Radio Access (E-UTRA) Physical Chan-nels and Modulation 2012
[2] J Ketonen and M Juntti ldquoSIC and K-best LSD receiverimplementation for a MIMO-OFDM systemrdquo in Proceedings ofthe 16th European Signal Processing Conference (EUSIPCO rsquo08)Lausanne Switzerland August 2008
[3] J Ketonen M Juntti and J R Cavallaro ldquoPerformancemdashcomplexity comparison of receivers for a LTE MIMOOFDMsystemrdquo IEEE Transactions on Signal Processing vol 58 no 6pp 3360ndash3372 2010
[4] X Huang C Liang and J Ma ldquoSystem architecture andimplementation of MIMO sphere decoders on FPGArdquo IEEETransactions on Very Large Scale Integration Systems vol 16 no2 pp 188ndash196 2008
[5] Y Lin Y Chen C Chu C Zhan and A Wu ldquoDual-modelow-complexity codebook searching algorithm and VLSI archi-tecture for LTELTE-advanced systemsrdquo IEEE Transactions onSignal Processing vol 61 no 14 pp 3545ndash3562 2013
[6] DWu J Eilert R Asghar and D Liu ldquoVLSI implementation ofa fixed-complexity soft-output MIMO detector for high-speedwirelessrdquo EURASIP Journal on Wireless Communications andNetworking vol 2010 Article ID 893184 13 pages 2010
[7] C Mehlfuhrer M Wrulich J C Ikuno D Bosanska and MRupp ldquoSimulating the long term evolution physical layerrdquo in
VLSI Design 15
Proceedings of the 17th European Signal Processing Conference(EUSIPCO rsquo09) pp 1471ndash1478 Glasgow UK August 2009
[8] S J Thiruvengadam and L M A Jalloul ldquoPerformance anal-ysisof the 3GPP-LTE physical control channelsrdquo EURASIPJournalon Wireless Communications and Networking vol 2010Article ID 914934 10 pages 2010
[9] S M Alamouti ldquoA simple transmit diversity technique forwireless communicationsrdquo IEEE Journal on Selected Areas inCommunications vol 16 no 8 pp 1451ndash1458 1998
[10] S S A Abbas and S JThiruvengadam ldquoFPGA implementationof 3GPP-LTE-A physical downlink control channel using diver-sity techniquesrdquo International Journal of Wireless and MobileComputing vol 9 no 2 p 84 2013
[11] S Ahmadi LTE-Advanced A Practical Systems Approach toUnderstanding 3GPP LTE Releases 10 and 11 Radio AccessTechnologies Sassan Ahmadi Academic Press 2013
[12] K K Parhi VLSI Digital Signal Processing SystemsmdashDesign andImplementation Wiley-Interscience 1999
International Journal of
AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal ofEngineeringVolume 2014
Submit your manuscripts athttpwwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
VLSI Design 13
clk clk
clk
clk
clk
clkDiversity[10]
Diversity[1] GND 1 o wide mux 1Address[10] Output[30]
RTL rom 2 4
3
2
1
0
e
e
e
e
Div0
Div1
Div2
Div3
SISO
SIMO
MISO
MIMO
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310] Op
Choose
Count CFI11[490]Count CFI12[490]Count CFI13[490]Count CFI14[490]Count CFI21[490]Count CFI22[490]Count CFI23[490]Count CFI24[490]Count CFI31[490]Count CFI32[490]Count CFI33[490]Count CFI34[490]Count CFI41[490]Count CFI42[490]Count CFI43[490]Count CFI44[490]
e[30]Out1[310]Out2[310]Out3[310]Out4[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310]
Figure 18 RTL schematic for combined PCFICH architecture with diversity
Table 2 Performance of proposed architectures based on folding and superscalar method
Diversity Method Multipliers Adders DSP elements LUTs Total delay (ns) Speed (MHz) Dynamic power
SISODirect 125 245 125 5479 39081 25587 mdash
Folding (16T) 8 81 16 2561 114448 8737 84Superscalar (4T) 32 182 66 2731 69333 14423 93
SIMODirect 250 494 250 10942 40278 24827 mdash
Folding (16T) 16 165 32 5117 130704 7651 159Superscalar (4T) 64 318 130 5623 53873 18562 170
MISODirect 224 580 594 14880 43023 23243 mdash
Folding (16T) 14 101 39 3950 255264 3917 173Superscalar (6T) 96 338 196 6156 80495 12423 208
MIMODirect 320 844 675 17380 56962 17555 mdash
Folding (16T) 20 155 46 4395 256528 3898 374Superscalar (6T) 160 465 262 6932 85822 11652 382
Figure 19 Resource utilization graph for generalized architecture
configurations The proposed architectures based on foldingand superscalar processing methods require less number ofresource elements
In the folding approach resource utilization is less com-pared to the direct and superscalar approach at the cost ofreduced speed of operation but it is suitable for real-time
frame timings When the LTE-A system operates at 14MHzbandwidth maximum time available for detection at eachsubcarrier is 992063 ns since each slot of 05ms duration in aframe (10ms radio frame duration) consists of 7 OFDM sym-bols and there are 72 subcarriers along one OFDM symbolThe total delay in the receiver architecture is within the LTEtime constraint The dynamic power consumption is less inthe folding method compared to superscalar method due todecrease in block arithmetic Direct method does not requiresequential execution and clocking and hence total powerconsumption is due to static power Hence it is inferredthat the proposed architecture based on folding method ismore suitable for CFI detection The simulation waveformof the proposed architecture based on folding method isshown in Figure 17 for SISO SIMO MISO and MIMOconfigurations
A general architecture based on folding method whichoperates at all the four SISO SIMO MISO and MIMOconfigurations has also been developed In this architecture
14 VLSI Design
Table 3 Resource requirements of proposed architecture using folding method
Parameter Multipliers Adders Minimum clock period (ns) Total delay (ns) Speed (MHz) Total dynamic power (mW)Value 58 518 16033 256528 3898 1019
Figure 20 Implemented device in FPGA editor
a control variable ldquo119890rdquo is used to enable or disable thesubmodules SISO SIMO MISO or MIMO according tothe selection input ldquodiversityrdquo CFI is detected at every 17thclock cycle The synthesis results of a general architecturebased on folding show that it utilizes minimum resourcesin XC6VLX240TFF1156-1 Virtex 6 device (768 DSPs) Thisis summarized in Table 3 Dynamic power consumption isdue to internal switching contributed by the clock (246mW)logic (670mW) and the block arithmetic (103mW)
Figure 18 shows the RTL schematic of 4 diversity blocksldquodiv0rdquo ldquodiv1rdquo ldquodiv2rdquo and ldquodiv3rdquo corresponding to SISO SIMOMISO and MIMO controlled by wires named ldquo119890rdquo Powerconsumed includes both static power and dynamic power dueto internal switching
Figure 19 shows the resource utilization graph whichshows the percentage of registers lookup tables (LUTs) slicesDSP elements and buffers used
Figure 20 shows the implemented device in FPGA editorwith the implemented components and interconnectionsbetween the components configured into the FPGA device
7 Conclusion
In this paper low complexity low resource single or multi-antenna CFI detection at the receiver system has been pro-posed and analyzed using modelsim and implementation inthe Virtex-6 device in Xilinx PlanAhead tool In the receivercomputational complexity and the resource utilization areminimized by employing arithmetic operational rearrange-ment and suboptimal sequential DSP algorithm called thefolding approach The proposed architecture using folding
method complies with the LTE frame timing constraint inSISO SIMOMISO andMIMO configurations It is a suitablesolution for the area optimized hardware implementation ofreceiver structures for PCFICH In future a total hardwareaccommodating all the physical downlink control channelsof the 3GPP-LTE-A with low resource utilization could besynthesized and implemented
Conflict of Interests
The authors do not have direct financial relation with anycommercial identity mentioned in the paper or any otherconflict of interests
Acknowledgments
The authors wish to express their sincere thanks to All IndiaCouncil for Technical Education NewDelhi for the Grant todo the Project titled Design of Testbed for the Developmentof Optimized Architectures of MIMO Signal Processing(no 8023RIDRPS0391112) They are also thankful tothe Managements of Mepco Schlenk Engineering CollegeSivakasi and Thiagarajar College of Engineering Maduraifor their constant support and encouragement to carry outthis research work successfully
References
[1] 3GPP TS 36 211 Version 11 0 0 Release 11 (2012-10) EvolvedUniversal Terrestrial Radio Access (E-UTRA) Physical Chan-nels and Modulation 2012
[2] J Ketonen and M Juntti ldquoSIC and K-best LSD receiverimplementation for a MIMO-OFDM systemrdquo in Proceedings ofthe 16th European Signal Processing Conference (EUSIPCO rsquo08)Lausanne Switzerland August 2008
[3] J Ketonen M Juntti and J R Cavallaro ldquoPerformancemdashcomplexity comparison of receivers for a LTE MIMOOFDMsystemrdquo IEEE Transactions on Signal Processing vol 58 no 6pp 3360ndash3372 2010
[4] X Huang C Liang and J Ma ldquoSystem architecture andimplementation of MIMO sphere decoders on FPGArdquo IEEETransactions on Very Large Scale Integration Systems vol 16 no2 pp 188ndash196 2008
[5] Y Lin Y Chen C Chu C Zhan and A Wu ldquoDual-modelow-complexity codebook searching algorithm and VLSI archi-tecture for LTELTE-advanced systemsrdquo IEEE Transactions onSignal Processing vol 61 no 14 pp 3545ndash3562 2013
[6] DWu J Eilert R Asghar and D Liu ldquoVLSI implementation ofa fixed-complexity soft-output MIMO detector for high-speedwirelessrdquo EURASIP Journal on Wireless Communications andNetworking vol 2010 Article ID 893184 13 pages 2010
[7] C Mehlfuhrer M Wrulich J C Ikuno D Bosanska and MRupp ldquoSimulating the long term evolution physical layerrdquo in
VLSI Design 15
Proceedings of the 17th European Signal Processing Conference(EUSIPCO rsquo09) pp 1471ndash1478 Glasgow UK August 2009
[8] S J Thiruvengadam and L M A Jalloul ldquoPerformance anal-ysisof the 3GPP-LTE physical control channelsrdquo EURASIPJournalon Wireless Communications and Networking vol 2010Article ID 914934 10 pages 2010
[9] S M Alamouti ldquoA simple transmit diversity technique forwireless communicationsrdquo IEEE Journal on Selected Areas inCommunications vol 16 no 8 pp 1451ndash1458 1998
[10] S S A Abbas and S JThiruvengadam ldquoFPGA implementationof 3GPP-LTE-A physical downlink control channel using diver-sity techniquesrdquo International Journal of Wireless and MobileComputing vol 9 no 2 p 84 2013
[11] S Ahmadi LTE-Advanced A Practical Systems Approach toUnderstanding 3GPP LTE Releases 10 and 11 Radio AccessTechnologies Sassan Ahmadi Academic Press 2013
[12] K K Parhi VLSI Digital Signal Processing SystemsmdashDesign andImplementation Wiley-Interscience 1999
International Journal of
AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal ofEngineeringVolume 2014
Submit your manuscripts athttpwwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
14 VLSI Design
Table 3 Resource requirements of proposed architecture using folding method
Parameter Multipliers Adders Minimum clock period (ns) Total delay (ns) Speed (MHz) Total dynamic power (mW)Value 58 518 16033 256528 3898 1019
Figure 20 Implemented device in FPGA editor
a control variable ldquo119890rdquo is used to enable or disable thesubmodules SISO SIMO MISO or MIMO according tothe selection input ldquodiversityrdquo CFI is detected at every 17thclock cycle The synthesis results of a general architecturebased on folding show that it utilizes minimum resourcesin XC6VLX240TFF1156-1 Virtex 6 device (768 DSPs) Thisis summarized in Table 3 Dynamic power consumption isdue to internal switching contributed by the clock (246mW)logic (670mW) and the block arithmetic (103mW)
Figure 18 shows the RTL schematic of 4 diversity blocksldquodiv0rdquo ldquodiv1rdquo ldquodiv2rdquo and ldquodiv3rdquo corresponding to SISO SIMOMISO and MIMO controlled by wires named ldquo119890rdquo Powerconsumed includes both static power and dynamic power dueto internal switching
Figure 19 shows the resource utilization graph whichshows the percentage of registers lookup tables (LUTs) slicesDSP elements and buffers used
Figure 20 shows the implemented device in FPGA editorwith the implemented components and interconnectionsbetween the components configured into the FPGA device
7 Conclusion
In this paper low complexity low resource single or multi-antenna CFI detection at the receiver system has been pro-posed and analyzed using modelsim and implementation inthe Virtex-6 device in Xilinx PlanAhead tool In the receivercomputational complexity and the resource utilization areminimized by employing arithmetic operational rearrange-ment and suboptimal sequential DSP algorithm called thefolding approach The proposed architecture using folding
method complies with the LTE frame timing constraint inSISO SIMOMISO andMIMO configurations It is a suitablesolution for the area optimized hardware implementation ofreceiver structures for PCFICH In future a total hardwareaccommodating all the physical downlink control channelsof the 3GPP-LTE-A with low resource utilization could besynthesized and implemented
Conflict of Interests
The authors do not have direct financial relation with anycommercial identity mentioned in the paper or any otherconflict of interests
Acknowledgments
The authors wish to express their sincere thanks to All IndiaCouncil for Technical Education NewDelhi for the Grant todo the Project titled Design of Testbed for the Developmentof Optimized Architectures of MIMO Signal Processing(no 8023RIDRPS0391112) They are also thankful tothe Managements of Mepco Schlenk Engineering CollegeSivakasi and Thiagarajar College of Engineering Maduraifor their constant support and encouragement to carry outthis research work successfully
References
[1] 3GPP TS 36 211 Version 11 0 0 Release 11 (2012-10) EvolvedUniversal Terrestrial Radio Access (E-UTRA) Physical Chan-nels and Modulation 2012
[2] J Ketonen and M Juntti ldquoSIC and K-best LSD receiverimplementation for a MIMO-OFDM systemrdquo in Proceedings ofthe 16th European Signal Processing Conference (EUSIPCO rsquo08)Lausanne Switzerland August 2008
[3] J Ketonen M Juntti and J R Cavallaro ldquoPerformancemdashcomplexity comparison of receivers for a LTE MIMOOFDMsystemrdquo IEEE Transactions on Signal Processing vol 58 no 6pp 3360ndash3372 2010
[4] X Huang C Liang and J Ma ldquoSystem architecture andimplementation of MIMO sphere decoders on FPGArdquo IEEETransactions on Very Large Scale Integration Systems vol 16 no2 pp 188ndash196 2008
[5] Y Lin Y Chen C Chu C Zhan and A Wu ldquoDual-modelow-complexity codebook searching algorithm and VLSI archi-tecture for LTELTE-advanced systemsrdquo IEEE Transactions onSignal Processing vol 61 no 14 pp 3545ndash3562 2013
[6] DWu J Eilert R Asghar and D Liu ldquoVLSI implementation ofa fixed-complexity soft-output MIMO detector for high-speedwirelessrdquo EURASIP Journal on Wireless Communications andNetworking vol 2010 Article ID 893184 13 pages 2010
[7] C Mehlfuhrer M Wrulich J C Ikuno D Bosanska and MRupp ldquoSimulating the long term evolution physical layerrdquo in
VLSI Design 15
Proceedings of the 17th European Signal Processing Conference(EUSIPCO rsquo09) pp 1471ndash1478 Glasgow UK August 2009
[8] S J Thiruvengadam and L M A Jalloul ldquoPerformance anal-ysisof the 3GPP-LTE physical control channelsrdquo EURASIPJournalon Wireless Communications and Networking vol 2010Article ID 914934 10 pages 2010
[9] S M Alamouti ldquoA simple transmit diversity technique forwireless communicationsrdquo IEEE Journal on Selected Areas inCommunications vol 16 no 8 pp 1451ndash1458 1998
[10] S S A Abbas and S JThiruvengadam ldquoFPGA implementationof 3GPP-LTE-A physical downlink control channel using diver-sity techniquesrdquo International Journal of Wireless and MobileComputing vol 9 no 2 p 84 2013
[11] S Ahmadi LTE-Advanced A Practical Systems Approach toUnderstanding 3GPP LTE Releases 10 and 11 Radio AccessTechnologies Sassan Ahmadi Academic Press 2013
[12] K K Parhi VLSI Digital Signal Processing SystemsmdashDesign andImplementation Wiley-Interscience 1999
International Journal of
AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal ofEngineeringVolume 2014
Submit your manuscripts athttpwwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
VLSI Design 15
Proceedings of the 17th European Signal Processing Conference(EUSIPCO rsquo09) pp 1471ndash1478 Glasgow UK August 2009
[8] S J Thiruvengadam and L M A Jalloul ldquoPerformance anal-ysisof the 3GPP-LTE physical control channelsrdquo EURASIPJournalon Wireless Communications and Networking vol 2010Article ID 914934 10 pages 2010
[9] S M Alamouti ldquoA simple transmit diversity technique forwireless communicationsrdquo IEEE Journal on Selected Areas inCommunications vol 16 no 8 pp 1451ndash1458 1998
[10] S S A Abbas and S JThiruvengadam ldquoFPGA implementationof 3GPP-LTE-A physical downlink control channel using diver-sity techniquesrdquo International Journal of Wireless and MobileComputing vol 9 no 2 p 84 2013
[11] S Ahmadi LTE-Advanced A Practical Systems Approach toUnderstanding 3GPP LTE Releases 10 and 11 Radio AccessTechnologies Sassan Ahmadi Academic Press 2013
[12] K K Parhi VLSI Digital Signal Processing SystemsmdashDesign andImplementation Wiley-Interscience 1999
International Journal of
AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal ofEngineeringVolume 2014
Submit your manuscripts athttpwwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
International Journal of
AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal ofEngineeringVolume 2014
Submit your manuscripts athttpwwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of