
EURASIP Journal on Wireless Communications and Networking

Synchronization in Wireless Communications

Guest Editors: Heidi Steendam, Mounir Ghogho, Marco Luise, Erdal Panayirci, and Erchin Serpedin



Copyright © 2009 Hindawi Publishing Corporation. All rights reserved.

This is a special issue published in volume 2009 of “EURASIP Journal on Wireless Communications and Networking.” All articles are open access articles distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Editor-in-Chief

Luc Vandendorpe, Université catholique de Louvain, Belgium

Associate Editors

Thushara Abhayapala, Australia; Mohamed H. Ahmed, Canada; Farid Ahmed, USA; Carles Anton-Haro, Spain; Anthony C. Boucouvalas, Greece; Lin Cai, Canada; Yuh-Shyan Chen, Taiwan; Pascal Chevalier, France; Chia-Chin Chong, South Korea; Soura Dasgupta, USA; Ibrahim Develi, Turkey; Petar M. Djuric, USA; Mischa Dohler, Spain; Abraham O. Fapojuwo, Canada; Michael Gastpar, USA; Alex Gershman, Germany; Wolfgang Gerstacker, Germany; David Gesbert, France; Fary Ghassemlooy, UK

Christian Hartmann, Germany; Stefan Kaiser, Germany; George K. Karagiannidis, Greece; Chi Chung Ko, Singapore; Visa Koivunen, Finland; Nicholas Kolokotronis, Greece; Richard Kozick, USA; Sangarapillai Lambotharan, UK; Vincent Lau, Hong Kong; David I. Laurenson, UK; Tho Le-Ngoc, Canada; Wei Li, USA; Tongtong Li, USA; Zhiqiang Liu, USA; Steve McLaughlin, UK; Sudip Misra, India; Ingrid Moerman, Belgium; Marc Moonen, Belgium; Eric Moulines, France

Sayandev Mukherjee, USA; Kameswara Rao Namuduri, USA; Amiya Nayak, Canada; Claude Oestges, Belgium; A. Pandharipande, The Netherlands; Phillip Regalia, France; A. Lee Swindlehurst, USA; George S. Tombras, Greece; Lang Tong, USA; Athanasios Vasilakos, Greece; Ping Wang, Canada; Weidong Xiang, USA; Yang Xiao, USA; Xueshi Yang, USA; Lawrence Yeung, Hong Kong; Dongmei Zhao, Canada; Weihua Zhuang, Canada

Contents

Synchronization in Wireless Communications, Heidi Steendam, Mounir Ghogho, Marco Luise, Erdal Panayirci, and Erchin Serpedin
Volume 2009, Article ID 568369, 3 pages

Robust Frame Synchronization for Low Signal-to-Noise Ratio Channels Using Energy-Corrected Differential Correlation, Dong-Uk Lee, Pansoo Kim, and Wonjin Sung
Volume 2009, Article ID 345989, 8 pages

Feedforward Data-Aided Phase Noise Estimation from a DCT Basis Expansion, Jabran Bhatti and Marc Moeneclaey
Volume 2009, Article ID 568570, 11 pages

Monte Carlo Solutions for Blind Phase Noise Estimation, Frederik Simoens, Dieter Duyck, Hakan Çırpan, Erdal Panayırcı, and Marc Moeneclaey
Volume 2009, Article ID 296028, 11 pages

Digital Receiver Design for Transmitted Reference Ultra-Wideband Systems, Yiyin Wang, Geert Leus, and Alle-Jan van der Veen
Volume 2009, Article ID 315264, 17 pages

Autocorrelation Properties of OFDM Timing Synchronization Waveforms Employing Pilot Subcarriers, Oktay Üreten and Selçuk Taşcıoğlu
Volume 2009, Article ID 538978, 14 pages

Time and Frequency Synchronisation in 4G OFDM Systems, Adrian Langowski
Volume 2009, Article ID 641292, 9 pages

Impact of Carrier Frequency Offsets on Block-IFDMA Systems, E. P. Simon, V. Degardin, and M. Lienard
Volume 2009, Article ID 483128, 7 pages

Effects of Carrier Frequency Offset, Timing Offset, and Channel Spread Factor on the Performance of Hexagonal Multicarrier Modulation Systems, Kui Xu and Yuehong Shen
Volume 2009, Article ID 802425, 8 pages

Multiple CFOs in OFDM-SDMA Uplink: Interference Analysis and Compensation, Malte Schellmann and Volker Jungnickel
Volume 2009, Article ID 909075, 14 pages

A Practical Scheme for Frequency Offset Estimation in MIMO-OFDM Systems, Michele Morelli, Marco Moretti, and Giuseppe Imbarlina
Volume 2009, Article ID 821819, 9 pages

Estimation of CFO and Channels in Phase-Shift Orthogonal Pilot-Aided OFDM Systems with Transmitter Diversity, Carlos Ribeiro and Atílio Gameiro
Volume 2009, Article ID 436756, 10 pages

Turbo Processing for Joint Channel Estimation, Synchronization, and Decoding in Coded MIMO-OFDM Systems, Hung Nguyen-Le, Tho Le-Ngoc, and Chi Chung Ko
Volume 2009, Article ID 206524, 12 pages

Biologically Inspired Intercellular Slot Synchronization, Alexander Tyrrell and Gunther Auer
Volume 2009, Article ID 854087, 12 pages

Discrete-Time Second-Order Distributed Consensus Time Synchronization Algorithm for Wireless Sensor Networks, Gang Xiong and Shalinee Kishore
Volume 2009, Article ID 623537, 12 pages

Hindawi Publishing Corporation
EURASIP Journal on Wireless Communications and Networking
Volume 2009, Article ID 568369, 3 pages
doi:10.1155/2009/568369

Editorial


Heidi Steendam,1 Mounir Ghogho,2 Marco Luise (EURASIP Member),3 Erdal Panayirci,4 and Erchin Serpedin (EURASIP Member)5

1 Department of Telecommunications and Information Processing, Ghent University, 9000 Gent, Belgium
2 School of Electronic and Electrical Engineering, Leeds University, Leeds LS2 9JT, UK
3 Department of Information Engineering, University of Pisa, 56122 Pisa, Italy
4 Department of Electronics Engineering, Kadir Has University, 34083 Istanbul, Turkey
5 Department of Electrical Engineering, Texas A&M University, College Station, TX 77840, USA

Correspondence should be addressed to Heidi Steendam, [email protected]

Received 26 March 2009; Accepted 26 March 2009

Copyright © 2009 Heidi Steendam et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The last decade has witnessed an immense increase in wireless communications services in order to keep pace with the ever-increasing demand for higher data rates combined with higher mobility. To satisfy this demand for higher data rates, the throughput over the existing transmission media had to be increased. Several techniques were proposed to boost the data rate: multicarrier systems to combat selective fading, ultra-wideband (UWB) communications systems to share the spectrum with other users, MIMO transmissions to increase the capacity of wireless links, iteratively decodable codes (e.g., turbo codes and LDPC codes) to improve the quality of the link, cognitive radios, and so forth.

To function properly, the receiver must synchronize with the incoming signal. The accuracy of the synchronization will determine whether the communication system is able to perform well. The receiver needs to determine at which time instants the incoming signal has to be sampled (timing synchronization). In addition, for bandpass communications, the receiver needs to align the frequency and phase of its local carrier oscillator with those of the received signal (carrier synchronization). However, most existing communication systems operate under hostile conditions: low SNR, strong fading, and (multiuser) interference, which make the acquisition of the synchronization parameters burdensome. Therefore, synchronization is generally considered a challenging task.

The objective of this special issue (whose preparation was also carried out under the auspices of the EC Network of Excellence in Wireless Communications NEWCOM++) was to gather recent advances in the area of synchronization of wireless systems, spanning from theoretical analysis of synchronization schemes to practical implementation issues, from optimal synchronizers to low-complexity ad hoc synchronizers.

In this overview of the topics addressed in this special issue, we first consider narrowband single-carrier systems, where narrowband means that the RF bandwidth of the system is comparable with the symbol transmission rate of the link. This is typical, for example, of a satellite link. In the paper by Lee et al., the frame synchronization problem in a DVB-S2 link is investigated. The link works at low SNR and uses forward error correction for data detection. Further, the incoming signal is disturbed by a large clock frequency offset. Under these hostile circumstances, the traditional correlation method, which looks for the synchronization sequence available in the frame header to obtain frame synchronization, performs poorly. To solve this problem and to make the frame synchronizer more robust, the authors modify the correlation-based estimator with an additional correction term depending on the signal energy.

Besides time synchronization, phase estimation of the RF carrier used for transmission is also crucial for coherent detection. However, in mass production, cheap oscillators are used to keep the cost of the devices as low as possible. These low-cost oscillators inherently have instabilities, causing random perturbations in the phase. The resulting phase noise degrades the system performance.


This phase noise can be tracked by feedback algorithms, like the phase-locked loop, but these algorithms give rise to long transients, which makes them unsuitable for burst transmissions. In the paper by Bhatti and Moeneclaey, a feedforward algorithm is proposed where the phase noise is decomposed into its spectral components using the DCT. The phase noise is estimated from pilots by determining a few of these DCT coefficients. The paper of Simoens et al. tackles the phase noise problem in a different way. The authors start from the optimal joint estimation of the unknown data and the phase noise. The unknown distribution of the phase noise, needed for this estimation, is obtained in a probabilistic way by applying Monte Carlo methods. Although several approximations are made to reduce the complexity of the algorithm, its performance is close to optimal, both for uncoded and coded systems.

In contrast with narrowband systems, ultra-wideband communication occupies a bandwidth that is much larger than the transmission rate. The data is modulated on very short pulses, making timing synchronization a complicated task. In the paper by Wang et al., a pilot-aided two-stage synchronization strategy is proposed. In the first stage, sample-level timing is obtained together with an estimate of the channel, and in the second stage, symbol-level synchronization is pursued by looking for the header.

Next, we shift our attention to multicarrier-based broadband transmission systems. Multicarrier modulation is known to be robust to frequency selective channels. However, it is also highly sensitive to carrier frequency offsets, coming, for example, from Doppler shifts, and to phase noise. To keep the BER performance degradation tolerable, the carrier frequency offset must be sufficiently smaller than the carrier spacing of the multicarrier system, which in turn is (because of the large number of carriers that is typically modulated) much smaller than the bandwidth of the multicarrier system. Several of the papers in this special issue indeed deal with this crucial carrier frequency synchronization, but let us first start with the paper by Üreten and Taşcıoğlu, which is concerned with the design of timing synchronization waveforms. To avoid the overhead of a separate synchronization sequence, a system is considered where the pilots are embedded in the frequency domain by replacing some of the data carriers by pilot tones. The authors consider both uniform and nonuniform positioning of the pilot tones. With uniform positioning, the design of the synchronization waveform, which is obtained by considering the time domain signal corresponding to the pilot tones, is simple and easy to analyze. However, because of the large sidelobes in the autocorrelation function of this synchronization waveform, the timing synchronization will suffer from ambiguities. With nonuniform positioning, the synchronization waveform becomes aperiodic, such that the autocorrelation function has lower sidelobes and thus results in more precise timing synchronization.

The paper by Langowski also deals with the design of pilot sequences, although in contrast with the previous paper, the pilot sequence is transmitted as a preamble to the data signal. The author proposes a pilot sequence that is symmetric in the time domain and derives an algorithm that is able to obtain not only the coarse timing estimate, but also the fractional frequency offset with respect to the carrier spacing. The robustness of the proposed algorithm to a frequency selective channel was one of the main concerns of the author. After the initial synchronization based on the pilot sequence, tracking is achieved with a newly designed non-data-aided algorithm.

Synchronization is considered not only for standard multicarrier techniques; several variants of the multicarrier technique are also studied. Block interleaved frequency division multiple access (B-IFDMA) is a variation of the OFDMA technique. In IFDMA, compression and repetition are applied to the data and different users are assigned different chip sequences. Before modulating the chips on the carriers, chip interleaving is applied. Therefore, IFDMA can be regarded as unitary precoded OFDMA with interleaved subcarriers. On the other hand, IFDMA can also be seen as a variant of the CDMA technique with orthogonal signature sequences. Like OFDMA, this IFDMA technique turns out to be very sensitive to carrier frequency offsets. To make the technique more robust to carrier frequency offsets, the data of a user is transmitted on blocks of subcarriers that are equidistantly distributed over the available bandwidth, resulting in B-IFDMA. The paper by Simon et al. investigates the sensitivity of two variants of the B-IFDMA system, that is, joint DFT B-IFDMA and added-signal B-IFDMA, to carrier frequency offsets.

Another variant of the multicarrier technique is hexagonal multicarrier modulation. In this technique, the carrier frequencies in odd time slots are shifted over half a carrier spacing as compared to the carrier frequencies in the even time slots. The positions of the carriers in the time-frequency domain can therefore be considered as lying on a hexagonal lattice, in contrast to the rectangular lattice of standard multicarrier modulation. The analysis of the sensitivity to carrier frequency offset, timing offset, and a frequency selective channel in the paper by Xu and Shen shows that hexagonal multicarrier modulation is more robust to these impairments than standard multicarrier modulation.

During the last ten years, researchers have put large efforts into increasing the capacity of wireless systems by equipping devices with more than one antenna element, resulting in a multiple input multiple output (MIMO) system. By relying on spatial multiplexing, the number of users increases with the number of antenna elements. Alternatively, one can choose to exploit the spatial diversity of the MIMO channel by using space-time codes, which introduce redundancy in both the spatial and the time domain to increase the reliability of the transmission link. When MIMO systems are used in frequency selective channels, OFDM is considered the transmission technique of preference, because it facilitates the equalization process. Synchronization in MIMO systems is even more complex than in single-antenna systems, as the number of synchronization parameters to be estimated increases with the number of antennas.

In the paper by Schellmann and Jungnickel, a space-division multiple access (SDMA) technique is considered in combination with OFDM. In the uplink, the multiantenna basestation receives the signals from the different users, transmitted on the same frequency resources. As these signals are generated by the carrier oscillators of the different users, each signal is affected by a different carrier frequency offset, impairing the orthogonality between the different users. The authors analyze the effect of the carrier frequency offsets on the performance. Assuming coarse carrier frequency synchronization is obtained by using the information from the downlink signal, a low-complexity compensation technique for fine carrier frequency synchronization in the uplink is proposed.

Many of the algorithms in the literature for synchronization are based on ad hoc methods. Although maximum likelihood (ML) estimation methods give rise to better performance than ad hoc algorithms and can perform closer to the theoretical Cramér-Rao lower bound on the mean squared error, their complexity is typically much higher. However, approximations of the ML method offer good suboptimal algorithms. In the paper by Morelli et al., the pilot subcarriers are selected such that the training sequences have a repetitive structure in the time domain. A low-complexity frequency offset estimation algorithm is proposed, where the integer part (with respect to the carrier spacing) of the carrier frequency offset is estimated based on an approximation of the ML method, whereas the fractional frequency offset estimate is obtained from a correlation-based approach.

In the paper of Ribeiro and Gameiro, a similar problem is tackled. The pilot symbols are regularly spread over the OFDM symbols to be able to estimate the channel coefficients between the different transmit and receive antennas. To minimize the pilot overhead, the same pilot subcarriers are used for the different transmit antennas. The pilot symbols per transmit antenna are phase-shifted to reduce the amount of cochannel interference. Based on this pilot structure, the authors propose an algorithm to jointly estimate the CFO and the channel.

In the two previous papers, pilot tones were embedded in the multicarrier signal to estimate the channel and CFO in a data-aided way. In the paper by Nguyen-Le et al., an algorithm to jointly estimate the CFO, timing, and channel impulse response is discussed for turbo-coded burst transmission. The estimates are obtained iteratively in a soft decision-directed way, where information is exchanged between the joint estimator and the turbo decoder. No pilots are transmitted during the data segment, but a preamble containing pilots is added to derive initial estimates.

As a last item, we consider timing synchronization in networks. When the timing in the different cells of a cellular network is aligned to a common reference instant, the throughput is increased as compared to an asynchronous network. This slot synchronization can be obtained by using the global positioning system (GPS) to acquire a reference clock, or by using the backbone connection. Both methods have drawbacks: the first method needs a GPS receiver at each basestation, and the second one does not provide sufficient accuracy. The paper by Tyrrell and Auer describes a decentralized solution to obtain slot synchronization, a solution that is based on synchronization in biological systems. In this method, two synchronization words are used to synchronize: one transmitted by the basestations, and one transmitted by the user stations, and each group helps the other to synchronize. Even when the basestations are located hundreds of kilometers apart, introducing large propagation delays, the decentralized slot synchronizer is able to obtain a timing accuracy of a fraction of the propagation delay.

The paper by Xiong and Kishore considers global time synchronization in wireless sensor networks. One class of algorithms used for this time synchronization is the distributed consensus time synchronization method, where a global consensus is obtained by averaging the pairwise local time information in the different network nodes. In most algorithms, only the current timing information is considered, resulting in a first-order system. The paper in this special issue extends the first-order system to a second-order system, where the timing from the previous iteration is also taken into account, resulting in faster convergence and higher accuracy than a first-order system.
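As a rough illustration of the first-order consensus idea described above (not the second-order algorithm of Xiong and Kishore, whose update rule is not given in this editorial), the following Python sketch lets each node move its clock toward those of its neighbors. The ring topology, the step size `eps`, and the function name `consensus_step` are assumptions made only for this example.

```python
def consensus_step(clocks, neighbors, eps):
    """One first-order update: each node adds eps times the sum of the
    pairwise clock differences to its neighbors."""
    return [
        t_i + eps * sum(clocks[j] - t_i for j in neighbors[i])
        for i, t_i in enumerate(clocks)
    ]


# Hypothetical 4-node ring; every node hears its two adjacent nodes.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
clocks = [0.0, 4.0, 10.0, 2.0]          # initial local times (arbitrary units)
target = sum(clocks) / len(clocks)      # average of the initial clocks

for _ in range(60):
    clocks = consensus_step(clocks, neighbors, eps=0.25)
```

Because the links in this toy network are symmetric, each update preserves the average of the clocks, so all nodes settle on `target`. A second-order variant would additionally feed the values of the previous iteration into the update.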

To conclude this Editorial, we would like to express our appreciation for the efforts of the authors, who have enthusiastically responded to the call for papers, and of the reviewers, who helped us to select the papers in this special issue. Without them, this special issue would never have existed. We hope that this special issue helps the reader to get a better idea of the current issues in synchronization for wireless systems. The topics of this special issue cover a broad range of applications; they can stimulate improvements in present transmission systems and can help in the realization of future ones. As transmission systems have become more and more complex compared to 20 years ago, the synchronization algorithms have also grown more complex and diverse. This trend raises the expectation that, over the next 20 years, research on synchronization will be as successful as it is today.

Heidi Steendam
Mounir Ghogho
Marco Luise
Erdal Panayirci
Erchin Serpedin


Research Article

Robust Frame Synchronization for Low Signal-to-Noise Ratio Channels Using Energy-Corrected Differential Correlation

Dong-Uk Lee,1 Pansoo Kim,2 and Wonjin Sung1

1 Department of Electronic Engineering, Sogang University, Seoul 121-742, South Korea
2 Satellite & Wireless Convergence Research Department, Electronics and Telecommunications Research Institute, Daejeon 305-700, South Korea

Correspondence should be addressed to Wonjin Sung, [email protected]

Received 1 July 2008; Revised 21 December 2008; Accepted 1 February 2009

Recommended by Marco Luise

Recent standards for wireless transmission require reliable synchronization for channels with low signal-to-noise ratio (SNR) as well as with a large amount of frequency offset, which necessitates a robust correlator structure for the initial frame synchronization process. In this paper, a new correlation strategy especially targeted for low SNR regions is proposed and its performance is analyzed. By utilizing a modified energy correction term, the proposed method effectively reduces the variance of the decision variable to enhance the detection performance. Most importantly, the method is demonstrated to outperform all previously reported schemes by a significant margin, for SNRs below 5 dB regardless of the existence of the frequency offsets. A variation of the proposed method is also presented for further enhancement over the channels with small frequency errors. The particular application considered for the performance verification is the second generation digital video broadcasting system for satellites (DVB-S2).

Copyright © 2009 Dong-Uk Lee et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

Reliable synchronization is one of the key factors determining the transmission performance in communication channels, and various schemes for time, frequency, and phase estimation for imperfect communication links have been proposed and implemented. Although time and frequency estimation can be jointly performed at an increased complexity, frequency synchronization is usually preceded by the symbol and frame synchronization. A classic result on coherent detection for the frame synchronization has been presented by Massey [1], which suggests the usage of a data correction term for the optimal maximum likelihood (ML) statistics. The result has subsequently been verified, extended to specific modulation schemes, and approximated to suboptimal solutions [2–5]. In many practical receivers, the suboptimal metrics become preferred choices due to their simplicity and reasonable performance. The approximation of ML statistics using the low SNR assumption leads to a simplified computation of the correction term in the form of received signal energy [1, 2, 5].

While these coherent detection schemes provide optimal or near-optimal performance in the static additive white Gaussian noise (AWGN) channel, a performance loss is experienced when a frequency error exists in the channel. Under such circumstances, differential correlation metrics provide robustness to frequency and phase errors. In particular, detection methods in [6] are derived from the approximated ML metrics, and give a lower detection error probability than other known schemes. Reduced-complexity schemes are also extensively studied [7–10]. Differential postdetection integration (DPDI) techniques [8] are shown to provide a good performance-complexity tradeoff, and generalized DPDI including average postdetection integration (APDI) schemes has also been proposed and analyzed [9, 10].

On the other hand, the requirements for the initial synchronization performance are becoming more stringent. As advanced transmission schemes including efficient modulation and powerful error-correction coding are developed, target operating SNRs tend to decrease to allow data transmission even in hostile channel environments and to maximize the bandwidth usage. The recent DVB-S2 standard [11, 12] adopted low-density parity-check (LDPC) coding for adaptive coding and modulation (ACM), which includes high-density amplitude phase shift keying (APSK) as well as conventional phase shift keying (PSK). These techniques lowered the minimum operating SNR down to −2.35 dB for the lowest ACM level. Since initial synchronization should reliably be performed for all ACM levels, it is important to verify the detector performance at this low SNR range.

In this paper, we propose detection strategies for robust frame synchronization under the effects of severe noise and frequency offsets and verify the corresponding performance. The proposed detector is constructed via appropriate adjustments of the correction term, and variations of the method for further performance enhancement are also suggested. It is demonstrated that the methods provide a substantial gain over existing schemes for the SNR range of interest. The organization of the paper is as follows. The signal model, frame structure, and channel conditions used for performance evaluation are introduced in Section 2. In Section 3, a brief review and comparison of existing decision metrics for the frame synchronization are given, and the proposed method is presented. In Section 4, we discuss properties and parameter optimization issues of the proposed method. The detection performance is evaluated for different channel conditions to quantify the gain in Section 5, and the conclusions are given in Section 6.

2. Signal Model

We consider successive transmission of frames of length L over the AWGN channel, and each frame includes a synchronization sequence of length N in its header. Figure 1 shows the frame structure for the DVB-S2 standard. The frame consists of the physical layer header (PL Header) followed by the forward error correction frame (FEC Frame), which is an LDPC encoded sequence of payload data symbols. The PL Header includes the synchronization sequence called the Start-of-Frame (SoF), consisting of N = 26 symbols, and the physical layer signaling code (PLSC) used to indicate frame-specific information such as the modulation type, coding rate, and the length of the frame. The PL Header symbols are modulated using π/2-binary PSK, whereas the FEC Frame symbols are modulated by PSK or APSK. More specifically, one of quadrature PSK (QPSK), 8-PSK, 16-APSK, or 32-APSK modulation schemes is used based on the transmission channel condition.

Assuming perfect symbol-time synchronization is performed at the receiver, the discrete sample at the $k$th symbol time can be expressed as follows:

\[
r_k = b_k e^{j(2\pi k f_0 T_s + \phi_0)} + n_k, \tag{1}
\]

where $b_k$ is the modulated symbol with normalized power $E|b_k|^2 = 1$ and $n_k$ is the AWGN sample with variance $\sigma_n^2$. Then the received SNR $E_s/N_0$ is determined as $1/\sigma_n^2$. Parameters $f_0$ and $\phi_0$, respectively, denote the frequency offset and the phase offset, and $T_s$ is the symbol duration.

Figure 1: The frame structure for DVB-S2, which includes the start-of-frame (SoF) synchronization sequence of length N = 26. (The figure shows a sequence of PL header / FEC frame pairs; each frame is broken down into the SoF of N symbols, followed by the PLSC and the data symbols, which together span L − N symbols.)

We assume that $f_0$ takes a value in $[-f_m, +f_m]$, where $f_m$ represents the maximum amount of frequency offset, and $\phi_0$ takes a value in $[-\pi, +\pi]$. The frame detection includes the correlation of $N$ consecutive received samples with the synchronization sequence symbols $s_0, s_1, \ldots, s_{N-1}$.

For the performance evaluation, the SoF of length $N = 26$ is used as the synchronization sequence. Particular attention is given to mid to low SNR values, including the minimum required operational SNR of −2.35 dB for DVB-S2. The maximum frequency offset is 20% of the transmission bandwidth; that is, the normalized maximum frequency offset is given by $f_m T_s = 0.2$. Unless otherwise stated, the frequency offset is uniformly generated from the range $[-0.2/T_s, +0.2/T_s]$.
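The signal model of (1) with the channel parameters stated above can be sketched in Python as follows. This is a minimal illustration, not the authors' simulator: the function name `received_samples`, the QPSK payload, and the convention that $\sigma_n^2$ is the total complex noise variance (split equally between the real and imaginary parts) are assumptions made for this example.

```python
import cmath
import math
import random


def received_samples(symbols, f0_ts, phi0, sigma_n, rng):
    """r_k = b_k * exp(j(2*pi*k*f0*Ts + phi0)) + n_k, following (1).

    f0_ts is the normalized frequency offset f0*Ts.
    """
    per_dim = sigma_n / math.sqrt(2)  # assumed: variance split over I and Q
    out = []
    for k, b in enumerate(symbols):
        rotation = cmath.exp(1j * (2 * math.pi * k * f0_ts + phi0))
        noise = complex(rng.gauss(0.0, per_dim), rng.gauss(0.0, per_dim))
        out.append(b * rotation + noise)
    return out


rng = random.Random(0)
snr_db = -2.35                                     # minimum DVB-S2 operating SNR
sigma_n = math.sqrt(1.0 / 10 ** (snr_db / 10))     # from Es/N0 = 1/sigma_n^2
f0_ts = rng.uniform(-0.2, 0.2)                     # uniform in [-fm*Ts, +fm*Ts]
phi0 = rng.uniform(-math.pi, math.pi)

# Hypothetical unit-power QPSK payload symbols.
symbols = [cmath.exp(1j * (math.pi / 4 + math.pi / 2 * rng.randrange(4)))
           for _ in range(1000)]
r = received_samples(symbols, f0_ts, phi0, sigma_n, rng)
```

Setting `sigma_n` to zero gives the noise-free rotation alone, which is convenient for checking correlator implementations against known values.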

3. Correlation Methods for Frame Synchronization

A simple yet effective solution for the frame synchronization utilizes the direct correlation between the sequence of received samples and the synchronization sequence [4], and the correlation value $c_k$ at the $k$th symbol time is obtained using $N$ consecutive received samples $r_k, r_{k+1}, \ldots, r_{k+N-1}$ as follows:

\[
c_k = \sum_{i=0}^{N-1} r^{*}_{i+k} s_i \tag{2}
\]

and the frame boundary is detected using the variable $z_k = |c_k|$. The decision can either be made via hypothesis testing using a threshold value or by choosing the location at which the maximum value of $z_k$ occurs. It has been reported by Massey [1] that the ML detection requires the decision variable $z_k = |c_k| - e_k$, where $e_k$ is called the correction term. For small values of SNR, $E_s/N_0 \ll 1$, it has been shown that the correction term takes the form of

\[
e_k = \frac{\sqrt{E_s}}{N_0} \sum_{i=0}^{N-1} |r_{i+k}|^2, \tag{3}
\]

which accounts for the energy correction of the received sequence. It has also been confirmed by Gansman et al. [5] that the correction term of the ML detector reduces to a function of the received signal energy when the low SNR approximation is applied.
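The sensitivity of the coherent metric $|c_k|$ of (2) to a frequency offset can be seen in a small noise-free experiment. The Python sketch below is an illustration under stated assumptions: the synchronization word is a hypothetical random ±1 sequence (not the actual DVB-S2 SoF), and the helper names `coherent_corr` and `rotate` are introduced only for this example.

```python
import cmath
import math
import random


def coherent_corr(r, s, k):
    """c_k = sum_{i=0}^{N-1} conj(r[i+k]) * s[i], as in (2)."""
    return sum(r[i + k].conjugate() * s[i] for i in range(len(s)))


def rotate(symbols, f0_ts, phi0):
    """Noise-free version of (1): apply frequency and phase offsets only."""
    return [b * cmath.exp(1j * (2 * math.pi * k * f0_ts + phi0))
            for k, b in enumerate(symbols)]


N = 26
rng = random.Random(1)
s = [complex(rng.choice((-1, 1))) for _ in range(N)]  # hypothetical sync word

# Phase offset only: the magnitude of c_k is unaffected.
z_phase_only = abs(coherent_corr(rotate(s, 0.0, 1.0), s, 0))
# Maximum normalized frequency offset f0*Ts = 0.2: the terms rotate and cancel.
z_with_cfo = abs(coherent_corr(rotate(s, 0.2, 1.0), s, 0))
print(z_phase_only, z_with_cfo)
```

With a phase offset alone the peak keeps its full height N = 26, whereas at $f_0 T_s = 0.2$ the 26 rotating terms almost cancel and the peak collapses to a magnitude of about 1, even without noise.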


To effectively mitigate the influence of frequency offsets, frame synchronization detectors based on the differential correlator structure have been developed. The differential version of the coherent correlation in (2) can be written as follows:

\[
d_k = \sum_{i=1}^{N-1} r^{*}_{i+k} s_i r_{i+k-1} s^{*}_{i-1} \tag{4}
\]

and more generally, the $n$-span differential correlation is defined as follows:

\[
d_k(n) = \sum_{i=n}^{N-1} r^{*}_{i+k} s_i r_{i+k-n} s^{*}_{i-n}. \tag{5}
\]

Such correlation has been utilized in the decision variable $z_k = d_k(0) + 2\sum_{n=1}^{m} |d_k(n)|$ suggested in [9], where parameter $m$ ($< L$) determines the performance versus complexity tradeoff. Related discussion can also be found in [10], which presents the variable $z_k = \sum_{n=1}^{m} |d_k(n)|$ for enhanced detection performance under a severe effect of frequency offset.
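A minimal Python sketch of the $n$-span differential correlation in (5) and of the two decision variables quoted from [9] and [10] is given below. The random ±1 synchronization word and the function names `diff_corr`, `z_ref9`, and `z_ref10` are assumptions for illustration; the test signal is noise-free and carries the maximum normalized frequency offset of 0.2.

```python
import cmath
import math
import random


def diff_corr(r, s, k, n):
    """d_k(n) = sum_{i=n}^{N-1} conj(r[i+k]) s[i] r[i+k-n] conj(s[i-n]), as in (5)."""
    return sum(
        r[i + k].conjugate() * s[i] * r[i + k - n] * s[i - n].conjugate()
        for i in range(n, len(s))
    )


def z_ref9(r, s, k, m):
    """z_k = d_k(0) + 2 * sum_{n=1}^{m} |d_k(n)|; d_k(0) is real by construction."""
    tail = sum(abs(diff_corr(r, s, k, n)) for n in range(1, m + 1))
    return diff_corr(r, s, k, 0).real + 2 * tail


def z_ref10(r, s, k, m):
    """z_k = sum_{n=1}^{m} |d_k(n)|."""
    return sum(abs(diff_corr(r, s, k, n)) for n in range(1, m + 1))


N = 26
rng = random.Random(2)
s = [complex(rng.choice((-1, 1))) for _ in range(N)]   # hypothetical sync word
# Noise-free received sync word with f0*Ts = 0.2 and a phase offset of 1 rad.
r = [b * cmath.exp(1j * (2 * math.pi * 0.2 * i + 1.0)) for i, b in enumerate(s)]

spans = [abs(diff_corr(r, s, 0, n)) for n in (1, 2, 3)]
```

At the aligned position every product in (5) carries the same phase rotation, so $|d_k(n)| = N - n$ (here 25, 24, 23) regardless of the frequency and phase offsets. This is the robustness that the coherent correlation in (2) lacks.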

Although these detection schemes offer a reasonable performance at decreased complexity, improvement can be made by using the decision variable derived from the ML criterion. By approximating the Bessel function in the likelihood function to a fourth-degree polynomial, the decision variable is derived as [6]

\[
\text{[C1]} \quad z_k = \sum_{n=1}^{N-1} \left\{ |d_k(n)|^2 - \sum_{i=n}^{N-1} |r_{i+k}|^2 |r_{i+k-n}|^2 \right\} \tag{6}
\]

and subsequently modified to

\[
\text{[C2]} \quad z_k = \sum_{n=1}^{N-1} \left\{ |d_k(n)| - \sum_{i=n}^{N-1} |r_{i+k}| \, |r_{i+k-n}| \right\} \tag{7}
\]

by dropping the squares, which results in performance enhancement over the original for many cases of practical interest. The frame detectors using the decision variables in (6) and (7) will, respectively, be called C1 and C2 detectors. At an increased complexity caused by additional computation of the correction terms, C1 and C2 detectors outperform all the other aforementioned detection strategies in the presence of frequency offsets.
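The C1 and C2 decision variables of (6) and (7) can be sketched in Python as shown below. This is an illustrative reading of the two formulas, not the authors' implementation: the ±1 synchronization word, the random ±1 padding symbols, the padding length `PAD`, and all function names are assumptions, and the test stream is noise-free.

```python
import cmath
import math
import random


def diff_corr(r, s, k, n):
    """n-span differential correlation d_k(n) of (5)."""
    return sum(r[i + k].conjugate() * s[i] * r[i + k - n] * s[i - n].conjugate()
               for i in range(n, len(s)))


def energy_term(r, n_sync, k, n, power):
    """sum_{i=n}^{N-1} (|r[i+k]| * |r[i+k-n]|) ** power."""
    return sum((abs(r[i + k]) * abs(r[i + k - n])) ** power
               for i in range(n, n_sync))


def c1_metric(r, s, k):
    """Decision variable (6): squared correlations minus squared-magnitude products."""
    N = len(s)
    return sum(abs(diff_corr(r, s, k, n)) ** 2 - energy_term(r, N, k, n, 2)
               for n in range(1, N))


def c2_metric(r, s, k):
    """Decision variable (7): the same structure with the squares dropped."""
    N = len(s)
    return sum(abs(diff_corr(r, s, k, n)) - energy_term(r, N, k, n, 1)
               for n in range(1, N))


def random_bpsk(count, rng):
    return [complex(rng.choice((-1, 1))) for _ in range(count)]


N, PAD = 26, 10
rng = random.Random(3)
s = random_bpsk(N, rng)                                   # hypothetical sync word
tx = random_bpsk(PAD, rng) + s + random_bpsk(PAD, rng)    # sync word starts at PAD
r = [b * cmath.exp(1j * (2 * math.pi * 0.1 * i + 0.5)) for i, b in enumerate(tx)]

c2_values = [c2_metric(r, s, k) for k in range(len(r) - N + 1)]
best_k = max(range(len(c2_values)), key=c2_values.__getitem__)
```

For unit-modulus synchronization symbols the triangle inequality gives $|d_k(n)| \le \sum_i |r_{i+k}||r_{i+k-n}|$, so the C2 variable is never positive, and in this noise-free stream it reaches zero at the true frame position `PAD`.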

Our proposal of new detector structures is motivated by the facts that (i) a significant performance improvement occurs by modifying the correction term in (7) from that in (6) to perform the “energy correction operation” (i.e., the correction term has the unit of energy), which is in agreement with the related discussions in [1, 5], and (ii) dropping the squares from the received power $|r_i|^2$ is not the only possible way of such modification and alternative variations may exist. By defining

\[
\varepsilon_k(n) = \sum_{i=n}^{N-1} |r_{i+k}|^2 |r_{i+k-n}|^2 \tag{8}
\]

and by taking the square-root of the correction term in (6),we obtain a new decision variable

[L1] zk =N−1∑

n=1

{|dk(n)| −

√εk(n)

}(9)

for the frame detection and call the corresponding detectorL1. Multiple appearances of identical received samples in thedecision variable result in highly correlated statistical char-acteristics, and an exact analytic evaluation of the statisticsfor each decision variable seems difficult. Nevertheless, theadvantage of the proposed variable is immediately observablefrom the distributions experimentally obtained, as discussedin the following.
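The C2 and L1 decision variables in (7) and (9) can be sketched as follows. This is an illustration only: the helper names (`diff_corr`, `energy`, `z_c2`, `z_l1`) and the short test sequence are assumptions introduced here, not part of the paper.

```python
import math

def diff_corr(r, s, k, n):
    """n-span differential correlation d_k(n), as in (5)."""
    return sum(
        r[i + k].conjugate() * s[i] * r[i + k - n] * s[i - n].conjugate()
        for i in range(n, len(s))
    )

def energy(r, k, n, N):
    """Correction term eps_k(n), as in (8)."""
    return sum(abs(r[i + k]) ** 2 * abs(r[i + k - n]) ** 2 for i in range(n, N))

def z_c2(r, s, k):
    """C2 decision variable (7): magnitudes instead of squared magnitudes."""
    N = len(s)
    total = 0.0
    for n in range(1, N):
        correction = sum(abs(r[i + k]) * abs(r[i + k - n]) for i in range(n, N))
        total += abs(diff_corr(r, s, k, n)) - correction
    return total

def z_l1(r, s, k):
    """L1 decision variable (9): square root of the correction term of (6)."""
    N = len(s)
    return sum(
        abs(diff_corr(r, s, k, n)) - math.sqrt(energy(r, k, n, N))
        for n in range(1, N)
    )

# Hypothetical noise-free example: unit-modulus symbols, frame aligned at k = 0
s = [complex(v) for v in (1, 1j, -1, -1j, 1, -1, 1j, 1)]
r = list(s)
print(z_c2(r, s, 0), z_l1(r, s, 0))
```

For unit-modulus, noise-free, aligned samples, $|d_k(n)| = N - n$ and $\varepsilon_k(n) = N - n$, so the C2 variable vanishes while the L1 variable equals $\sum_{n=1}^{N-1} \{(N-n) - \sqrt{N-n}\}$.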

4. Properties and Variations of Proposed Correlation

4.1. Distribution of the Decision Variables. The detection performance is strongly related to the statistical distribution of the decision variable. Reliable detection is expected when the values of the decision variable at the synchronous (i.e., frame boundary) and asynchronous symbol positions are (i) sufficiently separated and (ii) distributed with small variances. It can be shown from the expression in (7) that, neglecting the effect of noise,

$$E[z_k] = \begin{cases} -\dfrac{N(N-1)}{2}, & \text{(asynchronous)} \\[2mm] 0 & \text{(synchronous)} \end{cases} \tag{10}$$

for the C2 decision variable. Similarly, the L1 decision variable in (9) satisfies

$$E[z_k] = \begin{cases} -\dfrac{(8N-3)\sqrt{4N-3} - 5}{24}, & \text{(asynchronous)} \\[2mm] \dfrac{N(N-1)}{2} - \dfrac{(8N-3)\sqrt{4N-3} - 5}{24} & \text{(synchronous).} \end{cases} \tag{11}$$

Thus for both C2 and L1 decision variables, the difference between the averages for synchronous and asynchronous distributions is equal to $N(N-1)/2$. As the SNR decreases, not only does the difference between the averages become smaller, but the variances also increase, resulting in a significant overlap of the synchronous and asynchronous distributions. Figure 2 illustrates this by showing the probability density functions (PDFs) for both the synchronous and asynchronous decision variables, obtained from repeated simulations of C2 and L1 variables at −2.35 dB SNR. Although it can be verified that the separation between the averages of the two distributions is identical for both C2 and L1 detectors, the variance of the L1 decision variable is considerably smaller than the variance of the C2 decision variable. This gives a smaller overlapped area between the two distributions, which enhances the detection performance. Figure 3 compares the variances of C2 and L1 for different SNR values, which confirms that the proposed L1 decision variable has a strictly smaller variance in the low SNR range.
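The closed-form term in (11) can be checked numerically. The sketch below rests on an assumption that is our reading rather than a statement of the paper: for unit-modulus symbols and no noise, $\varepsilon_k(n) = N - n$, so the correction term of (9) sums to $\sum_{n=1}^{N-1}\sqrt{N-n}$, which the expression $((8N-3)\sqrt{4N-3} - 5)/24$ appears to approximate. The function names and the values of $N$ are arbitrary.

```python
import math

def closed_form(N):
    """Closed-form term appearing in (11)."""
    return ((8 * N - 3) * math.sqrt(4 * N - 3) - 5) / 24

def direct_sum(N):
    """Sum of sqrt(N - n) for n = 1..N-1 (assumed noise-free, unit-modulus case)."""
    return sum(math.sqrt(N - n) for n in range(1, N))

for N in (2, 5, 26, 90):  # arbitrary illustrative lengths
    print(N, round(direct_sum(N), 4), round(closed_form(N), 4))
```

Under this reading, the synchronous and asynchronous means in (11) differ by exactly $N(N-1)/2$, the same separation as in (10).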


Figure 2: Probability density functions of the decision variables for (a) C2 detector and (b) L1 detector at −2.35 dB SNR ($f_m T_s = 0.2$). [Plot omitted; horizontal axis: $z_k$, vertical axis: PDF; curves: asynchronous and synchronous.]

Figure 3: Variance of the decision variables for C2 and L1 detectors over different SNR values ($f_m T_s = 0.2$). [Plot omitted; horizontal axis: SNR (dB), vertical axis: variance of $z_k$ ($\times 10^4$); curves: C2 and L1, asynchronous and synchronous.]

4.2. Utilization of the Vector Sum. Instead of summing their magnitudes, the $n$-span differential correlation values can be coherently combined to produce another modified decision statistic. Such combining corresponds to the summation of vectors representing the complex values $\{d_k(n)\}$. The proposed "L2" detector uses the decision variable associated with the vector sum of the $n$-span differential correlations

$$[\mathrm{L2}]\quad z_k = \left| \sum_{n=1}^{M} d_k(n) \right| - \sqrt{\sum_{n=1}^{M} \varepsilon_k(n)}, \tag{12}$$

where integer parameter $M$ needs to be appropriately chosen to maximize the detection performance. When the frame is synchronized under the noise-free condition, $d_k(n)$ is represented by a vector with phase angle $-2\pi n f_0 T_s$, and the complex sum of the $n$-span differential correlations in the first term of (12) becomes

$$\sum_{n=1}^{M} (N - n)\, e^{-j 2\pi n f_0 T_s}. \tag{13}$$

To maximize the magnitude of the vector sum in (13), a smaller value of $M$ is desired as the frequency offset $f_0$ increases, since otherwise the sum of vectors with widespread angles results in a reduced magnitude. It can be observed that the magnitude of the vector sum begins to diminish as the terms with angular phase exceeding $\pi$ radians are added; thus parameter $M$ needs to be chosen to satisfy $M f_0 T_s < 0.5$. As the frequency offset increases, a smaller number of $n$-span differential correlation values contribute to the decision variable, and the reliability of the statistics decreases. Thus L2 exhibits improved performance over L1 when the frequency offset is small but is eventually outperformed by L1 as $f_0$ increases.
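The dependence of the vector sum (13) on $M$ can be illustrated with a short numerical sketch. The values $N = 26$ and $f_0 T_s = 0.05$ and the function name `vector_sum_magnitude` are assumptions chosen for illustration.

```python
import cmath

def vector_sum_magnitude(N, M, f0Ts):
    """Magnitude of the noise-free vector sum (13) for a synchronized frame."""
    return abs(sum((N - n) * cmath.exp(-2j * cmath.pi * n * f0Ts)
                   for n in range(1, M + 1)))

N, f0Ts = 26, 0.05  # illustrative values, not taken from the paper
magnitudes = {M: vector_sum_magnitude(N, M, f0Ts) for M in range(1, N)}
best_M = max(magnitudes, key=magnitudes.get)
print(best_M, best_M * f0Ts)
```

In this example the magnitude grows while the added terms stay within roughly half a turn of phase and shrinks afterwards, so the maximizing $M$ is expected to satisfy $M f_0 T_s < 0.5$.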

4.3. Weighted Energy Correction. The amount of energy correction can further be adjusted by introducing a multiplicative weight factor. To account for the weighted energy correction, the decision variable for L1 is modified as follows:

$$[\mathrm{L3}]\quad z_k = \sum_{n=1}^{N-1} \left\{ |d_k(n)| - \alpha \sqrt{\varepsilon_k(n)} \right\} \tag{14}$$

and the decision variable for L2 is modified as follows:

$$[\mathrm{L4}]\quad z_k = \left| \sum_{n=1}^{M} d_k(n) \right| - \beta \sqrt{\sum_{n=1}^{M} \varepsilon_k(n)} \tag{15}$$

using respective weight factors $\alpha$ and $\beta$. Since the performance of L3 and L4 detectors varies based on the choice of these parameters, proper parameter values need to be determined for the optimized performance at target operating points.
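A possible rendering of the weighted decision variables (14) and (15) is sketched below. The helper names and the toy sequence are hypothetical, and $M = 3$ is an arbitrary choice.

```python
import math

def diff_corr(r, s, k, n):
    """n-span differential correlation d_k(n), as in (5)."""
    return sum(
        r[i + k].conjugate() * s[i] * r[i + k - n] * s[i - n].conjugate()
        for i in range(n, len(s))
    )

def energy(r, k, n, N):
    """Correction term eps_k(n), as in (8)."""
    return sum(abs(r[i + k]) ** 2 * abs(r[i + k - n]) ** 2 for i in range(n, N))

def z_l3(r, s, k, alpha):
    """L3 decision variable (14); alpha = 1 gives L1, alpha = 0 removes the correction."""
    N = len(s)
    return sum(
        abs(diff_corr(r, s, k, n)) - alpha * math.sqrt(energy(r, k, n, N))
        for n in range(1, N)
    )

def z_l4(r, s, k, M, beta):
    """L4 decision variable (15); beta = 1 gives L2."""
    N = len(s)
    vector_sum = sum(diff_corr(r, s, k, n) for n in range(1, M + 1))
    correction = sum(energy(r, k, n, N) for n in range(1, M + 1))
    return abs(vector_sum) - beta * math.sqrt(correction)

# Hypothetical noise-free example without frequency offset, frame aligned at k = 0
s = [complex(v) for v in (1, 1j, -1, -1j, 1, -1, 1j, 1)]
r = list(s)
print(z_l3(r, s, 0, alpha=1.6), z_l4(r, s, 0, M=3, beta=6.0))
```

Exposing the weights as plain arguments keeps the sketch usable for a parameter sweep over $\alpha$ or $\beta$ at a target operating point.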

5. Performance Evaluation

The false alarm rate (FAR) and the misdetection probability (MDP) at a given symbol time are used as key measures for the detection performance. Denoting the threshold for the frame boundary detection by $\Gamma$, the FAR is given by

$$P_{FA}(\Gamma) = \int_{\Gamma}^{\infty} f_{z_k}(x \mid H_0)\, dx \tag{16}$$

which is the probability that the decision variable exceeds the threshold for a given asynchronous symbol index $k$. Here $f_{z_k}(x \mid H_0)$ is the PDF for the asynchronous decision variable. The MDP is the probability that the decision variable is below the threshold for the synchronous symbol position and is given by

$$P_{MD}(\Gamma) = \int_{-\infty}^{\Gamma} f_{z_k}(x \mid H_1)\, dx, \tag{17}$$

where $f_{z_k}(x \mid H_1)$ is the PDF for the synchronous decision variable. For the evaluation of FAR, random QPSK data symbols are generated and noise samples are added. The noise-corrupted data symbols are correlated with the SoF symbols; then the decision variable is computed and threshold tested. In the case of MDP, a similar procedure is performed, by generating noise-corrupted SoF symbols instead of random data symbols for correlation.
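Given simulated samples of the decision variable under the two hypotheses, the integrals (16) and (17) can be estimated by counting. The sketch below is one way to do this; the function names and the toy sample lists are assumptions, and the threshold rule is a simple empirical quantile rather than the authors' procedure.

```python
def empirical_far(z_async, gamma):
    """Estimate of (16): fraction of asynchronous samples above the threshold."""
    return sum(z > gamma for z in z_async) / len(z_async)

def empirical_mdp(z_sync, gamma):
    """Estimate of (17): fraction of synchronous samples below the threshold."""
    return sum(z < gamma for z in z_sync) / len(z_sync)

def cfar_threshold(z_async, target_far):
    """Sample value used as threshold so that the empirical FAR does not exceed target_far."""
    ordered = sorted(z_async)
    allowed = int(target_far * len(ordered))  # samples allowed above the threshold
    return ordered[len(ordered) - 1 - allowed]

# Toy samples standing in for simulated decision variables (hypothetical values)
z_async = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
z_sync = [6, 7, 8, 9, 10]

gamma = cfar_threshold(z_async, target_far=0.2)
print(gamma, empirical_far(z_async, gamma), empirical_mdp(z_sync, gamma))
```

Estimating an MDP at a FAR as low as $10^{-3}$ in this way requires many more asynchronous samples than the reciprocal of the target FAR.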

An effective way of identifying the detection performance is to determine the MDP for a given constant FAR (CFAR). Figure 4 shows the MDP at CFAR of $10^{-3}$ for the L3 detector, using different values of weight $\alpha$. The MDPs for the C1 and C2 detectors are also plotted for comparison. It is clearly indicated in the figure that the proposed L3 detector outperforms both C1 and C2 detectors for appropriately chosen values of $\alpha$. The MDP for L3 lies strictly below those of C1 and C2 when $0.4 < \alpha < 2.2$, at the minimum operating SNR of −2.35 dB. The performance gain of L3 increases with a sufficient margin at 0 dB SNR. It is observed that the optimal value of $\alpha$ depends on SNR, suggesting parameter adaptation may be desired. However, initial frame synchronization is usually done without any prior knowledge of the channel, and we choose to select the parameter optimized for the worst channel condition, that is, SNR of 0 dB or below. For all remaining performance evaluation, $\alpha = 1.6$ is used for L3. The MDP for L4 using different values of weight $\beta$ is shown in Figure 5. Since L4 detection is mainly applicable to channels with no or small frequency offsets, parameter dependency of the performance is evaluated when $f_m = 0$. Significant performance improvement is observed for both SNR values, and the gain over conventional schemes is more substantial for L4. For the remainder of discussion, $\beta = 6.0$ is applied. Also note that $\alpha = 0$ or $\beta = 0$ corresponds to no energy correction for decision variables.

The receiver operating characteristics (ROCs) are shown in Figure 6, which are obtained by evaluating MDP and FAR pairs using varying values of the detection threshold for −2.35 dB and 3 dB SNRs. The ROC gain of L3 in comparison to existing schemes under the effects of frequency offsets is shown in the figure. Although L4 does not provide any gain in this case, L4 detection can be more advantageous when there exists no frequency offset. Therefore the usage of L3 detection is suggested for general channel conditions with unknown frequency offsets, and L4 is applicable for further performance enhancement when the channel is known to have a small frequency offset. More precisely, the normalized frequency offset at which L4 begins to outperform L3 can be found from the MDP curves in Figure 7, which is evaluated at 0 dB SNR with $10^{-3}$ FAR. It is indicated in the figure that L4 exhibits the performance gain when normalized offsets are less than 5.7% at 0 dB SNR, and L3 becomes the preferred

Figure 4: Misdetection probability of the L3 detector for different correction weights at CFAR of $10^{-3}$ ($f_m T_s = 0.2$). [Plot omitted; horizontal axis: $\alpha$, vertical axis: misdetection probability; curves: C1, C2, and L3 (proposed) at −2.35 dB and 0 dB SNR.]

Figure 5: Misdetection probability of the L4 detector for different correction weights at CFAR of $10^{-3}$ ($f_m T_s = 0$). [Plot omitted; horizontal axis: $\beta$, vertical axis: misdetection probability; curves: C1, C2, and L4 (proposed) at −2.35 dB and 0 dB SNR.]

choice of detection for frequency offsets larger than those values.

The detection performance under different SNR values is shown in Figure 8, where MDPs are plotted when the CFAR is $10^{-3}$ with maximum normalized frequency offset of 0.2. The amount of gain for L3 over other methods is shown to be substantial over the entire range of interest. For reliable transmission to mobile users under various channel conditions, integrating mobility in satellite applications is gaining more attention [13–15], and performance verification under the effects of fading is necessary. Figure 9 gives the


Figure 6: Receiver operating characteristics of the proposed and conventional detectors under severe frequency offsets ($f_m T_s = 0.2$). [Plot omitted; horizontal axis: false alarm rate, vertical axis: misdetection probability; curves: C1, C2, L3 (proposed), and L4 (proposed) at −2.35 dB and 3 dB SNR.]

Figure 7: Detection performance as a function of frequency offsets (FAR = $10^{-3}$, SNR = 0 dB). [Plot omitted; horizontal axis: $f_m T_s$, vertical axis: misdetection probability; the surviving legend entries are C1 and C2.]

performance comparison for different values of K-factors, where the $K = \infty$ case corresponds to the static AWGN channel. As the K-factor decreases, effects of faded signal become dominant and performance degradation occurs. However, the performance advantage of L3 over other detectors holds for all channel conditions.

It is worthwhile to make performance comparisons with a larger group of differential as well as fully coherent detectors previously reported. In [16], the authors utilize both

Figure 8: Misdetection probability at different SNR values (FAR = $10^{-3}$, $f_m T_s = 0.2$). [Plot omitted; horizontal axis: SNR (dB), vertical axis: misdetection probability; the surviving legend entries are C1 and C2.]

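Figure 9: Detection performance under fading channel conditions (FAR = $10^{-3}$, $f_m T_s = 0.2$, SNR = 0 dB). [Plot omitted; horizontal axis: K-factor, with $K = \infty$ (static), $K = 10$ and $K = 1$ (Rician), and $K = 0$ (Rayleigh); vertical axis: misdetection probability; the surviving legend entries are C1 and C2.]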

the SoF and PLSC symbols for the frame detection. Although the PLSC symbols are not known to the receiver a priori, each pair of two consecutive symbols is either repeated or inverted; thus such patterns can be exploited for the detection. In addition, a pipelined structure for efficient implementation of the peak search algorithm is proposed in the paper. However, the detector is based on the single-span correlation (N = 1) and no correction term is applied, resulting in degraded performance when compared


Figure 10: Performance comparison with coherent, differential, and mixed-type detectors (FAR = $10^{-3}$, SNR = 0 dB). [Plot omitted; horizontal axis: $f_m T_s$, vertical axis: misdetection probability; curves: SoF + PLSC, APDI, Gansman, Nielsen, and L4 (proposed).]

to L4. In Figure 10, the detector MDP of [16] at a given symbol location is labeled by "SoF + PLSC" and shown with error probabilities of other schemes including L4. While our proposed detectors are intended for any general frame structure, the utilization of the PLSC symbols can be applied to L3 and L4 for further performance enhancement when the DVB-S2 frame structure is specifically considered.

The detector proposed by Nielsen [2] performs fully coherent correlation, which is targeted at channels without frequency offsets. As indicated in Figure 10 (Nielsen), it outperforms all other schemes at zero frequency offset. Nevertheless, very rapid performance degradation is experienced as the offset increases, and more than 50% MDP occurs when the normalized frequency offset exceeds 0.05. Gansman's detector [5] (labeled Gansman) performs multi-span correlation with the energy correction term included. Because of the existence of the coherent correlation term in addition to differential terms, performance tends to become degraded as the frequency offset increases. Another mixed-type detector is APDI [8, 9], which efficiently controls the detection complexity; its MDP is also shown in the figure.

6. Conclusion

We have presented the correlation schemes for improved detection performance over various channel conditions. The proposed L3 detector is shown to provide a substantial gain over all existing detection methods, regardless of the existence of frequency offsets. Further performance enhancement is achievable by using the L4 detector when the amount of frequency offset is relatively small. Presented results confirm that an appropriate energy correction in correlating detectors has a significant impact on the detection performance.

Acknowledgments

This work is supported in part by the IT R&D program of KCC/IITA 2007-S008-03, Development of 21 GHz Band Satellite Broadcasting Transmission Technology, and in part by the Special Research Grant of Sogang University.

References

[1] J. L. Massey, “Optimum frame synchronization,” IEEE Transactions on Communications, vol. 20, no. 2, pp. 115–119, 1972.

[2] P. Nielsen, “Some optimum and suboptimum frame synchronizers for binary data in Gaussian noise,” IEEE Transactions on Communications, vol. 21, no. 6, pp. 770–772, 1973.

[3] G.-G. Bi, “Performance of frame sync acquisition algorithms on the AWGN channel,” IEEE Transactions on Communications, vol. 31, no. 10, pp. 1196–1201, 1983.

[4] G. L. Lui and H. H. Tan, “Frame synchronization for Gaussian channels,” IEEE Transactions on Communications, vol. 35, no. 8, pp. 818–829, 1987.

[5] J. A. Gansman, M. P. Fitz, and J. V. Krogmeier, “Optimum and suboptimum frame synchronization for pilot-symbol-assisted modulation,” IEEE Transactions on Communications, vol. 45, no. 10, pp. 1327–1337, 1997.

[6] Z. Y. Choi and Y. H. Lee, “Frame synchronization in the presence of frequency offset,” IEEE Transactions on Communications, vol. 50, no. 7, pp. 1062–1065, 2002.

[7] S. Park, D. Park, H. Park, and K. Lee, “Low-complexity frequency-offset insensitive detection for orthogonal modulation,” Electronics Letters, vol. 41, no. 22, pp. 1226–1228, 2005.

[8] M. Villanti, P. Salmi, and G. E. Corazza, “Differential post detection integration techniques for robust code acquisition,” IEEE Transactions on Communications, vol. 55, no. 11, pp. 2172–2184, 2007.

[9] G. E. Corazza and R. Pedone, “Generalized and average likelihood ratio testing for post detection integration,” IEEE Transactions on Communications, vol. 55, no. 11, pp. 2159–2171, 2007.

[10] P. Kim, G. E. Corazza, R. Pedone, M. Villanti, D.-I. Chang, and D.-G. Oh, “Enhanced frame synchronization for DVB-S2 system under a large of frequency offset,” in Proceedings of IEEE Wireless Communications and Networking Conference (WCNC ’07), pp. 1183–1187, Hong Kong, March 2007.

[11] “Digital video broadcasting (DVB): second generation framing structure, channel coding and modulation system for broadcasting, interactive services, news gathering and other broadband satellite applications,” ETSI EN 302 307 v1.1.1, European Broadcasting Union, Geneva, Switzerland, June 2004.

[12] “Digital video broadcasting (DVB): user guidelines for the second generation system for broadcasting, interactive services, news gathering and other broadband satellite applications,” ETSI EN 302 307 v1.1.1, European Broadcasting Union, Geneva, Switzerland, February 2005.

[13] G. Acar, C. Kasparis, and P. T. Thompson, “The enhancement of DVB-S2 & DVB-RCS by adding additional mobile user capability,” in Proceedings of IET Seminar on Digital Video Broadcasting over Satellite: Present and Future, pp. 81–90, London, UK, November 2006.

[14] S. Cioni, C. P. Niebla, G. S. Granados, S. Scalise, A. Vanelli-Coralli, and M. A. V. Castro, “Advanced fade countermeasures for DVB-S2 systems in railway scenarios,” EURASIP Journal on Wireless Communications and Networking, vol. 2007, Article ID 49718, 17 pages, 2007.

[15] C. Morlet and A. Ginesi, “Introduction of mobility aspects for DVB-S2/RCS broadband systems,” in Proceedings of the International Workshop on Satellite and Space Communications (IWSSC ’06), pp. 93–97, Leganes, Spain, September 2006.

[16] Q. Li, X. Zeng, C. Wu, Y. Zhang, Y. Deng, and H. Jun, “Optimal frame synchronization for DVB-S2,” in Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS ’08), pp. 956–959, Seattle, Wash, USA, May 2008.


Research Article

Feedforward Data-Aided Phase Noise Estimation from a DCT Basis Expansion

Jabran Bhatti and Marc Moeneclaey

Department of Telecommunications and Information Processing, Ghent University, 9000 Ghent, Belgium

Correspondence should be addressed to Marc Moeneclaey, [email protected]

Received 1 July 2008; Revised 5 November 2008; Accepted 25 December 2008

Recommended by Erchin Serpedin

This contribution deals with phase noise estimation from pilot symbols. The phase noise process is approximated by an expansion of discrete cosine transform (DCT) basis functions containing only a few terms. We propose a feedforward algorithm that estimates the DCT coefficients without requiring detailed knowledge about the phase noise statistics. We demonstrate that the resulting (linearized) mean-square phase estimation error consists of two contributions: a contribution from the additive noise, which equals the Cramer-Rao lower bound, and a noise-independent contribution, which results from the phase noise modeling error. We investigate the effect of the symbol sequence length, the pilot symbol positions, the number of pilot symbols, and the number of estimated DCT coefficients on the estimation accuracy and on the corresponding bit error rate (BER). We propose a pilot symbol configuration that allows estimating any number of DCT coefficients not exceeding the number of pilot symbols, providing a considerable performance improvement as compared to other pilot symbol configurations. For large block sizes, the DCT-based estimation algorithm substantially outperforms algorithms that estimate only the time-average or the linear trend of the carrier phase.

Copyright © 2009 J. Bhatti and M. Moeneclaey. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

Phase noise refers to random perturbations in the carrier phase, caused by imperfections in both transmitter and receiver oscillators. Compensation of this phase noise is critical since these disturbances can considerably degrade the system performance. The phase noise process typically has a low-pass spectrum [1]. A description of the characteristics of oscillator phase noise is given in [2]. Discrete-time processes that have a bandwidth which is considerably less than the sampling frequency can often be modeled as an expansion of suitable basis functions that contains only a few terms. Such a basis expansion has been successfully applied in the context of channel estimation and equalization in wireless communications, where the coefficients of the channel impulse response are low-pass processes with a bandwidth that is limited by the Doppler frequency [3–5]. Several methods that try to tackle the phase noise problem exist.

(i) Designing oscillators with low phase noise reduces the need for accurate phase noise compensation algorithms. This, however, leads to expensive oscillators which are difficult to integrate on chip [6–8].

(ii) Phase noise can be tracked by means of a feedback algorithm that operates according to the principle of the phase-locked loop (PLL). As feedback algorithms give rise to rather long acquisition transients, they are not well suited to burst transmission systems [9, 10].

(iii) The observation interval is divided into subintervals and a feedforward algorithm is used to estimate within each subinterval the local time-average (or the linear trend) of the phase [9–11]. This corresponds to approximating the phase noise by a function that is constant (or linear) within each subinterval. Such algorithms avoid the long acquisition transients encountered with feedback algorithms. However, in order that the piecewise constant (or linear) approximation of the phase noise be accurate, the subintervals should be short, in which case a high sensitivity to additive noise occurs.

(iv) Recently, iterative joint estimation and decoding/detection algorithms have been proposed that make use of the a priori statistics of the phase noise process. A factor graph approach for the estimation of the Markov-type phase noise has been presented in [12, 13], while in [14, 15] sequential Monte Carlo methods combined with Kalman filtering are used to perform detection in the presence of phase noise. These algorithms are computationally rather complex, prevent the use of off-the-shelf decoders, and assume detailed knowledge about the phase noise statistics at the receiver. Less complex iterative phase noise estimation algorithms based on Wiener filtering have been presented in [16], but still require knowledge about the phase noise autocorrelation function at the receiver.

In this contribution, we apply the basis expansion model to the problem of phase noise estimation from pilot symbols only, using the orthogonal basis functions from the discrete cosine transform (DCT). In contrast to the case of channel estimation, the phase noise does not enter the observation model in a linear way. Section 2 presents the system description which includes the observation model and a general phase noise model. Also, the phase noise estimation algorithm, based on the estimation of only a few DCT coefficients, is derived. Section 3 contains the performance analysis of the proposed algorithm in terms of the mean-square error (MSE) of the phase estimate. The behavior of the linearized model in the frequency domain is examined in Section 4. Analysis results are confirmed by computer simulations in Section 5, which consider both the mean-square phase estimation error and the associated bit error rate (BER) degradation. Section 6 gives a complexity analysis of our algorithm. Conclusions are drawn in Section 7.

2. System Description

We consider the transmission of a block of $K$ data symbols over an AWGN channel that is affected by phase noise. The resulting received signal is represented as

$$r(k) = a(k)\, e^{j\theta(k)} + w(k) \quad \text{for } k = 0, \ldots, K-1, \tag{1}$$

where the index $k$ refers to the $k$th symbol interval of length $T$, $\{a(k)\}$ is a sequence of data symbols with symbol energy $E[|a(k)|^2] = E_s$, the additive noise $\{w(k)\}$ is a sequence of i.i.d. zero-mean circularly symmetric complex-valued Gaussian random variables with $E[|w(k)|^2] = N_0$, and $\theta(k)$ is a time-varying phase noise process with $K \times K$ correlation matrix $\mathbf{R}_\theta$. The symbol sequence $\{a(k)\}$ contains $K_P$ known pilot symbols at positions $k_i$, $i = 0, \ldots, K_P - 1$, with constant magnitude $|a(k_i)|^2 = E_s$. From the observation of the received signal at the pilot symbol positions $k_i$, an estimate $\hat{\theta}(k)$ of the time-varying phase $\theta(k)$ is to be produced. This phase estimate will be used to rotate the received signal before data detection, that is, the detection of the data symbols is based on $\{z(k)\} = \{r(k)\exp(-j\hat{\theta}(k))\}$. The detector is designed under the assumption of perfect carrier synchronization, that is, $\hat{\theta}(k) = \theta(k)$. For uncoded transmission, the detection algorithm reduces to symbol-by-symbol detection:

$$\hat{a}(k) = \arg\min_{a \in A} |z(k) - a|^2, \quad k \notin \{k_i,\ i = 0, \ldots, K_P - 1\} \tag{2}$$

with $A$ denoting the symbol constellation. The phase $\theta(k)$ can be represented as a weighted sum of $K$ basis functions over the interval $[0, K-1]$:

$$\theta(k) = \sum_{n=0}^{K-1} x_n \psi_n(k), \quad k = 0, \ldots, K-1. \tag{3}$$

As $\theta(k)$ is essentially a low-pass process, it can be well approximated by the weighted sum of a limited number $N$ ($\ll K$) of suitable basis functions:

$$\theta(k) \approx \sum_{n=0}^{N-1} x_n \psi_n(k), \quad k = 0, \ldots, K-1. \tag{4}$$

In this contribution, we make use of the orthonormal discrete cosine transform (DCT) basis functions, which are defined as

$$\psi_n(k) = \begin{cases} \sqrt{\dfrac{1}{K}}, & n = 0, \\[2mm] \sqrt{\dfrac{2}{K}} \cos\left( \dfrac{\pi n}{K} \left( k + \dfrac{1}{2} \right) \right), & n > 0. \end{cases} \tag{5}$$

Hence, from (3), $x_n$ is the $n$th DCT coefficient of $\theta(k)$. As $\psi_n(k)$ has its energy concentrated near the frequencies $n/(2KT)$ and $-n/(2KT)$, the DCT basis functions are well suited to represent a low-pass process by means of a small number of basis functions.
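The basis functions (5) and the truncated expansion (4) can be sketched in a few lines. The function names and the smooth test trajectory below are illustrative assumptions.

```python
import math

def dct_basis(n, k, K):
    """Orthonormal DCT basis function psi_n(k) of (5)."""
    if n == 0:
        return math.sqrt(1.0 / K)
    return math.sqrt(2.0 / K) * math.cos(math.pi * n / K * (k + 0.5))

def dct_coeffs(theta, N):
    """First N DCT coefficients x_n of a length-K phase trajectory."""
    K = len(theta)
    return [sum(theta[k] * dct_basis(n, k, K) for k in range(K)) for n in range(N)]

def reconstruct(x, K):
    """Expansion (3) or (4): sum of x_n * psi_n(k) over the available coefficients."""
    return [sum(x[n] * dct_basis(n, k, K) for n in range(len(x))) for k in range(K)]

# Hypothetical slowly varying phase trajectory
K = 16
theta = [0.2 * math.sin(2 * math.pi * k / (4 * K)) for k in range(K)]

approx = reconstruct(dct_coeffs(theta, 4), K)  # truncated model (4) with N = 4
exact = reconstruct(dct_coeffs(theta, K), K)   # full expansion (3)
print(max(abs(a - t) for a, t in zip(approx, theta)))
```

Because the basis is orthonormal, the coefficients are plain inner products and the full expansion with $N = K$ returns the original trajectory.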

In the following, we produce from the observation $\{r(k_i)\}$ at the pilot symbol positions $k_i$, with $i = 0, \ldots, K_P - 1$, an estimate $\hat{x}_n$ of the coefficients $x_n$, with $n = 0, \ldots, N-1$, using the phase model (4) with equality. The final estimate $\hat{\theta}(k)$ is obtained by computing the inverse DCT of $\{\hat{x}_n\}$:

$$\hat{\theta}(k) = \sum_{n=0}^{N-1} \hat{x}_n \psi_n(k) \quad \text{for } k = 0, \ldots, K-1. \tag{6}$$

However, as (4) is not an exact model of the true phase $\theta(k)$, the phase estimate is affected not only by the additive noise contained in the observation, but also by a phase noise modeling error. Considering the observations (1) at instants $k_i$, and assuming that (4) holds with equality, we obtain

$$\mathbf{r}_P = \mathbf{D}(\mathbf{x})\, \mathbf{a}_P + \mathbf{w}_P, \tag{7}$$

where for $i = 0, \ldots, K_P - 1$: $(\mathbf{r}_P)_i = r(k_i)$, $(\mathbf{w}_P)_i = w(k_i)$, $(\mathbf{a}_P)_i = a(k_i)$, and $\mathbf{D}(\mathbf{x})$ is a $K_P \times K_P$ diagonal matrix with

$$(\mathbf{D}(\mathbf{x}))_{i,i} = e^{j(\boldsymbol{\Psi}_P \mathbf{x})_i} \tag{8}$$

and $(\boldsymbol{\Psi}_P)_{i,n} = \psi_n(k_i)$, $(\mathbf{x})_n = x_n$, $n = 0, \ldots, N-1$ with $N \le K_P$. The $K_P \times 1$ vectors $\mathbf{r}_P$, $\mathbf{a}_P$, and $\mathbf{w}_P$ can be viewed as resulting from subsampling $\{r(k)\}$, $\{a(k)\}$, and $\{w(k)\}$ at the instants $k_i$ that correspond to the pilot symbol positions. Similarly, the $n$th column of the $K_P \times N$ matrix $\boldsymbol{\Psi}_P$ is obtained by subsampling the $n$th DCT basis function $\psi_n(k)$. Maximum likelihood estimation of $\mathbf{x}$ from $\mathbf{r}_P$ results in

$$\hat{\mathbf{x}}_{ML} = \arg\min_{\mathbf{x}} \left| \mathbf{r}_P - \mathbf{D}(\mathbf{x})\, \mathbf{a}_P \right|^2. \tag{9}$$

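As $\mathbf{x}$ enters the observation $\mathbf{r}_P$ in a nonlinear way, the ML estimate is not easily obtained. Therefore, we resort to a suboptimum ad hoc estimation of $\mathbf{x}$, which is based on the argument (angle) of the complex-valued observations. However, as the function $\arg(z)$ reduces the argument of $z$ to an interval $[-\pi, \pi]$, taking $\arg(r(k_i))$ might give rise to phase wrapping, especially when the time-average of $\theta(k)$ is close to $-\pi$ or $\pi$. In order to reduce the probability of phase wrapping, we first rotate the observation $\mathbf{r}$ over an angle $\hat{\theta}_{avg}$ that is close to the time-average of $\theta(k)$, then we estimate the DCT coefficients of the fluctuation $\theta(k) - \hat{\theta}_{avg}$ and finally we compute the phase estimate $\hat{\theta}(k)$. We select

$$\hat{\theta}_{avg} = \arg\left( \sum_{i=0}^{K_P - 1} r(k_i) \right) \tag{10}$$

and construct $\mathbf{r}'$ with

$$(\mathbf{r}')_i = r'(k_i) = \arg\left( r(k_i)\, a^*(k_i) \exp(-j\hat{\theta}_{avg}) \right) \quad \text{for } i = 0, \ldots, K_P - 1. \tag{11}$$

We obtain an estimate $\hat{\mathbf{x}}'$ of the DCT coefficients of the fluctuation $\theta(k) - \hat{\theta}_{avg}$ through a least-squares fit $\hat{\mathbf{x}}' = \arg\min_{\mathbf{x}} |\mathbf{r}' - \boldsymbol{\Psi}_P \mathbf{x}|^2$, yielding

$$\hat{\mathbf{x}}' = \left( \boldsymbol{\Psi}_P^T \boldsymbol{\Psi}_P \right)^{-1} \boldsymbol{\Psi}_P^T \mathbf{r}'. \tag{12}$$

In order that $(\boldsymbol{\Psi}_P^T \boldsymbol{\Psi}_P)^{-1}$ exists, we need $N \le K_P$. Finally, the phase estimate is given by

$$\hat{\boldsymbol{\theta}} = \hat{\theta}_{avg} \mathbf{1}_K + \boldsymbol{\Psi}_K \hat{\mathbf{x}}' = \hat{\theta}_{avg} \mathbf{1}_K + \mathbf{M} \mathbf{r}', \tag{13}$$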

where $\mathbf{M} = \boldsymbol{\Psi}_K (\boldsymbol{\Psi}_P^T \boldsymbol{\Psi}_P)^{-1} \boldsymbol{\Psi}_P^T$ and $(\hat{\boldsymbol{\theta}})_k = \hat{\theta}(k)$, $(\mathbf{1}_K)_k = 1$, $(\boldsymbol{\Psi}_K)_{k,n} = \psi_n(k)$, $k = 0, \ldots, K-1$; $n = 0, \ldots, N-1$. Note from (13) that the estimation algorithm does not need specific knowledge about the phase noise process. As $r'(k_i)$ from (11) can be viewed as a noisy version of $\theta(k_i) - \hat{\theta}_{avg}$, the phase estimate $\hat{\boldsymbol{\theta}}$ from (13), or, equivalently, the phase estimate $\hat{\theta}(k)$ from (6), can be interpreted as an interpolated version of the subsampled noisy phase trajectory. The estimation of the phase trajectory involves the inversion of the $N \times N$ matrix $\boldsymbol{\Psi}_P^T \boldsymbol{\Psi}_P$, which depends on the pilot symbol positions $\{k_i,\ i = 0, \ldots, K_P - 1\}$. Now, we point out that the pilot symbol positions can be selected such that $\boldsymbol{\Psi}_P^T \boldsymbol{\Psi}_P$ is diagonal, or, equivalently, that the $N$ columns of the $K_P \times N$ matrix $\boldsymbol{\Psi}_P$ are orthogonal. Such selection of $\{k_i\}$ avoids the need for matrix inversion in (12). Denoting by $\phi_n(i)$ the orthonormal DCT basis functions of length $K_P$, it is easily verified that selecting $\{k_i\}$ such that

$$k_i = i \frac{K}{K_P} + \frac{K - K_P}{2 K_P}, \quad i = 0, \ldots, K_P - 1 \tag{14}$$

gives rise to

$$\psi_n(k_i) = \sqrt{\frac{K_P}{K}}\, \phi_n(i) \quad \text{for } n = 0, \ldots, K_P - 1, \tag{15}$$

so that

$$\boldsymbol{\Psi}_P^T \boldsymbol{\Psi}_P = \frac{K_P}{K} \mathbf{I}_N \tag{16}$$

with $\mathbf{I}_N$ denoting the $N \times N$ identity matrix. Equations (12) and (13) then reduce to

$$\hat{\mathbf{x}}' = \frac{K}{K_P} \boldsymbol{\Psi}_P^T \mathbf{r}', \tag{17}$$

$$\hat{\boldsymbol{\theta}} = \hat{\theta}_{avg} \mathbf{1}_K + \frac{K}{K_P} \boldsymbol{\Psi}_K \boldsymbol{\Psi}_P^T \mathbf{r}'. \tag{18}$$

In order that all $k_i$ from (14) be integer, $K$ must be an odd multiple of $K_P$, that is, $K = (2d+1) K_P$, yielding $k_i = (2d+1) i + d$. The resulting pilot symbol configuration is suited for estimating any number of DCT coefficients not exceeding $K_P$. When $K$ is not an odd multiple of $K_P$, rounding the right-hand side of (14) to the nearest integer gives rise to pilot symbol positions that still yield an essentially diagonal matrix $\boldsymbol{\Psi}_P^T \boldsymbol{\Psi}_P$, in which case the simplified equations (17) and (18) can still be used.
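The steps (10), (11), (17), and (18) can be put together in a short sketch for the pilot configuration (14). Everything named here (`pilot_positions`, `estimate_phase`, the test trajectory, and the all-ones pilots) is an assumption made for illustration. Note that (10) as printed sums $r(k_i)$ without the pilot conjugate, which matches the example because all pilots equal 1.

```python
import cmath
import math

def dct_basis(n, k, K):
    """Orthonormal DCT basis function psi_n(k) of (5)."""
    if n == 0:
        return math.sqrt(1.0 / K)
    return math.sqrt(2.0 / K) * math.cos(math.pi * n / K * (k + 0.5))

def pilot_positions(K, KP):
    """Pilot positions (14), rounded to the nearest integer."""
    return [round(i * K / KP + (K - KP) / (2 * KP)) for i in range(KP)]

def estimate_phase(r, a_pilots, positions, K, N):
    """Feedforward phase estimate (18), assuming Psi_P^T Psi_P = (KP/K) I_N."""
    KP = len(positions)
    # (10) as printed: argument of the sum of the pilot observations
    theta_avg = cmath.phase(sum(r[k] for k in positions))
    # (11): de-rotated, modulation-free arguments at the pilot positions
    r_prime = [cmath.phase(r[k] * a.conjugate() * cmath.exp(-1j * theta_avg))
               for k, a in zip(positions, a_pilots)]
    # (17): DCT coefficients of the fluctuation without matrix inversion
    x = [K / KP * sum(dct_basis(n, k, K) * rp for k, rp in zip(positions, r_prime))
         for n in range(N)]
    # (18): interpolate the phase trajectory over the whole block
    return [theta_avg + sum(x[n] * dct_basis(n, k, K) for n in range(N))
            for k in range(K)]

# Hypothetical noise-free example; theta lies in the span of psi_0 and psi_1
K, KP, N = 105, 15, 3
positions = pilot_positions(K, KP)
theta = [0.3 + 0.4 * math.cos(math.pi * (k + 0.5) / K) for k in range(K)]
r = [cmath.exp(1j * t) for t in theta]
a_pilots = [1 + 0j] * KP
theta_hat = estimate_phase(r, a_pilots, positions, K, N)
max_error = max(abs(th - t) for th, t in zip(theta_hat, theta))
print(positions[:3], max_error)
```

In this noise-free example the model (4) holds exactly, so the estimate should reproduce the trajectory up to rounding error. With noise or a richer phase process, the error behaves as analyzed in Section 3.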

3. Performance Analysis

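As the observation vector $\mathbf{r}_P$ is a nonlinear function of the carrier phase, an exact analytical performance analysis is not feasible. Instead, we will resort to a linearization of the argument function in (11) in order to obtain tractable results.

Linearization of the argument function yields

$$\begin{aligned} r'(i) &= \arg\left( r(k_i)\, a^*(k_i)\, e^{-j\hat{\theta}_{avg}} \right) \\ &= \arg\left( e^{j(\theta(k_i) - \hat{\theta}_{avg})} \left( E_s + a^*(k_i)\, w(k_i)\, e^{-j\theta(k_i)} \right) \right) \\ &\approx \theta(k_i) - \hat{\theta}_{avg} + n_P(i) \end{aligned} \tag{19}$$

for $i = 0, \ldots, K_P - 1$, where $\{n_P(i)\}$ is a sequence of i.i.d. zero-mean Gaussian random variables with variance $N_0/(2E_s)$. Note that (19) incorporates the true phase $\theta(k_i)$ instead of the approximate model (4), so that our performance analysis will take the modeling error into account. In order that the linearization in (19) be valid, we need $|\theta(k_i) - \hat{\theta}_{avg}| < \pi$ (because $|\arg(z)| < \pi$) and $|w(k_i)|^2 \ll E_s$; hence, the phase noise fluctuations should not cause phase wrapping and $E_s/N_0$ should be sufficiently large. Substituting (19) into (13), and noting that the terms in $\hat{\theta}_{avg}$ cancel because the constant vector is spanned by $\psi_0$ (so that $\mathbf{M}\mathbf{1}_{K_P} = \mathbf{1}_K$), yields

$$\hat{\boldsymbol{\theta}} = \mathbf{M}(\boldsymbol{\theta}_P + \mathbf{n}_P) = \mathbf{M}\mathbf{S}\boldsymbol{\theta} + \mathbf{M}\mathbf{n}_P, \tag{20}$$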

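where $(\mathbf{n}_P)_i = n_P(i)$, $(\boldsymbol{\theta}_P)_i = \theta(k_i)$, and the $K_P \times K$ matrix $\mathbf{S}$ is such that its $i$th row has a 1 at the $k_i$th column and zeroes elsewhere ($i = 0, \ldots, K_P - 1$). The estimation error resulting from (20) is given by

$$\hat{\boldsymbol{\theta}} - \boldsymbol{\theta} = (\mathbf{M}\mathbf{S} - \mathbf{I}_K)\boldsymbol{\theta} + \mathbf{M}\mathbf{n}_P, \tag{21}$$

where $\mathbf{I}_K$ denotes the $K \times K$ identity matrix. If the model (4) was exact, we would have $\boldsymbol{\theta} = \boldsymbol{\Psi}_K \mathbf{x}$ and $\boldsymbol{\theta}_P = \boldsymbol{\Psi}_P \mathbf{x}$, yielding

$$\hat{\boldsymbol{\theta}} = \boldsymbol{\theta} + \mathbf{M}\mathbf{n}_P, \tag{22}$$

in which case the estimation error would be caused only by the additive noise.

As a performance measure of the estimation algorithm, we consider the mean-square error (MSE), defined as

$$\mathrm{MSE} = \frac{1}{K} E\left[ \mathrm{trace}\left( (\hat{\boldsymbol{\theta}} - \boldsymbol{\theta})(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta})^T \right) \right]. \tag{23}$$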

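Substituting (21) into (23) yields

$$\mathrm{MSE} = \frac{1}{K} \frac{N_0}{2E_s}\, \mathrm{trace}\left( (\boldsymbol{\Psi}_P^T \boldsymbol{\Psi}_P)^{-1} \right) + \mathrm{MSE}_\infty, \tag{24}$$

where

$$\mathrm{MSE}_\infty = \frac{1}{K}\, \mathrm{trace}\left( (\mathbf{M}\mathbf{S} - \mathbf{I}_K)\, \mathbf{R}_\theta\, (\mathbf{M}\mathbf{S} - \mathbf{I}_K)^T \right). \tag{25}$$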

The first term in (24) denotes the contribution from the additive noise, whereas the second term in (24) constitutes an MSE floor, caused by the phase noise modeling error. The phase noise statistics affect the MSE floor through the autocorrelation matrix $\mathbf{R}_\theta$. The MSE floor decreases with increasing $N$ (because the modeling error is reduced when more DCT coefficients are taken into account), whereas the additive noise contribution to the MSE increases with $N$ (because $N$ parameters need to be estimated). Hence, there is an optimum value of $N$ that minimizes the MSE.

From the nonlinear observation model (7), whichassumes that (4) holds with equality, we compute theCramer-Rao lower bound on the MSE (23) resulting fromany unbiased estimate x of the DCT coefficients of θ(k):

MSE ≥ 1K

trace(

J−1). (26)

In (26), J denotes the Fisher information matrix related to the estimation of x from (7), which is found to be

$$(\mathbf{J})_{n,n'} = \frac{2E_s}{N_0}\,\left(\boldsymbol{\Psi}_P^T\boldsymbol{\Psi}_P\right)_{n,n'}. \tag{27}$$

Combining (26) with (27) yields the following performance bound:

$$\mathrm{MSE} \ge \frac{1}{K}\,\frac{N_0}{2E_s}\,\mathrm{trace}\!\left((\boldsymbol{\Psi}_P^T\boldsymbol{\Psi}_P)^{-1}\right). \tag{28}$$

Comparison of (24) and (28) indicates that our ad hoc algorithm (13) yields the minimum possible (over all unbiased estimates) noise contribution to the MSE (assuming that the linearization of the observation model is valid).

When the pilot symbol positions {ki} are selected according to (14), the Cramer-Rao bound (28) reduces to

$$\mathrm{MSE} \ge \frac{N_0}{2E_s}\,\frac{N}{K_P}, \tag{29}$$

which indicates that the sensitivity to additive noise increases with the number (N) of estimated DCT coefficients.
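The step from (28) to (29) can be made explicit. The sketch below assumes that the pilot positions (14) make the columns of ΨP orthogonal with squared norm KP/K; this property is inferred from the orthogonality remark in the conclusions and from the factor K/KP used in the complexity analysis, and is not quoted from (14) itself.

```latex
\begin{align*}
\boldsymbol{\Psi}_P^T\boldsymbol{\Psi}_P &= \frac{K_P}{K}\,\mathbf{I}_N
  && \text{(assumed property of the positions (14))} \\
\mathrm{trace}\!\left((\boldsymbol{\Psi}_P^T\boldsymbol{\Psi}_P)^{-1}\right)
  &= \frac{K}{K_P}\,\mathrm{trace}(\mathbf{I}_N) = \frac{NK}{K_P} \\
\mathrm{MSE} &\ge \frac{1}{K}\,\frac{N_0}{2E_s}\,\frac{NK}{K_P}
  = \frac{N_0}{2E_s}\,\frac{N}{K_P}
\end{align*}
```

For instance, with KP = 15 and N = 4 the bound equals (4/15) N0/(2Es), four times the value obtained for N = 1.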

4. Frequency-Domain Analysis

After linearization, (20) relates the phase estimate θ̂ to the actual phase θ and the additive noise nP. In the absence of additive noise, the estimator can be viewed as a linear system that transforms θ into θ̂ by means of the transfer matrix MS. In order to analyze this system in the frequency domain, we consider an input θn with (θn)k = exp(j2πkn/K), that is, θn contains only the frequency n/K. We investigate the mean-square error MSEn between the input θn and the output θ̂n = MSθn; MSEn is given by (25), with Rθ replaced by θn θn^H, where the superscript H indicates conjugate transpose.

As θn is periodic in n with period K, the same periodicity holds for MSEn. Assuming the pilot symbol positions are according to (14) with K = 105 and KP = 15, Figure 1 shows MSEn as a function of n/K, with n/K in the interval [−1/2, 1/2] and N = 7. The behavior of MSEn is explained by noting that subsampling θn at the instants ki (with spacing K/KP) gives rise to aliasing. Frequencies n/K and (n + KP)/K yield the same subsampled phase trajectory. In the following discussion, the intervals IKP and IN are defined as [−(KP − 1)/(2K), (KP − 1)/(2K)] and [−(N − 1)/(2K), (N − 1)/(2K)], respectively; note that IN ⊂ IKP.

(i) As the first N basis functions of the DCT transform cover the frequency interval IN, we get θ̂n ≈ θn and MSEn ≈ 0 when n/K is in IN.

(ii) When n/K is in the interval IKP, but outside IN, we get θ̂n ≈ 0 and MSEn ≈ 1.

(iii) Suppose n = mKP + n′, with m ≠ 0, |m| < K/(2KP) and n′/K in IKP. Because of aliasing, θn is interpreted as θn′ exp(jφm) with φm = 2πm(K − KP)/(2K). When n′/K is in the interval IN, we get θ̂n ≈ θn′ exp(jφm). The resulting estimation error is the sum of two complex exponentials with frequencies n/K and n′/K, yielding MSEn ≈ 2. When n′/K is not in IN, we get θ̂n ≈ 0 and MSEn ≈ 1.

It follows from Figure 1 that the estimator can be viewed as a low-pass system with bandwidth B = (N − 1)/(2K). Basically, the frequency components n/K of θ with |n/K| < B are tracked by the estimator, whereas the components with |n/K| > B contribute to the MSE.
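The single-tone experiment behind this low-pass interpretation can be reproduced numerically. The following is a minimal sketch under stated assumptions: an orthonormal DCT-II basis, pilot positions ki = (K/KP − 1)/2 + i K/KP (a form of (14) inferred from the text, which places the pilots at positions 2 and 7 for K = 10), and the estimator matrix M = (K/KP) ΨK ΨP^T taken from the complexity analysis. The function names are illustrative only.

```python
import cmath
import math


def dct_basis(n, k, K):
    """Orthonormal DCT-II basis function psi_n(k) for block length K (assumed basis)."""
    scale = math.sqrt(1.0 / K) if n == 0 else math.sqrt(2.0 / K)
    return scale * math.cos(math.pi * n * (2 * k + 1) / (2 * K))


def pilot_positions(K, KP):
    """Assumed form of (14): one pilot in the middle of each segment of K/KP symbols."""
    step = K // KP
    return [(step - 1) // 2 + i * step for i in range(KP)]


def mse_single_tone(n, K=105, KP=15, N=7):
    """MSE_n between theta_n(k) = exp(j*2*pi*k*n/K) and its noiseless estimate."""
    pilots = pilot_positions(K, KP)
    theta = [cmath.exp(2j * math.pi * k * n / K) for k in range(K)]
    # DCT coefficients: x_hat = (K/KP) * Psi_P^T * theta_P
    x_hat = [
        (K / KP) * sum(dct_basis(m, k, K) * theta[k] for k in pilots)
        for m in range(N)
    ]
    # Interpolated trajectory: theta_hat = Psi_K * x_hat
    theta_hat = [
        sum(dct_basis(m, k, K) * x_hat[m] for m in range(N)) for k in range(K)
    ]
    return sum(abs(est - true) ** 2 for est, true in zip(theta_hat, theta)) / K


if __name__ == "__main__":
    for n in (0, 1, 5, 15, 16):
        print(n, round(mse_single_tone(n), 3))
```

The tone n = 0 lies inside IN and is recovered without error, whereas n = KP aliases onto the constant basis function and gives the worst case described in item (iii).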

5. Simulation Results

In this section, we assess the performance of the proposed technique in terms of the MSE of the phase estimate and the resulting BER degradation by means of computer simulations. In our simulations, we will consider two types of phase noise, that is, Wiener phase noise and first-order phase noise. The (discrete-time) first-order phase noise process θ(k) can be viewed as the output of a one-pole filter driven by white Gaussian noise:

θ(k + 1) = (1− α)θ(k) + Δ(k), (30)

where {Δ(k)} is a sequence of i.i.d. zero-mean Gaussian random variables with variance σΔ². The corresponding phase noise power spectrum and phase noise variance are given by

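$$S_\theta^{\text{1st-order}}\!\left(e^{j2\pi fT}\right) = \frac{\sigma_\Delta^2}{\left|\exp(j2\pi fT) - 1 + \alpha\right|^2} \approx \frac{\sigma_\Delta^2}{\left|j2\pi fT + \alpha\right|^2}, \tag{31}$$

$$\sigma_\theta^2 = \frac{\sigma_\Delta^2}{\alpha(2 - \alpha)} \approx \frac{\sigma_\Delta^2}{2\alpha}. \tag{32}$$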

The approximations in (31) and (32) hold for fT ≪ 1/2 and α ≪ 1. It follows from (31) that α/(2πT) is the 3 dB frequency of the power spectrum. The first-order phase noise models the phase instabilities of an oscillator signal that results from a phase-locked loop (PLL) circuit. The (discrete-time) Wiener phase noise θ(k) is described by the following system equation:

θ(k + 1) = θ(k) + Δ(k), k = 0, . . . ,K − 2, (33)

where the initial phase noise value θ(0) is uniformly distributed in [−π, π] and Δ(k) has the same meaning as in (30). Hence, θ(k) can be viewed as the output of an integrator with a white noise input. From (33), it follows that the variance of the Wiener phase noise increases linearly with the time index k, which indicates that the process is nonstationary.
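The two recursions (30) and (33) are straightforward to simulate. The sketch below is illustrative: the function names are hypothetical, and starting the first-order process from its stationary distribution (variance given by (32)) is an assumption made here, since the text does not specify the initial condition of (30).

```python
import math
import random


def first_order_variance(sigma2_delta, alpha):
    """Stationary variance of the first-order process, exact form of (32)."""
    return sigma2_delta / (alpha * (2.0 - alpha))


def wiener_phase_noise(K, sigma_delta, rng):
    """K samples following (33); theta(0) is uniform in [-pi, pi]."""
    theta = [rng.uniform(-math.pi, math.pi)]
    for _ in range(K - 1):
        theta.append(theta[-1] + rng.gauss(0.0, sigma_delta))
    return theta


def first_order_phase_noise(K, sigma_delta, alpha, rng):
    """K samples following (30), started in steady state (an assumption)."""
    sigma_theta = math.sqrt(first_order_variance(sigma_delta ** 2, alpha))
    theta = [rng.gauss(0.0, sigma_theta)]
    for _ in range(K - 1):
        theta.append((1.0 - alpha) * theta[-1] + rng.gauss(0.0, sigma_delta))
    return theta


if __name__ == "__main__":
    rng = random.Random(2009)
    sigma_delta = math.radians(3.0)  # sigma_Delta = 3 degrees, as in the simulations
    print(wiener_phase_noise(5, sigma_delta, rng))
    print(first_order_phase_noise(5, sigma_delta, 0.015, rng))
```

With σΔ² = 0.0027 rad² and α = 0.015, the exact variance in (32) differs from the approximation σΔ²/(2α) by less than 1%.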

Comparing (33) and (30), it follows that the Wiener phase noise can be interpreted as a limiting case of first-order phase noise, in the limit for α → 0. Hence, one can formally define the Wiener phase noise spectrum as the limit of the first-order spectrum (31) for α → 0:

$$S_\theta^{\text{Wiener}}\!\left(e^{j2\pi fT}\right) = \frac{\sigma_\Delta^2}{\left|\exp(j2\pi fT) - 1\right|^2} \approx \frac{\sigma_\Delta^2}{4\pi^2 f^2 T^2}, \tag{34}$$

where the approximation in (34) holds for |fT| ≪ 1/2. Note that the Wiener phase noise spectrum becomes unbounded at f = 0, which is a consequence of the variance increasing linearly with time. In contrast, the complex envelope exp(jθ) of the oscillator signal can be shown to be a stationary process with a Lorentzian power spectrum [1]. The Wiener phase noise model is often used to describe the phase noise process of a free-running oscillator, although more elaborate models also exist, involving a phase noise spectrum that consists of a sum of terms of the form A_m f^(−m), m = 0, . . . , 4 [10, 17–19]. In order to reduce the strong low-frequency components of the phase noise resulting from a free-running oscillator, the oscillator is often incorporated in a PLL circuit; a first-order PLL gives rise to the first-order phase noise process (30) [17].

Figure 1: MSE as a function of n/K for K = 105, KP = 15, and N = 7.

Figure 2: Power spectrum for Wiener phase noise and first-order phase noise. (Normalized spectrum S(exp(j2πfT))/S0 versus normalized frequency f/f3 dB.)

Figure 2 shows the first-order phase noise power spectrum, normalized by its value S0 at f = 0, as a function of the normalized frequency f/f3 dB, with f3 dB = α/(2πT); also displayed is the Wiener phase noise power spectrum (normalized by the same S0). As the same value of σΔ² has been used for both types of phase noise, both spectra have the same high-frequency content.
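The behavior of the two normalized spectra can be checked directly from the exact expressions in (31) and (34). The sketch below uses hypothetical function names and the parameter values α = 0.015 and σΔ² = 0.0027 rad² that appear in the simulation figures.

```python
import cmath
import math


def first_order_psd(fT, sigma2_delta, alpha):
    """Exact first-order spectrum (31) at normalized frequency fT."""
    return sigma2_delta / abs(cmath.exp(2j * math.pi * fT) - 1.0 + alpha) ** 2


def wiener_psd(fT, sigma2_delta):
    """Exact Wiener spectrum (34); unbounded at fT = 0, so call with fT != 0."""
    return sigma2_delta / abs(cmath.exp(2j * math.pi * fT) - 1.0) ** 2


if __name__ == "__main__":
    sigma2_delta, alpha = 0.0027, 0.015
    f3db = alpha / (2.0 * math.pi)          # normalized 3 dB frequency f_3dB * T
    s0 = first_order_psd(0.0, sigma2_delta, alpha)
    for ratio in (0.1, 1.0, 10.0, 100.0):   # values of f / f_3dB
        fT = ratio * f3db
        print(ratio,
              first_order_psd(fT, sigma2_delta, alpha) / s0,
              wiener_psd(fT, sigma2_delta) / s0)
```

At f = f3 dB the first-order spectrum has dropped to about half of S0, and for f well above f3 dB the two normalized spectra nearly coincide.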

In the following simulations, Wiener phase noise is assumed, unless noted otherwise. First, we assume transmission of a block of length K = 105 symbols, consisting of KD = 90 uncoded QPSK data symbols and KP = 15 constant-energy pilot symbols that are inserted into the sequence according to (14).

(i) Figure 3 shows the MSE of the phase estimate in the absence of phase noise as a function of Es/N0 when N = 1, 4, and 10 DCT coefficients are estimated; in addition, these simulation results are compared to the corresponding CRB (29). We observe that the CRB is achieved for sufficiently high values of Es/N0. For small Es/N0, the MSE exceeds the CRB, which is in agreement with the fact that the linearized observation model from (19) is no longer accurate in the low-SNR region. Furthermore, it is confirmed that the contribution from the additive noise to the MSE is proportional to the number of estimated coefficients N.

Figure 3: MSE in the absence of phase noise compared to the corresponding CRB. K = 105, KP = 15. (MSE in rad² versus Es/N0 in dB; curves for N = 1, 4, 10 and the corresponding CRBs.)

(ii) Figure 4 shows the MSE as a function of Es/N0 for N = 1, 4, and 10, but this time in the presence of Wiener phase noise with σΔ² = 0.0027 rad² (which corresponds to "strong" phase noise, with σΔ = 3°). We observe an MSE floor in the high-Es/N0 region, which can be reduced by increasing the number N of estimated coefficients. Figure 4 also confirms that for low Es/N0, the MSE increases when N increases. These high-Es/N0 and low-Es/N0 behaviors indicate that for given K, KP, and Es/N0, the MSE can be minimized by proper selection of N.

(iii) Figure 5 shows the bit error rate (BER) as a function of Eb/N0 (Eb is the energy per transmitted bit, Es = 2(1 − η)Eb for QPSK) for N = 1, 4, and 10. The reference BER curve corresponds to a system with perfect synchronization and no pilot symbols (η = 0). We observe that for low Eb/N0, it is sufficient to estimate only the time-average of the phase (i.e., N = 1). Estimating a higher number of DCT coefficients can lead to a worse BER performance for low Eb/N0 because the MSE of the phase estimate due to additive noise increases with N. At high Eb/N0, a BER floor occurs which decreases with increasing N, so in this region it becomes beneficial to estimate more than just one DCT coefficient. Hence, the optimal number of estimated coefficients Nopt will depend on the operating Eb/N0.

Figure 4: MSE when Wiener phase noise with σΔ = 3° is present. K = 105, KP = 15. (MSE in rad² versus Es/N0 in dB; curves for N = 1, 4, 10.)

(iv) Figure 6 compares the BER degradations at BERref = 10⁻⁴ resulting from Wiener phase noise and first-order phase noise; the value of σΔ² is the same for both phase noise processes, such that the Wiener phase noise spectrum and first-order phase noise spectrum are the same for large f. (The BER degradation caused by some impairment is characterized by the increase (in dB) of Eb/N0 (as compared to the case of no impairment) needed to maintain the BER at a specified reference level.) As the 3 dB frequency α/(2πT) of the first-order phase noise is less than B/T, the frequency contents of the Wiener phase noise and the first-order phase noise outside the estimator bandwidth are essentially the same, and the corresponding BER curves are nearly coincident; this is in agreement with the analysis from Section 4, where we showed that the low-frequency components of the phase noise practically do not contribute to the phase error. It is also confirmed that there is an optimum value of N that minimizes the BER degradation; this optimum N increases with σΔ.

Next, we study the influence of the pilot symbol positions in the symbol sequence, assuming Wiener phase noise with σΔ = 3°. The following scenarios are considered (see Figure 7), with KP = 15.

(i) The pilot symbols are inserted according to (14)(SCEN1).


(ii) All pilot symbols are located in the middle of the sequence (SCEN2).

(iii) ⌈KP/2⌉ pilot symbols are inserted at the beginning of the sequence, the remaining ⌊KP/2⌋ pilot symbols are placed at the end (SCEN3).

(iv) The KP pilot symbols are placed equidistantly at positions {0, K/KP, . . . , (KP − 1)K/KP} (SCEN4).

(v) We divide the total number of 15 pilot symbols into 3 clusters of 5 consecutive pilot symbols each. The 3 clusters are centered at the positions (14) that correspond to KP = 3 (SCEN5).

(vi) We divide the total number of 15 pilot symbols into 5 clusters of 3 consecutive pilot symbols each. The 5 clusters are centered at the positions (14) that correspond to KP = 5 (SCEN6).

Figure 8 shows the BER for each scenario with N = 4. We observe that SCEN2 and SCEN3 lead to essentially the same BER performance, which turns out to be very poor. The BER resulting from SCEN5 is slightly better, but still poor. Much better BER performance is obtained for SCEN1, SCEN4, and SCEN6, with SCEN1 yielding the best performance. The poor performance resulting from SCEN2, SCEN3, and SCEN5 comes from the poor conditioning of the 15 × 4 matrix ΨP, yielding very large values when computing the inverse of ΨP^T ΨP. As the DCT basis functions ψ0(k), . . . , ψ3(k) change only slowly with k, SCEN2 yields a matrix ΨP with nearly identical rows, so it behaves like a matrix of rank 1. Similarly, the matrices ΨP that correspond to SCEN3 and SCEN5 behave like matrices of ranks 2 and 3, respectively. Hence, when the pilot symbols are placed in a number of clusters that is less than the number (N) of DCT coefficients to be estimated, poor performance results. For SCEN1, SCEN4, and SCEN6, the number of pilot symbol clusters exceeds N; the corresponding matrices ΨP are full-rank (rank = 4), and good performance results. Note that SCEN1 and SCEN4 can cope with values of N up to KP, whereas SCEN6 cannot handle values of N in excess of 5.
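The conditioning argument can be illustrated by comparing trace((ΨP^T ΨP)^−1), the quantity that scales the noise term in (24), for a spread-out and a clustered pilot arrangement. This is a sketch under assumptions: an orthonormal DCT-II basis, SCEN1 positions of the form (K/KP − 1)/2 + i K/KP (inferred, not quoted from (14)), and SCEN2 taken as the 15 consecutive positions 45 to 59. All function names are illustrative.

```python
import math


def dct_basis(n, k, K):
    """Orthonormal DCT-II basis function psi_n(k) (assumed basis)."""
    scale = math.sqrt(1.0 / K) if n == 0 else math.sqrt(2.0 / K)
    return scale * math.cos(math.pi * n * (2 * k + 1) / (2 * K))


def gram_matrix(pilots, K, N):
    """N x N matrix Psi_P^T Psi_P for the given pilot positions."""
    return [
        [sum(dct_basis(a, k, K) * dct_basis(b, k, K) for k in pilots)
         for b in range(N)]
        for a in range(N)
    ]


def trace_of_inverse(A):
    """Trace of A^{-1} via Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return sum(aug[i][n + i] for i in range(n))


K, KP, N = 105, 15, 4
step = K // KP
scen1 = [(step - 1) // 2 + i * step for i in range(KP)]   # assumed form of (14)
scen2 = list(range((K - KP) // 2, (K - KP) // 2 + KP))    # assumed midamble: 45..59
trace_scen1 = trace_of_inverse(gram_matrix(scen1, K, N))
trace_scen2 = trace_of_inverse(gram_matrix(scen2, K, N))

if __name__ == "__main__":
    print("SCEN1:", trace_scen1)
    print("SCEN2:", trace_scen2)
```

For SCEN1 the trace equals NK/KP = 28, consistent with (29), while the clustered arrangement gives a value that is larger by orders of magnitude.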

In the following, we investigate the influence of the number of pilot symbols on the MSE and the BER. The constant-energy pilot symbols are inserted into the data sequence according to (14). For (14) to hold, the block length K should be an odd multiple of the number of pilot symbols KP. We assume a total block length K = 105 and simulate the BER and MSE for KP = 7, 15, and 35. Figure 9 shows the BER degradation at BER = 10⁻⁴ with respect to the reference system, for a fixed ratio η = KP/K = 20% and various values of the block length K. The BER degradation −10 log10(1 − η) due to the insertion of pilot symbols (which amounts to 0.97 dB for η = 0.2) is included. The following observations can be made.

(i) For given block size K, there is an optimum number Nopt of DCT coefficients to be estimated that minimizes the BER degradation. This is consistent with the observation that the MSE of the phase estimate can be minimized by a suitable choice of N.

(ii) For very small K, Nopt = 1. The optimum value Nopt increases with increasing K because more DCT coefficients are needed to model the phase fluctuations when K gets larger. Keeping N = 1 yields very large degradations when K increases.

Figure 5: BER when Wiener phase noise with σΔ = 3° is present. K = 105, KP = 15. (BER versus Eb/N0 in dB; curves for the reference system and for N = 1, 4, 10.)

Figure 6: BER degradation as a function of the number of estimated coefficients N for Wiener phase noise and first-order phase noise with α = 0.015. (1) σΔ² = 0.0027 rad²; (2) σΔ² = 0.0015 rad². K = 105, KP = 15.

Figure 7: Pilot symbol insertion schemes (SCEN1 to SCEN6).

Figure 8: BER for different pilot symbol placement scenarios. K = 105, KP = 15, N = 4. (BER versus Eb/N0 in dB; curves for SCEN1 to SCEN6 and the reference system.)

(iii) The BER degradation that corresponds to N = Nopt exhibits a (broad) minimum as a function of K. As long as the fluctuation of θ(k) about its time-average is small, so that linearization of the argument function in (11) applies, the degradation decreases with increasing K because the number KP of noisy observations of the phase noise increases when the ratio KP/K is fixed. However, for too large K, the fluctuation of the Wiener phase noise is so large that linearization is no longer valid (for Wiener phase noise, we need KσΔ² ≪ 1 for the linearization to be accurate) and the resulting degradation increases with increasing K.

Figure 9: BER degradation for BER = 10⁻⁴ as a function of N for various K and fixed pilot symbol ratio η = KP/K = 20% and σΔ = 3°. (Curves for K = 10, 20, 30, 50, 100, 200, 400, 600.)

For the considered scenario, the minimum degradation occurs at (Kopt, Nopt) ≈ (400, 20) and amounts to about 2.1 dB. When the actual block size K exceeds Kopt, the degradation can be limited by dividing the block into subblocks of at most Kopt symbols, and estimating the phase trajectory for each subblock separately.
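Part of the degradation reported in this section is the fixed pilot-insertion loss −10 log10(1 − η), which follows from the relation Es = 2(1 − η)Eb for QPSK. A minimal sketch of that arithmetic, with hypothetical function names:

```python
import math


def pilot_overhead_loss_db(eta):
    """BER degradation in dB caused by spending a fraction eta of the symbols on pilots."""
    return -10.0 * math.log10(1.0 - eta)


def symbol_energy_qpsk(eb, eta):
    """Energy per transmitted symbol, Es = 2 (1 - eta) Eb, for QPSK."""
    return 2.0 * (1.0 - eta) * eb


if __name__ == "__main__":
    for eta in (0.1, 0.2):
        print(eta, round(pilot_overhead_loss_db(eta), 2), "dB")
```

For η = 0.2 this gives about 0.97 dB, so roughly 1.1 dB of the 2.1 dB minimum degradation quoted above is attributable to the phase estimation error itself.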

Figure 10 shows the BER degradation when (1) η = 20% and σΔ = 3° and (2) η = 10% and σΔ = 2°, for the following phase noise estimation algorithms.

(i) The proposed DCT-based algorithm with pilot symbol placement according to SCEN1 (14) and selection of the optimum N.

(ii) Estimation of only the time-average of the phase noise, with the pilot symbols arranged according to SCEN3.

(iii) The method from Luise et al. [11], with the pilot symbols arranged according to SCEN3. The phase noise over the total symbol block is approximated as a linear interpolation between the average phase values over the first and the second pilot symbol clusters.

We observe that estimating only the time-average or the linear trend of the phase noise yields poor BER performance, except for small K. For K = 10, the DCT-based algorithm also estimates the time-average only (because N = 1 is optimum for K = 10); we observe that SCEN3 (with pilot symbols at positions 0 and 9) performs slightly better than the DCT-based algorithm (with pilot symbols at positions 2 and 7) for K = 10. However, when the block length is increased, the DCT algorithm that estimates multiple DCT coefficients outperforms both SCEN3 and Luise et al. and leads to a BER degradation that decreases with increasing K until an optimal value for K is reached.

Figure 10: Comparison of BER degradation for BER = 10⁻⁴ as a function of K. (1) η = 20% and σΔ = 3°; (2) η = 10% and σΔ = 2°. (Curves: time-average with SCEN3, DCT algorithm with SCEN1, and Luise et al., each for cases (1) and (2).)

6. Complexity Analysis

In order to assess the complexity of the proposed algorithm, we determine the number of complex multiplications required per symbol interval. The calculation of the second term in (18) requires the highest number of computations. This term can be evaluated in the following ways.

(1) In a first approach, (K/KP)ΨK ΨP^T r′ is calculated via two matrix multiplications: first ΨP^T (dimension N × KP) and r′ (dimension KP × 1) are multiplied and then (K/KP)ΨK (dimension K × N) and ΨP^T r′ (dimension N × 1) are multiplied. The resulting complexity is of the order O(NKP + KN) ≈ O(KN), with the approximation holding for K ≫ KP. Hence, the complexity per symbol interval amounts to O(N).

Figure 11: Complexity comparison for the proposed algorithm (approaches 1 and 3) and for the Luise et al. algorithm. (Computational complexity versus block length K.)

(2) In a second approach, (K/KP)ΨK ΨP^T r′ is calculated via a single matrix multiplication: (K/KP)ΨK ΨP^T (dimension K × KP) and r′ (dimension KP × 1) are multiplied. Taking into account that (K/KP)ΨK ΨP^T can be computed offline, the resulting complexity per symbol is O(KP). As N ≤ KP, the first approach is to be preferred over the second approach.

(3) The third approach exploits the fact that ΨK and ΨP are submatrices of K × K and KP × KP DCT transform matrices, respectively. Hence, the two matrix multiplications from the first approach can be replaced by an inverse DCT transform (size KP) followed by a DCT transform (size K). As K ≫ KP, the complexity of the size-K DCT dominates. The DCT of a vector {s(0), s(1), . . . , s(K − 1)} of length K can be obtained by calculating the discrete Fourier transform (DFT) of its even expansion {s(K − 1), . . . , s(1), s(0), s(0), s(1), . . . , s(K − 1)} (note that the even expansion has length 2K). As the FFT algorithm used for calculating the DFT of length M has a computational complexity O(M log2(M)), the complexity of the size-K DCT is O(2K log2(2K)), yielding a complexity per symbol interval of O(log2(4K²)).
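The relation between the DCT and the DFT of the even expansion used in the third approach can be verified on a small vector. The sketch below is illustrative only: it uses an unnormalized DCT-II (the scaling of the basis functions is omitted) and a naive O(M²) DFT in place of an FFT so that it stays self-contained; the function names are hypothetical.

```python
import cmath
import math


def dct_direct(s):
    """Unnormalized DCT-II: C(m) = sum_k s(k) cos(pi*m*(2k+1)/(2K))."""
    K = len(s)
    return [
        sum(s[k] * math.cos(math.pi * m * (2 * k + 1) / (2 * K)) for k in range(K))
        for m in range(K)
    ]


def dft(y):
    """Naive DFT of length M; an FFT would be used in practice."""
    M = len(y)
    return [
        sum(y[t] * cmath.exp(-2j * math.pi * m * t / M) for t in range(M))
        for m in range(M)
    ]


def dct_via_even_expansion(s):
    """DCT-II obtained from the DFT of {s(K-1),...,s(0),s(0),...,s(K-1)}."""
    K = len(s)
    expansion = list(reversed(s)) + list(s)      # length 2K
    Y = dft(expansion)
    # Y(m) = 2 * exp(-j*pi*m*(2K-1)/(2K)) * C(m); undo the phase factor and the 2.
    return [
        (0.5 * cmath.exp(1j * math.pi * m * (2 * K - 1) / (2 * K)) * Y[m]).real
        for m in range(K)
    ]


if __name__ == "__main__":
    s = [1.0, 2.0, -0.5, 3.0, 0.25]
    print(dct_direct(s))
    print(dct_via_even_expansion(s))
```

Only the first K of the 2K DFT outputs are needed, and each requires one complex rotation to remove the phase factor introduced by the expansion.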

The complexity per symbol interval of the phase noise estimation method used by Benvenuti et al. [11] is about O(1). Figure 11 shows the order of complexity as a function of the block length K, for the proposed algorithm (approaches 1 and 3) and for the Luise et al. algorithm; the result related to the first approach in the proposed algorithm corresponds to taking for each K the value of N that is optimum for σΔ = 3°. The Luise et al. algorithm has a smaller complexity than the proposed algorithm, but the latter algorithm outperforms the former, especially when the phase noise is strong. For the proposed algorithm, we notice that matrix multiplication according to the first approach leads to the lowest computational complexity for K < 400. As K becomes larger than 400, calculation via FFT (third approach) is less complex. At the point (Kopt, Nopt) = (400, 20) yielding minimum BER degradation (see Figure 9), the first and third approaches give rise to the same complexity.
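The crossover between the first and third approaches can be reproduced from the complexity orders alone. The sketch below treats the orders as if they were exact operation counts, which ignores the constants hidden in the O(·) notation; the function names are hypothetical.

```python
import math


def per_symbol_cost_two_products(N):
    """Approach 1: two matrix products, O(N) per symbol interval."""
    return float(N)


def per_symbol_cost_single_product(KP):
    """Approach 2: one precomputed K x KP matrix, O(KP) per symbol interval."""
    return float(KP)


def per_symbol_cost_fft(K):
    """Approach 3: size-K DCT via a length-2K FFT, O(log2(4 K^2)) per symbol interval."""
    return math.log2(4 * K * K)


if __name__ == "__main__":
    K, N = 400, 20   # (Kopt, Nopt) for sigma_Delta = 3 degrees
    print(per_symbol_cost_two_products(N), per_symbol_cost_fft(K))
```

At (K, N) = (400, 20) the two orders are 20 and about 19.3, which matches the statement that the first and third approaches are equally complex at that point.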

7. Conclusions and Remarks

In this contribution, we have considered an ad hoc feedforward data-aided phase noise estimation algorithm that is based on the estimation of only a few (N) coefficients of the DCT basis expansion of the time-varying phase. The algorithm does not require detailed knowledge about the phase noise statistics. Linearization of the observation model has indicated that the mean-square error of the resulting estimate consists of an additive noise contribution (that increases with N) and an MSE floor caused by the phase noise modeling error (that decreases with N). The noise contribution coincides with the Cramer-Rao lower bound.

These analytical findings have been confirmed by means of computer simulations. The influence of the position and number KP of pilot symbols inserted into the symbol sequence has been investigated. Computer simulations were carried out for several pilot symbol configurations. Arranging the pilot symbols according to (14), such that the subsampled DCT basis functions remain orthogonal, reduces the BER degradation as compared to the case of a preamble/postamble or midamble pilot symbol arrangement with estimation of only the time-average; in addition, the configuration (14) allows up to KP DCT coefficients to be estimated with a reduced computational complexity. The BER degradation can be minimized by a suitable choice of the block length K, the number KP of pilot symbols, and the number N of DCT coefficients to be estimated.

The considered DCT-based phase estimation algorithm makes use of the energy associated with the pilot symbols only. Further research will involve the incorporation of the DCT-based algorithm in an iterative phase noise estimation algorithm that exploits soft decisions about the data symbols, so that the resulting algorithm benefits from the energy associated with the data symbols as well. The performance and complexity of such an iterative algorithm will be investigated and compared to other iterative algorithms (such as those from [12–16]).

Acknowledgments

The authors wish to acknowledge the activity of the Network of Excellence in Wireless COMmunications (NEWCOM++) of the European Commission (Contract no. 216715) that motivated this work. This work is also supported by the FWO Project G.0047.06 Advanced space-time processing techniques for communication through multiantenna systems in realistic mobile channels.

References

[1] A. Demir, A. Mehrotra, and J. Roychowdhury, “Phase noise in oscillators: a unifying theory and numerical methods for characterization,” IEEE Transactions on Circuits and Systems I, vol. 47, no. 5, pp. 655–674, 2000.

[2] T. E. Parker, “Characteristics and sources of phase noise in stable oscillators,” in Proceedings of the 41st Annual Frequency Control Symposium, pp. 99–110, Philadelphia, Pa, USA, May 1987.

[3] G. B. Giannakis and C. Tepedelenlioglu, “Basis expansion models and diversity techniques for blind identification and equalization of time-varying channels,” Proceedings of the IEEE, vol. 86, no. 10, pp. 1969–1986, 1998.

[4] J. K. Tugnait and W. Luo, “Blind space-time multiuser channel estimation in time-varying DS-CDMA systems,” IEEE Transactions on Vehicular Technology, vol. 55, no. 1, pp. 207–218, 2006.

[5] O. Rousseaux, G. Leus, and M. Moonen, “Estimation and equalization of doubly selective channels using known symbol padding,” IEEE Transactions on Signal Processing, vol. 54, no. 3, pp. 979–990, 2006.

[6] J.-C. Nallatamby, M. Prigent, E. Vaury, A. Laloue, M. Camiade, and J. Obregon, “Low phase noise operation of microwave oscillator circuits,” IEEE Transactions on Ultrasonics, Ferroelectrics and Frequency Control, vol. 47, no. 2, pp. 411–420, 2000.

[7] J. Mukherjee, “Optimizing MOSFET channel width for low phase noise in LC oscillators,” in Proceedings of the 50th Midwest Symposium on Circuits and Systems (MWSCAS ’07), pp. 610–613, Montreal, Canada, August 2007.

[8] D. Y. Jung and C. S. Park, “Power efficient Ka-band low phase noise VCO in 0.13 μm CMOS,” Electronics Letters, vol. 44, no. 10, pp. 628–630, 2008.

[9] H. Meyr, M. Moeneclaey, and S. Fechtel, Digital Communication Receivers: Synchronization, Channel Estimation, and Signal Processing, Wiley Series in Telecommunications and Signal Processing, John Wiley & Sons, New York, NY, USA, 1998.

[10] U. Mengali and A. N. D’Andrea, Synchronization Techniques for Digital Receivers, Plenum Press, New York, NY, USA, 1997.

[11] L. Benvenuti, L. Giugno, V. Lottici, and M. Luise, “Code-aware carrier phase noise compensation on turbo-coded spectrally-efficient high-order modulations,” in Proceedings of the 8th International Workshop on Signal Processing for Space Communications (SPSC ’03), pp. 177–184, Catania, Italy, September 2003.

[12] G. Colavolpe, A. Barbieri, and G. Caire, “Algorithms for iterative decoding in the presence of strong phase noise,” IEEE Journal on Selected Areas in Communications, vol. 23, no. 9, pp. 1748–1757, 2005.

[13] J. Dauwels and H.-A. Loeliger, “Phase estimation by message passing,” in Proceedings of the IEEE International Conference on Communications (ICC ’04), vol. 1, pp. 523–527, Paris, France, June 2004.

[14] E. Panayirci, H. Cirpan, and M. Moeneclaey, “A sequential Monte Carlo method for blind phase noise estimation and data detection,” in Proceedings of the 13th European Signal Processing Conference (EUSIPCO ’05), Antalya, Turkey, September 2005.

[15] E. Panayırcı, H. A. Cırpan, M. Moeneclaey, and N. Noels, “Blind-phase noise estimation in OFDM systems by sequential Monte Carlo method,” European Transactions on Telecommunications, vol. 17, no. 6, pp. 685–693, 2006.

[16] S. Godtmann, N. Hadaschik, A. Pollok, G. Ascheid, and H. Meyr, “Iterative code-aided phase noise synchronization based on the LMMSE criterion,” in Proceedings of the 8th IEEE Signal Processing Advances in Wireless Communications (SPAWC ’07), pp. 1–5, Helsinki, Finland, June 2007.

[17] D. Petrovic, W. Rave, and G. Fettweis, “Effects of phase noise on OFDM systems with and without PLL: characterization and compensation,” IEEE Transactions on Communications, vol. 55, no. 8, pp. 1607–1616, 2007.

[18] ETSI, “Digital video broadcasting (DVB), second generation framing structure, channel coding and modulation systems for broadcasting, interactive services, news gathering and other broadband satellite applications”.

[19] V. S. Abhayawardhana and I. J. Wassell, “Common phase error correction with feedback for OFDM in wireless communication,” in Proceedings of the IEEE Global Telecommunications Conference (GLOBECOM ’02), vol. 1, pp. 651–655, Taipei, Taiwan, November 2002.


Research Article

Monte Carlo Solutions for Blind Phase Noise Estimation

Frederik Simoens,1 Dieter Duyck,1 Hakan Cırpan,2 Erdal Panayırcı,3 and Marc Moeneclaey1

1 Department of Telecommunications and Information Processing, Faculty of Engineering, Ghent University, 9000 Gent, Belgium
2 Department of Electrical-Electronics Engineering, The University of Istanbul, Avcilar 34850, Istanbul, Turkey
3 Department of Electronics Engineering, Kadir Has University, Cibali 34083, Istanbul, Turkey

Correspondence should be addressed to Frederik Simoens, [email protected]

Received 30 June 2008; Accepted 7 January 2009


This paper investigates the use of Monte Carlo sampling methods for phase noise estimation on additive white Gaussian noise (AWGN) channels. The main contributions of the paper are (i) the development of a Monte Carlo framework for phase noise estimation, with special attention to sequential importance sampling and Rao-Blackwellization, (ii) the interpretation of existing Monte Carlo solutions within this generic framework, and (iii) the derivation of a novel phase noise estimator. Contrary to the ad hoc phase noise estimators that have been proposed in the past, the estimators considered in this paper are derived from solid probabilistic and performance-determining arguments. Computer simulations demonstrate that, on one hand, the Monte Carlo phase noise estimators outperform the existing estimators and, on the other hand, our newly proposed solution exhibits a lower complexity than the existing Monte Carlo solutions.

Copyright © 2009 Frederik Simoens et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

Instabilities of local oscillators are an inherent impairment of coherent communication schemes [1, 2]. Such instabilities give rise to a time-varying phase difference between the oscillator at the transmitter and the receiver sides. As the phase of the transmitted symbols conveys (part of) the information of a coherent transmission, the carrier phase must be known to the receiver before the recovery of the transmitted information can take place. Estimation of the carrier phase is hence a crucial task of a coherent receiver.

As long as frugality with respect to the available resources is deemed important, this estimation process should occur without inserting too many training or pilot symbols into the transmitted data sequence. The presence of training symbols in the data sequence reduces the spectral efficiency and power efficiency of the transmission. Estimating the carrier phase based on the unknown information-carrying data symbols is definitely more efficient in that respect.

Spurred by its great importance, research on phase noise estimation has evolved into a relatively mature state. There already exists a myriad of estimation strategies, and most of them achieve a satisfactory performance, at least under the specific circumstances for which they were designed [1–5]. The existing estimators range from feedforward techniques assuming a piecewise constant carrier phase over the duration of a predefined interval [1–3] to more advanced algorithms which track the movements of the carrier phase from symbol to symbol [4, 5]. Despite all these ad hoc efforts, no optimal solutions (from a classical estimation point of view) to the phase noise estimation problem have yet been presented. Optimal estimation of the phase noise, for example, in a maximum-likelihood or maximum a posteriori sense, without knowing the transmitted information turns out to be an extremely complicated task.

The purpose of the present paper is exactly to investigate the phase noise problem within a classical estimation context. We will define an optimal receiver strategy and explore the extent to which Monte Carlo methods can be used to obtain a practical implementation of this optimal receiver. In doing so, we will furnish a thorough overview of Monte Carlo methods and their application to phase noise estimation. It is only fair to point out that Monte Carlo methods have already been considered for phase noise estimation in the past [6, 7]. However, these solutions are limited to uncoded systems and explore only one of the possible Monte Carlo techniques. In this paper, we will lay out a more general Monte Carlo framework and integrate the existing estimators within this framework. We will also present a novel estimator and demonstrate that it bears a lower complexity than the existing techniques.

This paper is organized as follows. Section 2 describes the channel model. The objective of the paper and the connection with existing phase noise estimators is outlined in Section 3. Since it is unfair to assume that everyone working in the field of phase noise estimation is acquainted with Monte Carlo methods, we devote an entire and relatively large section of this paper to the introduction of Monte Carlo methods and sequential importance sampling in particular (Section 4). The framework presented in Section 4 is thereafter applied to the phase noise problem for uncoded and coded systems in Sections 5 and 6, respectively. Finally, Section 7 provides numerical results and Section 8 wraps up the paper.

2. Channel Model

2.1. Phase Noise Channel Model. We consider a digital communication scheme, where the information is conveyed by $N$ complex-valued data symbols $\{a_k\}_{k=1,\ldots,N}$. These symbols take on values from a predefined constellation set $\Omega$. The average energy of the symbols is equal to $E_s$. Concerning the channel model, we consider a discrete-time additive white Gaussian noise (AWGN) channel, susceptible to Wiener phase noise. In order not to overcomplicate the analysis, other receiver impairments are ignored. The received signal samples can, therefore, be written as

$$ r_k = a_k \exp(j\theta_k) + n_k, \tag{1} $$

$$ \theta_k = \theta_{k-1} + \delta_k, \tag{2} $$

for $k = 1, \ldots, N$ and $\theta_0$ uniformly distributed within $[-\pi, \pi]$. The additive (thermal) noise samples $\{n_k\}$ are zero-mean i.i.d. complex-valued and circularly symmetric Gaussian variables, with the variance of the real and imaginary parts each equal to $\sigma_n^2$. The zero-mean i.i.d. Gaussian random variables $\{\delta_k\}$ are real-valued with a variance equal to $\sigma_\delta^2$. The channel model can equivalently be described by the following two probability functions:

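$$ p(r_k \mid a_k, \theta_k) = \frac{1}{2\pi\sigma_n^2} \exp\left( -\frac{1}{2\sigma_n^2} \left\| r_k - a_k \exp(j\theta_k) \right\|^2 \right), \tag{3} $$

$$ p(\theta_k \mid \theta_{k-1}) = \frac{1}{\sqrt{2\pi}\,\sigma_\delta} \exp\left( -\frac{1}{2\sigma_\delta^2} \left\| \theta_k - \theta_{k-1} \right\|^2 \right). \tag{4} $$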

We assume that the receiver knows these distributions and is able to evaluate them for different values of $r_k$, $a_k$, $\theta_k$, and $\theta_{k-1}$.
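The channel model (1)-(2) can be simulated in a few lines. The following is a minimal sketch, assuming unit-energy QPSK ($E_s = 1$) and using only the Python standard library; the names `QPSK` and `simulate_channel` are hypothetical and do not appear in the paper.

```python
import cmath
import math
import random

# Unit-energy QPSK constellation (an assumption made for illustration).
QPSK = [cmath.exp(1j * (math.pi / 4 + m * math.pi / 2)) for m in range(4)]


def simulate_channel(num_symbols, sigma_n, sigma_delta, rng):
    """Generate symbols a_k, Wiener phases theta_k, and samples r_k per (1)-(2)."""
    theta = rng.uniform(-math.pi, math.pi)  # theta_0, uniform on [-pi, pi]
    symbols, phases, received = [], [], []
    for _ in range(num_symbols):
        theta += rng.gauss(0.0, sigma_delta)  # (2): Wiener phase increment
        a = rng.choice(QPSK)
        # Complex noise with variance sigma_n^2 per real dimension.
        n = complex(rng.gauss(0.0, sigma_n), rng.gauss(0.0, sigma_n))
        symbols.append(a)
        phases.append(theta)
        received.append(a * cmath.exp(1j * theta) + n)  # (1)
    return symbols, phases, received
```

Here `sigma_n` and `sigma_delta` are standard deviations (the square roots of $\sigma_n^2$ and $\sigma_\delta^2$), and passing an explicit `random.Random` instance keeps the experiment reproducible.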

2.2. Linearized Phase Noise Channel Model. The carrier phase affects the received signal in a nonlinear way. As will become apparent in the remainder of this paper, it can be useful to linearize this model. We convert the channel model (1) into a linear form as follows:

$$ \begin{aligned} r_k &= a_k \exp(j\hat\theta_{k-1}) \exp\left( j(\theta_k - \hat\theta_{k-1}) \right) + n_k \\ &\approx a_k \exp(j\hat\theta_{k-1}) \left( 1 + j(\theta_k - \hat\theta_{k-1}) \right) + n_k, \end{aligned} \tag{5} $$

where $\hat\theta_{k-1}$ represents an initial estimate of the phase at instant $k-1$. This approximation is valid as long as $|\theta_k - \hat\theta_{k-1}| \ll 1$. Hence, the linearized channel model can only be invoked if $\sigma_\delta^2$ is small and an accurate phase estimate $\hat\theta_{k-1}$ is available.

3. Problem Formulation and Prior Work

In a coherent communication scheme, the receiver needs to know the phase $\theta_k$ at each time instant $k$ before detection can take place. The traditional way to acquire this information is by estimating the carrier phase. If the carrier phase remains constant over a relatively long period, standard feed-forward estimation techniques can be applied. In the presence of severe phase noise, however, other more ingenious techniques are called upon. Before we describe our approach in that regard, let us review some of the existing solutions.

3.1. Prior Work. Existing phase noise estimators or trackers have one thing in common. Their derivation does not stem from a probabilistic analysis, but is rather driven by pragmatic (and scenario-dependent) arguments. Incidentally, the use of feedback loops or phase-locked loops is common practice [1].

A typical form to which these estimators can generally be reduced is

$$ \hat\theta_k = \hat\theta_{k-1} + K_k \, \Im\left[ r_k \hat a_k^* \exp(-j\hat\theta_{k-1}) \right], \tag{6} $$

where $K_k$ is a positive parameter, $\hat\theta_k$ denotes the phase estimate at instant $k$, and $\hat a_k$ denotes an estimate (soft or hard decision) of $a_k$, using the phase estimate from a previous time instant and possible additional information from a decoder (see also Section 6). Obviously, there exist other estimators as well, for example, [8]. To our knowledge, however, their application is limited to pilot symbols only. Estimators of the form (6) are based on the linear model (5) and exploit the fact that $\Im[r_k \hat a_k^* \exp(-j\hat\theta_{k-1})]$ provides a rough estimate of the difference between $\hat\theta_{k-1}$ and the true value of $\theta_k$. The impact of the phase noise and the additive (thermal) noise can be balanced by tuning the parameter $K_k$. Provided the linearized model (5) is a valid approximation, the optimal values of $K_k$, in a minimum mean squared error sense, follow from the extended Kalman filter equations [9].

For a wide range of applications, these existing estimators render a satisfactory performance, but they nevertheless lack a rock-solid theoretical foundation. In the next section, we will outline our strategy to settle this issue.
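To make the recursion (6) concrete, the following minimal sketch implements a decision-directed first-order loop. It assumes unit-energy QPSK, hard decisions for $\hat a_k$, and a constant gain in place of the Kalman-derived $K_k$; the function names are hypothetical.

```python
import cmath
import math

# Unit-energy QPSK constellation (an assumption made for illustration).
QPSK = [cmath.exp(1j * (math.pi / 4 + m * math.pi / 2)) for m in range(4)]


def hard_decision(r, theta_hat):
    """Pick the constellation point closest to the derotated sample."""
    derotated = r * cmath.exp(-1j * theta_hat)
    return min(QPSK, key=lambda a: abs(derotated - a))


def track_phase(received, gain, theta_init=0.0):
    """Apply recursion (6) with a constant gain and hard symbol decisions."""
    theta_hat = theta_init
    estimates = []
    for r in received:
        a_hat = hard_decision(r, theta_hat)  # uses the previous phase estimate
        error = (r * a_hat.conjugate() * cmath.exp(-1j * theta_hat)).imag
        theta_hat += gain * error
        estimates.append(theta_hat)
    return estimates
```

With noise-free samples and a constant true phase, the error term equals $\sin(\theta - \hat\theta_{k-1})$ for unit-energy symbols, so the estimate contracts toward the true phase as long as the initial offset stays within the QPSK decision region.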


3.2. Probabilistic Solution. In order to lay the foundation for the analysis in the next two sections, let us investigate what really determines the performance of the communication system. For now, we will assume that the transmitted symbols are a priori independent (and hence uncoded). The extension to coded systems is covered separately in Section 6. We can define the following on-the-fly detection rule:

$$ \hat a_k = \arg\max_{\omega \in \Omega} \, p(a_k = \omega \mid r_{1:k}), \tag{7} $$

where $r_{1:k}$ is a shorthand notation for $r_{1:k} \doteq [r_1, \ldots, r_k]$.

The on-the-fly label stems from the fact that a decision on $a_k$ can be made based on readily available information at time instant $k$, that is, the received samples $r_{1:k}$. Detectors that exploit "future" received information are not considered here. It is easily shown that a detector defined by (7) minimizes the symbol error probability, again, for a receiver that only has access to received information up to instant $k$. From this, it seems that all it takes to devise an optimal receiver is to compute and maximize $p(a_k \mid r_{1:k})$. We can perform a marginalization with respect to the unknown phase $\theta_k$ and exploit the fact that the transmitted symbols are uncorrelated. With Bayes' rule, the probability function can thus be rewritten as

$$ p(a_k \mid r_{1:k}) \propto \int_{\theta_k} p(a_k \mid r_k, \theta_k)\, p(\theta_k \mid r_{1:k})\, d\theta_k. \tag{8} $$

A closed-form expression for $p(a_k \mid r_k, \theta_k)$ follows immediately from the combination of (3) and the prior distribution $p(a_k)$. Hence, the remainder of this paper will focus on the derivation of $p(\theta_k \mid r_{1:k})$ and the ensuing computation of the integral in (8). In particular, we will investigate the use of Monte Carlo methods for the computation of (8).

4. Monte Carlo Framework

The purpose of this section is to provide a succinct introduction to Monte Carlo techniques. Section 5 addresses the specific application to our phase noise problem.

4.1. Particle Representation. Representing a distribution by means of samples or particles drawn from it is an appealing alternative in case the actual distribution defies an analytical representation. The rationale behind the particle filtering approach is that as long as we generate enough samples from the distribution, further processing with this distribution can be performed using particles of the distribution rather than the actual distribution. An example will serve to illustrate this benefit.

Suppose that we can easily generate a number of samples $x^{(j)}$, $j = 1, \ldots, J_{\max}$, whose statistics are specified by a distribution $p(x)$. Then, we are able to approximate expectations of the form

$$ I = \int_x f(x)\, p(x)\, dx, \tag{9} $$

by means of a particle evaluation

$$ I_s = \frac{1}{J_{\max}} \sum_{j=1}^{J_{\max}} f(x^{(j)}). \tag{10} $$

It can be shown that $I_s$ converges to $I$ as the number of particles grows [10]. Hence, as long as we are able to draw samples from $p(x)$, it is not necessary to solve the integral from (9) analytically. The next section elaborates on the case when sampling from $p(x)$ is not that straightforward.
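As a small illustration of the particle evaluation (10), the sketch below estimates $E[x^2]$ for a standard Gaussian $p(x)$, whose exact value is 1. The function name and the choice of $f$ and $p$ are assumptions made purely for the example.

```python
import random


def monte_carlo_expectation(f, draw, num_particles, rng):
    """Approximate the integral (9) by the sample average (10)."""
    return sum(f(draw(rng)) for _ in range(num_particles)) / num_particles


rng = random.Random(0)
estimate = monte_carlo_expectation(
    f=lambda x: x * x,                 # f(x) = x^2
    draw=lambda g: g.gauss(0.0, 1.0),  # samples from p(x) = N(0, 1)
    num_particles=20000,
    rng=rng,
)
```

The standard deviation of this estimate shrinks as $1/\sqrt{J_{\max}}$, which is the usual Monte Carlo rate.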

4.2. Importance Sampling. The technique outlined above only makes sense when it is easy to draw samples from $p(x)$. If this is not the case, we can still proceed by drawing samples from another well-chosen distribution $\pi(x)$ from which sampling is easy. Denoting these samples again by $x^{(j)}$, $j = 1, \ldots, J_{\max}$, the integral from (9) can be approximated by

$$ I_{is} = \sum_{j=1}^{J_{\max}} w^{(j)} f(x^{(j)}), \tag{11} $$

where the so-called importance weights $w^{(j)}$ are given by

$$ w^{(j)} \propto \frac{p(x^{(j)})}{\pi(x^{(j)})}. \tag{12} $$

These weights are normalized such that $\sum_{j=1}^{J_{\max}} w^{(j)} = 1$. The idea is to assign different weights to the samples $x^{(j)}$ to compensate for the difference between the target distribution $p(x)$ and the importance sampling distribution $\pi(x)$. Again, it can be shown that $I_{is}$ converges to $I$ for a large number of samples and under mild conditions with respect to the choice of $\pi(x)$ [10].

In the remainder, we denote the particle representation of a distribution $p(x)$ by $p(x) \leftrightarrow \{x^{(j)}; w^{(j)}\}$.
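The weighted estimate (11) with the weights (12) can be sketched as follows. The example assumes a standard Gaussian target $p(x)$ and a wider Gaussian $\pi(x)$ with standard deviation 2; both choices, and all function names, are illustrative assumptions.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a real Gaussian variable."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def importance_sampling_estimate(f, target_pdf, proposal_pdf, draw, num_particles, rng):
    """Return the estimate (11) and the normalized weights (12)."""
    samples = [draw(rng) for _ in range(num_particles)]
    raw = [target_pdf(x) / proposal_pdf(x) for x in samples]  # (12), up to scale
    total = sum(raw)
    weights = [v / total for v in raw]                        # weights sum to one
    estimate = sum(w * f(x) for w, x in zip(weights, samples))  # (11)
    return estimate, weights


estimate, weights = importance_sampling_estimate(
    f=lambda x: x * x,
    target_pdf=lambda x: gaussian_pdf(x, 0.0, 1.0),
    proposal_pdf=lambda x: gaussian_pdf(x, 0.0, 2.0),
    draw=lambda g: g.gauss(0.0, 2.0),
    num_particles=20000,
    rng=random.Random(0),
)
```

Because the weights are normalized inside the function, the target density only needs to be known up to a constant factor, in line with the proportionality in (12).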

4.3. Sequential Importance Sampling. The true power of the Monte Carlo framework gets unlocked when it is applied to hidden Markov (or state-space) models. An observation $r_k$ is said to be the output of a hidden Markov process if it complies with

$$ \begin{aligned} r_k &\sim p(r_k \mid x_k), \\ x_k &\sim p(x_k \mid x_{k-1}), \end{aligned} \tag{13} $$

where $x_k$ denotes the (hidden) state variable of the Markov process and the symbol $\sim$ means that the right-hand side is the probability function of the variable on the left-hand side. Note that we do not impose any restriction on the nature of $r_k$ or $x_k$; these can be discrete or continuous, scalar or vector variables.

A typical problem associated with a Markov process involves the derivation of the a posteriori state distribution $p(x_{1:k} \mid r_{1:k})$ or inferences thereof. The purpose of this section is to explain how to draw samples from $p(x_{1:k} \mid r_{1:k})$ in a recursive manner, a process called sequential importance sampling (SIS).


4.3.1. Derivation of the Algorithm. The first step entails factorizing our target distribution and manipulating it into a recursive expression:

$$ \begin{aligned} p(x_{1:k} \mid r_{1:k}) &\propto p(x_{1:k-1} \mid r_{1:k-1})\, p(x_k, r_k \mid x_{1:k-1}, r_{1:k-1}) \\ &= p(x_{1:k-1} \mid r_{1:k-1})\, p(x_k, r_k \mid x_{k-1}). \end{aligned} \tag{14} $$

The first transition follows from Bayes' rule and the omission of the normalizing constant $1/p(r_k \mid r_{1:k-1})$, whereas the second transition exploits the Markov nature of the problem. Now, suppose that we already have a particle representation $p(x_{1:k-1} \mid r_{1:k-1}) \leftrightarrow \{x_{1:k-1}^{(j)}; w_{k-1}^{(j)}\}$, where the samples $x_{1:k-1}^{(j)}$ are drawn from a distribution $\pi_{k-1}(x_{1:k-1})$. From (12), we know that the corresponding importance weights are then given by $w_{k-1}^{(j)} \propto p(x_{1:k-1}^{(j)} \mid r_{1:k-1}) / \pi_{k-1}(x_{1:k-1}^{(j)})$. The next step is to draw, for every sample $x_{1:k-1}^{(j)}$, a new sample $x_k^{(j)}$ from a distribution $\pi_{k|k-1}(x_k \mid x_{1:k-1}^{(j)})$, such that $x_{1:k}^{(j)} \doteq [x_{1:k-1}^{(j)}, x_k^{(j)}]$ represents a sample from

$$ \pi_k(x_{1:k}) = \pi_{k-1}(x_{1:k-1})\, \pi_{k|k-1}(x_k \mid x_{1:k-1}). \tag{15} $$

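The associated importance weights follow from (14) and (15):

$$ \begin{aligned} w_k^{(j)} &= \frac{p(x_k^{(j)}, x_{1:k-1}^{(j)} \mid r_{1:k})}{\pi_k(x_{1:k}^{(j)})} \\ &= \frac{p(x_{1:k-1}^{(j)} \mid r_{1:k-1})}{\pi_{k-1}(x_{1:k-1}^{(j)})} \cdot \frac{p(x_k^{(j)}, r_k \mid x_{k-1}^{(j)})}{\pi_{k|k-1}(x_k^{(j)} \mid x_{1:k-1}^{(j)})} \\ &= w_{k-1}^{(j)}\, \frac{p(x_k^{(j)} \mid x_{k-1}^{(j)})\, p(r_k \mid x_k^{(j)})}{\pi_{k|k-1}(x_k^{(j)} \mid x_{1:k-1}^{(j)})}. \end{aligned} \tag{16} $$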

The choice of the importance sampling distribution $\pi_{k|k-1}(\cdot \mid \cdot)$ plays an important role with respect to the performance and stability of the algorithm. The next section elaborates on this issue. To conclude this section, we summarize the operation of the SIS algorithm in Algorithm 1.
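The SIS recursion can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the proposal is taken to depend only on the previous state and the current observation, the normalization is carried out once per time step after all particles are updated, and the toy model (a scalar random walk observed in Gaussian noise) as well as all function names are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a real Gaussian variable."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def sequential_importance_sampling(observations, draw_initial, draw_proposal,
                                   proposal_pdf, transition_pdf, likelihood,
                                   num_particles, rng):
    """Run the SIS recursion; return the particle paths and their final weights."""
    paths = [[draw_initial(rng)] for _ in range(num_particles)]  # samples of x_0
    weights = [1.0 / num_particles] * num_particles
    for r in observations:
        raw = []
        for path, w_prev in zip(paths, weights):
            x_prev = path[-1]
            x_new = draw_proposal(x_prev, r, rng)              # draw from the proposal
            raw.append(w_prev * transition_pdf(x_new, x_prev)  # weight update, cf. (16)
                       * likelihood(r, x_new) / proposal_pdf(x_new, x_prev, r))
            path.append(x_new)                                 # extend the trajectory
        total = sum(raw)
        weights = [v / total for v in raw]                     # normalize the weights
    return paths, weights


# Toy model (hypothetical): x_k = x_{k-1} + u_k, r_k = x_k + v_k, prior proposal.
SIGMA_U, SIGMA_V = 0.5, 0.5
observations = [0.3, 0.5, 0.4, 0.8, 1.0]
paths, weights = sequential_importance_sampling(
    observations,
    draw_initial=lambda g: g.gauss(0.0, 1.0),
    draw_proposal=lambda x_prev, r, g: g.gauss(x_prev, SIGMA_U),
    proposal_pdf=lambda x, x_prev, r: gaussian_pdf(x, x_prev, SIGMA_U),
    transition_pdf=lambda x, x_prev: gaussian_pdf(x, x_prev, SIGMA_U),
    likelihood=lambda r, x: gaussian_pdf(r, x, SIGMA_V),
    num_particles=4000,
    rng=random.Random(0),
)
posterior_mean = sum(w * path[-1] for w, path in zip(weights, paths))
```

For this linear Gaussian toy model the posterior mean is also available from a Kalman filter, which gives a convenient sanity check of the particle approximation.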

4.3.2. Degeneracy of Sequential Importance Sampling. One particularly annoying problem with SIS is that the variance of the importance weights increases as $k$ becomes larger [11]. This is an adverse property, as it is intuitively clear that, for a fixed number of samples, the best approximation to a distribution, in terms of its ability to evaluate the expectation of a function (11), is obtained using equal-weight samples. The increase in variance is so persistent that almost all samples bear a negligible weight after a few recursions. This implies that the distribution is represented by far fewer particles than the $J_{\max}$ original particles. Obviously, this does not bode well for the accuracy of the approximation of the distribution and the performance of ensuing algorithms. This detriment manifests itself especially when dealing with high-dimensional state spaces, that is, where the state variable $x$ is actually a vector. Fortunately, this problem can be resolved by taking the following measures.

(1) Start from a sample representation $p(x_0) \leftrightarrow \{x_0^{(j)}; w_0^{(j)}\}$ (see Section 4.2).
(2) for $k = 1$ to $N$ do
(3)   for $j = 1$ to $J_{\max}$ do
(4)     Draw new sample $x_k^{(j)}$ from $\pi_{k|k-1}(x_k \mid x_{1:k-1}^{(j)})$.
(5)     Update the importance weights: $\tilde w_k^{(j)} = w_{k-1}^{(j)}\, \dfrac{p(x_k^{(j)} \mid x_{1:k-1}^{(j)})\, p(r_k \mid x_k^{(j)})}{\pi_{k|k-1}(x_k^{(j)} \mid x_{1:k-1}^{(j)})}$.
(6)     Normalize the importance weights: $w_k^{(j)} = \tilde w_k^{(j)} \big/ \sum_i \tilde w_k^{(i)}$.
(7)     Set $x_{1:k}^{(j)} \doteq [x_{1:k-1}^{(j)}, x_k^{(j)}]$.
(8)     $\{x_{1:k}^{(j)}; w_k^{(j)}\}$ is a new sample of $p(x_{1:k} \mid r_{1:k})$.
(9)   end for
(10) end for

Algorithm 1: Sequential importance sampling.

(1) Choice of the Sampling Distribution. It is important to carefully design the importance sampling distribution. The distribution should generate particles or samples in the regions of the state space corresponding to high values of the distribution that we wish to approximate (in this case, the posterior probability function). In this way, the correction administered by the weights can be kept to a bare minimum. It can be shown [11] that the variance of the weights is minimized for

$$ \pi_{k|k-1}(x_k \mid x_{1:k-1}) = p(x_k \mid x_{k-1}, r_k). \tag{17} $$

The corresponding weight update equation then becomes

$$ w_k^{(j)} = w_{k-1}^{(j)}\, p(r_k \mid x_{k-1}^{(j)}). \tag{18} $$

Note that the weight update (18) does not depend on the current sample $x_k^{(j)}$. This intuitively explains the optimality of (17), since the particular choice of the samples $x_k^{(j)}$ does not alter the weights, and hence does not affect (read: increase) their variance. Unfortunately, this design measure will only slow down the process of degeneration; it will not bring it to a standstill. Furthermore, as will become apparent through the remainder of this paper, it is often very difficult to draw samples from (17). In this case, there is no alternative but to use a suboptimal distribution. The prior importance distribution $p(x_k \mid x_{k-1})$ forms a good alternative, as it is often easy to sample from it. The corresponding weight update function follows from (16) and is given by $p(r_k \mid x_k^{(j)})$.

(2) Resampling. A more effective approach to avoid degeneracy is resampling. The idea is to remove samples with negligible weight from the set and to include better chosen samples (which actually contribute in a meaningful manner to the representation of the target distribution). There are several methods to implement this rule in practice. The prevailing method is simply to draw $J_{\max}$ new and equal-weight samples from the old distribution (defined by the weights of the old samples). Samples associated with low importance weights are most probably eliminated by this rule [11, 12].
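The resampling rule described above (drawing $J_{\max}$ equal-weight samples according to the old weights) can be sketched with the standard library as follows. The function name is hypothetical, and this multinomial scheme is only one of the several methods the text alludes to.

```python
import random


def resample(particles, weights, rng):
    """Draw len(particles) new equal-weight particles according to the old weights."""
    count = len(particles)
    # random.Random.choices samples with replacement, proportionally to the weights.
    chosen = rng.choices(particles, weights=weights, k=count)
    return chosen, [1.0 / count] * count
```

Note that `choices` returns references to the selected items, so particles stored as mutable lists (for instance full trajectories) would need to be copied before they are extended independently.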

(3) Rao-Blackwellization. Lesser known, but no less interesting, is the Rao-Blackwellization method. The idea is that whenever it is possible to perform some part of the recursion analytically, it definitely pays to do so. More specifically, it is possible to show, as an instance of the Rao-Blackwell theorem [13, 14], that integrating out some of the state variables in (9) analytically improves the accuracy of the approximation (11). Moreover, it allows us to sharply reduce the number of samples used in the SIS algorithm and to mitigate the degeneracy. In order to provide a formal outline of the procedure, let us assume that the state variable $x$ consists of two parts, $x \doteq [y, z]$. Rao-Blackwellization boils down to converting the approximation from (11) into

$$ I_{rb} = \sum_{j=1}^{J_{\max}} w^{(j)} g(z^{(j)}), \tag{19} $$

where

$$ g(z^{(j)}) \doteq \int_y f(z^{(j)}, y)\, p(y \mid z^{(j)})\, dy, \tag{20} $$

and where $p(z) \leftrightarrow \{z^{(j)}; w^{(j)}\}$. Again, it can be shown that $I_{rb}$ converges to $I$, defined in (9), for a large number of samples. Obviously, it only makes sense to rearrange (9) into (19) if $p(y \mid z^{(j)})$ can be computed analytically and the integration from (20) is tractable.
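A toy example may help to see the effect of (19)-(20). The sketch below assumes $z \sim N(0, 1)$, $y \mid z \sim N(z, 1)$, and $f(z, y) = y^2$, so that $g(z) = z^2 + 1$ is available in closed form and $I = 2$. The model and the function names are assumptions chosen for illustration; samples are drawn directly from $p(z)$, so all weights are equal.

```python
import random


def plain_estimate(num_particles, rng):
    """Estimate E[y^2] by sampling both z and y, as in (10)."""
    total = 0.0
    for _ in range(num_particles):
        z = rng.gauss(0.0, 1.0)
        y = rng.gauss(z, 1.0)
        total += y * y
    return total / num_particles


def rao_blackwellized_estimate(num_particles, rng):
    """Estimate E[y^2] by sampling z only and integrating y out, as in (19)-(20)."""
    total = 0.0
    for _ in range(num_particles):
        z = rng.gauss(0.0, 1.0)
        total += z * z + 1.0  # g(z) = E[y^2 | z] = z^2 + 1
    return total / num_particles
```

In this toy model the per-sample variance drops from 8 (for $y^2$ with $y \sim N(0, 2)$) to 2 (for $z^2 + 1$), so the Rao-Blackwellized estimator reaches a given accuracy with roughly a quarter of the samples.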

In a similar vein, we can also retrieve a Rao-Blackwellized version of the SIS algorithm [14]. It turns out that the weight update equation is now given by

$$ w_k^{(j)} = w_{k-1}^{(j)}\, \frac{p(z_k^{(j)} \mid z_{1:k-1}^{(j)}, r_{1:k})\, p(r_k \mid z_{1:k-1}^{(j)}, r_{1:k-1})}{\pi_{k|k-1}(z_k^{(j)} \mid z_{1:k-1}^{(j)})}, \tag{21} $$

and the optimal importance sampling distribution is given by

$$ \pi_{k|k-1}(z_k \mid z_{1:k-1}) = p(z_k \mid z_{1:k-1}, r_{1:k}). \tag{22} $$

It is interesting to point out that, in general, the sequence $z_{1:k}$ is no longer a Markov process, nor is the observation $r_k$ independent of $r_{1:k-1}$ given $z_{1:k-1}$.

5. Phase Noise Estimation for Uncoded Systems

Equipped with the Monte Carlo framework from the previous section, we are now ready to tackle our original phase noise problem.

5.1. Joint Phase and Symbol Sampling. In a first attempt, we cast the problem under investigation immediately into the SIS algorithm by defining $x_k \doteq [a_k, \theta_k]$. The original state space model from (1), (2) is then a special case of the general model from (13). Application of the SIS algorithm immediately results in a sampled version of the a posteriori probability function $p(a_{1:k}, \theta_{1:k} \mid r_{1:k})$.

The optimal importance sampling function is defined in (17), and can be decomposed as follows:

$$ \begin{aligned} \pi_{k|k-1}(x_k \mid x_{1:k-1}^{(j)}) &= p(a_k, \theta_k \mid r_k, a_{1:k-1}^{(j)}, \theta_{1:k-1}^{(j)}) \\ &= p(\theta_k \mid r_k, \theta_{k-1}^{(j)}, a_k)\, p(a_k \mid r_k, \theta_{k-1}^{(j)}). \end{aligned} \tag{23} $$

The decomposition above allows us to produce the symbol and phase samples in two steps. First, we draw the symbol sample, and then for each symbol sample, we generate a phase sample:

$$ a_k^{(j)} \sim p(a_k \mid r_k, \theta_{k-1}^{(j)}), \tag{24} $$

$$ \theta_k^{(j)} \sim p(\theta_k \mid r_k, \theta_{k-1}^{(j)}, a_k^{(j)}). \tag{25} $$

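In order to produce these samples, we need the above functions in a closed-form expression. The first probability function can be written as follows:

$$ \begin{aligned} p(a_k \mid r_k, \theta_{k-1}^{(j)}) &\propto p(r_k, a_k \mid \theta_{k-1}^{(j)}) \\ &= p(a_k) \int_{\theta_k} p(r_k \mid a_k, \theta_k)\, p(\theta_k \mid \theta_{k-1}^{(j)})\, d\theta_k. \end{aligned} \tag{26} $$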

The exact evaluation of the right-hand side of (26) requires a numerical integration, which is not very practical. However, as shown in Appendix A, we can obtain the following closed-form approximation, valid for small $\sigma_\delta^2$:

$$ p(r_k, a_k \mid \theta_{k-1}^{(j)}) \propto p(a_k) \exp\left( -\frac{1}{2\sigma_n^2 + 2\|a_k\|^2 \sigma_\delta^2} \left\| r_k - e^{j\theta_{k-1}^{(j)}} a_k \right\|^2 \right) \doteq f_1^{(j)}(a_k). \tag{27} $$

Note that $p(a_k \mid r_k, \theta_{k-1}^{(j)})$ is equal to $f_1^{(j)}(a_k)$ up to a scaling factor. It remains to normalize this function before samples can be drawn.

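In Appendix B, we show that the distribution from (25) can be reduced to

$$ \begin{aligned} p(\theta_k \mid r_k, \theta_{k-1}^{(j)}, a_k^{(j)}) &\propto p(r_k \mid \theta_k, a_k^{(j)})\, p(\theta_k \mid \theta_{k-1}^{(j)}) \\ &\propto \exp\left( -\frac{1}{2\sigma_u^2} \left\| \theta_k - \theta_u \right\|^2 \right), \end{aligned} \tag{28} $$

where $\theta_u$ and $\sigma_u^2$ are given by

$$ \theta_u = \theta_{k-1}^{(j)} + \frac{\sigma_u^2}{\sigma_n^2}\, \Im\left\{ r_k \left( a_k^{(j)} \right)^* \exp\left( -j\theta_{k-1}^{(j)} \right) \right\}, \tag{29} $$

$$ \sigma_u^2 = \frac{\sigma_n^2 \sigma_\delta^2}{\sigma_n^2 + \left\| a_k^{(j)} \right\|^2 \sigma_\delta^2}. \tag{30} $$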


From (28), it follows that the updated samples $\theta_k^{(j)}$ are obtained by generating Gaussian samples with mean $\theta_u$ and variance $\sigma_u^2$. Finally, the associated weight update function (18) follows immediately from (27):

$$ p(r_k \mid a_{1:k-1}^{(j)}, \theta_{1:k-1}^{(j)}) = p(r_k \mid \theta_{k-1}^{(j)}) = \sum_{a_k \in \Omega} f_1^{(j)}(a_k). \tag{31} $$
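One update of a single particle under the joint sampling scheme (24)-(31) could be sketched as follows. The sketch assumes unit-energy QPSK with a uniform prior $p(a_k)$, takes the variances $\sigma_n^2$ and $\sigma_\delta^2$ as arguments, and uses hypothetical function names; it follows the linearized expressions (27), (29), and (30) rather than an exact evaluation of (26).

```python
import cmath
import math
import random

# Unit-energy QPSK constellation (an assumption made for illustration).
QPSK = [cmath.exp(1j * (math.pi / 4 + m * math.pi / 2)) for m in range(4)]


def symbol_scores(r, theta_prev, sigma_n2, sigma_delta2, constellation=QPSK):
    """Evaluate f_1(a_k) from (27) for every symbol, with a uniform prior p(a_k)."""
    prior = 1.0 / len(constellation)
    scores = []
    for a in constellation:
        spread = 2.0 * sigma_n2 + 2.0 * abs(a) ** 2 * sigma_delta2
        dist2 = abs(r - cmath.exp(1j * theta_prev) * a) ** 2
        scores.append(prior * math.exp(-dist2 / spread))
    return scores


def phase_proposal(r, a, theta_prev, sigma_n2, sigma_delta2):
    """Return the mean theta_u from (29) and the variance sigma_u^2 from (30)."""
    sigma_u2 = sigma_n2 * sigma_delta2 / (sigma_n2 + abs(a) ** 2 * sigma_delta2)
    error = (r * a.conjugate() * cmath.exp(-1j * theta_prev)).imag
    return theta_prev + sigma_u2 / sigma_n2 * error, sigma_u2


def joint_sampling_step(r, theta_prev, weight_prev, sigma_n2, sigma_delta2, rng):
    """Draw (a_k, theta_k) per (24)-(25); update the weight per (18) and (31)."""
    scores = symbol_scores(r, theta_prev, sigma_n2, sigma_delta2)
    a = rng.choices(QPSK, weights=scores, k=1)[0]       # (24): choices normalizes
    theta_u, sigma_u2 = phase_proposal(r, a, theta_prev, sigma_n2, sigma_delta2)
    theta = rng.gauss(theta_u, math.sqrt(sigma_u2))     # (25), Gaussian per (28)
    return a, theta, weight_prev * sum(scores)          # (18) with (31)
```

The returned weight is unnormalized; in a full filter it would be normalized across all particles after every time step, as in Algorithm 1.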

Benefits and Drawbacks. The benefit of this algorithm is that it renders an asymptotically optimal solution to the phase noise problem for a high number of particles, provided that the linearized channel model approximation is accurate.

The major drawbacks are as follows.

(i) The sample space is two-dimensional. In general, more samples are required to represent a distribution of more than one variable. Obviously, this weighs on the overall complexity.

(ii) In order to generate a new sample pair $[a_k^{(j)}, \theta_k^{(j)}]$, one has to evaluate (27), (29), and (31). These equations are relatively complicated and have to be executed for all $k$, $j$.

(iii) Finally, the algorithm is based on the linearized channel model and tends to be less accurate for higher values of $\sigma_\delta^2$.

5.2. Rao-Blackwellization. To overcome the drawbacks encountered with the previous method, we explore the application of the Rao-Blackwellization method in this section. We distinguish two separate approaches. The first one is a symbol-based sampling method. This method is not new and has already been investigated in [6], albeit without establishing the link with the Rao-Blackwellization framework. For completeness, we provide a Rao-Blackwellized derivation of the algorithm in this paper.

In the second and new approach, we only draw samples of the carrier phase. As we will demonstrate, this offers significant computational advantages.

5.2.1. Symbol-Based Sampling. We apply the Rao-Blackwellization method from Section 4.3.2 by setting $y = \theta_{1:k}$ and $z = a_{1:k}$. The optimal importance sampling distribution is given by (22), which, for the current scenario, breaks down to

$$ \begin{aligned} \pi_{k|k-1}(a_k \mid a_{1:k-1}^{(j)}) &= p(a_k \mid r_{1:k}, a_{1:k-1}^{(j)}) \\ &\propto p(a_k, r_k \mid r_{1:k-1}, a_{1:k-1}^{(j)}) \\ &= \int_{\theta_k} p(a_k, r_k \mid \theta_k)\, p(\theta_k \mid r_{1:k-1}, a_{1:k-1}^{(j)})\, d\theta_k. \end{aligned} \tag{32} $$

The distribution $p(\theta_k \mid r_{1:k-1}, a_{1:k-1}^{(j)})$ can be found in a recursive manner by applying a Kalman filter to the state space model of (5), (2), which is equivalent to an extended Kalman filter applied to (1), (2). In Kalman parlance, the requested distribution corresponds to the prediction step of the Kalman filter. For every symbol sequence $a_{1:k-1}^{(j)}$, we should run a Kalman filter to keep track of the carrier phase distribution. This means that we should run $J_{\max}$ Kalman filters in parallel with the SIS algorithm. Denoting the mean and variance of the carrier phase distribution by $\mu_{k|k-1}^{(j)}$ and $\sigma_{k|k-1}^{(j)2}$, respectively, the integral from (32) can be evaluated analytically as follows:

$$ \pi_{k|k-1}(a_k \mid a_{1:k-1}^{(j)}) \propto p(a_k) \exp\left( -\frac{1}{2\sigma_s^{(j)2}} \left\| r_k - a_k e^{j\mu_{k|k-1}^{(j)}} \right\|^2 \right) \doteq f_2^{(j)}(a_k), \tag{33} $$

where $\sigma_s^{(j)2} \doteq \sigma_n^2 + \sigma_{k|k-1}^{(j)2}$. The weight update function follows from (21) and is given by

$$ p(r_k \mid r_{1:k-1}, a_{1:k-1}^{(j)}) = \sum_{a_k} p(a_k, r_k \mid r_{1:k-1}, a_{1:k-1}^{(j)}) = \sum_{a_k \in \Omega} f_2^{(j)}(a_k). \tag{34} $$

Here, $\mu_{k|l}$ and $\sigma_{k|l}^2$ denote the mean and variance of the carrier phase at instant $k$ conditioned on the observations up to instant $l$. This succinct derivation captures the main idea and furnishes the key equations of the symbol-based sampling approach.

Benefits and Drawbacks. The main benefit of this approach is the reduction of the sample space to one dimension. By running a Kalman filter in parallel with the particle filter, the posterior distribution of the carrier phase can be tracked analytically.

However, the following two drawbacks remain.

(i) The algorithm still relies on the linearized channel model and suffers from the disadvantages mentioned in Section 5.1.

(ii) The computational complexity remains high due to the required evaluation of (33), (34), and the Kalman filter evaluation.

5.2.2. Phase-Based Sampling. In this second method, samples are drawn of the carrier phase rather than of the data symbols. We will distinguish two different approaches within this method. In the first approach, we use the optimal importance sampling distribution, whereas in the second approach, an alternative distribution is explored. We will show that the suboptimal sampling method results in a lower overall complexity.


(a) Optimal Distribution. The optimal importance sampling distribution for the present case follows again from (22) as follows:

$$ \begin{aligned} \pi_{k|k-1}(\theta_k \mid \theta_{1:k-1}^{(j)}) &= p(\theta_k \mid r_{1:k}, \theta_{1:k-1}^{(j)}) \\ &= p(\theta_k \mid r_k, \theta_{k-1}^{(j)}) \\ &= \sum_{a_k \in \Omega} p(\theta_k \mid r_k, \theta_{k-1}^{(j)}, a_k)\, p(a_k \mid r_k, \theta_{k-1}^{(j)}). \end{aligned} \tag{35} $$

The second transition follows from the fact that $u_k \doteq [r_k, \theta_k]$ is a Markov process, provided that the transmitted symbols are independent. The first distribution in the last line has already been derived in Section 5.1. We can simply reuse the result obtained there if we replace $a_k^{(j)}$ by $a_k$ in (28). The second factor in (35) is also known and given by (26). Hence, as it turns out, $\pi_{k|k-1}(\theta_k \mid \theta_{1:k-1}^{(j)})$ is a mixture of Gaussian distributions. Sampling from this distribution is very simple. First, draw a sample $a_k^{(j)}$ from $p(a_k \mid r_k, \theta_{k-1}^{(j)})$. Then, draw a phase sample from $p(\theta_k \mid r_k, \theta_{k-1}^{(j)}, a_k^{(j)})$. The weight update equation is again given by (31).

This approach is almost identical to the approach from Section 5.1. The only difference is that the samples of the data symbols are not stored. Hence, this method will not mitigate the inconveniences of the earlier described methods. Note that this approach has also been investigated in [7].

(b) Prior Distribution. By carefully selecting the importance sampling distribution, however, we can obtain a significant saving in the overall complexity. In this paragraph, we explore the prior distribution of the phase (at instant $k$ given phase samples up to $k-1$) as a candidate sampling distribution:

$$ \pi_{k|k-1}(\theta_k \mid \theta_{1:k-1}^{(j)}) = p(\theta_k \mid \theta_{1:k-1}^{(j)}). \tag{36} $$

Drawing samples from this distribution is very simple. All we need is to generate Gaussian noise samples and plug them into (2). The weight update function follows from inserting (36) into (21) and is given by

$$ p(r_k \mid \theta_k^{(j)}) \propto \sum_{a_k \in \Omega} p(a_k)\, p(r_k \mid a_k, \theta_k^{(j)}). \tag{37} $$

The functions in the right-hand side of (37) follow immediately from the channel model and are known.

Benefits and Drawbacks. The apparent simplicity of the latter method raises high hopes regarding the computational complexity. The only drawback of this method is that it does not use the optimal importance sampling distribution. However, as we will show in Section 7, the slightly larger number of samples required to overcome degeneracy is more than compensated by the reduced complexity of the method.
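The phase-based sampling method with the prior distribution can be sketched end to end in a few lines. The following is a minimal illustration rather than the authors' implementation: it assumes unit-energy QPSK with a uniform symbol prior, resamples after every symbol (so the weights entering each step are equal), and represents pilots by a hypothetical dictionary mapping a time index to the known symbol.

```python
import cmath
import math
import random

# Unit-energy QPSK constellation (an assumption made for illustration).
QPSK = [cmath.exp(1j * (math.pi / 4 + m * math.pi / 2)) for m in range(4)]


def likelihood(r, a, theta, sigma_n):
    """Evaluate p(r_k | a_k, theta_k) from (3)."""
    dist2 = abs(r - a * cmath.exp(1j * theta)) ** 2
    return math.exp(-dist2 / (2.0 * sigma_n ** 2)) / (2.0 * math.pi * sigma_n ** 2)


def phase_particle_filter(received, pilots, sigma_n, sigma_delta, num_particles, rng):
    """Return hard symbol decisions obtained with prior-distribution phase sampling."""
    thetas = [rng.uniform(-math.pi, math.pi) for _ in range(num_particles)]
    decisions = []
    for k, r in enumerate(received):
        # (36): propagate every particle through the Wiener model (2).
        thetas = [t + rng.gauss(0.0, sigma_delta) for t in thetas]
        candidates = [pilots[k]] if k in pilots else QPSK
        prior = 1.0 / len(candidates)
        table = [[prior * likelihood(r, a, t, sigma_n) for a in candidates]
                 for t in thetas]
        # (37): particle weight, up to scale, given equal weights before this step.
        weights = [sum(row) for row in table]
        # (7)-(8): symbol posterior up to scale, summed over the particles.
        scores = [sum(row[i] for row in table) for i in range(len(candidates))]
        decisions.append(candidates[scores.index(max(scores))])
        # Resample to equal weights (a design choice of this sketch).
        thetas = rng.choices(thetas, weights=weights, k=num_particles)
    return decisions
```

Because the weights are equal after each resampling step, the weighted sum in (8) reduces to the plain sum over particles used for `scores`. The per-particle work is one Gaussian draw and one evaluation of (3) per candidate symbol, which is the source of the low complexity claimed for this method.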

6. Phase Noise Estimation for Coded Systems

Let us now investigate how we can extend the algorithms described above to a coded system. For such a coded system, (8) is no longer valid. The a posteriori probability of a symbol typically depends on the entire frame of received signals. Therefore, (8) should be replaced with

$$ p(a_k \mid r_{1:k}) \propto \int_{\theta_{1:k}} p(a_k \mid r_{1:k}, \theta_{1:k})\, p(\theta_{1:k} \mid r_{1:k})\, d\theta_{1:k}. \tag{38} $$

Straightforward application of the SIS algorithm is no longer possible for two reasons. First, the code constraint prevents drawing samples from $p(\theta_{1:k} \mid r_{1:k})$ in a recursive manner. In particular, the evaluation of the importance sampling and particle update equations is prohibitive in the presence of a code constraint on the symbols. Second, the integral in (38) cannot be evaluated using the importance sampling technique, as we have no closed-form solution for $p(a_k \mid r_{1:k}, \theta_{1:k})$. The evaluation of $p(a_k \mid r_{1:k}, \theta_{1:k})$ requires a complicated decoding step, which has to be executed for every possible sample of $\theta_{1:k}$. Obviously, this becomes impractical for a large number of samples.

Fortunately, we can extend the algorithms described above to a coded setup by means of iterative receiver processing. As shown in [15–19], there exists a solid framework based on factor graph theory that dictates how the estimation and the decoding can be decoupled in a coded setup. It can be shown that the factor graph solution converges to the optimal solution under mild conditions: the loops that arise in the factor graph representation of the receiver should not be too short. Extending the above algorithms to a coded system boils down to replacing the prior probabilities of the symbols $p(a_k)$ with the extrinsic probabilities provided by the decoder. These extrinsic probabilities are updated by the decoder and exchanged in an iterative fashion with the estimator which, in turn, updates the phase estimates. This process repeats until convergence of the algorithm is achieved. More details on this approach can be found in [15, 16]. Section 7 illustrates the performance of the resulting iterative receiver.

7. Numerical Results

We ran computer simulations to evaluate the performance of the algorithms described above. We have adopted the Wiener phase noise model from (1)-(2) and applied QPSK signaling. Unless mentioned otherwise, 50 samples were used to represent the target distributions in the evaluation of the various Monte Carlo-based methods. The results in Figures 1 and 2 are for an uncoded setup, whereas Figures 3 and 4 pertain to a coded system. In this latter case, a rate-1/2 16-state recursive convolutional code was employed, and 5 iterations between the decoder and the estimator were performed.

The following paragraphs discuss the obtained results.

Ambiguities. Let us begin with an uncoded configuration. If the transmitted symbols are unknown, it is impossible to assess the true value of the carrier phase based on the received signal. For QPSK, for instance, the carrier phase can only be known up to a four-fold ambiguity. Figure 1 demonstrates this fact. It portrays a histogram of the samples from the distribution $p(\theta_k \mid r_{1:k})$, which were obtained through the evaluation of the phase-based sampling algorithm from Section 5.2.2 (with the optimal sampling distribution). In Figure 1, only the symbols at instants $11 \le k \le 19$ are known to the receiver. Hence, the distribution $p(\theta_k \mid r_{1:k})$ for $k = 10$ is based solely on unknown symbols. As expected, the distribution exhibits 4 local maxima (at 90° intervals). At $k = 20$, however, these ambiguities have been resolved because of the known symbols inserted before $k = 20$. This result indicates that it is necessary to insert pilot symbols in the data stream (at regular time instants).

Figure 1: Histogram of $p(\theta \mid r_{1:k}) \leftrightarrow \{\theta^{(j)}; w^{(j)}\}$ obtained by the phase-based sampling algorithm ($\sigma_\delta^2 = 5^\circ$, $E_b/N_0 = 6$ dB, uncoded QPSK). The dashed line indicates the true value of $\theta_k$. Panels: (a) $k = 10$, (b) $k = 20$. Horizontal axis: $\theta$ (degrees); vertical axis: $p(\theta \mid r_{1:k})$.
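The four-fold ambiguity can also be checked numerically: for QPSK with equiprobable symbols, the symbol-averaged likelihood (37) takes the same value at $\theta$ and at $\theta + m\pi/2$. The short sketch below assumes unit-energy QPSK; the function name, the received sample, and the noise level are arbitrary illustrative choices.

```python
import cmath
import math

# Unit-energy QPSK constellation (an assumption made for illustration).
QPSK = [cmath.exp(1j * (math.pi / 4 + m * math.pi / 2)) for m in range(4)]


def blind_likelihood(r, theta, sigma_n):
    """Evaluate (37): p(r | theta) averaged over equiprobable QPSK symbols."""
    total = 0.0
    for a in QPSK:
        dist2 = abs(r - a * cmath.exp(1j * theta)) ** 2
        total += 0.25 * math.exp(-dist2 / (2.0 * sigma_n ** 2)) / (2.0 * math.pi * sigma_n ** 2)
    return total


r = 0.9 + 0.4j  # arbitrary received sample
values = [blind_likelihood(r, 0.3 + m * math.pi / 2, 0.5) for m in range(4)]
```

All four entries of `values` coincide up to rounding, because rotating the phase by 90° merely permutes the constellation points; only a known (pilot) symbol breaks this symmetry.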

Performance. Figures 2 and 3 illustrate the BER performance of various algorithms for the uncoded and coded setups, respectively. We considered the transmission of blocks of 400 QPSK symbols, with the periodic insertion of one pilot symbol per 20 symbols (5% pilot overhead). The scenarios labeled phase-based A and B correspond to the phase-based sampling algorithm from Section 5.2.2, using the optimal and prior importance sampling distributions, respectively. The symbol-based algorithm corresponds to the algorithm which was proposed in [6] and has also been described in Section 5.2.1. These Monte Carlo approaches have also been compared to conventional phase noise estimators. Performance curves are included for an extended Kalman filter, using either hard-symbol decisions, soft-symbol decisions, or pilot symbols only (see also Section 3.1). In a coded setup, these soft or hard symbol decisions are based on the a posteriori probabilities of the symbols available during the specific iteration.

Figure 2: BER performance for uncoded setup ($\sigma_\delta^2 = 2^\circ$, QPSK, 5% pilots). Axes: BER versus $E_b/N_0$. Curves: Phase-based A, Phase-based B, Symbol-based, Soft decision, Hard decision, Pilot only, Perfect phase.

As we can observe from Figures 2 and 3, it definitely pays to exploit information from the unknown data symbols. The estimators that are only based on pilot symbols give rise to a significant performance degradation. On the other hand, there is not much difference between the performance of the various blind estimators in the uncoded setup. This confirms that in an uncoded setup, the conventional estimators exhibit a satisfactory performance. In the coded configuration, however, the Monte Carlo methods outperform the conventional methods. Apparently, these conventional ad hoc methods fail to operate at the lower SNR values that can be achieved with the use of coding. We furthermore observe that the phase-based estimators exhibit the best performance. The symbol-based method does not perform as well because, at high SNRs, the importance sampling distribution is very peaky. Therefore, almost all samples drawn from the distribution $\pi_{k|k-1}(a_k \mid a_{1:k-1}^{(j)})$ will be equal to each other. Hence, it takes a lot more samples to provide an accurate representation of this latter distribution, and the algorithm will suffer from cycle-slip-like phenomena [20].

Complexity. Finally, we will examine the computational complexity of the different Monte Carlo-based methods. First, we note that the complexity of each of the presented algorithms scales linearly with the number of samples. Hence, it suffices to determine (i) the complexity per sample and (ii) the number of samples required to achieve a satisfactory performance.

It is hard to assess the complexity of the algorithms in an analytical manner. Therefore, we compared their relative complexity per sample based on the duration of an actual implementation on a Matlab simulation platform. Table 1 displays the results. Apparently, the phase-based sampling method with the prior importance sampling distribution has the lowest complexity. Based on the simplicity of this estimator operation (see Section 5.2.2), this result does not come as a surprise.

Figure 3: BER performance for coded setup ($\sigma_\delta^2 = 2^\circ$, QPSK, 5% pilots). Axes: BER versus $E_b/N_0$. Curves: Phase-based A, Phase-based B, Symbol-based, Soft decision, Hard decision, Pilot only, Perfect phase.

Figure 4: BER performance for coded setup as a function of the number of samples ($\sigma_\delta^2 = 2^\circ$, $E_b/N_0 = 5$ dB, QPSK, 5% pilots). Axes: BER versus $J_{\max}$ (number of samples). Curves: Phase-based A, Phase-based B, Symbol-based, Perfect phase.

It remains to compare the performance of the algorithms with respect to the number of samples used in their evaluation. Figure 4 illustrates this behavior for the coded scenario. It turns out that the phase-based sampling methods converge much faster to the asymptotic performance, which is defined as the performance for $J_{\max} \to \infty$. Furthermore, the difference between the two phase-based sampling methods is negligible. Hence, based on the results from Table 1, the phase-based sampling method with the prior importance sampling distribution has the lowest overall complexity. These findings advocate the use of this last method to deal with phase noise in coded systems.

Table 1: Comparison of the complexity per sample of the Monte Carlo methods (for QPSK signaling).

Method                    Relative complexity
Symbol-based sampling     1.26
Phase-based sampling A    1.29
Phase-based sampling B    1
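The overall complexity argument combines the per-sample cost of Table 1 with the number of samples each method needs. A minimal sketch of that bookkeeping follows. The per-sample costs are the values of Table 1, whereas the sample counts are purely hypothetical placeholders (the text only states that the phase-based methods converge much faster), and the function name is an assumption.

```python
# Relative complexity per sample, taken from Table 1.
COST_PER_SAMPLE = {
    "symbol-based": 1.26,
    "phase-based A": 1.29,
    "phase-based B": 1.0,
}

def overall_cost(method, num_samples):
    """Total relative cost, using the linear scaling in the number of samples."""
    return COST_PER_SAMPLE[method] * num_samples

# Hypothetical sample counts needed to approach the asymptotic BER (assumed).
samples_needed = {"symbol-based": 60, "phase-based A": 20, "phase-based B": 20}

costs = {m: overall_cost(m, j) for m, j in samples_needed.items()}
cheapest = min(costs, key=costs.get)
print(costs, cheapest)
```

With equal sample counts for the two phase-based methods, the ranking is decided by the per-sample cost alone, which is the reasoning used in the paragraph above.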

8. Conclusions

This paper explored the use of Monte Carlo methods for phase noise estimation. Starting with a short survey on Monte Carlo methods, several techniques were introduced, such as sequential importance sampling and Rao-Blackwellization, laying the foundation for the development of various phase noise estimators. It turned out that there are two feasible Monte Carlo approaches to tackle the phase noise problem. The first one boils down to drawing samples from the a posteriori distribution of the symbols and updating them in a recursive manner. The carrier phase trajectory is hereby tracked analytically. This approach has previously been examined in [6]. The other approach entails the sequential sampling of the a posteriori carrier phase distribution. Two different importance sampling distributions can be used for this method. The use of the optimal sampling distribution has been explored in [7], whereas this paper also considers the use of the prior sampling distribution. Computer simulations show that the performance-complexity tradeoff is optimized for the phase-based sampling method with a prior importance sampling distribution.

Appendices

A. Derivation of (27)

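First, we assume that the likelihood function (3) only takes on significant values in the neighborhood of $\theta^{(j)}_{k-1}$. Invoking the linearized channel model from (5), this allows us to rewrite (3) as follows:

$$
\begin{aligned}
p(r_k \mid a_k, \theta_k)
&\propto \exp\left( -\frac{\|a_k\|^2}{2\sigma_n^2} \left\| \frac{r_k}{a_k} e^{-j\theta^{(j)}_{k-1}} - 1 - j\left(\theta_k - \theta^{(j)}_{k-1}\right) \right\|^2 \right) \\
&= \exp\left( -\frac{\|a_k\|^2}{2\sigma_n^2} \Re\left\{ \frac{r_k}{a_k} e^{-j\theta^{(j)}_{k-1}} - 1 \right\}^2 - \frac{\|a_k\|^2}{2\sigma_n^2} \Im\left\{ \frac{r_k}{a_k} e^{-j\theta^{(j)}_{k-1}} - 1 - j\left(\theta_k - \theta^{(j)}_{k-1}\right) \right\}^2 \right).
\end{aligned}
\tag{A.1}
$$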


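This approximation is valid for values of $\theta_k$ situated in the neighborhood of $\theta^{(j)}_{k-1}$. We can now combine (A.1) and (4) into

$$
\begin{aligned}
p\left(r_k \mid a_k, \theta^{(j)}_{k-1}\right)
&= \int_{\theta_k} p(r_k \mid a_k, \theta_k)\, p\left(\theta_k \mid \theta^{(j)}_{k-1}\right) d\theta_k \\
&\propto \exp\left( -\frac{\|a_k\|^2}{2\sigma_n^2} \Re\left\{ \frac{r_k}{a_k} e^{-j\theta^{(j)}_{k-1}} - 1 \right\}^2 - \frac{\|a_k\|^2}{2\sigma_n^2 + 2\|a_k\|^2\sigma_\delta^2} \Im\left\{ \frac{r_k}{a_k} e^{-j\theta^{(j)}_{k-1}} - 1 \right\}^2 \right) \\
&= \exp\left( -\frac{1}{2\left(\sigma_n^2 + \|a_k\|^2\sigma_\delta^2\right)} \left\| r_k - a_k e^{j\theta^{(j)}_{k-1}} \right\|^2 - \|a_k\|^2\, \mathcal{O}\!\left(\frac{\sigma_\delta^2}{\sigma_n^2}\right) \Re\left\{ \frac{r_k}{a_k} e^{-j\theta^{(j)}_{k-1}} - 1 \right\}^2 \right) \\
&\approx \exp\left( -\frac{1}{2\left(\sigma_n^2 + \|a_k\|^2\sigma_\delta^2\right)} \left\| r_k - a_k e^{j\theta^{(j)}_{k-1}} \right\|^2 \right).
\end{aligned}
\tag{A.2}
$$

The last approximation is valid for small $\sigma_\delta^2$. Finally, multiplication with the prior symbol distribution $p(a_k)$ yields (27).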
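The marginalization over $\theta_k$ in (A.2) can be checked numerically. The sketch below integrates the product of the linearized likelihood and a Gaussian prior with variance $\sigma_\delta^2$ over the phase increment, and compares it with the closed-form exponential in the imaginary part. Only the factor depending on the imaginary part is considered, since the real-part factor does not depend on $\theta_k$. The function names, the zero-mean Gaussian form of the prior on the increment, and all numerical values are assumptions made for illustration.

```python
import math

def closed_form(imag_part, abs_a2, sigma_n2, sigma_d2):
    """Imaginary-part factor of the second line of (A.2), up to a constant."""
    return math.exp(-abs_a2 * imag_part ** 2 / (2 * sigma_n2 + 2 * abs_a2 * sigma_d2))

def numeric_marginal(imag_part, abs_a2, sigma_n2, sigma_d2, steps=20001):
    """Riemann sum of likelihood times prior over d = theta_k - theta_{k-1}."""
    half_width = abs(imag_part) + 12.0 * math.sqrt(sigma_d2)
    step = 2.0 * half_width / (steps - 1)
    total = 0.0
    for i in range(steps):
        d = -half_width + i * step
        likelihood = math.exp(-abs_a2 / (2 * sigma_n2) * (imag_part - d) ** 2)
        prior = math.exp(-d ** 2 / (2 * sigma_d2))  # unnormalized Gaussian prior
        total += likelihood * prior * step
    return total

# Assumed example values: unit-energy symbol, sigma_n^2 = 0.1, sigma_delta^2 = 0.01.
ratio_a = numeric_marginal(0.1, 1.0, 0.1, 0.01) / closed_form(0.1, 1.0, 0.1, 0.01)
ratio_b = numeric_marginal(0.3, 1.0, 0.1, 0.01) / closed_form(0.3, 1.0, 0.1, 0.01)
print(ratio_a, ratio_b)
```

If the proportionality in (A.2) holds, the two ratios agree, because the only remaining factor is a normalization constant that does not depend on the received sample.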

B. Derivation of (28)

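The derivation of (28) draws on the linearized channel model distribution (A.1) and the following straightforward manipulations:

$$
\begin{aligned}
p\left(\theta_k \mid r_k, \theta^{(j)}_{k-1}, a^{(j)}_k\right)
&\propto p\left(r_k \mid \theta_k, a^{(j)}_k\right) p\left(\theta_k \mid \theta^{(j)}_{k-1}\right) \\
&\approx \exp\left( -\frac{\|a_k\|^2}{2\sigma_n^2} \left\| \frac{r_k}{a_k} e^{-j\theta^{(j)}_{k-1}} - 1 - j\left(\theta_k - \theta^{(j)}_{k-1}\right) \right\|^2 - \frac{1}{2\sigma_\delta^2} \left\| \theta_k - \theta^{(j)}_{k-1} \right\|^2 \right) \\
&\propto \exp\left( -\frac{\|a_k\|^2}{2\sigma_n^2} \left[ \Im\left\{ \frac{r_k}{a_k} e^{-j\theta^{(j)}_{k-1}} \right\} - \left(\theta_k - \theta^{(j)}_{k-1}\right) \right]^2 - \frac{1}{2\sigma_\delta^2} \left(\theta_k - \theta^{(j)}_{k-1}\right)^2 \right) \\
&\propto \exp\left( -\frac{1}{2\sigma_u^2} \left( \theta_k - \theta^{(j)}_{k-1} - \frac{\|a_k\|^2 \sigma_u^2}{\sigma_n^2} \Im\left\{ \frac{r_k}{a_k} e^{-j\theta^{(j)}_{k-1}} \right\} \right)^2 \right) \\
&= \exp\left( -\frac{1}{2\sigma_u^2} \left\| \theta_k - \theta_u \right\|^2 \right),
\end{aligned}
\tag{B.1}
$$

where $\theta_u$ and $\sigma_u^2$ are defined in (29) and (30), respectively.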
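The completion of the square in (B.1) can be verified with a short sketch. Since (29) and (30) lie outside this section, the expressions for $\theta_u$ and $\sigma_u^2$ used below are read off from the third and fourth lines of (B.1) and should be treated as an assumption; the function names and numerical values are likewise illustrative.

```python
import cmath

def phase_posterior(r, a, theta_prev, sigma_n2, sigma_d2):
    """Mean and variance of the Gaussian in theta_k, as read off from (B.1)."""
    abs_a2 = abs(a) ** 2
    sigma_u2 = 1.0 / (abs_a2 / sigma_n2 + 1.0 / sigma_d2)
    imag_part = (r / a * cmath.exp(-1j * theta_prev)).imag
    theta_u = theta_prev + abs_a2 * sigma_u2 / sigma_n2 * imag_part
    return theta_u, sigma_u2

def log_unnormalized(theta, r, a, theta_prev, sigma_n2, sigma_d2):
    """Exponent of the third line of (B.1): linearized likelihood times prior."""
    abs_a2 = abs(a) ** 2
    imag_part = (r / a * cmath.exp(-1j * theta_prev)).imag
    d = theta - theta_prev
    return -abs_a2 / (2 * sigma_n2) * (imag_part - d) ** 2 - d ** 2 / (2 * sigma_d2)

# Assumed example: unit symbol, true phase 0.35 rad, previous sample at 0.3 rad.
r = cmath.exp(1j * 0.35) + (0.05 - 0.02j)
theta_u, sigma_u2 = phase_posterior(r, 1 + 0j, 0.3, 0.1, 0.01)
print(theta_u, sigma_u2)
```

The exponent of the third line and the Gaussian exponent with parameters `theta_u` and `sigma_u2` should differ only by a term that is constant in $\theta_k$, which is what the proportionality signs in (B.1) express.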

Acknowledgments

The first author gratefully acknowledges the support from the Research Foundation-Flanders (FWO Vlaanderen). This work is also supported by the European Commission in the framework of the FP7 Network of Excellence in Wireless Communications NEWCOM++ (Contract no. 216715), the Turkish Scientific and Technical Research Institute (TUBITAK) under Grant no. 108E054, and the Research Fund of Istanbul University under Projects UDP-2042/23012008, UDP-1679/10102007.

References

[1] H. Meyr, M. Moeneclaey, and S. A. Fechtel, Digital Communication Receivers: Synchronization, Channel Estimation, and Signal Processing, vol. 2, John Wiley & Sons, New York, NY, USA, 1997.

[2] U. Mengali and A. N. D’Andrea, Synchronization Techniques for Digital Receivers, Plenum Press, New York, NY, USA, 1997.

[3] L. Benvenuti, L. Giugno, V. Lottici, and M. Luise, “Code-aware carrier phase noise compensation on turbo-coded spectrally-efficient high-order modulations,” in Proceedings of the 8th International Workshop on Signal Processing for Space Communications (SPSC ’03), vol. 1, pp. 177–184, Catania, Italy, September 2003.

[4] N. Noels, H. Steendam, and M. Moeneclaey, “Carrier phase tracking from turbo and LDPC coded signals affected by a frequency offset,” IEEE Communications Letters, vol. 9, no. 10, pp. 915–917, 2005.

[5] G. Colavolpe, A. Barbieri, and G. Caire, “Algorithms for iterative decoding in the presence of strong phase noise,” IEEE Journal on Selected Areas in Communications, vol. 23, no. 9, pp. 1748–1757, 2005.

[6] E. Panayırcı, H. Cırpan, and M. Moeneclaey, “A sequential Monte Carlo method for blind phase noise estimation and data detection,” in Proceedings of the 13th European Signal Processing Conference (EUSIPCO ’05), Antalya, Turkey, September 2005.

[7] P. O. Amblard, J. M. Brossier, and E. Moisan, “Phase tracking: what do we gain from optimality? Particle filtering versus phase-locked loops,” Signal Processing, vol. 83, no. 1, pp. 151–167, 2003.

[8] J. Bhatti and M. Moeneclaey, “Pilot-aided carrier synchronization using an approximate DCT-based phase noise model,” in Proceedings of the 7th IEEE International Symposium on Signal Processing and Information Technology (ISSPIT ’07), pp. 1143–1148, Cairo, Egypt, December 2007.

[9] B. D. O. Anderson and J. B. Moore, Optimal Filtering, Prentice-Hall, Englewood Cliffs, NJ, USA, 1979.

[10] A. Doucet, S. Godsill, and C. Andrieu, “On sequential Monte Carlo sampling methods for Bayesian filtering,” Statistics and Computing, vol. 10, no. 3, pp. 197–208, 2000.

[11] A. Doucet, “On sequential simulation-based methods for Bayesian filtering,” Tech. Rep. CUED/F-INFENG/TR 310, Department of Engineering, Cambridge University, Cambridge, UK, 1998.

[12] O. Cappe, S. J. Godsill, and E. Moulines, “An overview of existing methods and recent advances in sequential Monte Carlo,” Proceedings of the IEEE, vol. 95, no. 5, pp. 899–924, 2007.

[13] A. E. Gelfand and A. F. M. Smith, “Sampling-based approaches to calculating marginal densities,” Journal of the American Statistical Association, vol. 85, no. 410, pp. 398–409, 1990.

[14] C. Andrieu and A. Doucet, “Particle filtering for partially observed Gaussian state space models,” Journal of the Royal Statistical Society. Series B, vol. 64, no. 4, pp. 827–836, 2002.

[15] F. Simoens, Iterative multiple-input multiple-output communication systems, Ph.D. thesis, Ghent University, Ghent, Belgium, 2008.

[16] H. Wymeersch, Iterative Receiver Design, Cambridge University Press, Cambridge, UK, 2007.

[17] J. Dauwels and H.-A. Loeliger, “Phase estimation by message passing,” in Proceedings of the IEEE International Conference on Communications (ICC ’04), vol. 1, pp. 523–527, Paris, France, June 2004.

[18] N. Wiberg, Codes and decoding on general graphs, Ph.D. thesis, Linkoping University, Linkoping, Sweden, 1996.

[19] A. P. Worthen and W. E. Stark, “Unified design of iterative receivers using factor graphs,” IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 843–849, 2001.

[20] H. Meyr and G. Ascheid, Synchronization in Digital Communications, John Wiley & Sons, New York, NY, USA, 1990.


Research Article

Digital Receiver Design for Transmitted Reference Ultra-Wideband Systems

Yiyin Wang, Geert Leus, and Alle-Jan van der Veen

Faculty of Electrical Engineering, Mathematics and Computer Science (EEMCS), Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands

Correspondence should be addressed to Yiyin Wang, [email protected]

Received 30 June 2008; Revised 6 November 2008; Accepted 1 February 2009

Recommended by Erdal Panayirci

A complete detection, channel estimation, synchronization, and equalization scheme for a transmitted reference (TR) ultra-wideband (UWB) system is proposed in this paper. The scheme is based on a data model which admits a moderate data rate and takes both the interframe interference (IFI) and the intersymbol interference (ISI) into consideration. Moreover, the bias caused by the interpulse interference (IPI) in one frame is also taken into account. Based on the analysis of the stochastic properties of the received signals, several detectors are studied and evaluated. Furthermore, a data-aided two-stage synchronization strategy is proposed, which obtains sample-level timing in the range of one symbol at the first stage and then pursues symbol-level synchronization by looking for the header at the second stage. Three channel estimators are derived to achieve joint channel and timing estimates for the first stage, namely, the linear minimum mean square error (LMMSE) estimator, the least squares (LS) estimator, and the matched filter (MF). We check the performance of different combinations of channel estimation and equalization schemes and try to find the best combination, that is, the one providing a good tradeoff between complexity and performance.

Copyright © 2009 Yiyin Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

Ultra-wideband (UWB) techniques can provide high speed, low cost, and low complexity wireless communications with the capability to overlay existing frequency allocations [1]. Since UWB systems employ ultrashort low duty cycle pulses as information carriers, they suffer from stringent timing requirements [1, 2] and complex multipath channel estimation [1]. Conventional approaches require a prohibitively high sampling rate of several GHz [3] and an intensive multidimensional search to estimate the parameters for each multipath echo [4].

Detection, channel estimation, and synchronization problems are always entangled with each other. A typical approach to address these problems is the detection-based signal acquisition [5]. A locally generated template is correlated with the received signal, and the result is compared to a threshold. How to generate a good template is the task of channel estimation, whereas how to decide the threshold is the goal of detection. Due to the multipath channel, the complexity of channel estimation grows quickly as the number of multipath components increases, and because of the fine resolution of the UWB signal, the search space is extremely large.

Recent research works on detection, channel estimation, and synchronization methods for UWB have focused on low sampling rate methods [6–9] or noncoherent systems, such as transmitted reference (TR) systems [5, 10], differential detectors (DDs) [11], and energy detectors (EDs) [9, 12]. In [6], a generalized likelihood ratio test (GLRT) for frame-level acquisition based on symbol rate sampling is proposed, which works with no or small interframe interference (IFI) and no intersymbol interference (ISI). The whole training sequence is assumed to be included in the observation window without knowing the exact starting point. Due to its low duty cycle, a UWB signal belongs to the class of signals that have a finite rate of innovation [7]. Hence, it can be sampled below the Nyquist sampling rate, and the timing information can be estimated by standard methods. The theory is developed under the simplest scenario, and extensions are currently envisioned [13]. The timing recovery algorithm of [8] cross-correlates successive symbol-long received signals, for which the feedback controlled delay lines are difficult to implement. In [9], the authors compare timing estimation among different types of transceivers, such as stored-reference (SR) systems, ED systems, and TR systems. The ED and the TR systems belong to the class of noncoherent receivers. Although their performances are suboptimal due to the noise contaminated templates, they attract more and more interest because of their simplicity. They are also more tolerant to timing mismatches than SR systems. The algorithms in [9] are based on the assumption that the frame-level acquisition has already been achieved. Two-step strategies for acquisition are described in [14, 15]. In [14], the authors use a different search strategy in each step to speed up the procedure, namely the bit reversal search for the first step and the linear search for the second step. Meanwhile, the two-step procedure in [15] finds the block which contains the signal in the first step, and aligns with the signal at a finer resolution in the second step. Both methods are based on the assumption that coarse acquisition has already been achieved to limit the search space to the range of one frame and that there are no interferences in the signal.

From a system point of view, noncoherent receivers are considered to be more practical since they can avoid the difficulty of accurate synchronization and complicated channel estimation. One main obstacle for TR systems and DD systems is the implementation of the delay line [16]. The longer the delay line is, the more difficult it is to implement. For DD systems [11], the delay line is several frames long, whereas for TR systems, it can be only several pulses long [17], which is much shorter and easier to implement [18]. ED systems do not need a delay line, but suffer from multiple access interference [19], since they can only adopt a limited number of modulation schemes, such as on-off keying (OOK) and pulse position modulation (PPM). A two-stage acquisition scheme for TR-UWB systems is proposed in [5], which employs two sets of direct-sequence (DS) code sequences to facilitate coarse timing and fine aligning. The scheme assumes no IFI and ISI. In [20], a blind synchronization method for TR-UWB systems executes a MUSIC-like search in the signal subspace to achieve high-resolution timing estimation. However, the complexity of the algorithm is very high because of the matrix decomposition.

Recently, a multiuser TR-UWB system that admits not only interpulse interference (IPI), but also IFI and ISI was proposed in [21]. The synchronization for such a system is at low-rate sample-level. The analog parts can run independently without any feedback control from the digital parts. In this paper, we develop a complete detection, channel estimation, synchronization, and equalization scheme based on the data model modified from [21]. Moreover, the performance of different kinds of detectors is assessed. A two-stage synchronization strategy is proposed to decouple the search space and speed up synchronization. The property of the circulant matrix in the data model is exploited to reduce the computational complexity. Different combinations of channel estimators and equalizers are evaluated to find the one with the best tradeoff between performance and complexity. The results confirm that the TR-UWB system is a practical scheme that can provide moderate data rate communications (e.g., in our simulation setup, the data rate is 2.2 Mb/s) at a low cost.

The paper is organized as follows. In Section 2, the data model presented in [21] is summarized and modified to take the unknown timing into account. Further, the statistics of the noise are derived. The detection problem is addressed in Section 3. Channel estimation, synchronization, and equalization are discussed in Section 4. Simulation results are shown and assessed in Section 5. Conclusions are drawn in Section 6.

Notation. We use upper (lower) bold face letters to denote matrices (column vectors). $x(\cdot)$ ($x[\cdot]$) represents a continuous (discrete) time sequence. $\mathbf{0}_{m \times n}$ ($\mathbf{1}_{m \times n}$) is an all-zero (all-one) matrix of size $m \times n$, while $\mathbf{0}_m$ ($\mathbf{1}_m$) is an all-zero (all-one) column vector of length $m$. $\mathbf{I}_m$ indicates an identity matrix of size $m \times m$. $\ast$, $\otimes$, and $\odot$ indicate time domain convolution, Kronecker product, and element-wise product. $(\cdot)^{\dagger}$, $(\cdot)^T$, $(\cdot)^H$, $|\cdot|$, and $\|\cdot\|_F$ designate pseudoinverse, transposition, conjugate transposition, absolute value, and Frobenius norm. All other notation should be self-explanatory.

2. Asynchronous Single User Data Model

The asynchronous single user data model derived in the following paragraphs uses the data model in [21] as a starting point. We take the unknown timing into consideration and modify the model in [21].

2.1. Single Frame. In a TR-UWB system [10, 21], pairs of pulses (doublets) are transmitted in sequence as shown in Figure 1. The first pulse in the doublet is the reference pulse, whereas the second one is the data pulse. Since both pulses go through the same channel, the reference pulse can be used as a “dirty template” (noise contaminated) [8] for correlation at the receiver. One frame period $T_f$ holds one doublet. Moreover, $N_f$ frames constitute one symbol period $T_s = N_f T_f$, which carries a symbol $s_i \in \{-1, +1\}$, spread by a pseudorandom code $c_j \in \{-1, +1\}$, $j = 1, 2, \ldots, N_f$, which is repeatedly used for all symbols. The polarity of a data pulse is modulated by the product of a frame code and a symbol. The two pulses are separated by some delay interval $D_m$, which can be different for each frame. The delay intervals are in the order of nanoseconds and $D_m \ll T_f$. The receiver employs multiple correlation branches corresponding to different delay intervals. To simplify the system, we use a single delay and one correlation branch, which implies $D_m = D$. Figure 1 also presents an example of the receiver structure for a single delay $D$. The integrate-and-dump (I&D) integrates over an interval of length $T_{\mathrm{sam}}$. As a result, one frame results in $P = T_f / T_{\mathrm{sam}}$ samples, which is assumed to be an integer.

The received one-frame signal ($j$th frame of $i$th symbol) at the antenna output is

$$
r(t) = h(t - \tau) + s_i c_j h(t - D - \tau) + n(t), \tag{1}
$$

where $\tau$ is the unknown timing offset, $h(t) = h_p(t) \ast g(t)$ of length $T_h$ with $h_p(t)$ the UWB physical channel and $g(t)$ the pulse shape resulting from all the filter and antenna effects, and $n(t)$ is the bandlimited additive white Gaussian noise (AWGN) with double-sided power spectral density $N_0/2$ and bandwidth $B$. Without loss of generality, we may assume that the unknown timing offset $\tau$ in (1) is in the range of one symbol period, $\tau \in [0, T_s)$, since we know the signal is present by detection at the first step (see Section 3) and propose to find the symbol boundary before acquiring the package header (see Section 4). Then, $\tau$ can be decomposed as

$$
\tau = \delta \cdot T_{\mathrm{sam}} + \epsilon, \tag{2}
$$

where $\delta = \lfloor \tau / T_{\mathrm{sam}} \rfloor \in \{0, 1, \ldots, L_s - 1\}$ denotes the sample-level offset in the range of one symbol with $L_s = N_f P$, the symbol length in terms of number of samples, and $\epsilon \in [0, T_{\mathrm{sam}})$ represents the fractional offset. Sample-level synchronization consists of estimating $\delta$. The influence of $\epsilon$ will be absorbed in the data model and becomes invisible as we will show later.
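The decomposition in (2) amounts to an integer division of the timing offset by the sampling interval. A minimal sketch follows, with a hypothetical function name and example values in arbitrary time units that are not taken from the paper.

```python
import math

def split_timing_offset(tau, t_sam):
    """Split tau into a sample-level offset delta and a fractional offset eps, as in (2)."""
    delta = math.floor(tau / t_sam)
    eps = tau - delta * t_sam
    return delta, eps

# Assumed example: tau = 23.7 and T_sam = 2.0 in the same (arbitrary) time unit.
delta, eps = split_timing_offset(23.7, 2.0)
print(delta, eps)
```

Only `delta` is estimated by sample-level synchronization; `eps` stays hidden inside the channel energy and bias entries defined later in this section.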

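Based on the received signal $r(t)$, the correlation branch of the receiver computes

$$
\begin{aligned}
x[n] &= \int_{(n-1)T_{\mathrm{sam}}+D}^{nT_{\mathrm{sam}}+D} r(t)\, r(t - D)\, dt \\
&= \int_{(n-1)T_{\mathrm{sam}}}^{nT_{\mathrm{sam}}} \Big\{ \big[ h(t - \tau) + s_i c_j h(t - D - \tau) + n(t) \big] \\
&\qquad\qquad \times \big[ h(t + D - \tau) + s_i c_j h(t - \tau) + n(t + D) \big] \Big\}\, dt \\
&= s_i c_j \int_{(n-1)T_{\mathrm{sam}}}^{nT_{\mathrm{sam}}} \big[ h^2(t - \tau) + h(t - D - \tau) h(t + D - \tau) \big]\, dt \\
&\quad + \int_{(n-1)T_{\mathrm{sam}}}^{nT_{\mathrm{sam}}} \big[ h(t - \tau) h(t + D - \tau) + h(t - D - \tau) h(t - \tau) \big]\, dt + n_1[n],
\end{aligned}
\tag{3}
$$

where

$$
\begin{aligned}
n_1[n] &= n_0[n] + s_i c_j \int_{(n-1)T_{\mathrm{sam}}}^{nT_{\mathrm{sam}}} \big[ h(t - \tau) n(t) + h(t - D - \tau) n(t + D) \big]\, dt \\
&\quad + \int_{(n-1)T_{\mathrm{sam}}}^{nT_{\mathrm{sam}}} \big[ h(t - \tau) n(t + D) + h(t + D - \tau) n(t) \big]\, dt
\end{aligned}
\tag{4}
$$

with

$$
n_0[n] = \int_{(n-1)T_{\mathrm{sam}}}^{nT_{\mathrm{sam}}} n(t)\, n(t + D)\, dt. \tag{5}
$$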

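Note that $n_0[n]$ is the noise autocorrelation term, and $n_1[n]$ encompasses the signal-noise cross-correlation term and the noise autocorrelation term. Their statistics will be analyzed later. Taking $\epsilon$ into consideration, we can define the channel correlation function similarly as in [21]

$$
R(\Delta, m) = \int_{(m-1)T_{\mathrm{sam}}}^{mT_{\mathrm{sam}}} h(t - \epsilon)\, h(t - \epsilon - \Delta)\, dt, \quad m = 1, 2, \ldots, \tag{6}
$$

where $h(t) = 0$, when $t > T_h$ or $t < 0$. Therefore, the first term in (3) can be denoted as

$$
s_i c_j \int_{(n-1)T_{\mathrm{sam}}}^{nT_{\mathrm{sam}}} h^2(t - \tau)\, dt = s_i c_j \int_{(n-1)T_{\mathrm{sam}} - \delta T_{\mathrm{sam}}}^{nT_{\mathrm{sam}} - \delta T_{\mathrm{sam}}} h^2(t - \epsilon)\, dt = s_i c_j R(0, n - \delta).
$$

Other terms in $x[n]$ can also be rewritten in a similar way, leading $x[n]$ to be

$$
x[n] =
\begin{cases}
s_i c_j \left[ R(0, n - \delta) + R\!\left(2D, n - \delta + \dfrac{D}{T_{\mathrm{sam}}}\right) \right] + \left[ R(D, n - \delta) + R\!\left(D, n - \delta + \dfrac{D}{T_{\mathrm{sam}}}\right) \right] + n_1[n], & n = \delta + 1, \delta + 2, \ldots, \delta + P_h, \\[2ex]
n_0[n], & \text{elsewhere},
\end{cases}
\tag{7}
$$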

where $P_h = T_h / T_{\mathrm{sam}}$ is the channel length in terms of number of samples, and $R(0, m)$ is always nonnegative. Although $R(2D, m + D/T_{\mathrm{sam}})$ is always very small compared to $R(0, m)$, we do not ignore it to make the model more accurate. We also take the two bias terms into account, which are caused by the IPI and are independent of the data symbols and the code. Now, we can define the $P_h \times 1$ channel energy vector $\mathbf{h}$ with entries $h_m$ as

$$
h_m = R(0, m) + R\!\left(2D, m + \frac{D}{T_{\mathrm{sam}}}\right), \quad m = 1, \ldots, P_h, \tag{8}
$$

where $R(0, m) \geq 0$. Further, the $P_h \times 1$ bias vector $\mathbf{b}$ with entries $b_m$ is defined, in agreement with the bias terms in (7), as

$$
b_m = R(D, m) + R\!\left(D, m + \frac{D}{T_{\mathrm{sam}}}\right), \quad m = 1, \ldots, P_h. \tag{9}
$$

Note that these entries will change as a function of $\epsilon$, although $\epsilon$ is not visible in the data model. As we stated before, sample-level synchronization is limited to the estimation of $\delta$. Using (8) and (9), $x[n]$ can be represented as

$$
x[n] =
\begin{cases}
s_i c_j h_{n-\delta} + b_{n-\delta} + n_1[n], & n = \delta + 1, \delta + 2, \ldots, \delta + P_h, \\
n_0[n], & \text{elsewhere}.
\end{cases}
\tag{10}
$$
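The single-frame model (10) can be sketched in a few lines for the noise-free case. The function name and the example channel energy and bias values are assumptions for illustration; the noise terms $n_0[n]$ and $n_1[n]$ are deliberately left out.

```python
def frame_samples(s_i, c_j, h, b, delta, num_samples):
    """Noise-free x[n], n = 1..num_samples, following (10)."""
    x = [0.0] * num_samples
    for m in range(1, len(h) + 1):       # m plays the role of n - delta
        n = delta + m
        if n <= num_samples:
            # Signal part s_i * c_j * h_m plus the code-independent bias b_m.
            x[n - 1] = s_i * c_j * h[m - 1] + b[m - 1]
    return x

# Assumed example: P_h = 2 channel taps, sample-level offset delta = 2.
x = frame_samples(-1, 1, [1.0, 0.5], [0.1, 0.05], delta=2, num_samples=6)
print(x)
```

The bias entries keep their sign when the symbol or the code chip flips, which is why they carry no timing information on their own.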

Figure 1: The transmitted UWB signal and the receiver structure. [(a) Transmitted doublets over one symbol period $T_s$ with $s = 1$, $c_1 = 1$, $c_2 = -1$, $c_3 = 1$, frame period $T_f$, and pulse delay $D$. (b) Receiver: $r(t)$ is multiplied by its copy delayed by $D$, integrated over $[(n-1)T_{\mathrm{sam}} + D,\ nT_{\mathrm{sam}} + D]$, and sampled at $f_s = 1/T_{\mathrm{sam}}$ to give $x[n]$.]

Now we can turn to the noise analysis. A number of papers have addressed the noise analysis for TR systems [22–25]. The noise properties are summarized here, and more details can be found in Appendix A. We start by making the assumptions that $D \gg 1/B$, $T_{\mathrm{sam}} \gg 1/B$, and the time-bandwidth product $2BT_{\mathrm{sam}}$ is large enough. Under these assumptions, the noise autocorrelation term $n_0[n]$ can be assumed to be a zero mean white Gaussian random variable with variance $\sigma_0^2 = N_0^2 B T_{\mathrm{sam}} / 2$. The other noise term $n_1[n]$ includes the signal-noise cross-correlation and the noise autocorrelation, and can be interpreted as a random disturbance of the received signal. Let us define two other $P_h \times 1$ channel energy vectors $\mathbf{h}'$ and $\mathbf{h}''$ with entries $h'_m$ and $h''_m$ to be used in the variance of $n_1[n]$ as follows:

$$
h'_m = R(0, m) + R\!\left(0, m - \frac{D}{T_{\mathrm{sam}}}\right), \quad m = 1, \ldots, P_h, \tag{11}
$$

$$
h''_m = R(0, m) + R\!\left(0, m + \frac{D}{T_{\mathrm{sam}}}\right), \quad m = 1, \ldots, P_h. \tag{12}
$$

Using those definitions and under the earlier assumptions, $n_1[n]$ can also be assumed to be a zero mean Gaussian random variable with variance $(N_0/2)(h'_{n-\delta} + h''_{n-\delta} + 2 s_i c_j b_{n-\delta}) + \sigma_0^2$, $n = \delta + 1, \delta + 2, \ldots, \delta + P_h$. This indicates that all the noise samples are uncorrelated with each other and have a different variance depending on the data symbol, the frame code, the channel correlation coefficients, and the noise level. Note that the noise model is as complicated as the signal model.

2.2. Multiple Frames and Symbols. Now let us extend the data model to multiple frames and symbols. We assume the channel length $P_h$ is not longer than the symbol length $L_s$. A single symbol with timing offset $\tau$ will then spread over at most three adjacent symbol periods. Define $\mathbf{x}_k = [x[(k-1)L_s + 1], x[(k-1)L_s + 2], \ldots, x[kL_s]]^T$, which is an $L_s$-long sample vector. By stacking $M + N - 1$ such received sample vectors into an $ML_s \times N$ matrix

$$
\mathbf{X} =
\begin{bmatrix}
\mathbf{x}_k & \mathbf{x}_{k+1} & \cdots & \mathbf{x}_{k+N-1} \\
\mathbf{x}_{k+1} & \mathbf{x}_{k+2} & \cdots & \mathbf{x}_{k+N} \\
\vdots & & \ddots & \vdots \\
\mathbf{x}_{k+M-1} & \mathbf{x}_{k+M} & \cdots & \mathbf{x}_{k+M+N-2}
\end{bmatrix},
\tag{13}
$$

where $N$ indicates the number of samples in each row of $\mathbf{X}$, and $M$ denotes the number of sample vectors in each column of $\mathbf{X}$, we obtain the following decomposition:

$$
\mathbf{X} = \mathbf{C}_{\delta'} \left( \mathbf{I}_{M+2} \otimes \mathbf{h} \right) \mathbf{S} + \mathbf{B}_{\delta'} \mathbf{1}_{(MN_f + 2N_f) \times N} + \mathbf{N}_1, \tag{14}
$$

where $\mathbf{N}_1$ is the noise matrix similarly defined as $\mathbf{X}$,

$$
\mathbf{S} =
\begin{bmatrix}
s_{k-1} & s_k & \cdots & s_{k+N-2} \\
s_k & s_{k+1} & \cdots & s_{k+N-1} \\
\vdots & & \ddots & \vdots \\
s_{k+M} & s_{k+M+1} & \cdots & s_{k+M+N-1}
\end{bmatrix},
\tag{15}
$$

and the structure of the other matrices is illustrated in Figure 2. We first define a code matrix $\mathbf{C}$. It is a block Sylvester matrix of size $(L_s + P_h - P) \times P_h$, whose columns are shifted versions of the extended code vector $[c_1, \mathbf{0}^T_{P-1}, c_2, \mathbf{0}^T_{P-1}, \ldots, c_{N_f}, \mathbf{0}^T_{P-1}]^T$. The shift step is one sample. Its structure is shown in Figure 3. The matrix $\mathbf{C}_{\delta'}$ of size $ML_s \times (MP_h + 2P_h)$ is composed of $M + 2$ block columns, where $\delta = (L_s - \delta') \bmod L_s$, $\delta' \in \{0, 1, \ldots, L_s - 1\}$. As long as there are more than two sample vectors ($M > 2$) stacked in every column of $\mathbf{X}$, the nonzero parts of the block columns will contain $M - 2$ code matrices $\mathbf{C}$. The nonzero parts of the first and last two block columns result from splitting the code matrix $\mathbf{C}$ according to $\delta'$: $\mathbf{C}'_i(2L_s - i + 1 : 2L_s, :) = \mathbf{C}(1 : i, :)$ and $\mathbf{C}''_i(1 : L_s + P_h - P - i, :) = \mathbf{C}(i + 1 : L_s + P_h - P, :)$, where $\mathbf{A}(m : n, :)$ refers to row $m$ through $n$ of $\mathbf{A}$. The overlays between frames and symbols observed in $\mathbf{C}_{\delta'}$ indicate the existence of IFI and ISI. Then we define a bias matrix $\mathbf{B}$ of size $(L_s + P_h - P) \times N_f$, made up of shifted versions of the bias vector $\mathbf{b}$ with a shift step of $P$ samples, as shown in Figure 3. The matrix $\mathbf{B}_{\delta'}$ of size $ML_s \times (MN_f + 2N_f)$ also has $M + 2$ block columns, the nonzero parts of which are obtained from the bias matrix $\mathbf{B}$ in the same way as for $\mathbf{C}_{\delta'}$. Since the bias is independent of the data symbols and the code, it is the same for each frame. Each column of the resulting matrix $\mathbf{B}_{\delta'} \mathbf{1}_{(MN_f + 2N_f) \times N}$ is the same and has a period of $P$ samples. Defining $\mathbf{b}_f$ to be the $P \times 1$ bias vector for one such period, we have

$$
\mathbf{B}_{\delta'} \mathbf{1}_{(MN_f + 2N_f) \times N} = \mathbf{1}_{MN_f \times N} \otimes \mathbf{b}_f. \tag{16}
$$

Note that $\mathbf{b}_f$ is also a function of $\delta$, but since it is independent of the code, we cannot extract the timing information from it.
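The block Sylvester structure of the code matrix $\mathbf{C}$ described above can be made concrete with a small sketch. It builds $\mathbf{C}$ from a code sequence, the number of samples per frame $P$, and the channel length $P_h$, and applies it to a channel energy vector. The function names and the example code and channel values are assumptions for illustration only.

```python
def code_matrix(code, p, p_h):
    """(L_s + P_h - P) x P_h matrix whose columns are one-sample shifts
    of the extended code vector [c_1, 0, ..., c_2, 0, ...]."""
    n_f = len(code)
    rows = n_f * p + p_h - p
    c = [[0] * p_h for _ in range(rows)]
    for col in range(p_h):
        for j, chip in enumerate(code):
            c[j * p + col][col] = chip   # chip j sits at row j*P, shifted by col
    return c

def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]

# Assumed example: N_f = 3 frames, P = 2 samples per frame, P_h = 3 channel samples.
C = code_matrix([1, -1, 1], p=2, p_h=3)
waveform = matvec(C, [1.0, 0.5, 0.25])
print(len(C), len(C[0]), waveform)
```

Because $P_h > P$ in this example, the contribution of one frame leaks into the next one, so some entries of `waveform` mix two code chips. This is the IFI that the overlays in $\mathbf{C}_{\delta'}$ represent.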

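Recalling the noise analysis of the previous section, the noise matrix $\mathbf{N}_1$ has zero mean and contains uncorrelated samples with different variances. The matrix $\mathbf{\Lambda}$, which collects the variances of each element in $\mathbf{N}_1$, is

$$
\mathbf{\Lambda} = E\left( \mathbf{N}_1 \odot \mathbf{N}_1 \right)
= \frac{N_0}{2} \left\{ \left( \mathbf{H}'_{\delta'} + \mathbf{H}''_{\delta'} \right) \mathbf{1}_{(MN_f + 2N_f) \times N} + 2 \mathbf{C}_{\delta'} \left( \mathbf{I}_{M+2} \otimes \mathbf{b} \right) \mathbf{S} \right\} + \sigma_0^2 \mathbf{1}_{ML_s \times N},
\tag{17}
$$

where $\mathbf{H}'_{\delta'}$ and $\mathbf{H}''_{\delta'}$ have exactly the same structure as $\mathbf{B}_{\delta'}$, only using $\mathbf{h}'$ and $\mathbf{h}''$ instead of $\mathbf{b}$. They all have the same periodic property, if multiplied by $\mathbf{1}$. Defining $\mathbf{h}'_f$ and $\mathbf{h}''_f$ to be the two $P \times 1$ vectors for one such period, we obtain

$$
\mathbf{H}'_{\delta'} \mathbf{1}_{(MN_f + 2N_f) \times N} = \mathbf{1}_{MN_f \times N} \otimes \mathbf{h}'_f, \tag{18}
$$

$$
\mathbf{H}''_{\delta'} \mathbf{1}_{(MN_f + 2N_f) \times N} = \mathbf{1}_{MN_f \times N} \otimes \mathbf{h}''_f. \tag{19}
$$

Figure 2: The data model structure of $\mathbf{X}$. [Block diagram of $\mathbf{X} = \mathbf{C}_{\delta'} (\mathbf{I}_{M+2} \otimes \mathbf{h}) \mathbf{S} + \mathbf{B}_{\delta'} \mathbf{1}$; the block columns of $\mathbf{C}_{\delta'}$ contain $\mathbf{C}''_{L_s + \delta'}$, $\mathbf{C}''_{\delta'}$, $\mathbf{C}, \ldots, \mathbf{C}$, $\mathbf{C}'_{L_s + \delta'}$, $\mathbf{C}'_{\delta'}$, and $\mathbf{B}_{\delta'}$ is built from the corresponding parts of $\mathbf{B}$.]

Figure 3: The structure of the code matrix $\mathbf{C}$ and the bias matrix $\mathbf{B}$. [$\mathbf{C}$ has $P_h$ columns and $L_s - P + P_h$ rows, with the chips $c_1, c_2, \ldots, c_{N_f - 1}, c_{N_f}$ spaced $P$ samples apart; $\mathbf{B}$ has $N_f$ columns holding copies of $\mathbf{b}$ shifted by $P$ samples.]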

3. Detection

The first task of the receiver is to detect the existence of a signal. In order to separate the detection and the synchronization problems, we assume that the transmitted signal starts with a training sequence and assign the first segment of the training sequence to detection only. In this segment, we transmit all “+1” symbols and employ all “+1” codes. It is equivalent to sending only positive pulses for some time. This kind of training sequence bypasses the code and the symbol sequence synchronization. Therefore, we do not have to consider timing issues when we handle the detection problem. The drawback is the presence of spectral peaks as a result of the periodicity. It can be solved by employing a time hopping code for the frames. We omit this in our discussion for simplicity. It is also possible to use a signal structure other than TR signals for detection, such as a positive pulse training with an ED. Although the ED doubles the noise variance due to the squaring operation, the TR system wastes half of the energy to transmit the reference pulses. Therefore, they would have a similar detection performance for the same signal-to-noise ratio (SNR), that is, the ratio of the symbol energy to the noise power spectral density. We keep the TR structure for detection in order to avoid additional hardware for the receiver.

In the detection process, we assume that the first training segment is $2M_1$ symbols long, and the observation window is $M_1$ symbols long ($M_1 L_s = M_1 N_f P$ samples equivalently). We collect all the samples in the observation window, calculate a test statistic, and examine whether it exceeds a threshold. If not, we jump to the next successive observation window of $M_1$ symbols. The $2M_1$-symbol-long training segment makes sure that there will be at least one moment at which the $M_1$-symbol-long observation window is full of training symbols. In this way, we speed up our search procedure by jumping $M_1$ symbols. Once the threshold is exceeded, we skip the next $2M_1$ symbols in order to be out of the first segment of the training sequence and we are ready to start the channel estimation and synchronization at the sample-level (see Section 4). There will be situations where the observation window only partially overlaps the signal. However, for simplicity, we will not take these cases into account when we derive the test statistic. If these cases happen and the test statistic is larger than the threshold, we declare the existence of a signal, which is true. Otherwise, we miss the detection and shift to the next observation window, which is then full of training symbols, giving us a second chance to detect the signal. Therefore, we do not have to distinguish the partially overlapped cases from the overall included case. We will derive the test statistic using only the two hypotheses indicated below, but the evaluation of the detection performance will take all the cases into account.
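The claim that a $2M_1$-symbol training segment always contains one complete $M_1$-symbol observation window can be checked with a short sketch. Window $w$ is taken to cover the samples $[w M_1 L_s, (w+1) M_1 L_s)$ and the training segment the samples $[\text{start}, \text{start} + 2 M_1 L_s)$; this indexing, the function names, and the example sizes are assumptions for illustration.

```python
def first_full_window(start, m1, l_s):
    """Index of the first observation window lying entirely in the training segment."""
    window = m1 * l_s
    return -(-start // window)           # ceiling division: first window starting at or after start

def window_is_full(w, start, m1, l_s):
    """True if window w lies inside the training segment of 2*M1 symbols."""
    window = m1 * l_s
    return start <= w * window and (w + 1) * window <= start + 2 * window

# Assumed example: M1 = 4 symbols per window, L_s = 6 samples per symbol.
checks = [window_is_full(first_full_window(s, 4, 6), s, 4, 6) for s in range(100)]
print(all(checks))
```

Whatever the unknown start of the training segment, the window found this way is completely filled with training symbols, which is the second chance mentioned above.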

3.1. Detection Problem Statement. Since we only have to tell whether the whole observation window contains a signal or not, the detection problem is simplified to a binary hypothesis test. We first define the $M_1 N_f P \times 1$ sample vector $\mathbf{x} = [\mathbf{x}_k^T, \mathbf{x}_{k+1}^T, \ldots, \mathbf{x}_{k+M_1-1}^T]^T$ with entries $x[n]$, $n = (k-1)N_f P + 1, (k-1)N_f P + 2, \ldots, (k + M_1 - 1)N_f P$, which collects all the samples in the observation window. The hypotheses are as follows.

(1) $\mathcal{H}_0$: there is only noise. Under $\mathcal{H}_0$, according to the analysis from the previous section, $\mathbf{x}$ is modeled as

$$
\mathbf{x} = \mathbf{n}_0, \tag{20}
$$

$$
\mathbf{x} \overset{a}{\sim} \mathcal{N}(\mathbf{0}, \sigma_0^2 \mathbf{I}), \tag{21}
$$

where $\mathbf{n}_0$ is the noise vector with entries $n_0[n]$, $n = (k-1)N_f P + 1, (k-1)N_f P + 2, \ldots, (k + M_1 - 1)N_f P$, and $\overset{a}{\sim}$ indicates approximately distributed according to. The Gaussian approximation for $\mathbf{x}$ is valid based on the assumptions in the previous section.

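(2) $H_1$: signal with noise occupies the whole observation window. Under $H_1$, the data model (14) and the noise model (17) can be easily specified according to the all "+1" training sequence. We define $\mathbf{H}_{\delta'}$ as having the same structure as $\mathbf{B}_{\delta'}$, only taking $\mathbf{h}$ instead of $\mathbf{b}$. When multiplied by $\mathbf{1}$, it also has a period of $P$ samples in each column. Defining $\mathbf{h}_f$ to be the $P \times 1$ vector for one such period, we have

\[ \mathbf{H}_{\delta'} \mathbf{1}_{(MN_f + 2N_f) \times N} = \mathbf{1}_{MN_f \times N} \otimes \mathbf{h}_f. \tag{22} \]

By selecting $M = M_1$ and $N = 1$ for (14) and taking (16), (18), (19), and (22) into the model, the sample vector $\mathbf{x}$ can be decomposed as

\[ \mathbf{x} = \mathbf{1}_{M_1 N_f} \otimes (\mathbf{h}_f + \mathbf{b}_f) + \mathbf{n}_1, \tag{23} \]

where the zero-mean noise vector $\mathbf{n}_1$ has uncorrelated entries $n_1[n]$, $n = (k-1)N_f P + 1, (k-1)N_f P + 2, \ldots, (k+M_1-1)N_f P$, and the variances of the elements of $\mathbf{n}_1$ are given by

\[ \boldsymbol{\lambda} = E(\mathbf{n}_1 \odot \mathbf{n}_1) = \frac{N_0}{2} \mathbf{1}_{M_1 N_f} \otimes (\mathbf{h}'_f + \mathbf{h}''_f + 2\mathbf{b}_f) + \sigma_0^2 \mathbf{1}_{M_1 N_f P}. \tag{24} \]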

Due to the all "+1" training sequence, the impact of the IFI is to fold the aggregate channel response into one frame, so the frame energy remains constant. Normally, the channel correlation function is quite narrow, so $R(D,m) \ll R(0,m)$ and $R(2D,m) \ll R(0,m)$. Then, we have the relation

\[ \mathbf{h}'_f + \mathbf{h}''_f + 2\mathbf{b}_f \approx 4(\mathbf{h}_f + \mathbf{b}_f). \tag{25} \]

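Defining the $P \times 1$ frame energy vector $\mathbf{z}_f = \mathbf{h}_f + \mathbf{b}_f$ with entries $z_f[i]$, $i = 1, 2, \ldots, P$, and the frame energy $E_f = \mathbf{1}_P^T \mathbf{z}_f$, we can simplify $\mathbf{x}$ and $\boldsymbol{\lambda}$ to

\[ \mathbf{x} = \mathbf{1}_{M_1 N_f} \otimes \mathbf{z}_f + \mathbf{n}_1, \tag{26} \]

\[ \boldsymbol{\lambda} \approx 2N_0 \mathbf{1}_{M_1 N_f} \otimes \mathbf{z}_f + \sigma_0^2 \mathbf{1}_{M_1 N_f P}. \tag{27} \]

Based on the analysis above and the assumptions from the previous section, $\mathbf{x}$ can still be assumed to be a Gaussian vector, in agreement with [23]:

\[ \mathbf{x} \overset{a}{\sim} \mathcal{N}(\mathbf{1}_{M_1 N_f} \otimes \mathbf{z}_f, \operatorname{diag}(\boldsymbol{\lambda})), \tag{28} \]

where $\operatorname{diag}(\mathbf{a})$ indicates a square matrix with $\mathbf{a}$ on the main diagonal and zeros elsewhere.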

3.2. Detector Derivation. The test statistic is derived using $H_0$ and $H_1$. It is suboptimal, since it ignores the other cases, but it is still useful, as analyzed above. The Neyman-Pearson (NP) detector [26] decides $H_1$ if

\[ L(\mathbf{x}) = \frac{p(\mathbf{x}; H_1)}{p(\mathbf{x}; H_0)} > \gamma, \tag{29} \]

where $\gamma$ is found by making the probability of false alarm $P_{FA}$ satisfy

\[ P_{FA} = \Pr\{L(\mathbf{x}) > \gamma; H_0\} = \alpha. \tag{30} \]

The test statistic is derived by inserting the stochastic properties of $\mathbf{x}$ under the two hypotheses into $L(\mathbf{x})$ in (29) and eliminating constant values. It is given by

\[ T(\mathbf{x}) = \sum_{i=1}^{P} \frac{z_f[i]}{\sigma_1^2[i]} \left\{ \sum_{n=(k-1)N_f}^{(k+M_1-1)N_f-1} \left( x[nP+i] + \frac{N_0}{\sigma_0^2} x^2[nP+i] \right) \right\}, \tag{31} \]

where $\sigma_1^2[i] = 2N_0 z_f[i] + \sigma_0^2$. A detailed derivation is presented in Appendix B. The threshold $\gamma$ is then found to satisfy

\[ P_{FA} = \Pr\{T(\mathbf{x}) > \gamma; H_0\} = \alpha. \tag{32} \]

Hence, for each observation window, we calculate the test statistic $T(\mathbf{x})$ and compare it with the threshold $\gamma$. If the threshold is exceeded, we announce that a signal is detected.

The test statistic depends not only on the noise variance $\sigma_0^2$ but also on the composite channel energy profile $z_f[i]$. All data samples make a weighted contribution to the test statistic, since they have different means and variances. The larger $z_f[i]/\sigma_0^2$ is, the heavier the weighting coefficient. To employ $T(\mathbf{x})$, we first have to know $\sigma_0^2$ and $z_f[i]$. Note that $\sigma_0^2$ can be easily estimated when no signal is transmitted. However, estimating the composite channel energy profile $z_f[i]$ is not as easy, since it appears in both the mean and the variance of $\mathbf{x}$ under $H_1$.

3.3. Detection Performance Evaluation. So far, the optimal detector for the binary hypothesis test has been derived. The performance of this detector under real circumstances has to be evaluated by taking all cases into account. As described before, there are moments where the observation window only partially overlaps the signal. These can be modeled as additional hypotheses $H_j$, $j = 2, \ldots, M_1 N_f P$. Applying the same test statistic $T(\mathbf{x})$ under these hypotheses, including $H_1$, the probability of detection is defined as

\[ P_{D,j} = \Pr\{T(\mathbf{x}) > \gamma; H_j\}, \quad j = 1, \ldots, M_1 N_f P. \tag{33} \]

We would obtain $P_{D,1} > P_{D,j}$, $j = 2, \ldots, M_1 N_f P$: since the observation window collects the maximum signal energy under $H_1$ and the test statistic is optimized to detect $H_1$, it should have the highest probability of detecting the signal. Furthermore, if we miss the detection under $H_j$, $j = 1, \ldots, M_1 N_f P$, we still have a second chance to detect the signal with probability $P_{D,1}$ in the next observation window, recalling that the training sequence is $2M_1$ symbols long. Therefore, the total probability of detection for this testing procedure is $P_{D,j} + (1 - P_{D,j})P_{D,1}$, $j = 1, \ldots, M_1 N_f P$, which is larger than $P_{D,1}$ and not larger than $P_{D,1} + (1 - P_{D,1})P_{D,1}$. Since all hypotheses $H_j$, $j = 1, \ldots, M_1 N_f P$, have equal probability, the overall probability of detection $P_{Do}$ for the detector $T(\mathbf{x})$ is

\[ P_{Do} = \frac{1}{M_1 N_f P} \sum_{j=1}^{M_1 N_f P} \left\{ P_{D,j} + (1 - P_{D,j}) P_{D,1} \right\}, \tag{34} \]

where $P_{D,1} < P_{Do} < P_{D,1} + (1 - P_{D,1})P_{D,1}$. Since the analytical evaluation of $P_{Do}$ is very complicated, we only derive the theoretical performance $P_{D,1}$ under $H_1$. In the simulations section, we obtain the total $P_{Do}$ by Monte Carlo simulations and compare it with $P_{D,1}$ and $P_{D,1} + (1 - P_{D,1})P_{D,1}$, which can be used as bounds for $P_{Do}$.
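To make the averaging in (34) and its two bounds concrete, the following short sketch evaluates them for a given list of per-hypothesis detection probabilities. The function names and the example probabilities are illustrative assumptions, not values from the paper.

```python
def overall_detection_probability(pd):
    """Average of (34); pd[0] is P_D,1, pd[1:] are the partial-overlap cases."""
    pd1 = pd[0]
    # Each offset gets a first attempt (p) and, on a miss, a second
    # attempt on a fully covered window (pd1).
    return sum(p + (1.0 - p) * pd1 for p in pd) / len(pd)


def detection_bounds(pd1):
    """Lower and upper bounds on P_Do stated after (34)."""
    return pd1, pd1 + (1.0 - pd1) * pd1


# Hypothetical example: full overlap detects with 0.9, partial overlaps less often.
p_do = overall_detection_probability([0.9, 0.5, 0.1])
lower, upper = detection_bounds(0.9)
```

The lower bound corresponds to partial overlaps that never trigger a detection, and the upper bound to partial overlaps that detect as reliably as the fully covered window.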

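A theoretical evaluation of $P_{D,1}$ is carried out by first analyzing the stochastic properties of $T(\mathbf{x})$. As $T(\mathbf{x})$ is composed of two parts, we can define

\[ T_1(\mathbf{x}) = \sum_{i=1}^{P} \frac{z_f[i]}{\sigma_1^2[i]} \sum_{n=(k-1)N_f}^{(k+M_1-1)N_f-1} x[nP+i], \tag{35} \]

\[ T_2(\mathbf{x}) = \sum_{i=1}^{P} \frac{z_f[i]}{\sigma_1^2[i]} \sum_{n=(k-1)N_f}^{(k+M_1-1)N_f-1} x^2[nP+i]. \tag{36} \]

Then we have

\[ T(\mathbf{x}) = T_1(\mathbf{x}) + \frac{N_0}{\sigma_0^2} T_2(\mathbf{x}). \tag{37} \]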

First, we have to know the probability density function (PDF) of $T(\mathbf{x})$. However, due to the correlation between the two parts, it can only be found empirically, by generating enough samples of $T(\mathbf{x})$ and building a histogram of the relative frequencies of the sample ranges. We therefore simply assume that $T_1(\mathbf{x})$ and $T_2(\mathbf{x})$ are uncorrelated and that $T(\mathbf{x})$ is a Gaussian random variable. The mean (variance) of $T(\mathbf{x})$ is then the sum of the weighted means (variances) of the two parts. The larger the number of samples $M_1 N_f P$, the better the approximation, but also the longer the detection time, so there is a tradeoff. In summary, $T(\mathbf{x})$ follows the Gaussian distribution

\[ T(\mathbf{x}) \overset{a}{\sim} \mathcal{N}\left( E(T_1(\mathbf{x})) + \frac{N_0}{\sigma_0^2} E(T_2(\mathbf{x})),\ \operatorname{var}(T_1(\mathbf{x})) + \frac{N_0^2}{\sigma_0^4} \operatorname{var}(T_2(\mathbf{x})) \right). \tag{38} \]

The mean and the variance of $T_1(\mathbf{x})$ can be easily obtained based on the assumption that $\mathbf{x}$ is a Gaussian vector. The stochastic properties of $T_2(\mathbf{x})$ are much more complicated; more details are discussed in Appendix C. All the performance approximations are summarized in Table 1, where the function $Q(\cdot)$ is the right-tail probability function of a Gaussian distribution.

A special case occurs when $P = 1$, which means that one sample is taken per frame ($T_{sam} = T_f$). In this case, where no oversampling is used, we have a constant energy $E_f$ and a constant noise variance $\sigma_1^2 = 2N_0 E_f + \sigma_0^2$ for each frame. The weighting parameters for each sample in the detector are then exactly the same. We can eliminate them and simplify the test statistic to

\[ T'_1(\mathbf{x}) = \sum_{n=(k-1)N_f+1}^{(k+M_1-1)N_f} x[n], \tag{39} \]

\[ T'_2(\mathbf{x}) = \sum_{n=(k-1)N_f+1}^{(k+M_1-1)N_f} x^2[n], \tag{40} \]

\[ T'(\mathbf{x}) = T'_1(\mathbf{x}) + \frac{N_0}{\sigma_0^2} T'_2(\mathbf{x}). \tag{41} \]


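Table 1: Statistical Analysis and Performance Evaluation for Different Detectors, $P > 1$, $T_{sam} = T_f/P$.

| | | $T_1(\mathbf{x})$ | $T_2(\mathbf{x})$ | $T(\mathbf{x})$ |
|---|---|---|---|---|
| $H_0$ | $\mu$ | $\mu_{T_1,0} = 0$ | $\mu_{T_2,0} = M_1 N_f \sigma_0^2 \sum_{i=1}^{P} \frac{z_f[i]}{\sigma_1^2[i]}$ | $\mu_{T_0} = \mu_{T_1,0} + \frac{N_0}{\sigma_0^2} \mu_{T_2,0}$ |
| $H_0$ | $\sigma^2$ | $\sigma_{T_1,0}^2 = M_1 N_f \sigma_0^2 \sum_{i=1}^{P} \frac{z_f^2[i]}{\sigma_1^4[i]}$ | $\sigma_{T_2,0}^2 = 2 M_1 N_f \sigma_0^4 \sum_{i=1}^{P} \frac{z_f^2[i]}{\sigma_1^4[i]}$ | $\sigma_{T_0}^2 = \sigma_{T_1,0}^2 + \frac{N_0^2}{\sigma_0^4} \sigma_{T_2,0}^2$ |
| $H_1$ | $\mu$ | $\mu_{T_1,1} = M_1 N_f \sum_{i=1}^{P} \frac{z_f^2[i]}{\sigma_1^2[i]}$ | $\mu_{T_2,1} = M_1 N_f \sum_{i=1}^{P} z_f[i] \left(1 + \frac{z_f^2[i]}{\sigma_1^2[i]}\right)$ | $\mu_{T_1} = \mu_{T_1,1} + \frac{N_0}{\sigma_0^2} \mu_{T_2,1}$ |
| $H_1$ | $\sigma^2$ | $\sigma_{T_1,1}^2 = M_1 N_f \sum_{i=1}^{P} \frac{z_f^2[i]}{\sigma_1^2[i]}$ | $\sigma_{T_2,1}^2 = 2 M_1 N_f \sum_{i=1}^{P} z_f^2[i] \left(1 + \frac{2 z_f^2[i]}{\sigma_1^2[i]}\right)$ | $\sigma_{T_1}^2 = \sigma_{T_1,1}^2 + \frac{N_0^2}{\sigma_0^4} \sigma_{T_2,1}^2$ |
| $P_{FA}$ | | $Q\left(\frac{\gamma_1}{\sigma_{T_1,0}}\right) = \alpha$ | $Q\left(\frac{\gamma_2 - \mu_{T_2,0}}{\sigma_{T_2,0}}\right) = \alpha$ | $Q\left(\frac{\gamma - \mu_{T_0}}{\sigma_{T_0}}\right) = \alpha$ |
| $\gamma$ | | $\gamma_1 = \sigma_{T_1,0} Q^{-1}(\alpha)$ | $\gamma_2 = \sigma_{T_2,0} Q^{-1}(\alpha) + \mu_{T_2,0}$ | $\gamma = \sigma_{T_0} Q^{-1}(\alpha) + \mu_{T_0}$ |
| $P_{D,1}$ | | $Q\left(\frac{\gamma_1 - \mu_{T_1,1}}{\sigma_{T_1,1}}\right)$ | $Q\left(\frac{\gamma_2 - \mu_{T_2,1}}{\sigma_{T_2,1}}\right)$ | $Q\left(\frac{\gamma - \mu_{T_1}}{\sigma_{T_1}}\right)$ |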

Therefore, $T'_2(\mathbf{x})/\sigma_0^2$ will follow a central Chi-squared distribution under $H_0$, and $T'_2(\mathbf{x})/\sigma_1^2$ will follow a noncentral Chi-squared distribution under $H_1$. We calculate the threshold for $T'_2(\mathbf{x})$ as

\[ \gamma'_2 = \sigma_0^2 \, Q^{-1}_{\chi^2_{M_1 N_f}}(\alpha), \tag{42} \]

and the probability of detection under $H_1$ as

\[ P_{D,1} = Q_{\chi^2_{M_1 N_f}(M_1 N_f E_f^2/\sigma_1^2)}\left(\frac{\gamma'_2}{\sigma_1^2}\right), \tag{43} \]

where the functions $Q_{\chi^2_\nu}(x)$ and $Q_{\chi^2_\nu(\lambda)}(x)$ are the right-tail probability functions of a central and a noncentral Chi-squared distribution, respectively. The statistics of $T'_1(\mathbf{x})$ can be obtained by inserting $P = 1$, $z_f[i] = E_f$, and $\sigma_1^2[i] = \sigma_1^2$ into Table 1, and multiplying the means by $\sigma_1^2/E_f$ and the variances by $\sigma_1^4/E_f^2$. As a result, the threshold $\gamma'_1$ for $T'_1(\mathbf{x})$ is $\sqrt{M_1 N_f \sigma_0^2}\, Q^{-1}(\alpha)$, which can be easily obtained. The $P_{D,1}$ of $T'(\mathbf{x})$ can be evaluated in the same way as that of $T(\mathbf{x})$ in Table 1.
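As a worked illustration of the Gaussian approximation for the simple detector $T'_1(\mathbf{x})$, the sketch below computes the threshold $\gamma'_1$ and the resulting $P_{D,1}$. It follows the scaling rule stated above (Table 1 with $P = 1$, means multiplied by $\sigma_1^2/E_f$, variances by $\sigma_1^4/E_f^2$), which gives a mean of $M_1 N_f E_f$ and a variance of $M_1 N_f \sigma_1^2$ under $H_1$. The function names are hypothetical, and the numerical inputs in any call are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # zero mean, unit variance


def q(x):
    """Gaussian right-tail probability Q(x)."""
    return 1.0 - _STD_NORMAL.cdf(x)


def q_inv(alpha):
    """Inverse of Q: the x for which Q(x) = alpha."""
    return _STD_NORMAL.inv_cdf(1.0 - alpha)


def threshold_t1(m1, nf, sigma0_sq, alpha):
    """gamma'_1 = sqrt(M1 * Nf * sigma_0^2) * Q^{-1}(alpha)."""
    return sqrt(m1 * nf * sigma0_sq) * q_inv(alpha)


def pd1_t1(m1, nf, e_f, n0, sigma0_sq, alpha):
    """P_D,1 of T'_1 under H1 with the Gaussian approximation."""
    sigma1_sq = 2.0 * n0 * e_f + sigma0_sq      # per-frame variance under H1
    gamma = threshold_t1(m1, nf, sigma0_sq, alpha)
    mean_h1 = m1 * nf * e_f                     # scaled mean from Table 1
    std_h1 = sqrt(m1 * nf * sigma1_sq)          # scaled standard deviation
    return q((gamma - mean_h1) / std_h1)
```

With zero frame energy the two hypotheses coincide and `pd1_t1` returns the false alarm probability `alpha`, which is a quick sanity check on the formulas.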

The theoretical contributions of $T'_1(\mathbf{x})$ and $T'_2(\mathbf{x})$ to $T'(\mathbf{x})$ are assessed in Figure 4. The simulation parameters are set to $M_1 = 8$, $N_f = 15$, $T_f = 30$ ns, $T_p = 0.2$ ns, and $B \approx 2/T_p$. For the definition of $E_p/N_0$, we refer to Section 5. The detector based on $T'_1(\mathbf{x})$ (dashed lines) plays a key role in the performance of the detector based on $T'(\mathbf{x})$ (solid lines) under $H_1$. For low SNR, they are almost the same, since $T'_1(\mathbf{x})$ can be directly derived by ignoring the signal-noise cross-correlation term in the noise variance under $H_1$. There is a small difference between them for medium SNRs. $T'_2(\mathbf{x})$ (dotted lines) has a performance loss of about 4 dB compared to $T'(\mathbf{x})$. Thanks to the ultra-wide bandwidth of the signal, the weighting parameter $N_0/\sigma_0^2$ greatly reduces the influence of $T'_2(\mathbf{x})$ on $T'(\mathbf{x})$. It enhances the performance of $T'(\mathbf{x})$ slightly in the medium SNR range. According to these simulation results and the impact of the weighting parameter $N_0/\sigma_0^2$, we can employ $T'_1(\mathbf{x})$ instead of $T'(\mathbf{x})$. It has a much lower computational cost and almost the same performance as $T'(\mathbf{x})$.

Furthermore, the influence of the oversampling rate $P$ on the $P_{D,1}$ of $T(\mathbf{x})$ can be neglected, because the oversampling only affects the performance of $T_2(\mathbf{x})$, which in turn has only a very small influence on $T(\mathbf{x})$. In Section 5, we will evaluate the $P_{D,1}$ of $T(\mathbf{x})$ using the IEEE UWB channel model by a quasi-analytical method and also by Monte Carlo simulations. Based on the simulation results in this section, we can predict that for small $P$ ($P > 1$), the $P_{D,1}$ for $T(\mathbf{x})$ will be more or less the same as the $P_{D,1}$ for $T'(\mathbf{x})$ or $T'_1(\mathbf{x})$.

4. Channel Estimation, Synchronization, and Equalization

After successful signal detection, we can start the channel estimation and synchronization phase. The sample-level synchronization finds the symbol boundary (it estimates the unknown offset $\delta$), and the result can later be used for symbol-level synchronization to acquire the header. This two-stage synchronization strategy decomposes a two-dimensional search into two one-dimensional searches, reducing the complexity. The channel estimates and the timing information can be used for the equalizer construction. Finally, the demodulated symbols can be obtained.

4.1. Channel Estimation

4.1.1. Bias Estimation. As we have seen in the asynchronous data model, the bias term is undesired. It does not carry any useful information, but it disturbs the signal. We will show later on that this bias seriously degrades the channel estimation performance. The second segment of the training sequence consists of "+1,−1" symbol pairs employing a random code. The total length of the second segment should be $M_1 + 2N_s$ symbols, which includes the budget for jumping $2M_1$ symbols after the detection. The "+1,−1" symbol pairs can be used for bias estimation as well as channel estimation. Since the bias is independent of the data symbols and the useful signal part has zero mean, due to the "+1,−1" training symbols, we can estimate the $L_s \times 1$ bias vector of one symbol, $\mathbf{b}_s = \mathbf{1}_{N_f} \otimes \mathbf{b}_f$, as

\[ \hat{\mathbf{b}}_s = \frac{1}{2N_s} \left[ \mathbf{x}_k \ \mathbf{x}_{k+1} \ \cdots \ \mathbf{x}_{k+2N_s-1} \right] \mathbf{1}_{2N_s}. \tag{44} \]

Figure 4: Performance comparison between $T'(\mathbf{x})$ and its components $T'_1(\mathbf{x})$ and $T'_2(\mathbf{x})$. (Plot of the probabilities of detection $P_{D,1}$ under $H_1$ versus $E_p/N_0$ in dB, for $P_{FA}$ = 1e−1, 1e−3, and 1e−5.)
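The bias estimate in (44) is simply the average of the $2N_s$ symbol-long sample vectors. A minimal sketch, assuming the received samples are already split into one list per symbol (the function name and data layout are illustrative):

```python
def estimate_bias(columns):
    """Average of the symbol-long sample vectors x_k ... x_{k+2Ns-1}, as in (44).

    columns : list of 2*Ns lists, each holding the L_s samples of one symbol
    """
    n_cols = len(columns)
    l_s = len(columns[0])
    return [sum(col[i] for col in columns) / n_cols for i in range(l_s)]


# Illustrative data: a "+1,-1" signal part cancels, leaving only the bias.
signal = [1.0, 0.5, 0.0]
bias = [0.2, -0.1, 0.05]
cols = [[sign * s + b for s, b in zip(signal, bias)] for sign in (+1, -1, +1, -1)]
b_hat = estimate_bias(cols)
```

In this noiseless toy example the alternating signal part averages out exactly; with noise, the estimate improves as more symbol pairs are averaged.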

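4.1.2. Channel Estimation. To take advantage of the second segment of the training sequence, we stack the data samples as

\[ \mathbf{X} = \begin{bmatrix} \mathbf{x}_k & \mathbf{x}_{k+2} & \cdots & \mathbf{x}_{k+2N_s-2} \\ \mathbf{x}_{k+1} & \mathbf{x}_{k+3} & \cdots & \mathbf{x}_{k+2N_s-1} \end{bmatrix}, \tag{45} \]

which is equivalent to picking only the odd columns of $\mathbf{X}$ in (14) with $M = 2$ and $N = 2N_s - 1$. As a result, each column depends on the same symbols, which greatly simplifies the decomposition in (14) as follows:

\[ \mathbf{X} = \left[ (\mathbf{C}'_{L_s+\delta'} + \mathbf{C}''_{L_s+\delta'}) \ \ (\mathbf{C}'_{\delta'} + \mathbf{C}''_{\delta'}) \right] (\mathbf{I}_2 \otimes \mathbf{h}) \left[ -s_k \ \ s_k \right]^T \mathbf{1}_{N_s}^T + \mathbf{1}_{2 \times N_s} \otimes \mathbf{b}_s + \mathbf{N}_1, \tag{46} \]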

where $\mathbf{N}_1$ is the noise matrix, defined similarly to $\mathbf{X}$. For simplicity, we only include the noise autocorrelation term with zero mean and variance $\sigma_0^2$ in $\mathbf{N}_1$, where $\sigma_0^2$ can be easily estimated in the absence of a signal. Because we jump into this second segment of the training sequence after detecting the signal, we do not know whether the symbol $s_k$ is "+1" or "−1". Rewriting (46) in another form leads to

\[ \mathbf{X} = \mathbf{C}_s \mathbf{h}_{ss\delta} \mathbf{1}_{N_s}^T + \mathbf{1}_{2 \times N_s} \otimes \mathbf{b}_s + \mathbf{N}_1, \tag{47} \]

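where $\mathbf{C}_s$ is a known $2L_s \times 2L_s$ circulant code matrix, whose first column is $[c_1, \mathbf{0}_{P-1}^T, c_2, \mathbf{0}_{P-1}^T, \ldots, c_{N_f}, \mathbf{0}_{L_s+P-1}^T]^T$, and the vector $\mathbf{h}_{ss\delta}$ of length $2L_s$ blends the timing and the channel information. It contains two channel energy vectors with different signs, $s_k \mathbf{h}$ and $-s_k \mathbf{h}$, located according to $\delta$ as follows:

\[ \mathbf{h}_{ss\delta} = \begin{cases} \operatorname{circshift}\left( [s_k \mathbf{h}^T, \mathbf{0}_{L_s-P_h}^T, -s_k \mathbf{h}^T, \mathbf{0}_{L_s-P_h}^T]^T, \delta \right), & \delta \neq 0, \\ [-s_k \mathbf{h}^T, \mathbf{0}_{L_s-P_h}^T, s_k \mathbf{h}^T, \mathbf{0}_{L_s-P_h}^T]^T, & \delta = 0, \end{cases} \tag{48} \]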

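where $\operatorname{circshift}(\mathbf{a}, n)$ circularly shifts the values in the vector $\mathbf{a}$ by $|n|$ elements (down if $n > 0$ and up if $n < 0$). According to (47) and assuming the channel energy has been normalized, the linear minimum mean square error (LMMSE) estimate of $\mathbf{h}_{ss\delta}$ is

\[ \hat{\mathbf{h}}_{ss\delta} = \mathbf{C}_s^H \left( \mathbf{C}_s \mathbf{C}_s^H + \frac{\sigma_0^2}{N_s} \mathbf{I} \right)^{-1} \frac{1}{N_s} \left( \mathbf{X} - \mathbf{1}_{2 \times N_s} \otimes \hat{\mathbf{b}}_s \right) \mathbf{1}_{N_s}. \tag{49} \]

Defining

\[ \hat{\mathbf{h}}_{s\delta} = \frac{\hat{\mathbf{h}}_{ss\delta}(1 : L_s) - \hat{\mathbf{h}}_{ss\delta}(L_s + 1 : 2L_s)}{2}, \tag{50} \]

where $\mathbf{a}(m : n)$ refers to elements $m$ through $n$ of $\mathbf{a}$, we can obtain a symbol-long LMMSE channel estimate as

\[ \hat{\mathbf{h}}_\delta = \left| \hat{\mathbf{h}}_{s\delta} \right|. \tag{51} \]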

According to a property of circulant matrices, $\mathbf{C}_s$ can be decomposed as $\mathbf{C}_s = \mathcal{F} \boldsymbol{\Omega} \mathcal{F}^H$, where $\mathcal{F}$ is the normalized DFT matrix of size $2L_s \times 2L_s$, and $\boldsymbol{\Omega}$ is a diagonal matrix with the frequency components of the first row of $\mathbf{C}_s$ on the diagonal. Hence, the matrix inversion in (49) can be simplified dramatically. Observing that $\mathbf{C}_s^H (\mathbf{C}_s \mathbf{C}_s^H + (\sigma_0^2/N_s)\mathbf{I})^{-1}$ is a circulant matrix, the bias term actually does not have to be removed in (49), since it is implicitly removed when we calculate (50). Therefore, we do not have to estimate the bias term explicitly for channel estimation and synchronization.

When the SNR is high, $\|\mathbf{C}_s \mathbf{C}_s^H\|_F \gg \|(\sigma_0^2/N_s)\mathbf{I}\|_F$, and (49) can be replaced by

\[ \hat{\mathbf{h}}_{ss\delta} = \frac{1}{N_s} \mathcal{F} \boldsymbol{\Omega}^{-1} \mathcal{F}^H \left( \mathbf{X} - \mathbf{1}_{2 \times N_s} \otimes \hat{\mathbf{b}}_s \right) \mathbf{1}_{N_s}. \tag{52} \]

This is a least squares (LS) estimator and is equivalent to a deconvolution of the code sequence in the frequency domain. On the other hand, when the SNR is low, $\|\mathbf{C}_s \mathbf{C}_s^H\|_F \ll \|(\sigma_0^2/N_s)\mathbf{I}\|_F$, and (49) becomes

\[ \hat{\mathbf{h}}_{ss\delta} = \frac{1}{\sigma_0^2} \mathcal{F} \boldsymbol{\Omega}^H \mathcal{F}^H \left( \mathbf{X} - \mathbf{1}_{2 \times N_s} \otimes \hat{\mathbf{b}}_s \right) \mathbf{1}_{N_s}, \tag{53} \]

which is equivalent to a matched filter (MF). The MF can also be processed in the frequency domain. The LMMSE estimator in (49), the LS estimator in (52), and the MF in (53) all have a similar computational complexity. However, for the LMMSE estimator, we have to estimate $\sigma_0^2$ and the channel energy.
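The three estimators differ only in the per-frequency gain applied after a DFT, which the following sketch makes explicit. It is an illustration under stated assumptions rather than the authors' code: a naive $O(n^2)$ DFT replaces an FFT, `y` stands for the already averaged (and, if desired, bias-corrected) data vector, the circulant matrix is represented by its first column `c`, and all function names are hypothetical.

```python
import cmath


def dft(v, inverse=False):
    """Naive DFT (or inverse DFT) of a sequence; O(n^2), for illustration only."""
    n = len(v)
    sign = 1 if inverse else -1
    out = [sum(v[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
           for k in range(n)]
    return [val / n for val in out] if inverse else out


def circular_convolve(c, h):
    """Product of the circulant matrix with first column c and the vector h."""
    n = len(c)
    return [sum(c[(i - m) % n] * h[m] for m in range(n)) for i in range(n)]


def estimate_channel(y, c, method="lmmse", sigma0_sq=1.0, ns=1):
    """Frequency-domain LMMSE, LS, or MF estimate in the spirit of (49), (52), (53)."""
    lam = dft(c)  # eigenvalues of the circulant code matrix
    if method == "ls":
        gain = [1.0 / l for l in lam]                       # inverse filter
    elif method == "mf":
        gain = [l.conjugate() / sigma0_sq for l in lam]     # matched filter
    elif method == "lmmse":
        gain = [l.conjugate() / (abs(l) ** 2 + sigma0_sq / ns) for l in lam]
    else:
        raise ValueError("method must be 'lmmse', 'ls', or 'mf'")
    spectrum = [g * yk for g, yk in zip(gain, dft(y))]
    return [val.real for val in dft(spectrum, inverse=True)]


# Noiseless toy example with a hypothetical length-4 code.
code = [1, -1, 1, 1]
channel = [0.9, 0.3, 0.0, 0.0]
observed = circular_convolve(code, channel)
h_ls = estimate_channel(observed, code, method="ls")
```

The LS branch requires that no DFT coefficient of the code is zero; the LMMSE branch avoids that restriction through the regularizing term `sigma0_sq / ns`.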


Figure 5: The symbol-long channel estimate $\hat{\mathbf{h}}_\delta$ with bias removal and $|\hat{\mathbf{h}}_{ss\delta}(1 : L_s)|$ without bias removal, when SNR is 18 dB. (Plot of the channel estimate in dB versus samples; curves: LMMSE with bias removal, LMMSE without bias removal, MF with bias removal, MF without bias removal, and the true channel.)

As an example, we show the performance of these channel estimates under high SNR conditions (the simulation parameters can be found in Section 5). Figure 5 shows the symbol-long channel estimate $\hat{\mathbf{h}}_\delta$ with bias removal (implicitly obtained) and $|\hat{\mathbf{h}}_{ss\delta}(1 : L_s)|$ without bias removal, where $\hat{\mathbf{h}}_{ss\delta} = \mathbf{C}_s^H (\mathbf{C}_s \mathbf{C}_s^H + (\sigma_0^2/N_s)\mathbf{I})^{-1} (1/N_s) \mathbf{X} \mathbf{1}_{N_s}$ for the LMMSE and $\hat{\mathbf{h}}_{ss\delta} = (1/\sigma_0^2) \mathcal{F} \boldsymbol{\Omega}^H \mathcal{F}^H \mathbf{X} \mathbf{1}_{N_s}$ for the MF. When the SNR is high, the LMMSE estimator is expected to perform similarly to the LS estimator. Thus, we omit the LS estimator in Figure 5. The MF for $\hat{\mathbf{h}}_\delta$ (dashed line) has a higher noise floor than the LMMSE estimator for $\hat{\mathbf{h}}_\delta$ (solid line), since its output is the correlation of the channel energy vector with the code autocorrelation function. The bias term lifts the noise floor of the channel estimate resulting from the LMMSE estimator (dotted line) and distorts the estimation, while it does not have much influence on the MF (dashed line with + markers). The stars in the figure present the real channel parameters as a reference. The position of the highest peak of each curve in Figure 5 indicates the timing information, and the area around this highest peak is the most interesting part, since it shows the estimated channel energy profile. Although the LMMSE estimator without bias suppresses the estimation errors over the whole symbol period, it performs similarly to all the other estimators in the interesting part.

4.2. Sample-Level Synchronization. The channel estimate $\hat{\mathbf{h}}_\delta$ has a duration of one symbol. But we know that the true channel will generally be much shorter than the symbol period. We would like to detect the part that contains most of the channel energy and cut out the other part in order to be robust against noise. This basically means that we have to estimate the unknown timing $\delta$. Define the search window length as $L_w$ in terms of the number of samples ($L_w > 1$). The optimal length of the search window depends on the channel energy profile and the SNR. We will show the impact of different window lengths on the estimation of $\delta$ in the next section. Define $\hat{\mathbf{h}}_{w\delta} = [\hat{\mathbf{h}}_{s\delta}^T, -\hat{\mathbf{h}}_{s\delta}^T(1 : L_w - 1)]^T$, and define the estimate $\hat{\delta}$ of $\delta$ as

\[ \hat{\delta} = \arg\max_{\delta} \left| \sum_{n=\delta+1}^{\delta+L_w} \hat{h}_{w\delta}(n) \right|. \tag{54} \]

This is motivated as follows. According to the definition of $\hat{\mathbf{h}}_{s\delta}$, when $\delta > L_s - P_h$, $\hat{\mathbf{h}}_{s\delta}$ will contain channel information partially from $s_k \mathbf{h}$ and partially from $-s_k \mathbf{h}$, which have opposite signs. In order to estimate $\delta$, we circularly shift the search window to check all possible sample positions in $\hat{\mathbf{h}}_{s\delta}$ and find the position where the search window contains the maximum energy. If we do not adjust the signs of the two parts, the $\delta$ estimate will be incorrect when the real $\delta$ is larger than $L_s - P_h$, because the two parts cancel each other when both are encompassed by the search window. That is why we construct $\hat{\mathbf{h}}_{w\delta}$ by inverting the sign of the first $L_w - 1$ samples of $\hat{\mathbf{h}}_{s\delta}$ and attaching them to the end of $\hat{\mathbf{h}}_{s\delta}$. Moreover, the estimator (54) benefits from averaging the noise before taking the absolute value.
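The window search in (54), including the sign inversion of the wrapped-around samples, can be sketched in a few lines. The function name and the zero-based indexing are assumptions for illustration; ties are resolved in favor of the smallest offset.

```python
def estimate_timing(h_sd, lw):
    """Sliding-window timing estimate in the spirit of (54).

    h_sd : symbol-long signed channel estimate (list of L_s values)
    lw   : search window length in samples (lw > 1)
    Returns the zero-based offset whose window has the largest |sum|.
    """
    ls = len(h_sd)
    # Append the first lw-1 samples with inverted sign so that a channel
    # wrapping around the symbol boundary adds up instead of cancelling.
    h_w = list(h_sd) + [-v for v in h_sd[:lw - 1]]
    best_delta, best_val = 0, -1.0
    for delta in range(ls):
        val = abs(sum(h_w[delta:delta + lw]))
        if val > best_val:
            best_delta, best_val = delta, val
    return best_delta


# A two-tap channel [1.0, 0.5] starting at the last sample of the symbol:
# the wrapped tap appears with opposite sign at the start of the estimate.
delta_hat = estimate_timing([-0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0], lw=2)
```

Without the sign inversion, the window at the true offset would sum to 1.0 − 0.5 = 0.5 and lose against the neighboring window, which is exactly the failure mode described above.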

4.3. Equalization and Symbol-Level Synchronization. Based on the channel estimate $\hat{\mathbf{h}}_\delta$ and the timing estimate $\hat{\delta}$, we select a part of $\hat{\mathbf{h}}_\delta$ to build three different kinds of equalizers. Since the MF equalizer cannot handle IFI and ISI, we only select the first $P$ samples (the frame length in terms of the number of samples) of $\operatorname{circshift}(\hat{\mathbf{h}}_\delta, -\hat{\delta})$ as $\hat{\mathbf{h}}_p$. The code matrix $\mathbf{C}$ is specified by assigning $P_h = P$. The estimated bias $\hat{\mathbf{b}}_s$ can be used here. We skip the first $\hat{\delta}$ data samples and collect the rest of the data samples in a matrix $\mathbf{X}_{\hat{\delta}}$ of size $L_s \times N$ as in the data model (14) but with $M = 1$. Therefore, the MF equalizer is constructed as

\[ \hat{\mathbf{s}}^T = \operatorname{sign}\left\{ (\mathbf{C} \hat{\mathbf{h}}_p)^T \left( \mathbf{X}_{\hat{\delta}} - \mathbf{1}_{1 \times N} \otimes \hat{\mathbf{b}}_s \right) \right\}, \tag{55} \]

where $\hat{\mathbf{s}}$ is the estimated symbol vector. Moreover, we also construct a zero-forcing (ZF) equalizer and an LMMSE equalizer by replacing $\mathbf{h}$ with $\hat{\mathbf{h}}$, which collects the first $\hat{P}_h$ samples (the channel length estimate in terms of the number of samples) of $\operatorname{circshift}(\hat{\mathbf{h}}_\delta, -\hat{\delta})$, and using $\hat{\delta}' = (L_s - \hat{\delta}) \bmod L_s$ in the data model (14). The channel length estimate $\hat{P}_h$ could be obtained by setting a threshold (e.g., 10% of the maximum value of $\hat{\mathbf{h}}_\delta$) and counting the number of samples beyond it in $\hat{\mathbf{h}}_\delta$. These equalizers can resolve the IFI and the ISI to achieve a better performance at the expense of a higher computational complexity. The estimated bias $\hat{\mathbf{b}}_s$ can also be used. We collect the samples in a data matrix $\mathbf{X}$ of size $2L_s \times N$, similar to the data model (14) with $M = 2$. Then the ZF equalizer gives

\[ \hat{\mathbf{S}} = \operatorname{sign}\left\{ \left( \mathbf{C}_{\hat{\delta}'} (\mathbf{I}_4 \otimes \hat{\mathbf{h}}) \right)^{\dagger} \left( \mathbf{X} - \mathbf{1}_{2 \times N} \otimes \hat{\mathbf{b}}_s \right) \right\}, \tag{56} \]

and the LMMSE equalizer gives

\[ \hat{\mathbf{S}} = \operatorname{sign}\left\{ \left( \boldsymbol{\Phi}^H \boldsymbol{\Phi} + \sigma_0^2 \mathbf{I}_4 \right)^{-1} \boldsymbol{\Phi}^H \left( \mathbf{X} - \mathbf{1}_{2 \times N} \otimes \hat{\mathbf{b}}_s \right) \right\}, \tag{57} \]

where $\boldsymbol{\Phi} = \mathbf{C}_{\hat{\delta}'} (\mathbf{I}_4 \otimes \hat{\mathbf{h}})$. $\hat{\mathbf{S}}$ is a $4 \times N$ symbol matrix. We can choose either the second or the third row of $\hat{\mathbf{S}}$ as the demodulated symbol sequence.

So far, the sample-level synchronization has established the boundaries of the symbols. However, it is not able to find the boundary of the training header, since the second segment of the training sequence just employs pairs of "+1,−1" symbols. After the sample-level synchronization, the demodulation is triggered. The third segment of the training sequence is a known training symbol pattern. Once we find the matching symbol pattern, we can identify the training header. Symbol-level synchronization is then accomplished. To summarize the training segments used in every stage, the overall structure of the training sequence is shown in Figure 6.

Figure 6: The signal structure for training sequence. (Training sequence: segment 1, all-one code, symbols +1 +1 ··· +1, length $2M_1$; segment 2, PN code, symbols +1 −1 +1 −1 ··· +1 −1, length $M_1 + 2N_s$; segment 3, the header, PN code. The training sequence is followed by the data.)


We evaluate the performance of different detectors and the performance of different combinations of channel estimation and equalization schemes for a single-user, single-delay TR-UWB system. We use a Gaussian second derivative pulse, which is 0.2 ns wide. The delay interval $D$ between two pulses in a doublet is 4 ns. The first segment of the training sequence is $2M_1 = 16$ symbols long, all of which are composed of positive pulses. Hence, the observation window includes $M_1 = 8$ symbols. The second segment of the training sequence has $M_1 + 2N_s = 38$ symbols and employs a pseudonoise (PN) code sequence. The code length $N_f$ is 15. The frame period $T_f$ is 30 ns. The IEEE UWB channel model CM3 [27], which represents an NLOS channel, is employed and truncated to 90 ns. The oversampling rate $P$ is 3, which results in $T_{sam} = 10$ ns. We define $E_p/N_0$ as the received aggregate pulse energy to noise ratio with $E_p = \int |h(t)|^2 dt$, where $h(t)$ represents the composite channel impulse response including pulse shaping and antenna effects, as explained before (see Section 2.1). The system sampling rate is 50 GHz for the Matlab simulations.

The test statistics $T(\mathbf{x})$ in (37) and $T'_1(\mathbf{x})$ in (39) are assessed both theoretically, using the results in Table 1, and experimentally, by running Monte Carlo simulations. Figure 7 shows the probability of detection $P_{D,1}$ for the test statistics. The theoretical $P_{D,1}$ of $T(\mathbf{x})$ with $P = 3$ is evaluated by a quasi-analytical method: we generate 100 IEEE CM3 channel realizations, and for each channel realization, we use Table 1 to evaluate its $P_{D,1}$ performance and average the obtained $P_{D,1}$ values. In the experimental evaluation, we also employ 100 IEEE CM3 channel realizations. For each realization, we generate 1000 test statistics to compare with the threshold and count the detections to estimate the probability of detection. In order to evaluate the detection performance, we divide the SNR into three ranges. For example, when $P_{FA} = 0.1$, the low SNR range is below 0 dB, the medium range is from 0 dB to 6 dB, and the high SNR range is above 6 dB. According to Figure 7, the $P_{D,1}$ of $T(\mathbf{x})$ with $P = 3$ (solid line with ∗ markers) and the $P_{D,1}$ of $T'_1(\mathbf{x})$ (dash-dotted line with ∗ markers) are similar in the low and high SNR ranges. But in the medium range, $T(\mathbf{x})$ with $P = 3$ outperforms $T'_1(\mathbf{x})$ by about 5% to 10%. For $P_{FA} = 10^{-3}$ and $P_{FA} = 10^{-5}$, the performance differences between these test statistics are large in the SNR range from 2 dB to 8 dB. $T(\mathbf{x})$ (solid lines with ◦ or ♦ markers) can have a detection probability as much as 20% higher than $T'_1(\mathbf{x})$ (dash-dotted lines with ◦ or ♦ markers) under $H_1$. However, when the test statistic $T(\mathbf{x})$ is employed, we have to estimate the channel energy profile first. On the other hand, if we use the test statistic $T'_1(\mathbf{x})$, we only have to sum up the samples, which is easy to implement. But these results are only the detection probabilities under $H_1$, which are used as bounds for the overall performance under real circumstances.

As mentioned before, $P_{D,1}$ and $P_{D,1} + (1 - P_{D,1})P_{D,1}$ can be used as a lower bound and an upper bound for the overall $P_{Do}$, respectively. We run Monte Carlo simulations to evaluate $P_{Do}$ under real circumstances. For each run, the timing offset is randomly generated following a uniform distribution in the range of $M_1$ symbols, while the channel realization remains the same in order to exclude the channel influence in the multihypothesis case. In the detection procedure, once the first detection fails, we jump to the next observation window. If the second detection also fails, we declare a missed detection. The simulation results are shown in Figure 8. The $P_{Do}$ curves of $T(\mathbf{x})$ with $P = 3$ (solid lines) lie between the two bounds, the upper bounds (dashed lines) and the lower bounds (dotted lines), and these bounds become tighter as $P_{FA}$ becomes smaller. The $P_{Do}$ values of $T'_1(\mathbf{x})$ (dash-dotted lines) are a bit higher than those of $T(\mathbf{x})$. In particular, for $P_{FA} = 10^{-3}$ and around SNR = 6 dB, the $P_{Do}$ of $T'_1(\mathbf{x})$ (dash-dotted line with ◦ markers) is 5% larger than the $P_{Do}$ of $T(\mathbf{x})$ (solid line with ◦ markers). That is because $T(\mathbf{x})$ weights each sample based only on the two hypotheses $H_0$ and $H_1$. The weighting coefficients are not optimal for the other hypotheses.


Figure 7: Experimental and theoretical $P_{D,1}$ performance comparison for $T(\mathbf{x})$ with $P = 3$ and $T'_1(\mathbf{x})$. (Plot of $P_{D,1}$ versus $E_p/N_0$ in dB; experimental and theoretical curves for $T(\mathbf{x})$ with $P = 3$ and for $T'_1(\mathbf{x})$, at $P_{FA}$ = 1e−1, 1e−3, and 1e−5.)

Figure 8: Experimental $P_{Do}$ for $T(\mathbf{x})$ with $P = 3$ and $T'_1(\mathbf{x})$. (Plot of $P_{Do}$ versus $E_p/N_0$ in dB; curves: $P_{Do}$ of $T(\mathbf{x})$ with $P = 3$, $P_{Do}$ of $T'_1(\mathbf{x})$, and the upper and lower bounds for $T(\mathbf{x})$ with $P = 3$, at $P_{FA}$ = 1e−1, 1e−3, and 1e−5.)

The noise samples may be mistakenly weighted heavily under real circumstances. On the other hand, $T'_1(\mathbf{x})$ accumulates all the frame samples in the observation window, which is equivalent to equal weighting. According to these results, we can employ $T'_1(\mathbf{x})$ because of its simplicity and because its performance is similar to that of $T(\mathbf{x})$.

Figure 9: MSE performance for channel estimation with different lengths. (Plot of the MSE for symbol-long and partial channel estimation versus $E_p/N_0$ in dB, with $T_f = 30$ ns, $T_w = 10$ ns, $D = 4$ ns; curves: LMMSE, LS, and MF, for $L_w = L_s \times 10$ ns, $L_w = 30$ ns, and $L_w = 90$ ns.)

Figure 10: MSE performance for $\delta$ estimation with various $L_w$'s. (Plot of the MSE for $\delta$ estimation versus $E_p/N_0$ in dB, with $T_f = 30$ ns, $T_w = 10$ ns, $D = 4$ ns; curves: LMMSE, LS, and MF, for $L_w = 30$ ns, $L_w = 60$ ns, and $L_w = 90$ ns.)

500 Monte Carlo runs are used to evaluate the mean squared error (MSE) of $\hat{\mathbf{h}}_\delta$ versus SNR. In each run, the timing offset and the channel are randomly generated. The results for the symbol-long estimates and the $L_w$-long estimates assuming perfect timing are shown in Figure 9. The MF curves (dotted lines) always have the highest noise floor, since the MF output is the convolution of the channel energy vector with the code autocorrelation function.


Figure 11: BER performance for CM3. (Plot of the BER versus $E_p/N_0$ in dB, with $T_f = 30$ ns, $T_w = 10$ ns, $D = 4$ ns, $L_w = 30$ ns; curves: MF channel estimator combined with the LMMSE, ZF, and MF equalizers, each with bias removal and with bias, plus an AWGN reference.)

The performance gap for symbol-long estimates between the LS/LMMSE estimators (dashed lines/solid lines) and the MF is large. When we concentrate on the channel estimates in a limited range, such as 30 ns (lines with ◦ markers) and 90 ns (lines with ♦ markers), the gap between the MF and the LS/LMMSE estimators is smaller. The normalized MSE $E[|(\hat{\delta} - \delta)/L_s|^2]$ for $\delta$ estimation is also assessed with different values of $L_w$ based on different channel estimators. From Figure 10, we see that the $\delta$ estimates based on MF (dotted lines), LS (dashed lines), and LMMSE (solid lines) channel estimates with the same $L_w$ have similar performance, and $L_w = 30$ ns is the best choice among all. The MSE for $\delta$ with $L_w = 30$ ns (lines with ◦ markers) saturates after the SNR reaches 10 dB. This is because we use NLOS channels, where the first path may not be the strongest and a fractional timing offset $\varepsilon$ always remains. Meanwhile, in Figure 9 the MSE differences among the methods for the 90 ns channel estimates (lines with ♦ markers), which are the estimates used to construct the equalizer, are quite small around 10 dB. As a result, we choose the MF as the channel estimator.

Furthermore, combinations of the MF channel estimator with different equalizers are investigated. We employ $L_w = 30$ ns for synchronization. Figure 11 shows the BER performance. The BER for the MF equalizer (lines with ◦ markers) approaches 0 after 12 dB, while the BERs for the ZF equalizer (lines with ∗ markers) and the LMMSE equalizer approach 0 after 10 dB. Hence, the MF equalizer is 2 dB worse than the ZF and the LMMSE equalizers, and all of them employ 90 ns long channel estimates. The curves of the ZF equalizer and the LMMSE equalizer overlap each other. The bias does not have much impact on them, and they have almost the same performance. As a result, the optimal combination considering cost and performance would be an MF channel estimator with a ZF equalizer. According to the results above, we can remark that the IFI after the integrate-and-dump is not so serious in our simulation setup, since the channel energy attenuates exponentially and one frame contains most of the energy. The performance differences between the equalizers are not so obvious. However, the LMMSE estimator has the potential to handle more serious IFI and ISI. The effects of the bias on the BER performance can be ignored, but they have to be taken into account for the channel estimation (done implicitly, see Section 4.1). When we want to shorten the frame length to achieve a higher data rate, more interference will be generated. We then need a more accurate data model to handle this interference.

6. Conclusions

We have proposed a complete solution for signal detection, channel estimation, synchronization, and equalization in a TR-UWB system. The scheme is based on a data model that takes IPI, IFI, and ISI into account and relaxes the frame time requirements to allow for higher data rate communications. Several detectors based on a specific training scheme are derived and assessed. We find that the simple detector, which sums up all the samples in the observation window and compares the result with a threshold, gives a good balance between performance and cost. Moreover, the joint channel and timing estimation is achieved in three different ways. The property of the circulant matrix in the data model is exploited to reduce the complexity of the algorithms. Then a two-stage synchronization strategy is proposed to first achieve sample-level synchronization and later symbol-level synchronization. Last but not least, three kinds of equalizers are derived. We evaluate different combinations of channel estimation and equalization schemes using the IEEE UWB channel model CM3, which shows that the TR-UWB system can be implemented at low cost and achieves moderate data rate communications.

Appendices

A. Noise Analysis

The noise autocorrelation term $n_0[n]$ is

\[ n_0[n] = \int_{(n-1)T_{sam}}^{nT_{sam}} n(t)\, n(t + D)\, dt, \tag{A.1} \]

where $n(t)$ is band-limited AWGN, and its autocorrelation function is $R_n(\tau) = E[n(t)n(t - \tau)] = N_0 B \operatorname{sinc}(2B\tau)$. Therefore, $n_0[n]$ has approximately zero mean, as a result of $R_n(D) \approx 0$ based on the assumption $D \gg 1/B$. According to the Gaussian joint variable theorem [28, 29], its variance can be derived as

\[ \operatorname{var}(n_0[n]) \approx E\left[n_0^2[n]\right] \approx \int_{(n-1)T_{sam}}^{nT_{sam}} \int_{(n-1)T_{sam}}^{nT_{sam}} \left[ R_n^2(t - u) + R_n(t - u - D)\, R_n(t + D - u) \right] dt\, du. \tag{A.2} \]

The second term is the product of two sinc functions offset by $2D$, which is approximately zero by the property of sinc functions that $\operatorname{sinc}(2B\tau)\operatorname{sinc}(2B(\tau + \Delta)) \approx \operatorname{sinc}^2(2B\tau)\,\delta(\Delta)$, where $\delta(\Delta)$ is the Kronecker delta. Recalling $R_n(D) \approx 0$ and $T_{sam} \gg 1/B$ and applying Parseval's theorem, we derive the variance of $n_0[n]$ as (also see [30])

\[ \begin{aligned} \operatorname{var}(n_0[n]) &\approx \frac{N_0^2}{4} \int_{(n-1)T_{sam}}^{nT_{sam}} \int_{(n-1)T_{sam}}^{nT_{sam}} \left[ 4B^2 \operatorname{sinc}^2(2B(t - u)) \right] dt\, du \\ &\approx \frac{N_0^2}{4} \int_{(n-1)T_{sam}}^{nT_{sam}} \left[ \int_{-B}^{B} 1\, df \right] dt \\ &= \frac{N_0^2 B T_{sam}}{2}. \end{aligned} \tag{A.3} \]

In summary, $n_0[n]$ is approximately zero mean and white with variance $N_0^2 B T_{sam}/2$. These noise autocorrelation samples are uncorrelated with each other, due to the assumption $T_{sam} \gg 1/B$.

Furthermore, the aggregate noise term n1[n] is

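$$
\begin{aligned}
n_1[n] = n_0[n] &+ s_i c_j \int_{(n-1)T_{\mathrm{sam}}}^{nT_{\mathrm{sam}}} \big[h(t-\tau)\,n(t) + h(t-D-\tau)\,n(t+D)\big]\,dt \\
&+ \int_{(n-1)T_{\mathrm{sam}}}^{nT_{\mathrm{sam}}} \big[h(t-\tau)\,n(t+D) + h(t+D-\tau)\,n(t)\big]\,dt.
\end{aligned}
\tag{A.4}
$$

Defining

$$
\gamma'[n] = s_i c_j \int_{(n-1)T_{\mathrm{sam}}}^{nT_{\mathrm{sam}}} \big[h(t-\tau)\,n(t) + h(t-D-\tau)\,n(t+D)\big]\,dt, \tag{A.5}
$$

$$
\gamma''[n] = \int_{(n-1)T_{\mathrm{sam}}}^{nT_{\mathrm{sam}}} \big[h(t-\tau)\,n(t+D) + h(t+D-\tau)\,n(t)\big]\,dt, \tag{A.6}
$$

we obtain

$$
n_1[n] = \gamma'[n] + \gamma''[n] + n_0[n], \tag{A.7}
$$

where $\gamma'[n]$ and $\gamma''[n]$ are random variables resulting from the cross-correlation between the signal and the noise.

Now we will derive the statistical properties of these two random variables. Both $\gamma'[n]$ and $\gamma''[n]$ have zero mean. The variance of $\gamma'[n]$ is calculated as follows: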

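$$
\begin{aligned}
\mathrm{var}(\gamma'[n]) &= E\big[|\gamma'[n]|^2\big] \\
&= \int_{(n-1)T_{\mathrm{sam}}}^{nT_{\mathrm{sam}}}\int_{(n-1)T_{\mathrm{sam}}}^{nT_{\mathrm{sam}}} \big[h(t-\tau)\,h(u-\tau)\,R_n(t-u) \\
&\qquad\qquad + h(t-D-\tau)\,h(u-D-\tau)\,R_n(t-u)\big]\,dt\,du.
\end{aligned}
\tag{A.8}
$$

Let us insert $R_n(\tau)$ into the first term (also see [30]) as follows:

$$
\begin{aligned}
&\int_{(n-1)T_{\mathrm{sam}}}^{nT_{\mathrm{sam}}}\int_{(n-1)T_{\mathrm{sam}}}^{nT_{\mathrm{sam}}} h(t-\tau)\,h(u-\tau)\,R_n(t-u)\,dt\,du \\
&\quad= \int_{(n-1)T_{\mathrm{sam}}}^{nT_{\mathrm{sam}}}\int_{(n-1)T_{\mathrm{sam}}}^{nT_{\mathrm{sam}}} h(t-\tau)\,h(u-\tau)\,N_0 B\,\mathrm{sinc}(2B(t-u))\,dt\,du \\
&\quad= \frac{N_0}{2}\int_{(n-1)T_{\mathrm{sam}}}^{nT_{\mathrm{sam}}}\int_{(n-1)T_{\mathrm{sam}}}^{nT_{\mathrm{sam}}} h(t-\tau)\,h(u-\tau)\int_{-B}^{B} e^{j2\pi f(t-u)}\,df\,dt\,du \\
&\quad= \frac{N_0}{2}\int_{(n-1)T_{\mathrm{sam}}}^{nT_{\mathrm{sam}}} h(t-\tau)\int_{-B}^{B} e^{j2\pi f(t-\tau)} \left[\int_{(n-1)T_{\mathrm{sam}}-\tau}^{nT_{\mathrm{sam}}-\tau} h(u-\tau)\,e^{-j2\pi f(u-\tau)}\,d(u-\tau)\right] df\,dt \\
&\quad= \frac{N_0}{2}\int_{(n-1)T_{\mathrm{sam}}}^{nT_{\mathrm{sam}}} h(t-\tau)\left(\int_{-B}^{B} H(f)\,e^{j2\pi f(t-\tau)}\,df\right)dt,
\end{aligned}
\tag{A.9}
$$

where $H(f)$ is the Fourier transform of $h(u-\tau)$, $u \in [(n-1)T_{\mathrm{sam}}, nT_{\mathrm{sam}}]$, which is a segment of the aggregate channel. Since the bandwidth $B$ of $n(t)$ is assumed much larger than the bandwidth of $h(u-\tau)$, $u \in [(n-1)T_{\mathrm{sam}}, nT_{\mathrm{sam}}]$, we obtain $\int_{-B}^{B} H(f)\,e^{j2\pi f(t-\tau)}\,df \approx h(t-\tau)$, $t \in [(n-1)T_{\mathrm{sam}}, nT_{\mathrm{sam}}]$. As a result, we obtain results similar to those in [24, 25, 30]:

$$
\begin{aligned}
\int_{(n-1)T_{\mathrm{sam}}}^{nT_{\mathrm{sam}}}\int_{(n-1)T_{\mathrm{sam}}}^{nT_{\mathrm{sam}}} h(t-\tau)\,h(u-\tau)\,R_n(t-u)\,dt\,du
&\approx \frac{N_0}{2}\int_{(n-1)T_{\mathrm{sam}}}^{nT_{\mathrm{sam}}} h(t-\tau)\,h(t-\tau)\,dt \\
&= \frac{N_0}{2}\,R(0, n-\delta).
\end{aligned}
\tag{A.10}
$$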

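In a similar way, the other term of $\mathrm{var}(\gamma'[n])$ can be deduced. The same method is applied to $\mathrm{var}(\gamma''[n])$ and $E[\gamma'[n]\gamma''[n]]$. All the derivations are based on the assumption that $R_n(D) \approx 0$ and $T_{\mathrm{sam}} \gg 1/B$. The results are summarized as follows:

$$
\mathrm{var}(\gamma'[n]) \approx
\begin{cases}
\dfrac{N_0}{2}\left(R(0, n-\delta) + R\!\left(0, n-\delta-\dfrac{D}{T_{\mathrm{sam}}}\right)\right), & n = \delta+1, \delta+2, \ldots, \delta+P_h, \\[2mm]
0, & \text{elsewhere},
\end{cases}
\tag{A.11}
$$

$$
\mathrm{var}(\gamma''[n]) = E\big[|\gamma''[n]|^2\big] \approx
\begin{cases}
\dfrac{N_0}{2}\left(R(0, n-\delta) + R\!\left(0, n-\delta+\dfrac{D}{T_{\mathrm{sam}}}\right)\right), & n = \delta+1, \delta+2, \ldots, \delta+P_h, \\[2mm]
0, & \text{elsewhere},
\end{cases}
\tag{A.12}
$$

$$
E\big[\gamma'[n]\gamma''[n]\big] \approx
\begin{cases}
\dfrac{N_0}{2}\,s_i c_j\left(R(D, n-\delta) + R\!\left(D, n-\delta+\dfrac{D}{T_{\mathrm{sam}}}\right)\right), & n = \delta+1, \delta+2, \ldots, \delta+P_h, \\[2mm]
0, & \text{elsewhere},
\end{cases}
\tag{A.13}
$$

$$
E\big[\gamma'[n]\,n_0[n]\big] = E\big[\gamma''[n]\,n_0[n]\big] = 0. \tag{A.14}
$$

In summary, the stochastic properties of $n_1[n]$ are

$$
\begin{aligned}
E\big[n_1[n]\big] &\approx 0, \\
\mathrm{var}(n_1[n]) &\approx
\begin{cases}
\dfrac{N_0}{2}\Big\{2R(0, n-\delta) + R\!\left(0, n-\delta-\dfrac{D}{T_{\mathrm{sam}}}\right) + R\!\left(0, n-\delta+\dfrac{D}{T_{\mathrm{sam}}}\right) \\[2mm]
\quad + s_i c_j\left(2R(D, n-\delta) + 2R\!\left(D, n-\delta+\dfrac{D}{T_{\mathrm{sam}}}\right)\right)\Big\} + \sigma_0^2, & n = \delta+1, \delta+2, \ldots, \delta+P_h, \\[2mm]
0, & \text{elsewhere},
\end{cases}
\end{aligned}
\tag{A.15}
$$

where $\sigma_0^2 = N_0^2 B T_{\mathrm{sam}}/2$. These aggregate noise samples are uncorrelated with each other, recalling that $T_{\mathrm{sam}} \gg 1/B$. This assumption is usually satisfied by UWB signals (e.g., in our case $T_{\mathrm{sam}} = 10$ ns and $B \approx 2/T_p = 10$ GHz, so that $2BT_{\mathrm{sam}} = 200$). Also, $n_0[n]$ and $n_1[n]$ can be assumed to be Gaussian random variables by invoking the sampling theorem and the central limit theorem [28].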
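The numbers quoted above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the function names are hypothetical, and the value of N0 passed to `sigma0_squared` in any call is a placeholder, since no numerical N0 is given in the text.

```python
# Illustrative check of the T_sam >> 1/B assumption using the figures quoted in the text.
T_SAM = 10e-9   # integration/sampling period T_sam: 10 ns (from the text)
B = 10e9        # noise bandwidth B: 10 GHz (from the text)


def time_bandwidth_product(bandwidth_hz: float, t_sam_s: float) -> float:
    """Return 2*B*T_sam, the time-bandwidth product of one integration window."""
    return 2.0 * bandwidth_hz * t_sam_s


def sigma0_squared(n0: float, bandwidth_hz: float, t_sam_s: float) -> float:
    """Variance of n0[n] according to (A.3): N0^2 * B * T_sam / 2."""
    return n0 ** 2 * bandwidth_hz * t_sam_s / 2.0


if __name__ == "__main__":
    # A product far above 1 is what the approximation T_sam >> 1/B relies on.
    print(time_bandwidth_product(B, T_SAM))
```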

B. Detector Derivation

In summary, the statistics of x in (31) are

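$$
\mathcal{H}_0:\ \mathbf{x} \overset{a}{\sim} \mathcal{N}\big(\mathbf{0}, \sigma_0^2\mathbf{I}\big), \tag{B.1}
$$

$$
\mathcal{H}_1:\ \mathbf{x} \overset{a}{\sim} \mathcal{N}\big(\mathbf{1}_{M_1N_f} \otimes \mathbf{z}_f,\ \mathrm{diag}(\boldsymbol{\lambda})\big). \tag{B.2}
$$

The Neyman-Pearson detector decides $\mathcal{H}_1$ if

$$
L(\mathbf{x}) = \frac{p(\mathbf{x}; \mathcal{H}_1)}{p(\mathbf{x}; \mathcal{H}_0)} > \gamma, \tag{B.3}
$$

where $\gamma$ is found by requiring the probability of false alarm $P_{FA}$ to satisfy

$$
P_{FA} = \Pr\{L(\mathbf{x}) > \gamma; \mathcal{H}_0\} = \alpha. \tag{B.4}
$$

$L(\mathbf{x})$ can be expressed as

$$
L(\mathbf{x}) = \frac{\displaystyle\prod_{i=1}^{P} \frac{1}{\big(2\pi(2N_0 z_f[i] + \sigma_0^2)\big)^{M_1N_f/2}} \exp\left[-\frac{1}{2(2N_0 z_f[i] + \sigma_0^2)} \sum_{n=(k-1)N_f}^{(k+M_1-1)N_f-1} \big(x[nP+i] - z_f[i]\big)^2\right]}{\displaystyle\frac{1}{(2\pi\sigma_0^2)^{M_1N_fP/2}} \exp\left[-\frac{1}{2\sigma_0^2} \sum_{n=(k-1)N_fP+1}^{(k+M_1-1)N_fP} x^2[n]\right]}. \tag{B.5}
$$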


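Defining $\sigma_1^2[i] = 2N_0 z_f[i] + \sigma_0^2$, inserting it into $\ln L(\mathbf{x})$, and eliminating the constants leads to

$$
\begin{aligned}
\ln L(\mathbf{x})
&= \sum_{i=1}^{P}\left\{\frac{1}{2\sigma_0^2}\sum_{n=(k-1)N_f}^{(k+M_1-1)N_f-1} x^2[nP+i] - \frac{1}{2\sigma_1^2[i]}\sum_{n=(k-1)N_f}^{(k+M_1-1)N_f-1} \big(x[nP+i] - z_f[i]\big)^2\right\} \\
&= \sum_{i=1}^{P}\left\{\frac{2z_f[i]}{2\sigma_1^2[i]}\sum_{n=(k-1)N_f}^{(k+M_1-1)N_f-1} x[nP+i] + \left(\frac{1}{2\sigma_0^2} - \frac{1}{2\sigma_1^2[i]}\right)\sum_{n=(k-1)N_f}^{(k+M_1-1)N_f-1} x^2[nP+i]\right\} \\
&= \sum_{i=1}^{P}\left\{\frac{z_f[i]}{\sigma_1^2[i]}\sum_{n=(k-1)N_f}^{(k+M_1-1)N_f-1} x[nP+i] + \frac{N_0 z_f[i]}{\sigma_0^2\sigma_1^2[i]}\sum_{n=(k-1)N_f}^{(k+M_1-1)N_f-1} x^2[nP+i]\right\} \\
&= \sum_{i=1}^{P}\frac{z_f[i]}{\sigma_1^2[i]}\left\{\sum_{n=(k-1)N_f}^{(k+M_1-1)N_f-1} x[nP+i] + \frac{N_0}{\sigma_0^2}\sum_{n=(k-1)N_f}^{(k+M_1-1)N_f-1} x^2[nP+i]\right\}.
\end{aligned}
\tag{B.6}
$$

Then, the test statistic is

$$
T(\mathbf{x}) = \sum_{i=1}^{P}\frac{z_f[i]}{\sigma_1^2[i]}\left\{\sum_{n=(k-1)N_f}^{(k+M_1-1)N_f-1} x[nP+i] + \frac{N_0}{\sigma_0^2}\sum_{n=(k-1)N_f}^{(k+M_1-1)N_f-1} x^2[nP+i]\right\}. \tag{B.7}
$$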
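As a rough illustration of how (B.7) could be evaluated on a block of samples, the sketch below loops over the P sample positions and accumulates the linear and quadratic sums. The function name and argument layout are hypothetical, and the code assumes that the paper indexes x from 1, so that the paper's x[m] is stored at list position m - 1; this is a reading of the summation limits in (B.5), not something the text states explicitly.

```python
from typing import Sequence


def np_test_statistic(x: Sequence[float], z_f: Sequence[float], n0: float,
                      sigma0_sq: float, k: int, m1: int, n_f: int) -> float:
    """Evaluate T(x) of (B.7).

    Assumption: the paper's x[m] (1-based) is stored in x[m - 1].
    z_f holds z_f[1], ..., z_f[P], so P = len(z_f).
    """
    p = len(z_f)
    total = 0.0
    for i in range(1, p + 1):                      # i = 1, ..., P
        z = z_f[i - 1]
        sigma1_sq = 2.0 * n0 * z + sigma0_sq       # sigma_1^2[i] = 2 N0 z_f[i] + sigma_0^2
        linear_sum = 0.0
        quadratic_sum = 0.0
        # n = (k-1) N_f, ..., (k+M1-1) N_f - 1
        for n in range((k - 1) * n_f, (k + m1 - 1) * n_f):
            sample = x[n * p + i - 1]
            linear_sum += sample
            quadratic_sum += sample * sample
        total += z / sigma1_sq * (linear_sum + n0 / sigma0_sq * quadratic_sum)
    return total
```

A detector built on this sketch would compare the returned value with a threshold chosen to meet the false-alarm constraint (B.4); that threshold computation is not shown here.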

C. Statistic of the Detectors

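C.1. Detector $T_1(\mathbf{x})$. Since $\mathbf{x}$ is assumed to be a Gaussian vector, $T_1(\mathbf{x})$ also follows a Gaussian distribution:

$$
\begin{aligned}
\mathcal{H}_0:\ & T_1(\mathbf{x}) \overset{a}{\sim} \mathcal{N}\left(0,\ M_1N_f\sigma_0^2\sum_{i=1}^{P}\frac{z_f^2[i]}{\sigma_1^4[i]}\right), \\
\mathcal{H}_1:\ & T_1(\mathbf{x}) \overset{a}{\sim} \mathcal{N}\left(M_1N_f\sum_{i=1}^{P}\frac{z_f^2[i]}{\sigma_1^2[i]},\ M_1N_f\sum_{i=1}^{P}\frac{z_f^2[i]}{\sigma_1^2[i]}\right).
\end{aligned}
\tag{C.1}
$$

Actually, if the condition $z_f[i]/N_0 \ll BT_{\mathrm{sam}}/4$ is satisfied, which means that the signal-to-noise ratio (SNR) is low, the term $2N_0 z_f[i]$ can be ignored in the variance of $\mathbf{x}$ under $\mathcal{H}_1$, and then $T_1(\mathbf{x})$ can be derived directly.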

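C.2. Detector $T_2(\mathbf{x})$. Since the different entries of $\mathbf{x}$ have different weighting factors in $T_2(\mathbf{x})$, we collect the data samples bearing the same weighting factor into the same group. Therefore, there are $P$ groups of data samples, and they are assumed to be uncorrelated. Each group $\sum_{n=(k-1)N_f}^{(k+M_1-1)N_f-1} x^2[nP+i]$ follows a Chi-squared distribution. However, $T_2(\mathbf{x})$ is still assumed to be a Gaussian variable, as it is the sum of the weighted groups. Then, we can obtain

$$
\begin{aligned}
\mathcal{H}_0:\ & \sum_{n=(k-1)N_f}^{(k+M_1-1)N_f-1}\frac{x^2[nP+i]}{\sigma_0^2} \overset{a}{\sim} \chi^2_{M_1N_f}, \\
& T_2(\mathbf{x}) \overset{a}{\sim} \mathcal{N}\left(M_1N_f\sigma_0^2\sum_{i=1}^{P}\frac{z_f[i]}{\sigma_1^2[i]},\ 2M_1N_f\sigma_0^4\sum_{i=1}^{P}\frac{z_f^2[i]}{\sigma_1^4[i]}\right), \\
\mathcal{H}_1:\ & \sum_{n=(k-1)N_f}^{(k+M_1-1)N_f-1}\frac{x^2[nP+i]}{\sigma_1^2[i]} \overset{a}{\sim} \chi^2_{M_1N_f}\left(M_1N_f\frac{z_f^2[i]}{\sigma_1^2[i]}\right), \\
& T_2(\mathbf{x}) \overset{a}{\sim} \mathcal{N}\left(M_1N_f\sum_{i=1}^{P} z_f[i]\left(1 + \frac{z_f^2[i]}{\sigma_1^2[i]}\right),\ 2M_1N_f\sum_{i=1}^{P} z_f^2[i]\left(1 + \frac{2z_f^2[i]}{\sigma_1^2[i]}\right)\right),
\end{aligned}
\tag{C.2}
$$

where $\chi^2_\nu$ is the central Chi-squared pdf with $\nu$ degrees of freedom, which has mean $\nu$ and variance $2\nu$. Meanwhile, $\chi^2_\nu(\lambda)$ is the noncentral Chi-squared pdf with $\nu$ degrees of freedom and noncentrality parameter $\lambda$. Hence, it has mean $\nu + \lambda$ and variance $2\nu + 4\lambda$.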
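The Gaussian parameters of T2(x) in (C.2) follow from the Chi-squared moments just quoted. The sketch below makes that bookkeeping explicit under two assumptions that are not spelled out in this appendix: T2(x) is taken to be the sum over i of (z_f[i]/σ1²[i]) times the i-th group of squared samples, and the noncentrality parameter is taken as M1·Nf·z_f²[i]/σ1²[i], the value consistent with the H1 mean and variance in (C.2). All function names are hypothetical.

```python
from typing import Sequence, Tuple


def chi2_moments(nu: float, lam: float = 0.0) -> Tuple[float, float]:
    """Mean and variance of a (non)central Chi-squared variable: nu + lam, 2 nu + 4 lam."""
    return nu + lam, 2.0 * nu + 4.0 * lam


def t2_moments(z_f: Sequence[float], sigma0_sq: float, sigma1_sq: Sequence[float],
               m1: int, n_f: int, signal_present: bool) -> Tuple[float, float]:
    """Mean and variance of T2(x), assuming uncorrelated groups weighted by z_f[i]/sigma1_sq[i]."""
    nu = m1 * n_f                                   # degrees of freedom per group
    mean = 0.0
    var = 0.0
    for z, s1 in zip(z_f, sigma1_sq):
        weight = z / s1
        if signal_present:                          # H1: group / sigma_1^2[i] is noncentral
            scale, lam = s1, nu * z * z / s1
        else:                                       # H0: group / sigma_0^2 is central
            scale, lam = sigma0_sq, 0.0
        g_mean, g_var = chi2_moments(nu, lam)
        mean += weight * scale * g_mean
        var += (weight * scale) ** 2 * g_var
    return mean, var
```

For example, with z_f = [1, 2], σ0² = 1, σ1² = [2, 3], and M1·Nf = 2, the sketch reproduces the closed forms of (C.2) term by term.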

Acknowledgments

This work was supported in part by STW under the Green and Smart Process Technologies Program (Project 7976) and by NWO-STW under the VICI programme (DTC. 5893). Parts of this paper were presented in [17].

References

[1] L. Yang and G. B. Giannakis, “Ultra-wideband communications: an idea whose time has come,” IEEE Signal Processing Magazine, vol. 21, no. 6, pp. 26–54, 2004.

[2] Z. Tian and G. B. Giannakis, “BER sensitivity to mistiming in ultra-wideband impulse radios-part II: fading channels,” IEEE Transactions on Signal Processing, vol. 53, no. 5, pp. 1897–1907, 2005.

[3] R. Blazquez, P. Newaskar, and A. Chandrakasan, “Coarse acquisition for ultra wideband digital receivers,” in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP ’03), vol. 4, pp. 137–140, Hong Kong, April 2003.

[4] V. Lottici, A. D’Andrea, and U. Mengali, “Channel estimation for ultra-wideband communications,” IEEE Journal on Selected Areas in Communications, vol. 20, no. 9, pp. 1638–1645, 2002.

[5] S. R. Aedudodla, S. Vijayakumaran, and T. F. Wong, “Timing acquisition in ultra-wideband communication systems,” IEEE Transactions on Vehicular Technology, vol. 54, no. 5, pp. 1570–1583, 2005.

[6] Z. Tian and G. B. Giannakis, “A GLRT approach to data-aided timing acquisition in UWB radios-part I: algorithms,” IEEE Transactions on Wireless Communications, vol. 4, no. 6, pp. 2956–2967, 2005.

[7] J. Kusuma, I. Maravic, and M. Vetterli, “Sampling with finite rate of innovation: channel and timing estimation for UWB and GPS,” in Proceedings of IEEE International Conference on Communications (ICC ’03), vol. 5, pp. 3540–3544, Anchorage, Alaska, USA, May 2003.

[8] L. Yang and G. B. Giannakis, “Timing ultra-wideband signals with dirty templates,” IEEE Transactions on Communications, vol. 53, no. 11, pp. 1952–1963, 2005.

[9] I. Guvenc, Z. Sahinoglu, and P. V. Orlik, “TOA estimation for IR-UWB systems with different transceiver types,” IEEE Transactions on Microwave Theory and Techniques, vol. 54, no. 4, pp. 1876–1886, 2006.

[10] R. Hoctor and H. Tomlinson, “Delay-hopped transmitted-reference RF communications,” in Proceedings of IEEE Conference on Ultra Wideband Systems and Technologies (UWBST ’02), pp. 265–269, Baltimore, Md, USA, May 2002.

[11] M. Ho, V. S. Somayazulu, J. Foerster, and S. Roy, “A differential detector for an ultra-wideband communications system,” in Proceedings of the 55th IEEE Vehicular Technology Conference (VTC ’02), vol. 4, pp. 1896–1900, Birmingham, Ala, USA, May 2002.

[12] Z. Tian and B. M. Sadler, “Weighted energy detection of ultra-wideband signals,” in Proceedings of the 6th IEEE Workshop on Signal Processing Advances in Wireless Communications (SPAWC ’05), pp. 1068–1072, New York, NY, USA, June 2005.

[13] Y. Vanderperren, G. Leus, and W. Dehaene, “A reconfigurable pulsed UWB receiver sampling below Nyquist rate,” in Proceedings of the IEEE International Conference on Ultra-Wideband (ICUWB ’08), vol. 2, pp. 145–148, Hannover, Germany, September 2008.

[14] J. Kim, S. Yang, and Y. Shin, “A two-step search scheme for rapid and reliable UWB signal acquisition in multipath channels,” in Proceedings of IEEE International Conference on Ultra-Wideband (ICU ’05), pp. 355–360, Zurich, Switzerland, September 2005.

[15] S. Gezici, Z. Sahinoglu, A. F. Molisch, H. Kobayashi, and H. V. Poor, “Two-step time of arrival estimation for pulse-based ultra-wideband systems,” EURASIP Journal on Advances in Signal Processing, vol. 2008, Article ID 529134, 11 pages, 2008.

[16] M. R. Casu and G. Durisi, “Implementation aspects of a transmitted-reference UWB receiver,” Wireless Communications and Mobile Computing, vol. 5, no. 5, pp. 537–549, 2005.

[17] Y. Wang, G. Leus, and A.-J. van der Veen, “On digital receiver design for transmitted reference UWB,” in Proceedings of IEEE International Conference on Ultra-Wideband (ICUWB ’08), vol. 3, pp. 35–38, Hannover, Germany, September 2008.

[18] S. Bagga, L. Zhang, W. A. Serdijn, J. R. Long, and E. B. Busking, “A quantized analog delay for an IR-UWB quadrature downconversion autocorrelation receiver,” in Proceedings of IEEE International Conference on Ultra-Wideband (ICU ’05), pp. 328–332, Zurich, Switzerland, September 2005.

[19] R. C. Qiu, H. Liu, and X. Shen, “Ultra-wideband for multiple access communications,” IEEE Communications Magazine, vol. 43, no. 2, pp. 80–87, 2005.

[20] R. Djapic, G. Leus, and A.-J. van der Veen, “Blind synchronization in asynchronous UWB networks based on the transmit-reference scheme,” in Proceedings of the 38th Asilomar Conference on Signals, Systems and Computers (ACSSC ’04), vol. 2, pp. 1506–1510, Pacific Grove, Calif, USA, November 2004.

[21] Q. H. Dang and A.-J. van der Veen, “A decorrelating multiuser receiver for transmit-reference UWB systems,” IEEE Journal on Selected Topics in Signal Processing, vol. 1, no. 3, pp. 431–442, 2007.

[22] J. D. Choi and W. E. Stark, “Performance of ultra-wideband communications with suboptimal receivers in multipath channels,” IEEE Journal on Selected Areas in Communications, vol. 20, no. 9, pp. 1754–1766, 2002.

[23] K. Witrisal, G. Leus, M. Pausini, and C. Krall, “Equivalent system model and equalization of differential impulse radio UWB systems,” IEEE Journal on Selected Areas in Communications, vol. 23, no. 9, pp. 1851–1862, 2005.

[24] T. Q. S. Quek and M. Z. Win, “Analysis of UWB transmitted-reference communication systems in dense multipath channels,” IEEE Journal on Selected Areas in Communications, vol. 23, no. 9, pp. 1863–1874, 2005.

[25] T. Q. S. Quek, M. Z. Win, and D. Dardari, “Unified analysis of UWB transmitted-reference schemes in the presence of narrowband interference,” IEEE Transactions on Wireless Communications, vol. 6, no. 6, pp. 2126–2139, 2007.

[26] S. M. Kay, Fundamentals of Statistical Signal Processing, Volume 1: Estimation Theory, Prentice-Hall, Upper Saddle River, NJ, USA, 1993.

[27] J. R. Foerster, “Channel modeling sub-committee report final,” Tech. Rep. IEEE P802.15-02/368r5-SG3a, IEEE P802.15 Working Group for WPANs, November 2002.

[28] H. Stark and J. W. Woods, Probability, Random Processes, and Estimation Theory for Engineers, Prentice-Hall, Englewood Cliffs, NJ, USA, 1994.

[29] M. K. Simon, S. M. Hinedi, and W. C. Lindsey, Digital Communication Techniques, Prentice-Hall, Englewood Cliffs, NJ, USA, 1995.

[30] L. Yang and G. B. Giannakis, “Optimal pilot waveform assisted modulation for ultrawideband communications,” IEEE Transactions on Wireless Communications, vol. 3, no. 4, pp. 1236–1249, 2004.


Research Article

Autocorrelation Properties of OFDM Timing Synchronization Waveforms Employing Pilot Subcarriers

Oktay Ureten1 and Selcuk Tascıoglu2

1 Communications Research Centre, Terrestrial Wireless Systems Research Branch, Ottawa, ON, Canada K2H 8S2
2 Electronics Engineering Department, Ankara University, 06100 Ankara, Turkey

Correspondence should be addressed to Oktay Ureten, [email protected]

Received 13 June 2008; Accepted 7 January 2009


We investigate the autocorrelation properties of timing synchronization waveforms that are generated by embedded frequency domain pilot tones in orthogonal frequency division multiplex (OFDM) systems. The waveforms are composed by summing a selected number of OFDM subcarriers such that the autocorrelation function (ACF) of the resulting time waveform has desirable sidelobe behavior. Analytical expressions for the periodic and aperiodic ACF sidelobe energy are derived. Sufficient conditions for minimum and maximum aperiodic ACF sidelobe energy for a given number of pilot tones are presented. Several useful properties of the pilot design problem, such as invariance under transformations and equivalence of complementary sets, are demonstrated analytically. Pilot tone design discussion is expanded to the ACF sidelobe peak minimization problem by including various examples and simulation results obtained from a genetic search algorithm.

Copyright © 2009 O. Ureten and S. Tascıoglu. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

Timing synchronization is an essential task of an orthogonal frequency division multiplex (OFDM) receiver, which requires alignment of the discrete Fourier transform (DFT) segments with OFDM symbol boundaries. Timing alignment errors may occur in cases where the DFT aperture contains part of the guard interval that has been distorted by intersymbol interference (ISI). This results in loss of orthogonality due to spectral leakage [1] and therefore leads to performance degradation.

Timing synchronization techniques proposed for OFDM systems can be classified as either blind or data-aided. Blind approaches exploit the inherent redundancy in the OFDM signal structure due to, for example, the cyclic prefix [2] or windowing [3]. Radiometric detection and change-point estimation principles may also be employed to estimate the time of arrival of a data frame in burst mode systems [4, 5]. Even though blind techniques have the advantage of not requiring extra overhead, their performance usually degrades when the noise level is high or the channel distortion is severe; therefore, their use is mostly limited to high signal-to-noise ratio (SNR) applications [2].

Data-aided techniques offer the advantage of superior performance in low SNR applications at the expense of reduced spectral efficiency. These techniques benefit from the correlation gain of a synchronization waveform embedded into the transmitted signal, which can be maximized by a judicious design of the waveform. In this scheme, the receiver correlates a distorted received signal with its known replica and marks the instant of maximum correlation as an estimate of the timing synchronization point. High correlation gains improve the detection of peaks buried under noise, thereby leading to better noise immunity.
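The correlate-and-pick-the-peak step described above can be sketched in a few lines. This is a minimal, noise-free illustration rather than a receiver design: the function names are hypothetical, and the Barker-13 sequence is used only as a convenient example of a known replica.

```python
from typing import List, Sequence

# Barker sequence of length 13, used here purely as an example synchronization waveform.
BARKER_13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]


def estimate_timing(received: Sequence[complex], replica: Sequence[complex]) -> int:
    """Return the lag at which |sum_n received[lag + n] * conj(replica[n])| is largest."""
    best_lag = 0
    best_metric = -1.0
    for lag in range(len(received) - len(replica) + 1):
        acc = sum(received[lag + n] * replica[n].conjugate() for n in range(len(replica)))
        metric = abs(acc)
        if metric > best_metric:                  # keep the first maximum on ties
            best_lag, best_metric = lag, metric
    return best_lag


def embed(replica: Sequence[complex], offset: int, total_length: int) -> List[complex]:
    """Place the replica at the given offset inside an otherwise silent frame."""
    frame: List[complex] = [0] * total_length
    for n, value in enumerate(replica):
        frame[offset + n] = value
    return frame
```

In this idealized setting the estimate coincides with the true offset; with noise or channel distortion, the margin between the main peak and the sidelobes determines how reliable the estimate is, which is the motivation for the sidelobe measures studied in this paper.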

One way of embedding a synchronization waveform into a transmitted signal is to prefix it to the beginning of the time-domain waveform in the form of a preamble. Sequences with good autocorrelation properties are commonly employed in this approach. There is extensive literature on designing sequences with good autocorrelation properties; see, for example, [6] for an overview. Chu sequences, for instance, have perfect periodic autocorrelation properties, that is, their autocorrelation values are zero except at zero lag [7]. Chu sequences belong to a class of sequences called constant amplitude zero autocorrelation (CAZAC) sequences and can be generated for arbitrary lengths. Another useful class of sequences is the generalized Barker sequences, which have maximum aperiodic autocorrelation sidelobe amplitudes of one [8]. Unlike CAZAC sequences, there is no straightforward design scheme for generalized Barker sequences, and only sequences of length up to 63 are known to date [9]. Even though they have favorable autocorrelation properties, neither CAZAC nor Barker sequences have bandwidth restrictions. In bandlimited systems, waveforms have to be spectrally shaped to meet given bandwidth requirements to mitigate leakage to/from neighboring channels. After spectral shaping, both CAZAC and Barker sequences lose their optimal properties [10].
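The two properties mentioned above can be checked numerically. The sketch below generates a Chu sequence using the standard construction (phase π·r·k²/N for even N and π·r·k(k+1)/N for odd N, with the root r coprime to N) and evaluates the periodic ACF, and it evaluates the aperiodic ACF of the binary Barker sequence of length 13. The function names and the root parameter name are choices made for this illustration.

```python
import cmath
import math
from typing import List, Sequence

BARKER_13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]


def chu_sequence(length: int, root: int = 1) -> List[complex]:
    """Chu sequence of the given length; the root must be coprime to the length."""
    if math.gcd(root, length) != 1:
        raise ValueError("root must be relatively prime to length")
    seq = []
    for k in range(length):
        if length % 2 == 0:
            phase = math.pi * root * k * k / length
        else:
            phase = math.pi * root * k * (k + 1) / length
        seq.append(cmath.exp(1j * phase))
    return seq


def periodic_acf(s: Sequence[complex], lag: int) -> complex:
    """Periodic ACF: sum over one period of s(n) * conj(s(n + lag)), indices taken modulo N."""
    n = len(s)
    return sum(s[i] * s[(i + lag) % n].conjugate() for i in range(n))


def aperiodic_acf(s: Sequence[complex], lag: int) -> complex:
    """Aperiodic ACF: sum of s(n) * conj(s(n + lag)) over the overlapping samples only."""
    return sum(s[i] * s[i + lag].conjugate() for i in range(len(s) - lag))


if __name__ == "__main__":
    chu = chu_sequence(16)
    print(max(abs(periodic_acf(chu, lag)) for lag in range(1, 16)))       # numerically zero
    print(max(abs(aperiodic_acf(BARKER_13, lag)) for lag in range(1, 13)))
```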

In addition to the time-domain embedding, synchronization waveforms can also be embedded into the transmitted signal in the frequency domain by allocating a number of subcarriers for timing in OFDM systems. In this approach, the transmitter encodes a number of pilot subcarriers with known phases and amplitudes to create a signal for timing synchronization. As the timing clock is spread over a number of discrete tones in this approach [11], synchronization can be achieved more effectively in selective fading channels [12]. Moreover, this approach facilitates the design of spectrally limited synchronization waveforms because the transmitted signal’s spectral characteristics can be easily controlled by deactivating appropriate subcarriers.

In this paper, we address the autocorrelation properties of synchronization waveforms created by embedded pilot subcarriers in OFDM systems. The outline of the paper is as follows. In Section 2, the problem definition is given and our motivations are explained; a literature survey is presented and our contributions are summarized. In Section 3, background information and mathematical definitions required for the derivation of the analytical expressions are given. In Section 4, the sidelobe behavior of both the periodic and aperiodic autocorrelation functions (ACFs) of the synchronization waveforms is investigated, and analytical expressions for the sidelobe energies are derived. Some important properties of the ACFs resulting from the analytical expressions are introduced. In Section 5, minimization of the ACF sidelobe peak is considered as a constrained nonlinear integer programming problem, and a suboptimal genetic search algorithm is utilized. In Section 6, simulation results obtained for various cases are presented. A summary and conclusions are given in Section 7. For ease of exposition, most of our proofs are relegated to Appendices A, B, and C.

2. Preliminaries

2.1. Problem Definition. In this paper, we consider timing synchronization waveforms that are created by summing a number of orthogonal subcarriers called pilot tones. Merits of such synchronization waveforms depend on the selected parameters of the pilot tones such as locations, amplitudes, and phases. Although pilot design could take into account the combinations of all three parameters, in this work we narrow our focus to pilot locations only. We also assume that the number of pilots that can be allocated for synchronization is limited, that is, the number of pilot tones is less than the total number of available OFDM subcarriers. Thus, the problem addressed is manifested as the selection of the best subcarrier locations for pilot symbols such that the synchronization waveform has good autocorrelation properties. A mathematical formulation and a rigorous definition of the problem are presented in Section 4.

[Figure 1 labels: vacant channels, occupied channels; CH1, CH2, CH3, CH4.]

Figure 1: Timing synchronization for noncontiguous OFDM-based dynamic spectrum access poses challenges due to spectral limitation requirements; see, for example, [13, 14]. A user may decide to transmit in both vacant channels CH1 and CH3 without interfering with the user(s) in channels CH2 and CH4. More robust timing may be possible if the synchronization waveform is spread over both CH1 and CH3, which can be achieved without causing harmful interference to other user(s) by the pilot tone-based synchronization scheme investigated in this paper.

For most design problems, solutions require solving constrained nonlinear integer programming problems, for which analytical treatments are generally difficult. In this paper, we focus our attention on special cases so that we can derive analytical expressions to uncover the links between pilot placement and the autocorrelation behavior and discover some useful properties of the ACF to ease the waveform design process for more complex problems.

2.2. Motivation. Our motive for considering the defined problem is threefold. One reason is the overhead issue; if a design requirement can be met by using only a small number of pilot tones, then the remaining subcarriers can be used for other purposes. Although the amount of overhead savings is small in applications where the synchronization waveform is needed only in the first frame of a long packet, savings can be significant in systems that require synchronization of each OFDM frame independently, as in the ALOHA environment [12].

Our second motivation is the robustness of the reduced pilot waveforms to deviations from their design specifications. It may be possible to design a waveform with higher autocorrelation merit if all available subcarriers are utilized. However, this waveform will experience every spectral notch in the frequency-selective channel, resulting in a deviation in its merit from the designed value, depending on the degree of selectivity of the channel. Although the merit of a waveform designed using a reduced number of pilots may be smaller than that of a waveform using all available subcarriers, a reduced pilot waveform may be preferable in severely selective channels with multiple spectral notches, as it is more likely to keep its designed merit.

Our last motivation for considering the presented scheme is the increased demand for generating waveforms over fragmented (noncontiguous) frequency bands. Noncontiguous OFDM is being considered as a candidate solution for dynamic spectrum access due to its flexibility and adjustability to certain spectrum restrictions. More effective synchronization waveforms conforming to such spectral restrictions can be designed using the frequency domain pilot allocation approach; see, for example, the case shown in Figure 1. Therefore, the material presented in this paper can be exploited in synchronization waveform design for agile radios that are able to operate over fragmented frequency bands.

2.3. Related Work. Embedded frequency domain pilot subcarriers have been utilized to ease several tasks such as channel estimation [15], peak-to-average power ratio reduction [16], robust estimation of frequency offsets in frequency selective fading channels [17], and suppression of out-of-band radiation [18] in OFDM systems. Designing pilot tones for specific purposes requires judicious selection of specific parameters of the pilots such as locations, amplitudes, and/or phases. For the purpose of channel estimation, for example, the optimality condition stipulates equidistant pilots with uniform amplitudes [19, 20]. Peak-to-average power ratio reduction and sidelobe suppression problems, on the other hand, can be solved by quadratic optimization of the pilot amplitudes and phases [16, 18].

Pilot tone-assisted synchronization schemes have been adopted by wireless communications standards such as IEEE 802.11a [21] and Digital Radio Mondiale (DRM) [22]. In the IEEE 802.11a standard, uniformly spaced pilot tones modulated by a complex sequence are used to create a periodic preamble waveform to ease frame detection and timing synchronization. Periodic preambles facilitate simple autocorrelation-based metrics for timing recovery; however, a timing metric plateau inherent in these methods causes large estimation errors. In [23–25], various periodic preamble structures and metrics are proposed to improve estimation performance by creating sharper correlation peaks. In [26], auto- and cross-correlation-based metrics are compared in terms of synchronization performance in an 802.11a system. The cross-correlation-based metric utilizes the long preamble for synchronization, which is created by modulating all useful subcarriers with a binary sequence. In [10], instead of using a binary sequence, the phases of all useful subcarriers are optimized through a greedy search algorithm such that the resulting time-domain waveform has good autocorrelation properties. The authors show that such synchronization waveforms outperform Barker and CAZAC sequences in a bandlimited system.

For the reasons stressed in Section 2.2, some applications may require the use of a subset of all useful subcarriers. In this case, waveform design requires optimal selection of pilot tone locations, as each selection results in a different autocorrelation sidelobe pattern. Such an approach is adopted in [12], in the context of an OFDM/FM system for the ALOHA environment in which each OFDM frame has to be synchronized independently. Due to the limited available spectrum, only a subset of subcarriers is reserved to keep the overhead small. A suboptimal heuristic approach is used to reduce the search time of the pilot location selection process by dividing the search space into subgroups. A brute-force search is then performed in a smaller subset of subchannels, and additional subchannels that provide smaller sidelobes are added into the set. In [11], a pilot tone-based synchronization scheme, inspired by a sonar waveform design approach presented in [27], is proposed for discrete multitone spread spectrum communication systems. Nonuniformly spaced pilot tones are utilized to minimize the harmonics of the autocorrelation function and reduce high sidelobe peaks by spacing pilot tones at a prime number or a Fibonacci series increment of the minimum frequency spacing. Even though the proposed selections result in better sidelobe behavior than the periodic placement, the proposed pilot configurations are far from optimal, and their use is limited due to particular spacing restrictions.

In the DRM standard [22], time reference subcarriers are allocated to perform ambiguity resolution. Locations of a predefined number of pilot cells are given in the standard; however, the design process is not disclosed. In [28], a suboptimal genetic search algorithm is proposed to yield an effective solution for the pilot tone location selection problem.

2.4. Contributions. Although the effect of pilot tone locations on the characteristics of the ACF has been previously noted and suboptimal search schemes have been proposed, neither a detailed investigation nor an analytical treatment of the problem has been presented in the literature.

In this paper, an in-depth discussion of pilot tone design for timing synchronization in OFDM systems is presented. Analytical expressions for both periodic and aperiodic ACF sidelobe energy are derived, and sufficient conditions for obtaining minimum and maximum aperiodic ACF sidelobe energy are presented. Some useful properties of the pilot design problem, such as invariance under transformations and equivalence of complementary sets, are demonstrated analytically. Finally, the pilot tone design discussion is expanded by including various examples and simulation results obtained by using a genetic search algorithm.

3. Basic Definitions for Derivations

In this section, background information required for the derivation of analytical expressions is presented along with necessary definitions of merit measures that will be used in the following to evaluate periodic and aperiodic ACFs.

3.1. Autocorrelation Function. Correlation gain of a synchronization waveform is associated with its autocorrelation characteristics. The ACF measures self-similarity of a waveform at various time lags; therefore, it is a suitable tool for estimating the time of arrival of a known signal.

The ACF of a periodic discrete-time signal $s(n)$ is defined as

$$
R(\tau) = \sum_{n=0}^{N-1} s(n)\,s^*(n+\tau), \tag{1}
$$

where $\tau$ is the integer time lag and $N$ is the period of $s(n)$. $R(\tau)$ is also periodic with period $N$ and is called the periodic ACF. If $s(n)$ is not periodic, then the aperiodic ACF is employed, which is given by

$$
C(\tau) = \sum_{n=0}^{N-\tau-1} s(n)\,s^*(n+\tau), \tag{2}
$$

where $0 \le \tau \le N-1$. Here, $N$ is the length of the signal sequence, which is equal to the single-sided autocorrelation length.

3.2. Merit of Autocorrelation Functions. Merit of an ACF is associated with its sidelobe pattern, that is, the off-peak values of the correlation. A common approach to evaluate the merit of an ACF is to measure a suitable norm of its sidelobes. The $p$th norm of the ACF sidelobe is defined as

$$
L_p = \left(\sum_{\tau=1}^{N-1} |\varphi(\tau)|^p\right)^{1/p}, \tag{3}
$$

where $\varphi$ can be either the periodic or the aperiodic ACF. The most widely used norms in merit evaluations are the Euclidean ($p = 2$) and Chebyshev ($p = \infty$, also known as the maximum norm) norms, which are used to define the sidelobe energy and sidelobe peak of the ACF as follows:

$$
\begin{aligned}
E &= L_2^2 = \sum_{\tau=1}^{N-1} |\varphi(\tau)|^2, \\
\Pi &= L_\infty = \max_{\tau \neq 0} |\varphi(\tau)|.
\end{aligned}
\tag{4}
$$

These norms are usually employed to calculate the merit factor (MF) and the peak-to-side-peak ratio (PSPR), which are defined as

$$
\begin{aligned}
\mathrm{MF} &= \frac{\varphi(0)^2}{2E}, \\
\mathrm{PSPR} &= \frac{\varphi(0)}{\Pi}.
\end{aligned}
\tag{5}
$$

MF and PSPR can be combined to develop new merit measures, as optimizing one measure does not always optimize the other. Selection of which norm to consider usually depends on the specific problem; however, sidelobe energy is often employed for analytical investigations as it is more tractable than the maximum norm.
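The definitions (1) through (5) translate directly into code. The sketch below is one possible transcription with hypothetical function names; the Barker sequence of length 13 is used only as a familiar test input and is not part of the pilot-tone design discussed in this paper.

```python
from typing import List, Sequence

BARKER_13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]


def periodic_acf(s: Sequence[complex], tau: int) -> complex:
    """R(tau) of (1); s is treated as one period of a periodic signal."""
    n = len(s)
    return sum(s[i] * s[(i + tau) % n].conjugate() for i in range(n))


def aperiodic_acf(s: Sequence[complex], tau: int) -> complex:
    """C(tau) of (2), defined for 0 <= tau <= N - 1."""
    return sum(s[i] * s[i + tau].conjugate() for i in range(len(s) - tau))


def sidelobe_energy(phi: Sequence[complex]) -> float:
    """E of (4): sum of |phi(tau)|^2 over tau = 1, ..., N - 1."""
    return sum(abs(v) ** 2 for v in phi[1:])


def sidelobe_peak(phi: Sequence[complex]) -> float:
    """Pi of (4): largest |phi(tau)| over the nonzero lags."""
    return max(abs(v) for v in phi[1:])


def merit_factor(phi: Sequence[complex]) -> float:
    """MF of (5): phi(0)^2 / (2 E)."""
    return abs(phi[0]) ** 2 / (2.0 * sidelobe_energy(phi))


def pspr(phi: Sequence[complex]) -> float:
    """PSPR of (5): phi(0) / Pi."""
    return abs(phi[0]) / sidelobe_peak(phi)


def acf_values(s: Sequence[complex], periodic: bool) -> List[complex]:
    """All ACF values for lags 0, ..., N - 1."""
    acf = periodic_acf if periodic else aperiodic_acf
    return [acf(s, tau) for tau in range(len(s))]
```

For the Barker-13 input, the aperiodic sidelobes alternate between 0 and 1, so the sketch gives a sidelobe energy of 6, a merit factor of 169/12, and a PSPR of 13.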

4. Synchronization Waveform and the Characteristics of Its ACF

Analytical treatment of the pilot tone location selection problem is difficult for most cases in which the maximum norm of the ACF sidelobe is involved. In this section, we consider the Euclidean norm due to its tractability and obtain some analytical results for the ACF sidelobe energy.

4.1. Synchronization Waveform. An OFDM waveform is composed of a sum of orthogonal subcarriers modulated by data and/or pilot symbols. Let us assume that a subset of all subcarriers is reserved for pilot symbols to achieve robust timing synchronization and the remaining subcarriers are modulated with data. The time waveform is given by the IDFT of the modulated symbols

$$
x(n) = \underbrace{\frac{1}{N}\sum_{k=1}^{N-P} a_k e^{j\Omega_k n}}_{d(n)} + \underbrace{\frac{1}{N}\sum_{k=1}^{P} b_k e^{jw_k n}}_{s(n)}, \tag{6}
$$

where $\Omega_k = (2\pi/N)\alpha_k$, $w_k = (2\pi/N)\beta_k$, $\alpha_k \in S_d$, $k = 1, \ldots, N-P$, $\beta_k \in S_p$, $k = 1, \ldots, P$, and $S_d$ and $S_p$ are the data and pilot tone sets, respectively. $N$ is the DFT size, $P$ is the number of pilot subcarriers, and $a_k$ and $b_k$ are the data and pilot symbols, respectively. The signals $d(n)$ and $s(n)$ are functions of the data and pilot symbols only, respectively, and these waveforms are orthogonal to each other (because $S_p \cap S_d = \emptyset$) when there is no frequency offset. In the following, $s(n)$ will refer to the synchronization waveform.

Suppose that $P$ out of a total of $N$ subcarriers of an OFDM symbol are reserved for synchronization and all pilot tones are modulated by unit amplitude zero phase symbols. By fixing the amplitudes and phases of the pilot symbols, we focus our attention on pilot locations only for simplicity. The corresponding time domain synchronization waveform is given by

\[ s(n) = \frac{1}{N}\sum_{k=1}^{P} e^{j w_k n}. \tag{7} \]

Our objective is to select Sp such that the ACF of s(n) has a desirable sidelobe pattern. Both L2 and L∞ norms are considered, and an analytical investigation of periodic and aperiodic ACF sidelobe energy is presented in subsequent sections.
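As an illustration of (7), the following minimal Python sketch builds the synchronization waveform from a pilot set and evaluates its periodic and aperiodic ACFs. The function names and the example pilot set are assumptions introduced here for illustration; the ACF sums follow the forms used in Appendices A and B.

```python
import cmath

def sync_waveform(pilots, N):
    """Waveform of (7): unit-amplitude, zero-phase tones at the given indices."""
    return [sum(cmath.exp(2j * cmath.pi * b * n / N) for b in pilots) / N
            for n in range(N)]

def periodic_acf(s, tau):
    """R(tau): correlation with cyclic wrap-around over one period."""
    N = len(s)
    return sum(s[n] * s[(n + tau) % N].conjugate() for n in range(N))

def aperiodic_acf(s, tau):
    """C(tau): correlation over the overlapping samples only."""
    N = len(s)
    return sum(s[n] * s[n + tau].conjugate() for n in range(N - tau))

N = 64
pilots = [-21, -7, 3, 12, 26]      # hypothetical pilot set S_p
s = sync_waveform(pilots, N)
peak = aperiodic_acf(s, 0)         # main lobe, equal to P/N here
sidelobes = [abs(aperiodic_acf(s, t)) for t in range(1, N)]
```

The list `sidelobes` is the quantity whose L2 and L∞ norms are studied in the remainder of this section.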

4.2. Sidelobe Energy of the Periodic ACF

Theorem 1 (Periodic ACF sidelobe energy theorem). Sidelobe energy of the periodic ACF of s(n) is given by

\[ E = \frac{NP - P^2}{N^2}. \tag{8} \]

Proof. See Appendix A.

The sidelobe energy expression given in (8) can be rewritten as

\[ E = r(1 - r), \tag{9} \]


Figure 2: Relation between periodic ACF sidelobe energy and the number of pilot tones. (Axes: number of pilot tones, 0 to N; sidelobe energy, 0 to 0.25.)

where r = P/N is the ratio of the number of pilot tones to the total number of subcarriers. The expression (9) shows that the sidelobe energy of the periodic ACF is a function of the ratio of the number of pilot tones to the total number of subcarriers only; thus it is independent of pilot tone locations.
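A quick numerical check of (8) and (9) can be sketched as follows. The helper name and the two example pilot sets are assumptions made here; any two sets of the same size should give the same energy.

```python
import cmath

def periodic_sidelobe_energy(pilots, N):
    """Sum of |R(tau)|^2 over tau = 1..N-1 for the waveform in (7)."""
    s = [sum(cmath.exp(2j * cmath.pi * b * n / N) for b in pilots) / N
         for n in range(N)]
    energy = 0.0
    for tau in range(1, N):
        r_tau = sum(s[n] * s[(n + tau) % N].conjugate() for n in range(N))
        energy += abs(r_tau) ** 2
    return energy

N = 32
adjacent = [0, 1, 2, 3, 4, 5, 6, 7]
scattered = [1, 4, 9, 10, 17, 22, 27, 30]   # arbitrary example set
r = len(adjacent) / N
e_adjacent = periodic_sidelobe_energy(adjacent, N)
e_scattered = periodic_sidelobe_energy(scattered, N)
# Both values are expected to agree with r * (1 - r) from (9).
```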

Although the sidelobe energy expression given in (8) is derived under a zero phase assumption of the pilots, the same result holds when the pilot tones are modulated with nonzero phase symbols. This is due to the Wiener-Khinchin theorem, which relates the periodic ACF to the power spectral density via the Fourier transform. Different selections of pilot phases result in different synchronization waveforms; however, they will have a common ACF as their power spectral density functions are the same. Therefore, pilot phase selection will not improve sidelobe energy characteristics of the periodic ACF; however, proper selection of phases helps to reduce the peak-to-average-power ratio (PAPR) of the synchronization waveform.
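The phase independence of the periodic ACF can be illustrated with a short sketch. The function name, pilot set, and random seed below are assumptions chosen for the example.

```python
import cmath
import random

def periodic_acf_magnitudes(pilots, phases, N):
    """|R(tau)| for tau = 0..N-1 with pilot symbols exp(j * phase)."""
    s = [sum(cmath.exp(1j * (2 * cmath.pi * b * n / N + ph))
             for b, ph in zip(pilots, phases)) / N
         for n in range(N)]
    return [abs(sum(s[n] * s[(n + t) % N].conjugate() for n in range(N)))
            for t in range(N)]

N = 32
pilots = [2, 5, 11, 12, 19, 27]            # arbitrary example set
rng = random.Random(1)
random_phases = [rng.uniform(0, 2 * cmath.pi) for _ in pilots]
zero_phases = [0.0] * len(pilots)

acf_zero = periodic_acf_magnitudes(pilots, zero_phases, N)
acf_random = periodic_acf_magnitudes(pilots, random_phases, N)
max_difference = max(abs(a - b) for a, b in zip(acf_zero, acf_random))
```

Under these assumptions `max_difference` stays at the level of floating-point rounding, which is consistent with the Wiener-Khinchin argument above.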

The relation between the number of pilot tones and the sidelobe energy of the periodic ACF is plotted in Figure 2. As seen from this figure, sidelobe energy increases with the number of pilot tones until P = ⌊N/2⌋ and then reduces back to zero when all subcarriers are used. However, in practice, using all subcarriers may not be possible due to bandwidth constraints. Hence, a synchronization waveform with perfect autocorrelation cannot be designed. This result is a direct consequence of the periodic ACF sidelobe energy theorem, which is formulated in (8).

4.3. Sidelobe Energy of the Aperiodic ACF. If a periodic correlation is employed for synchronization, then at least two periods of the waveform must be embedded in the transmitted signal. If this is not feasible due to overhead limitations, one may opt to use aperiodic, instead of periodic, correlation. In this section, aperiodic autocorrelation properties of the synchronization waveform are investigated, the aperiodic ACF sidelobe energy theorem is stated, and some important corollaries resulting from this theorem are presented.

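Theorem 2 (Aperiodic ACF sidelobe energy theorem). Sidelobe energy of the aperiodic ACF of s(n) is given by

\[ E = \frac{P}{3N} - \frac{P^2}{2N^2} + \frac{P}{6N^3} + \frac{1}{2N^3}\sum_{k=1}^{P}\sum_{\substack{l=1 \\ l \neq k}}^{P} \csc^2\left(\frac{w_k - w_l}{2}\right). \tag{10} \]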

Proof. See Appendix B.

Immediate results of this theorem follow.

Corollary 1. The sidelobe energy of the aperiodic ACF depends on pilot tone locations.

Proof. See (10).

The aperiodic ACF sidelobe energy expression given in (10) can be rewritten as a sum of two terms as follows:

\[ E = \kappa + \Delta, \tag{11} \]

where

\[ \kappa = \frac{P}{3N} - \frac{P^2}{2N^2} + \frac{P}{6N^3}, \qquad \Delta = \frac{1}{2N^3}\sum_{k=1}^{P}\sum_{\substack{l=1 \\ l \neq k}}^{P} \csc^2\left(\frac{w_k - w_l}{2}\right). \tag{12} \]

The term κ is a function of the number of pilot tones whereas the term Δ is a function of subcarrier locations; therefore, sidelobe energy depends on the number of pilot tones as well as pilot locations.
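The decomposition in (11) and (12) can be compared against a direct evaluation of the aperiodic ACF. The sketch below is illustrative only; the function names and the example pilot set are assumptions, and (wk − wl)/2 is written as π(βk − βl)/N.

```python
import cmath
import math

def kappa(P, N):
    """Location-independent term of (12)."""
    return P / (3 * N) - P ** 2 / (2 * N ** 2) + P / (6 * N ** 3)

def delta(pilots, N):
    """Location-dependent term of (12), summed over ordered pilot pairs."""
    total = 0.0
    for k in pilots:
        for l in pilots:
            if k != l:
                total += 1.0 / math.sin(math.pi * (k - l) / N) ** 2
    return total / (2 * N ** 3)

def direct_energy(pilots, N):
    """Sum of |C(tau)|^2 over tau = 1..N-1, computed from the waveform (7)."""
    s = [sum(cmath.exp(2j * cmath.pi * b * n / N) for b in pilots) / N
         for n in range(N)]
    return sum(abs(sum(s[n] * s[n + t].conjugate() for n in range(N - t))) ** 2
               for t in range(1, N))

N = 16
pilots = [0, 1, 5, 11]                     # arbitrary example set
closed_form = kappa(len(pilots), N) + delta(pilots, N)
numerical = direct_energy(pilots, N)
```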

Corollary 2 (Invariance property). The ACF sidelobe energy remains unchanged under any transformation of the pilot set that does not change the relative distances of the pilot tones.

Proof. The sidelobe energy expression given in (10) is a function of the differences of pilot locations, that is, only relative positions of the pilots determine the amount of sidelobe energy. Thus any transformation such as translations, cyclic shifts, or reversal of the pilot locations does not change the merit of the original set.
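The invariance property can be checked numerically with a short sketch. The shift amount, the example pilot set, and the helper name are assumptions made for illustration.

```python
import cmath

def aperiodic_sidelobe_energy(pilots, N):
    """Sum of |C(tau)|^2 over tau = 1..N-1 for the waveform in (7)."""
    s = [sum(cmath.exp(2j * cmath.pi * b * n / N) for b in pilots) / N
         for n in range(N)]
    return sum(abs(sum(s[n] * s[n + t].conjugate() for n in range(N - t))) ** 2
               for t in range(1, N))

N = 16
original = [0, 1, 5, 11]                        # arbitrary example set
shifted = [(b + 7) % N for b in original]       # cyclic shift by 7 bins
reversed_set = [(-b) % N for b in original]     # reversal of the locations

e_original = aperiodic_sidelobe_energy(original, N)
e_shifted = aperiodic_sidelobe_energy(shifted, N)
e_reversed = aperiodic_sidelobe_energy(reversed_set, N)
```

All three energies are expected to coincide, since each transformation preserves the pairwise differences that enter (10).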

The invariance property indicates the existence of multiple sets with identical ACF properties, which can be easily obtained by simple transformations of the original set. This property can be exploited in adaptive waveform design applications in which waveform parameters are required to adapt quickly to changes in the RF environment.

The term Δ is a sum of sidelobe energy contributions due to each pilot pair. Each pilot pair contributes to the sidelobe energy with an amount depending on the separation between the two pilots. A plot showing the relation between pair distance and the corresponding sidelobe energy contribution is displayed in Figure 3 for N = 64. As seen from the figure, the sidelobe energy contribution decreases with increasing pairwise pilot distance. This observation leads to some important results of the aperiodic ACF sidelobe energy theorem, which are summarized in the following two remarks.

Figure 3: Relation between pairwise pilot distance and aperiodic ACF sidelobe energy contribution for N = 64. (Axes: pairwise distance, 0 to 32; sidelobe energy contribution, 0 to 1 × 10⁻⁴. Annotations: E0, λ, δE1, δE2.)

Remark 1 (Maximum aperiodic ACF sidelobe energy). The sidelobe energy of the aperiodic ACF is maximum when pilot tones are placed adjacently.

As the sidelobe energy contribution of a pilot pair decreases with the pilot separation, total sidelobe energy is maximized when pilots are placed as closely as possible. This condition is satisfied when pilot tones are adjacent (no spacing between the pilots). The sidelobe energy value due to this placement is the maximum among all possible placements for the given number of pilot tones.

Remark 2 (Minimum aperiodic ACF sidelobe energy). The sidelobe energy of the aperiodic ACF is minimum when pilot tones are placed uniformly.

As the sidelobe energy contribution of a pilot pair decreases with the pilot separation, total sidelobe energy is minimized when pilots are maximally spaced. This condition is satisfied when pilots are placed periodically (equal spacing between the pilots). The energy value due to this placement is the minimum possible sidelobe energy value for the given number of pilot tones.
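The two remarks can be illustrated by evaluating (10) for an adjacent and a uniform placement of the same size. The values N = 64 and P = 8, the third "irregular" set, and the function name are assumptions chosen for the example.

```python
import math

def aperiodic_energy(pilots, N):
    """Closed-form sidelobe energy of (10) for integer pilot indices."""
    P = len(pilots)
    kappa = P / (3 * N) - P ** 2 / (2 * N ** 2) + P / (6 * N ** 3)
    pair_sum = sum(1.0 / math.sin(math.pi * (k - l) / N) ** 2
                   for k in pilots for l in pilots if k != l)
    return kappa + pair_sum / (2 * N ** 3)

N, P = 64, 8
adjacent = list(range(P))                 # no spacing between pilots
uniform = list(range(0, N, N // P))       # equal spacing of N/P bins
irregular = [0, 3, 9, 20, 26, 41, 47, 58] # arbitrary example set

e_adjacent = aperiodic_energy(adjacent, N)
e_uniform = aperiodic_energy(uniform, N)
e_irregular = aperiodic_energy(irregular, N)
```

According to Remarks 1 and 2, `e_adjacent` should be the largest and `e_uniform` the smallest of the three values.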

Figure 4: If a pilot moves away from one pilot, it becomes closer to another pilot in its neighborhood. (Diagram: P3 starts midway between P1 and P2, at distance Λ from each; after a shift of λ the separations become Λ + λ and Λ − λ.)

An example is provided in Figure 4 to explain Remark 2. Assume that two pilot tones P1 and P2 are located at a distance of 2Λ, and a third pilot P3 is placed between P1 and P2 such that the distances from P1 to P3 and P2 to P3 are equal. We name this placement scenario the equilibrium state E0 (see Figure 3). Suppose the pilot P3 moves away from P1 by an amount of λ, to reduce the sidelobe energy contribution of the P1P3 pair by an amount of δE1. This movement, however, decreases the P2P3 distance; therefore, the energy contribution due to the P2P3 pair increases by an amount δE2. It can be shown that the sidelobe energy contribution f(x) = csc²(x) is a convex function of the pairwise distance; therefore, δE2 is always greater than δE1. This requires that the sidelobe energy be higher than in the equilibrium state when the symmetry in pilot placement is broken.
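The convexity claim can be checked directly. The short derivation below is a sketch added for illustration; the symbols a and ε (the equilibrium separation and the displacement, both expressed as arguments of csc² in radians) are introduced here and are not part of the original notation.

\[ f(x) = \csc^2 x, \qquad f'(x) = -2\csc^2 x \cot x, \qquad f''(x) = 2\csc^4 x + 4\csc^2 x \cot^2 x > 0, \quad 0 < x < \pi. \]

Since f is strictly convex on (0, π), for 0 < ε < a and a + ε < π,

\[ \delta E_2 - \delta E_1 \;\propto\; \left[f(a - \varepsilon) - f(a)\right] - \left[f(a) - f(a + \varepsilon)\right] = f(a + \varepsilon) + f(a - \varepsilon) - 2f(a) > 0. \]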

In order to place all P pilot tones at equal distances, N/P must be an integer. Finding the minimum sidelobe energy value and the corresponding pilot placement is not straightforward if N/P is not an integer. However, optimal pilot placements can be easily found for P = N − Q pilots if N/Q is an integer. Proving this statement requires the following definition.

Definition 1 (Complementary pilot set). For any given pilot set Sp of size P contained in the universal set SN = {1, 2, ..., N}, the complementary pilot set Sc of size N − P is defined as the set of pilot locations not contained in Sp, that is, Sc = SN − Sp. Sp and Sc are called complementary sets.

Theorem 3 (Complementary set theorem). If C(τ) and C′(τ) are the aperiodic ACFs of the synchronization and complementary synchronization waveforms, respectively, then

\[ C'(\tau) = -C^{*}(N - \tau). \tag{13} \]

Proof. See Appendix C.

Corollary 3 (Equivalence of complementary sets). The ACF sidelobe characteristics of the complementary sets are identical.

Proof. The ACF sidelobe characteristics depend on the absolute value of the off-peak values of the ACF. Therefore, the proof follows by taking the absolute value of both the left and right sides of (13) and summing over τ for any norm index p:

\[ \left(\sum_{\tau=1}^{N-1} \left|C'(\tau)\right|^{p}\right)^{1/p} = \left(\sum_{\tau=1}^{N-1} \left|C(N - \tau)\right|^{p}\right)^{1/p}. \tag{14} \]

This corollary shows how to construct a solution for a pilot set of size N − P when a solution for a set of size P is already available. Note that we have only shown that the sidelobe behavior of the ACFs of the synchronization and complementary synchronization waveforms is identical. However, the waveforms may have different energies as they are created with a different number of pilot tones; the energy differences are contained in the C(0) and C′(0) values.
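The complementary set theorem can be verified numerically with a small sketch. The example set, the index range 0 to N − 1 (equivalent to 1 to N modulo N), and the function names are assumptions made here.

```python
import cmath

def waveform(pilots, N):
    """Waveform of (7) for the given pilot indices."""
    return [sum(cmath.exp(2j * cmath.pi * b * n / N) for b in pilots) / N
            for n in range(N)]

def aperiodic_acf(s, tau):
    """C(tau): correlation over the overlapping samples only."""
    N = len(s)
    return sum(s[n] * s[n + tau].conjugate() for n in range(N - tau))

N = 16
S_p = [1, 2, 5, 9, 14]                            # arbitrary pilot set
S_c = [k for k in range(N) if k not in S_p]       # complementary set
s, s_c = waveform(S_p, N), waveform(S_c, N)

# Residual of (13): C'(tau) + conj(C(N - tau)) should vanish for tau != 0.
max_residual = max(abs(aperiodic_acf(s_c, t) + aperiodic_acf(s, N - t).conjugate())
                   for t in range(1, N))
peak_p = abs(aperiodic_acf(s, 0))                 # P / N
peak_c = abs(aperiodic_acf(s_c, 0))               # (N - P) / N
```

The main-lobe values `peak_p` and `peak_c` differ, which reflects the energy difference noted above.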

Before concluding this section, we introduce a trigonometric identity that is derived from the aperiodic ACF sidelobe energy expression given in (10).

Theorem 4 (Asymptotical value of Δ).

\[ \lim_{N \to \infty} \frac{1}{2N^3}\sum_{k=1}^{N}\sum_{\substack{l=1 \\ l \neq k}}^{N} \csc^2\left(\frac{\pi(k - l)}{N}\right) = \frac{1}{6}. \tag{15} \]

Proof. See Appendix D.

Equation (15) shows that the sum of sidelobe energy contributions of pilot tones converges to 1/6 as the number of subcarriers approaches infinity. This series converges quite quickly; the approximation error is less than 10⁻³ and 10⁻⁴ when N > 16 and N > 40, respectively.
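The convergence can be examined with a brief sketch that evaluates the double sum in (15) for finite N and compares it with the finite-N value 1/6 − 1/(6N²) obtained in Appendix D. The function name is an assumption introduced here.

```python
import math

def delta_all_subcarriers(N):
    """Double sum of (15) for finite N, i.e. Delta with P = N."""
    total = sum(1.0 / math.sin(math.pi * (k - l) / N) ** 2
                for k in range(N) for l in range(N) if k != l)
    return total / (2 * N ** 3)

# Approximation error with respect to the limit 1/6.
error_17 = 1 / 6 - delta_all_subcarriers(17)
error_41 = 1 / 6 - delta_all_subcarriers(41)
```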

4.4. Sidelobe Peak of the ACF. In the previous section, it was shown that equally spaced pilot placement meets the minimum ACF sidelobe energy requirement.

In various applications, minimization of the ACF sidelobe peak level may be required. A pilot sequence that minimizes ACF sidelobe energy does not necessarily guarantee a low sidelobe peak value. For example, equally spaced pilots, which can achieve the optimal sidelobe energy value, generate secondary peaks with large amplitudes, that is, grating lobes, in the ACF due to the periodicity of the waveform. When N/P is an integer and P pilots are equally spaced, the ACF contains large peaks located at the integer multiples of N/P and zeros elsewhere. The sidelobe energy is low due to the existence of a large number of zeros; however, the amplitudes of the secondary peaks become large. In this section, we consider the minimization of the sidelobe peak level.
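The grating lobes of an equally spaced placement can be made visible with a small sketch. The values N = 64 and P = 8 and the function name are assumptions chosen for the example.

```python
import cmath

def aperiodic_acf_magnitudes(pilots, N):
    """|C(tau)| for tau = 0..N-1 for the waveform in (7)."""
    s = [sum(cmath.exp(2j * cmath.pi * b * n / N) for b in pilots) / N
         for n in range(N)]
    return [abs(sum(s[n] * s[n + t].conjugate() for n in range(N - t)))
            for t in range(N)]

N, P = 64, 8
spacing = N // P
mags = aperiodic_acf_magnitudes(range(0, N, spacing), N)

main_lobe = mags[0]
grating_lobes = {t: mags[t] for t in range(spacing, N, spacing)}
off_grid_max = max(mags[t] for t in range(1, N) if t % spacing != 0)
```

In this example the first grating lobe at τ = N/P is only slightly below the main lobe, while every lag that is not a multiple of N/P is numerically zero.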

ACF sidelobe peak level expressions for periodic and aperiodic ACFs are obtained from |R(τ)| and |C(τ)|, respectively, and both can be shown to depend on pilot locations. Finding the optimal pilot locations that minimize the ACF sidelobe peak level requires solving the following minimax problem:

\[ S_p = \arg\min_{w_k} \max_{\tau \neq 0} \left|\phi(\tau, w_k)\right|, \tag{16} \]

which is not tractable as the Tchebychev norm is not differentiable. This problem can be reformulated as a minimization of a differentiable Lp norm where p is taken as a sequence of 4, 8, 12, 16, 32, 64. This approach (Polya’s algorithm) avoids many local minima, but unfortunately there is no guarantee that the algorithm converges to a global minimum [29].
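The way the Lp norm approaches the maximum norm as p grows can be sketched as follows. Only the norm evaluation is shown, not the minimization itself; the example pilot set and the function names are assumptions.

```python
import cmath

def sidelobe_magnitudes(pilots, N):
    """|C(tau)| for tau = 1..N-1 for the waveform in (7)."""
    s = [sum(cmath.exp(2j * cmath.pi * b * n / N) for b in pilots) / N
         for n in range(N)]
    return [abs(sum(s[n] * s[n + t].conjugate() for n in range(N - t)))
            for t in range(1, N)]

def lp_norm(values, p):
    """Lp norm, scaled by the largest entry to avoid numerical underflow."""
    largest = max(values)
    if largest == 0.0:
        return 0.0
    return largest * sum((v / largest) ** p for v in values) ** (1.0 / p)

N = 32
pilots = [0, 3, 7, 12, 20, 29]             # arbitrary example set
mags = sidelobe_magnitudes(pilots, N)
peak = max(mags)                           # maximum (Tchebychev) norm
norms = [lp_norm(mags, p) for p in (4, 8, 12, 16, 32, 64)]
```

The entries of `norms` decrease toward `peak` as p increases, which is why a sequence of growing p values can serve as a smooth surrogate for (16).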

The structure of the considered problem not only defies an analytical solution but also prevents finding nontrivial bounds for the ACF sidelobe peak. The problem of obtaining lower bounds for the modulus of certain classes of trigonometrical sums has been considered in the number theory and harmonic analysis literature; see, for example, [30–34]. Most studies in these fields consider total or truncated sums of harmonics that are placed adjacently and they are not directly applicable to the considered synchronization waveform design problem in which the pilots are separated.

The problem of finding optimal pilot locations that minimize the ACF sidelobe peak can be considered as a nonlinear integer programming problem. This is because pilot locations are only allowed to take integer values and the cost function, that is, the ACF sidelobe norm expression, is nonlinear. Nonlinear integer programming problems can be efficiently solved by using suitable search techniques. In the following section, we utilize a genetic search algorithm as a viable solution for the investigation of the ACF sidelobe peak characteristics of the considered synchronization waveforms. Note that, similar to other approaches such as Polya’s algorithm, the genetic algorithm (GA) used in this work does not necessarily converge to a global solution either.

5. Search for Lower ACF Sidelobe Peaks Using Genetic Algorithm

In this section, a brief introduction to genetic algorithms is given and basic terminology used in the genetic search literature is presented. There is an extensive literature on genetic algorithms and the interested reader is referred to [35, 36] for an in-depth discussion of the topic.

5.1. Genetic Algorithms. GAs are stochastic search methods inspired by the principles of biological evolution observed in nature. Evolutionary algorithms operate on a population of potential solutions by applying the principle of survival of the fittest to produce better approximations to a solution. The solution to a problem is called a chromosome. Each chromosome is made up of a collection of alleles, which are the parameters to be optimized. A GA creates an initial population (a collection of chromosomes), evaluates it, then evolves the population through multiple generations in search of a good solution to a problem using the so-called genetic operators.

(i) Cross-over is a genetic operator that combines (mates) two chromosomes (parents) to produce new chromosomes (offspring).

(ii) Mutation is a genetic operator that alters one or more gene values in a chromosome from its initial state.

(iii) Selection is a genetic operator that chooses a chromosome from the current generation’s population for inclusion in the next generation’s mating pool. Several selection schemes can be used, such as the roulette selection rule, in which the chance of a chromosome getting selected is proportional to its fitness.

GAs have been applied to a wide variety of optimization problems including binary sequence search [37–39] and antenna array thinning [40], which bear some similarities with the pilot location selection problem considered in this paper.

5.2. Pilot Location Search with Genetic Algorithms. A concise description of the genetic search algorithm used for searching pilot tone locations is given in what follows. Further information regarding its convergence and its comparison to a random search can be found in [28].

An initial population of M parent sequences is randomly generated. Each parent sequence is a vector of length N, and each element of a vector contains a binary zero or one depending on the existence of a pilot tone at that location. Time domain synchronization waveforms corresponding to the parent sequences are computed by taking the IDFT of each sequence in the population and their merits are calculated. The GA is run to minimize the sidelobe peak of the aperiodic ACF.

Figure 5: An illustration of cross-over and mutation operations for P = 5. Black circles show pilot locations. (Diagram: parents A and B are each split at a split point into A1, A2 and B1, B2; cross-over yields offspring 1 = A1B2 and offspring 2 = B1A2; mutation removes one pilot and adds another.)

The two sequences having the best merits (elite sequences) are kept for the next generation and then all sequences are crossed-over. The cross-over operation naturally fits the pilot location search problem as the merit of a solution depends on the pairwise distances of pilots, which are partly preserved and diversified under the cross-over operation. At this stage, care is taken to ensure that the resulting offspring sequences have P pilot tones only.

In order to avoid local minima, mutation is applied by inverting randomly selected genes. When only one bit is flipped the number of pilot tones is changed; therefore, two random bits are flipped in order to keep the number of pilot tones fixed.

An illustration of the cross-over and mutation operations is presented in Figure 5. Chromosomes from both parents are split at a randomly chosen point and crossed-over to generate new offspring. If an offspring has more than the required number of pilot tones, then randomly chosen pilots are removed. If the offspring has fewer pilots than required, pilots are added at randomly chosen locations.

The merits of all parent and offspring sequences are re-evaluated after each cycle. Each sequence competes for the next solution pool. The two elite sequences from the previous generation replace the worst two solutions to increase the probability of generating better sequences.

The cycle repeats a predetermined number of times or until a solution with a predefined merit is achieved.
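The steps above can be sketched in a few lines of Python. This is a simplified illustration and not the implementation used in the paper: all function names, the small problem size, and the choices of mutating one offspring per pair and keeping the best M of parents plus offspring (a plain form of elitism) are assumptions made here.

```python
import cmath
import random

def sidelobe_peak(bits):
    """Merit: largest aperiodic ACF sidelobe magnitude of the waveform in (7)."""
    N = len(bits)
    s = [sum(cmath.exp(2j * cmath.pi * k * n / N) for k in range(N) if bits[k]) / N
         for n in range(N)]
    return max(abs(sum(s[n] * s[n + t].conjugate() for n in range(N - t)))
               for t in range(1, N))

def repair(bits, P, rng):
    """Remove or add randomly chosen pilots until exactly P remain."""
    bits = list(bits)
    while sum(bits) > P:
        bits[rng.choice([i for i, b in enumerate(bits) if b])] = 0
    while sum(bits) < P:
        bits[rng.choice([i for i, b in enumerate(bits) if not b])] = 1
    return bits

def crossover(a, b, P, rng):
    """Single-point cross-over followed by repair of the pilot count."""
    cut = rng.randrange(1, len(a))
    return repair(a[:cut] + b[cut:], P, rng), repair(b[:cut] + a[cut:], P, rng)

def mutate(bits, rng):
    """Flip one pilot bit and one empty bit, keeping the pilot count fixed."""
    bits = list(bits)
    on = rng.choice([i for i, b in enumerate(bits) if b])
    off = rng.choice([i for i, b in enumerate(bits) if not b])
    bits[on], bits[off] = 0, 1
    return bits

def evolve(N, P, M, generations, seed=0):
    """Run the sketch GA; M is assumed even and 0 < P < N."""
    rng = random.Random(seed)
    pop = [repair([0] * N, P, rng) for _ in range(M)]
    history = []
    for _ in range(generations):
        rng.shuffle(pop)
        children = []
        for a, b in zip(pop[0::2], pop[1::2]):
            c1, c2 = crossover(a, b, P, rng)
            children += [mutate(c1, rng), c2]
        # Parents and offspring compete; the best M survive.
        pop = sorted(pop + children, key=sidelobe_peak)[:M]
        history.append(sidelobe_peak(pop[0]))
    return pop, history

population, history = evolve(N=16, P=4, M=8, generations=5)
```

Because the best parents always compete with the offspring, the best merit recorded in `history` cannot get worse from one generation to the next.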

6. Simulation Examples

In this section, genetic search examples are presented to gain insights into the ACF sidelobe peak behavior. In all simulations, a DFT size of N = 64 is used and the search algorithm runs to minimize the aperiodic ACF sidelobe peak. The initial population size is determined to be 72, as the optimal population size for problems coded as bit strings is approximately the length of the string in bits for moderate problem complexity [41]. Each member of the population is crossed-over to double the initial size of 72, then the best 72 are chosen for the next iteration.

Mutation is applied in each iteration only to the sequences that have the same merit [39]. Instead of running a single long search, multiple shorter runs are employed. In each case considered, 50 simulations starting from a different initial solution pool are run for 1000 iterations.

Three cases are investigated in the simulations. In the first example, no constraint on pilot locations is assumed; therefore, the GA explores each DFT bin as a candidate pilot location. Even though in practice some OFDM subcarriers are typically reserved for various purposes, the unconstrained case serves as a benchmark for the investigations of the ACF sidelobe peak behavior. In the second example, practical bandwidth and DC level limitations are imposed by excluding edge and zero subcarriers from the search space. In the last example, we explore the relation between pilot phases, the ACF sidelobe peak, and the PAPR of synchronization waveforms.

6.1. Unconstrained Pilot Locations. The genetic search algorithm was run to obtain subcarrier locations for pilot set sizes of 1 to 32. Subcarrier locations for pilot set sizes of 33 to 64 can be obtained without running a search by using the complementary set theorem presented in Section 4.3.

Pilot locations extracted by the GA are shown in Figure 6. In this figure, dark circles along the vertical axis mark the locations of pilot subcarriers for a given number of pilot tones, which is shown on the horizontal axis.

Sidelobe energy values of the waveforms constructed from the pilot tone sets given in Figure 6 are shown in Figure 7. Also shown in this figure are the lower and upper sidelobe energy bounds, which can be calculated as described in Section 4.3. As seen from this figure, waveforms with low sidelobe peaks do not always have the minimal sidelobe energy, that is, minimization of the sidelobe peak does not necessarily result in minimum sidelobe energy.

MF and PSPR values for the pilot sets given in Figure 6 are plotted in Figures 8 and 9, respectively. As seen from these figures, both PSPR and MF increase monotonically with the number of pilot tones when there are no constraints on pilot locations.

6.2. Bandwidth and DC Subcarrier Restrictions. In practical systems, transmitted waveforms must be bandlimited to meet spectral masking requirements. Such waveform bandlimiting can be accomplished in an OFDM system by deactivating the subcarriers located at the edges of the spectrum. Similarly, subcarrier zero is deactivated for receivers that cannot handle DC offsets. For the case considered in this example, the search algorithm runs in a constrained set, which excludes subcarriers −32 to −27, 27 to 31, and 0, as proposed in the IEEE 802.11a standard. This leaves 52 candidate pilot locations.
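The constrained search space can be written down explicitly. The sketch below assumes that the 64-point grid is indexed from −32 to 31 and that the lower guard band starts at index −32, which is the assumption needed to arrive at the 52 available locations quoted later in this section.

```python
N = 64
all_bins = range(-N // 2, N // 2)                 # indices -32 .. 31

# Excluded subcarriers: lower edge, upper edge, and DC (assumed layout).
excluded = set(range(-32, -26)) | set(range(27, 32)) | {0}
allowed = [k for k in all_bins if k not in excluded]

number_of_candidates = len(allowed)               # search space size
```

A constrained search would then draw pilot locations only from `allowed`.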


Figure 6: Pilot locations that minimize the ACF sidelobe peak for the unconstrained search. Pilot locations for P > 32 can be obtained directly from this figure using the complementary set theorem. For example, the configuration for P = 40 pilots is obtained by interchanging black and white circles of the configuration for P = 24 (64 − 40 = 24). (Axes: number of pilots, 1 to 32; subcarrier index, −32 to 32.)

In the constrained case, trivial solutions for P values greater than N/2 do not exist because the complementary set theorem is not applicable due to the fact that some elements of the complementary sets will exist in the constrained region, so the GA is run for P = 1, 2, ..., 52.

Pilot locations extracted by the search algorithm are shown in Figure 10 whereas the corresponding MF and PSPR curves are plotted in Figures 8 and 9, respectively. Even though the MF of a waveform monotonically increases with the number of pilot tones, the PSPR value does not increase monotonically when there are constraints on the pilot locations. For the considered example, the maximum PSPR value is achieved when 40 out of 52 available pilots are used, and a further increase in the number of pilots degrades the PSPR.

6.3. Nonzero Pilot Phases. In the derivation of the analytical expressions for the aperiodic sidelobe energy in Section 4, pilot subcarriers are assumed to have zero phase to simplify analytical treatment. However, the sum of subcarriers with equal phases generates a waveform that has high PAPR, which is not desirable as it results in inefficient use of power amplifiers. The PAPR can be reduced if phase rotations are introduced on the subcarriers; however, inappropriate phase values may also increase the sidelobe peak of the ACF.
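The effect of pilot phases on the PAPR can be sketched numerically. The quadratic phase rule used below, φi = −πi²/P for the i-th pilot, is an assumption in the spirit of Schroeder-type phasing and is not necessarily the rule applied in the paper; the pilot set and function name are likewise illustrative.

```python
import cmath

def papr(pilots, phases, N):
    """Peak-to-average power ratio of the N-sample waveform (linear scale)."""
    s = [sum(cmath.exp(1j * (2 * cmath.pi * b * n / N + ph))
             for b, ph in zip(pilots, phases)) / N
         for n in range(N)]
    power = [abs(v) ** 2 for v in s]
    return max(power) / (sum(power) / N)

N = 64
pilots = list(range(-8, 8))                     # 16 adjacent tones (example)
P = len(pilots)

zero_phases = [0.0] * P
quadratic_phases = [-cmath.pi * i * i / P for i in range(P)]   # assumed rule

papr_zero = papr(pilots, zero_phases, N)        # equals P for equal phases
papr_quadratic = papr(pilots, quadratic_phases, N)
```

With equal phases all tones add coherently at n = 0, so the PAPR equals the number of pilots; any phase set that is not linear in the pilot index avoids this coherent peak.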

In order to explore the relations between pilot phases, the aperiodic ACF sidelobe peak, and the PAPR of the synchronization waveform, PAPR and PSPR improving phases are introduced to the pilots. For PAPR reduction, we have employed Schroeder’s phases [42]. These nonoptimal phases are easy to implement and are known to provide significant reduction in sidelobe peaks. For PSPR improvement, we have modified the genetic algorithm as described below to obtain proper phase values.

Figure 7: Aperiodic ACF sidelobe energy values of the waveforms whose sidelobe peaks are minimized. Minimum and maximum energy values are shown. (Minimum energy values for pilot numbers for which N/P is not an integer are obtained by interpolation and plotted with dashed lines.) (Axes: number of pilots, 0 to 64; sidelobe energy, 0 to 0.12. Curves: maximum energy, minimum energy, simulation.)

Figure 8: MF values of the synchronization waveforms. (Axes: number of pilots, 8 to 64; MF in dB, −20 to 20. Curves: unconstrained, constrained.)

To generate an initial solution set, randomly drawn phase values quantized into 1024 levels are used to modulate pilot subcarriers. During the cross-over, parents swap the phases of the pilots without changing their locations. Similarly, mutation is applied to the phase of a gene, which is modified with a randomly selected value from the set of quantized phase values.

Figure 9: PSPR values of the synchronization waveforms. Note that the PSPR value does not increase monotonically with the number of pilot tones in the constrained case. (Axes: number of pilots, 8 to 64; PSPR in dB, 0 to 18. Curves: unconstrained, constrained.)

Figure 10: Pilot locations that minimize the ACF sidelobe peak for constrained search. (Axes: number of pilots, 1 to 52; subcarrier index, −32 to 32.)

PAPR reducing Schroeder’s phases and PSPR improving phase values obtained from the modified GA are given in Table 1 for P = 15. Note that these values are the principal phase values normalized by π.

PAPR reducing Schroeder’s phases and PSPR improving phase values are used to modulate pilot subcarriers. The PSPR and PAPR of the resulting waveforms are shown in Figures 11 and 12, respectively. As seen from these figures, PSPR improving phase values, which are not optimal for PAPR reduction, achieve significant PAPR reduction in addition to sidelobe peak suppression. On the other hand, even though the PAPR gain of Schroeder’s phases is slightly better than the PAPR gain of the PSPR phases, Schroeder’s phases degrade the PSPR significantly.

Figure 11: PSPR comparison of waveforms generated by using zero, PSPR improving, and PAPR reducing phases. (Axes: number of pilots, 2 to 52; PSPR in dB, 0 to 12. Curves: zero phase, PSPR improving phase, PAPR reducing phase.)

Figure 12: PAPR comparison of waveforms generated using zero, PSPR improving, and PAPR reducing phases. (Axes: number of pilots, 5 to 50; PAPR in dB, 0 to 18. Curves: zero phase, PSPR improving phase, PAPR reducing phase.)

It is observed from Figure 12 that, for some P values, such as P = 12, 15, and 19, the PAPR values of the waveforms resulting from the use of Schroeder’s phases are higher than the PAPR values of the waveforms resulting from using the GA. We note that Schroeder’s rule is a simple intuitive rule for phase angle adjustment based on the assumption that the number of harmonic components is large. Therefore, it is not implausible to observe the behavior shown in Figure 12, especially when the number of pilot subcarriers is small compared to the total number of subcarriers. Its simplicity, paired with the fact that Schroeder’s rule may produce substantially lower peak values even when the assumption does not hold, is the main motivation to use Schroeder’s phases in the PAPR comparison.

Table 1: Schroeder’s phases (φ1) and PSPR improving phase values obtained from the modified GA (φ2) for P = 15.

k    −21   −20   −18   −15   −13   −10   −5    −3    −1    6     8     12    19    20    26
φ1   1.34  0.88  1.56  0.06  1.44  1.03  0.44  1.06  0.22  1.72  0.56  1.59  0.69  0.91  1.00
φ2   1.93  0.31  1.58  0.21  0.25  1.99  1.62  1.85  1.56  1.38  1.45  1.41  0.12  0.12  1.79

7. Conclusions

In this paper, synchronization waveforms composed of a sum of orthogonal complex exponentials are considered for timing synchronization of OFDM systems. Sidelobe energy expressions for periodic and aperiodic ACFs are derived. It is shown that the periodic ACF sidelobe energy is independent of the locations and phases of the subcarriers whereas the aperiodic ACF sidelobe energy depends on the pilot locations; therefore, optimal waveform design requires judicious selection of the pilot locations. Pilot configurations that would result in maximum and minimum sidelobe energy levels for a given number of pilot tones are presented. Some properties of the ACF are introduced for use in the waveform design process.

Finding pilot locations that minimize the ACF sidelobe peak is not trivial; therefore, we resort to a search algorithm. Simulation results show that increasing the number of pilot tones does not necessarily improve the sidelobe peak behavior of the ACF when the waveforms are spectrally constrained.

The aperiodic ACF sidelobe peak can be further reduced by proper selection of the pilot phases. When subcarrier phases are selected to further minimize the ACF sidelobe peak, the resulting waveform has a significantly reduced PAPR due to unequal pilot phases.

We have obtained the ACF sidelobe energy expressions analytically and provided pilot placement requirements for minimum and maximum aperiodic ACF sidelobe energy levels. We also considered the sidelobe peak level and employed a search algorithm for its minimization due to the intractability of the problem. Thus, obtaining useful bounds for the aperiodic ACF sidelobe peak level remains an open problem.

In this paper, we assumed that pilot tones are modulated by zero phase symbols in the derivation of the aperiodic ACF sidelobe energy. An analytical investigation of the impact of pilot phases on the aperiodic ACF sidelobe energy is a subject for future work.

Appendices

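A. Proof of the Periodic ACF Sidelobe Energy Theorem (Derivation of E)

By substituting (7) into (1) we obtain

\[ R(\tau) = \frac{1}{N^2}\sum_{n=0}^{N-1}\left(\sum_{k=1}^{P} e^{j w_k n}\right)\left(\sum_{l=1}^{P} e^{-j w_l (n+\tau)}\right) = \frac{1}{N^2}\sum_{n=0}^{N-1}\sum_{k=1}^{P}\sum_{l=1}^{P} e^{-j(w_l - w_k)n}\, e^{-j w_l \tau}. \tag{A.1} \]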

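The sum terms can be split into two by grouping terms for k = l and k ≠ l as shown in what follows:

\[ R(\tau) = \frac{1}{N^2}\left[\sum_{n=0}^{N-1}\sum_{l=1}^{P} e^{-j w_l \tau} + \sum_{k=1}^{P}\sum_{\substack{l=1 \\ l \neq k}}^{P} e^{-j w_l \tau} \underbrace{\sum_{n=0}^{N-1} e^{-j(w_l - w_k)n}}_{\text{zero}}\right]. \tag{A.2} \]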

The underbraced term is equal to zero as the sum is carried over a full period of the complex exponential. Therefore, we have the following equation:

\[ R(\tau) = \frac{1}{N^2}\, N \sum_{l=1}^{P} e^{-j w_l \tau} = \frac{1}{N}\sum_{l=1}^{P} e^{-j w_l \tau}. \tag{A.3} \]

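If we substitute R(τ) in |R(τ)|² = R(τ)R*(τ), we obtain

\[ |R(\tau)|^2 = \frac{1}{N^2}\sum_{k=1}^{P}\sum_{l=1}^{P} e^{-j(w_k - w_l)\tau} = \frac{1}{N^2}\left[P + \sum_{k=1}^{P}\sum_{\substack{l=1 \\ l \neq k}}^{P} e^{-j(w_k - w_l)\tau}\right]. \tag{A.4} \]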

Using the Euler formula, the complex exponentials on the right can be written as sums of cosine and sine terms, yielding

\[ |R(\tau)|^2 = \frac{1}{N^2}\left[P + \sum_{k=1}^{P}\sum_{\substack{l=1 \\ l \neq k}}^{P} \cos\left((w_k - w_l)\tau\right)\right], \tag{A.5} \]

where we have used sin(−x) = −sin(x).

The total sidelobe energy of the periodic ACF, E, is given by the sum of energies of the off-peak values:

\[ E = \sum_{\tau=1}^{N-1} |R(\tau)|^2. \tag{A.6} \]


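By substituting (A.5) into (A.6), we obtain

\[ \begin{aligned} E &= \frac{1}{N^2}\sum_{\tau=1}^{N-1}\left[P + \sum_{k=1}^{P}\sum_{\substack{l=1 \\ l \neq k}}^{P} \cos\left((w_k - w_l)\tau\right)\right] \\ &= \frac{1}{N^2}\left[P(N-1) + \sum_{k=1}^{P}\sum_{\substack{l=1 \\ l \neq k}}^{P}\sum_{\tau=1}^{N-1} \cos\left((w_k - w_l)\tau\right)\right] \\ &= \frac{1}{N^2}\left[P(N-1) - P(P-1)\right] = \frac{NP - P^2}{N^2}, \end{aligned} \tag{A.7} \]

where we have used the fact that \( \sum_{\tau=1}^{N-1} \cos\left((w_k - w_l)\tau\right) = -1 \).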
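The cosine sum used in the last step follows from the geometric series. The short derivation below is added as a sketch; the integer m = βk − βl is notation introduced here, with m not a multiple of N because the pilot locations are distinct.

\[ \sum_{\tau=0}^{N-1} e^{j 2\pi m \tau / N} = \frac{1 - e^{j 2\pi m}}{1 - e^{j 2\pi m / N}} = 0 \quad \Longrightarrow \quad 1 + \sum_{\tau=1}^{N-1} \cos\left(\frac{2\pi m \tau}{N}\right) = 0. \]

Each of the P(P − 1) ordered pilot pairs with k ≠ l therefore contributes −1, which gives the term −P(P − 1) in (A.7).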

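B. Proof of the Aperiodic ACF Sidelobe Energy Theorem (Derivation of E)

By substituting (7) into (2) we obtain

\[ \begin{aligned} C(\tau) &= \frac{1}{N^2}\sum_{n=0}^{N-\tau-1}\sum_{k=1}^{P}\sum_{l=1}^{P} e^{-j(w_l - w_k)n}\, e^{-j w_l \tau} \\ &= \frac{1}{N^2}\left[(N - \tau)\sum_{l=1}^{P} e^{-j w_l \tau} + \sum_{k=1}^{P}\sum_{\substack{l=1 \\ l \neq k}}^{P} e^{-j w_l \tau}\sum_{n=0}^{N-\tau-1} e^{-j(w_l - w_k)n}\right]. \end{aligned} \tag{B.1} \]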

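Using the geometric series expansion

\[ \sum_{n=0}^{N-\tau-1} e^{-j(w_l - w_k)n} = \frac{1 - e^{j(w_l - w_k)\tau}}{1 - e^{-j(w_l - w_k)}}, \tag{B.2} \]

we obtain

\[ C(\tau) = \left(1 - \frac{\tau}{N}\right)R(\tau) - \frac{1}{N^2}\sum_{k=1}^{P}\sum_{\substack{l=1 \\ l \neq k}}^{P} e^{-j w_l \tau}\, e^{j\frac{w_l - w_k}{2}(\tau+1)}\, \frac{\sin\left(\frac{w_l - w_k}{2}\tau\right)}{\sin\left(\frac{w_l - w_k}{2}\right)}. \tag{B.3} \]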

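Applying standard trigonometric sum formulae we obtain

\[ C(\tau) = \left(1 - \frac{\tau}{N}\right)R(\tau) - \frac{1}{N^2}X(\tau), \tag{B.4} \]

where

\[ X(\tau) = j\sum_{k=1}^{P}\sum_{\substack{l=1 \\ l \neq k}}^{P} e^{-j w_l \tau}\cot\left(\frac{w_l - w_k}{2}\right). \tag{B.5} \]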

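The sidelobe energy of the aperiodic ACF is given by

\[ E = \sum_{\tau=1}^{N-1}\left|C(\tau)\right|^2 = \sum_{\tau=1}^{N-1} C(\tau)C^{*}(\tau) = \sum_{\tau=1}^{N-1}\left|\left(1 - \frac{\tau}{N}\right)R(\tau)\right|^2 + \varphi_1 + \varphi_2, \tag{B.6} \]

where

\[ \varphi_1 = -\frac{2}{N^2}\sum_{\tau=1}^{N-1}\left(1 - \frac{\tau}{N}\right)\Re\left\{R^{*}(\tau)X(\tau)\right\}, \qquad \varphi_2 = \frac{1}{N^4}\sum_{\tau=1}^{N-1}\left|X(\tau)\right|^2. \tag{B.7} \]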

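After tedious but rather straightforward calculations, one can show that φ1 + φ2 = 0 by substituting R(τ) and X(τ) in (B.7) and by using the following identities [43, pages 35–37]:

\[ \sum_{\tau=1}^{N-1}\sin(\tau x) = \frac{\sin\left(\frac{N}{2}x\right)\sin\left(\frac{N-1}{2}x\right)}{\sin\left(\frac{x}{2}\right)}, \tag{B.8} \]

\[ \sum_{\tau=1}^{N-1}\tau\sin(\tau x) = \frac{\sin(Nx)}{4\sin^2\left(\frac{x}{2}\right)} - \frac{N\cos\left(\frac{2N-1}{2}x\right)}{2\sin\left(\frac{x}{2}\right)}. \tag{B.9} \]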

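Therefore, we get the following equation:

\[ E = \sum_{\tau=1}^{N-1}\left|\left(1 - \frac{\tau}{N}\right)R(\tau)\right|^2 = \sum_{\tau=1}^{N-1}\left|R(\tau)\right|^2 - \frac{2}{N}\sum_{\tau=1}^{N-1}\tau\left|R(\tau)\right|^2 + \frac{1}{N^2}\sum_{\tau=1}^{N-1}\tau^2\left|R(\tau)\right|^2. \tag{B.10} \]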

The second term on the right-hand side can be shown to cancel the first term by using the following trigonometric identity [43, page 37]:

\[ \sum_{\tau=1}^{N-1}\tau\cos(\tau x) = \frac{N\sin\left(\frac{2N-1}{2}x\right)}{2\sin\left(\frac{x}{2}\right)} - \frac{1 - \cos(Nx)}{4\sin^2\left(\frac{x}{2}\right)}. \tag{B.11} \]

The sidelobe energy of the aperiodic ACF then reduces to

\[ E = \frac{1}{N^2}\sum_{\tau=1}^{N-1}\tau^2\left|R(\tau)\right|^2. \tag{B.12} \]
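The reduction to (B.12) can be checked numerically by comparing it with a direct evaluation of the aperiodic sidelobe energy. The sketch below uses R(τ) from (A.3); the example pilot set and function names are assumptions made for illustration.

```python
import cmath

def energy_from_b12(pilots, N):
    """Right-hand side of (B.12) with R(tau) taken from (A.3)."""
    total = 0.0
    for tau in range(1, N):
        r_tau = sum(cmath.exp(-2j * cmath.pi * b * tau / N) for b in pilots) / N
        total += tau ** 2 * abs(r_tau) ** 2
    return total / N ** 2

def energy_direct(pilots, N):
    """Sum of |C(tau)|^2 over tau = 1..N-1, computed from the waveform (7)."""
    s = [sum(cmath.exp(2j * cmath.pi * b * n / N) for b in pilots) / N
         for n in range(N)]
    return sum(abs(sum(s[n] * s[n + t].conjugate() for n in range(N - t))) ** 2
               for t in range(1, N))

N = 16
pilots = [0, 2, 3, 9, 13]                  # arbitrary example set
e_b12 = energy_from_b12(pilots, N)
e_direct = energy_direct(pilots, N)
```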

This expression is calculated by substituting |R(τ)|² given in (A.5) into (B.12). This calculation requires the sum \( \sum_{\tau=1}^{N-1}\tau^2\cos(\tau x) \), which can be obtained by differentiating (B.9) with respect to x.

After straightforward calculations we obtain the aperiodic ACF sidelobe energy expression as given in the following:

\[ E = \frac{P}{3N} - \frac{P^2}{2N^2} + \frac{P}{6N^3} + \frac{1}{2N^3}\sum_{k=1}^{P}\sum_{\substack{l=1 \\ l \neq k}}^{P}\csc^2\left(\frac{w_k - w_l}{2}\right). \tag{B.13} \]

C. Proof of the Complementary Set Theorem

If s(n) is the synchronization waveform composed using Sp, then the complementary synchronization waveform s′(n) is given by

\[ s'(n) = \frac{1}{N}\sum_{l=1}^{N-P} e^{j w_l n}. \tag{C.1} \]


Since $s(n) + s'(n) = \delta(n)$, the aperiodic ACF of the complementary synchronization waveform can be written as

$$C'(\tau) = \sum_{n=0}^{N-\tau-1} s'(n) \left[s'(n+\tau)\right]^{*} = \sum_{n=0}^{N-\tau-1} \left[\delta(n) - s(n)\right] \left[\delta(n+\tau) - s(n+\tau)\right]^{*}. \tag{C.2}$$

This reduces to

$$C'(\tau) = C(\tau) - s^{*}(\tau) \tag{C.3}$$

for $\tau \neq 0$. From (A.3) and (7), $R(\tau) = s^{*}(\tau)$. If we substitute $R(\tau)$ in (C.3) and use the well-known relation between the periodic and aperiodic ACF,

$$R(\tau) = C(\tau) + C^{*}(N - \tau), \tag{C.4}$$

we obtain

$$C'(\tau) = -C^{*}(N - \tau). \tag{C.5}$$
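The result (C.5) is easy to confirm numerically for a small example. The following sketch assumes $w_k = 2\pi k/N$ and the $1/N$ normalisation of (C.1); the pilot set and helper names are hypothetical and serve only as an illustration.

```python
import cmath
import math


def waveform(indices, n_sub):
    """(1/N) * sum over the given subcarrier indices of exp(j*2*pi*k*n/N)."""
    return [sum(cmath.exp(2j * math.pi * k * n / n_sub) for k in indices) / n_sub
            for n in range(n_sub)]


def aperiodic_acf(s, tau):
    """C(tau) = sum_{n=0}^{N-tau-1} s(n) * conj(s(n + tau)), as in (C.2)."""
    return sum(s[n] * s[n + tau].conjugate() for n in range(len(s) - tau))


N_SUB = 12
pilot_set = [0, 2, 3, 7]                                   # hypothetical choice
complement = [k for k in range(N_SUB) if k not in pilot_set]

s = waveform(pilot_set, N_SUB)
s_comp = waveform(complement, N_SUB)

# Largest deviation from C'(tau) = -conj(C(N - tau)) over all nonzero lags.
max_deviation = max(
    abs(aperiodic_acf(s_comp, tau) + aperiodic_acf(s, N_SUB - tau).conjugate())
    for tau in range(1, N_SUB)
)
# The two waveforms add up to a unit impulse at n = 0.
impulse_error = max(abs(a + b - (1.0 if n == 0 else 0.0))
                    for n, (a, b) in enumerate(zip(s, s_comp)))
```

Since the sidelobe magnitudes of the two waveforms are mirror images of each other, both sets have the same sidelobe energy.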

D. Asymptotical Value of Δ

The ACF of the synchronization waveform becomes an impulse when all subcarriers, with unit amplitude and zero phase, are used as pilots; therefore, the sidelobe energy becomes zero for $P = N$. Substituting this observation into (10), we obtain

$$E = 0 \;\Longrightarrow\; \Delta = \frac{1}{2} - \frac{1}{3} - \frac{1}{6N^{2}}. \tag{D.1}$$

The asymptotic value of $\Delta$ is then obtained in the limit as the number of subcarriers approaches infinity:

$$\lim_{N \to \infty} \Delta = \lim_{N \to \infty} \left(\frac{1}{6} - \frac{1}{6N^{2}}\right). \tag{D.2}$$

Therefore, we get the following equation:

$$\lim_{N \to \infty} \frac{1}{2N^{3}} \sum_{k=1}^{N} \sum_{\substack{l=1 \\ l \neq k}}^{N} \csc^{2}\left(\frac{\pi (k - l)}{N}\right) = \frac{1}{6}. \tag{D.3}$$
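As a quick numerical illustration of (D.1) to (D.3), the double sum can be evaluated for a few values of $N$ and compared with $\frac{1}{6} - \frac{1}{6N^{2}}$. The function name below is hypothetical; the sketch only restates the left-hand side of (D.3) for finite $N$.

```python
import math


def csc2_term(n_sub):
    """(1 / (2 N^3)) * sum_{k != l} csc^2(pi * (k - l) / N) for finite N."""
    total = sum(1.0 / math.sin(math.pi * (k - l) / n_sub) ** 2
                for k in range(1, n_sub + 1)
                for l in range(1, n_sub + 1) if l != k)
    return total / (2 * n_sub ** 3)


values = {n: csc2_term(n) for n in (8, 64, 256)}
gaps = {n: abs(v - 1.0 / 6.0) for n, v in values.items()}
```

The gap to 1/6 shrinks as $1/(6N^{2})$, in line with (D.2).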

Acknowledgments

This work has been supported by Defence R&D Canada. Selcuk Tascıoglu was a visiting fellow at the Communications Research Centre Canada with a Grant from the International Research Fellowship Programme of the Scientific and Technological Research Council of Turkey (TUBITAK).

References

[1] Y. Mostofi and D. C. Cox, “Mathematical analysis of the impact of timing synchronization errors on the performance of an OFDM system,” IEEE Transactions on Communications, vol. 54, no. 2, pp. 226–230, 2006.

[2] J. J. van de Beek, M. Sandell, and P. O. Borjesson, “ML estimation of time and frequency offset in OFDM systems,” IEEE Transactions on Signal Processing, vol. 45, no. 7, pp. 1800–1805, 1997.

[3] H. Bolcskei, “Blind estimation of symbol timing and carrier frequency offset in wireless OFDM systems,” IEEE Transactions on Communications, vol. 49, no. 6, pp. 988–999, 2001.

[4] H. H. Nguyen, J. E. Salt, and Z. Zhou, “Coarse timing recovery in burst mode OFDM,” in Proceedings of the 57th IEEE Vehicular Technology Conference (VTC ’03), vol. 1, pp. 646–650, Jeju, South Korea, April 2003.

[5] O. Ureten and N. Serinken, “Improved coarse timing for burst mode OFDM,” in Proceedings of the IEEE Global Telecommunications Conference (GLOBECOM ’07), pp. 2841–2846, Washington, DC, USA, November 2007.

[6] P. Z. Fan and M. Darnell, Sequence Design for Communications Applications, John Wiley & Sons, New York, NY, USA, 1996.

[7] D. C. Chu, “Polyphase codes with good periodic correlation properties,” IEEE Transactions on Information Theory, vol. 18, no. 4, pp. 531–532, 1972.

[8] S. W. Golomb and R. A. Scholtz, “Generalized Barker sequences,” IEEE Transactions on Information Theory, vol. 11, no. 4, pp. 533–537, 1965.

[9] P. Borwein and R. Ferguson, “Polyphase sequences with low autocorrelation,” IEEE Transactions on Information Theory, vol. 51, no. 4, pp. 1564–1567, 2005.

[10] G. Bumiller and L. Lampe, “Fast burst synchronization for power line communication systems,” EURASIP Journal on Advances in Signal Processing, vol. 2007, Article ID 12145, 15 pages, 2007.

[11] M. Hirano and G. J. Veintimilla, “Non-uniformly spaced tones for synchronization waveform,” US Patent no. 5896425, 1999.

[12] W. D. Warner and C. Leung, “OFDM/FM frame synchronization for mobile radio data communication,” IEEE Transactions on Vehicular Technology, vol. 42, no. 3, pp. 302–313, 1993.

[13] J. Acharya, H. Viswanathan, and S. Venkatesan, “Timing acquisition for non contiguous OFDM based dynamic spectrum access,” in Proceedings of the 3rd IEEE International Symposium on New Frontiers in Dynamic Spectrum Access Networks (DySPAN ’08), pp. 1–10, Chicago, Ill, USA, October 2008.

[14] K. E. Nolan, T. W. Rondeau, and L. E. Doyle, “Tests and trials of software-defined and cognitive radio in Ireland,” in Proceedings of Software Defined Radio Technical Conference and Product Exposition (SDR ’07), Denver, Colo, USA, November 2007.

[15] S. Coleri, M. Ergen, A. Puri, and A. Bahai, “Channel estimation techniques based on pilot arrangement in OFDM systems,” IEEE Transactions on Broadcasting, vol. 48, no. 3, pp. 223–229, 2002.

[16] S. Hosokawa, S. Ohno, K. Teo, and T. Hinamoto, “Pilot tone design for peak-to-average power ratio reduction in OFDM,” in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS ’05), pp. 6014–6017, Kobe, Japan, May 2005.

[17] A. F. Kurpiers and V. Fischer, “Open-source implementation of a Digital Radio Mondiale (DRM) receiver,” in Proceedings of the 9th International Conference on HF Radio Systems and Techniques, pp. 86–90, Bath, UK, June 2003.

[18] S. Brandes, I. Cosovic, and M. Schnell, “Sidelobe suppression in OFDM systems by insertion of cancellation carriers,” in Proceedings of the 62nd IEEE Vehicular Technology Conference (VTC ’05), vol. 1, pp. 152–156, Dallas, Tex, USA, September 2005.


[19] M. Dong and L. Tong, “Optimal design and placement of pilot symbols for channel estimation,” IEEE Transactions on Signal Processing, vol. 50, no. 12, pp. 3055–3069, 2002.

[20] S. Song and A. C. Singer, “Pilot-aided OFDM channel estimation in the presence of the guard band,” IEEE Transactions on Communications, vol. 55, no. 8, pp. 1459–1465, 2007.

[21] IEEE Std. 802.11a-1999, “Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications: High-speed Physical Layer in the 5 GHz Band,” 1999.

[22] European Telecommunications Standards Institute (ETSI), “ES 201 980, Digital Radio Mondiale (DRM),” 2004.

[23] H. Minn, M. Zeng, and V. K. Bhargava, “On timing offset estimation for OFDM systems,” IEEE Communications Letters, vol. 4, no. 7, pp. 242–244, 2000.

[24] B. Park, H. Cheon, C. Kang, and D. Hong, “A novel timing estimation method for OFDM systems,” IEEE Communications Letters, vol. 7, no. 5, pp. 239–241, 2003.

[25] K. Shi and E. Serpedin, “Coarse frame and carrier synchronization of OFDM systems: a new metric and comparison,” IEEE Transactions on Wireless Communications, vol. 3, no. 4, pp. 1271–1284, 2004.

[26] A. Fort, J.-W. Weijers, V. Derudder, W. Eberle, and A. Bourdoux, “A performance and complexity comparison of autocorrelation and cross-correlation for OFDM burst synchronization,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’00), pp. 341–344, Istanbul, Turkey, June 2000.

[27] H. Cox and H. Lai, “Geometric comb waveforms for reverberation suppression,” in Proceedings of the 28th Asilomar Conference on Signals, Systems and Computers (ACSSC ’94), vol. 2, pp. 1185–1189, Pacific Grove, Calif, USA, October-November 1994.

[28] O. Ureten, S. Tascıoglu, N. Serinken, and M. Yılmaz, “Search for OFDM synchronization waveforms with good aperiodic autocorrelations,” in Proceedings of IEEE Canadian Conference on Electrical and Computer Engineering (CCECE ’04), vol. 1, pp. 13–18, Niagara Falls, Canada, May 2004.

[29] P. Guillaume, J. Schoukens, R. Pintelon, and I. Kollar, “Crest-factor minimization using nonlinear Chebyshev approximation methods,” IEEE Transactions on Instrumentation and Measurement, vol. 40, no. 6, pp. 982–989, 1991.

[30] L. Carlitz and S. Uchiyama, “Bounds for exponential sums,” Duke Mathematical Journal, vol. 24, no. 1, pp. 37–41, 1957.

[31] D. R. Anderson and J. J. Stiffler, “Lower bounds for the maximum moduli of certain classes of trigonometric sums,” Duke Mathematical Journal, vol. 30, no. 1, pp. 171–176, 1963.

[32] S. V. Konyagin and M. A. Skopina, “Comparison of the L1-norms of total and truncated exponential sums,” Mathematical Notes, vol. 69, no. 5-6, pp. 644–651, 2001.

[33] P. Oswald, “Extremal properties of trigonometric polynomials with applications to signal design,” in Proceedings of International Conference on Trends in Approximation Theory, pp. 343–352, Nashville, Tenn, USA, October 2001.

[34] E. Belinsky, “Some extremal problems for trigonometric polynomials,” Journal of Mathematical Analysis and Applications, vol. 286, no. 2, pp. 675–681, 2003.

[35] J. H. Holland, Adaptation in Natural and Artificial Systems, MIT Press, Cambridge, Mass, USA, 1992.

[36] D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley Longman, Boston, Mass, USA, 1989.

[37] B. Militzer, M. Zamparelli, and D. Beule, “Evolutionary search for low autocorrelated binary sequences,” IEEE Transactions on Evolutionary Computation, vol. 2, no. 1, pp. 34–39, 1998.

[38] X. Deng and P. Fan, “New binary sequences with good aperiodic autocorrelations obtained by evolutionary algorithm,” IEEE Communications Letters, vol. 3, no. 10, pp. 288–290, 1999.

[39] S. E. Kocabas and A. Atalar, “Binary sequences with low aperiodic autocorrelation for synchronization purposes,” IEEE Communications Letters, vol. 7, no. 1, pp. 36–38, 2003.

[40] R. L. Haupt, “Thinned arrays using genetic algorithms,” IEEE Transactions on Antennas and Propagation, vol. 42, no. 7, pp. 993–999, 1994.

[41] J. T. Alander, “On optimal population size of genetic algorithms,” in Proceedings of Computer Systems and Software Engineering (CompEuro ’92), pp. 65–70, The Hague, The Netherlands, May 1992.

[42] M. Schroeder, “Synthesis of low-peak-factor signals and binary sequences with low autocorrelation,” IEEE Transactions on Information Theory, vol. 16, no. 1, pp. 85–89, 1970.

[43] I. S. Gradshtein and I. M. Ryzhik, Table of Integrals, Series, and Products, Academic Press, San Diego, Calif, USA, 2000.


Research Article

Time and Frequency Synchronisation in 4G OFDM Systems

Adrian Langowski

Chair of Wireless Communications, Poznan University of Technology, Polanka 3A, 61-131 Poznan, Poland

Correspondence should be addressed to Adrian Langowski, [email protected]

Received 30 June 2008; Revised 28 October 2008; Accepted 20 December 2008


This paper presents a complete synchronisation scheme of a baseband OFDM receiver for the currently designed 4G mobile communication system. Since the OFDM transmission is vulnerable to time and frequency offsets, accurate estimation of these parameters is one of the most important tasks of the OFDM receiver. In this paper, the design of a single OFDM synchronisation pilot symbol is introduced. The pilot is used for coarse timing offset and fractional frequency offset estimation. However, it can be applied for fine timing synchronisation and integer frequency offset estimation algorithms as well. A new timing metric that improves the performance of the coarse timing synchronisation is presented. Time domain synchronisation is completed after receiving this single OFDM pilot symbol. During the tracking phase, carrier frequency and sampling frequency offsets are tracked and corrected by means of the nondata-aided algorithm developed by the author. The proposed concept was tested by means of computer simulations, where the OFDM signal was transmitted over a multipath Rayleigh fading channel characterised by the WINNER channel models with Doppler shift and additive white Gaussian noise.

Copyright © 2009 Adrian Langowski. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

Due to its many advantages, orthogonal frequency division multiplexing (OFDM) was adopted for the European standards of terrestrial stationary and handheld video broadcasting systems (DVB-T, DVB-H) as well as wireless network standards 802.11 and 802.16. It was also chosen as one of the transmission techniques for 3GPP Long-Term Evolution system and WINNER Radio Interface Concept [1], which has recently been proposed for 4G systems. However, the OFDM transmission is sensitive to receiver synchronisation imperfections. The symbol timing synchronisation error may cause interblock interference (IBI) and the frequency synchronisation error is one of the sources of intercarrier interference (ICI). Thus, synchronisation is a crucial issue in an OFDM receiver design. It depends on the form of the OFDM transmission (whether it is continuous or has a bursty nature). In case of the WINNER MAC superframe structure shown in Figure 1 [2], synchronisation algorithms specific for packet or bursty transmission have to be applied.

Synchronisation is not fully obtained after the acquisition mode since the sampling frequency offset still remains uncompensated. The inaccuracy of the sampling clock frequency causes slow drift of the FFT window giving rise to ICI and subcarrier phase rotation. Both signal distortions, but not their sources, may be removed by a frequency-domain channel equaliser. However, the time shift of the FFT window builds up, and eventually the FFT window shifts beyond the orthogonality window of the OFDM symbol giving rise to IBI. Therefore, the sampling clock synchronisation, performed by a resampling algorithm, should also be implemented in the OFDM receiver.

A number of time and frequency synchronisation algorithms in the OFDM-based systems have already been proposed. The less complex but less accurate algorithms are based on the correlation of identical parts of the OFDM symbol. The correlation between the cyclic prefix and the corresponding end of the OFDM symbol, or between two identical halves of the synchronisation symbol, is applied in [3, 4], respectively. The use of pseudonoise sequence correlation properties was proposed in [5, 6]. Both solutions offer very accurate time and frequency offset estimates; however, the main disadvantage of both of them is their complexity.

The sampling frequency offset estimation has been investigated in many papers too. Since sampling period offset causes subcarrier phase rotation, some algorithms, like those introduced in [7, 8], estimate the phase change between the subcarriers of the OFDM symbol or between the same subcarriers of succeeding OFDM symbols (see the method described in [9]). A noncoherent solution, that is, without carrier phase estimates, was proposed in [10]. The drawback of that algorithm is its sensitivity to symbol timing synchronisation errors. Like the schemes shown in [7, 8], it requires pilot tones transmitted in every OFDM symbol, as it is done in the DVB-T system. Thus, such algorithms are not suitable for systems with pilot tones separated in time by data symbols, as it can be found in the WINNER system. The algorithm described in [9] is driven by data hard decisions made by the receiver, and it estimates and tracks the residual carrier frequency offset as well. That solution will be compared with the proposed algorithm in Section 7.2.

In this paper, fast and accurate timing and frequency synchronisation algorithms are proposed. The synchronisation is a two-stage process. First, coarse timing and fractional frequency offset synchronisation are performed. After detecting the transmitted signal, the carrier frequency and sampling frequency offsets are tracked during the tracking mode by a low-complexity algorithm, which is immune to symbol timing offset estimation errors. The algorithm is designed for OFDM systems with a small pilot overhead, and it applies channel estimates already computed by the channel estimation block.

The paper is organised as follows. In Section 2, the system model is introduced. Section 3 contains the description of the acquisition mode algorithms. In Section 4, timing synchronisation errors are briefly characterised. Sections 5 and 6 contain the description of the decision-directed algorithm and the newly proposed algorithm in which channel transfer function estimates are used. Computer simulation results are presented and discussed in Section 7, and finally, the paper is concluded in Section 8.

2. System Model

The system of interest uses OFDM symbols with $K_U < N$ subcarriers for the data transmission. The remaining $N - K_U$ subcarriers serve as a guard band. The time domain samples are computed using the well-known IFFT formula

$$x_k(n) = \frac{1}{\sqrt{N}} \sum_{m=0}^{K_U - 1} X_k(m)\, e^{j \omega_N m n}, \tag{1}$$

where $k$ is the index of the OFDM symbol, $X_k(m)$ is the frequency domain $m$th modulated symbol, $\omega_N = 2\pi/N$, and $N$ is the total number of subcarriers.
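A direct, unoptimised evaluation of (1) can make the role of the guard band concrete. The sketch below is an illustration only: the sizes are far smaller than in the WINNER setting, the QPSK-like test symbols are arbitrary, and the function names are hypothetical. A practical receiver would use an FFT rather than the explicit sums shown here.

```python
import cmath
import math


def ofdm_time_samples(freq_symbols, n_sub):
    """x(n) = (1/sqrt(N)) * sum_{m=0}^{K_U-1} X(m) * exp(j*2*pi*m*n/N), as in (1)."""
    scale = 1.0 / math.sqrt(n_sub)
    return [scale * sum(x_m * cmath.exp(2j * math.pi * m * n / n_sub)
                        for m, x_m in enumerate(freq_symbols))
            for n in range(n_sub)]


def dft_bin(samples, m):
    """Matching forward transform for one subcarrier, used only for checking."""
    n_sub = len(samples)
    return sum(x * cmath.exp(-2j * math.pi * m * n / n_sub)
               for n, x in enumerate(samples)) / math.sqrt(n_sub)


N_SUB, K_U = 16, 12                      # illustrative sizes, K_U < N
data = [complex((-1) ** m, (-1) ** (m // 2)) / math.sqrt(2) for m in range(K_U)]
samples = ofdm_time_samples(data, N_SUB)
recovered = [dft_bin(samples, m) for m in range(N_SUB)]
```

The first $K_U$ recovered bins reproduce the data symbols, and the remaining $N - K_U$ guard-band bins stay empty.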

Let us assume the OFDM signal model developed within the WINNER project [1]. The OFDM symbol consists of $N = 2048$ subcarriers, out of which $K_U = 1664$ are used for transmission of user data and pilots. The user data are transmitted in packets called chunks. Every chunk consists of 8 subcarriers and lasts for 12 OFDM symbols. Within each chunk, there are 4 pilot tones spaced by $D_t = 10$ OFDM symbols and by $D_f = 4$ subcarriers [11]. Their pattern is shown in Figure 2. Generated OFDM symbols are grouped into packets and transmitted over a Rayleigh fading multipath channel for which the impulse response is

$$h(\tau, t) = \sum_{l=0}^{L-1} h_l(t)\, \delta(\tau - \tau_l), \tag{2}$$

where $h_l(t)$ is the complex channel coefficient of the $l$th path, $\tau_l$ is the delay of the $l$th path, and $L$ is the number of channel paths.

Figure 1: WINNER MAC superframe structure. (Frequency versus time; slots: Uplink synch, RAC (UL), Downlink synch, BCH and superframe control (downlink), followed by Frame, ..., Frame.)

Figure 2: Pilot tones pattern within the chunk. (8 subcarriers by 12 OFDM symbols, with pilot spacings $D_t$ and $D_f$.)

3. Data-Aided Correlation Scheme

3.1. Coarse Timing Synchronisation. Downlink timing synchronisation should be performed during the Downlink Synch slot of the WINNER MAC superframe [2]. The first OFDM symbol of the Downlink Synch is called the T-Pilot and is dedicated to the synchronisation process. Two synchronisation symbol designs have been considered as possible T-Pilots. Their time-domain structures are illustrated in Figure 3. The first one is used together with the original Schmidl and Cox algorithm [4], and the latter one is used with a modified version of the Schmidl and Cox algorithm proposed by the author. In order to generate OFDM symbols consisting of 2 and 8 identical elements, a BPSK representation of the Gold sequence is transmitted on every second and eighth subcarrier of the OFDM symbol, respectively. If the Schmidl and Cox algorithm is applied together with the second candidate synchronisation symbol, the time metric plateau occurs after the first subsymbol. The problem is solved by multiplying the already generated time-domain OFDM symbol by the sign coefficients $c(i)$ ($i = 0, \ldots, 7$) that are defined as

$$\mathbf{c} = [c(0), c(1), c(2), c(3), c(4), c(5), c(6), c(7)] = [-1, 1, 1, 1, 1, 1, 1, 1]. \tag{3}$$

Figure 3: Time-domain structures of the considered synchronisation symbols: (a) CP | A | A; (b) CP | c(0)B | c(1)B | c(2)B | c(3)B | c(4)B | c(5)B | c(6)B | c(7)B.

In order to perform the coarse timing synchronisation, both subsymbols of the first candidate preamble and the first four subsymbols of the latter candidate preamble are used. The remaining subsymbols of the second candidate preamble are used for fractional frequency offset estimation. To get the most out of the 8-element candidate preamble, the new time metric is defined as

$$P(n) = \frac{\left|\frac{1}{3} \sum_{i=0}^{2} c(i) \sum_{l=0}^{L-1} y_{\mathrm{late}}(n, l, i)\, y_{\mathrm{early}}^{*}(n, l, i)\right|^{2}}{\left(\sum_{l=0}^{L-1} \left|y(n - l - L)\right|^{2}\right)^{2}}, \tag{4}$$

where $y_{\mathrm{late}}(n, l, i) = y(n - l - (3 - i)L)$, $y_{\mathrm{early}}(n, l, i) = y(n - l - (2 - i)L)$, and $L = N/8$. In the above formula, the numerator is an averaged value of three cross-correlation samples computed between four consecutive sample blocks of length $L$ each. Thus, the quality of the time metric is improved due to noise averaging. Time metric (4) is compared with an appropriately selected detection threshold $\Gamma$, and the middle of the OFDM symbol, that is, the maximum value of the time metric, is found among all time metrics greater than the detection threshold. Thus, the beginning of the next OFDM symbol is estimated with the following formula:

$$\theta = \arg\max_{n} \left(P(n)\right) + \frac{N}{2}, \quad \text{for } P(n) > \Gamma. \tag{5}$$

Detection of the maximum value of (4) ends the coarse timing synchronisation stage. However, fractional frequency estimation still needs to be performed.
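The behaviour of the time metric (4) and the decision rule (5) can be illustrated with a small noise-free simulation. The sketch below rests on several assumptions that are not taken from the paper: a reduced IFFT size, random BPSK pilots in place of the Gold sequence, an unused DC subcarrier, a cyclic prefix of one subsymbol, a threshold of 0.5, and zero-based sample indices. All names are hypothetical.

```python
import cmath
import math
import random

N_SUB = 128                          # IFFT size N (illustrative; the paper uses 2048)
L_SUB = N_SUB // 8                   # subsymbol length L = N/8
N_CP = 16                            # cyclic prefix length N_G (assumed value)
SIGNS = [-1, 1, 1, 1, 1, 1, 1, 1]    # sign coefficients c(i) from (3)


def build_pilot(rng):
    """CP plus an 8-part pilot: BPSK on every eighth subcarrier, then signs c(i)."""
    pilots = {m: rng.choice((-1.0, 1.0)) for m in range(8, N_SUB, 8)}
    base = [sum(v * cmath.exp(2j * math.pi * m * n / N_SUB)
                for m, v in pilots.items()) / math.sqrt(N_SUB)
            for n in range(N_SUB)]
    symbol = [SIGNS[n // L_SUB] * x for n, x in enumerate(base)]
    return symbol[-N_CP:] + symbol


def timing_metric(y, n):
    """P(n) of (4); 0 is returned where the window is incomplete or empty."""
    if n < 4 * L_SUB - 1:
        return 0.0
    # Four consecutive blocks of length L, oldest first, the last one ending at n.
    blocks = [y[n - (4 - b) * L_SUB + 1: n - (3 - b) * L_SUB + 1] for b in range(4)]
    corr = sum(SIGNS[i] * sum(a * e.conjugate()
                              for a, e in zip(blocks[i], blocks[i + 1]))
               for i in range(3)) / 3.0
    energy = sum(abs(v) ** 2 for v in blocks[2])
    return abs(corr) ** 2 / energy ** 2 if energy > 1e-12 else 0.0


PAD = 40                                       # silent samples before the burst
received = [0j] * PAD + build_pilot(random.Random(1))
metric = [timing_metric(received, n) for n in range(len(received))]
peak = max(range(len(metric)), key=metric.__getitem__)
GAMMA = 0.5                                    # detection threshold (assumed)
theta = peak + N_SUB // 2 if metric[peak] > GAMMA else None
```

In this noise-free setting the metric reaches 1 at the last sample of the fourth subsymbol, that is, at the middle of the pilot symbol, and `theta` then points at the last sample of the pilot symbol in this indexing. The sign $c(0) = -1$ is what removes the plateau: once the window slides into the all-positive subsymbols, the three correlation terms partly cancel and the metric drops to 1/9.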

3.2. Fractional Frequency Estimation. The process of frequency synchronisation consists of two stages: frequency offset estimation and correction. Having a preamble of the form shown in Figure 3 at the beginning of each superframe, we are able to estimate the frequency offset using the same procedure as in timing offset estimation. This time, the argument of the correlation between two subsequent pilot symbols determines the frequency offset, that is,

$$\gamma(n) = \sum_{i=n}^{n+L-1} y(i - L)\, y^{*}(i), \qquad \delta f = \frac{1}{2\pi L} \arg\left(\gamma(\theta)\right), \tag{6}$$

where $\theta$ is the estimated symbol timing. Such an algorithm is able to estimate only the fractional part of the frequency offset, whereas its integer part $l\Delta f$, in terms of multiples of the currently used subcarrier distance $\Delta f$, must be estimated in another way. The distance between the used subcarriers in the pilot subsymbols A is equal to $8\Delta f$ (assuming every subcarrier of every pilot symbol is used), so $\pm 4\Delta f$ is the maximum frequency offset which can be estimated. It can be observed that a number of frequency offset estimates are available due to the repetitive nature of the synchronisation symbol. The correct estimates are computed within the window $W$ starting from the end of the third subsymbol A and ending at the end of the last subsymbol. This implies that the frequency offset estimation quality can be improved by averaging the estimates computed during the window $W$, that is,

$$\delta f_W = \frac{1}{W\, 2\pi L} \sum_{i=N_G/2}^{W+N_G} \arg\left(\gamma(\theta - i)\right), \quad \text{for } W = 1, \ldots, 4L, \tag{7}$$

where $N_G$ is the cyclic prefix length. The use of the offset equal to $N_G/2$ in averaging aims to compensate the influence of the symbol timing estimation error on the computed frequency offset.
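The one-shot estimate (6) and the idea of averaging several estimates can be sketched as follows. This is an illustration under assumptions: the repeated subsymbol is an arbitrary unit-modulus sequence, the offset is modelled as a rotation by $e^{-j2\pi f i}$ so that the product order printed in (6) returns $+f$ (the sign convention is not stated in this section), and the averaging window is simply a caller-supplied list of start positions rather than the exact summation limits of (7). All names are hypothetical.

```python
import cmath
import math

N_SUB = 128                      # IFFT size N (illustrative)
L_SUB = N_SUB // 8               # correlation lag and length L = N/8


def lag_correlation(y, n):
    """gamma(n) = sum_{i=n}^{n+L-1} y(i - L) * conj(y(i)), as printed in (6)."""
    return sum(y[i - L_SUB] * y[i].conjugate() for i in range(n, n + L_SUB))


def one_shot_offset(y, n):
    """(1 / (2*pi*L)) * arg(gamma(n)), in cycles per sample."""
    return cmath.phase(lag_correlation(y, n)) / (2 * math.pi * L_SUB)


def averaged_offset(y, starts):
    """Mean of the one-shot estimates over the given window start positions."""
    return sum(one_shot_offset(y, n) for n in starts) / len(starts)


# Eight identical unit-modulus subsymbols (a stand-in for the repeated part B).
sub = [cmath.exp(0.37j * n * n) for n in range(L_SUB)]
clean = sub * 8
true_offset = 2.5 / N_SUB        # 2.5 subcarrier spacings, inside the +/-4 range
# Assumed sign convention: the offset rotates the samples by exp(-j*2*pi*f*i).
rx = [x * cmath.exp(-2j * math.pi * true_offset * i) for i, x in enumerate(clean)]

single = one_shot_offset(rx, 5 * L_SUB) * N_SUB            # in subcarrier spacings
averaged = averaged_offset(rx, range(5 * L_SUB, 7 * L_SUB + 1)) * N_SUB
```

Because the phase of $\gamma$ is only defined modulo $2\pi$, the estimate is unambiguous for $|f| < 1/(2L)$ cycles per sample, which corresponds to the $\pm 4\Delta f$ range quoted above when $L = N/8$. Without noise, averaging changes nothing; its benefit appears only once noise is added.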

4. Postacquisition Synchronisation Errors

Assuming that the timing synchronisation was successful enough to find the OFDM symbol start within the IBI-free region, two kinds of frequency offsets remain after the acquisition mode, that is, sampling period offset (SPO) and residual carrier frequency offset (CFO). Denote $\varepsilon = (T'_s - T_s)/T_s$ as the normalised SPO and $\delta f_N = \delta f / \Delta f$ as the normalised frequency offset, where $T'_s$, $T_s$, $\delta f$, and $\Delta f$ are the real sampling period, the ideal sampling period, the carrier frequency offset, and the subcarrier distance, respectively. The data symbol received on the $m$th subcarrier of the $k$th OFDM symbol is described by [9, 12, 13]

$$Y_k(m) = \alpha(\theta(m))\, X_k(m)\, H_k(m)\, e^{j\pi\theta(m)(N-1)/N}\, e^{j2\pi\theta(m)(N_G + kM)/N} + \mathrm{ICI}_k(m) + N_k(m), \tag{8}$$

where $\theta(m) = \delta f_N (1 + \varepsilon) + m\varepsilon \approx \delta f_N + m\varepsilon$, $M = N + N_G$, $\alpha(\theta(m))$ is an attenuation caused by both offsets, and $N_k(m)$ is the Gaussian noise sample.

The sampling period offset affects the OFDM signal in two ways. First, it rotates data symbols. Second, since accumulated sampling period offset is not constant during the OFDM symbol but increases from sample to sample, it disturbs the orthogonality of the subcarriers giving rise to intercarrier interference. However, for small offsets the second phenomenon and the attenuation are negligible, and they will not be considered in this work.

5. Decision-Directed Algorithm

Decision-directed (DD) estimation of the sampling period offset and carrier frequency offset was proposed in [9] and is presented here as a reference to our method. First, the phase-difference-dependent signal $\lambda^{DD}_k(m)$ for each subcarrier is computed

$$\lambda^{DD}_k(m) = \frac{Y_k(m)\, Y^{*}_{k-1}(m)}{D_k(m)\, D^{*}_{k-1}(m)}, \tag{9}$$

where $D_k(m)$ is the hard data decision, and $(\cdot)^{*}$ denotes the complex conjugate. The arguments of the above signals are then used for CFO and SPO estimation:

$$\hat{\delta f}_{N,k} = \frac{\rho}{2\pi} \cdot \frac{\phi_{k,1} + \phi_{k,2}}{2}, \qquad \hat{\varepsilon}_k = \frac{\rho}{2\pi} \cdot \frac{\phi_{k,2} - \phi_{k,1}}{(K_U/2) + 1}, \tag{10}$$

where

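$$\phi_{k,1} = \arg\left(\sum_{i \in C_1} \lambda^{DD}_k(i)\right), \qquad \phi_{k,2} = \arg\left(\sum_{i \in C_2} \lambda^{DD}_k(i)\right), \tag{11}$$

and $C_1 = \langle -K_U/2, -1 \rangle$ and $C_2 = \langle 1, K_U/2 \rangle$ are the sets of indices of the first and the second half of the OFDM signal band, respectively, and $\rho = N/M$. The one-shot estimates (hatted) are filtered using the first-order tracking loop filter to give the tracked values (barred):

$$\bar{\delta f}_{N,k} = \bar{\delta f}_{N,k-1} + \gamma_f\, \hat{\delta f}_{N,k}, \qquad \bar{\varepsilon}_k = \bar{\varepsilon}_{k-1} + \gamma_\varepsilon\, \hat{\varepsilon}_k, \tag{12}$$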

where $\gamma_f$ and $\gamma_\varepsilon$ are the CFO and SPO loop filter coefficients, respectively. The sampling period offset estimate controls the interpolator/decimator block that corrects the offset. The carrier frequency offset is used for correcting the phase of the time samples of the received OFDM signal. The drawback of this algorithm is that the CFO estimate does not take into consideration the influence of SPO, which can be significant during the initialisation of the algorithm.
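The decision-directed estimator of (9) to (12) can be sketched on a simplified version of the signal model (8). The assumptions here are mine, not the paper's: a flat channel with no noise, ICI, or attenuation, error-free hard decisions, a deterministic QPSK pattern, and reduced values of $N$, $N_G$, and $K_U$. All function names are hypothetical.

```python
import cmath
import math

N_SUB, N_CP, K_U = 256, 32, 200      # N, N_G, K_U (illustrative values)
M_SYM = N_SUB + N_CP                 # M = N + N_G
RHO = N_SUB / M_SYM                  # rho = N / M
C1 = range(-K_U // 2, 0)             # first half of the band
C2 = range(1, K_U // 2 + 1)          # second half of the band


def qpsk(k, m):
    """Deterministic QPSK symbol standing in for the data D_k(m)."""
    return cmath.exp(1j * (math.pi / 4 + math.pi / 2 * ((3 * k + 5 * m) % 4)))


def rx_symbol(k, cfo, spo):
    """Y_k(m) from (8) with H = 1 and alpha = 1; the k-independent phase is dropped."""
    out = {}
    for m in list(C1) + list(C2):
        theta = cfo + m * spo
        out[m] = qpsk(k, m) * cmath.exp(
            2j * math.pi * theta * (N_CP + k * M_SYM) / N_SUB)
    return out


def dd_one_shot(k, y_cur, y_prev):
    """One-shot CFO and SPO estimates following (9)-(11)."""
    lam = {m: y_cur[m] * y_prev[m].conjugate()
              / (qpsk(k, m) * qpsk(k - 1, m).conjugate()) for m in y_cur}
    phi1 = cmath.phase(sum(lam[m] for m in C1))
    phi2 = cmath.phase(sum(lam[m] for m in C2))
    cfo_hat = RHO / (2 * math.pi) * (phi1 + phi2) / 2
    spo_hat = RHO / (2 * math.pi) * (phi2 - phi1) / (K_U / 2 + 1)
    return cfo_hat, spo_hat


def loop_update(previous, one_shot, gain):
    """First-order tracking loop of (12)."""
    return previous + gain * one_shot


cfo_true, spo_true = 0.02, 30e-6
cfo_hat, spo_hat = dd_one_shot(5, rx_symbol(5, cfo_true, spo_true),
                               rx_symbol(4, cfo_true, spo_true))
tracked_cfo = loop_update(0.0, cfo_hat, 0.5)
```

In this idealised setting both one-shot estimates are exact, because the phase slopes in the two half-bands are symmetric about the band centre. In a closed loop the one-shot values would be residuals after correction, which is why (12) accumulates them.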

6. Proposed Algorithm

6.1. CFO and SPO Estimation. The phase rotation of the subcarrier is easily detectable by the channel estimator and is estimated jointly with the channel transfer function. Thus, the generalised CTF takes the form

$$H'_k(m) = H_k(m)\, e^{j\pi\theta(m)(N-1)/N}\, e^{j2\pi\theta(m)(N_G + kM)/N}. \tag{13}$$

The author proposes to apply the knowledge obtained by the channel estimator for sampling period offset correction. The phase-difference-dependent variable $\lambda_k(m)$ is defined as follows:

$$\lambda_k(m) = H'_k(m)\, H'^{*}_{k-1}(m), \tag{14}$$

where $H_k(m)$ is the CTF estimate of the $m$th channel. Instead of using an interpolator/decimator block, the proposed scheme corrects the subcarrier phases. This implies that the intercarrier interference remains unchanged; however, the receiver is simpler and cheaper. Another consequence of this solution is that the FFT window drift during one OFDM symbol is estimated instead of the exact sampling period offset. After substituting (13) into (14) and modifying the intermediate result, the phase-difference-dependent $\lambda_k(m)$, assuming $H_{k+1}(m) \approx H_k(m)$, is defined as

$$\lambda_k(m) = \left|H'_k(m)\right|^{2} e^{j2\pi(\delta f_N + \varepsilon m)/\rho}. \tag{15}$$

Then, the one-shot sampling frequency offset estimate is given by

$$\hat{\varepsilon}_{M,k} = \frac{N}{2\pi} \cdot \frac{\phi_{\varepsilon,k}}{(K_U/2) + 1}, \tag{16}$$

where

$$\phi_{\varepsilon,k} = \arg\left(\sum_{i \in C_1} \lambda_k\left(i + \frac{K_U}{2} + 1\right) \lambda^{*}_k(i)\right) \approx \frac{2\pi}{\rho}\, \varepsilon \left(\frac{K_U}{2} + 1\right), \tag{17}$$

and $C_1$ is the set of indices of the pilot subcarriers in the first half of the OFDM signal band. The approximation in (17) becomes exact if the channel transfer function estimates $H'_k(m)$ ($m = 1, \ldots, N$) are ideal and there is no additive noise. The algorithm computes the FFT window offset caused by the sampling period error accumulated during one OFDM symbol instead of estimating the exact sampling period error itself. In order to estimate the carrier frequency offset, the phase $\phi_{f,k}$ is computed first:

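$$\phi_{f,k} = \arg\left(\sum_{i \in C_1} \lambda_k\left(i + \frac{K_U}{2} + 1\right) \lambda_k(i)\right) \approx \frac{2\pi}{\rho}\, 2\delta f_N + \frac{2\pi}{\rho}\, \varepsilon \left(\frac{K_U}{2} + 1\right) + N_k, \tag{18}$$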

where

$$N_k = \arg \sum_{i \in C_1} e^{j(2\pi/\rho)\, 2\varepsilon i} \left|H'_k\left(i + \frac{K_U}{2} + 1\right)\right|^{2} \left|H'_k(i)\right|^{2} \tag{19}$$

can be interpreted as a phase noise caused by the sampling frequency offset. It can be seen that the second component in (18) is equal to the phase given by (17) and in this case is undesired. Thus, the one-shot CFO estimate is given by

$$\hat{\delta f}_{N,k} = \frac{\rho}{2\pi} \cdot \frac{\phi_{f,k} - \phi_{\varepsilon,k}}{2}. \tag{20}$$
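The estimators (14) to (20) can be exercised on ideal channel transfer function values built from (13). The sketch below is a minimal illustration under assumptions that go beyond the text: a static two-tap channel, noise-free CTF estimates, every subcarrier of the first half-band used as $C_1$ (the paper uses pilot subcarriers only), and reduced values of $N$, $N_G$, and $K_U$. Function names are hypothetical.

```python
import cmath
import math

N_SUB, N_CP, K_U = 256, 32, 200      # N, N_G, K_U (illustrative values)
M_SYM = N_SUB + N_CP                 # M = N + N_G
RHO = N_SUB / M_SYM                  # rho = N / M
SHIFT = K_U // 2 + 1                 # index distance between the paired subcarriers
C1 = range(-K_U // 2, 0)             # assumed: all subcarriers of the first half-band


def generalised_ctf(k, cfo, spo):
    """H'_k(m) from (13) for a static two-tap channel (assumed, noise-free)."""
    out = {}
    for m in list(C1) + [i + SHIFT for i in C1]:
        theta = cfo + m * spo
        h = 1.0 + 0.5 * cmath.exp(-2j * math.pi * 3 * m / N_SUB)
        out[m] = (h * cmath.exp(1j * math.pi * theta * (N_SUB - 1) / N_SUB)
                    * cmath.exp(2j * math.pi * theta * (N_CP + k * M_SYM) / N_SUB))
    return out


def proposed_one_shot(h_cur, h_prev):
    """One-shot window drift (16) and CFO (20) from two consecutive CTF estimates."""
    lam = {m: h_cur[m] * h_prev[m].conjugate() for m in h_cur}           # (14)
    phi_eps = cmath.phase(sum(lam[i + SHIFT] * lam[i].conjugate() for i in C1))
    phi_f = cmath.phase(sum(lam[i + SHIFT] * lam[i] for i in C1))
    eps_m = N_SUB / (2 * math.pi) * phi_eps / (K_U / 2 + 1)              # (16)
    cfo_hat = RHO / (2 * math.pi) * (phi_f - phi_eps) / 2                # (20)
    return eps_m, cfo_hat


cfo_true, spo_true = 0.02, 30e-6
eps_m, cfo_hat = proposed_one_shot(generalised_ctf(5, cfo_true, spo_true),
                                   generalised_ctf(4, cfo_true, spo_true))
_, cfo_no_spo = proposed_one_shot(generalised_ctf(5, cfo_true, 0.0),
                                  generalised_ctf(4, cfo_true, 0.0))
```

With ideal estimates, `eps_m` equals $M\varepsilon$, the FFT window drift in samples accumulated over one OFDM symbol. The one-shot CFO estimate is exact when $\varepsilon = 0$; otherwise it carries the residual term $N_k$ of (19), whose contribution is bounded by $|\varepsilon| K_U / 2$ in this setup.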


Figure 4: Timing synchronisation MSE of Schmidl and Cox algorithm and the proposed algorithm for A1, B1, and C2 channels. (MSE versus SNR (dB); curves: Schmidl & Cox and Proposed for each of A1, B1, and C2.)

6.2. DPLL. Both the sampling frequency offset estimate $\hat{\varepsilon}_{M,k}$ and the carrier frequency offset estimate $\hat{\delta f}_{N,k}$ are fed to two second-order digital phase-locked loop (DPLL) filters whose block diagram is presented in Figure 5. Coefficients $\mu_1$ and $\mu_2$ are the proportional and integral coefficients, respectively. The transfer function of the DPLL is [14]

$$H(z) = \frac{\mu_2 (z - 1) + \mu_1}{(z - 1)^{2} + \mu_2 (z - 1) + \mu_1} = \frac{2\zeta\omega_n (z - 1) + \omega_n^{2}}{(z - 1)^{2} + 2\zeta\omega_n (z - 1) + \omega_n^{2}}, \tag{21}$$

where $\mu_2 = 2\zeta\omega_n T_s$, $\mu_1 = \mu_2^{2}/(4\zeta^{2})$, $\omega_n = 2\pi f_n$, $T_s$ is the sampling period, $\zeta$ is the damping factor, and $f_n$ is the natural frequency of the loop. In order to guarantee the stability of the loop, the damping factor $\zeta$ and the natural frequency $f_n$ must satisfy the following relationship [15]:

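$$\zeta > 1, \quad 0 < \omega_n < 2, \quad \zeta\omega_n < \frac{\omega_n^{2}}{4} + 1, \qquad \text{or} \qquad \zeta \le 1, \quad 0 < \omega_n < 2\zeta. \tag{22}$$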

From the sampling frequency offset loop output $\varepsilon_{M,k}$, the integer part $\varepsilon_{\mathrm{int}}$ and the fractional part $\varepsilon_{\mathrm{fra}}$ of the accumulated sampling period error are extracted. The integer part is used for correcting the FFT window while the fractional part is used for correcting the subcarrier phases.
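One recursion whose closed-loop response matches the transfer function (21) is sketched below, together with the stability test (22). This is an assumption-laden illustration: the block diagram of Figure 5 is not reproduced here, so the state layout is only one possible realisation, the sampling period is normalised to $T_s = 1$, and the class and function names are hypothetical.

```python
def loop_gains(zeta, omega_n):
    """mu2 = 2*zeta*omega_n and mu1 = mu2^2 / (4*zeta^2), with T_s = 1 assumed."""
    mu2 = 2.0 * zeta * omega_n
    mu1 = mu2 ** 2 / (4.0 * zeta ** 2)
    return mu1, mu2


def is_stable(zeta, omega_n):
    """Stability condition (22)."""
    if zeta > 1.0:
        return 0.0 < omega_n < 2.0 and zeta * omega_n < omega_n ** 2 / 4.0 + 1.0
    return 0.0 < omega_n < 2.0 * zeta


class SecondOrderLoop:
    """Recursion with closed-loop response (mu2(z-1)+mu1)/((z-1)^2+mu2(z-1)+mu1)."""

    def __init__(self, zeta, omega_n):
        self.mu1, self.mu2 = loop_gains(zeta, omega_n)
        self.integrator = 0.0
        self.output = 0.0

    def update(self, value):
        error = value - self.output
        # The output uses the integrator state from the previous step.
        self.output += self.mu2 * error + self.integrator
        self.integrator += self.mu1 * error
        return self.output


loop = SecondOrderLoop(zeta=0.30, omega_n=0.20)
target = 7.5e-3                      # constant one-shot estimate fed to the loop
for _ in range(600):
    tracked = loop.update(target)
```

For $\zeta \le 1$ the poles of (21) have squared modulus $1 - 2\zeta\omega_n + \omega_n^{2}$, which is below one exactly when $\omega_n < 2\zeta$, so the second branch of (22) can be read directly off the recursion.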

Figure 5: Second-order digital phase-locked loop filter diagram. (Input $\hat{\varepsilon}_{M,k}$, gains $\mu_1$ and $\mu_2$, two delay elements $Z^{-1}$, output $\varepsilon_{M,k}$.)

6.3. Channel Estimation. As we know, in the proposed CFO and SPO estimation algorithms, estimation of the channel transfer function is needed. The channel transfer function estimate may be computed using any algorithm that gives reliable estimates. In our design, the Zero Force (ZF) channel estimator was applied to obtain the initial channel estimate [16]:

$$\hat{H}_1(m) = \frac{D^{*}_i(m)\, Y_1(m)}{\left|D_i(m)\right|^{2}}. \tag{23}$$

The symbol $D_i(m)$ is the hard decision made by the demodulator; however, when the first OFDM symbol of the superframe is received, the symbol represents the pilot symbol known to the receiver. After receiving the first OFDM symbol, the estimator switches to the tracking mode. The channel estimates are refined and tracked according to the gradient algorithm, which minimises the mean square error (MSE) [17]

$$\hat{H}_{k+1}(m) = \hat{H}_k(m) + \alpha_H \left(Y_k(m) - \hat{H}_k(m)\, D_k(m)\right) D^{*}_k(m), \tag{24}$$

where $\alpha_H$ is the coefficient dependent on transmitted symbols power and is constant during the transmission. The channel coefficients are updated every received OFDM symbol. The author would like to stress that the channel estimation algorithm is not an integral part of the carrier frequency and sampling frequency offset estimation algorithm and other channel estimation algorithms can be applied as well.
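The two estimator steps (23) and (24) can be illustrated for a single subcarrier. The sketch assumes a noise-free link, unit-power QPSK symbols, error-free decisions, and a step size $\alpha_H = 0.5$; these values and the function names are hypothetical.

```python
import cmath
import math


def zf_initial(pilot, received):
    """Initial zero-forcing estimate (23) for one subcarrier."""
    return pilot.conjugate() * received / abs(pilot) ** 2


def gradient_update(h_est, received, decision, alpha):
    """Tracking update (24) for one subcarrier."""
    return h_est + alpha * (received - h_est * decision) * decision.conjugate()


def qpsk(k):
    """Deterministic unit-power QPSK symbol for OFDM symbol index k."""
    return cmath.exp(1j * (math.pi / 4 + math.pi / 2 * (k % 4)))


ALPHA_H = 0.5                         # step size (assumed value)
h_first = 0.8 - 0.3j                  # channel seen by the pilot symbol
h_later = 0.7 - 0.25j                 # channel after a small change

initial = zf_initial(qpsk(0), h_first * qpsk(0))
estimate = initial
for k in range(1, 41):
    estimate = gradient_update(estimate, h_later * qpsk(k), qpsk(k), ALPHA_H)
```

With unit-power symbols the estimation error shrinks by the factor $1 - \alpha_H$ per OFDM symbol, which shows why $\alpha_H$ has to be scaled to the transmitted symbol power.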


The proposed synchronisation scheme was tested for the WINNER system parameters presented in Table 1. The Rayleigh fading channels were simulated using 20-path NLOS channel models, denoted as A1, B1, and C2, with root-mean square delay spreads $\tau_{\mathrm{RMS}}$ equal to 24.15, 94.73, and 310 nanoseconds, respectively. These models were developed within the WINNER project for indoor/small office, typical urban (TU) microcellular and macrocellular environments [18]. The simulation results were obtained using 10 000 channel realisations for each SNR value.

7.1. Acquisition. As a first test, the timing accuracy of the proposed time metric with the 8-element synchronisation symbol was compared with that of the Schmidl and Cox synchronisation algorithm with the 2-element synchronisation symbol. The results are presented in Figure 4. The performance of the new metric is slightly better than that of the Schmidl and Cox algorithm in all three scenarios. In addition, unlike the Schmidl and Cox method, the proposed coarse timing synchronisation is already finished at the beginning of the second half of the synchronisation symbol.

Table 1: WINNER signal parameters.

| Parameter | Base Coverage Urban | Microcellular | Indoor |
|---|---|---|---|
| Carrier frequency | 3.95 GHz DL | 3.95 GHz | 3.95 GHz |
| Signal bandwidth | 2 × 45 MHz | 89.84 MHz | 89.84 MHz |
| Subcarrier distance | 39062.5 Hz | 48828.125 Hz | 48828.125 Hz |
| Used subcarriers | 1152 | 1840 | 1840 |
| IFFT size N | 2048 | 2048 | 2048 |
| Prefix length NG | 256 | 200 | 200 |
| Channel models | C2 | B1 | A1 |
| Max velocity | 19.44 m/s | 19.44 m/s | 1.39 m/s |
| Packet length | 192 | 192 | 192 |

Figure 6: Frequency synchronisation MSE of Schmidl and Cox algorithm and the proposed algorithm for A1, B1, and C2 channels. (MSE versus SNR (dB); curves: Schmidl & Cox and Proposed for each of A1, B1, and C2.)

Results of both fractional frequency offset estimation algorithms, obtained for three different channels, are presented in Figure 6. The algorithms' performance was tested for the frequency offsets close to the maximum frequency offsets that the algorithms are able to estimate, that is, $0.99\Delta f$ for the Schmidl and Cox algorithm and $3.99\Delta f$ for the proposed solution. Although the correlation length in the proposed algorithm is four times shorter than in the Schmidl and Cox algorithm, the accuracy of both solutions is almost the same, regardless of the transmission scenario. Similar performance between the proposed solution and the reference algorithm is achieved as a result of the averaging of the estimates computed during the reception of the synchronisation symbol. The comparison of the accuracy of the algorithm with and without averaging is illustrated in Figure 7. The averaging decreases the MSE approximately by a factor of 10 for all SNR values.

If the frequency offset is larger than four times the subcarrier distance, an integer frequency offset estimation algorithm, like the one described in [19] or [20], is required.

Figure 7: Frequency synchronisation MSE with and without averaging of the frequency offset estimate (MSE versus SNR in dB).

7.2. Tracking. During the tracking mode, randomly generated user data and pilots were mapped onto a QPSK constellation. The loop parameters used by both algorithms during simulations are shown in Table 2.

The algorithms for carrier frequency and sampling frequency offset estimation and tracking were tested for frequency offsets of δf = 0.01 and δf = 0.05 and sampling frequency offsets of δTs = 5 ppm and 30 ppm. The second frequency offset was chosen to be larger than the maximum frequency offset estimation error of the frequency synchronisation algorithm. The results of SPO estimation are illustrated in Figures 8, 9, and 10 for the A1, B1, and C2 scenarios, respectively. The mean square error of the estimated SPO is the same over the whole SNR range used, except for small signal power in the C2 scenario. The influence of the channel estimator inaccuracy on the performance of the proposed algorithm is visible when compared with the results achieved for the AWGN channel only. The mean square error floor occurs for large SNR values due to the Rayleigh fading channel and its estimation.

The same error floor behaviour can be observed during the estimation of the carrier frequency offset (see Figures 11, 12, and 13). In the A1 and C2 scenarios, the algorithm estimates small δf more accurately than the larger offsets for small SNRs. However, again an MSE floor occurs for large SNR values.

Table 2: DPLL loop parameters.

| Channel model | Algorithm | SFO DPLL ζ | SFO DPLL ωn | CFO DPLL ζ | CFO DPLL ωn |
|---|---|---|---|---|---|
| A1 | DD | 0.20 | 0.20 | 0.40 | 0.50 |
| A1 | proposed | 0.30 | 0.20 | 0.40 | 0.50 |
| B1 | DD | 0.30 | 0.20 | 0.40 | 0.50 |
| B1 | proposed | 0.35 | 0.20 | 0.50 | 0.30 |
| C2 | DD | 0.23 | 0.44 | 0.40 | 0.50 |
| C2 | proposed | 0.23 | 0.44 | 0.30 | 0.50 |

Figure 8: The mean square error of the estimated SPO in A1 channel (MSE versus SNR in dB).

Figure 9: The mean square error of the estimated SPO in B1 channel (MSE versus SNR in dB).
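To make the role of the damping factor ζ and the natural frequency ωn in Table 2 more concrete, the following is a minimal sketch of a second-order (proportional-plus-integral) digital loop that smooths a sequence of raw offset estimates. The loop structure and the gain mapping k1 = 2ζωn, k2 = ωn² are assumed textbook approximations; the paper's own DPLL equations are not reproduced in this section, so the sketch should not be read as the authors' implementation.

```python
def second_order_loop(measurements, zeta, omega_n):
    """Smooth noisy offset estimates with a proportional-plus-integral loop.

    The mapping k1 = 2*zeta*omega_n, k2 = omega_n**2 is an assumed textbook
    approximation, not a formula taken from the paper.
    """
    k1 = 2.0 * zeta * omega_n
    k2 = omega_n ** 2
    estimate, integrator, history = 0.0, 0.0, []
    for z in measurements:
        error = z - estimate                 # loop error signal
        integrator += k2 * error             # integral branch
        estimate += k1 * error + integrator  # proportional branch plus integrator
        history.append(estimate)
    return history

# CFO loop parameters listed for the A1 channel in Table 2,
# driven by a constant (noise-free) offset of 0.05.
tracked = second_order_loop([0.05] * 200, zeta=0.40, omega_n=0.50)
```

With these gains the loop poles lie inside the unit circle, so the output settles on the constant input; larger ωn shortens the transient at the price of passing more estimation noise.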

The performance of the proposed carrier frequency offset and sampling period offset estimation algorithm was tested for small and large velocities of the terminal with respect to its maximum value. The simulation results, obtained for SNR = 30 dB, δTs = 30 ppm, and δf = 0.05, are presented in Figure 14 for SPO estimation and in Figure 15 for CFO estimation. The mean square error of the offset estimation degrades rapidly as the terminal velocity increases from low values. The degradation slows down for velocities larger than 10 m/s. On average, an increase of the velocity by 10 m/s in the B1 and C2 scenarios increases the MSE of the estimated SPO and CFO approximately by a factor of 1.5. An increase of the velocity by 1 m/s in the A1 scenario increases the MSE of the estimated SPO and CFO by a factor of 1.2.

Figure 10: The mean square error of the estimated SPO in C2 channel (MSE versus SNR in dB).

Figure 11: The mean square error of the estimated CFO in A1 channel (MSE versus SNR in dB).


Figure 12: The mean square error of the estimated CFO in B1 channel (MSE versus SNR in dB).

Figure 13: The mean square error of the estimated CFO in C2 channel (MSE versus SNR in dB).

Figure 14: The mean square error of the estimated SPO for different values of mobile velocity (MSE versus v in m/s, for A1, B1, and C2).

Figure 15: The mean square error of the estimated CFO for different values of mobile velocity (MSE versus v in m/s, for A1, B1, and C2).

Figure 16: The mean square error of the estimated SFO for δTs = 30 ppm (MSE versus SNR in dB; proposed and decision-directed algorithms in A1, B1, and C2).

Finally, both algorithms, that is, the proposed and decision-directed algorithms, are compared in all scenarios for a sampling period offset of δTs = 30 ppm and a CFO of δf = 0.05. As with the proposed solution, the carrier frequency and sampling period offsets estimated by the DD algorithm were filtered using the second-order DPLL. Both solutions used the same sets of subcarrier indices C1 and C2. The results plotted in Figures 16 and 17 indicate that for low SNR values the proposed algorithm copes better with severe channel conditions than the decision-directed one, especially in the A1 and C2 scenarios. The poor performance of the DD algorithm is related to the increase of the channel estimate phase error due to the hard decisions made by the data demodulator and the propagation of the phase error to the phase-difference-dependent signal (9). Because the proposed solution does not use hard decisions, the phase errors of the erroneous channel estimates are not amplified, and their influence on the overall algorithm performance is smaller than in the DD algorithm.

Figure 17: The mean square error of the estimated CFO for δf = 0.05 (MSE versus SNR in dB; proposed and decision-directed algorithms in A1, B1, and C2).

8. Conclusions

In this paper, link-level synchronisation algorithms designed for the OFDM-based proposal for a 4G system developed in the WINNER project have been introduced. A new time metric and pilot symbol design for coarse timing synchronisation, as well as new carrier and sampling frequency offset estimation algorithms, were proposed. The algorithms were tested in three different transmission scenarios. Simulation results showed that on the basis of only one OFDM symbol, the algorithms, at the cost of moderate complexity, gave accurate time and frequency offset estimates. The carrier and sampling frequency offset estimation and tracking algorithm, based on the channel estimates, is suitable for transmission systems with low pilot overhead. Simulation results showed that for low SNR, the proposed algorithm works better than the decision-directed solution.

References

[1] “D2.10: Final report on identified RI key technologies, system concept, and their assessment,” Tech. Rep. IST-2003-507581, Information Society Technologies, Yerevan, Armenia, December 2005.

[2] M. Abaii, G. Auer, Y. Cho, et al., “D6.13.7 Test Scenarios and Calibration Cases Issue 2,” Tech. Rep. IST-4-027756 WINNER II, Information Society Technologies, Yerevan, Armenia, December 2006.

[3] J.-J. van de Beek, M. Sandell, and P. O. Borjesson, “ML estimation of time and frequency offset in OFDM systems,” IEEE Transactions on Signal Processing, vol. 45, no. 7, pp. 1800–1805, 1997.

[4] T. M. Schmidl and D. C. Cox, “Robust frequency and timing synchronization for OFDM,” IEEE Transactions on Communications, vol. 45, no. 12, pp. 1613–1621, 1997.

[5] F. Tufvesson, O. Edfors, and M. Faulkner, “Time and frequency synchronization for OFDM using PN-sequence preambles,” in Proceedings of the 50th IEEE Vehicular Technology Conference (VTC ’99), vol. 4, pp. 2203–2207, Amsterdam, The Netherlands, September 1999.

[6] C. Yan, J. Fang, Y. Tang, S. Li, and Y. Li, “OFDM synchronization using PN sequence and performance,” in Proceedings of the 14th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC ’03), vol. 1, pp. 936–939, Beijing, China, September 2003.

[7] D. K. Kim, S. H. Do, H. B. Cho, H. J. Chol, and K. B. Kim, “A new joint algorithm of symbol timing recovery and sampling clock adjustment for OFDM systems,” IEEE Transactions on Consumer Electronics, vol. 44, no. 3, pp. 1142–1149, 1998.

[8] S. A. Fechtel, “OFDM carrier and sampling frequency synchronization and its performance on stationary and mobile channels,” IEEE Transactions on Consumer Electronics, vol. 46, no. 3, pp. 438–441, 2000.

[9] K. Shi, E. Serpedin, and P. Ciblat, “Decision-directed fine synchronization in OFDM systems,” IEEE Transactions on Communications, vol. 53, no. 3, pp. 408–412, 2005.

[10] B. Yang, K. B. Letaief, R. S. Cheng, and Z. Cao, “Timing recovery for OFDM transmission,” IEEE Journal on Selected Areas in Communications, vol. 18, no. 11, pp. 2278–2291, 2000.

[11] D. Aronsson, G. Auer, S. Bittner, et al., “Link level procedures for the WINNER System,” Tech. Rep. IST-4-027756 WINNER II, Information Society Technologies, Yerevan, Armenia, November 2007.

[12] P. H. Moose, “Technique for orthogonal frequency division multiplexing frequency offset correction,” IEEE Transactions on Communications, vol. 42, no. 10, pp. 2908–2914, 1994.

[13] M. Luise and R. Reggiannini, “Carrier frequency acquisition and tracking for OFDM systems,” IEEE Transactions on Communications, vol. 44, no. 11, pp. 1590–1598, 1996.

[14] F. M. Gardner, Phaselock Techniques, John Wiley & Sons, New York, NY, USA, 2005.

[15] Z.-W. Zheng, Z.-X. Yang, C.-Y. Pan, and Y.-S. Zhu, “Novel synchronization for TDS-OFDM-based digital television terrestrial broadcast systems,” IEEE Transactions on Broadcasting, vol. 50, no. 2, pp. 148–153, 2004.

[16] J. Proakis, Digital Communications, McGraw-Hill, New York, NY, USA, 4th edition, 2001.

[17] A. Langowski, A. Piatyszek, Z. Długaszewski, and K. Wesołowski, “VHDL realisation of the channel estimator and the equaliser in the OFDM receiver,” in Proceedings of the 10th National Symposium of Radio Science (URSI ’02), pp. 129–134, Poznan, Poland, March 2002.

[18] “D5.4 Final Report on Link Level and System Level Channel Models,” Tech. Rep. IST-2003-507581 WINNER, Information Society Technologies, Yerevan, Armenia, September 2005.

[19] K. Bang, N. Cho, J. Cho, et al., “A coarse frequency offset estimation in an OFDM system using the concept of the coherence phase bandwidth,” IEEE Transactions on Communications, vol. 49, no. 8, pp. 1320–1324, 2001.

[20] Z. Długaszewski and K. Wesołowski, “Simple coarse frequency offset estimation schemes for OFDM burst transmission,” in Proceedings of the 13th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC ’02), vol. 2, pp. 567–571, Lisbon, Portugal, September 2002.


Research Article

Impact of Carrier Frequency Offsets on Block-IFDMA Systems

E. P. Simon, V. Degardin, and M. Lienard

Telecommunications, Interferences and Electromagnetic Compatibility (TELICE), Institute of Electronics, Microelectronics and Nanotechnology (IEMN) Laboratory, University of Lille, IEMN/UMR 8520, 59655 Villeneuve d'Ascq, France

Correspondence should be addressed to E. P. Simon, [email protected]

Received 25 June 2008; Accepted 15 December 2008

Recommended by Heidi Steendam

A new multiple access (MA) scheme called block-interleaved frequency division multiple access (B-IFDMA) has recently come under consideration as an MA candidate for 4G wireless applications. In this paper, the two variants of B-IFDMA, the joint-DFT B-IFDMA and the added-signal B-IFDMA, are considered and compared in terms of sensitivity to carrier frequency offsets (CFOs) for both uplink and downlink. CFO gives rise to multiuser interference and self-user interference. We derive analytical expressions for the power of these interferences, and we quantify their detrimental effect through the evaluation of the signal-to-interference-plus-noise ratio (SINR) degradation. We point out that the two variants of B-IFDMA are not affected by CFO in the same way. Joint-DFT B-IFDMA is more robust to multiuser interference than added-signal B-IFDMA, and so is better suited for the uplink. We then show by means of numerical results that added-signal B-IFDMA is less sensitive to CFO in the downlink.

Copyright © 2009 E. P. Simon et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

In the context of the research on beyond 3rd and 4th generation (B3G/4G) mobile radio systems, a novel power-efficient multiple access scheme called block-interleaved frequency division multiple access (B-IFDMA) has been proposed as a candidate for the nonfrequency-adaptive transmission mode. B-IFDMA is a particular case of discrete Fourier transform (DFT) precoded OFDMA, where the data of the user under consideration is transmitted on blocks of subcarriers that are equidistantly distributed over the total available bandwidth. Hence, it can be viewed as a generalization of DFT precoded OFDMA with interleaved subcarrier allocation, also called IFDMA [1]. Two different variants of B-IFDMA are currently under investigation, the joint-DFT B-IFDMA and the added-signal B-IFDMA [2, 3]. The joint-DFT B-IFDMA signal is based on applying the DFT once to all subcarriers assigned to a given user, whereas the added-signal B-IFDMA is constructed by applying the DFT to groups of subcarriers.

The robustness of B-IFDMA compared to IFDMA to carrier frequency offsets (CFOs) has been discussed in [2] for the uplink. The authors showed that B-IFDMA is expected to be more robust to CFO than IFDMA due to the fact that schemes with interleaved subcarrier allocation are known to be more sensitive to CFO compared to schemes with block allocation. However, it is not clear which variant of B-IFDMA is more robust to CFO. Moreover, to the best of our knowledge, no detailed analysis exists on the sensitivity of B-IFDMA to CFO. The purpose of this paper is to present a comprehensive study of the sensitivity of the joint-DFT and added-signal B-IFDMA to CFO and to compare those two variants in terms of CFO sensitivity.

The effect of CFO on multicarrier schemes has been studied in [4] for OFDM, in [5] for MC-DS-CDMA, and in [6] for MC-CDMA. It was shown that CFO gives rise to signal distortions, yielding interference and power loss which degrade system performance. When this degradation can no longer be tolerated, carrier frequency correction must be applied. For downlink, the CFO is the same for all users. Hence, the carrier frequency can be corrected by using feedback carrier synchronization mechanisms, at the expense of phase jitter [7, 8]. Note that for uplink, since the CFOs associated with different users differ from each other, it is much more difficult to carry out an offset correction [9, 10]. In this paper, we consider both uplink and downlink.

To quantify the performance degradation, we propose to compute the expressions of the signal-to-interference-plus-noise ratio (SINR) degradation for both variants of B-IFDMA. We also provide a detailed analysis of the obtained analytical expressions in order to compare the sensitivity of both variants to CFO. In addition, numerical results illustrate the analysis.

The paper is organized as follows. In Section 2, a system model including the CFO for both variants of B-IFDMA is given. The sensitivity to CFO is investigated in Section 3. Numerical results are presented in Section 4. Section 5 concludes the paper.

2. System Model

In this section, a system model including the CFO is given. As the added-signal B-IFDMA model can be generated from IFDMA signals [2], here we focus on the joint-DFT B-IFDMA model. The signal model for IFDMA is described in detail in [11]. The model for joint-DFT B-IFDMA is derived as a particular case of a general precoded OFDMA system. Although new algorithms for a lower complexity implementation of B-IFDMA based on time-domain signal generation have been proposed in [3], it is more convenient to perform algebra with the general OFDMA transmitter model.

The joint-DFT B-IFDMA transmitter of user u (see Figure 1) performs a block transmission of Q symbols $a_q^{(u)}$, $q = 0, \ldots, Q-1$, which are assumed to be uncorrelated symbols with power $E_s^{(u)}$.

The first operation consists of a DFT precoding of the data symbol vector:

$$X_q^{(u)} = \frac{1}{\sqrt{Q}} \sum_{n=0}^{Q-1} a_n^{(u)} c_n^{q}, \quad q = 0, \ldots, Q-1, \tag{1}$$

where $c_n^{q} = e^{-j2\pi(nq/Q)}$, $n = 0, \ldots, Q-1$, is a Fourier sequence.
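The precoding in (1) is a unitary DFT of the data block, so it can be computed with an FFT. The sketch below is illustrative only; the function names and the QPSK test vector are assumptions introduced here, and the second function simply spells out the double sum of (1) for comparison.

```python
import numpy as np

def dft_precode(a):
    """Unitary DFT precoding: X_q = (1/sqrt(Q)) * sum_n a_n * exp(-j*2*pi*n*q/Q)."""
    a = np.asarray(a, dtype=complex)
    return np.fft.fft(a) / np.sqrt(len(a))

def dft_precode_direct(a):
    """Same operation written as the explicit sum of (1), for checking."""
    a = np.asarray(a, dtype=complex)
    Q = len(a)
    n = np.arange(Q)
    c = np.exp(-2j * np.pi * np.outer(n, n) / Q)  # c[q, n] = c_n^q
    return c @ a / np.sqrt(Q)

rng = np.random.default_rng(1)
Q = 16
# Hypothetical unit-power QPSK data block.
a = (rng.choice([-1, 1], Q) + 1j * rng.choice([-1, 1], Q)) / np.sqrt(2)
X = dft_precode(a)
```

Because the transform is unitary, the block energy is unchanged by the precoding.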

Let $N = KQ$ designate the total number of subcarriers available in the OFDMA system, where K is the maximum number of users. Note that $N_u$ will designate the number of active users. Then, the Q precoded symbols $X_q^{(u)}$, $q = 0, \ldots, Q-1$, of user u are transmitted on blocks of subcarriers that are equidistantly distributed over the N subcarriers. Thus, $Q = ML$, where L stands for the number of blocks and M the number of subcarriers per block. The qth symbol $X_q^{(u)}$ modulates the subcarrier of index $M_q^{u} = lKM + m + uM - N/2$, where $q = lM + m$; $l = 0, \ldots, L-1$; $m = 0, \ldots, M-1$. This mapping is specific to the joint-DFT B-IFDMA scheme.
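As a small worked example of this mapping, the following sketch lists the subcarrier indices of each user for K = 2, M = 8, and L = 2 (so N = 32). The function name is a hypothetical helper introduced for illustration.

```python
def subcarrier_indices(u, K, M, L):
    """Indices M_q^u = l*K*M + m + u*M - N/2 for q = l*M + m (joint-DFT B-IFDMA)."""
    N = K * M * L
    return [l * K * M + m + u * M - N // 2
            for l in range(L) for m in range(M)]

K, M, L = 2, 8, 2
N = K * M * L
user0 = subcarrier_indices(0, K, M, L)  # blocks -16..-9 and 0..7
user1 = subcarrier_indices(1, K, M, L)  # blocks -8..-1 and 8..15
```

Each user occupies L blocks of M adjacent subcarriers, the blocks of one user are KM subcarriers apart, and under full load the users together fill the band from -N/2 to N/2 - 1 without overlap.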

The N samples of the transmitted sequence are generated by feeding the mapped symbols to an inverse fast Fourier transform. Then, a cyclic prefix of $N_g$ samples is inserted in order to avoid interference caused by the dispersive channel. The transmitter feeds those samples at a rate 1/T to a unit energy zero roll-off square root Nyquist filter P(f) with respect to the sampling time T.

This results in the continuous-time signal:

$$x^{(u)}(t) = \frac{1}{\sqrt{N}} \sum_{n=-N_g}^{N-1} \sum_{q=0}^{Q-1} X_q^{(u)} \, e^{j2\pi(M_q^{u} n/N)} \, p(t - nT). \tag{2}$$
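The discrete-time samples that multiply $p(t - nT)$ in (2) can be sketched as follows. This is an illustrative baseband model only: pulse shaping, the channel, and the CFO are omitted, and the function names and test parameters are assumptions introduced here. The receiver function mirrors the steps described in the text (prefix removal, FFT, subcarrier selection, inverse DFT) for an ideal link.

```python
import numpy as np

def b_ifdma_tx(a, u, K, M, L, n_g):
    """Discrete-time samples of (2) for n = -n_g, ..., N-1 (pulse shaping omitted)."""
    Q = M * L
    N = K * Q
    X = np.fft.fft(np.asarray(a, dtype=complex)) / np.sqrt(Q)  # precoding (1)
    idx = np.array([l * K * M + m + u * M - N // 2
                    for l in range(L) for m in range(M)])      # mapping M_q^u
    grid = np.zeros(N, dtype=complex)
    grid[idx % N] = X                      # negative indices wrap modulo N
    body = np.fft.ifft(grid) * np.sqrt(N)  # (1/sqrt(N)) * sum_q X_q exp(j2*pi*M_q*n/N)
    return np.concatenate([body[-n_g:], body]), idx

def b_ifdma_rx(samples, idx, n_g):
    """Ideal receiver: drop prefix, FFT, pick the user's subcarriers, de-precode."""
    N = len(samples) - n_g
    Y = np.fft.fft(samples[n_g:]) / np.sqrt(N)
    return np.fft.ifft(Y[idx % N]) * np.sqrt(len(idx))

rng = np.random.default_rng(2)
K, M, L, n_g = 2, 8, 2, 4
a = (rng.choice([-1, 1], M * L) + 1j * rng.choice([-1, 1], M * L)) / np.sqrt(2)
tx, idx = b_ifdma_tx(a, u=1, K=K, M=M, L=L, n_g=n_g)
a_hat = b_ifdma_rx(tx, idx, n_g)
```

Without CFO and noise the received symbols equal the transmitted ones, which gives a reference point for the interference analysis that follows.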

The signal $x^{(u)}(t)$ is then transmitted over the dispersive channel from the transmitter of user u to the base station with the channel transfer function $H_{ch}^{(u)}(f)$. The output of the dispersive channel is disturbed by a carrier phase error which linearly increases in time within an OFDM symbol period: $\Phi^{(u)}(t) = 2\pi \Delta F^{(u)} t + \Phi^{(u)}(0)$, where $\Delta F^{(u)}$ stands for the CFO for user u. Without loss of generality, we assume $\Phi^{(u)}(0) = 0$. We also assume a small CFO compared to the bandwidth of the receiver filter, $\Delta F^{(u)} T \ll 1$.

Figure 1: Joint-DFT B-IFDMA transmitter for user u (block diagram: DFT precoding, Q-to-N mapping, IFFT, prefix insertion, P(f)).

The base station receives the sum of the signals transmitted by the different users, disturbed by additive white Gaussian noise w(t), with uncorrelated real and imaginary parts, each having a power spectral density $N_0$. The resulting signal enters the receiver filter, which is matched to the transmit filter, and is sampled at instants $t_k = kT$, assuming perfect timing synchronization.

Without loss of generality, we focus on the detection of the data symbols transmitted by the user u. Moreover, to clearly emphasize the effect of CFO, a transmission over a nondispersive channel for each user is considered from now on, that is, $H_{ch}^{(u)}(f) = 1$, $u = 0, \ldots, N_u - 1$. To detect the data symbols of user u, the samples corresponding to the cyclic prefix are removed and the remaining N samples are fed to the discrete Fourier transform. Note that an equalizer should be used to compensate for the systematic phase rotation of the FFT outputs. However, the equalizer is not able to eliminate interference caused by CFO. As the topic of this paper is the effect of CFO, the equalizer is not included in the analysis. Then, Q samples are taken from the N resulting frequency-domain samples according to the specific mapping of user u. Those Q samples are de-precoded by means of an inverse DFT operation. The qth resulting sample, denoted $z_q^{(u)}$, is used to make a decision about the data symbol $a_q^{(u)}$. The sample $z_q^{(u)}$ can be written

$$z_q^{(u)} = \sum_{u'=0}^{N_u-1} \sum_{q'=0}^{Q-1} a_{q'}^{(u')} I_{q,q'}^{u,u'} + W_q^{(u)}, \tag{3}$$

where $W_q^{(u)}$ is a white complex Gaussian noise with variance $2N_0$ and $I_{q,q'}^{u,u'}$ is the contribution of the symbol $a_{q'}^{(u')}$ to the input of the decision device. The next paragraph deals with the computation of the quantity $I_{q,q'}^{u,u'}$.

Let us now define an equivalent time-varying channel for a given user u including the carrier phase errors and the transmitter and receiver filters. As $\Delta F^{(u)}$ is much smaller than 1/T, the variation of the phase error over the impulse response duration of the receiver filter can be safely neglected. The transfer function of this equivalent channel is then given by

$$H_{eq}^{(u)}(f; t_k) = |P(f)|^2 \, e^{j\Phi^{(u)}(t_k)}. \tag{4}$$


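Assuming a sufficient cyclic prefix length, $I_{q,q'}^{u,u'}$ finally reduces to

$$I_{q,q'}^{u,u'} = \frac{1}{Q} \sum_{p=0}^{Q-1} \sum_{p'=0}^{Q-1} c_p^{q*} c_{p'}^{q'} \, \frac{1}{N} \sum_{k=0}^{N-1} e^{j2\pi k (M_{p'}^{u'} - M_p^{u})/N} \, G_{M_{p'}^{u'}}^{(u')}(t_k), \tag{5}$$

where

$$G_n^{(u')}(t_k) = \frac{1}{T} \sum_{m=-\infty}^{+\infty} H_{eq}^{(u')}\!\left(\frac{n}{NT} + \frac{m}{T}; t_k\right) \tag{6}$$

is the folded transfer function of the equivalent channel defined in (4) evaluated at the frequencies n/NT.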

The quantities $I_{q,q'}^{u,u'}$, $q' = 0, \ldots, Q-1$, $u' = 0, \ldots, N_u - 1$, can be classified into several contributions. The first contribution, obtained for $q' = q$, $u' = u$, is the useful contribution. It can be decomposed into an average useful component $E\{I_{q,q}^{u,u}\}$ and a zero-mean fluctuation $I_{q,q}^{u,u} - E\{I_{q,q}^{u,u}\}$ around its average, called self-interference. The contribution obtained for ($q' \neq q$, $u' = u$) is the intrablock interference, caused by the other symbols transmitted by the desired user u. From now on, we group the self-interference and the intrablock interference, both caused by the desired user, in order to consider only one interference term called the self-user interference (SUI). The last contribution ($u' \neq u$) is the multiuser interference (MUI). To measure the performance of the system, we use the SINR, which is the ratio of the power of the average useful component to the sum of the powers of the additive noise and the interference. When CFOs are present, the SINR is degraded compared to the case with no synchronization errors. We therefore compute the SINR degradation caused by CFO. The SINR is defined as

$$\mathrm{SINR}_q^{(u)} = \frac{E_s^{(u)} P_{U,q}^{(u)}}{2N_0 + E_s^{(u)} \left(P_{\mathrm{SUI},q}^{(u)} + P_{\mathrm{MUI},q}^{(u)}\right)}, \tag{7}$$

where

$$P_{U,q}^{(u)} = \left|E\{I_{q,q}^{u,u}\}\right|^2, \tag{8}$$

$$P_{\mathrm{SUI},q}^{(u)} = E\left\{\left|I_{q,q}^{u,u} - E\{I_{q,q}^{u,u}\}\right|^2\right\} + \sum_{q'=0;\, q' \neq q}^{Q-1} E\left\{\left|I_{q,q'}^{u,u}\right|^2\right\}, \tag{9}$$

$$P_{\mathrm{MUI},q}^{(u)} = \sum_{u'=0;\, u' \neq u}^{N_u-1} \sum_{q'=0}^{Q-1} \frac{E_s^{(u')}}{E_s^{(u)}} \, E\left\{\left|I_{q,q'}^{u,u'}\right|^2\right\}. \tag{10}$$

In the absence of synchronization errors, the SINR becomes independent of the symbol index q and is given by

$$\mathrm{SINR}^{(u)}(0) = \frac{E_s^{(u)}}{2N_0}, \tag{11}$$

whereas in the presence of synchronization errors, the SINR is reduced compared to $\mathrm{SINR}^{(u)}(0)$. The degradation of the SINR compared to $\mathrm{SINR}^{(u)}(0)$, expressed in decibels, is finally given by

$$\mathrm{Deg} = -10 \log\left(\frac{P_{U,q}^{(u)}}{1 + \mathrm{SINR}^{(u)}(0)\left(P_{\mathrm{SUI},q}^{(u)} + P_{\mathrm{MUI},q}^{(u)}\right)}\right). \tag{12}$$
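The step from (7) and (11) to (12) is a division of the two SINR expressions followed by a conversion to decibels. Written out, with the logarithm taken to base 10:

```latex
\begin{align*}
\frac{\mathrm{SINR}_q^{(u)}}{\mathrm{SINR}^{(u)}(0)}
  &= \frac{E_s^{(u)} P_{U,q}^{(u)}}
          {2N_0 + E_s^{(u)}\bigl(P_{\mathrm{SUI},q}^{(u)} + P_{\mathrm{MUI},q}^{(u)}\bigr)}
     \cdot \frac{2N_0}{E_s^{(u)}} \\
  &= \frac{P_{U,q}^{(u)}}
          {1 + \mathrm{SINR}^{(u)}(0)\bigl(P_{\mathrm{SUI},q}^{(u)} + P_{\mathrm{MUI},q}^{(u)}\bigr)}, \\
\mathrm{Deg}
  &= -10 \log_{10} \frac{\mathrm{SINR}_q^{(u)}}{\mathrm{SINR}^{(u)}(0)} .
\end{align*}
```

With no CFO, the useful power equals 1 and both interference powers vanish, so the degradation is 0 dB.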

3. Impact of Carrier Frequency Offset on B-IFDMA

In this section, we investigate the effect of CFO on the performance of the two B-IFDMA variants, the joint-DFT B-IFDMA and the added-signal B-IFDMA. First, we consider the joint-DFT B-IFDMA signal.

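3.1. Joint-DFT B-IFDMA. Under the assumption of a nondispersive channel, (6) becomes

$$G_n^{(u')}(kT) = e^{j2\pi \Delta F^{(u')} kT}. \tag{13}$$

Thus, (5) reduces to

$$I_{q,q'}^{u,u'} = \frac{1}{Q} \sum_{p=0}^{Q-1} \sum_{p'=0}^{Q-1} c_p^{q*} c_{p'}^{q'} \, D_N\!\left(\frac{M_{p'}^{(u')} - M_p^{(u)}}{N} + \Delta F^{(u')} T\right), \tag{14}$$

where $D_N(x)$ is defined as

$$D_N(x) = \frac{1}{N} \sum_{n=0}^{N-1} e^{j2\pi n x} = e^{j\pi(N-1)x} \, \frac{\sin \pi N x}{N \sin \pi x}. \tag{15}$$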
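The two forms of $D_N(x)$ in (15) can be checked against each other numerically. The sketch below is illustrative; the function names are assumptions introduced here, and the closed form is given the value 1 at integer x, where the ratio of sines is undefined but the sum equals 1.

```python
import numpy as np

def dirichlet(x, N):
    """Closed form of D_N(x) in (15); returns 1 where sin(pi*x) vanishes."""
    x = np.asarray(x, dtype=float)
    den = N * np.sin(np.pi * x)
    regular = np.abs(den) > 1e-12
    val = (np.exp(1j * np.pi * (N - 1) * x) * np.sin(np.pi * N * x)
           / np.where(regular, den, 1.0))
    return np.where(regular, val, 1.0)

def dirichlet_sum(x, N):
    """Direct evaluation of (1/N) * sum_{n=0}^{N-1} exp(j*2*pi*n*x)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    n = np.arange(N)
    return np.mean(np.exp(2j * np.pi * np.outer(x, n)), axis=1)

N = 32
x = np.linspace(-0.5, 0.5, 11)   # includes x = 0
closed = dirichlet(x, N)
direct = dirichlet_sum(x, N)

# Useful-power loss in dB for a CFO of one tenth of the subcarrier spacing.
loss_db = float(-10.0 * np.log10(np.abs(dirichlet(0.1 / N, N)) ** 2))
```

The squared magnitude $|D_N(\Delta F T)|^2$ is the factor by which the useful power is attenuated, so `loss_db` gives a feel for how quickly the loss grows with the normalized offset.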

The powers of the average useful component, the self-user interference, and the multiuser interference are computed by inserting (14) in (8), (9), and (10), respectively. The details of the computation are reported in the appendix, yielding (16), (17), and (18):

$$P_U^{(u)} = \left|D_N\!\left(\Delta F^{(u)} T\right)\right|^2, \tag{16}$$

$$P_{\mathrm{SUI}}^{(u)} = A^{(u,u)}\!\left(\Delta F^{(u)}\right) - \left|D_N\!\left(\Delta F^{(u)} T\right)\right|^2, \tag{17}$$

$$P_{\mathrm{MUI}}^{(u)} = \sum_{u'=0;\, u' \neq u}^{N_u-1} \frac{E_s^{(u')}}{E_s^{(u)}} \times \left(A^{(u,u')}\!\left(\Delta F^{(u')}\right) - \left|D_N\!\left(\frac{(u'-u)M}{N} + \Delta F^{(u')} T\right)\right|^2\right). \tag{18}$$

Note that since the obtained expressions are independent of the desired symbol index q, we have dropped this index. In (17) and (18), the term $A^{(u,u')}(\Delta F^{(u')})$ is defined in (19):

$$A^{(u,u')}(f) = \frac{1}{M} \sum_{m=-(M-1)}^{M-1} (M - |m|) \times \left|D_{KM}\!\left(L\left(fT + \frac{(u'-u)M + m}{N}\right)\right)\right|^2. \tag{19}$$

Note that since $D_N(x)$ is periodic with period 1, $A^{(u,u')}(f)$ is a periodic function with period $1/LT = KM/NT$, which corresponds to the spacing between two blocks of M adjacent subcarriers. Also note that when M increases, it can be shown that the pattern of the periodic function tends to the following triangular function:

$$\Lambda(f) = \begin{cases} 1 - \left|f \dfrac{NT}{M}\right|, & |f| < \dfrac{M}{NT}, \\ 0, & \text{otherwise.} \end{cases} \tag{20}$$

Figure 2 shows the plots of $A^{(u,u)}(f)$ and $D_N(fT)$ for M = 8, L = 2, and K = 2.

In addition to the interference terms, it follows from (16) that the useful component at the FFT output is reduced compared to the case of a zero CFO. Hence, to keep the power loss within reasonable bounds, the CFO must satisfy $\Delta F^{(u)} T \ll 1/N$, which is easy to understand since the IFFT behaves like a bank of filters of bandwidth 1/(NT).

The resulting expression of the degradation for joint-DFT B-IFDMA is obtained by inserting (16), (17), and (18) in (12).
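As an illustration of how (16) to (19) combine in (12), the following sketch evaluates the degradation for joint-DFT B-IFDMA. It assumes the same CFO and the same symbol energy for all users, a base-10 logarithm in (12), and the parameter values Q = 64, K = 8, M = 8, L = 8, and SINR(0) = 25 dB; the function names are hypothetical helpers, not code from the paper.

```python
import numpy as np

def dirichlet_sq(x, N):
    """|D_N(x)|^2 with D_N as in (15); equals 1 where sin(pi*x) vanishes."""
    x = np.asarray(x, dtype=float)
    den = N * np.sin(np.pi * x)
    regular = np.abs(den) > 1e-12
    val = (np.sin(np.pi * N * x) / np.where(regular, den, 1.0)) ** 2
    return np.where(regular, val, 1.0)

def a_term(fT, du, K, M, L):
    """A^{(u,u')}(f) of (19); fT = f*T and du = u' - u."""
    N = K * M * L
    m = np.arange(-(M - 1), M)
    weights = M - np.abs(m)
    return np.sum(weights * dirichlet_sq(L * (fT + (du * M + m) / N), K * M)) / M

def joint_dft_degradation_db(dfT, u, active_users, K, M, L, sinr0_db):
    """Degradation (12) for joint-DFT B-IFDMA, equal CFO and symbol energy assumed."""
    N = K * M * L
    p_u = dirichlet_sq(dfT, N)                                 # (16)
    p_sui = a_term(dfT, 0, K, M, L) - p_u                      # (17)
    p_mui = sum(a_term(dfT, v - u, K, M, L)
                - dirichlet_sq((v - u) * M / N + dfT, N)
                for v in active_users if v != u)               # (18)
    sinr0 = 10.0 ** (sinr0_db / 10.0)
    return float(-10.0 * np.log10(p_u / (1.0 + sinr0 * (p_sui + p_mui))))

K, M, L = 8, 8, 8
N = K * M * L
deg_zero = joint_dft_degradation_db(0.0, 0, range(K), K, M, L, 25.0)
deg_small = joint_dft_degradation_db(0.01 / N, 0, range(K), K, M, L, 25.0)
```

Passing a shorter list of active users to `joint_dft_degradation_db` shows how much of the degradation remains when only the desired user transmits, that is, the SUI-only case.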

3.2. Added-Signal B-IFDMA. The added-signal B-IFDMA model for a given user comes from the superposition of M IFDMA signals, each with L subcarriers [2]. These M IFDMA signals are mutually shifted by one subcarrier bandwidth.

On the other hand, the signal model for IFDMA can be viewed as a particular case of joint-DFT B-IFDMA, where the block size M equals 1. Hence, from these two remarks and from the results obtained in Section 3.1, it is straightforward to compute the interference power expressions for the added-signal B-IFDMA. The useful power is the same as that of joint-DFT B-IFDMA, given by (16). The interference power expressions are given by (21) and (22):

$$P_{\mathrm{SUI}}^{(u)} = \sum_{m=0}^{M-1} \left(\left|D_{KM}\!\left(L\left(\Delta F^{(u)} T + \frac{m}{N}\right)\right)\right|^2 - \left|D_N\!\left(\Delta F^{(u)} T + \frac{m}{N}\right)\right|^2\right), \tag{21}$$

$$P_{\mathrm{MUI}}^{(u)} = \sum_{u'=0;\, u' \neq u}^{N_u-1} \frac{E_s^{(u')}}{E_s^{(u)}} \times \sum_{m=0}^{M-1} \left(\left|D_{KM}\!\left(L\left(\Delta F^{(u')} T + \frac{m + (u'-u)M}{N}\right)\right)\right|^2 - \left|D_N\!\left(\Delta F^{(u')} T + \frac{m + (u'-u)M}{N}\right)\right|^2\right). \tag{22}$$

The resulting expression of the degradation for added-signal B-IFDMA is obtained by inserting (16), (21), and (22) in (12).
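The same kind of evaluation can be sketched for the added-signal variant by inserting (16), (21), and (22) in (12). As before, equal CFO and equal symbol energy for all users and a base-10 logarithm are assumed, and the function names are hypothetical helpers introduced for illustration.

```python
import numpy as np

def dirichlet_sq(x, N):
    """|D_N(x)|^2 with D_N as in (15); equals 1 where sin(pi*x) vanishes."""
    x = np.asarray(x, dtype=float)
    den = N * np.sin(np.pi * x)
    regular = np.abs(den) > 1e-12
    val = (np.sin(np.pi * N * x) / np.where(regular, den, 1.0)) ** 2
    return np.where(regular, val, 1.0)

def added_signal_degradation_db(dfT, u, active_users, K, M, L, sinr0_db):
    """Degradation (12) for added-signal B-IFDMA, equal CFO and symbol energy assumed."""
    N = K * M * L
    m = np.arange(M)

    def leak(du):
        # Inner sum over m shared by (21) (du = 0) and (22) (du = u' - u).
        shift = dfT + (m + du * M) / N
        return np.sum(dirichlet_sq(L * shift, K * M) - dirichlet_sq(shift, N))

    p_u = dirichlet_sq(dfT, N)                                # (16)
    p_sui = leak(0)                                           # (21)
    p_mui = sum(leak(v - u) for v in active_users if v != u)  # (22)
    sinr0 = 10.0 ** (sinr0_db / 10.0)
    return float(-10.0 * np.log10(p_u / (1.0 + sinr0 * (p_sui + p_mui))))

K, M, L = 8, 8, 8
N = K * M * L
deg_zero = added_signal_degradation_db(0.0, 0, range(K), K, M, L, 25.0)
deg_small = added_signal_degradation_db(0.01 / N, 0, range(K), K, M, L, 25.0)
```

Setting M = 1 and L = Q in this function gives the IFDMA case, since the sum over m then contains a single term.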

3.3. Comparison of Sensitivity to CFO for Both Variants of B-IFDMA. To compare both variants of B-IFDMA in terms of sensitivity to CFO, we analyze the interference power expressions obtained in the previous sections. We start with the analysis of the SUI power. From (17) and from the shape of the functions $A^{(u,u)}(f)$ and $D_N(fT)$ given in Figure 2, it follows that to obtain small SUI power for joint-DFT B-IFDMA, $\Delta F^{(u)} T$ must be limited, that is, $\Delta F^{(u)} T \ll 1/N$. On the contrary, it follows from (21) that the SUI power for added-signal B-IFDMA is very small even for $\Delta F^{(u)} T > 1/N$. Figure 3 illustrates the SUI power as a function of $\Delta F^{(u)}$ for M = 8, L = 2, and K = 2. Let us now consider the MUI power. Note that for both variants of B-IFDMA, the interference power due to user $u'$, $u' \neq u$, can be obtained by shifting the SUI power expression in the frequency domain by $(u'-u)M/NT$ and by evaluating it at the frequency $\Delta F^{(u')}$. Hence, for the joint-DFT B-IFDMA, even when the condition $\Delta F^{(u)} T \ll 1/N$ is not satisfied, the MUI power value is small, which is not the case for added-signal B-IFDMA (see Figure 3).

Figure 2: Plot of $A^{(u,u)}(f)$ and $D_N(fT)$ for M = 8, L = 2, and K = 2.

In summary, it turns out that for the joint-DFT B-IFDMA, most of the interference comes from the SUI, whereas the added-signal B-IFDMA mostly suffers from the MUI. Numerical results are presented in Section 4 to illustrate this analysis.

4. Numerical Results

In this section, we present numerical results of SINR degradations due to CFO for the joint-DFT B-IFDMA and added-signal B-IFDMA. We assume the same CFO ΔF for all users, that is, $\Delta F^{(u')} = \Delta F$ for $u' = 0, \ldots, K-1$. We also assume that all users exhibit the same energy per symbol with Q = 64 subcarriers assigned to each user. The maximum number of users is K = 8 and SINR(0) = 25 dB.

Figure 4 shows the SINR degradation computed with (12) as a function of ΔF for the full load with M = 8 subcarriers per block and L = 8 blocks. As expected, we observe that both variants are very sensitive to CFO. Hence, in order to keep the degradation value small (say, less than 0.5 dB), it is required that ΔF < 0.01/NT.

We also observe that the joint-DFT B-IFDMA is less robust to CFO than added-signal B-IFDMA. For instance, for the same CFO of 0.03/NT, the degradation with the joint-DFT B-IFDMA is 1 dB higher than that with the added-signal B-IFDMA.

Figure 3: SUI power and MUI power for M = 8, L = 2, and K = 2. (a) SUI power as a function of $\Delta F^{(u)} T$; (b) MUI power as a function of $\Delta F^{(u')} T$; curves for joint-DFT B-IFDMA and added-signal B-IFDMA.

For the sake of comparison, we plot the degradation obtained for IFDMA systems. The considered IFDMA system has the same number of subcarriers assigned to each user (Q = 64), which are equidistantly distributed over the total bandwidth [11]. As IFDMA can be regarded as a special case of joint-DFT B-IFDMA with M = 1, it is straightforward to obtain the degradation expression.

As we observe, the degradation value for IFDMA is very close to that of the added-signal B-IFDMA. Hence, as the added-signal B-IFDMA model is obtained by superimposing IFDMA signals, the behavior of both systems is nearly the same in terms of CFO sensitivity.

In Figure 5, the degradation value is shown as a function of the number of active users for ΔF = 0.02/NT with three different sets of values of M and L. First, we consider M = 8 and L = 8, then M = 4 and L = 16, and finally M = 2 and L = 32. As already mentioned, when the load is maximum, the joint-DFT B-IFDMA is more sensitive to CFO than the added-signal B-IFDMA. However, for the joint-DFT B-IFDMA, we observe that the degradation value is near its maximum with just one active user (above all for high values of M). This means that the degradation is essentially dominated by the SUI and that the contribution of the MUI is weak. On the contrary, the MUI contribution is the dominant one for the added-signal B-IFDMA. Hence, the joint-DFT B-IFDMA is better suited than the added-signal B-IFDMA in terms of CFO sensitivity if an uplink is considered. On the other hand, for the downlink, it has been shown that the added-signal B-IFDMA is more robust to CFO than the joint-DFT B-IFDMA. Note that this general trend may no longer be valid if the number of subcarriers per block M is small. Indeed, when M decreases, the B-IFDMA signal model tends toward the IFDMA signal model, and for the particular case of M = 1, B-IFDMA corresponds to IFDMA. This is observed in Figure 5, wherein the behavior of both variants of B-IFDMA tends toward that of IFDMA when M is decreased.

Figure 4: Degradation as a function of ΔF for the full load with M = 8, L = 8 (yielding Q = 64), and K = 8 (degradation in dB versus NΔFT; curves for joint-DFT B-IFDMA, added-signal B-IFDMA, and IFDMA).

Figure 5: Degradation as a function of number of active users with ΔF = 0.02/NT, Q = 64, and K = 8 (degradation in dB; curves for joint-DFT and added-signal B-IFDMA with (M, L) = (8, 8), (4, 16), and (2, 32), and for IFDMA).


5. Conclusion

In this paper, the two variants of B-IFDMA, the joint-DFT B-IFDMA and the added-signal B-IFDMA, have been investigated in terms of carrier frequency offset (CFO) sensitivity. CFO gives rise to useful power loss together with interference, leading to performance degradation. To evaluate this performance degradation, we have determined the theoretical expressions of the SINR degradation caused by CFO at the input of the decision device. The results of the analysis have shown that the two variants of B-IFDMA behave differently in terms of CFO sensitivity. For the added-signal B-IFDMA, the multiuser interference contributions are the dominant ones. For the joint-DFT B-IFDMA, the degradation is found to be dominated by self-user interference. As a consequence, it appears that, in terms of sensitivity to CFO, joint-DFT B-IFDMA is better suited than added-signal B-IFDMA for the uplink. Indeed, multiuser interference is far more difficult to correct in the uplink than in the downlink. The numerical results have also shown that the added-signal B-IFDMA is more robust to CFO for the downlink.

Appendix

The purpose of this appendix is to give an outline of the main steps leading to the evaluation of the interference power caused by CFO for the joint-DFT B-IFDMA. A simple way to perform computations of the useful and interference power expressions is to follow an approach similar to the approach used for MC-CDMA with orthogonal spreading sequences (Walsh-Hadamard) [7, 12]. Let Q be the spreading factor. In this approach, although the used spreading sequences contain no randomness, the authors introduce randomness by assuming that each of the Q sequences can be assigned with a probability 1/Q to the first user, each of the remaining (Q − 1) sequences can be assigned with a probability 1/(Q − 1), and so on. Thus, we obtain averages over all users of the expressions for nonrandom Walsh-Hadamard sequences. On the other hand, as IFDMA can be viewed as a fully loaded spread spectrum multicarrier transmission scheme [11], where the spreading sequences are Fourier sequences, and since the Fourier sequences are also orthogonal, we can safely extend this approach to the B-IFDMA. This approach leads to using the following formulas [7]:

\[
E\left\{c_n^{q*}\, c_{n'}^{q}\right\} = \delta_{n,n'}, \tag{A.1}
\]

where $\delta_{n,n'}$ equals 1 if $n = n'$ and 0 otherwise.

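\[
E\left\{c_n^{q*}\, c_{n'}^{q}\, c_m^{q*}\, c_{m'}^{q}\right\} = \delta_{n,m}\delta_{n',m'} + \delta_{n,n'}\delta_{m,m'} - \delta_{n,n',m,m'}, \tag{A.2}
\]

where $\delta_{n,n',m,m'} = \delta_{n,n'}\delta_{m,m'}\delta_{n,m}$,

\[
E\left\{c_n^{q*}\, c_{n'}^{q'}\, c_m^{q*}\, c_{m'}^{q'}\right\} = \delta_{n,m}\delta_{n',m'} - \frac{1}{Q-1}\left(\delta_{n,n'}\delta_{m,m'} - \delta_{n,n',m,m'}\right). \tag{A.3}
\]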
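As a small numerical illustration of (A.1), the sketch below averages the product of two sequence elements over a uniformly chosen sequence index q. It assumes unit-modulus Fourier sequences c_n^q = exp(j2πqn/Q); this normalization and the function names are assumptions made for the example and are not taken from the paper.

```python
import cmath

def fourier_element(q, n, Q):
    """Element n of the q-th Fourier sequence of length Q (unit modulus, assumed normalization)."""
    return cmath.exp(2j * cmath.pi * q * n / Q)

def average_correlation(n, n_prime, Q):
    """Average of conj(c_n^q) * c_{n'}^q over q, each of the Q sequences having probability 1/Q."""
    total = 0j
    for q in range(Q):
        total += fourier_element(q, n, Q).conjugate() * fourier_element(q, n_prime, Q)
    return total / Q

Q = 8  # small spreading factor chosen only to keep the example quick
same_index = average_correlation(3, 3, Q)       # expected to be close to 1
different_index = average_correlation(3, 5, Q)  # expected to be close to 0
```

The average only reproduces the Kronecker delta because the sum of the Q-th roots of unity vanishes; the fourth-order formulas (A.2) and (A.3) are not checked by this sketch.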

To begin with, let us focus on the useful power expression. We first use (14) in (8). Then, by using (A.1), it is straightforward to find (16). The computation of the interference power needs more stages. Let us consider the self-interference (SI) power computation. Using (14) in the first term of (9) yields a first expression. Then, using (A.2) in this expression yields after some computations (A.4):

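\[
P_{\mathrm{SI}}^{(u)} = \frac{1}{Q^2}\sum_{p=0}^{Q-1}\sum_{p'=0}^{Q-1}\left|D_N\!\left(\frac{M_{p'}^{(u)} - M_{p}^{(u)}}{N} + \Delta F^{(u)}T\right)\right|^2 - \frac{1}{Q}\left|D_N\!\left(\Delta F^{(u)}T\right)\right|^2. \tag{A.4}
\]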

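We do the same for the intrablock interference (IBI) (resp., MUI) by using (A.2) (resp., (A.3)), yielding (A.5) and (A.6):

\[
P_{\mathrm{IBI}}^{(u)} = \frac{Q-1}{Q^2}\sum_{p=0}^{Q-1}\sum_{p'=0}^{Q-1}\left|D_N\!\left(\frac{M_{p'}^{(u)} - M_{p}^{(u)}}{N} + \Delta F^{(u)}T\right)\right|^2 - \frac{Q-1}{Q}\left|D_N\!\left(\Delta F^{(u)}T\right)\right|^2, \tag{A.5}
\]

\[
P_{\mathrm{MUI}}^{(u)} = \sum_{\substack{u'=0 \\ u'\neq u}}^{N_u-1}\frac{E_s^{(u')}}{E_s^{(u)}} \times \left(\frac{1}{Q}\sum_{p=0}^{Q-1}\sum_{p'=0}^{Q-1}\left|D_N\!\left(\frac{M_{p'}^{(u')} - M_{p}^{(u)}}{N} + \Delta F^{(u')}T\right)\right|^2 - \left|D_N\!\left(\frac{(u'-u)M}{N} + \Delta F^{(u')}T\right)\right|^2\right). \tag{A.6}
\]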

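Let us now put those expressions in a concatenated form in order to facilitate their interpretation. Define $A^{(u,u')}(\Delta F^{(u')})$ as follows:

\[
A^{(u,u')}\!\left(\Delta F^{(u')}\right) = \frac{1}{Q}\sum_{p=0}^{Q-1}\sum_{p'=0}^{Q-1}\left|D_N\!\left(\frac{M_{p'}^{(u')} - M_{p}^{(u)}}{N} + \Delta F^{(u')}T\right)\right|^2. \tag{A.7}
\]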

We develop $A^{(u,u')}(\Delta F^{(u')})$ by first using the definition of the joint-DFT specific mapping given in Section 2. Hence, the summation over p (resp., p′) becomes a summation over l and m (resp., l′ and m′), where p = lM + m (resp., p′ = l′M + m′). Then, the use of (15) in (A.7) yields (A.8) after rearranging the terms:

\[
\begin{aligned}
A^{(u,u')}\!\left(\Delta F^{(u')}\right) = \frac{1}{QN^2}&\sum_{n=0}^{N-1}\sum_{n'=0}^{N-1} e^{j2\pi(n-n')\left(\Delta F^{(u')}T + ((u'-u)M/N)\right)} \\
&\times\sum_{m=0}^{M-1}\sum_{m'=0}^{M-1} e^{j(2\pi/N)(n-n')(m'-m)} \\
&\times\sum_{l'=0}^{L-1} e^{j(2\pi/L)l'(n-n')}\sum_{l=0}^{L-1} e^{-j(2\pi/L)l(n-n')}.
\end{aligned} \tag{A.8}
\]


The last summation in (A.8) reduces to

\[
\sum_{l=0}^{L-1} e^{-j(2\pi/L)l(n-n')} =
\begin{cases}
L, & \text{for } n - n' = \alpha L, \\
0, & \text{otherwise},
\end{cases} \tag{A.9}
\]

where α is an integer. Let us now decompose n (n = 0, . . . , N − 1) as n = μL + λ (μ = 0, . . . , KM − 1, λ = 0, . . . , L − 1). Similarly, n′ = μ′L + λ′. From (A.9), it follows that the last summation in (A.8) equals L only for λ = λ′. With this substitution and after some rearrangement, (A.8) becomes

\[
A^{(u,u')}\!\left(\Delta F^{(u')}\right) = \frac{1}{M}\sum_{m=0}^{M-1}\sum_{m'=0}^{M-1}\left|D_{KM}\!\left(L\left(\Delta F^{(u')}T + \frac{(u'-u)M + m' - m}{N}\right)\right)\right|^2. \tag{A.10}
\]

Finally, $A^{(u,u')}(\Delta F^{(u')})$ reduces to (19). Thus, we obtain the interference power expressions given in (17) and (18), with $A^{(u,u')}(\Delta F^{(u')})$ defined in (19).
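The orthogonality relation (A.9), on which the reduction from (A.8) to (A.10) relies, can be checked numerically. The following is a minimal sketch; the function name and the block size L = 8 are chosen for illustration only.

```python
import cmath

def block_sum(diff, L):
    """Sum over l = 0..L-1 of exp(-j*2*pi*l*diff/L), where diff stands for the integer n - n'."""
    return sum(cmath.exp(-2j * cmath.pi * l * diff / L) for l in range(L))

L = 8  # illustrative number of blocks
for diff in range(-2 * L, 2 * L + 1):
    value = block_sum(diff, L)
    # The sum is L when diff is an integer multiple of L and vanishes otherwise.
    expected = L if diff % L == 0 else 0
    assert abs(value - expected) < 1e-9
```

Because only index pairs with n − n′ a multiple of L survive, the double sum over n and n′ collapses to pairs with equal residue λ = λ′, which is the step used to obtain (A.10).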

Acknowledgments

This work has been carried out in the framework of the Campus International sur la Securite et l'Intermodalite des Transports (CISIT) project and funded by the French Ministry of Research, the Region Nord Pas de Calais, and the European Commission (FEDER funds).

References

[1] U. Sorger, I. De Broeck, and M. Schnell, “Interleaved FDMA - a new spread-spectrum multiple-access scheme,” in Proceedings of the IEEE International Conference on Communications (ICC ’98), vol. 2, pp. 1013–1017, Atlanta, Ga, USA, June 1998.

[2] T. Svensson, T. Frank, D. Falconer, M. Sternad, E. Costa, and A. Klein, “B-IFDMA - a power efficient multiple access scheme for non-frequency-adaptive transmission,” in Proceedings of the 16th IST Mobile and Wireless Communications Summit, pp. 1–5, Budapest, Hungary, July 2007.

[3] T. Frank, A. Klein, and E. Costa, “An efficient implementation for block-IFDMA,” in Proceedings of the 18th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC ’07), pp. 1–5, Athens, Greece, September 2007.

[4] T. Pollet, M. Van Bladel, and M. Moeneclaey, “BER sensitivity of OFDM systems to carrier frequency offset and Wiener phase noise,” IEEE Transactions on Communications, vol. 43, no. 2–4, pp. 191–193, 1995.

[5] H. Steendam and M. Moeneclaey, “The effect of carrier frequency offsets on downlink and uplink MC-DS-CDMA,” IEEE Journal on Selected Areas in Communications, vol. 19, no. 12, pp. 2528–2536, 2001.

[6] H. Steendam and M. Moeneclaey, “The sensitivity of MC-CDMA to synchronisation errors,” European Transactions on Telecommunications, vol. 10, no. 4, pp. 429–436, 1999.

[7] H. Steendam and M. Moeneclaey, “The effect of carrier phase jitter on MC-CDMA performance,” IEEE Transactions on Communications, vol. 47, no. 2, pp. 195–198, 1999.

[8] E. Simon, R. Legouable, M. Helard, and M. Lienard, “Impact of phase and timing jitter on IFDMA systems,” European Transactions on Telecommunications, vol. 19, no. 6, pp. 697–705, 2008.

[9] J. Choi, C. Lee, H. W. Jung, and Y. H. Lee, “Carrier frequency offset compensation for uplink of OFDM-FDMA systems,” IEEE Communications Letters, vol. 4, no. 12, pp. 414–416, 2000.

[10] Z. Cao, U. Tureli, and Y.-D. Yao, “Deterministic multiuser carrier-frequency offset estimation for interleaved OFDMA uplink,” IEEE Transactions on Communications, vol. 52, no. 9, pp. 1585–1594, 2004.

[11] M. Schnell, I. De Broeck, and U. Sorger, “A promising new wideband multiple-access scheme for future mobile communications systems,” European Transactions on Telecommunications, vol. 10, no. 4, pp. 417–427, 1999.

[12] H. Steendam and M. Moeneclaey, “MC-CDMA performance in the presence of timing errors,” in Proceedings of the 2nd Conference on Telecommunications (ConfTele ’99), pp. 211–215, Sesimbra, Portugal, April 1999.


Research Article

Effects of Carrier Frequency Offset, Timing Offset, and Channel Spread Factor on the Performance of Hexagonal Multicarrier Modulation Systems

Kui Xu and Yuehong Shen

Institute of Communications Engineering, PLA University of Science and Technology, No. 2 Biaoying Road, Yudao Street, Nanjing 210007, China

Correspondence should be addressed to Kui Xu, xiancheng [email protected]

Received 17 May 2008; Revised 6 October 2008; Accepted 31 January 2009

Recommended by Mounir Ghogho

The hexagonal multicarrier modulation (HMM) system is the technique of choice to overcome the impact of time-frequency dispersive transmission channels. This paper examines the effects of insufficient synchronization (carrier frequency offset, timing offset) on the amplitude and phase of the demodulated symbol by using a projection receiver in hexagonal multicarrier modulation systems. Furthermore, the effects of CFO, TO, and channel spread factor on the signal-to-interference-plus-noise ratio (SINR) performance of hexagonal multicarrier modulation systems are discussed. The exact SINR expression versus insufficient synchronization and channel spread factor is derived. Theoretical analysis shows that insufficient synchronization degrades the symbol amplitude and phase in a manner similar to traditional cyclic prefix orthogonal frequency-division multiplexing (CP-OFDM) transmission. Our theoretical analysis is confirmed by numerical simulations in a doubly dispersive (DD) channel with exponential delay power profile and U-shape Doppler power spectrum, showing that HMM systems outperform traditional CP-OFDM systems with respect to SINR against ISI/ICI caused by insufficient synchronization and the doubly dispersive channel.

Copyright © 2009 K. Xu and Y. Shen. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

Multicarrier modulation (MCM) is a popular transmission scheme in which the data stream is split into several substreams and transmitted, in parallel, on different subcarriers. We consider MCM over time-varying multipath propagation channels which spread the MCM signal simultaneously in both the time and frequency domains. This spreading induces both intersymbol interference (ISI) and intercarrier interference (ICI), which complicate data demodulation. We will refer to channels that are time dispersive and frequency dispersive as doubly dispersive (DD) channels.

Orthogonal frequency-division multiplexing (OFDM) systems with guard time interval or cyclic prefix can prevent ISI, but do not combat ICI because they are based on rectangular-type pulses. In order to overcome the aforementioned drawbacks of OFDM systems, several pulse-shaping OFDM systems were proposed [1–15]. Most works on pulse design exclusively dealt with rectangular time-frequency (TF) lattices. It is shown that transmission in rectangular lattices is suboptimal for doubly dispersive channels [9]. By using sphere covering theory, the authors of [9] have demonstrated that lattice OFDM (LOFDM) systems, which are OFDM systems based on hexagonal-type lattices, provide better performance against ISI/ICI. However, LOFDM confines the transmission pulses to a set of orthogonal ones. As pointed out in [2, 10, 13, 16], these orthogonalized pulses destroy the time-frequency (TF) concentration of the initial pulses and hence lower the robustness to the time and frequency dispersion caused by the propagation channel.

In [16], the authors abandoned the orthogonality condition of the modulated pulses and proposed a multicarrier transmission scheme on a hexagonal lattice, named hexagonal multicarrier modulation (HMM), by regarding signal transmission as tiling of the TF plane. To optimally combat the impact of the propagation channels, the lattice parameters and the pulse shape of the modulation waveform are jointly optimized to adapt to the channel scattering function.


It is shown that the hexagonal multicarrier transmission systems obtain lower energy perturbation and hence outperform OFDM and LOFDM systems in terms of robustness against channel dispersion.

Synchronization is a key factor in the design of multicarrier modulation system receivers. The synchronization precision significantly affects the receiver performance and usually depends on the precision of carrier frequency offset estimation and symbol timing. A generalized framework for the prediction of OFDM system performance in the presence of carrier frequency offset (CFO) and timing offset (TO), including the transmitter and receiver pulse shapes as well as the channel, is presented in [17]. A lower bound on the signal-to-interference-plus-noise ratio (SINR) performance under Doppler spread in OFDM systems is studied in [18].

In this paper, our attention is focused on the analysis of the effects of CFO and TO on the amplitude and phase of the demodulated symbol by using a straightforward but suboptimum projection receiver [2, 9, 10, 12, 13] in hexagonal multicarrier modulation systems. Furthermore, the effects of CFO, TO, and channel spread factor on the SINR performance of hexagonal multicarrier modulation systems are discussed. The exact SINR expression versus CFO, TO, and channel spread factor is derived. Both theoretical analysis and simulation results show that insufficient synchronization degrades the symbol amplitude and phase in a manner similar to CP-OFDM transmission. Our theoretical analysis is confirmed by numerical simulations, showing that HMM systems outperform traditional CP-OFDM systems with respect to SINR against ISI/ICI caused by CFO, TO, and the doubly dispersive channel.

2. Signal Transmission and TF Lattice

It is shown in [16, 19] that signal transmission can be viewed as tiling of the TF plane. In practice, almost all communication systems transmit the information symbols in a regular way, and the underlying tiling forms a lattice in the TF plane. In a nutshell, a lattice V in K-dimensional Euclidean space $\mathbb{R}^K$ is a set of points arranged in a highly regular manner. Since we consider the signal transmission in the TF plane in this correspondence, we confine our attention to the two-dimensional (2D) case.

Specifically, in an OFDM system with symbol period T and subcarrier separation F, the transmission functions of the OFDM system consist of translations and modulations of a single prototype g(t), which constitute a Weyl-Heisenberg system and create a 2D rectangular lattice with generator matrix

\[
V = \begin{bmatrix} T & 0 \\ 0 & F \end{bmatrix}. \tag{1}
\]

Conventional time-division multiplex (TDM) mode and frequency-division multiplex (FDM) mode can be viewed as transmission on a one-dimensional (1D) lattice along the time axis and frequency axis, with generators $[T \;\; 0]^T$ and $[0 \;\; F]^T$, respectively, where the superscript $(\cdot)^T$ represents the transpose.

The lattice density is given by $\rho = 1/\sqrt{\det(V^T V)}$, where det(·) denotes the determinant. The quantity ρ corresponds to the symbol density in the TF plane, which is known as the signaling efficiency and represents the number of symbols per second per hertz. For signal transmission with general transmission pattern V, the transmitted signal can be expressed as

\[
s(t) = \sum_{z} c_z\, g(t, Vz), \tag{2}
\]

where $c_z$ is the data symbol indexed by z, which is usually taken from a specific signal constellation and assumed to be independent and identically distributed (i.i.d.) with zero mean and average power $\sigma_c^2$; $g(t, Vz)$ is the modulation pulse associated with $c_z$ and $z = [m, n]^T$, $m \in \mathcal{M}$, $n \in \mathcal{N}$, while m and n can be regarded as the generalized time index and subcarrier index, respectively. Moreover, $\mathcal{M}$ and $\mathcal{N}$ denote the sets from which m and n can be taken.

It is well known that when a signal is transmitted over a mobile radio channel, the energy of one data symbol will spread out to neighboring symbols due to the time and frequency dispersion, which produces ISI/ICI and degrades the system performance. In the view of signal transmission on a lattice in the TF plane, the system performance is mainly determined by two factors:

(i) the time-frequency localization of pulse shape g(t);

(ii) the distance between adjacent time-frequency lattice points.

A better TF-concentrated pulse leads to more robustness against energy leakage, and the larger the distance between lattice points, the smaller the perturbation among the transmitted symbols.

3. Hexagonal Multicarrier Transmission System [16]

It is well known that the Gaussian pulse

\[
g_\sigma(t) = \left(\frac{2}{\sigma}\right)^{1/4} e^{-(\pi/\sigma)t^2} \tag{3}
\]

has the best energy concentration in the sense that it achieves equality in the Heisenberg uncertainty principle $W_t W_f \geq 1/4\pi$, where $W_t^2$ and $W_f^2$ are the centralized temporal and spectral second-order moments, respectively [20]. By the Heisenberg uncertainty principle, no signal can be arbitrarily concentrated in the time and frequency directions simultaneously, which suggests that any signal must occupy some area in the TF plane.

The product $W_t W_f$ characterizes the energy concentration of a pulse in the TF plane. The smaller the value of $W_t W_f$ is, the more concentrated the pulse will be. Hence, the Gaussian pulse is the natural choice as modulation waveform, in an attempt to achieve minimum energy perturbation over TF dispersive channels. Note that the parameter σ determines the energy distribution of the Gaussian pulse in the time and frequency directions. To be more specific, we have $\sigma = W_t/W_f$.

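The ambiguity function of the prototype is defined by

\[
\begin{aligned}
A_{g_\sigma}(\tau, \upsilon) &= \int_{-\infty}^{\infty} g_\sigma(t)\, g_\sigma^{*}(t-\tau)\, e^{-j2\pi\upsilon t}\, dt \\
&= e^{-(\pi/2)\left((1/\sigma)\tau^2 + \sigma\upsilon^2\right)}\, e^{-j\pi\tau\upsilon},
\end{aligned} \tag{4}
\]

where $(\cdot)^{*}$ denotes the complex conjugate. It can be viewed as the 2D correlation between g(t) and its version shifted by τ in time and υ in frequency in the TF plane. We can conclude from (4) that the contours of constant magnitude of the ambiguity function of the Gaussian pulse are ellipses in the TF plane.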
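The closed form in (4) can be checked against a direct numerical evaluation of the defining integral. The sketch below uses a plain midpoint rule; the function names, the integration window, and the normalized test values (σ = 1, τ = 0.3, υ = 0.4) are assumptions made for illustration.

```python
import cmath
import math

def gaussian_pulse(t, sigma):
    """Gaussian prototype of (3): (2/sigma)^(1/4) * exp(-(pi/sigma) * t^2)."""
    return (2.0 / sigma) ** 0.25 * math.exp(-math.pi * t * t / sigma)

def ambiguity_numeric(tau, nu, sigma, half_width=8.0, steps=4000):
    """Midpoint-rule approximation of the integral definition in (4).

    The pulse is real, so the complex conjugate is omitted. The window is
    centered at tau/2, where the product g(t) g(t - tau) is concentrated.
    """
    scale = math.sqrt(sigma)
    start = tau / 2.0 - half_width * scale
    h = 2.0 * half_width * scale / steps
    total = 0j
    for k in range(steps):
        t = start + (k + 0.5) * h
        total += (gaussian_pulse(t, sigma) * gaussian_pulse(t - tau, sigma)
                  * cmath.exp(-2j * math.pi * nu * t))
    return total * h

def ambiguity_closed_form(tau, nu, sigma):
    """Closed form of (4)."""
    magnitude = math.exp(-(math.pi / 2.0) * (tau * tau / sigma + sigma * nu * nu))
    return magnitude * cmath.exp(-1j * math.pi * tau * nu)

difference = abs(ambiguity_numeric(0.3, 0.4, 1.0) - ambiguity_closed_form(0.3, 0.4, 1.0))
```

With these settings the two evaluations should agree to well within the accuracy of the quadrature, and the value at (0, 0) equals the pulse energy, which is 1 for the normalization in (3).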

As pointed out in [16], for a given signaling efficiency, the information-bearing pulses arranged on a hexagonal lattice can be separated as sufficiently as possible in the TF plane. An example of a hexagonal transmission pattern

\[
V_{\mathrm{hex}} = \begin{bmatrix} \dfrac{T}{2} & 0 \\[2mm] \dfrac{F}{2} & F \end{bmatrix} \tag{5}
\]

which is named hexagonal multicarrier modulation is illustrated in Figure 1.

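For signal transmission with transmission pattern $V_{\mathrm{hex}}$, the transmitted signal can be expressed as

\[
\begin{aligned}
s(t) &= \sum_{z} c_z\, g_\sigma(t, V_{\mathrm{hex}}z) \\
&= \sum_{m}\sum_{n} c_{m,n}\, g_\sigma\!\left(t - m\frac{T}{2}\right) e^{j2\pi(m+2n)(F/2)t},
\end{aligned} \tag{6}
\]

where $g(t, V_{\mathrm{hex}}z)$ is the modulation pulse associated with $c_z$. The signaling efficiency can be easily calculated as $\rho = 1/\sqrt{\det(V_{\mathrm{hex}}^T V_{\mathrm{hex}})} = 2/TF$.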
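The signaling efficiency of the rectangular pattern (1) and of the hexagonal pattern (5) can be compared with a few lines of code. This is a minimal sketch; the function name is hypothetical, and the values T = 10^−4 s and F = 25 kHz are the ones used in the simulations of Section 6.

```python
import math

def lattice_density(V):
    """Density 1/sqrt(det(V^T V)) of a 2D lattice with 2x2 generator matrix V.

    V is given as [[a, b], [c, d]]; for a square matrix, det(V^T V) = det(V)^2.
    """
    (a, b), (c, d) = V
    det_v = a * d - b * c
    return 1.0 / math.sqrt(det_v * det_v)

T = 1e-4   # symbol period in seconds
F = 25e3   # subcarrier separation in hertz

rho_rect = lattice_density([[T, 0.0], [0.0, F]])        # rectangular lattice of (1): 1/(T F)
rho_hex = lattice_density([[T / 2, 0.0], [F / 2, F]])   # hexagonal lattice of (5): 2/(T F)
```

For the same T and F, the pattern in (5) places twice as many symbols per unit area of the TF plane as the pattern in (1), which matches the factor 2 in ρ = 2/TF.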

It is shown in [16] that the symbol energy perturbation function is dependent on the channel scattering function and the pulse shape. To optimally mitigate ISI/ICI caused by the mobile radio channels, the choices of modulation pulse and lattice generator matrix parameters should be matched to the maximum multipath delay spread and Doppler shift. The optimal system parameters for TF dispersive channels with exponential-U scattering function can be chosen as [16]

\[
\begin{aligned}
\sigma &= \frac{W_t}{W_f} = \alpha\frac{\tau_{\mathrm{rms}}}{f_d} = \sqrt{3}\,\frac{T}{F}, \\
\sigma &= \frac{W_t}{W_f} = \alpha\frac{\tau_{\mathrm{rms}}}{f_d} = \frac{1}{\sqrt{3}}\frac{T}{F}.
\end{aligned} \tag{7}
\]

[Figure 1 plot omitted. Axes: time (horizontal) and frequency (vertical); the lattice spacings T and F are marked.]

Figure 1: Partition of the hexagonal lattice into a rectangular sublattice Vrect1 and its coset Vrect2.

The baseband doubly dispersive channel can be modeled as a random linear operator H:

\[
H[s(t)] = \int_{0}^{\tau_{\max}}\int_{-f_d}^{f_d} H(\tau, \upsilon)\, s(t-\tau)\, e^{j2\pi\upsilon t}\, d\tau\, d\upsilon, \tag{8}
\]

where H(τ, υ) is called the delay-Doppler spread function, which is the Fourier transform of the time-varying impulse response of the channel h(t, τ) with respect to t. Moreover, τmax and fd are the maximum multipath delay spread and the maximum Doppler frequency, respectively. Under the wide-sense stationary uncorrelated scattering (WSSUS) assumption, the channel is characterized by the second-order statistics:

\[
E\left[H(\tau, \upsilon)\, H^{*}(\tau_1, \upsilon_1)\right] = S_H(\tau, \upsilon)\, \delta(\tau - \tau_1)\, \delta(\upsilon - \upsilon_1), \tag{9}
\]

where E[·] denotes the expectation and $S_H(\tau, \upsilon)$ is called the scattering function, which characterizes the statistics of the WSSUS channel. We assume that E[H(τ, υ)] = 0. Without loss of generality, we use $\int_{0}^{\tau_{\max}}\int_{-f_d}^{f_d} S_H(\tau, \upsilon)\, d\tau\, d\upsilon = 1$, which means that the channel has no overall path loss.

It is shown in Figure 1 that the original hexagonal lattice can be expressed as the disjoint union of a rectangular sublattice Vrect1 and its coset Vrect2. The transmitted signal (6) can be expressed as

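\[
\begin{aligned}
s(t) &= \sum_{m}\sum_{n}\left[c^{1}_{m,n}\, g(t - mT)\, e^{j2\pi nFt} + c^{2}_{m,n}\, g\!\left(t - \left(m + \frac{1}{2}\right)T\right) e^{j2\pi(n+1/2)Ft}\right] \\
&= \sum_{m}\sum_{n}\left[c^{1}_{m,n}\, g^{1}_{m,n}(t) + c^{2}_{m,n}\, g^{2}_{m,n}(t)\right],
\end{aligned} \tag{10}
\]

where $c^{1}_{m,n}$ and $c^{2}_{m,n}$ represent the symbols coming from Vrect1 and Vrect2, respectively.

The received signal is

\[
r(t) = H[s(t)] + n(t), \tag{11}
\]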


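where n(t) is the AWGN having variance $\sigma_n^2$. To obtain the data symbol $c^{i}_{m,n}$, the receiver [2, 9, 10, 12, 13] projects on $g^{i}_{m,n}(t)$, i = 1, 2, that is,

\[
\begin{aligned}
\hat{c}^{\,i}_{m,n} &= \left\langle r(t),\, g^{i}_{m,n}(t)\right\rangle \\
&= \left\langle H[s(t)],\, g^{i}_{m,n}(t)\right\rangle + \left\langle n(t),\, g^{i}_{m,n}(t)\right\rangle \\
&= \sum_{j}\sum_{m',n'} c^{j}_{m',n'}\left\langle H\!\left[g^{j}_{m',n'}(t)\right],\, g^{i}_{m,n}(t)\right\rangle + \left\langle n(t),\, g^{i}_{m,n}(t)\right\rangle.
\end{aligned} \tag{12}
\]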

4. Effects of Nonideal Transmission Conditions

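Without loss of generality, we assume a timing offset Δt and a carrier frequency offset Δf; the received data symbol obtained by using a projection receiver [2, 9, 10, 12, 13] can be expressed as

\[
\begin{aligned}
\hat{c}^{\,i}_{m,n} &= \left\langle e^{j2\pi\Delta f t}\, r(t),\, g^{i}_{m,n}(t - \Delta t)\right\rangle \\
&= \sum_{j}\sum_{m',n'} c^{j}_{m',n'}\left\langle e^{j2\pi\Delta f t} H\!\left[g^{j}_{m',n'}(t)\right],\, g^{i}_{m,n}(t - \Delta t)\right\rangle + \left\langle n(t),\, g^{i}_{m,n}(t - \Delta t)\right\rangle \\
&= c^{i}_{m,n}\left\langle e^{j2\pi\Delta f t} H\!\left[g^{i}_{m,n}(t)\right],\, g^{i}_{m,n}(t - \Delta t)\right\rangle \\
&\quad + \underbrace{\begin{aligned}[t]
&\sum_{m'\neq m,\, n'\neq n} c^{i}_{m',n'}\left\langle e^{j2\pi\Delta f t} H\!\left[g^{i}_{m',n'}(t)\right],\, g^{i}_{m,n}(t - \Delta t)\right\rangle \\
&+ \sum_{m',n',\, j\neq i} c^{j}_{m',n'}\left\langle e^{j2\pi\Delta f t} H\!\left[g^{j}_{m',n'}(t)\right],\, g^{i}_{m,n}(t - \Delta t)\right\rangle + \left\langle n(t),\, g^{i}_{m,n}(t - \Delta t)\right\rangle
\end{aligned}}_{\xi_{N+I}} \\
&= e^{-j2\pi\left(\Delta f (m + (i-1)/2)T + (n + (i-1)/2)F\Delta t\right)} \times c^{i}_{m,n}\, A_H(\tau_{\max}, f_d, \Delta t, \Delta f) + \xi_{N+I},
\end{aligned} \tag{13}
\]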
(13)

where

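\[
\begin{aligned}
A_H(\tau_{\max}, f_d, \Delta t, \Delta f) = \int_{0}^{\tau_{\max}}\int_{-f_d}^{f_d} &H(\tau, \upsilon)\, A_g^{*}(\tau + \Delta t, \upsilon + \Delta f) \\
&\cdot e^{j2\pi\upsilon(m + (i-1)/2)T}\, e^{-j2\pi(n + (i-1)/2)F\tau}\, d\tau\, d\upsilon.
\end{aligned} \tag{14}
\]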

The demodulated signal now consists of a useful portion and disturbances $\xi_{N+I}$ caused by ISI, ICI, and AWGN. Concerning the useful portion, the transmitted symbols $c^{i}_{m,n}$ are attenuated by $A_H(\tau_{\max}, f_d, \Delta t, \Delta f)$, which is caused by the doubly dispersive channel, the timing offset, and the carrier frequency offset. Meanwhile, the transmitted symbols are rotated by a time-variant phasor

\[
\varphi = -j2\pi\left(\Delta f\left(m + \frac{(i-1)}{2}\right)T + \left(n + \frac{(i-1)}{2}\right)F\Delta t\right). \tag{15}
\]
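To make the rotation in (15) concrete, the sketch below evaluates the phasor exp(φ) applied to the symbol with time index m, subcarrier index n, and sublattice index i. The function name and the example offset are hypothetical; T and F are the values used in Section 6.

```python
import cmath
import math

def symbol_rotation(m, n, i, dt, df, T, F):
    """Unit-modulus factor exp(phi), with phi taken from (15).

    m, n: time and subcarrier indices; i: sublattice index (1 or 2);
    dt: timing offset in seconds; df: carrier frequency offset in hertz.
    """
    shift = (i - 1) / 2.0  # 0 for the sublattice, 1/2 for its coset
    angle = -2.0 * math.pi * (df * (m + shift) * T + (n + shift) * F * dt)
    return cmath.exp(1j * angle)

T = 1e-4   # symbol period in seconds
F = 25e3   # subcarrier separation in hertz

# Example: a CFO of a quarter of 1/T and no timing offset rotates the symbol
# at m = 1, n = 0 on the first sublattice by -90 degrees.
rotation = symbol_rotation(1, 0, 1, 0.0, 0.25 / T, T, F)
```

The CFO contribution grows with the time index m, which is why the rotation is time variant, while the TO contribution grows with the subcarrier index n.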

5. Effects of TO, CFO, and DD Channels on SINR

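The energy of the received signal with TO and CFO over DD channels can be expressed as

\[
E_r(\Delta t, \Delta f) = E\left\{\left|\sum_{j}\sum_{m',n'} c^{j}_{m',n'}\left\langle e^{j2\pi\Delta f t} H\!\left[g^{j}_{m',n'}(t)\right],\, g^{i}_{m,n}(t - \Delta t)\right\rangle + \left\langle n(t),\, g^{i}_{m,n}(t - \Delta t)\right\rangle\right|^2\right\}. \tag{16}
\]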

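Using the assumptions on the transmitted symbols and the WSSUS channel, we get from (16) that

\[
\begin{aligned}
E_r(\Delta t, \Delta f) = \sigma_c^2\int_{\tau}\int_{\upsilon} S_H(\tau, \upsilon) \times \Bigg[\sum_{m,n}\Bigg(&\left|A_g(mT + \tau - \Delta t,\, nF + \upsilon + \Delta f)\right|^2 \\
&+ \left|A_g\!\left(\left(m + \tfrac{1}{2}\right)T + \tau - \Delta t,\, \left(n + \tfrac{1}{2}\right)F + \upsilon + \Delta f\right)\right|^2\Bigg)\Bigg]\, d\tau\, d\upsilon \\
&+ \sigma_n^2\left|A_g(0, 0)\right|.
\end{aligned} \tag{17}
\]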

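Let $E_s(\Delta t, \Delta f, \tau_{\mathrm{rms}}, f_d)$ denote the signal energy

\[
E_s(\Delta t, \Delta f, \tau_{\mathrm{rms}}, f_d) = \sigma_c^2\int_{\tau}\int_{\upsilon} S_H(\tau, \upsilon)\left|A_g(\tau - \Delta t,\, \upsilon + \Delta f)\right|^2 d\tau\, d\upsilon. \tag{18}
\]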

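Moreover, let $E_N(\Delta t, \Delta f, \tau_{\mathrm{rms}}, f_d)$ denote the interference-plus-noise energy

\[
\begin{aligned}
E_N(\Delta t, \Delta f, \tau_{\mathrm{rms}}, f_d) = \sigma_c^2\int_{\tau}\int_{\upsilon} S_H(\tau, \upsilon) \times \Bigg[\sum_{z=[m,n]^T\neq[0,0]^T}\Bigg(&\left|A_g(mT + \tau - \Delta t,\, nF + \upsilon + \Delta f)\right|^2 \\
&+ \left|A_g\!\left(\left(m + \tfrac{1}{2}\right)T + \tau - \Delta t,\, \left(n + \tfrac{1}{2}\right)F + \upsilon + \Delta f\right)\right|^2\Bigg)\Bigg]\, d\tau\, d\upsilon \\
&+ \sigma_n^2\left|A_g(0, 0)\right|.
\end{aligned} \tag{19}
\]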

We consider a DD channel with exponential delay power profile and U-shape Doppler power spectrum, whose scattering function is [21]

\[
S_H(\tau, \upsilon) = \frac{1}{\sqrt{1 - (\upsilon/f_d)^2}}\cdot\frac{e^{-\tau/\tau_{\mathrm{rms}}}}{\tau_{\mathrm{rms}}}\cdot\frac{1}{\pi f_d} \tag{20}
\]

with τ ≥ 0, |υ| < fd, where τrms denotes the rms delay spread and fd denotes the maximal Doppler spread.
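As a short consistency check, the scattering function in (20) satisfies the unit-energy normalization assumed for the channel, provided the delay integration is extended to infinity (an assumption made here, in line with the unbounded exponential profile). The two factors integrate separately:

\[
\begin{aligned}
\int_{0}^{\infty}\frac{1}{\tau_{\mathrm{rms}}}\, e^{-\tau/\tau_{\mathrm{rms}}}\, d\tau &= 1, \\
\int_{-f_d}^{f_d}\frac{d\upsilon}{\pi f_d\sqrt{1 - (\upsilon/f_d)^2}} &= \frac{1}{\pi}\Big[\arcsin(\upsilon/f_d)\Big]_{-f_d}^{f_d} = 1,
\end{aligned}
\]

so that $\int\!\!\int S_H(\tau, \upsilon)\, d\tau\, d\upsilon = 1$. The substitution υ = fd sin θ used in the second line also removes the integrable singularities at υ = ±fd, which is convenient for numerical evaluation.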


[Figure 2 plots omitted. Three panels (SNR = 10 dB, 20 dB, and 30 dB) show SINR (dB) versus timing offset over [−0.5, 0.5]. Curves: HMM and OFDM, each for τrms fd = 0.02, 0.01, and 0.005.]

Figure 2: SINR for hexagonal multicarrier system for Δt ∈ [−0.5, 0.5].

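Upon substituting the scattering function (20) into (18) and (19), we have

\[
\begin{aligned}
E_s(\Delta t, \Delta f, \tau_{\mathrm{rms}}, f_d) &= \frac{\sigma_c^2}{\pi\tau_{\mathrm{rms}} f_d}\int_{0}^{\infty} e^{-\tau/\tau_{\mathrm{rms}}}\, e^{-(\pi/\sigma)(\tau - \Delta t)^2}\, d\tau\int_{-f_d}^{f_d}\frac{e^{-\sigma\pi(\upsilon + \Delta f)^2}}{\sqrt{1 - (\upsilon/f_d)^2}}\, d\upsilon, \\
E_N(\Delta t, \Delta f, \tau_{\mathrm{rms}}, f_d) &= \frac{\sigma_c^2}{\pi\tau_{\mathrm{rms}} f_d}\Bigg\{\sum_{(m,n)\neq(0,0)}\int_{0}^{\infty} e^{-\tau/\tau_{\mathrm{rms}}}\, e^{-\pi(mT + \tau - \Delta t)^2/\sigma}\, d\tau \times\int_{-f_d}^{f_d}\frac{e^{-\sigma\pi(nF + \upsilon + \Delta f)^2}}{\sqrt{1 - (\upsilon/f_d)^2}}\, d\upsilon \\
&\quad + \sum_{(m,n)\neq(0,0)}\int_{0}^{\infty} e^{-\tau/\tau_{\mathrm{rms}}}\, e^{-\pi((m+1/2)T + \tau - \Delta t)^2/\sigma}\, d\tau \times\int_{-f_d}^{f_d}\frac{e^{-\sigma\pi((n+1/2)F + \upsilon + \Delta f)^2}}{\sqrt{1 - (\upsilon/f_d)^2}}\, d\upsilon\Bigg\} + \sigma_n^2\left|A_g(0, 0)\right|.
\end{aligned} \tag{21}
\]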

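The SINR of the received signal can be expressed as

\[
R_{\mathrm{SIN}}(\Delta t, \Delta f, \tau_{\mathrm{rms}}, f_d) = \frac{E_s(\Delta t, \Delta f, \tau_{\mathrm{rms}}, f_d)}{E_N(\Delta t, \Delta f, \tau_{\mathrm{rms}}, f_d)}. \tag{22}
\]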

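Plugging (21) into (22), we find

\[
\begin{aligned}
R_{\mathrm{SIN}}(\Delta t, \Delta f, \tau_{\mathrm{rms}}, f_d) &= \frac{\sigma_c^2}{\pi\tau_{\mathrm{rms}} f_d}\int_{0}^{\infty} e^{-\tau/\tau_{\mathrm{rms}} - (\pi/\sigma)(\tau - \Delta t)^2}\, d\tau\int_{-f_d}^{f_d}\frac{e^{-\sigma\pi(\upsilon + \Delta f)^2}}{\sqrt{1 - (\upsilon/f_d)^2}}\, d\upsilon \\
&\quad\cdot\Bigg(\frac{\sigma_c^2}{\pi\tau_{\mathrm{rms}} f_d}\Bigg\{\sum_{(m,n)\neq(0,0)}\int_{0}^{\infty} e^{-\tau/\tau_{\mathrm{rms}} - \pi(mT + \tau - \Delta t)^2/\sigma}\, d\tau \times\int_{-f_d}^{f_d}\frac{e^{-\sigma\pi(nF + \upsilon + \Delta f)^2}}{\sqrt{1 - (\upsilon/f_d)^2}}\, d\upsilon \\
&\qquad + \sum_{(m,n)\neq(0,0)}\int_{0}^{\infty} e^{-\tau/\tau_{\mathrm{rms}} - \pi((m+1/2)T + \tau - \Delta t)^2/\sigma}\, d\tau \times\int_{-f_d}^{f_d}\frac{e^{-\sigma\pi((n+1/2)F + \upsilon + \Delta f)^2}}{\sqrt{1 - (\upsilon/f_d)^2}}\, d\upsilon\Bigg\} \\
&\qquad + \sigma_n^2\left|A_g(0, 0)\right|\Bigg)^{-1}.
\end{aligned} \tag{23}
\]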

Equation (23) indicates that $R_{\mathrm{SIN}}(\Delta t, \Delta f, \tau_{\mathrm{rms}}, f_d)$ can be modeled as a function of CFO, TO, and the channel spread factor τrms fd.
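A rough numerical evaluation of (23) can be sketched as follows. This is not the simulation code behind the figures: the function names, the midpoint-rule quadrature, the truncation of the lattice sums to |m|, |n| ≤ 3, the definition SNR = σc²/σn², and the example channel values (τrms = 5 µs, fd = 2 kHz) are all assumptions made for illustration. Both lattice sums exclude (m, n) = (0, 0), as printed in (21). The Doppler integral uses the substitution υ = fd sin θ to avoid the endpoint singularities.

```python
import math

def delay_integral(shift, sigma, tau_rms, steps=4000):
    """(1/tau_rms) * int_0^inf exp(-tau/tau_rms) exp(-pi (shift + tau)^2 / sigma) dtau."""
    upper = 30.0 * tau_rms  # the exponential delay profile is negligible beyond this point
    h = upper / steps
    total = 0.0
    for k in range(steps):
        tau = (k + 0.5) * h
        total += math.exp(-tau / tau_rms - math.pi * (shift + tau) ** 2 / sigma)
    return total * h / tau_rms

def doppler_integral(shift, sigma, f_d, steps=2000):
    """(1/(pi f_d)) * int exp(-sigma pi (shift + nu)^2) / sqrt(1 - (nu/f_d)^2) dnu over (-f_d, f_d)."""
    h = math.pi / steps
    total = 0.0
    for k in range(steps):
        theta = -math.pi / 2.0 + (k + 0.5) * h  # nu = f_d * sin(theta)
        total += math.exp(-sigma * math.pi * (shift + f_d * math.sin(theta)) ** 2)
    return total * h / math.pi

def sinr_db(dt, df, tau_rms, f_d, T, F, snr_db, span=3):
    """SINR of (23) in dB, with sigma = T / (sqrt(3) F) and unit symbol power."""
    sigma = T / (math.sqrt(3.0) * F)
    sigma_c2 = 1.0
    sigma_n2 = sigma_c2 / 10.0 ** (snr_db / 10.0)  # assumed SNR definition
    e_s = sigma_c2 * delay_integral(-dt, sigma, tau_rms) * doppler_integral(df, sigma, f_d)
    interference = 0.0
    for offset in (0.0, 0.5):  # rectangular sublattice, then its coset
        d = {m: delay_integral((m + offset) * T - dt, sigma, tau_rms)
             for m in range(-span, span + 1)}
        v = {n: doppler_integral((n + offset) * F + df, sigma, f_d)
             for n in range(-span, span + 1)}
        for m in d:
            for n in v:
                if (m, n) != (0, 0):  # excluded in both sums, as in (21)
                    interference += d[m] * v[n]
    e_n = sigma_c2 * interference + sigma_n2  # |A_g(0, 0)| = 1 for the unit-energy pulse
    return 10.0 * math.log10(e_s / e_n)

# Example with T and F from Section 6 and assumed channel values (tau_rms * f_d = 0.01).
sinr_aligned = sinr_db(0.0, 0.0, 5e-6, 2e3, 1e-4, 25e3, 30.0)
sinr_offset = sinr_db(2.5e-5, 0.0, 5e-6, 2e3, 1e-4, 25e3, 30.0)
```

Because the integrals in (21) factor into a delay part and a Doppler part, each lattice term is a product of two one-dimensional integrals, which keeps the evaluation cheap. The absolute values produced by this sketch depend on the assumed channel parameters and should not be compared directly with the plotted curves.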


[Figure 3 plots omitted. Three panels (SNR = 10 dB, 20 dB, and 30 dB) show SINR (dB) versus frequency offset over [−0.5, 0.5]. Curves: HMM and OFDM, each for τrms fd = 0.02, 0.01, and 0.005.]

Figure 3: SINR for hexagonal multicarrier system for Δf ∈ [−0.5, 0.5].

6. Numerical Results and Discussion

Here, we examine the SINR performance of hexagonal multicarrier systems over a DD channel. All experiments employ a hexagonal multicarrier system with N = 80 and $\sigma = T/(\sqrt{3}F)$, and the ratio τrms/fd of the DD channel is fixed. Consequently, the hexagonal transmission pattern is fixed while the rms delay spread and the maximal Doppler spread increase simultaneously as the channel spread factor τrms fd increases. The center carrier frequency is set to fc = 5 GHz, the sampling interval to Ts = 10^−6 s, F = 25 kHz, and T = 1 × 10^−4 s. In all simulation results, Δt is normalized to T/2 and Δf is normalized to F/2.

We fixed Δf to 0 and the product τrms fd to 0.02, 0.01, and 0.005. We repeated this simulation for SNR values in the range of 10 dB to 30 dB. The result is shown in Figure 2. We see that the power of the ISI and ICI caused by TO strongly depends on the channel spread factor of the DD channel. The maximum SINR decreases as the product τrms fd increases, and the timing offset Δt at which it is attained increases with τrms fd; that is, from the SINR point of view, there is a delay between the maximum SINR timing and the first tap of the DD channel. It can be seen from Figure 2 that this delay increases as the product τrms fd increases. HMM does an excellent job of maintaining high SINR. CP-OFDM with guard interval Ng = N/4 perfectly suppresses ISI caused by TO within the cyclic prefix, but does a poor job of combating the DD channel, and there is an SINR gap of about 4 to 7 dB between HMM and CP-OFDM at SNR = 30 dB.

In Figure 3, we fixed Δt to the maximum SINR timing and the product τrms fd to 0.02, 0.01, and 0.005. It is seen that the SINR depends on the channel spread factor of the DD channel and that the SINR attains its maximum value at Δf = 0. Meanwhile, SINR decreases as the product τrms fd increases. From Figure 3, we see that HMM also does a good job of ISI/ICI suppression, and there is also an SINR gap of about 4 to 8 dB between HMM and CP-OFDM at SNR = 30 dB for CFO in the range of −0.5 to 0.5. The effects of both CFO and TO on the SINR performance of hexagonal multicarrier systems and CP-OFDM systems at SNR = 30 dB are shown in Figure 4, where τrms fd is set to 0.02.

The maximum SINR for various values of SNR and τrms fd is depicted in Figure 5. A lower bound (LB) on the effects of Doppler spread on the SINR performance of OFDM systems [18] is depicted for comparison. It can be seen that SINR degrades as τrms fd increases. The HMM system loses about 1 dB of SINR when τrms fd goes from 0.01 to 0.02, while the OFDM SINR loss is about 3 dB. The HMM system does a good job of combating the DD channel. The degradation of SINR in the CP-OFDM system increases as the channel spread factor increases.

7. Conclusion

This paper examines the effects of insufficient synchronization on the amplitude and phase of the demodulated symbol by using a projection receiver in hexagonal multicarrier modulation systems. Furthermore, the effects of CFO, TO, and channel spread factor on the SINR performance of hexagonal multicarrier modulation systems are discussed. The exact SINR expression versus insufficient synchronization and channel spread factor is derived. Both theoretical analysis and simulation results show that insufficient synchronization degrades the symbol amplitude and phase in a manner similar to common OFDM transmission: (1) CFO and TO introduce interference among subcarriers and symbols; (2) the transmitted symbols experience an amplitude reduction and a time-variant phase shift due to CFO; (3) the transmitted symbols are attenuated and rotated by a phasor whose phase is proportional to the subcarrier index and TO; (4) the SINR of received symbols decreases as the channel spread factor increases. Our theoretical analysis is confirmed by numerical simulations, showing that HMM systems outperform traditional CP-OFDM systems with respect to SINR against ISI/ICI caused by insufficient synchronization and the doubly dispersive channel.

[Figure 4 plot omitted. SINR (dB) versus timing offset and frequency offset, each over [−0.5, 0.5]. Surfaces: HMM and CP-OFDM.]

Figure 4: Effects of TO and CFO on SINR for hexagonal multicarrier system with τrms fd = 0.02 at SNR = 30 dB.

[Figure 5 plot omitted. SINR (dB) versus SNR (dB) from 0 to 40 dB. Curves: HMM, OFDM, and the lower bound LB [18], each for τrms fd = 0.02, 0.01, and 0.005.]

Figure 5: Effects of channel spread factor on SINR for hexagonal multicarrier system.

Acknowledgment

This research was supported by China National Science Fund under Contract no. 60772083.

References

[1] T. S. Rappaport, A. Annamalai, R. M. Buehrer, and W. H. Tranter, “Wireless communications: past events and a future perspective,” IEEE Communications Magazine, vol. 40, no. 5, pp. 148–161, 2002.

[2] W. Kozek and A. F. Molisch, “Nonorthogonal pulseshapes for multicarrier communications in doubly dispersive channels,” IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1579–1589, 1998.

[3] D. Schafhuber, G. Matz, and F. Hlawatsch, “Pulse-shaping OFDM/BFDM systems for time-varying channels: ISI/ICI analysis, optimal pulse design, and efficient implementation,” in Proceedings of the 13th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC ’02), vol. 3, pp. 1012–1016, Lisbon, Portugal, September 2002.

[4] R. Haas and J.-C. Belfiore, “A time-frequency well-localized pulse for multiple carrier transmission,” Wireless Personal Communications, vol. 5, no. 1, pp. 1–18, 1997.

[5] B. Le Floch, M. Alard, and C. Berrou, “Coded orthogonal frequency division multiplex [TV broadcasting],” Proceedings of the IEEE, vol. 83, no. 6, pp. 982–996, 1995.

[6] H. Bolcskei, P. Duhamel, and R. Hleiss, “Design of pulse shaping OFDM/OQAM systems for high data-rate transmission over wireless channels,” in Proceedings of the IEEE International Conference on Communications (ICC ’99), vol. 1, pp. 559–564, Vancouver, Canada, June 1999.

[7] S. Mirabbasi and K. Martin, “Overlapped complex-modulated transmultiplexer filters with simplified design and superior stopbands,” IEEE Transactions on Circuits and Systems II, vol. 50, no. 8, pp. 456–469, 2003.

[8] P. Martin, F. Cruz-Roldan, and T. Saramaki, “A windowing approach for designing critically sampled nearly perfect-reconstruction cosine-modulated transmultiplexers and filter banks,” in Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA ’03), vol. 2, pp. 755–760, Rome, Italy, September 2003.

[9] T. Strohmer and S. Beaver, “Optimal OFDM design for time-frequency dispersive channels,” IEEE Transactions on Communications, vol. 51, no. 7, pp. 1111–1122, 2003.

[10] K. Liu, T. Kadous, and A. M. Sayeed, “Orthogonal time-frequency signaling over doubly dispersive channels,” IEEE Transactions on Information Theory, vol. 50, no. 11, pp. 2583–2603, 2004.

[11] V. Kumbasar and O. Kucur, “ICI reduction in OFDM systems by using improved sinc power pulse,” Digital Signal Processing, vol. 17, no. 6, pp. 997–1006, 2007.

[12] P. Jung and G. Wunder, “The WSSUS pulse design problem in multicarrier transmission,” IEEE Transactions on Communications, vol. 55, no. 10, pp. 1918–1928, 2007.

[13] S. Das and P. Schniter, “Max-SINR ISI/ICI-shaping multicarrier communication over the doubly dispersive channel,” IEEE Transactions on Signal Processing, vol. 55, no. 12, pp. 5782–5795, 2007.

[14] M. Ma, B. Jiao, and W. C. Y. Lee, “A dual-window technique for enhancing robustness of OFDM against frequency offset,” IEEE Communications Letters, vol. 12, no. 1, pp. 17–19, 2008.

[15] G. Lin, L. Lundheim, and N. Holte, “Optimal pulses robust to carrier frequency offset for OFDM/QAM systems,” IEEE Communications Letters, vol. 12, no. 3, pp. 161–163, 2008.

[16] F.-M. Han and X.-D. Zhang, “Hexagonal multicarrier modulation: a robust transmission scheme for time-frequency dispersive channels,” IEEE Transactions on Signal Processing, vol. 55, no. 5, part 1, pp. 1955–1961, 2007.

[17] P. Jung and G. Wunder, “On time-variant distortions in multicarrier transmission with application to frequency offsets and phase noise,” IEEE Transactions on Communications, vol. 53, no. 9, pp. 1561–1570, 2005.

[18] X. Cai and G. B. Giannakis, “Bounding performance and suppressing intercarrier interference in wireless mobile OFDM,” IEEE Transactions on Communications, vol. 51, no. 12, pp. 2047–2056, 2003.

[19] J. H. Conway and N. J. A. Sloane, Sphere Packings, Lattices and Groups, Springer, New York, NY, USA, 2nd edition, 1993.

[20] L. Cohen, Time-Frequency Analysis, Prentice-Hall, Englewood Cliffs, NJ, USA, 1995.

[21] M. Patzold, Mobile Fading Channels, John Wiley & Sons, West Sussex, UK, 2002.


Research Article

Multiple CFOs in OFDM-SDMA Uplink: Interference Analysis and Compensation

Malte Schellmann and Volker Jungnickel

Fraunhofer Institute for Telecommunications, Heinrich-Hertz-Institut, Einsteinufer 37, 10587 Berlin, Germany

Correspondence should be addressed to Malte Schellmann, [email protected]

Received 1 July 2008; Revised 14 November 2008; Accepted 11 March 2009

Recommended by Erdal Panayirci

In OFDM-based space division multiple access (SDMA) systems, multiple users are served by a multiantenna base station simultaneously on the same frequency resources. In the uplink, each user’s signal may be distorted by an independent carrier frequency offset (CFO), which impairs the orthogonality of the subcarrier signals and, if not properly compensated, degrades performance. We analyze the influence of multiple users’ CFOs on the signal transmission in the OFDM-SDMA uplink and derive suitable bounds for the achievable signal-to-interference conditions. By modifying the signal model suitably, we develop a simple scheme for partial compensation of the CFO distortions. It maintains the subcarrier-wise channel equalization and is thus well suited for a real-time system implementation. However, as CFOs impair the cyclic structure of the OFDM symbols, our scheme cannot compensate for the entire distortion. The remaining interference is treated as additional noise, which limits the supported size of the CFOs.

Copyright © 2009 M. Schellmann and V. Jungnickel. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

A promising way to lead wireless communication systems toward high spectral efficiencies is the combination of orthogonal frequency division multiplexing (OFDM) with the space-division multiple access (SDMA) technique [1]. In the SDMA uplink, multiple users communicate simultaneously with a multiantenna base station (BS) on the same frequency resources by transmitting their signals on different spatial layers. OFDM is a favored technique for transmission in frequency-selective channels, as it facilitates the equalization process while at the same time enabling high spectral efficiencies. However, one of its deficiencies is its high sensitivity towards time-variant distortions. In general, these destroy the orthogonality of the single subcarrier signals and give rise to the so-called intercarrier interference (ICI), limiting the achievable system performance [2, 3]. One source of time-variant distortions is the carrier frequency offset (CFO), owing to a mismatch between the oscillators at the transmitter and receiver sides. While estimation and compensation of CFO distortions in a single-user link are fairly easy and conveniently solved [4–6], coping with different CFOs from multiple users in any OFDM-based multiuser uplink is much more challenging, as all CFOs need to be estimated independently, and the conventional techniques for compensation do not apply.

The influence of CFOs from multiple users in an OFDM-based uplink has been studied extensively in the context of OFDMA systems, where simultaneous access is granted to multiple users by individually assigning distinct sets of subcarriers to them [7–9]. An overview of existing synchronization techniques together with a sound summary of the general requirements for uplink synchronization can be found in [10]. Estimation of multiple users’ CFOs can be performed based on blind techniques exploiting specific properties of the utilized OFDM signals and their statistics [11–15] or on pilot-based techniques [16, 17]. For CFO compensation, the simplest approach is to feed back the estimated CFO to the corresponding user terminal, so that it may adapt its oscillator accordingly or apply a precompensation to its transmit signal [11]. However, the drawback of this feedback approach is that large delays may occur before the CFOs are properly compensated. There also exist some proposals for CFO compensation to be carried out directly at the receiver by adequate signal processing. These approaches are either based on the inversion of a high-dimensional matrix representing the ICI-affected channel for a complete OFDMA symbol [18, 19], or they make use of successive interference cancelation techniques [20], which may be performed in an iterative fashion [21]. Unfortunately, all these approaches result in a significant increase of computational complexity compared to common OFDM processing, whose favorable property is to enable independent subcarrier-wise processing. Although the complexity of the aforementioned approaches based on matrix inversion can be further reduced if specific properties of the signal model are exploited [15, 22, 23], it still remains considerable. A suboptimum solution maintaining the subcarrier-wise signal processing at the receiver is presented in [24]. The user signals are separated first, whereafter they are individually compensated for their user-specific CFO. Although not all ICI can be removed, a satisfactory performance is achieved if the CFOs do not become too large.

The major difference in OFDM-SDMA systems is that the channel is enhanced by a spatial dimension. To separate the users’ signals, knowledge of the SDMA channel per subcarrier is required. With CFO distortions present, solutions to estimate the SDMA channel have been proposed in [25, 26]; joint estimation of SDMA channels and the users’ CFOs can be found in [27–29]. Contributions [26, 28] also provide approaches to compensate for the CFO distortions at the receiver, which, however, have complexity demands that are similar to the OFDMA techniques mentioned earlier.

The work in this paper was motivated by the search for a simple receiver-based CFO compensation method for the uplink of an OFDM-SDMA system. The subcarrier-wise channel equalization is supposed to be maintained to facilitate implementation in a real-time system. Therefore, we resort to the basic idea from [24] and develop a system concept where the user signals are first separated by common OFDM-SDMA equalization and compensated for their individual CFO distortions afterwards. As this approach is clearly suboptimum, the major focus of our work lies in the proper analysis of the achievable signal conditions with respect to the amount of interference that remains in the system after such compensation. In particular, we derive closed-form expressions characterizing the bounds for the signal-to-interference ratio (SIR) before and after CFO compensation, which are verified by numerical bit-error rate analysis. This way, we obtain insights into the suitability of the approach and reveal the limits of its application range.

Our results show that the proposed CFO compensation concept operates conveniently only if the size of the CFOs present in the system can be kept below a few percent of the subcarrier spacing. Therefore, the approach has to be seen as a technique for fine synchronization. Correspondingly, a coarse frequency synchronization of all users’ signals has to be ensured. This coarse synchronization can be achieved by a frequency advance, where terminals precompensate their signals with the CFO estimated in the downlink phase. The concept of frequency advance was recently realized in a practical system, as reported in [30]. In [31], we already presented the basic idea of this work and initial analytical results. Here we extend the analysis to support linear receivers providing spatial diversity gains, add the case of noncompensated CFOs for illustrative comparison, and provide a refined version of the CFO compensation process that is carried out in the frequency domain, which facilitates implementation.

The paper is structured as follows: Section 2 introduces the OFDM signal model based on vector notation. As a preparation for the analysis of the OFDM-SDMA system, we determine the SIR conditions for a single-antenna OFDM link in Section 3. Thereafter, the model is modified to form the basis for the simplified CFO compensation process in OFDM-SDMA systems. In Section 4, we analyze the SIR conditions in the OFDM-SDMA system and derive bounds for the two cases where CFOs are compensated according to the proposed scheme and where they are not. These bounds are verified by simulation results in Section 5.

2. Signal Model

Notation. We use bold capital letters to denote matrices and bold letters for vectors. Scalars are written in italics. $(\cdot)^H$ and $(\cdot)^*$ denote the conjugate transpose and the conjugate operator, respectively. $\operatorname{tr}(\cdot)$ refers to the trace operator. $\operatorname{diag}(\mathbf{x})$ represents a diagonal matrix whose diagonal is constituted by vector $\mathbf{x}$. $E\{\cdot\}$ denotes the expectation operator.

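2.1. Vector Notation of OFDM. Consider an OFDM system with a total of $N$ subcarriers. The transmission equation for a CFO-distorted single-input single-output (SISO) link is given by

\[
\mathbf{y} = \mathbf{F}\mathbf{P}_2\tilde{\mathbf{C}}(\varphi)\mathbf{H}\mathbf{P}_1\mathbf{F}^H \cdot \mathbf{x}, \tag{1}
\]

where $\mathbf{x}$ is the data vector comprising the $N$ data symbols constituting the OFDM symbol, $\mathbf{F}$ is the $N \times N$ discrete Fourier transform (DFT) matrix, and $\mathbf{P}_1$ and $\mathbf{P}_2$ are permutation matrices used to append and cut the cyclic prefix (CP) of length $N_g$ samples. Further, $\mathbf{H}$ is the $(N + N_g) \times (N + N_g)$ Toeplitz channel matrix constituted from the channel impulse response (CIR) $h_l$, $l \in \{0, \ldots, L\}$, where $L \le N_g$. Finally,

\[
\tilde{\mathbf{C}}(\varphi) = \operatorname{diag}\left(\left[\exp(-j\varphi N_g) \;\cdots\; \exp(j\varphi(N-1))\right]\right) \tag{2}
\]

is the CFO distortion matrix covering all $N + N_g$ time samples including the CP (marked by the tilde), where the phase rotation factor $\varphi$ is defined as $\varphi = 2\pi\omega/N$, with $\omega \in [-0.5, 0.5]$ being the CFO normalized to the subcarrier spacing. For $\varphi = 0$ (no CFO), the effective channel

\[
\mathbf{F}\mathbf{P}_2\mathbf{H}\mathbf{P}_1\mathbf{F}^H = \boldsymbol{\Lambda} \tag{3}
\]

yields a diagonal matrix whose diagonal elements $\lambda_k$, $k \in \{1, \ldots, N\}$, represent the $N$-point DFT of the CIR $h_l$. By a few simple transformations, the diagonal matrix $\boldsymbol{\Lambda}$ can be restored in (1), yielding

\[
\mathbf{y} = \mathbf{F}\mathbf{C}(\varphi)\mathbf{F}^H\mathbf{F}\mathbf{P}_2\mathbf{H}\mathbf{P}_1\mathbf{F}^H \cdot \mathbf{x} = \mathbf{U}\boldsymbol{\Lambda} \cdot \mathbf{x}, \tag{4}
\]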


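where we introduced

\[
\mathbf{C}(\varphi) = \operatorname{diag}\left(\left[1 \;\; \exp(j\varphi) \;\cdots\; \exp(j\varphi(N-1))\right]\right), \qquad
\mathbf{U} = \mathbf{F}\mathbf{C}(\varphi)\mathbf{F}^H. \tag{5}
\]

Note that $\mathbf{F}$ is unitary; thus $\mathbf{F}^H\mathbf{F}$ equals the identity matrix $\mathbf{I}$.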
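The construction of $\mathbf{U}$ in (5) can be checked numerically. The following is a minimal sketch in plain Python, assuming a unitary DFT normalization by $1/\sqrt{N}$; the helper names (`dft_matrix`, `hermitian`, `matmul`, `cfo_matrix_u`) are illustrative and not part of the original work.

```python
import cmath
import math


def dft_matrix(n):
    """Unitary n x n DFT matrix F with entries exp(-j*2*pi*k*m/n) / sqrt(n)."""
    scale = 1.0 / math.sqrt(n)
    return [[scale * cmath.exp(-2j * math.pi * k * m / n) for m in range(n)]
            for k in range(n)]


def hermitian(a):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[a[r][c].conjugate() for r in range(len(a))] for c in range(len(a[0]))]


def matmul(a, b):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def cfo_matrix_u(n, omega):
    """U = F C(phi) F^H with C(phi) = diag(1, e^{j phi}, ..., e^{j phi (n-1)}), eq. (5)."""
    phi = 2.0 * math.pi * omega / n  # phase rotation factor, phi = 2*pi*omega/N
    c = [[cmath.exp(1j * phi * r) if r == col else 0j for col in range(n)]
         for r in range(n)]
    f = dft_matrix(n)
    return matmul(matmul(f, c), hermitian(f))
```

For any normalized CFO, the resulting matrix should be unitary and circulant (each row a cyclic shift of the previous one), and it should reduce to the identity for $\omega = 0$.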

2.2. OFDM-SDMA Signal Model. Next, the focus is turned to an OFDM-SDMA system, where $Q$ single-antenna terminals transmit their signals simultaneously to an $M$-antenna base station on the same frequency resource. The users’ transmission signals propagate via different paths and are thus marked with different spatial signatures, which enable the multiantenna receiver to separate and recover the users’ transmission signals.

For the system model, the OFDM signal vectors $\mathbf{x}_q$, $q \in \{1, \ldots, Q\}$, from the $Q$ users are stacked into one large vector $\mathbf{x}$ of dimension $QN$. Correspondingly, the $M$ OFDM reception vectors $\mathbf{y}_m$ are stacked into one large vector $\mathbf{y}$. Each user may have an individual CFO, resulting in $Q$ different CFO distortion matrices $\mathbf{U}_q$, which are generated from individual phase factors $\varphi_q = 2\pi\omega_q/N$. For simplicity, let us assume the number of users to be $Q = 2$. Based on the signal model in (4), the transmission equation in the OFDM-SDMA system reads

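\[
\underbrace{\begin{pmatrix} \mathbf{y}_1 \\ \vdots \\ \mathbf{y}_M \end{pmatrix}}_{\mathbf{y}}
=
\underbrace{\begin{pmatrix}
\mathbf{U}_1\boldsymbol{\Lambda}_{11} & \mathbf{U}_2\boldsymbol{\Lambda}_{12} \\
\vdots & \vdots \\
\mathbf{U}_1\boldsymbol{\Lambda}_{M1} & \mathbf{U}_2\boldsymbol{\Lambda}_{M2}
\end{pmatrix}}_{\mathbf{H}_C}
\cdot
\underbrace{\begin{pmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{pmatrix}}_{\mathbf{x}}, \tag{6}
\]

where each of the single user/receive antenna links is characterized by its own diagonal channel matrix $\boldsymbol{\Lambda}_{mq}$.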

2.3. Statistical Channel Model. Within this paper, we assume Rayleigh-fading conditions for the discrete CIR, meaning that the channel coefficients $h_l$ are drawn independently from complex Gaussian distributions with mean power $\sigma_l^2$. For $l \in \{0, \ldots, L\}$, $\sigma_l^2 = E\{|h_l|^2\}$ represents the power delay profile (PDP) of the channel, which is assumed to be monotonically decreasing for increasing $l$. Furthermore, we assume the channel to be passive, that is, the sum of the mean powers of all channel coefficients is equal to unity, $\sum_{l=0}^{L}\sigma_l^2 = 1$. To specify suitable bounds within our analysis, we will frequently use a uniform PDP with constant power for all channel taps, which is defined as $\sigma_l^2 = 1/(L + 1)$ for all $l$. From these assumptions, it follows that the subcarrier channels $\lambda_k$ behave like random variables drawn from complex Gaussian distributions with unit power. The correlation between the channels at adjacent subcarrier positions is characterized by the frequency-domain autocorrelation function $r(\kappa)$, $\kappa \in \{0, \ldots, N-1\}$, where $\kappa$ refers to the distance between subcarriers. $r(\kappa)$ is obtained from the $N$-point DFT of the PDP, that is,

\[
r(\kappa) = \sum_{l=0}^{L}\sigma_l^2\exp\left(-j\frac{2\pi l\kappa}{N}\right), \quad \kappa \in \{0, \ldots, N-1\}. \tag{7}
\]
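As an illustration of (7), the sketch below evaluates the frequency-domain autocorrelation for a given PDP in plain Python. The function names `uniform_pdp` and `freq_autocorrelation` are hypothetical helpers introduced here, not part of the original work.

```python
import cmath
import math


def uniform_pdp(num_echoes):
    """Uniform PDP with L + 1 taps of equal power 1/(L + 1), so the powers sum to one."""
    return [1.0 / (num_echoes + 1)] * (num_echoes + 1)


def freq_autocorrelation(pdp, n):
    """r(kappa) = sum_l sigma_l^2 * exp(-j*2*pi*l*kappa/n) for kappa = 0..n-1, eq. (7)."""
    return [sum(p * cmath.exp(-2j * math.pi * l * kappa / n) for l, p in enumerate(pdp))
            for kappa in range(n)]
```

For a passive channel, $r(0) = 1$ by construction; a longer channel (larger $L$) lets $|r(\kappa)|$ decay faster with the subcarrier distance $\kappa$.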

In the OFDM-SDMA system, the channels of the $QM$ single-antenna links are characterized by the same statistical properties but are assumed to be statistically uncorrelated. In particular, we assume all channels to have identical channel length $L$ and identical PDP, which may be reasonable for user terminals experiencing non-line-of-sight (NLOS) multipath fading.

3. Analysis of Single-Antenna OFDM Link

To prepare the analysis of the SIR conditions in the OFDM-SDMA system, we focus in this section on a separate single-antenna OFDM link. In the following, we analyze the impact of CFO distortions and derive a bound for the SIR (Section 3.1). To enable a simplified equalization process in the OFDM-SDMA system, where the user signals are first spatially separated and thereafter individually compensated for their CFO distortions, we modify this signal model accordingly (Section 3.2). This model introduces an additional error term, which cannot be compensated by simple means. Hence, its power and the resulting SIR conditions are analyzed in Section 3.3. The proper process for partial compensation of the CFO distortions after channel equalization is then presented in Section 3.4.

3.1. Impact of CFO Distortions. In (4), matrix $\mathbf{U} = \mathbf{F}\mathbf{C}(\varphi)\mathbf{F}^H$ is a circulant matrix whose rows are circularly shifted versions of $u(\kappa)$, the DFT of the diagonal of $\mathbf{C}(\varphi)$, that is,

\[
u(\kappa) = \frac{1}{N}\sum_{n=0}^{N-1}\exp\left(j2\pi\omega\frac{n}{N}\right)\exp\left(j2\pi\frac{\kappa n}{N}\right) \tag{8}
\]

with $\kappa \in \{0, \ldots, N-1\}$. This expression represents a geometric series, and hence it can be simplified to [32]

\[
u(\kappa) = \frac{1}{N}\exp\left(j\pi(\omega + \kappa)\frac{N-1}{N}\right)\frac{\sin(\pi(\omega + \kappa))}{\sin(\pi(\omega + \kappa)/N)}. \tag{9}
\]

As the DFT is periodic, the definition range may be changed to $\kappa \in \{-N/2, \ldots, N/2 - 1\}$. By doing so, we can use an approximation for large $N$ based on the si-function $\operatorname{si}(x) = \sin(x)/x$, so that $u(\kappa)$ can be given as

\[
u(\kappa) \approx (-1)^{\kappa}\exp(j\pi\omega)\cdot\operatorname{si}(\pi(\omega + \kappa)). \tag{10}
\]

Multiplying the circulant matrix $\mathbf{U}$ with a frequency-domain signal vector represents a cyclic convolution of that signal vector with the function $u(\kappa)$, which introduces the ICI. For $\kappa \ne 0$, $u(\kappa)$ specifies the amount of ICI that is induced on any subcarrier from a subcarrier signal spaced $\kappa$ subcarriers apart. $u(0)$ itself determines the attenuation of the power of each subcarrier signal. From (10), we observe that multiplication with the function $u(\kappa)$ imposes a constant phase rotation $\exp(j\pi\omega)$ on all subcarrier signals. This constant phase factor corresponds to the mean CFO-induced phase rotation observed over the total duration of the time-domain OFDM symbol of $N$ samples length. It is also referred to as the common phase error (CPE) induced by the CFO distortions.
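To make the relation between (8), (9), and (10) concrete, the following sketch evaluates the ICI coefficient in three ways: by the finite sum, by the closed form, and by the large-$N$ si-approximation. It is a plain-Python illustration; the function names are assumptions introduced here.

```python
import cmath
import math


def si(x):
    """si-function si(x) = sin(x)/x with si(0) = 1."""
    return 1.0 if x == 0 else math.sin(x) / x


def u_sum(kappa, omega, n):
    """u(kappa) evaluated directly from the finite sum in (8)."""
    return sum(cmath.exp(2j * math.pi * (omega + kappa) * m / n) for m in range(n)) / n


def u_closed(kappa, omega, n):
    """u(kappa) from the closed form (9); valid for |omega + kappa| < n."""
    a = math.pi * (omega + kappa)
    if a == 0:
        return 1.0 + 0j  # all summands in (8) equal one
    return cmath.exp(1j * a * (n - 1) / n) * math.sin(a) / (n * math.sin(a / n))


def u_approx(kappa, omega):
    """Large-N approximation (10), intended for small |kappa| relative to N."""
    return (-1) ** kappa * cmath.exp(1j * math.pi * omega) * si(math.pi * (omega + kappa))
```

Since $\mathbf{U}$ is unitary, the squared magnitudes of $u(\kappa)$ over all $N$ subcarrier distances sum to one, which is the property behind the upper bound on the ICI power used in the following analysis.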


Next, we examine the mean power of the ICI and the resulting SIR. With the results above, the received signal $y_k$ at subcarrier position $k \in \{1, \ldots, N\}$ follows from (4) as

\[
y_k = u(0)\lambda_k x_k + \sum_{j=1,\, j \ne k}^{N} u(j - k)\lambda_j x_j, \tag{11}
\]

where $x_k$ denotes the transmit symbol at subcarrier $k$. The first term denotes the useful signal received at subcarrier $k$, while the second term represents the ICI from all other subcarriers. Let the transmit symbols $x_k$ be independent and identically distributed (i.i.d.) with constant mean power $P_s$. Then, as $E\{|\lambda_k|^2\} = 1$, the mean power of the useful signal at subcarrier $k$ amounts to $P_u = P_s|u(0)|^2$. Furthermore, the mean power of the ICI from all other subcarriers distorting the useful signal yields $P_{\mathrm{ICI}} = P_s\sum_{j=1}^{N-1}|u(j)|^2$, which can be upper bounded by $P_s(1 - |u(0)|^2)$. This bound is tight in case all $N$ available subcarriers are occupied with data symbols. Using (10), we can lower bound the SIR resulting from the ICI as follows [3]:

\[
\mathrm{SIR}_{\mathrm{ICI}} = \frac{P_u}{P_{\mathrm{ICI}}} \ge \frac{\operatorname{si}^2(\pi\omega)}{1 - \operatorname{si}^2(\pi\omega)}. \tag{12}
\]
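The bound (12) depends on the normalized CFO only. A minimal sketch for evaluating it in decibels is given below; the function name `sir_ici_lower_bound_db` is an illustrative choice, and the CFO-free case is mapped to infinity by convention.

```python
import math


def si(x):
    """si-function si(x) = sin(x)/x with si(0) = 1."""
    return 1.0 if x == 0 else math.sin(x) / x


def sir_ici_lower_bound_db(omega):
    """Lower bound (12) on the SIR for an uncompensated normalized CFO omega, in dB."""
    s2 = si(math.pi * omega) ** 2
    if s2 >= 1.0:
        return float("inf")  # no CFO, hence no ICI
    return 10.0 * math.log10(s2 / (1.0 - s2))
```

At the edge of the considered range, $|\omega| = 0.5$, the bound falls slightly below 0 dB, that is, the ICI power exceeds the power of the useful signal.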

3.2. Modified Model for Simplified Equalization in SDMA. The common method to compensate distortions from a single CFO is to rotate the phase $\varphi$ in the received time-domain signal back to zero prior to any DFT operation [33]. Afterwards, the diagonal channel $\boldsymbol{\Lambda}$ can be equalized subcarrier-wise, as is common in OFDM. As already mentioned, this procedure is not applicable in OFDM-SDMA systems, as compensating for the CFO of a single user would misalign the signal of any other user. However, to maintain the simplified subcarrier-wise equalization for which OFDM systems are favored, it would be desirable to interchange the order of the compensation and equalization operations, so that the user signals can first be separated and afterwards compensated for their individual CFOs. This approach requires a modification of the signal model (1), where the CFO distortion matrix $\mathbf{U}$ should be moved to the right-hand side of the channel matrix $\mathbf{H}$. To achieve that, we insert the matrix product $\tilde{\mathbf{C}}(-\varphi)\tilde{\mathbf{C}}(\varphi) = \mathbf{I}$ into (1) directly to the right of $\mathbf{H}$ and obtain

\[
\mathbf{y} = \mathbf{F}\mathbf{P}_2\tilde{\mathbf{H}}\tilde{\mathbf{C}}(\varphi)\mathbf{P}_1\mathbf{F}^H \cdot \mathbf{x} \tag{13}
\]

with the modified channel matrix $\tilde{\mathbf{H}} = \tilde{\mathbf{C}}(\varphi)\mathbf{H}\tilde{\mathbf{C}}(-\varphi)$. This matrix has the same structure as the original $\mathbf{H}$, but the channel coefficients are now modified according to $\tilde{h}_l = h_l \cdot \exp(j\varphi l)$. The corresponding diagonal matrix $\tilde{\boldsymbol{\Lambda}}$ in the frequency domain results from (3) based on $\tilde{\mathbf{H}}$, that is,

\[
\tilde{\boldsymbol{\Lambda}} = \mathbf{F}\mathbf{P}_2\tilde{\mathbf{H}}\mathbf{P}_1\mathbf{F}^H. \tag{14}
\]

Correspondingly, the diagonal of $\tilde{\boldsymbol{\Lambda}}$ represents the DFT of $\tilde{h}_l$. To restore the diagonal matrix $\tilde{\boldsymbol{\Lambda}}$ in (13), the term $\mathbf{P}_1\mathbf{C}(\varphi)$ has to be used instead of $\tilde{\mathbf{C}}(\varphi)\mathbf{P}_1$, with $\mathbf{C}(\varphi)$ as defined in (5). The difference between these two terms amounts to

\[
\boldsymbol{\Gamma} = \tilde{\mathbf{C}}(\varphi)\mathbf{P}_1 - \mathbf{P}_1\mathbf{C}(\varphi). \tag{15}
\]

Matrix $\boldsymbol{\Gamma}$ will in the following be denoted as the error matrix, as it represents the error that is introduced if the two matrix products are exchanged directly. Its structure is characterized next. Recall that the $(N + N_g) \times N$-dimensional matrix $\mathbf{P}_1$ appends a cyclic prefix of $N_g$ samples to its $N$-dimensional input vector; hence its structure can be described by an $N_g \times N_g$ identity matrix located in the upper right corner on top of an $N \times N$ identity matrix, with all other elements being zero. The error matrix $\boldsymbol{\Gamma}$ thus contains only zeros except in its upper right $N_g \times N_g$ submatrix, which itself is a diagonal matrix whose diagonal is composed of the elements $\gamma_n$, $n \in \{-N_g, \ldots, -1\}$, with

\[
\gamma_n = \exp(jn\varphi)\left(1 - \exp(jN\varphi)\right). \tag{16}
\]

Plugging $\tilde{\mathbf{C}}(\varphi)\mathbf{P}_1 = \mathbf{P}_1\mathbf{C}(\varphi) + \boldsymbol{\Gamma}$ into (13) now yields

\[
\mathbf{y} = \tilde{\boldsymbol{\Lambda}} \cdot \underbrace{\mathbf{F}\mathbf{C}(\varphi)\mathbf{F}^H}_{\mathbf{U}} \cdot \mathbf{x} + \underbrace{\mathbf{F}\mathbf{P}_2\tilde{\mathbf{H}}\boldsymbol{\Gamma}\mathbf{F}^H}_{\mathbf{V}} \cdot \mathbf{x}. \tag{17}
\]

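The first part of the equation exhibits the desired signal structure, where the positions of the CFO distortion and channel transmission operations have been interchanged compared to (4). Thus, the suggested receiver processing is enabled. However, we have an additive error term $\mathbf{V}\mathbf{x}$ generated from the error matrix $\boldsymbol{\Gamma}$. The inner product $\mathbf{P}_2\tilde{\mathbf{H}}\boldsymbol{\Gamma}$ of this term is a matrix with only zero elements except in its upper right $L \times L$ submatrix $\mathbf{V}_u$. This submatrix is an upper triangular matrix with the following structure:

\[
\mathbf{V}_u =
\begin{pmatrix}
\gamma_{-L}\cdot\tilde{h}_L & \gamma_{-L+1}\cdot\tilde{h}_{L-1} & \cdots & \gamma_{-1}\cdot\tilde{h}_1 \\
0 & \gamma_{-L+1}\cdot\tilde{h}_L & \cdots & \gamma_{-1}\cdot\tilde{h}_2 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & \gamma_{-1}\cdot\tilde{h}_L
\end{pmatrix}. \tag{18}
\]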

We observe that the elements in this submatrix reflect the (complex) difference of the effective channel echoes seen by the samples in the CP and their cyclic repetition at the end of the OFDM symbol. If these two signal fractions are no longer identical owing to the CFO, the periodic property of the OFDM signals is violated, resulting in interference to all subcarrier signals. With this finding, the total CFO-induced interference contained in model (17) can be segregated into two different types: the first type is given as ICI of the original subcarrier signal in $\mathbf{x}$, generated by the cyclic convolution in $\mathbf{U}$, and the second type is given as interference caused by the violation of the periodicity of the OFDM signals, represented in the term $\mathbf{V}\mathbf{x}$.

If equalization and CFO compensation are carried out as described earlier, the power from $\mathbf{V}\mathbf{x}$ will remain in the system and distort the signal as interference. In the following, we will therefore analyze its power as well as the resulting SIR conditions.
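The decomposition in (17) can be verified numerically for a small system. The sketch below builds the matrices of (1), (5), and (13) to (15) explicitly in plain Python and returns both the exact effective channel and the terms of (17). The helper names, the unitary DFT scaling, and the test dimensions are assumptions made for illustration only.

```python
import cmath
import math
from functools import reduce


def matmul(a, b):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def chain(*mats):
    """Product of several matrices, evaluated left to right."""
    return reduce(matmul, mats)


def hermitian(a):
    return [[a[r][c].conjugate() for r in range(len(a))] for c in range(len(a[0]))]


def diag(values):
    n = len(values)
    return [[values[r] if r == c else 0j for c in range(n)] for r in range(n)]


def decompose(n, ng, h, omega):
    """Return (exact, lam, u, v): the exact channel of (1) and the terms of (17).

    h is the CIR [h_0, ..., h_L] with L <= ng. Rows of the (n + ng)-dimensional
    matrices correspond to the time samples -ng, ..., n - 1.
    """
    phi = 2.0 * math.pi * omega / n
    t = n + ng
    f = [[cmath.exp(-2j * math.pi * k * m / n) / math.sqrt(n) for m in range(n)]
         for k in range(n)]
    fh = hermitian(f)
    p1 = [[1.0 if c == (r - ng) % n else 0.0 for c in range(n)] for r in range(t)]  # append CP
    p2 = [[1.0 if c == r + ng else 0.0 for c in range(t)] for r in range(n)]        # cut CP
    hmat = [[h[r - c] if 0 <= r - c < len(h) else 0j for c in range(t)] for r in range(t)]
    c_ext = diag([cmath.exp(1j * phi * (i - ng)) for i in range(t)])       # eq. (2)
    c_ext_inv = diag([cmath.exp(-1j * phi * (i - ng)) for i in range(t)])
    c_n = diag([cmath.exp(1j * phi * i) for i in range(n)])                # eq. (5)

    exact = chain(f, p2, c_ext, hmat, p1, fh)   # effective channel of (1)
    h_mod = chain(c_ext, hmat, c_ext_inv)       # modified channel matrix of (13)
    lam = chain(f, p2, h_mod, p1, fh)           # diagonal matrix of (14)
    u = chain(f, c_n, fh)                       # U of (5)
    a, b = matmul(c_ext, p1), matmul(p1, c_n)
    gamma = [[a[r][c] - b[r][c] for c in range(n)] for r in range(t)]      # eq. (15)
    v = chain(f, p2, h_mod, gamma, fh)          # error term V of (17)
    return exact, lam, u, v
```

Up to numerical precision, `exact` should equal `lam` times `u` plus `v`, and `v` should vanish both for $\omega = 0$ and for a single-tap channel ($L = 0$).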

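3.3. Interference Remaining after CFO Compensation. Obviously, $\mathbf{V}$ is the all-zero matrix if $\varphi = 0$ (i.e., no CFO is present) or if the channel is frequency flat ($L = 0$). Otherwise, the total power contained in $\mathbf{V}\mathbf{x}$ depends on the actual number of channel echoes $L$. The mean power $P_V$ contained in this term can be calculated by

\[
P_V = \operatorname{tr}\left(E\left\{\mathbf{V}\mathbf{x}\mathbf{x}^H\mathbf{V}^H\right\}\right). \tag{19}
\]

The expression in the argument of the trace operator represents the correlation matrix $\mathbf{R}_e$ of the error term $\mathbf{V}\mathbf{x}$. As the elements constituting $\mathbf{V}$ and $\mathbf{x}$, respectively, can be considered to stem from independent stochastic processes, we may write

\[
\mathbf{R}_e = E\left\{\mathbf{V}\mathbf{x}\mathbf{x}^H\mathbf{V}^H\right\} = E\left\{\mathbf{V}E\left\{\mathbf{x}\mathbf{x}^H\right\}\mathbf{V}^H\right\}. \tag{20}
\]

With the i.i.d. assumption for the symbols contained in $\mathbf{x}$ with mean power $P_s$, $E\{\mathbf{x}\mathbf{x}^H\}$ is a diagonal matrix scaled with $P_s$. In case all subcarriers are occupied with data symbols, it equals $P_s \cdot \mathbf{I}$, and we obtain $\mathbf{R}_e = P_s \cdot E\{\mathbf{V}\mathbf{V}^H\}$. Inserting the matrix product constituting $\mathbf{V}$ from (17) and considering the structure of the inner product $\mathbf{P}_2\tilde{\mathbf{H}}\boldsymbol{\Gamma}$ with its submatrix $\mathbf{V}_u$, the power $P_V$ yields

\[
P_V = \operatorname{tr}(\mathbf{R}_e) = P_s \cdot E\left\{\operatorname{tr}\left(\mathbf{V}_u\mathbf{V}_u^H\right)\right\}. \tag{21}
\]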

With $\mathbf{V}_u$ from above, we obtain

\[
\operatorname{tr}\left(\mathbf{V}_u\mathbf{V}_u^H\right) = 4\sin^2(\pi\omega)\sum_{m=1}^{L}\sum_{l=m}^{L}|h_l|^2, \tag{22}
\]

where the factor $4\sin^2(\pi\omega)$ results from $|\gamma_n|^2$. Taking the expected value of this expression relates $P_V$ to fractions of the channel’s PDP. Resorting to the characteristics of the considered channel model given in Section 2.3, we can upper bound $P_V$ according to

\[
P_V \le P_s L \cdot 2\sin^2(\pi\omega), \tag{23}
\]

where the relation holds with equality for a uniform PDP.

Once we know the total power of the interference generated by the error matrix $\boldsymbol{\Gamma}$, we examine next how this interference power affects the single subcarrier signals. For this purpose, we first focus on the correlation of this additive interference in the frequency domain. The structure of matrix $\mathbf{V}_u$ reveals that the interference affects only the first $L$ samples of the time-domain OFDM symbol; hence the interference in the frequency domain will be highly correlated. To obtain more insight, we focus on the $N \times N$-dimensional time-domain correlation matrix, which we obtain from (20) as $\mathbf{R}_{e,t} = \mathbf{F}^H\mathbf{R}_e\mathbf{F} = E\{\mathbf{P}_2\tilde{\mathbf{H}}\boldsymbol{\Gamma}\boldsymbol{\Gamma}^H\tilde{\mathbf{H}}^H\mathbf{P}_2^H\}P_s$. As the channel taps $h_l$ in $\mathbf{H}$ are uncorrelated, $\mathbf{R}_{e,t}$ is a diagonal matrix with its diagonal representing the time-domain interference power profile $r_{e,t}(n)$. Only the first $L$ elements of $r_{e,t}(n)$ differ from zero and are proportional to partial sums of the PDP:

\[
r_{e,t}(n) \sim \sum_{l=n}^{L}\sigma_l^2 \le 1 - \frac{n}{L + 1}, \quad n \in \{1, \ldots, L\}. \tag{24}
\]

The values of $r_{e,t}(n)$ can be bounded according to the expression on the right-hand side, holding with equality for a uniform PDP. The frequency correlation matrix $\mathbf{R}_e = \mathbf{F}\mathbf{R}_{e,t}\mathbf{F}^H$ now is circulant, which means the correlation between the subcarriers is independent of the actual subcarrier position $k$. We thus conclude that the mean interference power $P_i$ that distorts each subcarrier signal amounts to

\[
P_i(\omega) = P_V/N \le P_s L \cdot 2\sin^2(\pi\omega)/N, \tag{25}
\]

indicating that the mean interference power $P_V$ is uniformly spread over all the subcarrier signals.
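The step from (22) to the bounds (23) and (25) can be made explicit for the uniform PDP. The following worked derivation is a sketch under the channel assumptions of Section 2.3; the numerical values $N = 64$, $L = 8$, and $\omega = 0.05$ are hypothetical and serve only as an example.

```latex
\begin{align*}
E\left\{\operatorname{tr}\left(\mathbf{V}_u\mathbf{V}_u^H\right)\right\}
  &= 4\sin^2(\pi\omega)\sum_{m=1}^{L}\sum_{l=m}^{L}\sigma_l^2
   = 4\sin^2(\pi\omega)\sum_{l=1}^{L} l\,\sigma_l^2, \\
\sigma_l^2 = \tfrac{1}{L+1}:\quad
\sum_{l=1}^{L}\frac{l}{L+1} &= \frac{L(L+1)}{2(L+1)} = \frac{L}{2}
  \;\Longrightarrow\; P_V = P_s L \cdot 2\sin^2(\pi\omega), \\
N = 64,\ L = 8,\ \omega = 0.05:\quad
\frac{P_i}{P_s} &= \frac{2\cdot 8\cdot\sin^2(0.05\pi)}{64} \approx 6.1\times 10^{-3}.
\end{align*}
```

The double sum counts each tap power $\sigma_l^2$ exactly $l$ times, which matches the number of rows of $\mathbf{V}_u$ in (18) that contain the tap of index $l$.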

To determine the correlation of the interference over frequency, we consider the frequency correlation function $r_e(\kappa)$, which is calculated from the $N$-point DFT of the time-domain interference power profile $r_{e,t}(n)$. According to (24), $r_{e,t}(n)$ can be represented by a linear function with slope $\beta = (L + 1)^{-1} \le 0.5$, which is multiplied by a rectangular window of width $L$ to confine it to the specified range. The corresponding frequency correlation function $r_e(\kappa)$ thus can be generated by a convolution of the DFT of that linear function with the DFT of the rectangular window. For the constrained slope $\beta$, the rectangular function will dominantly influence the spread of the correlation function, and hence we restrict our inspection to this component only. The DFT of the rectangular function of width $L$ is

\[
r_e(\kappa) \sim \frac{1}{N}\sum_{n=1}^{L}\exp\left(j2\pi\frac{\kappa n}{N}\right) \sim \frac{L}{N}\cdot\operatorname{si}\left(\pi\kappa\frac{L}{N}\right). \tag{26}
\]

The subcarrier distance $|\kappa|$ where the normalized correlation drops to a value below 0.5 can be estimated by

\[
K_c = \left|\operatorname{si}^{-1}(0.5)\right| \cdot \frac{N}{\pi L} \approx 0.2\,\frac{N}{L}. \tag{27}
\]

$K_c$ can be interpreted as a delimiter of the region around any subcarrier at position $k$ where the interference is highly correlated; we thus denote it as the interference correlation range. This range is inversely proportional to the channel length $L$; a short channel length thus results in a high correlation of the interference. We will see later that the correlation of the interference supports a simplified CFO compensation process, which yields an improved error performance.

Further, it has to be considered that the interference contained in the term $\mathbf{V}\mathbf{x}$ from (17) is constituted of two different types, which affect the signal conditions at subcarrier position $k$ differently. In particular, we encounter self-interference stemming from the signal at subcarrier $k$ itself, which is represented by the diagonal elements in $\mathbf{V}$, and ICI-like distortion stemming from all other subcarrier signals, which is represented by the off-diagonal elements of $\mathbf{V}$. As the transmit symbols in $\mathbf{x}$ are assumed i.i.d., the ICI can be assumed to be uncorrelated with the signal at subcarrier $k$, and hence the distortion effect due to the ICI can be considered similar to the one of additive white Gaussian noise (AWGN). The self-interference, however, may be strongly correlated with the signal at subcarrier $k$ and thus may directly influence its signal level in a deterministic fashion. However, in the appendix it is shown that the influence of the deterministic distortion evoked by the self-interference is negligible if $L \ll N$ holds, and consequently we may consider the total interference from $\mathbf{V}\mathbf{x}$ as pure ICI-like distortion here.

Figure 1: SIR conditions for uncompensated CFO (12) and compensated CFO (28) based on the signal model in (17). Axes: SIR (dB) versus $|\omega|$ (CFO normalised to subcarrier spacing); curves: $\mathrm{SIR}_{\mathrm{ICI}}$ and $\mathrm{SIR}_e$ for $N/L = 8$, $16$, and $32$.

The power of the useful signal per subcarrier amounts to $P_s$. Thus, a closed-form expression for a lower bound of the SIR resulting from the error matrix $\boldsymbol{\Gamma}$ can finally be given as

\[
\mathrm{SIR}_e = \frac{P_s}{P_i(\omega)} \ge \frac{N}{2L \cdot \sin^2(\pi\omega)}. \tag{28}
\]

We observe that an increasing channel length $L$ decreases the SIR proportionally. As the proposed CFO compensation process ignores the error $\boldsymbol{\Gamma}$, we will not be able to overcome this SIR bound, even if the distortion measures $\tilde{\boldsymbol{\Lambda}}$ and $\mathbf{U}$ needed for the compensation process are estimated perfectly.

To illustrate the obtained results, Figure 1 compares the SIR bound for an uncompensated CFO from (12) with the SIR bound (28) achievable after applying the simplified CFO compensation process. The amount of interference power that can be removed by the suggested process corresponds to $\Delta P_i = \mathrm{SIR}_{\mathrm{ICI}}^{-1} - \mathrm{SIR}_e^{-1}$. If $L = 0$, the interference can be removed completely by the CFO compensation process. For increasing $L$, however, an increasing share of the interference power is contained in the term $\mathbf{V}\mathbf{x}$ in (17), remaining in the system after compensation. If we set $\Delta P_i = 0$ and solve for $L$, we obtain the channel length where the compensation process is not capable of providing any gain. The minimum value for this length $L$ is obtained for $|\omega| \to 0$, yielding $N/6$. This means that if $L > N/6$, the gains delivered by the CFO compensation process become vanishingly small, so that its application will no longer be suitable. In Figure 1, this can be observed as the $\mathrm{SIR}_e$ curves approach the $\mathrm{SIR}_{\mathrm{ICI}}$ curve for decreasing values of $N/L$. For $N/L = 8$, the SIR gains achieved after compensation for small CFOs $|\omega| < 0.2$ have already become very small.
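The comparison of the two bounds, including the limiting channel length $N/6$, can be reproduced with a few lines of code. The sketch below evaluates the inverse SIR bounds from (12) and (28) and their difference; the function names are hypothetical, and all quantities are normalized to $P_s$.

```python
import math


def si(x):
    """si-function si(x) = sin(x)/x with si(0) = 1."""
    return 1.0 if x == 0 else math.sin(x) / x


def inv_sir_ici(omega):
    """Inverse of the SIR bound (12) for an uncompensated CFO."""
    s2 = si(math.pi * omega) ** 2
    return (1.0 - s2) / s2


def inv_sir_e(omega, n_over_l):
    """Inverse of the SIR bound (28) remaining after the simplified compensation."""
    return 2.0 * math.sin(math.pi * omega) ** 2 / n_over_l


def removable_interference(omega, n_over_l):
    """Delta P_i: interference (relative to P_s) removed by the compensation process."""
    return inv_sir_ici(omega) - inv_sir_e(omega, n_over_l)


if __name__ == "__main__":
    for ratio in (8, 16, 32):
        gain_db = 10.0 * math.log10(inv_sir_ici(0.1) / inv_sir_e(0.1, ratio))
        print(f"N/L = {ratio}: SIR gain bound at omega = 0.1 is {gain_db:.1f} dB")
```

For small $|\omega|$, a Taylor expansion gives $\mathrm{SIR}_{\mathrm{ICI}}^{-1} \approx (\pi\omega)^2/3$ and $\mathrm{SIR}_e^{-1} \approx 2L(\pi\omega)^2/N$, so the difference changes sign at $L = N/6$, in line with the statement above.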

3.4. CFO Compensation after Channel Equalization. We now focus in more detail on the CFO compensation process based on the signal model (17). It is carried out after channel equalization by multiplying the equalized signal vector $\bar{\mathbf{y}} = \tilde{\boldsymbol{\Lambda}}^{-1}\mathbf{y}$ with the Hermitian transpose $\mathbf{U}^H$ (note that matrix $\mathbf{U}$ is unitary). This latter operation represents a convolution of the subcarrier signals in $\bar{\mathbf{y}}$ with $u^*(-\kappa)$, where $u(\kappa)$ is given in (10). As the amplitude $|u(\kappa)|$ decays with $1/|\kappa|$, it may be sufficient to consider only the subcarriers in closest vicinity to subcarrier $k$ within the convolution process, which would simplify the entire process significantly. Let the vicinity range of subcarriers therefore be limited to $K$, that is, $|\kappa| \le K$. The convolution operation can then be specified by

\[
\hat{x}_j = \sum_{\kappa=-K}^{K} u^*(-\kappa)\cdot\bar{y}_{j+\kappa}, \tag{29}
\]

where $\bar{y}_k$ is the $k$th symbol of vector $\bar{\mathbf{y}}$, and $\hat{x}_j$ is the $j$th subcarrier signal obtained after equalization and CFO compensation.

To specify a suitable value for the delimiter $K$, note that $\bar{\mathbf{y}}$ is distorted by interference from $\mathbf{V}\mathbf{x}$, which is strongly correlated over the interference correlation range $|\kappa| \le K_c$ specified in (27). Furthermore, note that $u(\kappa)$ given in (10) is nearly point-symmetric for $\kappa \ne 0$, that is, $u(-\kappa) \approx -u(\kappa)$ holds. As a consequence, the correlated interference affecting the subcarrier signals in close vicinity of subcarrier $j$ is canceled out almost completely in (29). For that reason, it seems reasonable to set the delimiter $K = [K_c]$, where $[x]$ denotes the integer nearest to $x$. Interestingly, simulation results presented in Section 5 show that this selection achieves a slight performance improvement compared to the full CFO compensation, where the interference from all $N$ subcarrier signals is taken into account.

Note that CFO compensation according to (29) can be realized with comparatively small demands on system complexity. Firstly, for practical system setups, $K$ can be limited to small values. Furthermore, $u(\kappa)$ in (10) exhibits a single complex factor independent of $\kappa$, which represents the CPE. Compensation of the CPE can be incorporated into the channel equalization process. Therefore, (29) reduces to a convolution with a simple, strictly real-valued function.
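The windowed compensation can be sketched as a short cyclic convolution. The code below is an illustration in plain Python under simplifying assumptions: the error term $\mathbf{V}\mathbf{x}$ is ignored, all $N$ subcarriers are occupied (so indices wrap around cyclically), and the exact coefficients from (8) are used instead of the approximation (10). With $\mathbf{U}_{kj} = u(j - k)$ as in (11), the entries of $\mathbf{U}^H$ are $u^*(j - k)$, so the sample $\bar{y}_{j+\kappa}$ is weighted with $u^*(-\kappa)$. The function names are hypothetical.

```python
import cmath
import math


def u_exact(kappa, omega, n):
    """ICI coefficient u(kappa) evaluated from the finite sum in (8)."""
    return sum(cmath.exp(2j * math.pi * (omega + kappa) * m / n) for m in range(n)) / n


def apply_cfo(x, omega):
    """Frequency-domain CFO distortion y = U x, that is, y_k = sum_j u(j - k) x_j."""
    n = len(x)
    return [sum(u_exact(j - k, omega, n) * x[j] for j in range(n)) for k in range(n)]


def compensate_cfo(y_eq, omega, k_max):
    """Windowed compensation: x_hat_j = sum over |kappa| <= K of conj(u(-kappa)) * y_eq[j + kappa]."""
    n = len(y_eq)
    if 2 * k_max + 1 > n:
        raise ValueError("window must not exceed the number of subcarriers")
    taps = {kappa: u_exact(-kappa, omega, n).conjugate()
            for kappa in range(-k_max, k_max + 1)}
    return [sum(tap * y_eq[(j + kappa) % n] for kappa, tap in taps.items())
            for j in range(n)]
```

With a window that covers all subcarriers, the operation equals multiplication with $\mathbf{U}^H$ and restores the transmit vector exactly in this idealized setting; a short window leaves a small residual that stems from the neglected distant subcarriers.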

4. SIR Analysis in OFDM-SDMA System

Recall the OFDM-SDMA transmission equation from (6). If we want to equalize the effective channel $\mathbf{H}_C$ completely, the only viable approach based on linear techniques is to invert the entire channel matrix $\mathbf{H}_C$, which relates to the approach for OFDMA systems conducted in [18, 19]. However, this matrix is of dimension $MN \times QN$, and hence the complexity of this approach will quickly become infeasible for practical realizations. Although complexity can be reduced by exploiting the block-diagonal band structure of this matrix, it still remains considerably high. Moreover, as CFOs induce phase rotations of the effective subcarrier channels over time, the matrix $\mathbf{H}_C$ changes every OFDM symbol and thus has to be recomputed frequently, which increases the complexity for the inversion-based compensation even further.

Figure 2: Receiver processing for simplified signal reconstruction with CFO compensation in the SDMA uplink. Processing chain: received signals $r_1$, $r_2$; FFT yielding $\mathbf{y}_1$, $\mathbf{y}_2$; spatial separation of users yielding $\bar{\mathbf{y}}_1$, $\bar{\mathbf{y}}_2$; convolution with $u_1^*(\kappa)$ and $u_2^*(\kappa)$ yielding $\hat{\mathbf{x}}_1$, $\hat{\mathbf{x}}_2$.

An equalization approach that maintains the subcarrier-wise signal processing for the equalization and thus requireslow complexity demands can be enabled if we alternativelyadopt the signal model (17) derived in Section 3.2. Herewith,the compound channel HC can be written in the structuredform:

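$$
\mathbf{H}_C =
\begin{pmatrix}
\boldsymbol{\Lambda}_{11} & \boldsymbol{\Lambda}_{12} \\
\vdots & \vdots \\
\boldsymbol{\Lambda}_{M1} & \boldsymbol{\Lambda}_{M2}
\end{pmatrix}
\cdot
\begin{pmatrix}
\mathbf{U}_1 & \mathbf{0} \\
\mathbf{0} & \mathbf{U}_2
\end{pmatrix}
+
\begin{pmatrix}
\mathbf{V}_{11} & \mathbf{V}_{12} \\
\vdots & \vdots \\
\mathbf{V}_{M1} & \mathbf{V}_{M2}
\end{pmatrix}.
\tag{30}
$$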

The OFDM-SDMA transmission equation then yields

$$
\mathbf{y} = (\boldsymbol{\Lambda}_C \cdot \mathbf{U}_C + \mathbf{V}_C)\,\mathbf{x}, \tag{31}
$$

where ΛC, UC, and VC are the matrices constituting the compound channel matrix HC above. Evidently, this notation enables the two-step equalization process introduced in the previous section. We first equalize the channel contained in matrix ΛC by a subcarrier-wise equalization of the flat-fading SDMA channel and thereby spatially separate the single user signals. The separated user signals may then be compensated individually for their CFO distortions Uq as described in Section 3.4. The entire receiver processing for the simplified CFO compensation in the SDMA system is illustrated in Figure 2.
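The first step of this two-step process, the subcarrier-wise spatial separation, can be sketched for Q = 2 users as follows. This is an illustrative zero-forcing implementation under the assumption of perfectly known flat-fading coefficients λmq on one subcarrier and without the CFO-induced terms; the function name and data layout are hypothetical.

```python
import random

def zf_two_users(H, y):
    """Zero-forcing estimate of (x1, x2) on a single subcarrier.
    H: list of M rows [lambda_m1, lambda_m2]; y: list of M received values."""
    # Gram matrix G = H^H H (2x2 Hermitian) and matched-filter output b = H^H y
    g11 = sum(abs(row[0]) ** 2 for row in H)
    g22 = sum(abs(row[1]) ** 2 for row in H)
    g12 = sum(row[0].conjugate() * row[1] for row in H)
    b1 = sum(row[0].conjugate() * v for row, v in zip(H, y))
    b2 = sum(row[1].conjugate() * v for row, v in zip(H, y))
    det = g11 * g22 - abs(g12) ** 2
    # x_hat = G^{-1} b, using the closed-form inverse of a 2x2 matrix
    x1 = (g22 * b1 - g12 * b2) / det
    x2 = (g11 * b2 - g12.conjugate() * b1) / det
    return x1, x2

rng = random.Random(1)
M = 3
H = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2)] for _ in range(M)]
x_true = (1 + 1j, -1 + 1j)
y = [row[0] * x_true[0] + row[1] * x_true[1] for row in H]
x_hat = zf_two_users(H, y)
print(x_hat)
```

Each separated user stream would then be passed on to its own CFO compensation, as described in Section 3.4.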

In what follows, we will analyze how the CFO-induced interference will affect spatial diversity gains that can be achieved with a linear multiantenna receiver. As there is some correlation between signal and interference channels, distortion effects from the interference can no longer be expected to be similar to those of AWGN. In particular, we will analyze the degree of correlation between the channel of the useful signal and the interference channels and derive SIR bounds describing the equivalent situation for AWGN. The analysis will be carried out separately for the case of no CFO compensation and for compensation according to the proposed scheme.

4.1. Spatial Diversity Gain. In a brief intermezzo, we derive the basic relations concerning spatial diversity gains that are achievable with linear receivers in case of correlated signals. These relations form the basis for the analysis of the signal conditions in CFO-distorted OFDM-SDMA systems, which will be performed in the succeeding subsections. In particular, we examine here how interference that propagates via a correlated channel will affect the signal conditions at a multiantenna receiver providing spatial diversity gain μ. Following the notion from [34], the spatial diversity gain can be illustrated by assuming a maximum ratio combining (MRC) receiver that combines the signals from μ independent receive antennas. Assume a signal x1 with mean power Ps, which is transmitted via μ independent Rayleigh-fading channels hm1, m ∈ {1, . . . , μ}, with unit mean power. At each receive antenna m, the signal is distorted by AWGN with power N0. MRC operation then yields a post-MRC signal-to-noise ratio (SNR) of μPs/N0. The SNR is thus increased by the factor μ compared to the SNR of the signal at a single receive antenna.

Instead of AWGN, we now consider an interfering signal x2 with mean power N0. The signal at the mth receive antenna reads

$$
y_m = h_{m1} x_1 + h_{m2} x_2. \tag{32}
$$

Let the two signals xq be uncorrelated, while some correlation between the two channels hmq is assumed. Both variables hmq are assumed to be zero-mean Gaussian variables with variance var(hmq) = E{h∗mq hmq}. The correlation between both variables can be characterized by the correlation coefficient defined as [35]

$$
\rho = \frac{\operatorname{cov}(h_{m1}, h_{m2})}{\sqrt{\operatorname{var}(h_{m1})\operatorname{var}(h_{m2})}}, \quad \rho \in [0, 1], \tag{33}
$$

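where cov(hm1, hm2) stands for the covariance of the two variables given in the parentheses. According to [35, Theorem 10.1], the distribution of hm2 conditioned on hm1 can be characterized by the two measures:

$$
\begin{aligned}
E\{h_{m2} \mid h_{m1}\} &= \rho\sqrt{\frac{\operatorname{var}(h_{m2})}{\operatorname{var}(h_{m1})}}\; h_{m1}, \\
\operatorname{var}(h_{m2} \mid h_{m1}) &= (1-\rho^2)\operatorname{var}(h_{m2}).
\end{aligned}
\tag{34}
$$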

Accordingly, hm2 can be rewritten as

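$$
h_{m2} = \rho\sqrt{\frac{\operatorname{var}(h_{m2})}{\operatorname{var}(h_{m1})}}\; h_{m1} + \sqrt{(1-\rho^2)\operatorname{var}(h_{m2})}\; z_m, \tag{35}
$$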

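where we introduced a new Gaussian variable zm with zero mean and unit power, which is independent of hm1. Substituting this equation in (32) yields

$$
y_m = \underbrace{h_{m1} x_1}_{s}
+ \underbrace{\rho\sqrt{\frac{\operatorname{var}(h_{m2})}{\operatorname{var}(h_{m1})}}\; h_{m1} x_2}_{f_1}
+ \underbrace{\sqrt{(1-\rho^2)\operatorname{var}(h_{m2})}\; z_m \cdot x_2}_{f_2}.
\tag{36}
$$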

MRC operation delivering the spatial diversity gain is carried out by multiplying each received signal ym with the conjugate channel seen by the useful signal x1 and summing up the signals over all μ receive antennas: $y_{\mathrm{MRC}} = \sum_{m=1}^{\mu} h_{m1}^{*} y_m$. Within this summation, the signal portions from the first two components in (36), s and f1, which both depend on hm1, add up constructively, yielding a mean power of

$$
\mu^2 E\{s s^{*}\} = \mu^2 \operatorname{var}(h_{m1}) P_s, \qquad
\mu^2 E\{f_1 f_1^{*}\} = \mu^2 \rho^2 \operatorname{var}(h_{m2}) N_0
\tag{37}
$$

after MRC operation. In contrast, the signal portions from the third component in (36), f2, add up with arbitrary phase, so that the mean power of these signal portions after MRC yields

$$
\mu E\{f_2 f_2^{*}\} = \mu (1-\rho^2) \operatorname{var}(h_{m2}) N_0. \tag{38}
$$

Now let, for simplicity, var(hm1) = var(hm2) = 1. With the aforementioned results, we obtain for the post-MRC SIR

$$
\mathrm{SIR}_{\mathrm{MRC}} = \frac{\mu P_s}{\left[(\mu-1)\rho^2 + 1\right] N_0} = \nu \cdot \mu \frac{P_s}{N_0}, \tag{39}
$$

clearly revealing that the spatial diversity gain factor μ is diminished by

$$
\nu = \left[(\mu-1)\rho^2 + 1\right]^{-1} < 1. \tag{40}
$$

Thus, ν represents the effective SNR loss factor owing to the channel correlation ρ > 0.
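As a small numerical illustration of (37)–(40), the following sketch evaluates the loss factor ν and checks that the coherent and incoherent interference powers combine to the post-MRC SIR in (39). Unit-variance channels are assumed, as in the text; the function names and the example values for Ps and N0 are hypothetical.

```python
def sir_loss_factor(mu, rho2):
    """nu from (40): share of the diversity gain mu that survives a squared correlation rho2."""
    return 1.0 / ((mu - 1) * rho2 + 1.0)

def post_mrc_sir(mu, rho2, ps, n0):
    """Post-MRC SIR assembled from the power terms in (37) and (38), unit-variance channels."""
    coherent = mu ** 2 * rho2 * n0        # f1: correlated part, adds up constructively
    incoherent = mu * (1.0 - rho2) * n0   # f2: uncorrelated part, adds up with arbitrary phase
    return mu ** 2 * ps / (coherent + incoherent)

mu, ps, n0 = 4, 1.0, 0.1
for rho2 in (0.0, 0.6, 1.0):
    print(rho2, sir_loss_factor(mu, rho2), post_mrc_sir(mu, rho2, ps, n0))
```

For ρ² = 0 the full gain μ is retained (ν = 1), whereas for ρ² = 1 the factor becomes ν = 1/μ and the diversity gain is lost entirely.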

4.2. No Compensation of CFO Distortions. Now we turn our focus back to the signal conditions in the OFDM-SDMA system in case the ICI distortions are not compensated. Consider the signal received at antenna m, which, according to (6), is given as

$$
\mathbf{y}_m = \mathbf{U}_1 \boldsymbol{\Lambda}_{m1} \mathbf{x}_1 + \mathbf{U}_2 \boldsymbol{\Lambda}_{m2} \mathbf{x}_2. \tag{41}
$$

In case all subcarriers carry signals with identical transmit power, the statistical properties of the ICI are identical for all the elements contained in the vector $\mathbf{y}_m$. Therefore, we carry out the analysis exemplarily for the first element of $\mathbf{y}_m$, denoted as the scalar $y_m$. To separate the ICI from the useful signal, we define $\mathbf{u}_q$ as the first row vector of matrix $\mathbf{U}_q$, where the first element has been replaced by zero. The transmission equation then yields

$$
y_m = u(0)\left(\lambda_{m1} x_1 + \lambda_{m2} x_2\right) + \underbrace{\mathbf{u}_1 \boldsymbol{\Lambda}_{m1} \mathbf{x}_1 + \mathbf{u}_2 \boldsymbol{\Lambda}_{m2} \mathbf{x}_2}_{\text{ICI}}, \tag{42}
$$

where λmq is the channel coefficient for the first subcarrier extracted from matrix Λmq, and xq is the transmit signal of user q at the first subcarrier. We set x1 as the useful signal. An appropriate equalizer is able to remove the signal portion λm2x2 if estimates of the channels λmq can be obtained with sufficient quality. The two scalar products within (42), however, will remain in the system as ICI. The signal structure in (42) is now similar to (32), and hence we can use the results from the preceding subsection to determine achievable spatial diversity gains here. Clearly, the two scalar products representing the ICI in (42) consist of multiple interfering signals. However, as all elements within vector xq are assumed i.i.d., each scalar product can be modeled by a single random variable, whose power is the sum of the powers of the single elements in uqΛmq =: fq. In particular, we obtain for the power of the interfering channel:

$$
E\{\mathbf{f}_q \mathbf{f}_q^H\} = E\{\mathbf{u}_q \boldsymbol{\Lambda}_{mq} \boldsymbol{\Lambda}_{mq}^H \mathbf{u}_q^H\} = \sum_{j=1}^{N-1} |u(j)|^2 \le 1 - \operatorname{si}^2(\pi\omega_q), \tag{43}
$$

where we used the upper bound for PICI presented in Section 3.1. The covariance between the useful channel λm1 and the interference channels uqΛmq is determined by

$$
\mathbf{z}_q = E\{\lambda_{m1}^{*}\, \mathbf{u}_q \boldsymbol{\Lambda}_{mq}\}. \tag{44}
$$

As channels from different users are assumed uncorrelated, zq yields a vector with nonzero entries for q = 1 only. The N elements of the covariance vector z1 can be characterized by the function

$$
z(\kappa) =
\begin{cases}
0, & \kappa = 0, \\
u(\kappa) \cdot r(\kappa), & \kappa \in \{1, \ldots, N-1\},
\end{cases}
\tag{45}
$$

with r(κ) being the subcarrier correlation function defined in Section 2.3. The total power of the covariance vector z1 is determined as

$$
\mathbf{z}_1^H \mathbf{z}_1 = \sum_{\kappa=1}^{N-1} |u(\kappa)|^2 |r(\kappa)|^2, \tag{46}
$$

which can be read as the power of the covariance |cov|² of an equivalent random process based on a single random variable. With these results, we can determine a measure representing the correlation between the useful channel and the sum of interference channels, which is calculated equivalently to the correlation coefficient in (33):

$$
\rho^2 = \frac{\mathbf{z}_1^H \mathbf{z}_1}{E\{\lambda_{m1}^{*}\lambda_{m1}\}\, E\{\mathbf{f}_1 \mathbf{f}_1^H\}} \ge \frac{\sum_{\kappa=1}^{N-1} |u(\kappa)|^2 |r(\kappa)|^2}{1 - \operatorname{si}^2(\pi\omega_1)}. \tag{47}
$$

Evaluating this measure for varying L reveals that ρ² ≈ 1 for L ≪ N, suggesting that the useful channel and the interference channels for the ICI generated from x1 are nearly fully correlated. Evidently, this results mainly from the strong correlation of the subcarrier channels in the frequency domain, which holds for L ≪ N.

Consequently, we can conclude here that if an MRC-like signal combination is performed at the multiantenna receiver, not only the signal portions of the useful signal x1 but also those of the interference from x1 will add up fully coherently. In contrast, there is no correlation between the useful channel λm1 and the interference channels uqΛmq, q ≠ 1, as the covariance of the corresponding channels is zero. Consequently, this distortion will behave similarly to AWGN. Resorting to the derivation of the SIR in (39) in the preceding subsection, we obtain for the achievable SIR in an OFDM-SDMA system with spatial diversity gain μ:

$$
\mathrm{SIR}^{\mathrm{MRC}}_{\mathrm{ICI}} \ge \frac{\mu \operatorname{si}^2(\pi\omega_1)}{\mu\left(1 - \operatorname{si}^2(\pi\omega_1)\right) + \left(1 - \operatorname{si}^2(\pi\omega_2)\right)}, \tag{48}
$$


where we have used the bounds for Pu and PICI from Section 3.1 for the total power of useful signal and ICI, respectively. This result shows that in the OFDM-SDMA system with diversity gain μ, the ICI power generated by any user q ≠ 1 is effectively attenuated by factor μ (i.e., the diversity gain can be realized completely), while the ICI generated from the CFO of user q = 1 itself is fully preserved (i.e., no diversity gain is achievable). If all Q users have a CFO of the same size, ωq = ω for all q, then the effective reception SIR (referring to the mean power of each user’s signal measured at any receive antenna m) for the equivalent AWGN case can be given as

$$
\mathrm{SIR}_{\mathrm{ICI}} \ge \left[\mu + Q - 1\right]^{-1} \frac{\operatorname{si}^2(\pi\omega)}{1 - \operatorname{si}^2(\pi\omega)}. \tag{49}
$$

This result is equivalent to the SIR bound for the single-antenna case (12), reduced by the effective SIR-loss factor η = [μ + Q − 1]⁻¹.
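The bound (49) is easy to evaluate numerically. The sketch below assumes si(x) = sin(x)/x and uses hypothetical function names; for ω = 0.1 it gives roughly 15 dB for the single-antenna, single-user case (η = 1), 12 dB for μ = 1, Q = 2, and 10 dB for μ = 2, Q = 2, which are the values quoted for (12) and (49) in the simulation section.

```python
import math

def si(x):
    """si-function, assumed here as sin(x)/x with si(0) = 1."""
    return 1.0 if x == 0 else math.sin(x) / x

def sir_no_compensation_db(omega, mu, Q):
    """SIR bound (49) in dB for equal CFOs omega, diversity gain mu, and Q users."""
    s2 = si(math.pi * omega) ** 2
    eta = 1.0 / (mu + Q - 1)          # effective SIR-loss factor
    return 10 * math.log10(eta * s2 / (1 - s2))

for mu, Q in ((1, 1), (1, 2), (2, 2)):
    print(mu, Q, round(sir_no_compensation_db(0.1, mu, Q), 1))
```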

4.3. Compensation of CFO Distortions. Next we consider the case where the CFO distortions are compensated according to the proposed concept. Then interference results from the signal components contained in matrix VC in (31) only, and the signal received at antenna m reads

$$
\mathbf{y}_m = \boldsymbol{\Lambda}_{m1}\mathbf{U}_1\mathbf{x}_1 + \boldsymbol{\Lambda}_{m2}\mathbf{U}_2\mathbf{x}_2 + \mathbf{V}_{m1}\mathbf{x}_1 + \mathbf{V}_{m2}\mathbf{x}_2. \tag{50}
$$

Again, we define x1 as the useful signal. The proposed equalization and ICI compensation concept removes the interference from Λm2U2x2 as well as the ICI induced by U1, and correspondingly solely the interference from Vmqxq remains in the system. Analogously to the analysis carried out in the preceding subsection, we will now determine the correlation between useful channels and the channels of the residual interference to specify achievable spatial diversity gains. However, to ease the analysis here, we initially focus on the entire channel matrices Λm1 and Vmq to specify the overall statistical properties. Afterwards, we determine the signal conditions per subcarrier signal by averaging over the total N subcarriers of the system.

The mean power of the interfering channel Vmq per subcarrier amounts to

$$
\frac{1}{N}\operatorname{tr}\left(E\{\mathbf{V}_{mq}^H \mathbf{V}_{mq}\}\right) \le \frac{L}{N} \cdot 2\sin^2(\pi\omega_q). \tag{51}
$$

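For the bound, we used the result from (23). Correspondingly, the mean power of the useful channel Λm1 yields

$$
\frac{1}{N}\operatorname{tr}\left(E\{\boldsymbol{\Lambda}_{m1}^H \boldsymbol{\Lambda}_{m1}\}\right) = \frac{1}{N}\operatorname{tr}(\mathbf{I}) = 1. \tag{52}
$$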

The covariance between useful channel and interfering channel can be characterized by the covariance matrix:

$$
\mathbf{Z}_q = E\{\boldsymbol{\Lambda}_{m1}^H \mathbf{V}_{mq}\}. \tag{53}
$$

As Vmq is constituted of the channel coefficients related to channel Λmq, the covariance matrix Zq will have nonzero entries for q = 1 only. The corresponding matrix Z1 can be determined as follows. Using the definitions of Λ from (14) and V from (17), $\boldsymbol{\Lambda}_{m1}^H \mathbf{V}_{m1}$ can be written as $(\mathbf{F}\mathbf{P}_1^H \mathbf{H}_{m1}^H \mathbf{P}_2^H)(\mathbf{P}_2 \mathbf{H}_{m1} \boldsymbol{\Gamma}_1 \mathbf{F})$. For the moment we will exclude the outer DFT matrices F and determine the expectation value of the inner matrix product. $\mathbf{P}_1^H \mathbf{H}_{m1}^H \mathbf{P}_2^H$ is a circular Toeplitz matrix based on the channel impulse response hl, and $\mathbf{P}_2 \mathbf{H}_{m1} \boldsymbol{\Gamma}_1$ was shown in Section 3.2 to be a matrix with zero entries except for the submatrix Vu found in its upper-right corner. The expectation value of the product of these two components thus yields a matrix with zero entries except for the L × L antidiagonal submatrix in its upper-right corner, whose L antidiagonal elements ξi represent partial sums of the channel power weighted by $\gamma_{-i}$:

$$
\xi_i = \gamma_{-i} \sum_{l=i}^{L} \sigma_l^2, \quad i \in \{1, \ldots, L\}, \tag{54}
$$

with γn defined in (16). From the covariance matrix, we can determine the mean power of the covariance between useful and interfering channels per subcarrier signal according to

$$
\frac{1}{N}\operatorname{tr}\left(\mathbf{Z}_1 \mathbf{Z}_1^H\right) = \sum_{i=1}^{L} |\xi_i|^2 \le |\gamma_n|^2 \frac{1}{(L+1)^2} \sum_{i=1}^{L} i^2 = |\gamma_n|^2 \frac{L(2L+1)}{6(L+1)}, \tag{55}
$$

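where the upper bound is obtained for a uniform PDP. Note that |γn|² = 4 sin²(πω1). Similar to (47), we can now determine a measure equivalent to the squared correlation coefficient:

$$
\rho^2 = \frac{N^{-1}\operatorname{tr}\left(\mathbf{Z}_1 \mathbf{Z}_1^H\right)}{N^{-1}\operatorname{tr}\left(E\{\boldsymbol{\Lambda}_{m1}^H \boldsymbol{\Lambda}_{m1}\}\right)\, N^{-1}\operatorname{tr}\left(E\{\mathbf{V}_{m1}^H \mathbf{V}_{m1}\}\right)} \approx \frac{2L+1}{3(L+1)} < \frac{2}{3}. \tag{56}
$$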
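The summation step behind (55) and the closed form in (56) can be checked with exact rational arithmetic. The sketch below assumes a uniform PDP with L + 1 taps of power 1/(L + 1) each, so that the partial sum in (54) equals (L − i + 1)/(L + 1); the common factor |γn|² is omitted, and the function names are hypothetical.

```python
from fractions import Fraction

def partial_sum_power(L):
    """Sum over i = 1..L of (sum_{l=i}^{L} sigma_l^2)^2 for a uniform PDP with L + 1 taps."""
    sigma2 = Fraction(1, L + 1)
    return sum((sigma2 * (L - i + 1)) ** 2 for i in range(1, L + 1))

def closed_form(L):
    """Right-hand side of (55) without the factor |gamma_n|^2."""
    return Fraction(L * (2 * L + 1), 6 * (L + 1))

def rho2(L):
    """Approximation of the squared correlation measure from (56)."""
    return Fraction(2 * L + 1, 3 * (L + 1))

for L in (1, 4, 16):
    print(L, partial_sum_power(L), closed_form(L), rho2(L))
```

For the simulation setting L = 4, this gives ρ² = 3/5, and the value approaches 2/3 from below as L grows.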

Assuming again a receiver with spatial diversity gain μ, we may now determine the SIR for the useful signal after an MRC-like signal combination over μ independent observations. Resorting to the derivation of the SIR in (39), we obtain for the interference from Vm1x1 a mean power of μν⁻¹Pi(ω1), with the interference power Pi(ω) according to (25) and the SIR loss factor ν from (39). As all other interference channels Vmq, q ≠ 1, are uncorrelated with the useful channel Λm1, the corresponding interference Vmqxq adds up incoherently, yielding a mean power of μPi(ω2). Hence, we obtain the post-MRC SIR

$$
\mathrm{SIR}^{\mathrm{MRC}}_{e} = \frac{\mu P_s}{\nu^{-1} P_i(\omega_1) + P_i(\omega_2)}. \tag{57}
$$


If we have multiple users Q who all have the same CFO, that is, ωq = ω for all q, the effective reception SIR at any antenna m for the equivalent AWGN case can be bounded by

$$
\mathrm{SIR}_e \ge \left[(\mu-1)\rho^2 + Q\right]^{-1} \frac{N}{2L\sin^2(\pi\omega)}, \tag{58}
$$

where we used the bound for Pi(ω) from (25), and ρ² should be used as specified in (56). This expression is equivalent to the SIR bound found for the single-antenna case in (28) reduced by the effective SIR-loss factor ηe = [(μ − 1)ρ² + Q]⁻¹. Note here that the CFO-induced interference scales with the number of parallel SDMA users Q. In case of full correlation (ρ = 1), the SIR-loss factor ηe is identical to η, the factor found in case of no CFO compensation in (49). As a major result, we conclude here that the correlated interference from the CFO distortion results in an increase of the effective SIR loss if a receiver with spatial diversity gain μ > 1 is employed.
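To make the bound (58) concrete, the following sketch evaluates it in dB with ρ² taken from the closed form in (56). The function name is hypothetical. For N = 64, L = 4, and ω = 0.1 it yields roughly 19 dB for the single-antenna, single-user case, 16 dB for μ = 1, Q = 2, and 15 dB for μ = 2, Q = 2, matching the values referred to in the simulation section.

```python
import math

def sir_compensated_db(omega, mu, Q, N, L):
    """SIR bound (58) in dB after the proposed CFO compensation."""
    rho2 = (2 * L + 1) / (3 * (L + 1))       # squared correlation measure, closed form in (56)
    eta_e = 1.0 / ((mu - 1) * rho2 + Q)      # effective SIR-loss factor
    return 10 * math.log10(eta_e * N / (2 * L * math.sin(math.pi * omega) ** 2))

for mu, Q in ((1, 1), (1, 2), (2, 2)):
    print(mu, Q, round(sir_compensated_db(0.1, mu, Q, N=64, L=4), 1))
```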


5. Simulation Results

In this section we provide numerical simulations to verify the analytical results found in the previous sections. For the simulations, we assume OFDM signal transmission via a noisy channel, that is, the transmission equation (6) is now given by

$$
\mathbf{y} = \mathbf{H}_C \cdot \mathbf{x} + \mathbf{n}, \tag{59}
$$

where n is a vector consisting of MN AWGN samples with power N0. Thus, the mean reception SNR amounts to Ps/N0 for the signal of any user at any receive antenna. As we have indicated that the CFO-induced interference can be expected to behave like AWGN, it can be assumed that this interference degrades the interference-free AWGN performance (i.e., no CFO is present) according to the amount of interference power. In particular, if the SNR Ps/N0 is equal to the CFO-induced SIR, we can expect that the transmission experiences a performance degradation of 3 dB compared to the interference-free case. (As interference and AWGN are assumed to be independent, their joint distortion can be considered as Gaussian-like with power equal to the sum of powers from the two independent processes.) This basic principle will be used to verify the SIR bounds derived in the preceding sections.

We consider an OFDM-SDMA system with N = 64 subcarriers, where Q = 2 single-antenna user terminals are granted simultaneous access. For the bounds to be tight, all N subcarriers are occupied with transmission symbols from both users. The channel of each antenna link is modeled as Rayleigh-fading with L + 1 = 5 channel taps and a uniform PDP. The normalized CFO is fixed to ω = 0.1. As a performance measure, we use the bit-error rate (BER) that is achieved for an uncoded transmission of uncorrelated 16QAM symbols, averaged over both users. We use a zero forcing (ZF) equalizer to equalize the channel distortions and spatially separate the user signals per subcarrier. The diagonal channel ΛC from (31) as well as the CFOs ωq are assumed to be known perfectly at the receiver.

Figure 3: BER performance of SISO system distorted by normalized CFO ω = 0.1. [Plot: BER (logarithmic axis) versus Ps/N0 from 10 to 30 dB. Curves: no CFO; full CFO compensation; K = 3; no compensation (K = 0). Annotations mark the 3 dB loss at the SIR bounds SIR_ICI and SIR_e.]

Based on the signal model (17), we first examine the achievable performance for a single-antenna link (SISO). Results are given in Figure 3. The solid bold line shows the achievable BER performance in case no CFO is present. The suggested compensation approach shows a significantly degraded performance. At an SNR Ps/N0 equal to the SIR bound (28), which amounts to 19 dB for the given parameter setting, it clearly exhibits a performance loss of 3 dB. This observation thus verifies the bound derived in (28).

The performance curve of the CFO-compensated system runs into an error floor for high SNR that corresponds to the BER achievable with the CFO-free system at about 22 dB, which is about 3 dB higher than the SIR bound. The reason for that can be found in the distribution of the interference generated from the distortion terms in Vx in (17). Note that the values in Vx are generated from products of the independent random variables hl in V and the data symbols in x, which are all assumed to be Gaussian. The resulting distribution function for the values in Vx is thus in general no longer Gaussian. Instead, we observe that the majority of the values from this distribution are much more concentrated around their mean than in the Gaussian case. Due to this fact, the achieved error floor is significantly lower than it would be if the interference behaved like Gaussian noise with identical power. However, it is worth noting that with increasing L and thus with an increasing number of independent variables hl in V, the distribution of the values in Vx approaches the Gaussian case, owing to the central limit theorem.
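The non-Gaussian shape of such interference terms can be illustrated with a strongly simplified experiment: each sample is a sum of products of two independent real-valued standard Gaussian variables, standing in for channel taps and data symbols. This is only a toy model of Vx (real instead of complex values, no weighting by γ), and the function names are hypothetical. A kurtosis above 3 indicates a distribution that is more peaked around its mean, with heavier tails, than a Gaussian of equal power.

```python
import random

def kurtosis(samples):
    """Normalized fourth central moment; equals 3 for a Gaussian distribution."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((s - mean) ** 2 for s in samples) / n
    m4 = sum((s - mean) ** 4 for s in samples) / n
    return m4 / m2 ** 2

def interference_samples(terms, n, rng):
    """Each sample is a sum of `terms` products of two independent N(0, 1) variables."""
    return [sum(rng.gauss(0, 1) * rng.gauss(0, 1) for _ in range(terms)) for _ in range(n)]

rng = random.Random(1)
k1, k4, k16 = (kurtosis(interference_samples(t, 20_000, rng)) for t in (1, 4, 16))
print(k1, k4, k16)
```

In this toy model the kurtosis decreases toward the Gaussian value of 3 as more independent products are summed, in line with the central limit theorem argument above.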

If we apply the CFO compensation technique that removes the ICI from the subcarriers in close vicinity |κ| ≤ K only (see Section 3.4), we obtain the performance given by the dashed line for K = [Kc] = 3. Interestingly, for the choice of K according to Kc given in (27), the CFO compensation accounting only for some of the ICI distortion achieves a slight performance improvement compared to the full CFO compensation. Apparently, this benefit is related to the correlated interference from Vx in (17), as detailed in Section 3.4.

If we do not compensate for the ICI caused by the CFO but compensate for the CPE only, which corresponds to the case of applying the compensator (29) with K = 0, we obtain the performance represented by the uppermost curve. For an SNR equal to the bound in (12), which amounts to 15 dB for the given parameter setting, we clearly observe a performance loss of 3 dB compared to the performance where no CFO is present.

For the 2-user SDMA case, we consider ZF equalization to separate the signals of the different users. In [34] the diversity gain delivered by the ZF receiver has been shown to yield μ = M − Q + 1. For our examinations, we consider two cases: a receiver with M = 2 and M = 3 antennas, providing a diversity gain of μ = 1 and μ = 2, respectively. Performance results are shown in Figure 4. The dashed curves refer to μ = 1, while the solid curves refer to μ = 2. Compared to the curve of CFO-free transmission, the curves representing full CFO compensation according to the proposed scheme exhibit a 3 dB performance loss at an SNR equal to the SIR from (58), which amounts to 16 dB for μ = 1 and 15 dB for μ = 2, respectively, for the given parameter setting. These losses are highlighted in Figure 4 by the horizontal black lines, clearly verifying the bound derived in (58). As in the SISO case, we observe that we can achieve a slight performance improvement if we use the simplified CFO compensation process based on (29) with K = 3. In case we do not compensate the ICI caused by the CFO, we achieve a severely degraded performance, which clearly exhibits a 3 dB performance loss at an SNR of 12 dB for μ = 1 and 10 dB for μ = 2, respectively, corresponding to the analytical bound (49).

In Figure 5 we examine the behavior of the BER when the CFO compensation process based on (29) is applied for different values of the delimiter K. We focus on a constant SNR Ps/N0 = 20 dB, which reflects the BER of the error floor for μ = 1. For the selected values of N/L, the subcarrier correlation range Kc from (27) amounts to 3.2 and 1.6, respectively. Interestingly, the corresponding curves exhibit their minimum at K = 3 and K = 2, respectively, which is the nearest integer to Kc. Hence, selecting K = [Kc] indeed seems to be a good choice. This result leads us to the conclusion that it suffices to consider only the subcarrier signals in closest vicinity within the CFO compensation via (29).

To illustrate the performance degradation caused by the incomplete compensation of the CFO effects in the OFDM-SDMA system, we specify the effective SNR loss ΔSNR based on the ratio of the interference power bound from (58) and the AWGN power N0 as done in [8], which yields (in dB)

$$
\Delta\mathrm{SNR} = 10\log_{10}\left(1 + \frac{2L\sin^2(\pi\omega)\,P_s}{\eta_e N N_0}\right). \tag{60}
$$
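The SNR loss can be evaluated with a few lines of code. The sketch below takes the loss as 10 log10(1 + SNR/SIRe), that is, one plus the ratio of the interference power bound implied by (58) to the AWGN power N0, as described above; the function name and argument layout are hypothetical. For ω = 0.1, μ = 1, Q = 2, and N/L = 16 the loss reaches 3 dB at an SNR of about 16 dB, and for ω = 0.2 at roughly 10 to 11 dB.

```python
import math

def snr_loss_db(snr_db, omega, mu, Q, n_over_l, rho2):
    """SNR loss in dB: 10*log10(1 + SNR / SIR_e), with SIR_e taken from the bound (58)."""
    eta_e = 1.0 / ((mu - 1) * rho2 + Q)
    sir_e = eta_e * n_over_l / (2 * math.sin(math.pi * omega) ** 2)
    snr = 10 ** (snr_db / 10)
    return 10 * math.log10(1 + snr / sir_e)

for omega in (0.025, 0.05, 0.1, 0.15, 0.2):
    print(omega, round(snr_loss_db(16.0, omega, mu=1, Q=2, n_over_l=16, rho2=0.6), 2))
```

With μ = 1 the value of ρ² has no influence; it matters only for receivers with diversity gain μ > 1.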

The numerical evaluation of the effective SNR loss for various CFO sizes ω is depicted versus the SNR Ps/N0 in Figure 6; the corresponding parameter setting is specified in its caption. In accordance with the observations drawn from Figure 4, where evaluations were based on a CFO of size ω = 0.1, the corresponding curve indicates here a 3 dB SNR loss at an SNR Ps/N0 = 16 dB. For comparison, we also added the SNR loss for the case of no ICI compensation (dashed curves), where we used the interference power bound from (49). Although we observe that the proposed CFO compensation is able to reduce the SIR loss significantly, the loss still increases steeply for increasing CFO size ω. If the CFO amounts to 20% of the subcarrier spacing, the performance of the system is degraded by 3 dB already at an SNR level of about 10 dB.

Figure 4: BER performance of 2-user SDMA system distorted by normalized CFO ω = 0.1 with ZF receiver. Dashed line: diversity gain μ = 1. Solid line: μ = 2. [Plot: BER (logarithmic axis) versus Ps/N0 from 10 to 30 dB. Curves: no CFO; full CFO compensation; K = 3; no compensation (K = 0). Annotations mark the 3 dB loss at the SIR bound SIR_e.]

Figure 5: BER performance at SNR = 20 dB versus delimiter K. μ = 1, Q = 2, ω = 0.1. [Plot: BER (logarithmic axis) versus delimiter K ∈ {0, 1, 2, 3, 4, 5, full} for N/L = 16 and N/L = 8.]

These results show that the system’s sensitivity toward CFO errors is still very high, and hence we conclude that with the suggested approach, we can conveniently compensate CFOs of small size only. Thus, the method is suitable for fine frequency synchronization only, and hence it has to rely on a coarse synchronization, which has to be established in advance. In a practical system, such a coarse synchronization can be achieved if the terminals use their frequency estimates obtained during the preceding downlink phase for a proper frequency precompensation of their transmit signals. We denote this as frequency advance, which has been the basic concept for our real-time system implementation reported in [30]. It is worth noting that the analysis presented in this paper, and in particular the derived bounds for the SIRs, served as an important guideline in preparing the experiments summarized in that reference, which have shown that a convenient system operation in a practical setup can be achieved.

Figure 6: SNR loss after CFO compensation versus SNR for CFOs of different size ω. Solid line: ICI compensation. Dashed line: no compensation. μ = 1, Q = 2, N/L = 16. [Plot: SNR loss (dB), 0 to 6, versus Ps/N0 (dB), 0 to 24, for ω = 0.025, 0.05, 0.1, 0.15, 0.2.]

Finally, note that if the CFOs are kept small, the signal degradation from ICI is limited, and thus common pilot-based channel estimation techniques can still be used to obtain channel estimates of sufficient quality. The more pilots in one OFDM symbol are available for that channel estimation, the better the ICI can be suppressed, as the ICI behaves similarly to AWGN. Moreover, the CFOs ωq of the single users q ∈ {1, . . . , Q} can be obtained from observing the phase drift of the estimated subcarrier channels λk over several successive OFDM symbols. With (10), the ICI coefficients u(κ) can then be determined, which can finally be applied in (29) for proper ICI compensation of the single users’ signals.
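One possible way to obtain a CFO estimate from the phase drift is sketched below. The estimator is not specified in the text; the code assumes that the CFO rotates every subcarrier channel by 2πω(N + Ng)/N from one OFDM symbol to the next, where the cyclic prefix length Ng is an assumed parameter, and it averages the phase increment over all subcarriers and symbol pairs. All names are hypothetical. The estimate is unambiguous only as long as the per-symbol phase step stays below π in magnitude.

```python
import cmath
import math
import random

def estimate_cfo(channel_estimates, N, Ng):
    """Estimate the normalized CFO from channel_estimates[t][k], the estimated
    channel of subcarrier k in OFDM symbol t (consecutive symbols)."""
    acc = 0j
    for prev, curr in zip(channel_estimates, channel_estimates[1:]):
        acc += sum(c * p.conjugate() for p, c in zip(prev, curr))
    # Assumed model: the phase advances by 2*pi*omega*(N + Ng)/N per OFDM symbol
    return cmath.phase(acc) * N / (2 * math.pi * (N + Ng))

rng = random.Random(3)
N, Ng, omega = 64, 16, 0.1
lam = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)]
step = 2 * math.pi * omega * (N + Ng) / N
# Noisy channel estimates for four consecutive OFDM symbols
estimates = [[h * cmath.exp(1j * step * t) + complex(rng.gauss(0, 0.05), rng.gauss(0, 0.05))
              for h in lam] for t in range(4)]
omega_hat = estimate_cfo(estimates, N, Ng)
print(omega_hat)
```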

6. Conclusion

We have investigated OFDM-SDMA uplink transmission in the presence of multiple users’ CFOs. We modified the common signal model suitably to enable a subcarrier-wise SDMA equalization followed by a user-specific CFO compensation, yielding a simple equalization process ready to be applied in practice. However, as CFOs violate the periodic structure of the OFDM signals, some interference remains in the system after CFO compensation, which cannot be compensated as long as simple frequency domain processing is targeted. The SIR conditions in OFDM-SDMA systems have been analyzed both for the case that CFOs are compensated according to the proposed scheme and for the case that they are not. We derived suitable bounds for the SIRs depending on the system parameters, which have been verified by numerical simulations. To enable a convenient operation of the proposed scheme, we conclude from the results that the users’ CFOs should not be much larger than a few percent of the OFDM subcarrier spacing, which classifies this scheme as a technique for fine frequency synchronization. Correspondingly, coarse frequency synchronization has to be ensured, which can easily be established if the CFO estimates from the downlink are used in the uplink for a proper predistortion of each user’s transmit signal, as suggested also in [10] and practically realized in [30]. Together with this concept, the proposed scheme can be regarded as a convenient solution to synchronize the OFDM-SDMA uplink. Note that this concept based on coarse synchronization also allows user channels to be estimated with common pilot-based channel estimation techniques. Suitable estimates of the users’ CFOs can then be obtained from the phase drift of the estimated channels observed over several consecutive OFDM symbols.

Appendix

A. Correlation between Self-Interference and Useful Signal

To determine the correlation between the self-interference and the useful signal at any subcarrier k, we determine the covariance between the self-interference coefficient (i.e., the kth diagonal element of matrix V) and the channel coefficient λk. As indicated earlier, the interference conditions evoked by matrix V are independent of the actual subcarrier position k, and hence it suffices to determine the covariance at a single subcarrier position; specifically we choose k = 1. The channel coefficient is given as $\lambda_1 = \sum_{l=0}^{L} h_l$. Denote the first diagonal element of V as v11. Considering the structure of matrix V based on the submatrix Vu (see Section 3.2), v11 can be calculated as

$$
v_{11} = \frac{1}{N} \sum_{m=0}^{L-1} \sum_{l=m+1}^{L} \gamma_{-l+m}\, h_l. \tag{A.1}
$$

As both coefficients λ1 and v11 have an expectation value of zero, the covariance is defined as cov = E{λ∗1 v11}. With the uniform PDP, we obtain for the covariance of the two coefficients

$$
\mathrm{cov} = \frac{\gamma_{-1}}{N(L+1)} \sum_{m=1}^{L} \sum_{l=0}^{L-m} \exp(-j\varphi l). \tag{A.2}
$$


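The second sum term on the right-hand side represents a geometric series, so that similarly to (10) the si-function can be used to obtain an approximation, which is given as

$$
\sum_{l=0}^{L-m} \exp(-j\varphi l) = \underbrace{\exp\left(-j\pi\omega\frac{L-m}{N}\right)}_{\approx 1}\; \underbrace{\operatorname{si}\left(\pi\omega\frac{L-m+1}{N}\right)}_{\approx 1} \cdot (L-m). \tag{A.3}
$$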

As usually L ≪ N holds, the exponential function as well as the si-function generate values that are very close to one for any m ∈ {1, . . . , L}. Hence, both terms can be upper bounded by the constant value one. With this, the covariance can be upper bounded by

$$
\mathrm{cov} < \frac{\gamma_{-1} L}{2N}. \tag{A.4}
$$

Assuming the signals λ1, v11 to be Gaussian, the amount of power Pc devoted to the self-interference can be determined with [35, Theorem 10.1] as

$$
P_c = |\mathrm{cov}|^2 \cdot P_s < \frac{L^2}{N^2}\sin^2(\pi\omega) \cdot P_s. \tag{A.5}
$$

With this result, we can assess the ratio of the self-interference power to the total interference power Pi(ω) given in (25), yielding

$$
\frac{P_c}{P_i(\omega)} \approx \frac{L}{2N}. \tag{A.6}
$$

For L ≪ N, we conclude that the amount of self-interference is vanishingly small; hence there is no need to consider the self-interference separately to account for its special properties.
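For the simulation parameters N = 64 and L = 4, the ratio in (A.6) can be evaluated directly. The sketch below assumes that the bound on Pi(ω) from (25), which is not reproduced in this section, has the form (2L/N) sin²(πω) Ps, as suggested by (51) and (58); the function name is hypothetical.

```python
import math

def self_interference_ratio(N, L, omega, ps=1.0):
    """Ratio of the bound (A.5) on P_c to the assumed bound on P_i(omega)."""
    s2 = math.sin(math.pi * omega) ** 2
    pc_bound = (L / N) ** 2 * s2 * ps   # (A.5)
    pi_bound = 2 * L / N * s2 * ps      # assumed form of (25), inferred from (51) and (58)
    return pc_bound / pi_bound

ratio = self_interference_ratio(N=64, L=4, omega=0.1)
print(ratio, 10 * math.log10(ratio))
```

The result equals L/(2N) ≈ 0.031, that is, the self-interference lies about 15 dB below the total residual interference in this setting.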

References

[1] P. Vandenameele, L. van der Perre, M. G. E. Engels, B. Gyselinckx, and H. J. De Man, “A combined OFDM/SDMA approach,” IEEE Journal on Selected Areas in Communications, vol. 18, no. 11, pp. 2312–2321, 2000.

[2] X. Cai and G. B. Giannakis, “Bounding performance and suppressing intercarrier interference in wireless mobile OFDM,” IEEE Transactions on Communications, vol. 51, no. 12, pp. 2047–2056, 2003.

[3] P. Jung and G. Wunder, “On time-variant distortions in multicarrier transmission with application to frequency offsets and phase noise,” IEEE Transactions on Communications, vol. 53, no. 9, pp. 1561–1570, 2005.

[4] T. Keller, L. Piazzo, P. Mandarini, and L. Hanzo, “Orthogonal frequency division multiplex synchronization techniques for frequency-selective fading channels,” IEEE Journal on Selected Areas in Communications, vol. 19, no. 6, pp. 999–1008, 2001.

[5] B. Ai, Z.-X. Yang, C.-Y. Pan, J.-H. Ge, Y. Wang, and Z. Lu, “On the synchronization techniques for wireless OFDM systems,” IEEE Transactions on Broadcasting, vol. 52, no. 2, pp. 236–244, 2006.

[6] L. Haring and A. Czylwik, “Synchronization in MIMO-OFDM systems,” Advances in Radio Sciences, vol. 2, pp. 147–153, 2004.

[7] A. M. Tonello, N. Laurenti, and S. Pupolin, “Analysis of the uplink of an asynchronous multi-user DMT OFDMA system impaired by time offsets, frequency offsets, and multi-path fading,” in Proceedings of the 52nd IEEE Vehicular Technology Conference (VTC ’00), vol. 3, pp. 1094–1099, Boston, Mass, USA, September 2000.

[8] M. S. El-Tanany, Y. Wu, and L. Hazy, “OFDM uplink for interactive broadband wireless: analysis and simulation in the presence of carrier, clock and timing errors,” IEEE Transactions on Broadcasting, vol. 47, no. 1, pp. 3–19, 2001.

[9] L. Kuang, J. Lu, Z. Ni, and J. Zheng, “Nonpilot-aided carrier frequency tracking for uplink OFDMA systems,” in Proceedings of IEEE International Conference on Communications (ICC ’04), vol. 6, pp. 3193–3196, Paris, France, June 2004.

[10] M. Morelli, C.-C. Kuo, and M.-O. Pun, “Synchronization techniques for orthogonal frequency division multiple access (OFDMA): a tutorial review,” Proceedings of the IEEE, vol. 95, no. 7, pp. 1394–1427, 2007.

[11] J.-J. van de Beek, P. O. Borjesson, M.-L. Boucheret, et al., “A time and frequency synchronization scheme for multiuser OFDM,” IEEE Journal on Selected Areas in Communications, vol. 17, no. 11, pp. 1900–1914, 1999.

[12] H. Bolcskei, “Blind high-resolution uplink synchronization of OFDM-based multiple access schemes,” in Proceedings of the 2nd IEEE Workshop on Signal Processing Advances in Wireless Communications (SPAWC ’99), pp. 166–169, Annapolis, Md, USA, May 1999.

[13] S. Barbarossa, M. Pompili, and G. B. Giannakis, “Channel-independent synchronization of orthogonal frequency division multiple access systems,” IEEE Journal on Selected Areas in Communications, vol. 20, no. 2, pp. 474–486, 2002.

[14] Y. Yao and G. B. Giannakis, “Blind carrier frequency offset estimation in SISO, MIMO, and multiuser OFDM systems,” IEEE Transactions on Communications, vol. 53, no. 1, pp. 173–183, 2005.

[15] Z. Cao, U. Tureli, and Y.-D. Yao, “User separation and frequency-time synchronization for the uplink of interleaved OFDMA,” in Proceedings of the 36th Asilomar Conference on Signals, Systems and Computers, vol. 2, pp. 1842–1846, Pacific Grove, Calif, USA, November 2002.

[16] M. Morelli, “Timing and frequency synchronization for the uplink of an OFDMA system,” IEEE Transactions on Communications, vol. 52, no. 2, pp. 296–306, 2004.

[17] M.-O. Pun, M. Morelli, and C.-C. J. Kuo, “Maximum-likelihood synchronization and channel estimation for OFDMA uplink transmissions,” IEEE Transactions on Communications, vol. 54, no. 4, pp. 726–736, 2006.

[18] Z. Cao, U. Tureli, Y.-D. Yao, and P. Honan, “Frequency synchronization for generalized OFDMA uplink,” in Proceedings of IEEE Global Telecommunications Conference (GLOBECOM ’04), vol. 2, pp. 1071–1075, Dallas, Tex, USA, November 2004.

[19] C. Ibars and Y. Bar-Ness, “Inter-carrier interference cancellation for OFDM systems with macrodiversity and multiple frequency offsets,” Wireless Personal Communications, vol. 26, no. 4, pp. 285–304, 2003.

[20] R. Fantacci, D. Marabissi, and S. Papini, “Multiuser interference cancellation receivers for OFDMA uplink communications with carrier frequency offset,” in Proceedings of IEEE Global Telecommunications Conference (GLOBECOM ’04), vol. 5, pp. 2808–2812, Dallas, Tex, USA, November 2004.

[21] D. Huang and K. B. Letaief, “An interference-cancellation scheme for carrier frequency offsets correction in OFDMA systems,” IEEE Transactions on Communications, vol. 53, no. 7, pp. 1155–1165, 2005.


[22] W. G. Jeon, K. H. Chang, and Y. S. Cho, “An equalizationtechnique for orthogonal frequency-division multiplexing sys-tems in time-variant multipath channels,” IEEE Transactionson Communications, vol. 47, no. 1, pp. 27–32, 1999.

[23] C.-Y. Hsu and W.-R. Wu, “Low-complexity CFO compen-sation for uplink OFDMA systems,” in Proceedings of the17th IEEE International Symposium on Personal, Indoor andMobile Radio Communications (PIMRC ’06), pp. 1–5, Helsinki,Finland, September 2006.

[24] J. Choi, C. Lee, H. W. Jung, and Y. H. Lee, “Carrier frequencyoffset compensation for uplink of OFDM-FDMA systems,”IEEE Communications Letters, vol. 4, no. 12, pp. 414–416,2000.

[25] M. Schellmann and S. Stanczak, “Multi-user MIMO channelestimation in the presence of carrier frequency offsets,” inProceedings of the 39th Asilomar Conference on Signals, Systemsand Computers, pp. 462–466, Pacific Grove, Calif, USA,October 2005.

[26] S. Ahmed, S. Lambotharan, A. Jakobsson, and J. Chambers,“MIMO frequency-selective channels with multiple frequencyoffsets: estimation and detection techniques,” IEE Proceedings:Communications, vol. 152, no. 4, pp. 489–494, 2005.

[27] L. Haring, S. Bieder, and A. Czylwik, “Closed-form estimatorsof carrier frequency offsets and channels in the uplink of mul-tiuser OFDM systems,” in Proceedings of IEEE InternationalConference on Acoustics, Speech and Signal Processing (ICASSP’06), vol. 4, pp. 661–664, Toulouse, France, May 2006.

[28] K.-H. Wu, W.-H. Fang, and J.-T. Chen, “Joint DOA-frequency offset estimation and data detection in uplink MIMO-OFDM networks with SDMA techniques,” in Proceedings of the 63rd IEEE Vehicular Technology Conference (VTC ’06), vol. 6, pp. 2977–2981, Melbourne, Australia, May 2006.

[29] S. Sezginer and P. Bianchi, “Asymptotically efficient reduced complexity frequency offset and channel estimators for uplink MIMO-OFDMA systems,” IEEE Transactions on Signal Processing, vol. 56, no. 3, pp. 964–979, 2008.

[30] V. Jungnickel, M. Schellmann, A. Forck, et al., “Demonstration of virtual MIMO in the uplink,” in IET Smart Antennas and Cooperative Communications Seminar, London, UK, October 2007.

[31] M. Schellmann and V. Jungnickel, “Effects of multiple users’ CFOs in OFDM-SDMA up-link: an interference model,” in Proceedings of IEEE International Conference on Communications (ICC ’06), vol. 10, pp. 4642–4647, Istanbul, Turkey, June 2006.

[32] P. H. Moose, “Technique for orthogonal frequency division multiplexing frequency offset correction,” IEEE Transactions on Communications, vol. 42, no. 10, pp. 2908–2914, 1994.


[34] J. H. Winters, J. Salz, and R. D. Gitlin, “The impact of antenna diversity on the capacity of wireless communication systems,” IEEE Transactions on Communications, vol. 42, no. 2–4, part 3, pp. 1740–1751, 1994.

[35] S. M. Kay, Fundamentals of Statistical Signal Processing, Volume 1: Estimation Theory, Signal Processing Series, Prentice Hall PTR, Upper Saddle River, NJ, USA, 1993.


Research Article

A Practical Scheme for Frequency Offset Estimation in MIMO-OFDM Systems

Michele Morelli, Marco Moretti, and Giuseppe Imbarlina

Dipartimento di Ingegneria dell’Informazione, University of Pisa, Via Caruso 16, 56122 Pisa, Italy

Correspondence should be addressed to Marco Moretti, [email protected]

Received 27 June 2008; Revised 3 October 2008; Accepted 25 December 2008


This paper deals with training-assisted carrier frequency offset (CFO) estimation in multiple-input multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM) systems. The exact maximum likelihood (ML) solution to this problem is computationally demanding as it involves a line search over the CFO uncertainty range. To reduce the system complexity, we divide the CFO into an integer part plus a fractional part and select the pilot subcarriers such that the training sequences have a repetitive structure in the time domain. In this way, the fractional CFO is efficiently computed through a correlation-based approach, while ML methods are employed to estimate the integer CFO. Simulations indicate that the proposed scheme is superior to the existing alternatives in terms of both estimation accuracy and processing load.

Copyright © 2009 Michele Morelli et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

Orthogonal frequency-division multiplexing (OFDM) is an attractive modulation technique for wideband wireless communications due to its robustness against multipath distortions and flexibility in allocating power and data rate over distinct subchannels. For these reasons, it is adopted in a variety of applications, including digital audio broadcasting (DAB), digital video broadcasting (DVB), and the IEEE 802.11a wireless local area network (WLAN) [1]. Combining OFDM with the multiple-input multiple-output (MIMO) technology is an effective solution to increase the capacity of practical commercial systems. The deployment of multiple antennas at both the transmitter and receiver ends can be exploited to improve reliability by means of space-time coding techniques and/or to increase the data rate through spatial multiplexing [2].

Similar to single-input single-output (SISO) OFDM, MIMO-OFDM is extremely sensitive to carrier frequency offsets (CFOs) induced by Doppler shifts and/or oscillator instabilities. The CFO destroys orthogonality among subcarriers and must be accurately estimated and compensated for to avoid severe error rate degradations [3]. While CFO recovery is a well-studied problem for single antenna systems, only a few solutions are available for MIMO-OFDM.

A blind kurtosis-based scheme is presented in [4], while a method for jointly estimating the CFO and MIMO channel is derived in [5] by placing null subcarriers and pilot tones across adjacent OFDM blocks. Unfortunately, these methods are quite complex as they require a large-point discrete Fourier transform (DFT) operation and a computationally demanding line search. Furthermore, they provide the CFO estimate upon observation of several OFDM blocks and, accordingly, are not suited for packet-oriented applications, where synchronization must be completed shortly after the reception of a packet. In order to achieve fast timing and frequency recovery, training sequences with a periodic structure are commonly employed in SISO-OFDM systems [6–8]. Extending this approach to MIMO-OFDM, however, is not straightforward, as signals emitted from different antennas give rise to multistream interference (MSI) at the receiver station, which may degrade the accuracy of the synchronization algorithms. The detrimental effect of MSI can be alleviated by a careful design of the MIMO preambles. For instance, in [9], it is shown that the performance of the least-squares (LS) channel estimator is optimized if the training sequences at different TX branches are orthogonal and shift-orthogonal for at least the channel length. To meet such a requirement, a time-orthogonal design is employed in [10], where different TX antennas transmit their preambles over disjoint time intervals. In this way, however, the preamble length grows linearly with the number of TX branches, thereby increasing the system overhead. The use of chirp-like polyphase sequences is suggested in [11], while a training block composed of repeated PN sequences with good cross-correlation properties is employed in [12]. In both cases, the CFO estimate is obtained by cross-correlating the repetitive parts of the received preambles in a way similar to SISO-OFDM. This approach is also adopted in [13, 14], where the pilot sequences are obtained by repeating Chu or Frank-Zadoff codes with a different cyclic shift applied at each TX antenna. Alternative criteria for MIMO-OFDM preamble design can be found in [15, 16].

A subspace-based method for CFO estimation in MIMO-OFDM has recently been proposed in [17]. In this scheme, pilot symbols at different transmit antennas are frequency-division multiplexed (FDM) and placed over equally spaced subcarriers. The resulting preambles are characterized by an inherent periodic structure in the time domain which can be effectively exploited at the receiver to separate signals arriving from different TX antennas. This approach is reminiscent of the multiple-signal-classification (MUSIC)-based frequency recovery scheme employed in [18] in the context of orthogonal frequency division multiple access (OFDMA). The main advantage with respect to [18] is that in [17], the CFO estimate is obtained with reduced complexity by looking for the roots of a real-valued polynomial function. A root-based approach is also adopted in [19] after writing the CFO metric in polynomial form.

In this paper, the repetitive slots-based CFO estimator discussed in [8] is extended to MIMO-OFDM transmissions. In order to enlarge the frequency acquisition range, however, we decompose the CFO into a fractional part plus an integer part. The fractional CFO is computed first by cross-correlating the repetitive segments of the received preambles in a way similar to [8], while the integer CFO is subsequently estimated by resorting to maximum likelihood (ML) methods. This results in an algorithm of affordable complexity which can estimate large CFOs and whose accuracy attains the relevant Cramér-Rao bound (CRB).

The rest of this paper is organized as follows. Section 2 describes the system model and introduces basic notation. In Section 3, we review the joint ML estimation of the CFO and MIMO channel, while Section 4 is devoted to the training sequences design and CFO recovery scheme. Simulation results are presented in Section 5 and some conclusions are drawn in Section 6.

Notation 1. Matrices and vectors are denoted by boldface letters, with $\mathbf{W}_N$ and $\mathbf{I}_N$ being the DFT matrix and identity matrix of order $N$, respectively. $\mathbf{A} = \mathrm{diag}\{a(n);\ n = 1, 2, \ldots, N\}$ denotes an $N \times N$ diagonal matrix with entries $a(n)$ along its main diagonal, while $\mathbf{B}^{-1}$ is the inverse of a square matrix $\mathbf{B}$. We use $E\{\cdot\}$, $(\cdot)^*$, $(\cdot)^T$, and $(\cdot)^H$ for expectation, complex conjugation, transposition, and Hermitian transposition, respectively. The notation $\|\cdot\|$ represents the Euclidean norm of the enclosed vector, while $\mathrm{Re}\{x\}$, $|x|$, and $\arg\{x\}$ stand for the real part, modulus, and principal argument of a complex number $x$. Finally, $[\mathbf{B}]_{k,l}$ denotes the $(k, l)$th entry of a matrix $\mathbf{B}$, while $\tilde{\lambda}$ is a trial value of the unknown parameter $\lambda$.

2. System Model

We consider a MIMO-OFDM system with $N_T$ transmitting and $N_R$ receiving antennas. We denote by $N$ the number of available subcarriers, which are enumerated from $n = 0$ to $n = N - 1$, and call $\mathbf{c}_i = [c_i(0), c_i(1), \ldots, c_i(N-1)]^T$ the frequency domain pilot sequence at the $i$th TX antenna. Before transmission, this sequence is converted to the time domain through an inverse discrete Fourier transform (IDFT) operation, and a cyclic prefix (CP) of length $N_g$ is inserted to avoid inter-block interference (IBI). The signal emitted from the $i$th TX branch arrives at the $m$th RX antenna after propagating through a multipath channel with discrete-time impulse response $\mathbf{h}_{m,i} = [h_{m,i}(0), h_{m,i}(1), \ldots, h_{m,i}(L-1)]^T$, where $L$ is a design parameter that depends on the duration of the transmit/receive filters and on the channel delay spread. Since a single oscillator is used for frequency conversion at each end of the wireless link, the same CFO is assumed for all transmit/receive antenna pairs. We denote by $\mathbf{x}_m = [x_m(0), x_m(1), \ldots, x_m(N-1)]^T$ the time domain samples available at the $m$th RX antenna and define $\boldsymbol{\Gamma}(\nu) = \mathrm{diag}\{e^{j2\pi\nu k/N};\ 0 \le k \le N-1\}$, where $\nu$ is the frequency offset normalized by the subcarrier spacing. Assuming ideal timing recovery and $N_g \ge L$, we have

$$\mathbf{x}_m = \boldsymbol{\Gamma}(\nu)\,\mathbf{s}_m + \mathbf{n}_m, \tag{1}$$

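where $\mathbf{n}_m$ is an $N$-dimensional vector of AWGN samples with zero mean and variance $\sigma_n^2$, while $\mathbf{s}_m = [s_m(0), s_m(1), \ldots, s_m(N-1)]^T$ is the useful signal component, which is modeled as

$$\mathbf{s}_m = \sum_{i=1}^{N_T} \mathbf{A}_i \mathbf{h}_{m,i}. \tag{2}$$

In (2), we have set $\mathbf{A}_i = \mathbf{W}_N^H \mathbf{C}_i \mathbf{F}_L$, where $\mathbf{C}_i = \mathrm{diag}\{c_i(n);\ 0 \le n \le N-1\}$ collects the pilot sequence emitted by the $i$th TX antenna, while $\mathbf{F}_L$ is an $N \times L$ matrix with entries

$$[\mathbf{F}_L]_{n,l} = e^{-j2\pi nl/N}, \quad 0 \le n \le N-1,\ 0 \le l \le L-1. \tag{3}$$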

In Section 3, we show how to exploit vectors $\{\mathbf{x}_m;\ 1 \le m \le N_R\}$ for jointly estimating the CFO $\nu$ and the MIMO channel $\mathbf{H} = \{\mathbf{h}_{m,i};\ 1 \le m \le N_R,\ 1 \le i \le N_T\}$. In doing so, we adopt the FDM training sequences suggested in [17], which optimize the performance of the LS channel estimator thanks to their shift orthogonality properties [9]. Such sequences are expressed by

$$c_i(n) = \begin{cases} d_i(n'), & n = n'Q + \mu_i,\ 0 \le n' \le N/Q - 1, \\ 0, & \text{otherwise}, \end{cases} \tag{4}$$

where $Q$ is a power of two not smaller than $N_T$, $\{\mu_i\}$ are integer parameters satisfying $0 \le \mu_1 < \mu_2 < \cdots < \mu_{N_T} < Q$, and $\{d_i(n')\}$ are pilot symbols with constant modulus $|d_i(n')| = \sqrt{Q/N_T}$. In this way, the total energy allocated to training amounts to $E_T = N$ and is equally split between the TX antennas.
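The pilot layout in (4) can be illustrated with a short sketch. The QPSK phase pattern, the helper name `fdm_pilots`, and the small values of $N$, $Q$, and $\mu_i$ below are assumptions made only for illustration; they are not taken from the paper. The sketch checks the two properties stated in the text: the total training energy equals $N$, and no subcarrier carries pilots from more than one antenna.

```python
import cmath
import math

def fdm_pilots(N, Q, mu, NT):
    """Frequency-domain pilot sequences c_i(n) laid out as in (4).

    mu[i] is the subcarrier shift of TX antenna i (0 <= mu[0] < ... < Q).
    """
    amp = math.sqrt(Q / NT)                      # |d_i(n')| = sqrt(Q / NT)
    pilots = []
    for i in range(NT):
        c = [0j] * N
        for n_prime in range(N // Q):
            # Hypothetical QPSK phase pattern standing in for d_i(n')
            phase = math.pi / 4 + (math.pi / 2) * ((n_prime + i) % 4)
            c[n_prime * Q + mu[i]] = amp * cmath.exp(1j * phase)
        pilots.append(c)
    return pilots

N, Q, NT = 64, 4, 3                              # illustrative sizes
pilots = fdm_pilots(N, Q, mu=[0, 1, 3], NT=NT)

# Training energy per antenna and in total (the total should equal N)
energy_per_antenna = [sum(abs(v) ** 2 for v in c) for c in pilots]
total_energy = sum(energy_per_antenna)

# FDM property: the pilot supports of different antennas do not overlap
supports = [{n for n, v in enumerate(c) if v != 0} for c in pilots]
disjoint = all(supports[a].isdisjoint(supports[b])
               for a in range(NT) for b in range(a + 1, NT))
```

Because the supports are disjoint, the products $\mathbf{C}_{i_1}^H \mathbf{C}_{i_2}$ vanish for $i_1 \ne i_2$, which is the property the ML derivation relies on.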


3. Maximum Likelihood Frequency Estimation

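Given the unknown parameters $(\mathbf{H}, \nu)$, from (1), it turns out that vectors $\{\mathbf{x}_m\}$ are statistically independent and Gaussian distributed with mean $\boldsymbol{\Gamma}(\nu)\mathbf{s}_m$ and covariance matrix $\sigma_n^2 \mathbf{I}_N$. Hence, bearing in mind (2), the log-likelihood function (LLF) for $(\mathbf{H}, \nu)$ takes the form

$$\Lambda(\tilde{\mathbf{H}}, \tilde{\nu}) = -N_R N \ln(\pi\sigma_n^2) - \frac{1}{\sigma_n^2} \sum_{m=1}^{N_R} \left\| \mathbf{x}_m - \boldsymbol{\Gamma}(\tilde{\nu}) \sum_{i=1}^{N_T} \mathbf{A}_i \tilde{\mathbf{h}}_{m,i} \right\|^2. \tag{5}$$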

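As a consequence of the FDM property of the employed training sequences, we observe that $\mathbf{A}_{i_1}^H \mathbf{A}_{i_2} = \mathbf{F}_L^H \mathbf{C}_{i_1}^H \mathbf{C}_{i_2} \mathbf{F}_L$ is the null matrix for any $i_1 \ne i_2$. Using this fact, after neglecting irrelevant terms independent of $\tilde{\mathbf{H}}$ and $\tilde{\nu}$, we may rewrite the LLF as

$$\Lambda_1(\tilde{\mathbf{H}}, \tilde{\nu}) = 2\,\mathrm{Re}\left\{ \sum_{m=1}^{N_R} \sum_{i=1}^{N_T} \tilde{\mathbf{h}}_{m,i}^H \mathbf{A}_i^H \boldsymbol{\Gamma}^H(\tilde{\nu}) \mathbf{x}_m \right\} - \sum_{m=1}^{N_R} \sum_{i=1}^{N_T} \left\| \mathbf{A}_i \tilde{\mathbf{h}}_{m,i} \right\|^2, \tag{6}$$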

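where we have borne in mind that $\boldsymbol{\Gamma}^H(\tilde{\nu})\boldsymbol{\Gamma}(\tilde{\nu}) = \mathbf{I}_N$. The joint ML estimate of the unknown parameters is the location where $\Lambda_1(\tilde{\mathbf{H}}, \tilde{\nu})$ achieves its global maximum. After standard computations, the CFO estimate is found to be

$$\hat{\nu} = \arg\max_{\tilde{\nu}} \{ g(\tilde{\nu}) \}, \tag{7}$$

where

$$g(\tilde{\nu}) = \sum_{m=1}^{N_R} \left\| \mathbf{L}^H \boldsymbol{\Gamma}^H(\tilde{\nu}) \mathbf{x}_m \right\|^2, \tag{8}$$

and $\mathbf{L}\mathbf{L}^H$ is the following Cholesky decomposition:

$$\mathbf{L}\mathbf{L}^H = \sum_{i=1}^{N_T} \mathbf{A}_i \left( \mathbf{A}_i^H \mathbf{A}_i \right)^{-1} \mathbf{A}_i^H. \tag{9}$$

In the sequel, we refer to (7) as the maximum likelihood frequency estimator (MLFE). The following remarks are in order.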

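(1) Observing that $\mathbf{A}_i^H \mathbf{A}_i = \mathbf{F}_L^H \mathbf{C}_i^H \mathbf{C}_i \mathbf{F}_L$ with $\mathrm{rank}\{\mathbf{F}_L\} = L$ and $\mathrm{rank}\{\mathbf{C}_i^H \mathbf{C}_i\} = N/Q$, it turns out that $\mathrm{rank}\{\mathbf{A}_i^H \mathbf{A}_i\} \le \min\{L, N/Q\}$. Since $\mathbf{A}_i^H \mathbf{A}_i$ has dimensions $L \times L$, a necessary condition for the existence of $(\mathbf{A}_i^H \mathbf{A}_i)^{-1}$ in the right-hand side of (9) is that $L \le N/Q$. On the other hand, from (4), it follows that $\mathbf{A}_i^H \mathbf{A}_i$ has entries

$$\left[ \mathbf{A}_i^H \mathbf{A}_i \right]_{\ell_1, \ell_2} = Q\, e^{j2\pi\mu_i(\ell_1 - \ell_2)/N} \sum_{n'=0}^{N/Q-1} e^{j2\pi n'(\ell_1 - \ell_2)Q/N}, \quad 0 \le \ell_1, \ell_2 \le L-1, \tag{10}$$

and reduces to $N \cdot \mathbf{I}_L$ if $L \le N/Q$. In such a case, the frequency metric simplifies to

$$g(\tilde{\nu}) = \frac{1}{N} \sum_{m=1}^{N_R} \sum_{i=1}^{N_T} \left\| \mathbf{A}_i^H \boldsymbol{\Gamma}^H(\tilde{\nu}) \mathbf{x}_m \right\|^2. \tag{11}$$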

(2) By invoking the asymptotic efficiency property of the MLFE, the frequency estimate (7) is expected to be unbiased with an accuracy that approaches the corresponding CRB for large data records and sufficiently high signal-to-noise ratios (SNRs). Using the LLF in (5), it is found that [19]

$$\mathrm{CRB}(\nu) = \frac{\sigma_n^2}{2 \sum_{m=1}^{N_R} \mathbf{y}_m^H \left( \mathbf{I}_N - \mathbf{L}\mathbf{L}^H \right) \mathbf{y}_m}, \tag{12}$$

where $\mathbf{y}_m = [y_m(0), y_m(1), \ldots, y_m(N-1)]^T$ is an $N$-dimensional vector with entries

$$y_m(k) = \frac{2\pi k}{N} \cdot s_m(k), \quad 0 \le k \le N-1. \tag{13}$$
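To make the cost of the line search in (7) concrete, the following sketch evaluates the simplified metric (11), up to a constant scale factor, on a grid of trial CFO values. It is a minimal illustration under stated assumptions: the naive DFT, the function names, the unit-modulus pilot pattern, the channel taps, and the toy sizes are all hypothetical, and the example is noise free.

```python
import cmath
import math

def dft(x):
    """Unnormalized N-point DFT (naive O(N^2) version, for illustration)."""
    N = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * k * n / N) for k in range(N))
            for n in range(N)]

def mlfe_metric(nu_trial, rx, pilots, L):
    """Metric (11) up to a constant: sum over m, i of ||A_i^H Gamma^H(nu) x_m||^2."""
    N = len(rx[0])
    g = 0.0
    for x in rx:
        # Counter-rotate by the trial CFO, then move to the frequency domain
        y = [x[k] * cmath.exp(-2j * math.pi * nu_trial * k / N) for k in range(N)]
        Y = dft(y)
        for c in pilots:
            for l in range(L):
                acc = sum(c[n].conjugate() * Y[n] * cmath.exp(2j * math.pi * n * l / N)
                          for n in range(N) if c[n] != 0)
                g += abs(acc) ** 2
    return g

def mlfe(rx, pilots, L, grid):
    """Line search (7) restricted to a finite list of trial CFO values."""
    return max(grid, key=lambda nu: mlfe_metric(nu, rx, pilots, L))

# Noise-free toy example: NT = 2, NR = 1 (all values are illustrative)
N, Q, L = 32, 4, 3
pilots = []
for mu in (0, 1):
    c = [0j] * N
    for n_prime in range(N // Q):
        c[n_prime * Q + mu] = cmath.exp(1j * math.pi / 2 * (n_prime % 4))
    pilots.append(c)
h = [[1.0, 0.5j, -0.25], [0.8, -0.3, 0.1j]]      # hypothetical channel taps
nu_true = 0.2
x = []
for k in range(N):
    s = 0j
    for c, taps in zip(pilots, h):
        for n in range(N):
            if c[n] != 0:
                H = sum(taps[l] * cmath.exp(-2j * math.pi * n * l / N) for l in range(L))
                s += c[n] * H * cmath.exp(2j * math.pi * n * k / N) / N
    x.append(s * cmath.exp(2j * math.pi * nu_true * k / N))

grid = [i / 100 for i in range(-50, 51)]
nu_hat = mlfe([x], pilots, L, grid)
```

Each grid point requires a fresh counter-rotation and DFT of every received vector, which is the processing load that the reduced-complexity scheme of Section 4 is designed to avoid.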

4. Frequency Estimation with Reduced Complexity

4.1. Problem Formulation. Direct maximization of $g(\tilde{\nu})$ in (8) entails a heavy computational burden. One possible way to reduce the system complexity is indicated in [19], where $g(\tilde{\nu})$ is transformed into a real-valued polynomial function, and the CFO estimate is indirectly obtained by means of a polynomial rooting procedure. In this paper, we follow the alternative approach outlined in [8], by which a periodicity is first introduced in the MIMO training sequences, and CFO recovery is then accomplished by measuring the phase rotations between the repetitive parts of the received preambles. For this purpose, the sequences in (4) are modified so as to simultaneously satisfy the following constraints:

(C1) pilot symbols are equipowered, equispaced in the frequency domain, and modulate distinct subcarriers at different TX antennas according to the FDM principle;

(C2) each vector $\mathbf{W}_N^H \mathbf{c}_i$ $(i = 1, 2, \ldots, N_T)$ of time domain samples is obtained by the repetition of $R$ identical segments, where $R$ is some power of two.

Condition C1 implies that the $N_T$ preambles remain shift-orthogonal in the time domain, which is desirable to enhance the accuracy of the channel estimates, while condition C2 facilitates CFO recovery by ensuring that the preambles are periodic with period $P = N/R$.

To proceed further, let $Q$ be a power of two with $Q \ge N_T$. Then, it can be easily shown that C1 and C2 are simultaneously met if pilot symbols at each TX antenna are equispaced in the frequency domain at a distance of $M = QR$ subcarriers and their positions are shifted by $R$ subcarriers from one TX branch to the next. This amounts to putting

$$c_i(n) = \begin{cases} d_i(n'), & n = n'M + (i-1)R,\ 0 \le n' \le N/M - 1, \\ 0, & \text{otherwise}, \end{cases} \tag{14}$$

where we set $|d_i(n')| = \sqrt{M/N_T}$ to ensure that the total energy allocated to training is still $E_T = N$. It is worth observing that the use of time-repetitive FDM training sequences for MIMO-OFDM has also been suggested in [16] to make the CRB of the frequency estimates independent of the channel realization. However, our design (14) is more general as it applies to any triple $(N, N_T, L)$, whereas in [16], the number of subcarriers is constrained to be a multiple of $N_T L$. Recalling that in practical OFDM systems $N$ is always a power of two, it turns out that the sequence design in [16] can only be adopted on condition that both $N_T$ and $L$ are powers of two.
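As a quick check of conditions C1 and C2, the sketch below builds the sequences in (14) and verifies numerically that each time-domain preamble repeats with period $P = N/R$. The QPSK phase pattern, the function names, and the small parameter values are illustrative assumptions.

```python
import cmath
import math

def repetitive_pilots(N, Q, R, NT):
    """Pilot sequences laid out as in (14): spacing M = Q*R, shift (i-1)*R."""
    M = Q * R
    amp = math.sqrt(M / NT)                      # |d_i(n')| = sqrt(M / NT)
    pilots = []
    for i in range(NT):                          # i = 0 is the first TX antenna
        c = [0j] * N
        for n_prime in range(N // M):
            # Hypothetical QPSK phase pattern standing in for d_i(n')
            phase = math.pi / 4 + (math.pi / 2) * ((3 * n_prime + i) % 4)
            c[n_prime * M + i * R] = amp * cmath.exp(1j * phase)
        pilots.append(c)
    return pilots

def idft(c):
    """Naive inverse DFT with 1/N scaling (the scaling does not affect periodicity)."""
    N = len(c)
    return [sum(c[n] * cmath.exp(2j * math.pi * n * k / N) for n in range(N)) / N
            for k in range(N)]

N, Q, R, NT = 64, 4, 4, 3                        # illustrative sizes
P = N // R
pilots = repetitive_pilots(N, Q, R, NT)
time_domain = [idft(c) for c in pilots]

# C2: every preamble should satisfy s(k + P) = s(k)
max_periodicity_error = max(abs(s[k + P] - s[k])
                            for s in time_domain for k in range(N - P))

# Total training energy, expected to equal N
total_energy = sum(abs(v) ** 2 for c in pilots for v in c)
```

The periodicity follows because every pilot index $n'M + (i-1)R$ is a multiple of $R$, so each active complex exponential completes an integer number of cycles over $P$ samples.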

As is known, the use of OFDM preambles composed of $R$ repetitive slots restricts the acquisition range of the CFO estimator to $\pm R/2$ times the subcarrier spacing. To cope with such a drawback, we decompose $\nu$ into a fractional part, less than $R/2$ in magnitude, plus an integer part which is a multiple of $R$. The normalized CFO is thus rewritten as

$$\nu = R(\varepsilon + \eta), \tag{15}$$

where $\eta$ is an integer parameter referred to as the integer CFO (ICFO), while $\varepsilon$ is the fractional CFO (FCFO) and belongs to the interval $(-1/2, 1/2]$. Since the transmitted preambles remain periodic after passing through the channel (apart from the presence of thermal noise and from a phase shift induced by the CFO), each vector of received time domain samples can be decomposed into $R$ segments $\mathbf{x}_m = [\mathbf{x}_m^T(0), \mathbf{x}_m^T(1), \ldots, \mathbf{x}_m^T(R-1)]^T$, with

$$\mathbf{x}_m(r) = \mathbf{u}_m e^{j2\pi\varepsilon r} + \mathbf{n}_m(r), \quad 0 \le r \le R-1. \tag{16}$$

In (16), $\mathbf{u}_m$ is a $P$-dimensional vector with elements

$$u_m(k) = e^{j2\pi\nu k/N} s_m(k), \quad 0 \le k \le P-1, \tag{17}$$

while $\{\mathbf{n}_m(r);\ r = 0, 1, \ldots, R-1\}$ are statistically independent Gaussian vectors with zero mean and covariance matrix $\sigma_n^2 \mathbf{I}_P$.
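The decomposition (15) can be made explicit with a small helper. The function name `split_cfo` is hypothetical; the rounding rule is chosen so that $\varepsilon$ falls in the half-open interval $(-1/2, 1/2]$ stated above.

```python
import math

def split_cfo(nu, R):
    """Split a normalized CFO as nu = R * (eps + eta), eta integer, eps in (-1/2, 1/2]."""
    eta = math.ceil(nu / R - 0.5)    # ceil keeps eps = +1/2 rather than -1/2 at the boundary
    eps = nu / R - eta
    return eps, eta

eps, eta = split_cfo(10.0, 8)        # nu / R = 1.25, so eta = 1 and eps = 0.25
```

For example, with $R = 8$ a normalized offset $\nu = 10$ lies outside the $\pm R/2$ range of a purely correlation-based estimator, but it splits into $\eta = 1$ and $\varepsilon = 0.25$, each of which can be estimated separately.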

4.2. Estimation of the Fractional CFO. Our first goal is the estimation of $\varepsilon$ based on the observations $\{\mathbf{x}_m\}_{m=1}^{N_R}$. Inspection of (16) reveals that this task is complicated by the presence of the nuisance vectors $\{\mathbf{u}_m\}$. One possible approach is to consider such vectors as deterministic but unknown parameters and proceed to the joint ML estimation of the parameter set $(\mathbf{u}, \varepsilon)$, with $\mathbf{u} = [\mathbf{u}_1^T\ \mathbf{u}_2^T\ \cdots\ \mathbf{u}_{N_R}^T]^T$. This approach has been used in [8] in the context of SISO-OFDM, and its extension to MIMO transmissions leads to the following FCFO metric:

$$q(\tilde{\varepsilon}) = \sum_{m=1}^{N_R} \sum_{r=1}^{R-1} \mathrm{Re}\left\{ R_m(r)\, e^{-j2\pi\tilde{\varepsilon} r} \right\}, \tag{18}$$

where $R_m(r)$ is the $rP$-lag sample correlation function evaluated at the $m$th RX branch, that is,

$$R_m(r) = \sum_{k=rP}^{N-1} x_m(k)\, x_m^*(k - rP). \tag{19}$$

The ML estimate of $\varepsilon$ is eventually found by locating the global maximum of $q(\tilde{\varepsilon})$. Unfortunately, no closed form solution is available except when $R = 2$. The more general case can be approached by an exhaustive search over the interval $\tilde{\varepsilon} \in (-1/2, 1/2]$, which may be cumbersome in practice. For this reason, we suggest a suboptimal but simpler procedure which develops in two steps. In the first step, a coarse FCFO estimate is obtained as

$$\hat{\varepsilon}^{(c)} = \frac{1}{2\pi} \arg\left\{ \sum_{m=1}^{N_R} R_m(1) \right\}. \tag{20}$$

The rationale behind the above expression is easily understood after substituting (16) into (19). This yields

$$R_m(r) = (R - r) \left\| \mathbf{u}_m \right\|^2 e^{j2\pi\varepsilon r} + N_m(r), \tag{21}$$

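where $N_m(r)$ is a zero-mean disturbance term collecting signal × noise and noise × noise interactions. Inspection of (21) reveals that, in the absence of noise, the right-hand side of (20) is just the true FCFO. In order to improve the estimation accuracy, $\hat{\varepsilon}^{(c)}$ is refined in the second step by looking for an estimate of the residual error $\Delta\varepsilon = \varepsilon - \hat{\varepsilon}^{(c)}$. For this purpose, we let $R_m^{(c)}(r) = R_m(r)\, e^{-j2\pi\hat{\varepsilon}^{(c)} r}$ and rewrite (18) in the following form:

$$q(\Delta\tilde{\varepsilon}) = \sum_{m=1}^{N_R} \sum_{r=1}^{R-1} \left| R_m(r) \right| \cos\left[ \varphi_m^{(c)}(r) - 2\pi\Delta\tilde{\varepsilon}\, r \right], \tag{22}$$

where we have defined $\Delta\tilde{\varepsilon} = \tilde{\varepsilon} - \hat{\varepsilon}^{(c)}$ and $\varphi_m^{(c)}(r) = \arg\{R_m^{(c)}(r)\}$. Setting to zero the derivative of (22) with respect to $\Delta\tilde{\varepsilon}$ and assuming that $\Delta\tilde{\varepsilon}$ is small enough such that $\sin[\varphi_m^{(c)}(r) - 2\pi\Delta\tilde{\varepsilon}\, r] \simeq \varphi_m^{(c)}(r) - 2\pi\Delta\tilde{\varepsilon}\, r$, an estimate of $\Delta\varepsilon$ can be computed in closed form as

$$\Delta\hat{\varepsilon} = \frac{1}{2\pi} \cdot \frac{\sum_{m=1}^{N_R} \sum_{r=1}^{R-1} r \left| R_m(r) \right| \varphi_m^{(c)}(r)}{\sum_{m=1}^{N_R} \sum_{r=1}^{R-1} r^2 \left| R_m(r) \right|}. \tag{23}$$

The final FCFO estimate is given by

$$\hat{\varepsilon} = \hat{\varepsilon}^{(c)} + \Delta\hat{\varepsilon}. \tag{24}$$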
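The two-step procedure of (19), (20), (23), and (24) can be sketched as follows. This is a minimal illustration rather than a reference implementation: the function names, the arbitrary periodic test signal, and the toy sizes are assumptions, and the example is noise free so that the estimate should coincide with the true FCFO.

```python
import cmath
import math

def correlations(x, R):
    """Sample correlations R_m(r) of (19) for r = 1, ..., R-1 (one RX branch)."""
    N = len(x)
    P = N // R
    return [sum(x[k] * x[k - r * P].conjugate() for k in range(r * P, N))
            for r in range(1, R)]

def estimate_fcfo(rx, R):
    """Two-step FCFO estimate: coarse value (20) refined through (23) and (24)."""
    corr = [correlations(x, R) for x in rx]          # corr[m][r-1] holds R_m(r)
    eps_c = cmath.phase(sum(c[0] for c in corr)) / (2 * math.pi)
    num = den = 0.0
    for c in corr:
        for r, value in enumerate(c, start=1):
            # phi_m^(c)(r): residual phase after removing the coarse estimate
            phi = cmath.phase(value * cmath.exp(-2j * math.pi * eps_c * r))
            num += r * abs(value) * phi
            den += r * r * abs(value)
    return eps_c + num / (2 * math.pi * den)

# Noise-free toy example with one RX branch (illustrative values)
N, R, eps_true = 64, 8, 0.3
P = N // R
segment = [cmath.exp(2j * math.pi * 0.37 * k * k / P) for k in range(P)]  # arbitrary period
x = [segment[k % P] * cmath.exp(2j * math.pi * R * eps_true * k / N) for k in range(N)]
eps_hat = estimate_fcfo([x], R)
```

Only the lag-$P$ correlation is used for the coarse step, so its phase is unambiguous for $|\varepsilon| < 1/2$; the refinement then weights the higher lags by $r\,|R_m(r)|$, as in (23).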

4.3. Estimation of the Integer CFO. If the normalized CFO is guaranteed to be less than $R/2$ in magnitude, the quantity $\hat{\varepsilon} R$ can be regarded as an estimate of $\nu$. Otherwise, $\nu$ is expressed as in (15), and an estimate of the integer offset $\eta$ must be found. This problem is now addressed using ML methods.

In order to compensate for the fractional offset $\varepsilon$, the received samples at each RX branch are first counter-rotated at an angular speed $2\pi\hat{\varepsilon} R/N$. This produces the $N_R$ vectors $\mathbf{z}_m = [z_m(0), z_m(1), \ldots, z_m(N-1)]^T$, with

$$\mathbf{z}_m = \boldsymbol{\Gamma}^H(\hat{\varepsilon} R)\, \mathbf{x}_m, \quad 1 \le m \le N_R. \tag{25}$$

Substituting (1)-(2) into (25) and assuming ideal FCFO compensation, we obtain

$$\mathbf{z}_m = \boldsymbol{\Gamma}(\eta R) \sum_{i=1}^{N_T} \mathbf{A}_i \mathbf{h}_{m,i} + \mathbf{n}'_m, \tag{26}$$

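where $\mathbf{n}'_m = \boldsymbol{\Gamma}^H(\hat{\varepsilon} R)\, \mathbf{n}_m$ is the noise contribution, which is statistically equivalent to $\mathbf{n}_m$. Vectors $\{\mathbf{z}_m\}$ are next used to get the joint ML estimate of $(\mathbf{H}, \eta)$. Bearing in mind (26), the corresponding LLF is found to be

$$\Upsilon(\tilde{\mathbf{H}}, \tilde{\eta}) = -N_R N \ln(\pi\sigma_n^2) - \frac{1}{\sigma_n^2} \sum_{m=1}^{N_R} \left\| \mathbf{z}_m - \boldsymbol{\Gamma}(\tilde{\eta} R) \sum_{i=1}^{N_T} \mathbf{A}_i \tilde{\mathbf{h}}_{m,i} \right\|^2, \tag{27}$$

by which, maximizing with respect to $\tilde{\mathbf{h}}_{m,i}$, we obtain

$$\hat{\mathbf{h}}_{m,i}(\tilde{\eta}) = \left( \mathbf{A}_i^H \mathbf{A}_i \right)^{-1} \mathbf{A}_i^H \boldsymbol{\Gamma}^H(\tilde{\eta} R)\, \mathbf{z}_m. \tag{28}$$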

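Now, we observe that $\mathbf{A}_i^H \mathbf{A}_i = \mathbf{F}_L^H \mathbf{C}_i^H \mathbf{C}_i \mathbf{F}_L$ is an $L \times L$ matrix whose rank is not greater than $\min\{L, N/M\}$. Hence, a necessary condition for the existence of $(\mathbf{A}_i^H \mathbf{A}_i)^{-1}$ is that $L \le N/M$. In such a case, if the pilot sequences are those defined in (14), we have $\mathbf{A}_i^H \mathbf{A}_i = N \cdot \mathbf{I}_L$ so that (28) simplifies to

$$\hat{\mathbf{h}}_{m,i}(\tilde{\eta}) = \frac{1}{N} \mathbf{A}_i^H \boldsymbol{\Gamma}^H(\tilde{\eta} R)\, \mathbf{z}_m. \tag{29}$$

The concentrated likelihood function for $\eta$ is found by substituting (29) into the right-hand side of (27). Neglecting irrelevant terms independent of $\tilde{\eta}$, we obtain

$$\psi(\tilde{\eta}) = \sum_{m=1}^{N_R} \sum_{i=1}^{N_T} \left\| \mathbf{A}_i^H \boldsymbol{\Gamma}^H(\tilde{\eta} R)\, \mathbf{z}_m \right\|^2, \tag{30}$$

and the ML estimate of $\eta$ is computed as

$$\hat{\eta} = \arg\max_{|\tilde{\eta}| \le |\eta|_{\max}} \{ \psi(\tilde{\eta}) \}, \tag{31}$$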

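where $|\eta|_{\max}$ represents the largest expected value of $|\eta|$, which is determined by the stability of the transmitter and receiver oscillators. Recalling that $\mathbf{A}_i = \mathbf{W}_N^H \mathbf{C}_i \mathbf{F}_L$, after standard manipulations, we may put $\psi(\tilde{\eta})$ in the equivalent form

$$\psi(\tilde{\eta}) = \sum_{m=1}^{N_R} \sum_{i=1}^{N_T} \sum_{\ell=0}^{L-1} \left| \sum_{n=0}^{N-1} c_i^*(n)\, Z_m(n + \tilde{\eta} R)\, e^{j2\pi n\ell/N} \right|^2, \tag{32}$$

where $\{Z_m(n)\}$ is the repetition with period $N$ of the DFT of $\mathbf{z}_m$, that is,

$$Z_m(n) = \sum_{k=0}^{N-1} z_m(k)\, e^{-j2\pi kn/N} \quad \text{for } -\infty < n < +\infty. \tag{33}$$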

On the other hand, from (14), we see that symbols $c_i(n)$ are different from zero only when $n = p_i(n')$, where $p_i(n') = n'M + (i-1)R$ are the indices of the pilot subcarriers at the $i$th TX antenna. Function $\psi(\tilde{\eta})$ can thus be rewritten as

$$\psi(\tilde{\eta}) = \sum_{m=1}^{N_R} \sum_{i=1}^{N_T} \sum_{\ell=0}^{L-1} \left| \sum_{n'=0}^{N/M-1} d_i^*(n')\, Z_m\left[ p_i(n') + \tilde{\eta} R \right] e^{j2\pi p_i(n')\ell/N} \right|^2. \tag{34}$$

Once the ICFO is obtained as indicated in (31), an estimate of the CFO is computed from (15) in the form

$$\hat{\nu} = R(\hat{\varepsilon} + \hat{\eta}). \tag{35}$$

In the sequel, we refer to (35) as the reduced complexity frequency estimator (RCFE).
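The integer search in (31) with the metric (34) can be sketched as below. The naive DFT, the function names, the unit-modulus pilot pattern, the channel taps, and the toy sizes are assumptions for illustration only; the example is noise free and assumes ideal FCFO compensation, so the received vector is modeled directly as in (26).

```python
import cmath
import math

def dft(x):
    """Unnormalized N-point DFT (naive O(N^2) version, for illustration)."""
    N = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * k * n / N) for k in range(N))
            for n in range(N)]

def icfo_metric(eta_trial, Z_all, pilot_syms, M, R, L):
    """Metric (34); pilot_syms[i][n'] plays the role of d_i(n') at TX antenna i."""
    N = len(Z_all[0])
    psi = 0.0
    for Z in Z_all:                                   # RX branches
        for i, d in enumerate(pilot_syms):            # TX antennas (i = 0 is the first)
            for l in range(L):
                acc = 0j
                for n_prime, sym in enumerate(d):
                    p = n_prime * M + i * R           # pilot index p_i(n')
                    # Z is periodic with period N, hence the modulo
                    acc += (sym.conjugate() * Z[(p + eta_trial * R) % N]
                            * cmath.exp(2j * math.pi * p * l / N))
                psi += abs(acc) ** 2
    return psi

def estimate_icfo(z_all, pilot_syms, M, R, L, eta_max):
    """Search (31) over the integers |eta| <= eta_max."""
    Z_all = [dft(z) for z in z_all]
    return max(range(-eta_max, eta_max + 1),
               key=lambda eta: icfo_metric(eta, Z_all, pilot_syms, M, R, L))

# Noise-free toy example: NT = 2, NR = 1 (illustrative values)
N, Q, R, L = 64, 4, 4, 3
M = Q * R
pilot_syms = [[cmath.exp(1j * math.pi / 2 * ((n * n + i) % 4)) for n in range(N // M)]
              for i in range(2)]
h = [[1.0, 0.5j, -0.25], [0.8, -0.3, 0.1j]]           # hypothetical channel taps
eta_true = 1
z = []
for k in range(N):
    s = 0j
    for i, d in enumerate(pilot_syms):
        for n_prime, sym in enumerate(d):
            p = n_prime * M + i * R
            H = sum(h[i][l] * cmath.exp(-2j * math.pi * p * l / N) for l in range(L))
            s += sym * H * cmath.exp(2j * math.pi * p * k / N) / N
    z.append(s * cmath.exp(2j * math.pi * eta_true * R * k / N))
eta_hat = estimate_icfo([z], pilot_syms, M, R, L, eta_max=1)
```

A single DFT per receive antenna is reused for all trial values, since a trial offset $\tilde{\eta}$ only shifts the indices at which $Z_m(n)$ is read. This is what keeps the ICFO search cheaper than the line search in (7).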

4.4. Remarks. (1) As mentioned previously, matrix $\mathbf{A}_i^H \mathbf{A}_i$ in (28) is nonsingular provided that $L \le N/M$. Such a condition is more restrictive than the constraint $L \le N/Q$ that was found in the previous section for MLFE. In particular, recalling that $M = QR$, it turns out that the maximum channel length that RCFE can manage is $R$ times smaller than for MLFE.

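(2) Assuming for simplicity that the ICFO has been perfectly estimated, from (35), it follows that $E\{(\hat{\nu} - \nu)^2\} = R^2 \cdot E\{(\hat{\varepsilon} - \varepsilon)^2\}$. Since parameters $(\mathbf{u}, \varepsilon)$ are jointly estimated through ML methods, we expect that $E\{(\hat{\varepsilon} - \varepsilon)^2\}$ asymptotically approaches the corresponding CRB. The latter is provided in [8] and reads

$$\mathrm{CRB}(\varepsilon) = \frac{3}{2\pi^2} \cdot \frac{\sigma_n^2/\sigma_s^2}{N_R N (R^2 - 1)}, \tag{36}$$

where $\sigma_s^2$ denotes the average signal power at each RX branch, that is,

$$\sigma_s^2 = \frac{1}{P N_R} \sum_{m=1}^{N_R} \sum_{k=0}^{P-1} \left| s_m(k) \right|^2. \tag{37}$$

The frequency MSE is thus given by

$$E\{(\hat{\nu} - \nu)^2\} = \frac{3}{2\pi^2} \cdot \frac{\sigma_n^2/\sigma_s^2}{N_R N (1 - 1/R^2)}. \tag{38}$$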

(3) The computational load of RCFE can be assessed as follows. Computing the correlations $\{R_m(r)\}_{r=1}^{R-1}$ in (19) requires a total of $2(R-1)(2N-1)$ real operations (additions plus multiplications) for each RX branch, while $8N_R(R-1)$ operations are needed to obtain $\Delta\hat{\varepsilon}$ in (23). Quantities $Z_m(n)$ in (33) are computed through an $N$-point DFT for each receiving antenna, with a corresponding complexity of $5N_R N \log_2 N$. Finally, evaluating $\psi(\tilde{\eta})$ in (34) needs an additional $8NLN_T N_R/M$ operations for each $\tilde{\eta}$. The overall complexity of RCFE is summarized in the first row of Table 1, where a distinction has been made between the FCFO and ICFO recovery tasks, and we have denoted by $N_\eta = 2|\eta|_{\max} + 1$ the number of hypothesized ICFO values.

Table 1: Complexity of FCFO and ICFO estimation schemes.

| Scheme | FCFO recovery | ICFO recovery |
|--------|---------------|---------------|
| RCFE | $2N_R(R-1)(2N+3)$ | $N_R N(5\log_2 N + 8LN_T N_\eta/M)$ |
| PBFE | $4N_R NQ + 30(Q-1)^3$ | $N_R N(5\log_2 N + 4N_T)$ |
| CBFE | $4N_R(3N - 2P - 1)$ | (none) |

(4) Our FCFO recovery algorithm is an improved version of the correlation-based frequency estimator (CBFE) proposed in [12]. Both schemes employ training preambles composed of $R$ repetitive parts and operate in two steps. A coarse estimate $\hat{\varepsilon}^{(c)}$ is first computed by CBFE in a way similar to (20), and it is next refined by evaluating the quantity

$$\Delta\hat{\varepsilon} = \frac{1}{\pi R} \arg\left\{ \sum_{m=1}^{N_R} R_m^{(c)}\!\left( \frac{R}{2} \right) \right\}. \tag{39}$$

The final CFO estimate is obtained as $\hat{\nu}_{\mathrm{CBFE}} = R(\hat{\varepsilon}^{(c)} + \Delta\hat{\varepsilon})$, and its MSE is given by [12]

$$E\{(\hat{\nu}_{\mathrm{CBFE}} - \nu)^2\} = \frac{2}{\pi^2} \cdot \frac{\sigma_n^2/\sigma_s^2}{N_R N}. \tag{40}$$

Comparing this result with (38), we see that the loss (in dB) with respect to RCFE is $10\log_{10}[4(1 - 1/R^2)/3]$, which approaches 1.25 dB for large values of $R$. Furthermore, since no ICFO estimation is attempted in [12], the estimation range of CBFE is restricted to $|\nu| \le R/2$, while RCFE can cope with CFOs as large as $\pm N/2$. The overall complexity of CBFE is shown in the third row of Table 1. Compared to FCFO recovery by means of RCFE, the computational saving of CBFE is in the order of $R/3$.
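The operation counts of Table 1 and the loss implied by (38) and (40) are easy to evaluate numerically. The sketch below transcribes those expressions; the function names are hypothetical, and the parameter values used in the usage lines are the ones quoted for the simulation scenario of Section 5 ($N = 1024$, $N_T = 3$, $N_R = 2$, $L = 12$, $R = 8$, $Q = 4$ for RCFE and CBFE, and $Q = 32$ for PBFE).

```python
import math

def rcfe_fcfo_ops(NR, R, N):
    """FCFO recovery cost of RCFE (Table 1, first row)."""
    return 2 * NR * (R - 1) * (2 * N + 3)

def rcfe_icfo_ops(NR, N, L, NT, M, n_eta):
    """ICFO recovery cost of RCFE for n_eta hypothesized ICFO values."""
    return NR * N * (5 * math.log2(N) + 8 * L * NT * n_eta / M)

def pbfe_fcfo_ops(NR, N, Q):
    """FCFO recovery cost of PBFE (Table 1, second row)."""
    return 4 * NR * N * Q + 30 * (Q - 1) ** 3

def pbfe_icfo_ops(NR, N, NT):
    """ICFO recovery cost of PBFE."""
    return NR * N * (5 * math.log2(N) + 4 * NT)

def cbfe_fcfo_ops(NR, N, P):
    """FCFO recovery cost of CBFE (Table 1, third row)."""
    return 4 * NR * (3 * N - 2 * P - 1)

def cbfe_loss_db(R):
    """Loss of CBFE relative to RCFE, from the ratio of (40) to (38)."""
    return 10 * math.log10(4 * (1 - 1 / R ** 2) / 3)

# Scenario of Section 5: M = Q*R = 32 and P = N/R = 128
rcfe_fcfo = rcfe_fcfo_ops(NR=2, R=8, N=1024)
rcfe_total = rcfe_fcfo + rcfe_icfo_ops(NR=2, N=1024, L=12, NT=3, M=32, n_eta=5)
pbfe_fcfo = pbfe_fcfo_ops(NR=2, N=1024, Q=32)
pbfe_total = pbfe_fcfo + pbfe_icfo_ops(NR=2, N=1024, NT=3)
loss_r8 = cbfe_loss_db(8)
```

With these inputs the RCFE and PBFE counts come out close to the rounded figures quoted in Section 5.2, and the ratio `pbfe_total / rcfe_total` is slightly above 5. The loss for $R = 8$ is a little under the asymptotic 1.25 dB because of the factor $1 - 1/R^2$.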


Computer simulations have been run to check and extend the analytical results of the previous sections. The simulation scenario is summarized as follows.

5.1. Simulation Model. The investigated MIMO-OFDM system has $N = 1024$ subcarriers and operates in the 5 GHz frequency band. The signal bandwidth is 5 MHz, corresponding to a subcarrier distance of approximately 4.9 kHz. The sampling period is $T_s = 0.2$ microsecond, so that the useful part of each OFDM block has length 0.205 millisecond. Each channel is characterized by $L = 12$ independent Rayleigh fading taps with an exponentially decaying power delay profile

$$E\{|h_{m,i}(\ell)|^2\} = \sigma_h^2 \cdot \exp\left( -\frac{4\ell}{L} \right), \quad \ell = 0, 1, \ldots, L-1. \tag{41}$$

In (41), the constant $\sigma_h^2$ is chosen such that the channel power is normalized to unity, that is, $E\{\|\mathbf{h}_{m,i}\|^2\} = 1$. A new channel snapshot is generated at each simulation run and kept fixed over the training period. Vectors $\mathbf{h}_{m,i}$ are assumed to be statistically independent for different TX/RX antenna pairs. The training sequences employed by RCFE are given in (14), where we have set $R = 8$ and $Q = 4$. In this way, each TX antenna transmits a total of 32 pilot symbols which are randomly taken from a QPSK constellation with power $|d_i(n')|^2 = 32/N_T$. Parameters $N_T$ and $N_R$ are varied throughout simulations to assess their impact on the system performance.

Figure 1: MSE of the FCFO estimators versus SNR with $N_T = 3$ and $N_R = 2$. (Plot not reproduced: MSE from $10^{-6}$ to $10^{-4}$ versus SNR from 0 to 18 dB; curves for RCFE, PBFE, CBFE, and the EMCB.)
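The channel model of (41) and the basic system figures quoted in Section 5.1 can be reproduced with a few lines. This is a sketch under stated assumptions: the function names and the random seed are arbitrary, and each Rayleigh tap is drawn as a zero-mean complex Gaussian variable with independent real and imaginary parts.

```python
import math
import random

def power_delay_profile(L):
    """Tap powers sigma_h^2 * exp(-4*l/L) of (41), scaled so that they sum to one."""
    raw = [math.exp(-4 * l / L) for l in range(L)]
    sigma_h2 = 1 / sum(raw)
    return [sigma_h2 * p for p in raw]

def draw_channel(L, rng):
    """One snapshot of L independent Rayleigh-fading taps."""
    taps = []
    for power in power_delay_profile(L):
        std = math.sqrt(power / 2)               # per real and imaginary component
        taps.append(complex(rng.gauss(0, std), rng.gauss(0, std)))
    return taps

# System parameters quoted in Section 5.1
N = 1024
Ts = 0.2e-6                                      # sampling period in seconds
subcarrier_spacing = 1 / (N * Ts)                # about 4.9 kHz
block_duration = N * Ts                          # about 0.205 ms

profile = power_delay_profile(12)
h = draw_channel(12, random.Random(0))           # seed chosen arbitrarily
```

The normalization makes the expected channel energy equal to one, so the SNR definition used in the performance plots does not depend on the number of taps.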

Comparisons are made between RCFE, CBFE, and the polynomial-based frequency estimator (PBFE) proposed in [17]. This scheme employs the training sequences defined in (4) and performs initial ICFO recovery by maximizing the following cost function:

$$\psi_{\mathrm{PBFE}}(\tilde{\eta}) = \sum_{m=1}^{N_R} \sum_{i=1}^{N_T} \sum_{n'=0}^{N/Q-1} \left| X_m(n'Q + \mu_i + \tilde{\eta}) \right|^2 \tag{42}$$

over the set $\tilde{\eta} \in \{-Q/2, -Q/2 + 1, \ldots, Q/2 - 1\}$, with $\{X_m(n);\ 0 \le n \le N-1\}$ being the $N$-point DFT of $\mathbf{x}_m$. After ICFO compensation, the fractional CFO is eventually estimated by looking for the roots of a real-valued polynomial function that is obtained by applying the MUSIC principle. As mentioned in [17], the estimation range of PBFE is $|\nu| \le Q/2$. Its computational requirement is mainly ascribed to the need for evaluating the correlation matrix of the received time domain samples and is summarized in the second row of Table 1.

5.2. Performance Assessment. Figure 1 compares the performance of the fractional CFO estimators in terms of their MSE $E\{(\hat{\nu} - \nu)^2\}$ versus the signal-to-noise ratio at each receiving antenna. The latter is defined as $\mathrm{SNR} = \sigma_s^2/\sigma_n^2$, where $\sigma_n^2$ is the noise power, and $\sigma_s^2$ is given in (37). Marks indicate simulation results, while solid lines are drawn to ease the reading of the graphs. The number of TX and RX antennas is $N_T = 3$ and $N_R = 2$, respectively. The same training sequences are used for both CBFE and RCFE, while PBFE employs the pilot design specified in (4) with $Q = 32$ and $\{\mu_1, \mu_2, \mu_3\} = \{0, 1, 5\}$. This means that the number of pilot symbols transmitted by each TX antenna is 32 for all the considered schemes. As suggested in [17], the pilot symbols $\{d_i(n')\}$ for PBFE belong to a Chu sequence. The CFO is randomly generated at each simulation run with uniform distribution within the interval $[-0.4, 0.4)$, which corresponds to having $\eta = 0$ and $\varepsilon = \nu/R$. For the time being, we concentrate on the accuracy of the FCFO estimates and assume ideal ICFO recovery for both RCFE and PBFE. We use the average CRB to benchmark the performance of the considered schemes. The latter corresponds to the extended Miller and Chang bound (EMCB) [20] and is obtained by numerically averaging the right-hand side of (12) with respect to the channel statistics. Inspection of Figure 1 reveals that RCFE outperforms the other schemes, and its accuracy is close to the EMCB at all investigated SNR values. As predicted by the theoretical analysis shown in (38) and (40), the loss of CBFE with respect to RCFE is approximately 1.25 dB. Looking at the system complexity, from Table 1, it turns out that in the considered scenario, RCFE requires a total of 57 500 operations for FCFO recovery, while PBFE and CBFE need 1 156 000 and 24 000 operations, respectively. Combining these figures with the results of Figure 1 indicates that RCFE is superior to PBFE in terms of both estimation accuracy and processing load, while CBFE is a valid solution when limiting the computational requirement is an issue of concern.

Figure 2: Accuracy of RCFE versus SNR with $N_T = 2, 3, 4$ and $N_R = 2$. (Plot not reproduced: MSE from $10^{-6}$ to $10^{-4}$ versus SNR from 0 to 18 dB; RCFE curves for $N_T = 2$, $3$, and $4$, and the EMCB.)

Figure 2 illustrates the impact of the number of transmitantennas NT on the accuracy of RCFE. The simulation

Figure 3: Accuracy of RCFE versus SNR with NT = 3 and NR = 2, 3, 4. (Plot of MSE versus SNR in dB; RCFE curves compared with the EMCB.)

scenario is the same as in Figure 1, except that now NT = 2, 3, or 4. As can be seen, the frequency MSE is virtually independent of NT, and the same holds for the EMCB. Such behavior can be ascribed to the fact that signals emitted by different TX antennas combine incoherently at each RX branch, so that higher values of NT do not result in a corresponding increase of the array gain. As is known, array gain exploitation by means of multiple TX antennas requires channel knowledge at the transmitter in conjunction with suitable precoding techniques.

Figure 3 shows how the performance of RCFE is affected by the number NR of receiving antennas. In this case, NT is fixed to three while NR = 2, 3, or 4. As predicted by (38), the estimation accuracy improves with NR, and this trend is also evident in the EMCB. The physical reason behind this SNR advantage is that the presence of multiple receiving antennas increases the length of the data record $x = [x_1^T, x_2^T, \ldots, x_{N_R}^T]^T$ used for CFO recovery. This provides the system with an array gain of 10·log10(NR) dB.
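As a quick numerical illustration of the array gain quoted above, the following minimal sketch evaluates 10·log10(NR) for the three receive-antenna counts used in Figure 3. The function name `array_gain_db` is introduced here for illustration only.

```python
import math

def array_gain_db(n_r):
    """Array gain in dB for n_r receive antennas: 10*log10(n_r)."""
    return 10.0 * math.log10(n_r)

# Receive-antenna counts considered in Figure 3.
for n_r in (2, 3, 4):
    print(f"NR = {n_r}: {array_gain_db(n_r):.2f} dB")
```

Doubling NR from 2 to 4 therefore adds roughly another 3 dB under this expression.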

The performance of the ICFO estimators is illustrated in Figure 4 in terms of probability of failure Pf = Pr{η̂ ≠ η} versus SNR. Comparisons are made between RCFE and PBFE using the same simulation setup of Figure 1. The RCFE metric defined in (34) is evaluated for η ∈ {−2, −1, 0, 1, 2}, while PBFE looks for the maximum of ψPBFE(η) over the set η ∈ {−16, −15, . . . , 15}. In this way, the estimation range is |ν| ≤ 20 for RCFE and |ν| ≤ 16 for PBFE. As can be seen, for SNR > −10 dB, the best performance is obtained with RCFE. From Table 1, it follows that the total number of operations needed to get the CFO estimate ν̂ is 1 283 000 for PBFE and 252 500 for RCFE, thereby leading to a reduction of the processing load by a factor greater than 5. It is fair to say, however, that the complexity of PBFE can be controlled by


Figure 4: Probability of failure versus SNR for RCFE and PBFE with NT = 3 and NR = 2. (Plot of Pf versus SNR in dB.)

a judicious design of parameter Q. Specifically, decreasing Q alleviates the computational requirement at the expense of a reduced CFO acquisition range.
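The complexity comparison above can be checked with a few lines of arithmetic. This sketch only re-uses the two operation counts quoted in the text; the dictionary name `ops_total` is an illustrative choice.

```python
# Total operation counts for the full CFO estimate, as quoted in the text
# (the article attributes them to its Table 1).
ops_total = {"PBFE": 1_283_000, "RCFE": 252_500}

reduction = ops_total["PBFE"] / ops_total["RCFE"]
print(f"PBFE / RCFE operation ratio: {reduction:.2f}")
```

The ratio is slightly above 5, which agrees with the "factor greater than 5" stated for the processing load.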

6. Conclusions

We have addressed the problem of training-assisted CFO recovery in MIMO-OFDM systems. To reduce the computational burden required by the exact ML solution, we have divided the CFO into a fractional part plus an integer part and have designed FDM pilot sequences that are periodic in the time domain. The fractional CFO is estimated in closed form by measuring the phase rotations between the repetitive parts of the received preambles, while the integer CFO is estimated jointly with the MIMO channel matrix by resorting to the ML principle. The proposed scheme has affordable complexity and exhibits improved performance with respect to existing alternatives. For these reasons, we believe that it provides an effective approach for frequency synchronization in beyond third generation (3G) wideband MIMO-OFDM transmissions.

References

[1] “Wireless LAN medium access control (MAC) and physical layer (PHY) specifications, higher speed physical layer extension in the 5 GHz band,” 1999.

[2] G. L. Stuber, J. R. Barry, S. W. Mclaughlin, Y. E. Li, M. A. Ingram, and T. G. Pratt, “Broadband MIMO-OFDM wireless communications,” Proceedings of the IEEE, vol. 92, no. 2, pp. 271–294, 2004.

[3] T. Pollet, M. van Bladel, and M. Moeneclaey, “BER sensitivity of OFDM systems to carrier frequency offset and Wiener phase noise,” IEEE Transactions on Communications, vol. 43, no. 2–4, pp. 191–193, 1995.


[5] X. Ma, M.-K. Oh, G. B. Giannakis, and D.-J. Park, “Hopping pilots for estimation of frequency-offset and multiantenna channels in MIMO-OFDM,” IEEE Transactions on Communications, vol. 53, no. 1, pp. 162–172, 2005.


[7] M. Morelli and U. Mengali, “An improved frequency offset estimator for OFDM applications,” IEEE Communications Letters, vol. 3, no. 3, pp. 75–77, 1999.

[8] M. Ghogho, A. Swami, and P. Ciblat, “Training design for CFO estimation in OFDM over correlated multipath fading channels,” in Proceedings of the 50th Annual IEEE Global Telecommunications Conference (GLOBECOM ’07), pp. 2821–2825, Washington, DC, USA, November 2007.

[9] I. Barhumi, G. Leus, and M. Moonen, “Optimal training sequences for channel estimation in MIMO OFDM systems in mobile wireless channels,” in Proceedings of the International Zurich Seminar on Broadband Communications: Accessing, Transmission, Networking, pp. 441–446, Zurich, Switzerland, February 2002.

[10] A. van Zelst and T. C. Schenk, “Implementation of a MIMO OFDM-based wireless LAN system,” IEEE Transactions on Signal Processing, vol. 52, no. 2, pp. 483–494, 2004.

[11] A. N. Mody and G. L. Stuber, “Synchronization for MIMO OFDM systems,” in Proceedings of IEEE Global Telecommunications Conference (GLOBECOM ’01), vol. 1, pp. 509–513, San Antonio, Tex, USA, November 2001.

[12] C. Yan, S. Li, Y. Tang, and X. Luo, “Frequency synchronization in MIMO OFDM system,” in Proceedings of the 60th IEEE Vehicular Technology Conference (VTC ’04), vol. 3, pp. 1732–1734, Los Angeles, Calif, USA, September 2004.

[13] T. C. W. Schenk and A. van Zelst, “Frequency synchronization for MIMO OFDM wireless LAN systems,” in Proceedings of the 58th IEEE Vehicular Technology Conference (VTC ’03), vol. 2, pp. 781–785, Orlando, Fla, USA, October 2003.

[14] J. Zheng, J. Han, J. Lv, and W. Wu, “A novel timing and frequency synchronization scheme for MIMO OFDM system,” in Proceedings of the International Conference on Wireless Communications, Networking and Mobile Computing (WiCOM ’07), pp. 420–423, Shanghai, China, September 2007.

[15] H. Minn, N. Al-Dhahir, and Y. Li, “Optimal training signals for MIMO OFDM channel estimation in the presence of frequency offset and phase noise,” IEEE Transactions on Communications, vol. 54, no. 10, pp. 1754–1759, 2006.

[16] M. Ghogho and A. Swami, “Training design for multipath channel and frequency-offset estimation in MIMO systems,” IEEE Transactions on Signal Processing, vol. 54, no. 10, pp. 3957–3965, 2006.

[17] Y. Jiang, H. Minn, X. Gao, X. You, and Y. Li, “Frequency offset estimation and training sequence design for MIMO OFDM,” IEEE Transactions on Wireless Communications, vol. 7, no. 4, pp. 1244–1254, 2008.

[18] Z. Cao, U. Tureli, and Y.-D. Yao, “Deterministic multiuser carrier-frequency offset estimation for interleaved OFDMA uplink,” IEEE Transactions on Communications, vol. 52, no. 9, pp. 1585–1594, 2004.

[19] Y. Jiang, X. You, X. Gao, and H. Minn, “MIMO OFDM frequency offset estimator with low computational complexity,” in Proceedings of IEEE International Conference on Communications (ICC ’07), pp. 5449–5454, Glasgow, Scotland, June 2007.

[20] F. Gini and R. Reggiannini, “On the use of Cramer-Rao-like bounds in the presence of random nuisance parameters,” IEEE Transactions on Communications, vol. 48, no. 12, pp. 2120–2126, 2000.


Research Article

Estimation of CFO and Channels in Phase-Shift Orthogonal Pilot-Aided OFDM Systems with Transmitter Diversity

Carlos Ribeiro (1) and Atílio Gameiro (2)

(1) Escola Superior de Tecnologia e Gestao, Instituto Politecnico de Leiria, Morro do Lena, Alto Vieiro, 2411-901 Leiria, Portugal
(2) Instituto de Telecomunicacoes, Universidade de Aveiro, 3810-193 Aveiro, Portugal

Correspondence should be addressed to Carlos Ribeiro, [email protected]

Received 1 July 2008; Revised 4 November 2008; Accepted 23 January 2009


We present a CFO estimation algorithm and an associated channel estimation method for broadband OFDM systems with transmitter diversity. The CFO estimation algorithm exploits the TD structure of the transmitted symbols carrying pilots and data, relying solely on the data component present in the symbols to estimate the CFO, thus avoiding additional overhead such as training symbols or null subcarriers. An intermediate output of the CFO algorithm provides an easy-to-get initial CIR estimate that is then improved by a TD LMMSE filter. The feasibility of the investigated methods is substantiated by system simulation using indoor and outdoor broadband wireless channel models. Simulation results show that the joint algorithms provide near-optimal system performance.

Copyright © 2009 C. Ribeiro and A. Gameiro. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

Future mobile broadband applications will require reliable high data-rate wireless communication systems. In recent years, multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) transmission systems [1–4] emerged as the scheme with the potential to fulfill these conditions, with bandwidth efficiency and robustness to frequency-selective channels, which are common in mobile personal communication systems.

Various forms of OFDM have been adopted in different standards: WiMAX, LTE, IEEE 802.11a/g [5], IEEE 802.16 [6], and DAB/DVB [1]. However, the long symbol duration makes OFDM systems particularly sensitive to carrier frequency offsets (CFOs) that always exist between the base station (BS) and mobile terminal (MT). The presence of CFO destroys the orthogonality among subcarriers, leading to intercarrier interference (ICI) that causes severe degradation of the system’s bit error rate (BER) [7–9].

The estimation and removal of the CFO has been the focus of a considerable number of works published in recent years. The algorithms can be categorized as blind or data-aided. The first category exploits the properties of the received symbols (commonly the cyclic prefix (CP)) [10–12]. The data-aided algorithms use dedicated training symbols [13, 14] or exploit the presence of null subcarriers [15, 16].

The accurate extraction of the channel state information is crucial to realize the full potential of the MIMO-OFDM system. The performance of the channel estimator is vital for diversity combining, coherent detection and decoding, and resource allocation operations. The cochannel interference inherent to the system, where the received signal is the superposition of the signals transmitted simultaneously from the different antennas, poses an additional challenge to the design of the channel estimation algorithm.

A decision-directed channel estimation scheme that attempted to minimize the cochannel interference was published in [17]. The proposed algorithm exhibits a high computational load. A simplified and enhanced algorithm, introducing a data-aided scheme for the data transmission mode, is presented in [18]. The topic has attracted significant attention and has been the focus of investigation in multiple publications [19–21] and references therein.

The design of training symbols and pilot sequences with the ability to decouple the cochannel interference and minimize the channel estimation mean square error (MSE) for MIMO-OFDM was addressed in several publications [18, 22, 23].

Most publications on the topic of training-signal or pilot-aided channel estimation use the frequency-domain (FD) least squares (LS) estimates as the starting point for the analysis of the estimation algorithm or the design of the training sequence. It was established in [24] that in single-input single-output (SISO) OFDM a time-domain (TD) equivalent to the LS estimate could be obtained using a simple linear operation on the received signal, if the pilot sequence fulfills certain conditions.

This paper proposes a CFO estimation algorithm and an associated channel estimation method for OFDM systems with transmitter diversity that exploit a standardized transmission format, where FD pilot symbols are regularly spread in the OFDM symbols. To minimize the pilot overhead, the pilot subcarriers are shared among all transmit antennas. To mitigate the resulting cochannel interference, the system adopts phase-shifted pilot sequences per transmit antenna [18]. By exploiting the TD properties of the received symbols, the proposed algorithms are able to estimate and remove the CFO, separate each of the CIRs, and generate the final channel estimate, without requiring any additional overhead (training symbols or null subcarriers). By performing most of the operations on the TD received symbols and sharing operations, the overall computational load required to implement both algorithms is affordable for real-time implementations.

The paper is organized as follows. Section 2 gives a brief introduction to the wireless multipath channel and the OFDM baseband model. In Section 3, the investigated CFO and channel estimation algorithms are developed. The feasibility of the developed method is substantiated by simulation results presented in Section 4. Finally, conclusions are drawn in Section 5.

2. OFDM in Mobile Wireless Channels

Before introducing the investigated method, we will briefly overview the mobile wireless multipath channel and the considered OFDM baseband model.

Throughout the text, the notation (∼) is used for TD vectors and elements, and its absence denotes frequency-domain (FD) vectors and elements. The index n denotes TD elements and k FD elements. Unless stated otherwise, the vectors involved in the transmission/reception process are column vectors with $N_C$ complex elements. The superscripts $(\cdot)^T$ and $(\cdot)^H$ denote transpose and Hermitian transpose, respectively.

2.1. The Wireless Multipath Channel. Let us consider that the system transmits over multipath Rayleigh fading wireless channels modeled by the discrete-time channel impulse response (CIR):

$$\tilde{h}[n] = \sum_{l=0}^{L_p-1} \alpha_l\, \delta[n - \tau_l], \tag{1}$$

where $L_p$ is the number of channel paths, and $\alpha_l$ and $\tau_l$ are the complex value and delay of path $l$, respectively. The paths are assumed to be statistically independent, with normalized average power, $\sum_{l=0}^{L_p-1} \sigma_h^2[l] = 1$, where $\sigma_h^2[l]$ is the average power of path $l$. The channel is time variant due to the motion of the mobile terminal (MT), but we will assume that the CIR is constant during one OFDM symbol. The time dependence of the CIR is not present in the notation for simplicity. Assuming that the insertion of a long enough cyclic prefix (CP) in the transmitter assures that the orthogonality of the subcarriers is maintained after transmission, the channel frequency response (CFR) can be expressed as

$$h[k] = \sum_{l=0}^{L_p-1} \alpha_l \exp\left(-j\frac{2\pi}{N_C} k \tau_l\right), \tag{2}$$

where $N_C$ is the total number of subcarriers of the OFDM system.
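To make the relation between the path model (1) and the CFR (2) concrete, here is a minimal sketch that evaluates (2) for a hypothetical two-path channel. The function name `cfr_from_paths`, the path gains, and the delays are illustrative assumptions, not values from the paper.

```python
import cmath
import math

def cfr_from_paths(alphas, delays, n_c):
    """Evaluate h[k] = sum_l alpha_l * exp(-j*2*pi*k*tau_l/N_C) for k = 0..N_C-1."""
    return [
        sum(a * cmath.exp(-2j * math.pi * k * tau / n_c)
            for a, tau in zip(alphas, delays))
        for k in range(n_c)
    ]

# Hypothetical channel: two equal-power paths (total power 1), delays 0 and 4 samples.
alphas = [1 / math.sqrt(2), 1 / math.sqrt(2)]
delays = [0, 4]
h = cfr_from_paths(alphas, delays, n_c=16)

for k in range(4):
    print(k, round(abs(h[k]), 4))
```

With these values the two paths add constructively at k = 0 and cancel at k = 2, which is the frequency selectivity that the pilot-based channel estimator has to track.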

2.2. OFDM Baseband Model. Consider the OFDM baseband system with $n_S$ transmit antennas depicted in Figure 1. The $n_S$ vectors $d_s$ hold the M-ary PSK or QAM coded data to be transmitted.

To assist in the channel estimation process, pilot symbols are added in each transmit antenna path. The $n_S$ vectors $p_s$ hold the pilot values for each path. The pilots are transmitted in dedicated subcarriers (vectors $p_s$ and $d_s$ contain nonzero values in disjoint positions). The resulting FD signal transmitted by antenna $s$ is $s_s = d_s + p_s$. All transmit antennas use the common set of subcarriers ℘ to convey the overlapping pilot sequences. The pilots are regularly spread every $N_f$ subcarriers. The pilot separation can range from 1 (the particular case where all subcarriers in the OFDM symbol are dedicated to transmitting pilots, that is, a training symbol) to $N_C$, fulfilling the condition $N_C/(N_f n_S) \in \mathbb{N}$.

The system uses distinct phase-shifted pilot sequences in each transmit antenna to allow the separation of the sequences in the receiver. The $k$th element of the vector $p_s$ is defined by

$$p_s[k] = \sum_{m=0}^{N_t-1} \delta\left[k - k_{\mathrm{ini}} - mN_f\right] \exp\left(-j2\pi \frac{s}{n_S} m\right), \tag{3}$$

where $N_t = N_C/N_f$, and $k_{\mathrm{ini}} \in \{0, \ldots, N_f - 1\}$ is the first pilot subcarrier.
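The pilot definition in (3) can be sketched in a few lines. The helper name `pilot_vector` and the toy dimensions below are assumptions chosen for readability; only the formula itself comes from the text.

```python
import cmath
import math

def pilot_vector(s, n_s, n_c, n_f, k_ini=0):
    """FD pilot vector of antenna s per (3): nonzero only on k = k_ini + m*N_f."""
    n_t = n_c // n_f                      # number of pilots per antenna
    p = [0j] * n_c
    for m in range(n_t):
        p[k_ini + m * n_f] = cmath.exp(-2j * math.pi * s * m / n_s)
    return p

# Toy dimensions (hypothetical): 16 subcarriers, pilots every 4, two antennas.
N_C, N_F, N_S = 16, 4, 2
p0 = pilot_vector(0, N_S, N_C, N_F)
p1 = pilot_vector(1, N_S, N_C, N_F)

# Both antennas occupy the same pilot subcarriers; only the phases differ.
pilot_positions = [k for k, v in enumerate(p0) if v != 0]
inner_product = sum(a * b.conjugate() for a, b in zip(p0, p1))
print(pilot_positions, abs(inner_product))
```

In this toy case the two pilot sequences share the subcarriers 0, 4, 8, 12 and are orthogonal over the pilot set, which is what allows the receiver to separate the channels.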

The inverse discrete Fourier transform (DFT) block present in each antenna path transforms the input vector into the TD vector $\tilde{s}_s$, using an efficient $N_C$-point inverse fast Fourier transform (FFT) algorithm.

An $L$-sample-long guard interval, in the form of a CP, is prefixed to vector $\tilde{s}_s$, resulting in the TD transmitted vector

$$\tilde{x}_s = A_{CP} F^H s_s = A_{CP}\left(\tilde{d}_s + \tilde{p}_s\right), \tag{4}$$


Figure 1: OFDM baseband system model. (Block diagram: per-antenna framing, IFFT($N_C$), and CP insertion at the transmitter; channel; CP removal, CFO and channel estimation, FFT($N_C$), deframing, and decoding at the receiver.)

where $F \triangleq N_C^{-1/2}\left[\exp\left(-j(2\pi/N_C)kn\right)\right]_{k,n=0,0}^{N_C-1,N_C-1}$ is the $N_C \times N_C$ DFT matrix, and $A_{CP} = [I_{N_C,L}\;\; I_{N_C}]^T$ is the matrix that adds the CP, with $I_{N_C}$ denoting the $N_C \times N_C$ identity matrix and $I_{N_C,L}$ denoting the last $L$ columns of $I_{N_C}$. The TD vectors $\tilde{d}_s$ and $\tilde{p}_s$ collect, respectively, the components of the data symbols and pilot symbols present in $\tilde{s}_s$. The $n_S$ vectors $\tilde{s}_s$ are simultaneously transmitted to the receiver’s antenna.

Let $w_o = 2\pi f_o \Delta t$ be the normalized angular CFO, where $f_o$ is the frequency offset due to the frequency mismatch of the oscillators of the transmitter and the receiver, and $\Delta t$ is the sampling interval.

The $n$th received signal sample of the $i$th symbol can be expressed as

$$\tilde{y}_i[n] = \exp\left[jw_o\left(i(N_C + L) + n\right)\right] \sum_{s=0}^{n_S-1} \sum_{l=0}^{L_p-1} \tilde{h}_{s,i}[l]\, \tilde{x}_{s,i}[n-l] + \tilde{n}_i[n], \tag{5}$$

where $\tilde{n}_i[n]$ is a sample of independent and identically distributed (iid) zero-mean additive white Gaussian noise (AWGN) with variance $\sigma_n^2$. Collecting the $(N_C + L)$ samples of the symbol,

$$\tilde{y}_i = \exp\left[jw_o i(N_C + L)\right] C_{(N_C+L)}(w_o) \sum_{s=0}^{n_S-1} H^{\mathrm{lin}}_{s,i}\, \tilde{x}_{s,i} + \tilde{z}_i + \tilde{n}_i, \tag{6}$$

where the vector $\tilde{n}_i$ collects the noise samples that affect the $i$th symbol, the vector $\tilde{z}_i$ represents the intersymbol interference (ISI) caused by the channel dispersion, and the matrix $H^{\mathrm{lin}}_{s,i}$ is the $(N_C+L)\times(N_C+L)$ lower triangular Toeplitz channel convolution matrix with first column $\tilde{h}_{s,i}$ (a column $(N_C+L)$-vector holding the discrete-time CIR, whose elements are defined by (1), padded with zeros). The $(N_C+L)\times(N_C+L)$ diagonal matrix that holds the phase rotation that affects each symbol sample is

$$C_{(N_C+L)}(w_o) = \mathrm{diag}\left(\left[1 \;\; \exp(jw_o) \;\; \cdots \;\; \exp\left[jw_o(N_C + L - 1)\right]\right]\right). \tag{7}$$

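The receiver starts by removing the CP from the received symbol. Dropping the symbol index, the resulting vector is

$$
\begin{aligned}
\tilde{r} &= R_{CP}\,\tilde{y} \\
&= \exp\left[jw_o i(N_C + L)\right] R_{CP}\, C_{(N_C+L)}(w_o) \sum_{s=0}^{n_S-1} H^{\mathrm{lin}}_{s}\, \tilde{x}_s + R_{CP}\,\tilde{z} + \tilde{n} \\
&= \theta_{\mathrm{ini}}\, C_{N_C}(w_o) \sum_{s=0}^{n_S-1} R_{CP} H^{\mathrm{lin}}_{s} A_{CP}\, \tilde{s}_s + R_{CP}\,\tilde{z} + \tilde{n},
\end{aligned}
\tag{8}
$$

where $R_{CP} = [0_{(N_C\times L)}\;\; I_{N_C}]$ is the matrix that removes the CP, with $0_{(N_C\times L)}$ representing the $(N_C \times L)$ null matrix, $\tilde{n}$ (the noise vector of (6) after multiplication by $R_{CP}$) is the resulting TD noise column $N_C$-vector, and $\theta_{\mathrm{ini}} = \exp\left[jw_o\left(i(N_C + L) + L\right)\right]$ is the common phase that affects all samples of the $i$th symbol. The last step in (8) is possible because of the structure of the matrices involved:

$$R_{CP}\, C_{(N_C+L)}(w_o) = \exp(jw_oL)\, C_{N_C}(w_o)\, R_{CP}. \tag{9}$$

With the assumption that the length of the CP is larger than the duration of the CIR, the ISI is completely removed, and (8) can be written as

$$
\begin{aligned}
\tilde{r} &= \theta_{\mathrm{ini}}\, C_{N_C}(w_o) \sum_{s=0}^{n_S-1} H^{\mathrm{circ}}_{s}\, \tilde{s}_s + \tilde{n} \\
&= \theta_{\mathrm{ini}}\, C_{N_C}(w_o) \sum_{s=0}^{n_S-1} F^H H_s s_s + \tilde{n},
\end{aligned}
\tag{10}
$$

where $H^{\mathrm{circ}}_{s} = R_{CP} H^{\mathrm{lin}}_{s} A_{CP}$ is the $N_C \times N_C$ circulant matrix with circulant vector $\tilde{h}_s$ and, due to the properties of the DFT, $H_s = F H^{\mathrm{circ}}_{s} F^H = \mathrm{diag}(h_s)$, with the elements of $h_s$ defined by (2).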

The CFO and channel estimation block is responsible for estimating both the CFO that affects the received samples and the $n_S$ channels that disturbed the transmission process. Both estimation algorithms will be introduced in the next section. Moreover, this block is also responsible for reducing the CFO, using the estimated CFO value $\hat{w}_o$. This operation can be described by

$$\tilde{r}_c = \theta_{\mathrm{ini}}\, C^H_{N_C}(\hat{w}_o)\, C_{N_C}(w_o) \sum_{s=0}^{n_S-1} F^H H_s s_s + C^H_{N_C}(\hat{w}_o)\, \tilde{n}. \tag{11}$$

It is clear that if $\hat{w}_o = w_o$, then $C^H_{N_C}(\hat{w}_o)\, C_{N_C}(w_o) = I_{N_C}$, and the CFO is completely removed. As will be demonstrated in the next section, the CFO ambiguity remaining after this block is an integer multiple of the pilot subcarrier separation $N_f\Delta f$, where $\Delta f$ is the subcarrier separation. This acquisition range should be sufficient for current OFDM systems; however, coarse CFO estimation techniques [25] can be used to tackle this limitation, if proven necessary.
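The effect of the phase ramp $C_{N_C}(w_o)$ and of its removal in (11) can be illustrated with a single-tone toy example. This is a sketch under simplifying assumptions: one subcarrier, no channel, no noise, and a perfect estimate $\hat{w}_o = w_o$; the names `rotate` and `leakage` are introduced here for illustration.

```python
import cmath
import math

def dft(x):
    """Unitary DFT (O(N^2); adequate for a toy size)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            / math.sqrt(n) for k in range(n)]

def rotate(x, w):
    """Apply the diagonal phase ramp C(w): sample n is multiplied by exp(j*w*n)."""
    return [v * cmath.exp(1j * w * n) for n, v in enumerate(x)]

N_C, K0 = 16, 3                       # toy size and the single active subcarrier
tone = [cmath.exp(2j * math.pi * K0 * n / N_C) / math.sqrt(N_C) for n in range(N_C)]

w_o = math.pi / N_C                   # CFO of half a subcarrier spacing
received = rotate(tone, w_o)          # CFO-impaired TD symbol
corrected = rotate(received, -w_o)    # multiplication by C^H(w_hat), w_hat = w_o

def leakage(x):
    """Energy that ends up outside subcarrier K0 after the DFT (ICI)."""
    return sum(abs(v) ** 2 for k, v in enumerate(dft(x)) if k != K0)

print("ICI energy with CFO     :", leakage(received))
print("ICI energy after removal:", leakage(corrected))
```

With an uncorrected offset, more than half of the tone's energy spills into the other subcarriers in this toy case; after de-rotation with the exact offset, the leakage vanishes up to rounding error.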

The DFT block transforms the vector $\tilde{r}_c$ to the FD with an efficient FFT operation. Assuming that the CFO is completely eliminated, the resulting FD column $N_C$-vector can be expressed as

$$r = F\,\tilde{r}_c = \theta_{\mathrm{ini}} \sum_{s=0}^{n_S-1} H_s s_s + n, \tag{12}$$

where $n = F\, C^H_{N_C}(\hat{w}_o)\, \tilde{n}$ is the resulting FD noise vector. The remaining phase rotation $\theta_{\mathrm{ini}}$ is naturally removed in the channel estimation process, assuming that the pilot-aided scheme calculates the LS estimates (back-rotated received signal).

The deframing block separates the signals in the subcarriers conveying pilots and data symbols. The values in the data subcarriers are collected in vector $d$ and fed to the decoding block. Together with the channels’ estimate $\hat{h}_s$, this block is now able to decode the received symbols, according to some decision rule, and generate the estimate of the transmitted data $\hat{b}$.

3. CFO and Channel Estimations by Exploring the TD Properties of Phase-Shifted Pilot Sequences

The algorithms implemented in this block estimate both the $n_S$ channels over which the transmission occurred and the CFO that affects the received signal. The inputs to the CFO estimation algorithm are the TD symbols carrying both pilots and data, according to the model defined in the previous section. The channel estimation algorithm reuses an intermediate output of the previous operation to attain an initial CIR estimate with minimal computational load.

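3.1. Analysis of the TD Symbol’s Structure. From (10), each element of the TD received symbols (carrying pilots and data), after CP extraction, can be expressed by

$$\tilde{r}[n] = \theta_{\mathrm{ini}} \exp(jw_on) \sum_{s=0}^{n_S-1} \sum_{l=0}^{L-1} \tilde{h}_s[l] \left(\tilde{d}_s[n-l] + \tilde{p}_s[n-l]\right) + \tilde{n}[n], \tag{13}$$

where the elements of the TD data vector $\tilde{d}_s$ are

$$\tilde{d}_s[n] = N_C^{-1/2} \sum_{\substack{k=0 \\ k \notin \wp}}^{N_C-1} d_s[k] \exp\left(j\frac{2\pi}{N_C}kn\right), \tag{14}$$

where $d_s[k]$ is the $k$th element of $d_s$ (the complex data symbol conveyed by the $k$th subcarrier of the $s$th transmit antenna path), and the elements of the TD pilot vector $\tilde{p}_s$ are

$$
\begin{aligned}
\tilde{p}_s[n] &= N_C^{-1/2} \sum_{k=0}^{N_C-1} \sum_{m=0}^{N_t-1} \delta\left[k - k_{\mathrm{ini}} - mN_f\right] \exp\left(-j2\pi\frac{s}{n_S}m\right) \exp\left(j\frac{2\pi}{N_C}kn\right) \\
&= N_C^{1/2} N_f^{-1} \exp\left(j\frac{2\pi}{N_C}k_{\mathrm{ini}}n\right) \sum_{m=0}^{N_f-1} \delta\left[n - \frac{s}{n_S}N_t - mN_t\right].
\end{aligned}
\tag{15}
$$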
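The closed form in (15) says that the TD pilot of antenna $s$ is a train of $N_f$ impulses located at $n = (s/n_S)N_t + mN_t$. The following sketch checks this numerically for toy dimensions; the helper names and the parameter values are illustrative assumptions.

```python
import cmath
import math

def pilot_vector(s, n_s, n_c, n_f, k_ini):
    """FD pilot vector of antenna s, following (3)."""
    p = [0j] * n_c
    for m in range(n_c // n_f):
        p[k_ini + m * n_f] = cmath.exp(-2j * math.pi * s * m / n_s)
    return p

def idft(x_fd):
    """Unitary inverse DFT, matching the N_C^(-1/2) scaling used in (14) and (15)."""
    n = len(x_fd)
    return [sum(x_fd[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n))
            / math.sqrt(n) for t in range(n)]

N_C, N_F, N_S, K_INI = 16, 4, 2, 1    # hypothetical toy values
N_T = N_C // N_F

def pilot_support(s):
    """Indices where the TD pilot of antenna s is (numerically) nonzero."""
    p_td = idft(pilot_vector(s, N_S, N_C, N_F, K_INI))
    return [n for n, v in enumerate(p_td) if abs(v) > 1e-9]

for s in range(N_S):
    print(f"antenna {s}: impulses at {pilot_support(s)}")

# Amplitude check against the closed form at the first impulse of antenna 1.
n0 = (1 * N_T) // N_S
p1_td = idft(pilot_vector(1, N_S, N_C, N_F, K_INI))
closed_form = math.sqrt(N_C) / N_F * cmath.exp(2j * math.pi * K_INI * n0 / N_C)
print(abs(p1_td[n0] - closed_form))
```

Because the impulse trains of the different antennas are offset by $(s/n_S)N_t$ samples, convolving them with the CIRs places each channel's replicas in disjoint time windows, which is the property the rest of this section builds on.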

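Replacing (14) and (15) in (13),

$$
\begin{aligned}
\tilde{r}[n] ={}& \theta_{\mathrm{ini}} N_C^{-1/2} \exp(jw_on) \sum_{s=0}^{n_S-1} \sum_{l=0}^{L-1} \sum_{\substack{k=0 \\ k \notin \wp}}^{N_C-1} \tilde{h}_s[l]\, d_s[k] \exp\left[j\frac{2\pi}{N_C}k(n-l)\right] + \tilde{n}[n] \\
&+ \theta_{\mathrm{ini}} N_C^{1/2} N_f^{-1} \exp(jw_on) \sum_{s=0}^{n_S-1} \sum_{l=0}^{L-1} \sum_{m=0}^{N_f-1} \tilde{h}_s[l] \exp\left[j\frac{2\pi}{N_C}k_{\mathrm{ini}}(n-l)\right] \delta\left[n - l - \frac{s}{n_S}N_t - mN_t\right] \\
={}& \tilde{r}_d[n] + \tilde{r}_p[n] + \tilde{n}[n],
\end{aligned}
\tag{16}
$$

where $\tilde{r}_d$ and $\tilde{r}_p$ hold the data-dependent and pilot-dependent components in $\tilde{r}$, respectively.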

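By expanding the pilot-dependent vector $\tilde{r}_p$,

$$
\begin{aligned}
\tilde{r}_p[n] ={}& \theta_{\mathrm{ini}} N_C^{1/2} N_f^{-1} \exp(jw_on) \sum_{s=0}^{n_S-1} \sum_{m=0}^{N_f-1} \exp\left[j\frac{2\pi}{N_C}k_{\mathrm{ini}}\left(mN_t + \frac{s}{n_S}N_t\right)\right] \tilde{h}_s\left[n - \frac{s}{n_S}N_t - mN_t\right] \\
={}& \theta_{\mathrm{ini}} N_C^{1/2} N_f^{-1} \exp(jw_on) \sum_{m=0}^{N_f-1} \exp\left(j\frac{2\pi}{N_C}k_{\mathrm{ini}}mN_t\right) \tilde{h}_0[n - mN_t] + \cdots \\
&+ \theta_{\mathrm{ini}} N_C^{1/2} N_f^{-1} \exp(jw_on) \sum_{m=0}^{N_f-1} \exp\left[j\frac{2\pi}{N_C}k_{\mathrm{ini}}\left(mN_t + \frac{n_S-1}{n_S}N_t\right)\right] \tilde{h}_{(n_S-1)}\left[n - \frac{n_S-1}{n_S}N_t - mN_t\right],
\end{aligned}
\tag{17}
$$

it becomes clear that it is made up of $N_f$ frequency-shifted and scaled replicas of each of the $n_S$ CIRs. Moreover, the replicas of each CIR are separated by $N_t$ samples, and the CIR replicas of transmit antenna $s$ are time-shifted by $(s/n_S)N_t$ samples from the reference position $mN_t$, $m \in \{0, \ldots, N_f - 1\}$.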

3.2. CFO Estimation. The CFO estimation method introduced in the following uses the pilot structures, introduced primarily for channel estimation purposes, to estimate the CFO present in the received samples. Therefore, it is bandwidth efficient, as it does not require any additional specific overhead. The algorithm exhibits fast acquisition, being able to output an estimate with low deviation from a single OFDM frame. It proves adequate for burst-mode transmission, where the frequency offset varies from frame to frame.

The algorithm requires a search within the acquisition range to find the minimum value of the cost function. An initial candidate angular frequency offset $w$ is applied to the input signal $\tilde{r}_i$, together with the TD equivalent of the FD multiband filter that selects the pilot subcarriers [24] (a phase-shifted sum of the samples in the same relative position in all $N_f$ segments of $N_t$ samples). This operation can be described by

$$\tilde{g} = T\, \mathrm{diag}\left(\left[1 \;\; \exp\left(-j\frac{2\pi}{N_C}k_{\mathrm{ini}}\right) \;\; \cdots \;\; \exp\left[-j\frac{2\pi}{N_C}k_{\mathrm{ini}}(N_C - 1)\right]\right]\right) C^H_{N_C}(w)\, \tilde{r} = \tilde{g}_d + \tilde{g}_p + \tilde{v}, \tag{18}$$

where the $(N_t \times N_C)$ matrix $T = [I_{N_t} \cdots I_{N_t}]$, the column $N_t$-vectors $\tilde{g}_d$ and $\tilde{g}_p$ hold the data-dependent and pilot-dependent components in $\tilde{g}$, respectively, and $\tilde{v}$ is the resulting noise vector. The elements of $\tilde{g}_p$ can be expressed by

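$$
\begin{aligned}
\tilde{g}_p[n] ={}& \sum_{m=0}^{N_f-1} \exp\left(-j2\pi k_{\mathrm{ini}}\frac{n + mN_t}{N_C}\right) \exp\left[-jw(n + mN_t)\right] \tilde{r}_p[n + mN_t] \\
={}& \theta_{\mathrm{ini}} N_C^{1/2} N_f^{-1} \exp\left[j(w_o - w)n\right] \sum_{s=0}^{n_S-1} \sum_{m=0}^{N_f-1} \sum_{q=0}^{N_f-1} \exp\left[-j\frac{2\pi}{N_C}k_{\mathrm{ini}}\left(n + (m-q)N_t - \frac{s}{n_S}N_t\right)\right] \\
&\times \exp\left[j(w_o - w)mN_t\right] \tilde{h}_s\left[n - \frac{s}{n_S}N_t + (m-q)N_t\right].
\end{aligned}
\tag{19}
$$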

If the maximum delay $\tau_s$ of channel $s$ (normalized to the system’s sampling interval $\Delta t$) is short enough that the adjacent CIR replicas in (19) do not overlap (that is, the sampling theorem is fulfilled),

$$\tau_s \le \frac{N_C}{N_f n_S}, \tag{20}$$

(19) can be further simplified to

$$
\begin{aligned}
\tilde{g}_p[n] ={}& \theta_{\mathrm{ini}} N_C^{1/2} N_f^{-1} \exp\left[j(w_o - w)n\right] \sum_{s=0}^{n_S-1} \exp\left[-j\frac{2\pi}{N_C}k_{\mathrm{ini}}\left(n - \frac{s}{n_S}N_t\right)\right] \tilde{h}_s\left[n - \frac{s}{n_S}N_t\right] \\
&\times \sum_{m=0}^{N_f-1} \exp\left[j(w_o - w)mN_t\right],
\end{aligned}
\tag{21}
$$

which clearly shows that the pilot-dependent samples are limited to the $n_S$ sets of samples $\{L_p\}$ (with $L_p$ elements), where the corresponding phase-shifted CIRs have energy. The remaining samples depend only on the transmitted data and noise.

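The elements of $\tilde{g}_d$ can be expressed by

$$
\begin{aligned}
\tilde{g}_d[n] ={}& \sum_{m=0}^{N_f-1} \exp\left(-j2\pi k_{\mathrm{ini}}\frac{n + mN_t}{N_C}\right) \exp\left[-jw(n + mN_t)\right] \tilde{r}_d[n + mN_t] \\
={}& \theta_{\mathrm{ini}} N_C^{-1/2} \sum_{m=0}^{N_f-1} \Bigg( \exp\left(-j2\pi k_{\mathrm{ini}}\frac{n + mN_t}{N_C}\right) \exp\left[j(w_o - w)(n + mN_t)\right] \\
&\times \sum_{s=0}^{n_S-1} \sum_{l=0}^{L-1} \sum_{\substack{k=0 \\ k \notin \wp}}^{N_C-1} \tilde{h}_s[l]\, d_s[k] \exp\left[j\frac{2\pi}{N_C}k(n - l + mN_t)\right] \Bigg) \\
={}& \theta_{\mathrm{ini}} N_C^{-1/2} \exp\left[j(w_o - w)n\right] \exp\left(-j2\pi k_{\mathrm{ini}}\frac{n}{N_C}\right) \sum_{s=0}^{n_S-1} \sum_{l=0}^{L-1} \tilde{h}_s[l]\, \Psi,
\end{aligned}
\tag{22}
$$

where

$$
\begin{aligned}
\Psi ={}& \sum_{\substack{k=0 \\ k \notin \wp}}^{N_C-1} d_s[k] \exp\left[j\frac{2\pi}{N_C}k(n-l)\right] \sum_{m=0}^{N_f-1} \exp\left[j(w_o - w)mN_t\right] \exp\left[j\frac{2\pi}{N_f}(k - k_{\mathrm{ini}})m\right] \\
={}& \sum_{\substack{k=0 \\ k \notin \wp}}^{N_C-1} d_s[k] \exp\left[j\frac{2\pi}{N_C}k(n-l)\right] \sum_{m=0}^{N_f-1} \exp\left[j2\pi\frac{m}{N_f}\left[(f_o - f)N_C\Delta t + (k - k_{\mathrm{ini}})\right]\right],
\end{aligned}
\tag{23}
$$

where $f$ is the initial candidate frequency offset.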


Figure 2: Example of the constitution of vector $\tilde{g}$. (Plot of magnitude versus time sample, 0 to 250, showing CIR 1, CIR 2, the data-dependent component, and the samples to use in the CFO cost function.)

The elements of the noise vector $\tilde{v}$ can be expressed by

$$\tilde{v}[n] = \sum_{m=0}^{N_f-1} \exp\left(-j2\pi k_{\mathrm{ini}}\frac{n + mN_t}{N_C}\right) \exp\left[-jw(n + mN_t)\right] \tilde{n}[n + mN_t]. \tag{24}$$

Figure 2 depicts an example of the constitution of vector $\tilde{g}$ for a system with 2 transmit antennas, $N_C = 1024$ subcarriers, and pilot separation $N_f = 4$ ($N_t = 256$). The plots show that the energy of the CIRs is limited to 2 sets of samples, while the data-dependent component spans the entire symbol duration.

A careful inspection of (23) reveals that the factor $\Psi$ (and the data-dependent component) is zero for

$$\left\{\varphi = (f_o - f)N_C\Delta t + (k - k_{\mathrm{ini}}) : \varphi \in \mathbb{Z} \wedge \varphi \neq mN_f,\; m \in \mathbb{Z}\right\}, \tag{25}$$

independently of the considered sample. Keeping in mind that $k \notin \wp$ and $(k - k_{\mathrm{ini}}) \neq mN_f$, $m \in \mathbb{Z}$, the solution for (25) is

$$(f_o - f)N_C\Delta t = mN_f \iff (f_o - f) = \frac{mN_f}{N_C\Delta t} = mN_f\Delta f, \quad m \in \mathbb{Z}, \tag{26}$$

where $\Delta f$ is the subcarrier separation. It should be noted that the solution in (26) presents a periodicity $N_f\Delta f$ and includes the condition when the CFO is completely eliminated ($f_o - f = 0$). A similar analysis reveals that the factor $\Psi$ has maximum magnitude for

$$\left\{\gamma = (f_o - f)N_C\Delta t : \gamma \in \mathbb{Z} \wedge \gamma \neq mN_f,\; m \in \mathbb{Z}\right\} \implies (f_o - f) = \frac{l}{N_C\Delta t} = l\Delta f, \quad l \in \mathbb{Z} \wedge l \neq mN_f. \tag{27}$$

We can conclude that $\Psi$ has minimum values spaced $N_f\Delta f$ Hz apart, with $(N_f - 1)$ maximum-magnitude values in between, separated by $\Delta f$ Hz.
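The zeros described by (25) come from the inner sum over $m$ in (23), which is a finite geometric series. A minimal sketch, assuming $N_f = 4$ and using the illustrative name `inner_sum_mag`, evaluates its magnitude as a function of $\varphi = (f_o - f)N_C\Delta t + (k - k_{\mathrm{ini}})$:

```python
import cmath
import math

def inner_sum_mag(phi, n_f):
    """Magnitude of sum_{m=0}^{N_f-1} exp(j*2*pi*m*phi/N_f), the m-sum in (23)."""
    return abs(sum(cmath.exp(2j * math.pi * m * phi / n_f) for m in range(n_f)))

N_F = 4  # pilot separation used for this illustration
for phi in (0, 0.5, 1, 2, 3, 4):
    print(f"phi = {phi}: |sum| = {inner_sum_mag(phi, N_F):.4f}")
```

The sum vanishes for integer $\varphi$ that is not a multiple of $N_f$ and reaches $N_f$ when $\varphi$ is a multiple of $N_f$. Since $(k - k_{\mathrm{ini}})$ is never a multiple of $N_f$ on data subcarriers, every data term is nulled exactly when $(f_o - f)N_C\Delta t$ is a multiple of $N_f$, as stated in (26).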

Figure 3: The cost function $J(w)$. (Three panels, (a), (b), and (c), each plotting $J(w)$ versus the remaining CFO in Hz, at different zoom levels.)

Let us define the column $(N_t - n_S L_p)$-vector $\tilde{j}$ that collects the samples of $\tilde{g}$ with no CIR energy (only data-dependent and noise components; example depicted in Figure 2) and the cost function $J(w)$ as the energy in $\tilde{j}$:

$$J(w) = \tilde{j}^H \tilde{j}. \tag{28}$$

The definition of the cost function guarantees that, if the CFO is within the acquisition range, the location of its minimum converges to the true value as the number of elements in $\tilde{j}$ increases (and the noise term tends to a floor in $J(w)$). The elements in $\tilde{j}$ may be obtained from one OFDM symbol or from a set of symbols (with data and pilots) if higher accuracy of the estimate is required. From the previous analysis of the factor $\Psi$, it is clear that the acquisition range of our cost function is $]-N_f\Delta f/2,\, N_f\Delta f/2[$. The CFO estimate can be found by a line search within the acquisition range to find the minimum value of the cost function:

$$\hat{w}_o = \arg\min_{w} J(w), \tag{29}$$

where $\hat{w}_o$ is the estimated CFO value. The exhaustive line search is computationally demanding, depending on the search’s granularity. Hence, there is a tradeoff between complexity and the estimate’s variance.
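The folding operation in (18) and the cost function in (28) can be tried out on a small noise-free example. The sketch below is a toy under stated assumptions, not the authors' simulator: the dimensions, the random QPSK data, the random 3-tap channels, and the helper names are all hypothetical, the common phase $\theta_{\mathrm{ini}}$ is omitted because it does not affect energies, and the channel is applied as a circular convolution, which is the post-CP-removal model of (10).

```python
import cmath
import math
import random

N_C, N_F, N_S, L_P, K_INI = 64, 4, 2, 3, 0   # hypothetical toy dimensions
N_T = N_C // N_F                              # pilots per antenna

def idft(x_fd):
    """Unitary inverse DFT (O(N^2); fine for a toy size)."""
    n = len(x_fd)
    return [sum(x_fd[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n))
            / math.sqrt(n) for t in range(n)]

def make_received(w_o, rng):
    """Noise-free TD symbol after CP removal: CFO ramp times the sum of
    circular convolutions of each antenna's symbol with its CIR."""
    r = [0j] * N_C
    for s in range(N_S):
        # Random QPSK on every subcarrier, then overwrite the pilot positions.
        fd = [cmath.exp(1j * (math.pi / 4 + math.pi / 2 * rng.randrange(4)))
              for _ in range(N_C)]
        for m in range(N_T):
            fd[K_INI + m * N_F] = cmath.exp(-2j * math.pi * s * m / N_S)
        td = idft(fd)
        cir = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(L_P)]
        for n in range(N_C):
            r[n] += sum(cir[l] * td[(n - l) % N_C] for l in range(L_P))
    return [v * cmath.exp(1j * w_o * n) for n, v in enumerate(r)]

def cost(r, w):
    """J(w): energy of the folded vector g outside the windows holding CIR energy."""
    g = [0j] * N_T
    for n, v in enumerate(r):
        derot = cmath.exp(-1j * w * n) * cmath.exp(-2j * math.pi * K_INI * n / N_C)
        g[n % N_T] += v * derot               # T = [I ... I] folds the N_F segments
    cir_samples = {s * N_T // N_S + l for s in range(N_S) for l in range(L_P)}
    return sum(abs(g[n]) ** 2 for n in range(N_T) if n not in cir_samples)

rng = random.Random(1)
w_true = 0.3 * 2 * math.pi / N_C              # CFO of 0.3 subcarrier spacings
r = make_received(w_true, rng)
j_true = cost(r, w_true)
j_off = cost(r, w_true + math.pi / N_C)       # candidate half a spacing away
print("J at the true CFO       :", j_true)
print("J half a spacing away   :", j_off)
```

Without noise, the cost at the true offset is zero up to rounding, because the folding keeps only the pilot subcarriers and their energy falls inside the CIR windows. At a wrong candidate the data component leaks into the remaining samples and the cost becomes clearly positive. With noise, the minimum would sit on a noise floor rather than at zero.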

The cost function has a closed-form expression, and its behavior is perfectly described. In the acquisition range, there are $N_f$ maximum values; in the interval limited by the maximum values that surround the perfect estimate, $J(w)$ presents a smooth shape with a single minimum. Using this knowledge of the cost function, we propose a 2-step approach to find its minimum value. The initial step performs a coarse line search to locate the interval containing the global minimum. Testing $N_f$ candidate CFO values evenly spaced by $\Delta f$ Hz should suffice. The selected candidate CFO is the one with the lowest cost. If the number of elements in $\tilde{j}$ is small and the SNR is very low, the probability of wrong identification may not be negligible, and the number of candidate CFO values can be increased, thus decreasing the probability of wrong identification. In the final step, we use the gradient descent method [26] to find the global minimum.

Figure 4: Estimated CFO standard deviation versus the number of samples in $J(w)$. (Plot of remaining CFO in Hz versus Eb/N0 in dB; an arrow marks the increasing number of samples.)
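The 2-step search can be sketched as follows. Note the assumptions: `toy_cost` is a hypothetical stand-in for $J(w)$ (a squared Dirichlet-kernel shape with a zero at the true offset and period $N_f$), offsets are expressed in units of the subcarrier spacing $\Delta f$, and the step size, iteration count, and numerical derivative are illustrative choices rather than values from the paper.

```python
import math

N_F = 4  # pilot separation; offsets below are in units of the subcarrier spacing

def toy_cost(x, x_true):
    """Hypothetical stand-in for J: zero at x_true, periodic with period N_F."""
    d = x - x_true
    den = N_F * math.sin(math.pi * d / N_F)
    # Limit value of the kernel magnitude when d is a multiple of N_F.
    kernel = 1.0 if abs(den) < 1e-12 else math.sin(math.pi * d) / den
    return 1.0 - kernel ** 2

def two_step_search(cost, step=0.1, iters=200, h=1e-6):
    """Coarse grid search followed by gradient descent on a 1-D cost."""
    # Step 1: N_F candidates spaced one subcarrier apart inside the acquisition range.
    candidates = [-N_F / 2 + 0.5 + i for i in range(N_F)]
    x = min(candidates, key=cost)
    # Step 2: gradient descent using a central-difference derivative.
    for _ in range(iters):
        grad = (cost(x + h) - cost(x - h)) / (2 * h)
        x -= step * grad
    return x

x_true = 0.3
x_hat = two_step_search(lambda x: toy_cost(x, x_true))
print(f"estimated offset: {x_hat:.6f} subcarrier spacings")
```

The coarse step only has to land inside the smooth basin around the true offset; the descent step then refines the estimate without the cost of a fine exhaustive search. In a receiver, the toy cost would be replaced by the measured $J(w)$ and the step size would need tuning to the actual cost scale.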

Figure 3 depicts an example of the cost function for a 2 × 1 Alamouti OFDM system with $N_C = 1024$ subcarriers, sampling interval $\Delta t = 10$ nanoseconds, and pilot separation $N_f = 4$. The values in the plot were acquired using an SNR of 20 dB. In Figure 3(a), the separation of ≈390 kHz between consecutive minimum values is visible. Figure 3(b) shows in detail the interval around $(w_o - w) = 0$. It is clear that it has a unique global minimum that is easy to find (no problem with local minimum values). In Figure 3(c), the $N_f - 1$ maximum values between consecutive minimum values are clearly visible. It also shows in detail that the separation of the maximum values around $(w_o - w) = 0$ is ≈195 kHz.
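The two frequency separations quoted for Figure 3 follow from the system parameters through $\Delta f = 1/(N_C\Delta t)$. A short check, using only the values given in the text:

```python
N_C = 1024          # subcarriers
DELTA_T = 10e-9     # sampling interval: 10 ns
N_F = 4             # pilot separation

delta_f = 1.0 / (N_C * DELTA_T)   # subcarrier separation in Hz
print(f"subcarrier separation          : {delta_f / 1e3:.1f} kHz")
print(f"spacing between minima (Nf*df) : {N_F * delta_f / 1e3:.1f} kHz")
print(f"two subcarrier spacings (2*df) : {2 * delta_f / 1e3:.1f} kHz")
```

The spacing $N_f\Delta f$ reproduces the ≈390 kHz between minima. The ≈195 kHz figure matches $2\Delta f$, which is consistent with reading it as the distance between the two maxima located at $\pm\Delta f$ on either side of the global minimum.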

Figure 4 shows the evolution of the estimated CFO standard deviation with the number of samples used in the estimation algorithm (elements of $\tilde{j}$). The plots depict the standard deviation when the number of samples goes from 200 to 2000, in steps of 200 samples.

3.3. Channel Estimation. Assuming that the CFO is completely eliminated, the output of the initial operation of the CFO algorithm is made up of the pilot-dependent component and noise, $\tilde{g} = \tilde{g}_p + \tilde{v}$. The data-dependent component is eliminated from this vector, making it easy to obtain an initial CIR estimate.

The channel estimation algorithm starts by isolating each of the $n_S$ phase-shifted CIRs from $\tilde{g}$ and removing the modulating exponential factor. The elements of the resulting vectors $\hat{\tilde{h}}_{s,LS}$ can be expressed by

$$
\begin{aligned}
\hat{\tilde{h}}_{s,LS}[n] &= \exp\left[j\frac{2\pi}{N_C}k_i n\right] \sum_{m=0}^{N_t-1} \delta\left[n + \frac{s}{n_S}N_t - m\right] \tilde{g}[n] \\
&= \theta_i N_C^{1/2} N_f^{-1}\, \tilde{h}_s[n] + \exp\left[j\frac{2\pi}{N_C}k_i n\right] \tilde{v}_i\left[n + \frac{s}{n_S}N_t\right].
\end{aligned}
\tag{30}
$$

Figure 5: Remaining CFO. (Plot of remaining CFO in Hz versus Eb/N0 in dB, showing the mean error and the standard deviation (STD) for our method and for the method of [16].)

In [24], it was demonstrated that for a single transmittingantenna OFDM system with perfect synchronization, (30) isthe TD counterpart of FD LS estimate. By using phase-shiftedpilot sequences that allow the separation of the differentCIRs, the same result holds in the present model.

This initial estimate can be significantly improved by incorporating a TD linear minimum MSE (LMMSE) filter Ws to reduce the estimate's error, taking advantage of the CIR energy concentration. The improvements provided by this filter are especially significant for low values of SNR.

The LMMSE filter can be expressed by [27]

$$
\mathbf{W}_s=\operatorname{diag}\!\left(\frac{\sigma_{h_s}^2[0]}{\sigma_{h_s}^2[0]+N_t^{-1}\sigma_n^2},\ldots,\frac{\sigma_{h_s}^2[L_p-1]}{\sigma_{h_s}^2[L_p-1]+N_t^{-1}\sigma_n^2},0,\ldots,0\right).
\tag{31}
$$

The resulting CIR and CFR estimates are, respectively, hs = Ws hs,LS and hs = F hs.
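As an illustration of how the diagonal filter in (31) acts on the LS estimate, the following minimal sketch builds the weights and applies them tap by tap. It assumes that the tap powers, the noise variance, and the factor Nt of (31) are already known; the function and argument names are hypothetical and not taken from the paper.

```python
def lmmse_weights(tap_powers, noise_var, n_t, n_total):
    """Diagonal of W_s in (31): one weight per CIR tap, zeros elsewhere.

    tap_powers holds the first L_p tap variances; n_total is the length
    of the LS estimate vector (assumption: n_total >= L_p).
    """
    weights = [p / (p + noise_var / n_t) for p in tap_powers]
    return weights + [0.0] * (n_total - len(tap_powers))


def apply_diagonal_filter(weights, h_ls):
    """Element-wise product, equivalent to W_s @ h_ls for a diagonal W_s."""
    return [w * h for w, h in zip(weights, h_ls)]


# Toy example: two significant taps out of four, noise variance 0.5, N_t = 2.
w = lmmse_weights([1.0, 0.25], noise_var=0.5, n_t=2, n_total=4)
h_filtered = apply_diagonal_filter(w, [1.0 + 1.0j, 0.4j, 0.1, -0.1j])
print(w, h_filtered)
```

Taps with little energy relative to the noise are attenuated, and samples beyond the Lp-th tap are set to zero, which is where the gain at low SNR comes from.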


A simulation scenario was implemented using an Alamouti 2 × 1 OFDM system with Nc = 1024 modulated subcarriers, sampling interval Δt = 50 nanoseconds, and a CP with 100 samples. The transmitted OFDM symbols carried pilots and data, with a pilot separation Nf = 8. The OFDM frame consists of 16 symbols. The CFO value was randomly generated in each frame with a value inside the acquisition range ]−NfΔf/2, NfΔf/2[. The CFO estimation and removal was performed on a frame basis. Two channel models with exponentially decaying power delay profile (PDP) were used to simulate indoor (50 nanoseconds rms delay spread) and outdoor (250 nanoseconds rms delay spread) environments. To validate the proposed method, several simulations were performed using Eb/N0 values in the range of 0 dB to 20 dB.

Figure 5 shows the remaining CFO at the output of the CFO mitigate block when using the indoor channel model. The dashed lines represent the average remaining CFO. The solid lines represent the standard deviation of the CFO estimate. In our method, the gradient descent was stopped for a step of 10 Hz. For the method in [16], one null subcarrier was added to each OFDM symbol, and an exhaustive search was performed with a 10 Hz step. The results show that our method is unbiased over the whole range of Eb/N0 values. The algorithm generates estimates with small deviation from the true value using a limited number of symbols (16). The estimate deviation is ∼1% of that of [16] for low values of SNR. The method in [16] requires a large number of symbols and/or more receive antennas to generate accurate estimates. This result shows that the investigated method is well suited to burst systems.

Figure 6: Joint channel and CFO estimation MSE (dB) versus Eb/N0 (dB) for QPSK, 16-QAM, and 64-QAM, with channel estimation only and with channel and CFO estimation. (a) Indoor. (b) Outdoor.

Figure 7: System BER performance versus Eb/N0 (dB) for QPSK, 16-QAM, and 64-QAM, with perfect CSI and no CFO, with channel estimation, and with channel and CFO estimation. (a) Indoor. (b) Outdoor.

Figure 6 depicts the MSE of the channel estimation algorithm (blue plots) when the system has perfect synchronization and the MSE of the joint CFO and channel estimation process (red plots), when using QPSK, 16-QAM, and 64-QAM modulations. Figure 6(a) presents the results for the indoor channel and Figure 6(b) for the outdoor channel.

Figure 7 depicts the system BER for 3 scenarios: the ideal situation where the receiver has perfect channel state information (CSI) (with no pilot overhead) and the CFO is absent (green plots), the situation when the receiver must estimate the channel from the received samples (blue plots), and the more realistic scenario where the receiver needs to estimate both the CFO and the channel (red plots). Simulation results were also obtained for QPSK, 16-QAM, and 64-QAM modulations, as identified in the figures. Figure 7(a) presents the results for the indoor channel and Figure 7(b) for the outdoor channel.

The channel estimation MSE improvement that can be observed for higher-order modulations is due to the fact that the ratio between the powers in the pilot symbols and data symbols is kept constant in all simulations. The large increase of delay spread between both channels is the origin of the ∼3 dB MSE degradation when moving from the indoor channel to the outdoor channel plots. This acceptable degradation shows the ability of the estimator to deal with the increasing channel delay spread by always weighing the energy of channel taps versus noise variance. The channel estimation BER plots present a degradation of ∼1,2 dB that can be largely attributed to the 12.5% pilot overhead.

The joint CFO and channel estimation MSE is an effective measure of the degradation caused by both algorithms. In these plots, the estimated channel was compared against the true channel affected by the same CFO that distorted the received signal (according to (10)). The results plotted in Figures 6 and 7 show that the performance degradation of the joint process is marginal when compared with channel estimation only, substantiating the performance of the proposed algorithms.

5. Conclusions

We have investigated a CFO estimation algorithm and an associated channel estimation block for OFDM with transmitter diversity that exploits the TD structure of transmitted symbols carrying pilots and data. The CFO algorithm relies solely on the data component present in the symbols to estimate the CFO, avoiding additional overhead such as training symbols or null subcarriers. Simulation results show that the residual CFO has a minimal impact on the system's performance, confirming that the CFO estimates have minimal deviation from the true value. The definition and shape of the cost function lead to a very low-complexity scheme. An intermediate output of the CFO algorithm provides an easily obtained initial CIR estimate, minimizing the overall complexity. By incorporating a TD LMMSE filter, the initial CIR estimate is significantly improved. Simulation results of the joint algorithms confirm a reduced degradation of the system's performance when compared with the ideal scenario.

Acknowledgments

The authors wish to thank Fundacao para a Ciencia e a Tecnologia that partially supported this work through the project “PHOTON—Distributed and Extendible Heterogeneous Radio Architectures using Fibre Optic Networks” (PTDC/EEA-TEL/72890/2006).

References

[1] R. van Nee and R. Prasad, OFDM for Wireless Multimedia Communications, Artech House, London, UK, 1st edition, 2000.

[2] G. L. Stuber, J. R. Barry, S. W. McLaughlin, Y. Li, M. A. Ingram, and T. G. Pratt, “Broadband MIMO-OFDM wireless communications,” Proceedings of the IEEE, vol. 92, no. 2, pp. 271–294, 2004.

[3] H. Sampath, S. Talwar, J. Tellado, V. Erceg, and A. Paulraj, “A fourth-generation MIMO-OFDM broadband wireless system: design, performance, and field trial results,” IEEE Communications Magazine, vol. 40, no. 9, pp. 143–149, 2002.

[4] A. J. Paulraj, D. A. Gore, R. U. Nabar, and H. Bolcskei, “An overview of MIMO communications—a key to gigabit wireless,” Proceedings of the IEEE, vol. 92, no. 2, pp. 198–218, 2004.

[5] IEEE Std 802.11, “Wireless LAN medium access control (MAC) and physical layer (PHY) specifications: high-speed physical layer in the 5 GHz band,” 1999.

[6] I. Koffman and V. Roman, “Broadband wireless access solutions based on OFDM access in IEEE 802.16,” IEEE Communications Magazine, vol. 40, no. 4, pp. 96–103, 2002.

[7] K. Sathananthan and C. Tellambura, “Probability of error calculation of OFDM systems with frequency offset,” IEEE Transactions on Communications, vol. 49, no. 11, pp. 1884–1888, 2001.

[8] T. Pollet, M. Van Bladel, and M. Moeneclaey, “BER sensitivity of OFDM systems to carrier frequency offset and Wiener phase noise,” IEEE Transactions on Communications, vol. 43, no. 2–4, pp. 191–193, 1995.

[9] L. Rugini and P. Banelli, “BER of OFDM systems impaired by carrier frequency offset in multipath fading channels,” IEEE Transactions on Wireless Communications, vol. 4, no. 5, pp. 2279–2288, 2005.

[10] J.-J. van de Beek, M. Sandell, and P. O. Borjesson, “ML estimation of time and frequency offset in OFDM systems,” IEEE Transactions on Signal Processing, vol. 45, no. 7, pp. 1800–1805, 1997.

[11] H. Bolcskei, “Blind high-resolution uplink synchronization of OFDM-based multiple access schemes,” in Proceedings of the 2nd IEEE Workshop on Signal Processing Advances in Wireless Communications (SPAWC ’99), pp. 166–169, Annapolis, Md, USA, May 1999.

[12] N. Lashkarian and S. Kiaei, “Class of cyclic-based estimators for frequency-offset estimation of OFDM systems,” IEEE Transactions on Communications, vol. 48, no. 12, pp. 2139–2149, 2000.

[14] M. Morelli and U. Mengali, “Improved frequency offset estimator for OFDM applications,” IEEE Communications Letters, vol. 3, no. 3, pp. 75–77, 1999.

[15] H. Liu and U. Tureli, “A high-efficiency carrier estimator for OFDM communications,” IEEE Communications Letters, vol. 2, no. 4, pp. 104–106, 1998.

[17] Y. Li, N. Seshadri, and S. Ariyavisitakul, “Channel estimation for OFDM systems with transmitter diversity in mobile wireless channels,” IEEE Journal on Selected Areas in Communications, vol. 17, no. 3, pp. 461–471, 1999.

[18] Y. Li, “Simplified channel estimation for OFDM systems with multiple transmit antennas,” IEEE Transactions on Wireless Communications, vol. 1, no. 1, pp. 67–75, 2002.

[19] M. Shin, H. Lee, and C. Lee, “Enhanced channel-estimation technique for MIMO-OFDM systems,” IEEE Transactions on Vehicular Technology, vol. 53, no. 1, pp. 261–265, 2004.

[20] H. Zhang, Y. Li, A. Reid, and J. Terry, “Channel estimation for MIMO OFDM in correlated fading channels,” in Proceedings of IEEE International Conference on Communications (ICC ’05), vol. 4, pp. 2626–2630, Seoul, Korea, May 2005.

[21] H. Zamiri-Jafarian and S. Pasupathy, “Robust and improved channel estimation algorithm for MIMO-OFDM systems,” IEEE Transactions on Wireless Communications, vol. 6, no. 6, pp. 2106–2113, 2007.

[22] I. Barhumi, G. Leus, and M. Moonen, “Optimal training design for MIMO OFDM systems in mobile wireless channels,” IEEE Transactions on Signal Processing, vol. 51, no. 6, pp. 1615–1624, 2003.

[23] H. Minn and N. Al-Dhahir, “Optimal training signals for MIMO OFDM channel estimation,” in Proceedings of IEEE Global Telecommunications Conference (GLOBECOM ’04), vol. 1, pp. 219–224, Dallas, Tex, USA, November-December 2004.

[24] C. Ribeiro and A. Gameiro, “Direct time-domain channel impulse response estimation for OFDM-based systems,” in Proceedings of the 66th IEEE Vehicular Technology Conference (VTC ’07), pp. 1082–1086, Baltimore, Md, USA, September-October 2007.

[25] M. Morelli, A. N. D’Andrea, and U. Mengali, “Frequency ambiguity resolution in OFDM systems,” IEEE Communications Letters, vol. 4, no. 4, pp. 134–136, 2000.

[26] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, Cambridge, UK, 2004.

[27] J.-J. van de Beek, O. Edfors, M. Sandell, S. K. Wilson, and P. O. Borjesson, “On channel estimation in OFDM systems,” in Proceedings of the 45th IEEE Vehicular Technology Conference (VTC ’95), vol. 2, pp. 815–819, Chicago, Ill, USA, July 1995.


Research Article

Turbo Processing for Joint Channel Estimation, Synchronization, and Decoding in Coded MIMO-OFDM Systems

Hung Nguyen-Le,1 Tho Le-Ngoc,1 and Chi Chung Ko2

1 Department of Electrical and Computer Engineering, Faculty of Engineering, McGill University, Montreal, QC, Canada H3A 2K6
2 Department of Electrical and Computer Engineering, National University of Singapore, Singapore 117576

Correspondence should be addressed to Tho Le-Ngoc, [email protected]

Received 2 July 2008; Revised 11 November 2008; Accepted 25 December 2008


This paper proposes a turbo joint channel estimation, synchronization, and decoding scheme for coded multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) systems. The effects of carrier frequency offset (CFO), sampling frequency offset (SFO), and channel impulse responses (CIRs) on the received samples are analyzed and explored to develop the turbo decoding process and vector recursive least squares (RLS) algorithm for joint CIR, CFO, and SFO tracking. For burst transmission, with initial estimates derived from the preamble, the proposed scheme can operate without the need of pilot tones during the data segment. Simulation results show that the proposed turbo joint channel estimation, synchronization, and decoding scheme offers fast convergence and low mean squared error (MSE) performance over quasistatic Rayleigh multipath fading channels. The proposed scheme can be used in a coded MIMO-OFDM transceiver in the presence of multipath fading, carrier frequency offset, and sampling frequency offset to provide a bit error rate (BER) performance comparable to that in an ideal case of perfect synchronization and channel estimation over a wide range of SFO values.

Copyright © 2009 Hung Nguyen-Le et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

Coded multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) has been intensively explored for broadband communications over multipath-rich, time-invariant frequency-selective channels [1]. Turbo processing has been considered for coded MIMO and MIMO-OFDM systems for performance enhancement [2–5]. In particular, iterative detection and decoding issues in MIMO systems to achieve near-Shannon capacity limit [2] and performance gain [5] were investigated under the assumption of perfect channel estimation and synchronization. Taking into account the effects of imperfect channel knowledge on the system performance, [4] developed a combined iterative detection/decoding and channel estimation scheme to improve the overall performance of MIMO-OFDM systems with perfect synchronization.

Under imperfect synchronization conditions, multicarrier transmissions such as OFDM and MIMO-OFDM are highly susceptible to synchronization errors such as carrier frequency offset (CFO) and sampling frequency offset (SFO) [6–11], especially for operation in low signal-to-noise ratio (SNR) regimes in the case of high-performance coded systems. Therefore, estimation of frequency offsets (CFO and SFO) and channel impulse responses (CIRs) is of crucial importance in (coded) MIMO-OFDM systems using coherent detection. So far, most studies on the issue have focused on separate and sequential CFO/SFO and channel estimation [7, 11–14]. More specifically, channel estimation is performed by assuming that perfect synchronization has been established [12–14], even though channel estimation would be degraded by imperfect synchronization and vice versa. In most practical systems (e.g., WiFi, WiMAX), data is transmitted in bursts, and each burst is prepended with a preamble that contains known training sequences to facilitate the initial synchronization and channel estimation. However, the insufficient accuracy of the initially estimated CFO, SFO, and channel responses, as well as their time variation, still requires known pilot tones inserted in the data segment of the burst to update and enhance the CFO, SFO, and channel estimation accuracy in order to maintain high system performance, at the cost of reduced transmission/bandwidth efficiency (due to the inserted pilot tones). For example, in IEEE 802.11 [15], 4 pilot tones are inserted in every block of 48 data tones, representing an overhead of 8.33%.

Since synchronization and channel estimation are mutually related, joint channel estimation and synchronization would provide better performance [10]. Recently, a few algorithms [8, 16–19] have been proposed for the estimation of CIRs and CFO in uncoded MIMO-OFDM systems, but these algorithms have neglected the SFO effect. However, the detrimental effect of the SFO (even a very small SFO) will likely lead to a significant degradation of the OFDM receiver performance even when perfect CIR and CFO knowledge is available [20]. Specifically, the SFO induces a sampling delay that drifts linearly in time over an OFDM symbol [21]. Without any SFO compensation, this delay hampers the OFDM receiver as soon as the product of the relative SFO and the number of subcarriers becomes comparable to one [9]. Consequently, OFDM receivers become more vulnerable to the SFO effect as the FFT size increases. For instance, an SFO of 40 ppm can cause a window shift of up to six samples [21] in a burst of 1000 OFDM symbols used in multiband OFDM systems [22]. As another example, in the presence of a sampling clock offset of 1 ppm in the DVB-T 2K mode [23], the FFT window will move one sample around every 400 symbols [10].
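The two drift figures quoted above follow from the linear accumulation of the sampling delay. The sketch below reproduces them under assumptions that are not stated in the text: a multiband OFDM symbol of 165 samples, and a DVB-T 2K symbol of 2048 samples plus a 1/4 guard interval (2560 samples in total). The function names are illustrative.

```python
def sfo_drift_samples(sfo_ppm, samples_per_symbol, n_symbols):
    """Accumulated timing drift (in samples) after n_symbols OFDM symbols."""
    return sfo_ppm * 1e-6 * samples_per_symbol * n_symbols


def symbols_per_one_sample_slip(sfo_ppm, samples_per_symbol):
    """Number of OFDM symbols after which the FFT window slips by one sample."""
    return 1.0 / (sfo_ppm * 1e-6 * samples_per_symbol)


# Assumed symbol lengths (hypothetical values chosen for illustration):
MB_OFDM_SAMPLES_PER_SYMBOL = 165        # 128-point FFT plus padding and guard
DVBT_2K_SAMPLES_PER_SYMBOL = 2048 + 512  # 2K FFT plus a 1/4 guard interval

print(sfo_drift_samples(40, MB_OFDM_SAMPLES_PER_SYMBOL, 1000))
print(symbols_per_one_sample_slip(1, DVBT_2K_SAMPLES_PER_SYMBOL))
```

With these assumed symbol lengths, 40 ppm over 1000 symbols gives a drift of about 6.6 samples, and 1 ppm gives one sample of slip roughly every 390 symbols, in line with the orders of magnitude cited from [21] and [10].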

Various SFO, CFO, and channel estimation schemes have been investigated. In [24], a correlation-based SFO estimation scheme for MIMO-OFDM systems in the absence of CFO was proposed. Under the assumption of perfect channel estimation, decision-directed (DD) techniques were proposed for joint CFO/SFO estimation and tracking [21] and for phase noise and residual frequency offset compensation [25] in OFDM systems. Unlike [21, 25], under the assumption of perfect channel estimation, maximum likelihood (ML)-based joint CFO and channel estimation schemes using pilot signals in multiuser MIMO-OFDM systems were considered in [18, 19]. An overview of CFO/SFO estimation and compensation schemes using pre-FFT nondata-aided (NDA) acquisition, post-FFT data-aided (DA) acquisition, and post-FFT DA tracking can be found in [6, 26]. However, existing joint channel estimation and synchronization algorithms for coded MIMO-OFDM systems have omitted the SFO in their investigations regardless of its detrimental effect [9, 10, 20, 21, 24].

In this paper, we propose a joint synchronization, channel estimation, and decoding turbo processing scheme for coded MIMO-OFDM systems in the presence of quasistatic multipath channels, CFO, and SFO. By analyzing the nonlinear interrelation between CFO, SFO, channel responses, and received subcarriers, we develop an iterative vector recursive least-squares (RLS)-based joint CIR, CFO, and SFO tracking scheme that can be incorporated in the turbo processing between the MIMO-demapper and the soft-input soft-output (SISO) decoder for the coded MIMO-OFDM receiver. Conceptually, more accurate estimates of CFO, SFO, and CIR can be obtained by using more reliably detected data, and they also help to enhance the MIMO-demapper output reliability, which will improve the performance of the SISO decoder in the next iteration of the turbo process. Furthermore, the use of soft estimates alleviates the detrimental effect of error propagation that usually occurs when hard decisions are used in a feedback tracking loop or in decision-directed modes. As a result, better accuracy in CFO/SFO/CIR estimation and tracking can be achieved without the need of overhead pilot tones, that is, removing a significant transmission efficiency loss and enhancing the spectral efficiency. As initial values of the CFO, SFO, and CIR play an important role in the convergence of the joint synchronization, channel estimation, and decoding turbo processing, we also develop a coarse CFO, SFO, and CIR estimation scheme (that was not studied in [27]) applied to the preamble of the burst and based on the combined CFO-SFO perturbation in order to provide accurately estimated initial values of the CFO, SFO, and CIR.

The rest of the paper is organized as follows. Section 2 describes the coded MIMO-OFDM signal model. Section 3 analyzes the effects of CFO, SFO, and channel responses on the received samples. These interrelations are further explored to develop the turbo joint channel estimation, synchronization, and decoding scheme in Section 4, and the vector RLS-based joint CIR, CFO, and SFO tracking algorithm is delineated in Section 5. Section 6 presents the coarse estimation of the CFO, SFO, and CIR. Simulation results for various scenarios are discussed in Section 7. Finally, Section 8 summarizes the paper.

2. System Model

Figure 1 shows a simplified block diagram of a convolutional-coded MIMO-OFDM transmitter using Nt transmit antennas and M-ary quadrature amplitude modulation (M-QAM). This transmitter architecture is similar to the space-time (ST) bit-interleaved coded modulation (BICM) in [28]. Using a serial-to-parallel (S/P) converter, the input convolutional-encoded bitstream is first split into Nt parallel sequences. Each sequence is further bit-interleaved and then organized as a sequence of Q-bit tuples, $\{\mathbf{d}^u_{m,k}\}$, where $Q = \log_2 M$, $u = 1, \ldots, N_t$, and each Q-bit tuple, $\mathbf{d}^u_{m,k} = [d^u_{m,k,0} \cdots d^u_{m,k,Q-1}]^T$, is mapped to a complex-valued symbol, $X_{u,m}(k) \in A$. A is the M-ary modulation signaling set, and u, m, and k denote the indices of the transmit antenna, OFDM symbol, and subcarrier, respectively. (Notation: Upper and lower case bold symbols are used to denote matrices and column vectors, respectively. (·)T denotes transpose. (·)H denotes Hermitian transpose. (·)∗ stands for conjugation. E{·} is the expectation operator. Re{·} and Im{·} denote real and imaginary parts, respectively. IN is the N × N identity matrix, ⊗ denotes the Kronecker product, and P(·) is the probability operator.)

Figure 1: Coded MIMO-OFDM transmitter.

Each OFDM symbol consists of K < N information-bearing subcarriers, where N is the size of the fast Fourier transform (FFT) or inverse FFT (IFFT). After IFFT, cyclic-prefix (CP) insertion, and digital-to-analog conversion (DAC), the transmitted baseband signal at the uth transmit antenna can be written as

$$
s_u(t)=\frac{1}{N}\sum_{m=-\infty}^{+\infty}\sum_{k=-K/2}^{K/2-1}X_{u,m}(k)\,e^{j(2\pi k/NT)(t-T_g-mT_s)}\,U(t-mT_s),
\tag{1}
$$

where T is the sampling period at the output of the IFFT, Ng denotes the number of CP samples, Tg = NgT, Ts = (N + Ng)T is the OFDM symbol length after CP insertion, u(t) is the unit step function, and U(t) = u(t) − u(t − Ts). Practically, the colocated DACs are driven by a common sampling clock with frequency 1/T.
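A discrete-time view of (1) can make the roles of N, Ng, and the subcarrier index k concrete. The following minimal sketch, written with the Python standard library only, samples one OFDM symbol (m = 0) at t = nT and prepends the cyclic prefix; the function name and the dictionary-based input format are assumptions made for this illustration.

```python
import cmath


def ofdm_symbol_with_cp(symbols, n_fft, n_cp):
    """Samples of one OFDM symbol of (1), taken at t = n*T, CP included.

    symbols maps a subcarrier index k in [-K/2, K/2 - 1] to X(k).
    """
    body = []
    for n in range(n_fft):
        # After the CP, t - T_g = n*T, so the exponent in (1) is 2*pi*k*n/N.
        acc = sum(x * cmath.exp(2j * cmath.pi * k * n / n_fft)
                  for k, x in symbols.items())
        body.append(acc / n_fft)
    # The cyclic prefix repeats the last n_cp samples of the symbol body.
    return body[n_fft - n_cp:] + body


# Three active subcarriers, N = 8, N_g = 2.
tx = ofdm_symbol_with_cp({-1: 1 + 1j, 0: 1 - 1j, 1: -1 + 1j}, n_fft=8, n_cp=2)
print(len(tx), tx[:2], tx[-2:])
```

The first Ng output samples coincide with the last Ng samples, which is the property the receiver relies on when it removes the CP.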

The multiple coded OFDM signals are transmitted over a frequency-selective, multipath fading channel. We assume fading conditions are unchanged within an OFDM burst interval, so that the quasistatic channel response between the uth transmit antenna and the vth receive antenna can be represented by

$$
h_{u,v}(\tau)=\sum_{l=0}^{L-1}h_{u,v,l}\,\delta(\tau-\tau_l),
\tag{2}
$$

where $h_{u,v,l}$ and $\tau_l$ are the complex gain and delay of the lth path, respectively. L is the total number of resolvable (effective) paths.

3. Effects of CFO, SFO, and Channel Responses on Received Samples

Frequency discrepancies between oscillators used in the radio transmitters and receivers, and channel-induced Doppler shifts, cause a net carrier frequency offset (CFO) of Δf in the received signal, where f is the operating radio carrier frequency. Practically, it is reasonable to assume that all pairs of transmit-receive antennas experience the same CFO [8], and the received signal at the vth receive antenna element can be written as

$$
r_v(t)=e^{j2\pi\Delta f t}\sum_{u=1}^{N_t}\sum_{l=0}^{L-1}h_{u,v,l}\,s_u(t-\tau_l)+w_v(t).
\tag{3}
$$

The impinging signals at all receive antennas are then sampled for analog-to-digital conversion (ADC) by the common receive clock at rate 1/T′. Since T′ ≠ T, the time alignment of received samples is also affected by the sampling frequency offset (SFO). After sampling and CP removal, the sample of the mth OFDM symbol of the received signal $r_v(t)$ at time instant $t_n = nT'$ is given by

$$
r_{v,m,n}=\frac{e^{j(2\pi/N)(N_m+n)\varepsilon_\eta}}{N}\sum_{k=-K/2}^{K/2-1}e^{j(2\pi k/N)n(1+\eta)}\,e^{j(2\pi k/N)\eta N_m}\sum_{u=1}^{N_t}X_{u,m}(k)H_{u,v}(k)+w_{v,m,n},
\tag{4}
$$

where $n = 0, 1, \ldots, N-1$ and $N_m = N_g + m(N + N_g)$. The complex-valued Gaussian noise sample, $w_{v,m,n}$, has zero mean and a variance of $\sigma^2$. $H_{u,v}(k) = \sum_{l=0}^{L-1} h_{u,v,l}\,e^{-j(2\pi k/N)l}$ is the channel frequency response (CFR) at the kth subcarrier for the pair of the uth transmit antenna and the vth receive antenna, and $\mathbf{h}_{u,v} = [h_{u,v,0}\ h_{u,v,1}\ \cdots\ h_{u,v,L-1}]^T$ is the corresponding effective time-domain channel impulse response (CIR). The SFO and CFO terms are represented in terms of the transmit sampling period T as $\eta = \Delta T/T$, $\Delta T = T' - T$, and $\varepsilon = \Delta f\,NT = (\Delta f/f)(NTf)$, respectively, and $\varepsilon_\eta = (1+\eta)\varepsilon$.
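The relation between the CIR taps and the CFR defined above is a plain discrete Fourier sum, which is why estimating the L taps is enough to recover H(k) on every subcarrier. A minimal sketch follows; the function name is hypothetical.

```python
import cmath


def channel_frequency_response(cir_taps, k, n_fft):
    """H(k) = sum_l h_l * exp(-j 2 pi k l / N) for one antenna pair."""
    return sum(h * cmath.exp(-2j * cmath.pi * k * l / n_fft)
               for l, h in enumerate(cir_taps))


# Two equal-gain taps with N = 4: a deep fade appears at subcarrier k = 2.
taps = [1.0, 1.0]
for k in range(-2, 2):
    print(k, channel_frequency_response(taps, k, 4))
```

Only L complex numbers describe the channel of each antenna pair here, whereas the CFR has one value per active subcarrier.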

As observed in (4), the CFO and SFO induce a time-domain phase rotation that will translate into intercarrier interference (ICI), attenuation, and phase rotation in the frequency domain, as shown in the following derivations.

After FFT, the received FD sample at the vth receive antenna is $Y_{v,m}(k) = \sum_{n=0}^{N-1} r_{v,m,n}\,e^{-j(2\pi/N)nk}$. Based on (4), we obtain

$$
Y_{v,m}(k)=\sum_{i=-K/2}^{K/2-1}e^{j(2\pi/N)N_m\varepsilon_i}\,\rho_{i,k}\sum_{u=1}^{N_t}X_{u,m}(i)H_{u,v}(i)+W_{v,m}(k),
\tag{5}
$$

where $\varepsilon_i = i\eta + \varepsilon_\eta$, $W_{v,m}(k) = \sum_{n=0}^{N-1} w_{v,m}(n+N_m)\,e^{-j(2\pi/N)nk}$, the ICI coefficient $\rho_{i,k} = (1/N)\sum_{n=0}^{N-1} e^{j(2\pi/N)n(\varepsilon_i+i-k)} \approx \operatorname{sinc}(\varepsilon_i+i-k)\,e^{j\pi(\varepsilon_i+i-k)}$, and $\operatorname{sinc}(x) = \sin(\pi x)/(\pi x)$. It is noted that the frequency-domain expression of the received samples in [6, Equation 37] corresponds to an approximation of (5) for the case of the single-input single-output configuration (Nt = 1, Nr = 1). In the first summation in (5), the term i = k corresponds to the subcarrier of interest, while the other terms with i ≠ k represent ICI. As can be observed from the above expression for $\rho_{i,k}$, the term $\varepsilon_i = i\eta + \varepsilon_\eta$ needs to be removed in order to suppress ICI. Obviously, in an ideal case with zero SFO and CFO, $\varepsilon_i = 0$, $\rho_{i,k} = 1$ for i = k, and $\rho_{i,k} = 0$ (i.e., no ICI) for i ≠ k. Therefore, $Y_{v,m}(k) = \sum_{u=1}^{N_t} X_{u,m}(k)H_{u,v}(k) + W_{v,m}(k)$, and perfect orthogonality among subcarriers is preserved at the receiver. In addition, the coefficient $\rho_{i,k} \approx \operatorname{sinc}(\varepsilon_i+i-k)\,e^{j\pi(\varepsilon_i+i-k)}$ quantifies the CFO-SFO-induced attenuation and phase rotation of received subcarriers. Thus, to mitigate ICI and attenuation, the effects of CFO and SFO on the received samples have to be compensated. Hence, the estimates of CFO and SFO are needed to compensate for the detrimental effects (phase rotation) of synchronization errors, while the channel estimates are required for the MIMO demapping as illustrated in Figure 2. More specifically, the CFO and SFO compensations will be performed in the time domain (before FFT implementation at the receiver) as described in the following derivations.
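The behaviour of the ICI coefficient can be checked numerically. The sketch below evaluates the exact sum that defines ρ and the sinc approximation quoted above, both as functions of the single argument offset = ε_i + i − k; the function names are illustrative and only the Python standard library is used.

```python
import cmath
import math


def ici_coefficient(offset, n_fft):
    """Exact rho = (1/N) * sum_n exp(j 2 pi n * offset / N)."""
    total = sum(cmath.exp(2j * cmath.pi * n * offset / n_fft)
                for n in range(n_fft))
    return total / n_fft


def ici_coefficient_approx(offset):
    """Approximation sinc(offset) * exp(j pi offset), with sinc(0) = 1."""
    if offset == 0:
        return 1 + 0j
    sinc = math.sin(math.pi * offset) / (math.pi * offset)
    return sinc * cmath.exp(1j * math.pi * offset)


# No offset: unit gain on the wanted subcarrier, no leakage from a neighbour.
print(abs(ici_coefficient(0.0, 64)), abs(ici_coefficient(1.0, 64)))
# Normalised offset of 0.1: attenuated wanted term, non-zero neighbour leakage.
print(abs(ici_coefficient(0.1, 64)), abs(ici_coefficient(1.1, 64)))
```

For a small residual offset the wanted term stays close to one while the leakage from adjacent subcarriers grows with the offset, which is the motivation for compensating before the FFT.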

Following the same approach as in [20], the received time-domain sample in (4) can be multiplied by $\exp[-j2\pi\varepsilon^c_\eta n/N]$ prior to FFT to mitigate ICI as shown in Figure 2, that is,

$$
r^c_{v,m,n}=r_{v,m,n}\,e^{-j(2\pi/N)n\varepsilon^c_\eta},
\tag{6}
$$

where $\varepsilon^c_\eta = (1+\eta^c)\varepsilon^c$, and $\varepsilon^c$ and $\eta^c$ are the estimates of CFO and SFO, respectively.
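The pre-FFT correction in (6) is a sample-by-sample derotation. The following minimal sketch applies it to a toy signal that carries only the common rotation $e^{j(2\pi/N)n\varepsilon_\eta}$; this is a simplifying assumption, since the subcarrier-dependent SFO term of (4) is not modelled and the function name is hypothetical.

```python
import cmath


def compensate_cfo_sfo(samples, cfo_est, sfo_est, n_fft):
    """Apply (6): multiply sample n by exp(-j 2 pi n (1 + eta_c) eps_c / N)."""
    eps_c_eta = (1.0 + sfo_est) * cfo_est
    return [r * cmath.exp(-2j * cmath.pi * n * eps_c_eta / n_fft)
            for n, r in enumerate(samples)]


# Toy check: a single tone on subcarrier 3, rotated by a known CFO/SFO pair.
N = 16
eps, eta = 0.2, 20e-6
tone = [cmath.exp(2j * cmath.pi * 3 * n / N) for n in range(N)]
rotated = [x * cmath.exp(2j * cmath.pi * n * (1 + eta) * eps / N)
           for n, x in enumerate(tone)]
restored = compensate_cfo_sfo(rotated, eps, eta, N)
print(max(abs(a - b) for a, b in zip(restored, tone)))
```

With exact estimates the common rotation is removed entirely in this toy case; with imperfect estimates a residual offset remains in the exponent of the compensated ICI coefficient.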

After FFT, the resulting subcarriers at the vth receive antenna are

$$
Y^c_{v,m}(k)=\sum_{n=0}^{N-1}r^c_{v,m,n}\,e^{-j(2\pi/N)nk}.
\tag{7}
$$

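After some manipulation, (7) can be rewritten as

$$
Y^c_{v,m}(k)=\sum_{i=-K/2}^{K/2-1}e^{j(2\pi/N)N_m\varepsilon_i}\,\rho^c_{i,k}\sum_{u=1}^{N_t}X_{u,m}(i)H_{u,v}(i)+W^c_{v,m}(k),
\tag{8}
$$

where

$$
\begin{aligned}
W^c_{v,m}(k)&=\sum_{n=0}^{N-1}w_{v,m}(n+N_m)\,e^{-j(2\pi/N)n(1+\eta^c)\varepsilon^c}\,e^{-j(2\pi/N)nk},\\
\rho^c_{i,k}&=\frac{1}{N}\sum_{n=0}^{N-1}e^{j(2\pi/N)n[i\eta+(1+\eta)\varepsilon-(1+\eta^c)\varepsilon^c+i-k]}.
\end{aligned}
\tag{9}
$$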

Based on (8), the vector representation of the frequency-domain (FD) received samples at all receive antennas can be expressed by

$$
\mathbf{Y}^c_m(k)=e^{j(2\pi/N)N_m\varepsilon_k}\,\rho^c_{k,k}\,\mathbf{H}(k)\mathbf{X}_m(k)+\mathbf{W}^c_m(k),
\tag{10}
$$

where the (u, v)th entry of $\mathbf{H}(k)$ is given by $[\mathbf{H}(k)]_{u,v} = H_{u,v}(k)$. Note that $\mathbf{W}^c_m(k)$ includes both AWGN and residual ICI parts, $\mathbf{X}_m(k) = [X_{1,m}(k) \cdots X_{N_t,m}(k)]^T$, and each of the complex elements in $\mathbf{W}^c_m(k)$ has a variance of $N_0$.

Equation (10) provides insight into the nonlinear interrelation between CFO, SFO, channel responses, and received subcarriers. It indicates that the estimation of CFO ($\varepsilon^c$), SFO ($\eta^c$), and channel responses requires knowledge of the subcarrier data $\mathbf{X}_m(k)$, while the decoding of the subcarrier data $\mathbf{X}_m(k)$ also needs the CFO, SFO, and channel responses in addition to the binary convolutional coding structure in $\mathbf{X}_m(k)$. This interrelation can be exploited to develop a high-performance turbo joint channel estimation, synchronization, and decoding scheme that can mutually enhance the estimation accuracy and decoding reliability in an iterative manner. To reduce the number of estimated parameters for the MIMO channel, it is desirable to estimate the channel impulse response $\{h_{u,v,0}, h_{u,v,1}, \ldots, h_{u,v,L-1}\}$ instead of the channel frequency response $H_{u,v}(k)$, as $H_{u,v}(k)$ can be derived from the channel impulse response by a simple Fourier transform. The CFO, SFO, and CIR estimation needs to deal with the nonlinear relation shown in (10) and will be discussed in Section 5. The development of the turbo processing will be addressed in Section 4.

4. Turbo Joint Channel Estimation, Synchronization, and Decoding

The binary convolutional coding structure in $\mathbf{X}_m(k)$ is used to develop the constituent soft-input soft-output (SISO) decoder (shown in Figure 2) to provide more reliable soft estimates of the coded bits, P(c; O), based on the extrinsic soft-bit information received from the MIMO-demapper, P(c; I), using the computations presented in [29]. P(c; O) are then split into Nt streams and interleaved to form Nt soft-bit estimate streams $P(d^u_{m,k,q}; I)$ that are used as extrinsic information for MIMO demapping and CIR, CFO, and SFO estimation as follows.

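The purpose of the MIMO-demapper is to compute the extrinsic soft bit information:

$$
P\big(d^u_{m,k,q}=b;O\big)=\frac{P\big(d^u_{m,k,q}=b\mid\mathbf{Y}^c_m(k),\mathbf{H}(k),\varepsilon,\eta\big)}{P\big(d^u_{m,k,q}=b;I\big)},
\tag{11}
$$

where b ∈ {0, 1}, and the letters I and O refer to, respectively, the input and output of the MIMO-demapper. Based on (10), the term $P(d^u_{m,k,q}=b\mid\mathbf{Y}^c_m(k),\mathbf{H}(k),\varepsilon,\eta)$ can be determined as

$$
P\big(d^u_{m,k,q}=b\mid\mathbf{Y}^c_m(k),\mathbf{H}(k),\varepsilon,\eta\big)=\sum_{\mathbf{x}\in\mathcal{X}^{(b)}_{u,m,k,q}}P\big(\mathbf{X}_m(k)=\mathbf{x}\mid\mathbf{Y}^c_m(k),\mathbf{H}(k),\varepsilon,\eta\big),
\tag{12}
$$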


Figure 2: MIMO-OFDM receiver using turbo joint decoding, synchronization, and channel estimation.

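where $\mathcal{X}^{(b)}_{u,m,k,q}$ is the set of the vectors $\mathbf{X}_m(k) = [X_{1,m}(k) \cdots X_{N_t,m}(k)]^T$ corresponding to $d^u_{m,k,q} = b$,

$$
\begin{aligned}
P\big(\mathbf{X}_m(k)=\mathbf{x}\mid\mathbf{Y}^c_m(k),\mathbf{H}(k),\varepsilon,\eta\big)&=\frac{P\big(\mathbf{Y}^c_m(k)\mid\mathbf{X}_m(k)=\mathbf{x},\mathbf{H}(k),\varepsilon,\eta\big)\,P\big(\mathbf{X}_m(k)=\mathbf{x}\big)}{P\big(\mathbf{Y}^c_m(k)\big)},\\
P\big(\mathbf{Y}^c_m(k)\mid\mathbf{X}_m(k)=\mathbf{x},\mathbf{H}(k),\varepsilon,\eta\big)&=(\pi N_0)^{-N_r}\exp\!\Big(-\big\|\mathbf{Y}^c_m(k)-e^{j(2\pi/N)N_m\varepsilon_k}\rho^c_{k,k}\mathbf{H}(k)\mathbf{x}\big\|^2N_0^{-1}\Big),\\
P\big(\mathbf{Y}^c_m(k)\big)&=\sum_{\mathbf{x}\in\mathcal{X}_m}P\big(\mathbf{Y}^c_m(k)\mid\mathbf{X}_m(k)=\mathbf{x},\mathbf{H}(k),\varepsilon,\eta\big)\,P\big(\mathbf{X}_m(k)=\mathbf{x}\big),
\end{aligned}
\tag{13}
$$

where $\mathcal{X}_m$ is the set of all possible values of $\mathbf{X}_m(k)$, $P(\mathbf{X}_m(k)=\mathbf{x}) = \prod_u\prod_q P(d^u_{m,k,q} = d^u_{m,k,q}(\mathbf{x}); I)$ due to the use of interleaving, and $d^u_{m,k,q}(\mathbf{x})$ denotes the value of the corresponding bit $d^u_{m,k,q}$ in the vector $\mathbf{x}$.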
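To make the chain (11)–(13) concrete, the sketch below works through the simplest possible case: one transmit and one receive antenna with QPSK. The bit labelling, the normalisation of the output so that the two hypotheses sum to one, and all names are assumptions made for this illustration; `gain` stands for the scalar $e^{j(2\pi/N)N_m\varepsilon_k}\rho^c_{k,k}H(k)$, and the constant $(\pi N_0)^{-N_r}$ and $P(\mathbf{Y}^c_m(k))$ are omitted because they cancel in the normalisation.

```python
import math

# Hypothetical QPSK labelling: bit 0 sets the sign of the real part,
# bit 1 the sign of the imaginary part.
QPSK = {
    (0, 0): (1 + 1j) / math.sqrt(2), (0, 1): (1 - 1j) / math.sqrt(2),
    (1, 0): (-1 + 1j) / math.sqrt(2), (1, 1): (-1 - 1j) / math.sqrt(2),
}


def extrinsic_bit_probs(y, gain, n0, prior_one):
    """Return [P(d_q = 1; O)] for q = 0, 1, normalised over b in {0, 1}.

    prior_one[q] = P(d_q = 1; I) must lie strictly between 0 and 1.
    """
    # Likelihood times a priori symbol probability, as in (13).
    weights = {}
    for bits, x in QPSK.items():
        likelihood = math.exp(-abs(y - gain * x) ** 2 / n0)
        prior = 1.0
        for q, b in enumerate(bits):
            prior *= prior_one[q] if b else 1.0 - prior_one[q]
        weights[bits] = likelihood * prior
    result = []
    for q in range(2):
        extrinsic = []
        for b in (0, 1):
            posterior = sum(w for bits, w in weights.items() if bits[q] == b)  # (12)
            prior_b = prior_one[q] if b else 1.0 - prior_one[q]
            extrinsic.append(posterior / prior_b)                              # (11)
        result.append(extrinsic[1] / (extrinsic[0] + extrinsic[1]))
    return result


# Noise-free reception of the symbol labelled (1, 0), uniform priors.
print(extrinsic_bit_probs(QPSK[(1, 0)], gain=1.0, n0=0.5, prior_one=[0.5, 0.5]))
```

Dividing by the input probability in (11) is what makes the output extrinsic: in this sketch, changing the prior of a bit does not change the extrinsic value returned for that same bit.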

The above equations, (11) and (12), indicate that unlike the cases of perfect channel estimation and synchronization in [2] and perfect synchronization in [4], the MIMO demapper herein employs the estimated channel responses, CFO, and SFO, H(k), ε, η, to derive the extrinsic soft bit information.

The estimation of channel responses, CFO, and SFO, H(k), ε, η, is also based on (10) and hence needs knowledge of the subcarrier data $\mathbf{X}_m(k)$. For this, based on the computed $P(\mathbf{X}_m(k) = \mathbf{x})$, the soft mapper (shown in Figure 2) generates the soft estimate, $\hat{\mathbf{X}}_m(k)$, as its mean, that is,

$$
\hat{\mathbf{X}}_m(k)=E\big[\mathbf{X}_m(k)\big]=\sum_{\mathbf{x}\in\mathcal{X}_m}\mathbf{x}\,P\big(\mathbf{X}_m(k)=\mathbf{x}\big).
\tag{14}
$$
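The soft mapping in (14) is a probability-weighted average over the constellation. A minimal single-antenna QPSK sketch is given below; the bit labelling and the names are assumptions made for the example, and the symbol probability is formed as the product of the bit probabilities, as allowed by the interleaving argument above.

```python
import math

# Hypothetical QPSK labelling: bit 0 sets the sign of the real part,
# bit 1 the sign of the imaginary part.
QPSK = {
    (0, 0): (1 + 1j) / math.sqrt(2), (0, 1): (1 - 1j) / math.sqrt(2),
    (1, 0): (-1 + 1j) / math.sqrt(2), (1, 1): (-1 - 1j) / math.sqrt(2),
}


def soft_symbol(prob_one):
    """Mean symbol of (14) given prob_one[q] = P(d_q = 1; I) for q = 0, 1."""
    mean = 0j
    for bits, x in QPSK.items():
        p = 1.0
        for q, b in enumerate(bits):
            p *= prob_one[q] if b else 1.0 - prob_one[q]
        mean += x * p
    return mean


print(soft_symbol([0.5, 0.5]))   # no information: the estimate collapses to 0
print(soft_symbol([0.2, 0.5]))   # partial information on bit 0 only
print(soft_symbol([1.0, 0.0]))   # full confidence: a constellation point
```

Unreliable bits shrink the soft symbol toward the origin, so they contribute less to the subsequent CIR, CFO, and SFO updates than a wrong hard decision would.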

Due to the close interaction between the CIR, CFO, and SFO estimates and the MIMO-demapper, the proposed turbo processing is performed in a joint detection-estimation manner (as described above) instead of a serial fashion (i.e., updating H(k), ε, η only after a few iterations for simplicity). As shown in Section 7, convergence to good performance can be achieved with only 2 or 3 iterations.

The Nt extrinsic soft bit information streams, $P(d^u_{k,q}; O)$, $u = 1, \ldots, N_t$, are then deinterleaved and parallel-to-serial converted to form the extrinsic soft bitstream P(c; I) for the constituent soft-input soft-output (SISO) decoder that will provide more reliable soft estimates of the coded bits, P(c; O), for the next iteration. At any iteration, a hard decision can be applied on P(u; O) to produce the decoded data bits. The information flow graph of the proposed turbo joint channel estimation, synchronization, and decoding scheme, shown in Figure 3, illustrates the iterative exchange of the extrinsic information between the constituent functional blocks in the receiver. By using the known training sequence $\mathbf{X}_m(k)$ in the preamble segment of a burst, initial estimates of CFO and SFO can be accurately obtained by using the conjugate delay correlation property and then used to establish the initial CIR estimates by the vector RLS algorithm as discussed in Section 5.

5. Vector RLS-Based Joint Tracking of CIR, CFO, and SFO

Due to the nonlinear effects of CFO and SFO on the received samples, as shown by (10) in both time and frequency domains, the joint estimation of CIR, CFO, and SFO would require highly complex nonlinear estimation techniques. To avoid such complexity, the paper uses a Taylor series to approximately linearize the nonlinear estimation problem. In addition, under the assumption that all transmit-receive antenna pairs experience common CFO and SFO values [7, 8, 11], we can develop a fast-convergence, vector RLS-based joint CIR, CFO, and SFO estimation and tracking algorithm suitable for MIMO-OFDM receivers as follows.

As previously discussed, to reduce the number of estimated channel parameters, we consider hu,v = [hu,v,l, l = 0, 1, . . . , L − 1]T for u = 1, . . . , Nt, v = 1, . . . , Nr instead of Hu,v = [Hu,v(k), k = 0, 1, . . . , K]T since usually L ≪ K. Using the least squares (LS) criterion, our aim is to iteratively estimate the (2LNtNr + 2) × 1 parameter vector

\[
\omega_i = \big[\omega_{i,0}\;\; \omega_{i,1}\;\; \cdots\;\; \omega_{i,2LN_tN_r+1}\big]^T
\]

at iteration i to minimize the following weighted squared error sum:

\[
C(\omega_i) = \sum_{p=1}^{i} \lambda^{i-p} \sum_{v=1}^{N_r} \big|e_{i,p,v}\big|^2, \tag{15}
\]

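where λ is the forgetting factor and p = 1, . . . , i denotes the pth tone index in the set of i tone indices used in this adaptive estimation. The elements of ωi are

\[
\begin{aligned}
\omega_{i,\,l+2L(u-1)+2LN_t(v-1)} &= \mathrm{Re}\big\{h^{(i)}_{u,v,l}\big\},\\
\omega_{i,\,l+L+2L(u-1)+2LN_t(v-1)} &= \mathrm{Im}\big\{h^{(i)}_{u,v,l}\big\},\\
\omega_{i,\,2LN_tN_r} &= \varepsilon^{(i)}, \qquad \omega_{i,\,2LN_tN_r+1} = \eta^{(i)},
\end{aligned}
\tag{16}
\]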

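with u = 1, . . . , Nt, v = 1, . . . , Nr, l = 0, . . . , L − 1. From (10), we obtain

\[
\begin{aligned}
e_{i,p,v} &= Y^c_{v,m_p}(k_p) - f_v\big(X_{u,m_p}(k_p), \omega_i\big),\\
f_v\big(X_{u,m_p}(k_p), \omega_i\big) &= e^{j(2\pi/N)N_{m_p}\varepsilon^{(i)}_{k_p}}\,\rho^c_{k_p}
   \sum_{u=1}^{N_t} X_{u,m_p}(k_p)\,H^{(i)}_{u,v}(k_p),\\
H^{(i)}_{u,v}(k_p) &= \sum_{l=0}^{L-1} h^{(i)}_{u,v,l}\,e^{-j(2\pi k_p l/N)},\\
\varepsilon^{(i)}_{k_p} &= k_p\,\eta^{(i)} + \big(1+\eta^{(i)}\big)\varepsilon^{(i)},\\
\rho^c_{k_p} &= \frac{1}{N}\sum_{n=0}^{N-1}
   e^{j(2\pi/N)\,n\,[\,k_p\eta^{(i)} + (1+\eta^{(i)})\varepsilon^{(i)} - (1+\eta^c)\varepsilon^c\,]}.
\end{aligned}
\tag{17}
\]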

It is noted that Xu,mp(kp) denotes the soft estimate of the pth data tone at subcarrier kp of the mpth OFDM symbol from the uth transmit antenna.

It is clear that fv(Xu,mp(kp), ωi) is a nonlinear function of ωi,2LNtNr = ε(i) and ωi,2LNtNr+1 = η(i). For a sufficiently small error ei,p,v, fv(Xu,mp(kp), ωi) can be approximately represented by the linear terms of its Taylor series, that is, an approximately linear estimation error can be determined by

\[
e_{i,p,v} \approx Y^c_{v,m_p}(k_p) - \Big\{ f_v\big(X_{u,m_p}(k_p), \omega_{i-1}\big)
  + \nabla f_v^T\big(X_{u,m_p}(k_p), \omega_{i-1}\big)\,\big(\omega_i - \omega_{i-1}\big) \Big\}. \tag{18}
\]

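The gradient vector of fv(Xu,mp(kp), ωi−1) corresponding to the vth receive antenna is determined by

\[
\nabla f_v\big(X_{u,m_p}(k_p), \omega_{i-1}\big)
 = \left[ \frac{\partial f_v\big(X_{u,m_p}(k_p), \omega_{i-1}\big)}{\partial \omega_{i-1,0}}
   \;\cdots\;
   \frac{\partial f_v\big(X_{u,m_p}(k_p), \omega_{i-1}\big)}{\partial \omega_{i-1,2LN_tN_r+1}} \right]^T, \tag{19}
\]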

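where

\[
\begin{aligned}
\frac{\partial f_v\big(X_{u,m_p}(k_p), \omega_i\big)}{\partial \omega_{i,\,l+2L(u-1)+2LN_t(v-1)}}
 &= X_{u,m_p}(k_p)\,e^{-j(2\pi l k_p/N)}\,e^{j(2\pi/N)N_{m_p}\varepsilon^{(i)}_{k_p}}\,\rho^c_{k_p},
 \qquad l = 0, \ldots, L-1,\\
\frac{\partial f_v\big(X_{u,m_p}(k_p), \omega_i\big)}{\partial \omega_{i,\,l+L+2L(u-1)+2LN_t(v-1)}}
 &= j\,\frac{\partial f_v\big(X_{u,m_p}(k_p), \omega_i\big)}{\partial \omega_{i,\,l+2L(u-1)+2LN_t(v-1)}},\\
\frac{\partial f_v\big(X_{u,m_p}(k_p), \omega_i\big)}{\partial \omega_{i,\,2LN_tN_r}}
 &= \big(1+\eta^{(i)}\big)\,\Omega_{i,p,v},\\
\Omega_{i,p,v} &= e^{j(2\pi/N)N_{m_p}\varepsilon^{(i)}_{k_p}}
 \left[ j\frac{2\pi}{N}N_{m_p}\,\rho^c_{k_p}
 + \frac{1}{N}\sum_{n=0}^{N-1} j\frac{2\pi}{N}\,n\,e^{j(2\pi/N)\,n\,[\varepsilon^{(i)}_{k_p} - \varepsilon^c_{\eta}]} \right]
 \times \sum_{u=1}^{N_t} X_{u,m_p}(k_p)\,H^{(i)}_{u,v}(k_p),\\
\frac{\partial f_v\big(X_{u,m_p}(k_p), \omega_i\big)}{\partial \omega_{i,\,2LN_tN_r+1}}
 &= \big(k_p + \varepsilon^{(i)}\big)\,\Omega_{i,p,v}, \qquad u = 1, \ldots, N_t.
\end{aligned}
\tag{20}
\]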

Note that for ρ = 1, . . . , Nr and ρ ≠ v, ∂fv(Xu,mp(kp), ωi)/∂ωi,l+2L(u−1)+2LNt(ρ−1) = 0 and ∂fv(Xu,mp(kp), ωi)/∂ωi,l+L+2L(u−1)+2LNt(ρ−1) = 0. Subsequently, the vector RLS algorithm [30] can be used to formulate the following vector RLS-based joint CIR, CFO, and SFO tracking scheme.

Initialization. P1 = γ^{-1} I_{2LNrNt+2}, where γ is the regularization parameter. (The use of a scaled identity matrix for initialization is mainly for convenience, and a random initialization matrix can also be employed. Since convergence will invariably be attained, while the final converged position depends on many environmental factors that are unknown, the difference between using the two types of initialization matrices is in general not significant. However, due to its randomness, using a random matrix may give rise to problems with matrix inversion or other similar matrix operations under certain conditions. As a result, most adaptive algorithms make use of the more deterministic scaled identity matrix for initialization purposes.)

Figure 3: Turbo processing for joint channel estimation, synchronization, and decoding. (Block diagram. Burst structure for each transmit antenna: a preamble segment of two long training symbols of 52 pilot tones each, followed by a data segment of 225 data OFDM symbols of 52 data tones each, with no pilot tones. Receiver blocks: coarse CFO and SFO estimation by conjugate-delay correlation, coarse CIR estimation by the vector RLS algorithm, FFT, MIMO demapper, P/S and deinterleaving, SISO decoder, interleaving and S/P, soft mapper, and the vector RLS joint CIR, CFO, and SFO tracking estimator.)

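Iterative Procedure. At the ith iteration with a forgetting factor λ, update

\[
\begin{aligned}
X_{i,N_r} &= \big[\nabla f_1^T\big(X_{u,m_i}(k_i), \omega_{i-1}\big) \;\cdots\; \nabla f_{N_r}^T\big(X_{u,m_i}(k_i), \omega_{i-1}\big)\big],\\
K_i &= P_{i-1} X^*_{i,N_r}\big(\lambda I_{N_r} + X^T_{i,N_r} P_{i-1} X^*_{i,N_r}\big)^{-1},\\
P_i &= \lambda^{-1}\big(P_{i-1} - K_i X^T_{i,N_r} P_{i-1}\big),\\
e_{i,N_r} &= \big[\big(Y^c_{v,m_i}(k_i) - f_v\big(X_{u,m_i}(k_i), \omega_{i-1}\big)\big),\; v = 1, \ldots, N_r\big]^T, \qquad u = 1, \ldots, N_t,\\
\omega_i &= \omega_{i-1} + K_i\, e_{i,N_r}.
\end{aligned}
\tag{21}
\]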
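The recursion in (21) can be sketched in Python on a purely linear stand-in problem, so that the first-order expansion in (18) is exact. This is a sketch under stated assumptions, not the paper's estimator: the regressor matrix `X` is taken as D x Nr with one column per receive antenna, the parameter vector is complex rather than the stacked real vector of (16), the observations follow y = Xᵀw without noise, and the names `vector_rls_step`, `D`, and `gamma` are hypothetical.

```python
import numpy as np

def vector_rls_step(w, P, X, y, lam=1.0):
    """One update of the form (21) for the linear model y = X^T w (X is D x Nr)."""
    n_r = X.shape[1]
    Xc = X.conj()
    # Gain K_i = P_{i-1} X* (lam I + X^T P_{i-1} X*)^{-1}
    gain = P @ Xc @ np.linalg.inv(lam * np.eye(n_r) + X.T @ P @ Xc)
    err = y - X.T @ w                     # a priori error using the previous estimate
    w_new = w + gain @ err                # parameter update
    P_new = (P - gain @ X.T @ P) / lam    # inverse-correlation update
    return w_new, P_new

rng = np.random.default_rng(0)
D, n_r, gamma = 4, 2, 1e-3                # assumed sizes and regularization
w_true = rng.standard_normal(D) + 1j * rng.standard_normal(D)
w = np.zeros(D, dtype=complex)
P = np.eye(D, dtype=complex) / gamma      # P_1 = gamma^{-1} I
for _ in range(200):
    X = rng.standard_normal((D, n_r)) + 1j * rng.standard_normal((D, n_r))
    y = X.T @ w_true                      # noiseless observations
    w, P = vector_rls_step(w, P, X, y)
```

In the paper's setting, the columns of the regressor would instead hold the gradients from (19) and (20), and the error would be computed with the nonlinear function fv evaluated at ωi−1.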

Under the above implementation of the vector RLS-based algorithm for tracking CIR, CFO, and SFO, the resulting computational complexity is O(L^3 Nt^3 Nr^3 Nd) per turbo iteration, where L denotes the channel length, Nt stands for the number of transmit antennas, Nr is the number of receive antennas, and Nd is the number of subcarriers used in each turbo iteration for the vector RLS tracking.

6. Coarse CIR, CFO, and SFO Estimation for Initial Values

For a stationary environment and time-invariant parameter vector, the RLS algorithm is stable regardless of the eigenvalue spread of the input vector correlation matrix [31], as shown in [32]. Due to the use of the first-order Taylor series approximation, the stability of the vector RLS-based CFO, SFO, and CIR tracking scheme requires sufficiently small initial errors between the initial guesses and the true values of CIR, CFO, and SFO.

Accurate yet simple coarse estimation of CFO and SFO can be based on the conjugate delay correlation of the two identical and known training sequences in the preamble of the burst (as shown in Figure 3), that is, based on (4), we can obtain the following approximation:

\[
E\big\{r_{v,m_2,n}\,r^*_{v,m_1,n}\big\}
 \approx \frac{e^{j(2\pi/N)(N+N_g)\varepsilon_\eta}}{N^2}
 \left| \sum_{k=-K/2}^{K/2-1} e^{j(2\pi k/N)\,n(1+\eta)}\,e^{j(2\pi k/N)\,\eta N_{m_1}}
 \times \sum_{u=1}^{N_t} X_{u,m_1}(k)\,H_{u,v}(k) \right|^2, \tag{22}
\]


Figure 4: MSE and CRLB of CIR estimates. (Plot of the MSE of the CIR estimates versus the number of data OFDM symbols, for SNR = 2 dB, MIMO with (Nt, Nr) = (2, 2), CFO = 0.005, and SFO = 112 ppm. Legend entries include the CRLB of the pilot-based CIR estimate using only 4 pilot tones in each data OFDM symbol, the CRLB of the pilot-based CIR estimate using perfect information of all 52 tones in each data OFDM symbol, and turbo processing with 1 and 2 iterations.)

where m1 and m2 = m1 + 1 denote the indices of the 1st and 2nd training sequences. Therefore, the combined CFO-SFO perturbation can be estimated by

\[
\varepsilon_\eta = \frac{N}{2\pi(N+N_g)}\,\Phi\big[E\big\{r_{v,m_2,n}\,r^*_{v,m_1,n}\big\}\big], \tag{23}
\]

where Φ[E{rv,m2,n r∗v,m1,n}] is the angle of E{rv,m2,n r∗v,m1,n}. Under the assumption of η ≪ 1 (e.g., for a typical SFO value of around 50 ppm or 5E-5 in practice) and the use of the two identical long training sequences in the preamble of a burst, the coarse (initial) CFO and SFO estimates can be determined separately by

\[
\begin{aligned}
\varepsilon &= \frac{1}{2\pi(N+N_g)N_r}\,
 \Phi\!\left[\sum_{v=1}^{N_r}\sum_{n=0}^{N-1} r_{v,m_2,n}\,r^*_{v,m_1,n}\right],\\
\eta &= 0,
\end{aligned}
\tag{24}
\]

where Φ[·] in (24) is the angle of the double sum over v = 1, . . . , Nr and n = 0, . . . , N − 1 of rv,m2,n r∗v,m1,n. The above coarse CFO and SFO estimates are then used in the coarse CIR estimation that employs the vector RLS algorithm with the known Xm(k)'s during the preamble.
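The conjugate delay correlation idea can be illustrated with a short Python sketch. It assumes a simplified single-antenna, noiseless signal model in which the CFO (normalized to the subcarrier spacing) rotates sample n by 2πεn/N and the two identical training symbols are N + Ng samples apart; SFO is neglected (η = 0). The scaling N/(2π(N + Ng)) follows (23). The function name `coarse_cfo` and the parameter values are hypothetical.

```python
import numpy as np

def coarse_cfo(r1, r2, n_fft, n_guard):
    """CFO (in subcarrier spacings) from two repeated symbols N + Ng samples apart."""
    # Conjugate delay correlation: identical symbols differ only by a phase rotation.
    corr = np.sum(r2 * np.conj(r1))
    return n_fft / (2 * np.pi * (n_fft + n_guard)) * np.angle(corr)

rng = np.random.default_rng(1)
n_fft, n_guard, eps = 64, 16, 0.1         # assumed sizes and true CFO
s = rng.standard_normal(n_fft) + 1j * rng.standard_normal(n_fft)  # training samples
n = np.arange(n_fft)
r1 = s * np.exp(2j * np.pi * eps * n / n_fft)
r2 = s * np.exp(2j * np.pi * eps * (n + n_fft + n_guard) / n_fft)
eps_hat = coarse_cfo(r1, r2, n_fft, n_guard)
```

Because the estimate is taken from an angle, it is unambiguous only while 2πε(N + Ng)/N stays within (−π, π), which in this sketch corresponds to |ε| < 0.4.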

7. Simulation Results and Discussions

Computer simulation has been conducted to evaluate the performance of the proposed turbo joint channel estimation, synchronization, and decoding scheme for a convolutional-coded MIMO-OFDM system. In the investigation, the OFDM-related parameters are set to be similar to those given by IEEE standard 802.11a [15]. QPSK is employed for the data OFDM symbols, each of which has 52 data tones. Note that in [15], 4 out of 52 data tones are reserved for known pilot tones to facilitate the CIR, CFO, and SFO tracking, which represents an overhead of 8.33%. For the proposed turbo joint channel estimation, synchronization, and decoding scheme, the entire OFDM symbol can be used for data tones to eliminate this overhead of 8.33%. As illustrated in Figure 3, a burst format of two identical long training symbols and 225 data OFDM symbols was used in the simulation. The two identical long training symbols in the preamble of a burst are used to perform a correlation-based coarse CFO-SFO estimation to establish their initial values for the turbo joint tracking of CIR, CFO, and SFO. The coarse CIR estimation is performed by using the vector RLS algorithm and the first long training symbols with the available CFO and SFO initial estimates, with the initial guesses of the CIRs and the gradient components in (19) corresponding to the CFO-SFO variables set to zero. The rate-1/2 nonrecursive systematic convolutional code with length covering 2 OFDM symbols is employed for encoding at the transmitter. At the receiver, the SISO decoder is used as discussed in Section 4. For each transmit-receive antenna pair, we consider an exponentially decaying Rayleigh fading channel with a channel length of 5 and an RMS delay spread of 50 nanoseconds. In the simulation, the channel impulse responses and frequency offsets are assumed to be unchanged over the duration of a burst of 227 OFDM symbols (two training OFDM symbols for preamble).

Figure 5: MSE and CRLB of CFO estimates. (Plot of the MSE of the CFO estimates versus the number of data OFDM symbols. Legend entries include the CRLB of the pilot-based CFO estimate using 4 pilots in each OFDM symbol and turbo processing with 3 iterations.)

Figure 6: MSE and CRLB of SFO estimates. (Plot of the MSE of the SFO estimates versus the number of data OFDM symbols. Legend entries include the CRLB of the pilot-aided SFO estimate using 4 pilots in each OFDM symbol.)

Figure 7: MSE and CRLB of CIR, CFO, and SFO estimates versus SNR. (a) For QPSK: CFO = 0.1, SFO = 100 ppm, 2 × 2 MIMO, MSEs measured after the 2nd data OFDM symbol; curves for the ML scheme [11], the proposed scheme, and the CRLBs. (b) For 16-QAM: CFO = 0.3, SFO = 100 ppm, MIMO with (Nt, Nr) = (2, 2), MSEs measured after the 2nd data OFDM symbol in a burst of 225 data OFDM symbols; curves for the MSE and CRLB of the CIR, CFO, and SFO estimates.

Figure 4 shows the measured mean squared errors (MSEs) of the CIR estimate and the relevant Cramer-Rao lower bounds (CRLBs). The numerical results demonstrate that the proposed estimation algorithm provides fast convergence and the best MSE performance with forgetting factor λ = 1 and regularization parameter γ = 10. For comparison, the CRLB values of the CIR estimates obtained by using any unbiased pilot-aided estimation approach with 4 known pilot tones (as in the IEEE standard 802.11a [15]) and with all 52 known tones (i.e., the ideal but unrealistic case) in each data OFDM symbol are also plotted in Figure 4. As can be seen in Figure 4, the numerical results show that the MSE values of the CIR estimates obtained by the proposed scheme with just one iteration are even smaller than the CRLB obtained by any unbiased pilot-aided joint CIR, CFO, and SFO estimation approach using 4 pilots in each OFDM symbol. Furthermore, after just 3 iterations, the proposed scheme converges to its best MSE performance, close to the CRLB of the ideal but unrealistic case of all 52 known tones. In the same manner, Figures 5 and 6 show the MSE results and relevant CRLBs of the CFO and SFO estimates, respectively. Figure 7 shows the MSE performance and CRLB values of the proposed turbo scheme with 3 iterations of turbo processing versus SNR for QPSK (a) and 16-QAM (b). As can be seen in Figure 7(a), the proposed joint CIR/CFO/SFO estimation scheme provides more accurate CFO estimates than the existing ML-based CFO and SFO tracking algorithm [11], which requires perfect channel knowledge. For the same SNR, the gap between the MSE and the corresponding CRLB for QPSK is smaller than that for 16-QAM.

Figure 8: BER performance of the proposed turbo joint channel estimation, synchronization, and decoding scheme. (Plot of BER versus SNR in dB for CFO = 0.005, SFO = 112 ppm, and (Nt, Nr) = (2, 2). Curve A: without turbo processing (preamble-based estimation). Curve B: after 1 iteration of turbo processing. Curve C: after 2 iterations. Curve D: after 3 iterations. Curve E: ideal BER (perfect channel estimation, CFO = SFO = 0).)

Figure 8 shows the BER performance of the proposed turbo scheme with different numbers of iterations. For reference, the ideal BER performance (curve E) in the case of perfect channel estimation and synchronization (i.e., zero CFO and SFO, using 3 iterations between the MIMO demapper and the SISO decoder) is also plotted. The results show that the performance of the proposed turbo scheme improves with the number of iterations and can approach that of the case of perfect channel estimation and synchronization after 3 iterations (curve D). Without turbo processing, the resulting worst-case BER performance (curve A), corresponding to the case of using only the preamble for the vector RLS-based joint channel estimation and synchronization, is plotted in Figure 8. As shown, without the use of the turbo principle, the vector RLS-based joint channel estimation and synchronization scheme using only the preamble (curve A) provides an unacceptable receiver performance (BER values around 0.5), while the proposed turbo scheme offers a remarkable improvement in BER performance even after just one iteration (curve B).

To investigate the effect of CFO and SFO on the performance of the proposed turbo scheme, Figures 9 and 10 show the BER performance of the proposed turbo algorithm under various SFO and CFO values, respectively. For reference, the ideal BER performance in the case of perfect channel estimation and synchronization (i.e., zero CFO and SFO, using 3 iterations between the MIMO demapper and the SISO decoder) is also plotted. As shown, the proposed turbo estimation scheme is highly robust against a wide range of SFO values.

Figure 9: BER performance of the proposed turbo joint channel estimation, synchronization, and decoding scheme under various SFO values. (Plot of BER versus SFO in ppm for CFO = 0.3, SNR = 8 dB, and (Nt, Nr) = (2, 2), using 3 iterations of turbo processing, together with the ideal BER for perfect channel estimation and CFO = SFO = 0.)

Figure 10: BER performance of the proposed turbo joint channel estimation, synchronization, and decoding scheme under various CFO values. (Plot of BER versus CFO for SFO = 100 ppm, SNR = 8 dB, and (Nt, Nr) = (2, 2), using 3 iterations of turbo processing, together with the ideal BER for perfect channel estimation and CFO = SFO = 0.)

8. Conclusions

In this paper, a received signal model in the presence of CFO, SFO, and channel distortions was examined and used to develop a turbo joint channel estimation, synchronization, and decoding scheme and a vector RLS-based joint CFO, SFO, and CIR tracking algorithm for coded MIMO-OFDM systems over quasistatic Rayleigh multipath fading channels. The benefits of turbo processing enable the proposed joint channel estimation, synchronization, and decoding scheme to provide a near-ideal BER performance over a wide range of SFO values without the need for known pilot tones inserted in the data segment of a burst. Simulation results show that the joint CIR, CFO, and SFO estimation with the turbo principle offers fast convergence and low MSE over quasistatic Rayleigh multipath fading channels.

Appendices

A. Cramer-Rao Lower Bound for Pilot-Based Estimates of CIR, CFO, and SFO

Based on (5), the received subcarrier ki in the frequency domain at the vth receive antenna can be expressed by

\[
Y_{v,m}(k_i) = e^{j(2\pi/N)N_{m_i}\varepsilon_{k_i}}\,\rho_{k_i,k_i}
 \sum_{u=1}^{N_t} X_{u,m}(k_i)\,H_{u,v}(k_i) + W_{v,m}(k_i). \tag{A.1}
\]

Note that the ICI components in (A.1) can be assumed to be additive and Gaussian distributed and included in Wv,m(ki) [20].

By collecting K subcarriers in each receive antenna, the resulting KNr subcarriers from Nr receive antennas can be represented in vector form as follows:

\[
y = c + w, \tag{A.2}
\]

where

\[
\begin{aligned}
y &= \big[Y_{1,m_1}(k_1) \cdots Y_{1,m_K}(k_K) \;\cdots\; Y_{N_r,m_1}(k_1) \cdots Y_{N_r,m_K}(k_K)\big]^T,\\
w &= \big[W_{1,m_1}(k_1) \cdots W_{1,m_K}(k_K) \;\cdots\; W_{N_r,m_1}(k_1) \cdots W_{N_r,m_K}(k_K)\big]^T,\\
c &= \big(I_{N_r} \otimes \big(\Phi(\varepsilon,\eta)\,S\,F\big)\big)\,h,
\end{aligned}
\tag{A.3}
\]


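\[
\begin{aligned}
\Phi(\varepsilon,\eta) &= \mathrm{diag}\big(e^{j(2\pi/N)N_{m_1}\varepsilon_{k_1}}\rho_{k_1,k_1} \;\cdots\; e^{j(2\pi/N)N_{m_K}\varepsilon_{k_K}}\rho_{k_K,k_K}\big),\\
S &= \begin{bmatrix}
x(k_1) & 0_{1\times N_t} & \cdots & 0_{1\times N_t}\\
0_{1\times N_t} & x(k_2) & \cdots & 0_{1\times N_t}\\
\vdots & & \ddots & \vdots\\
0_{1\times N_t} & 0_{1\times N_t} & \cdots & x(k_K)
\end{bmatrix},\\
F &= \begin{bmatrix} F_1\\ \vdots\\ F_K \end{bmatrix}, \qquad
F_i = I_{N_t} \otimes \big[1 \;\cdots\; e^{-j(2\pi/N)(L-1)k_i}\big],\\
x(k_i) &= \big[X_1(k_i) \cdots X_{N_t}(k_i)\big], \qquad
0_{1\times N_t} = \underbrace{[0 \cdots 0]}_{N_t\ \text{elements}},\\
h &= \big[h_1^T \cdots h_{N_r}^T\big]^T,\\
h_v &= \big[h_{1,v,0} \cdots h_{1,v,L-1} \;\cdots\; h_{N_t,v,0} \cdots h_{N_t,v,L-1}\big]^T, \qquad v = 1, \ldots, N_r.
\end{aligned}
\tag{A.4}
\]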

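Based on (A.2), the Fisher information matrix [33] can be computed by

\[
M = \frac{2}{\sigma_w^2}\,\mathrm{Re}\!\left[\frac{\partial c^H}{\partial \omega}\,\frac{\partial c}{\partial \omega^T}\right], \tag{A.5}
\]

where ω = [hR^T hI^T ϕ^T]^T, hR = Re{h}, hI = Im{h}, ϕ = [ε η]^T, and

\[
\begin{aligned}
\frac{\partial c^H}{\partial h_R} &= I_{N_r} \otimes \big(F^H S^H \Phi^H(\varepsilon,\eta)\big),\\
\frac{\partial c^H}{\partial h_I} &= -j\,I_{N_r} \otimes \big(F^H S^H \Phi^H(\varepsilon,\eta)\big),\\
\frac{\partial c^H}{\partial \varphi} &=
 \begin{bmatrix}
 h^H\big(I_{N_r} \otimes \big(F^H S^H \Phi_\varepsilon^H\big)\big)\\
 h^H\big(I_{N_r} \otimes \big(F^H S^H \Phi_\eta^H\big)\big)
 \end{bmatrix},\\
\frac{\partial c}{\partial h_R^T} &= I_{N_r} \otimes \big(\Phi(\varepsilon,\eta)\,S\,F\big),\\
\frac{\partial c}{\partial h_I^T} &= j\,I_{N_r} \otimes \big(\Phi(\varepsilon,\eta)\,S\,F\big),\\
\frac{\partial c}{\partial \varphi^T} &=
 \big[\big(I_{N_r} \otimes (\Phi_\varepsilon S F)\big)h \;\;\; \big(I_{N_r} \otimes (\Phi_\eta S F)\big)h\big].
\end{aligned}
\tag{A.6}
\]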

Therefore, the Cramer-Rao lower bound of the estimated parameters ω, CRLB(ω), can be determined by

\[
\mathrm{CRLB}(\omega) = \mathrm{diag}\big(M^{-1}\big). \tag{A.7}
\]
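The chain (A.5) to (A.7) can be checked numerically on a reduced problem. The sketch below is an illustration under stated assumptions only: it treats a purely linear model c = A h with a known matrix A, so the ε and η columns of (A.6) are omitted, and the function name `crlb_linear`, the orthonormal DFT-based choice of A, and the noise variance are hypothetical.

```python
import numpy as np

def crlb_linear(A, sigma2_w):
    """diag(M^{-1}) for c = A h with parameters [Re(h); Im(h)], as in (A.5)-(A.7)."""
    # Jacobian dc/d[Re(h)^T, Im(h)^T]: the imaginary parts enter with a factor j.
    J = np.hstack([A, 1j * A])
    # Fisher information matrix, (A.5).
    M = (2.0 / sigma2_w) * np.real(J.conj().T @ J)
    # Per-parameter lower bounds, (A.7).
    return np.diag(np.linalg.inv(M))

K, L, sigma2_w = 8, 3, 0.5                # assumed tone count, taps, noise variance
k = np.arange(K)[:, None]
l = np.arange(L)[None, :]
A = np.exp(-2j * np.pi * k * l / K) / np.sqrt(K)   # orthonormal DFT columns
bounds = crlb_linear(A, sigma2_w)
```

With orthonormal columns, the Fisher matrix reduces to (2/σw²) times the identity, so every bound equals σw²/2, which gives a quick sanity check before the CFO and SFO columns are added.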

B. SNR

Based on (4), the signal-to-noise ratio (SNR) at the vth receive antenna is

\[
\mathrm{SNR}_v = \frac{P_{S,v}}{P_N}, \tag{B.1}
\]

where

\[
P_{S,v} = \frac{1}{N^2}\,E\!\left\{ \left| \sum_{k=-K/2}^{K/2-1} e^{j(2\pi k/N)\,n(1+\eta)}\,e^{j(2\pi k/N)\,\eta N_m}
 \times \sum_{u=1}^{N_t} X_{u,m}(k)\,H_{u,v}(k) \right|^2 \right\}, \tag{B.2}
\]

and PN = σ^2. Assume that the coefficients of the CIR, {hu,v,0, hu,v,1, . . . , hu,v,L−1}, are independent zero-mean complex random variables with common variances {σ_0^2, σ_1^2, . . . , σ_{L−1}^2} for all pairs of transmit-receive antennas, and that all receive antennas experience the same AWGN power. After some manipulation, it can be shown that the SNR values at all receive antennas are equal to

\[
\mathrm{SNR} = \frac{K N_t E_s \sum_{l=0}^{L-1} \sigma_l^2}{N^2 \sigma^2}, \tag{B.3}
\]

where Es = E{|Xu,m(k)|^2} is the average energy of the M-QAM symbols.
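As a worked arithmetic example of (B.3), the sketch below evaluates the SNR for one set of illustrative numbers. The values (52 tones, a 64-point FFT, two transmit antennas, unit symbol energy, a five-tap power profile summing to one, and the particular noise variance) are assumptions chosen so that the result is easy to verify by hand; the function name `snr_b3` is hypothetical.

```python
import math

def snr_b3(k_tones, n_t, e_s, tap_powers, n_fft, sigma2):
    """SNR of (B.3): K * Nt * Es * sum_l sigma_l^2 / (N^2 * sigma^2)."""
    return k_tones * n_t * e_s * sum(tap_powers) / (n_fft ** 2 * sigma2)

# Illustrative values: the numerator is 52 * 2 * 1 * 1 = 104 and the
# denominator is 64^2 * (104 / 4096) = 104, so the SNR is 1 (0 dB).
snr = snr_b3(k_tones=52, n_t=2, e_s=1.0,
             tap_powers=[0.5, 0.25, 0.125, 0.0625, 0.0625],
             n_fft=64, sigma2=104 / 4096)
snr_db = 10 * math.log10(snr)
```

Inverting the same expression gives the noise variance needed to simulate a target SNR for a given power delay profile.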

Acknowledgment

The authors would like to thank Mr. Robert Morawski for his kind help in running many computer simulations for this paper.

References

[1] H. Bolcskei, “MIMO-OFDM wireless systems: basics, perspectives, and challenges,” IEEE Wireless Communications, vol. 13, no. 4, pp. 31–37, 2006.

[2] B. M. Hochwald and S. ten Brink, “Achieving near-capacity on a multiple-antenna channel,” IEEE Transactions on Communications, vol. 51, no. 3, pp. 389–399, 2003.

[3] S. Haykin, M. Sellathurai, Y. de Jong, and T. Willink, “Turbo-MIMO for wireless communications,” IEEE Communications Magazine, vol. 42, no. 10, pp. 48–53, 2004.

[4] J. Liu and J. Li, “Turbo processing for an OFDM-based MIMO system,” IEEE Transactions on Wireless Communications, vol. 4, no. 5, pp. 1988–1993, 2005.

[5] P. Liang, S. Feng, Y. Jin, and W. Wu, “A novel turbo MIMO-OFDM system via sequential Monte Carlo,” in Proceedings of the 16th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC ’05), vol. 2, pp. 984–988, Berlin, Germany, September 2005.

[6] M. Speth, S. A. Fechtel, G. Fock, and H. Meyr, “Optimum receiver design for wireless broad-band systems using OFDM, part I,” IEEE Transactions on Communications, vol. 47, no. 11, pp. 1668–1677, 1999.




[9] S. Gault, W. Hachem, and P. Ciblat, “Joint sampling clock offset and channel estimation for OFDM signals: Cramer-Rao bound and algorithms,” IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1875–1885, 2006.

[10] B. Ai, Z.-X. Yang, C.-Y. Pan, J.-H. Ge, Y. Wang, and Z. Lu, “On the synchronization techniques for wireless OFDM systems,” IEEE Transactions on Broadcasting, vol. 52, no. 2, pp. 236–244, 2006.

[11] C. Oberli, “ML-based tracking algorithms for MIMO-OFDM,” IEEE Transactions on Wireless Communications, vol. 6, no. 7, pp. 2630–2639, 2007.

[12] H. Minn and N. Al-Dhahir, “Optimal training signals for MIMO OFDM channel estimation,” IEEE Transactions on Wireless Communications, vol. 5, no. 5, pp. 1158–1168, 2006.

[13] M. Cicerone, O. Simeone, and U. Spagnolini, “Channel estimation for MIMO-OFDM systems by modal analysis/filtering,” IEEE Transactions on Communications, vol. 54, no. 11, pp. 2062–2074, 2006.

[14] Z. J. Wang, Z. Han, and K. J. R. Liu, “A MIMO-OFDM channel estimation approach using time of arrivals,” IEEE Transactions on Wireless Communications, vol. 4, no. 3, pp. 1207–1213, 2005.

[15] IEEE Computer Society, IEEE Std 802.11a-1999, December 1999.

[16] K. J. Kim and R. A. Iltis, “Frequency offset synchronization and channel estimation for the MIMO-OFDM system using rao-blackwellized gauss-hermite filter,” in Proceedings of IEEE Wireless Communications and Networking Conference (WCNC ’06), vol. 2, pp. 860–865, Las Vegas, Nev, USA, April 2006.

[17] J. Li, G. Liao, and S. Ouyang, “Jointly tracking dispersive channels and carrier frequency-offset in MIMO-OFDM systems,” in Proceedings of the International Conference on Communications, Circuits and Systems (ICCCAS ’06), vol. 2, pp. 816–819, Guilin, China, June 2006.

[18] J. Chen, Y.-C. Wu, and T.-S. Ng, “Optimal joint CFO and channel estimation for multiuser MIMO-OFDM systems,” in Proceedings of IEEE International Conference on Communications (ICC ’08), pp. 563–567, Beijing, China, May 2008.

[19] J. Chen, Y.-C. Wu, S. Ma, and T.-S. Ng, “Joint CFO and channel estimation for multiuser MIMO-OFDM systems with optimal training sequences,” IEEE Transactions on Signal Processing, vol. 56, no. 8, part 2, pp. 4008–4019, 2008.

[20] H. Nguyen-Le, T. Le-Ngoc, and C. C. Ko, “Joint channel estimation and synchronization with inter-carrier interference reduction for OFDM,” in Proceedings of IEEE International Conference on Communications (ICC ’07), pp. 2841–2846, Glasgow, Scotland, June 2007.

[21] K. Shi, E. Serpedin, and P. Ciblat, “Decision-directed fine synchronization in OFDM systems,” IEEE Transactions on Communications, vol. 53, no. 3, pp. 408–412, 2005.

[22] IEEE 802.15-03/268r0, Physical Layer Submission to 802.15 Task Group 3a: Multiband Orthogonal Frequency-Division Multiplexing, 2003.

[23] European Telecommunication Standard ETS 300 744, Digital Broadcasting Systems for Television, Sound and Data Services; Framing Structure, Channel Coding and Modulation for Digital Terrestrial Television, 1996.

[24] Y. Xu, L. Dong, and C. Zhang, “Sampling clock offset estimation algorithm based on IEEE 802.11n,” in Proceedings of IEEE International Conference on Networking, Sensing and Control (ICNSC ’08), pp. 523–527, Sanya, China, April 2008.

[25] K. Nikitopoulos and A. Polydoros, “Compensation schemes for phase noise and residual frequency offset in OFDM systems,” in Proceedings of IEEE Global Telecommunications Conference (GLOBECOM ’01), vol. 1, pp. 330–333, San Antonio, Tex, USA, November 2001.

[26] M. Speth, S. Fechtel, G. Fock, and H. Meyr, “Optimum receiver design for OFDM-based broadband transmission, part II: a case study,” IEEE Transactions on Communications, vol. 49, no. 4, pp. 571–578, 2001.

[27] H. Nguyen-Le, T. Le-Ngoc, and C. C. Ko, “Turbo joint decoding, synchronization and channel estimation for coded MIMO-OFDM systems,” in Proceedings of IEEE International Conference on Communications (ICC ’08), pp. 4366–4370, Beijing, China, May 2008.

[28] A. M. Tonello, “Space-time bit-interleaved coded modulation with an iterative decoding strategy,” in Proceedings of the 52nd IEEE Vehicular Technology Conference (VTC ’00), vol. 1, pp. 473–478, Boston, Mass, USA, September 2000.

[29] S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, “A soft-input soft-output APP module for iterative decoding of concatenated codes,” IEEE Communications Letters, vol. 1, no. 1, pp. 22–24, 1997.

[30] J. M. Mendel, Lessons in Estimation Theory for Signal Processing, Communications, and Control, Prentice-Hall, Englewood Cliffs, NJ, USA, 1995.

[31] D. G. Manolakis, V. K. Ingle, and S. M. Kogon, Statistical and Adaptive Signal Processing, McGraw-Hill, Boston, Mass, USA, 2002.

[32] S. Haykin, Adaptive Filter Theory, Prentice-Hall, Englewood Cliffs, NJ, USA, 3rd edition, 1996.

[33] S. M. Kay, Fundamentals of Statistical Signal Processing, Prentice-Hall PTR, Englewood Cliffs, NJ, USA, 1998.


Research Article

Biologically Inspired Intercellular Slot Synchronization

Alexander Tyrrell1, 2 and Gunther Auer1

1 DOCOMO Euro-Labs, 80687 Munich, Germany
2 Institute of Networked and Embedded Systems, University of Klagenfurt, 9020 Klagenfurt, Austria

Correspondence should be addressed to Alexander Tyrrell, [email protected]

Received 30 June 2008; Revised 3 November 2008; Accepted 21 January 2009


The present article develops a decentralized inter-base station slot synchronization algorithm suitable for cellular mobile communication systems. The proposed cellular firefly synchronization (CelFSync) algorithm is derived from the theory of pulse-coupled oscillators, which is commonly used to describe synchronization phenomena in biological systems, such as the spontaneous synchronization of fireflies. In order to maintain synchronization among base stations (BSs), even when there is no direct link between adjacent BSs, some selected user terminals (UTs) participate in the network synchronization process. Synchronization emerges by exchanging two distinct synchronization words, one transmitted by BSs and the other by active UTs, without any a priori assumption on the initial timing misalignments of BSs and UTs. In large-scale networks with inter-BS site distances up to a few kilometers, propagation delays severely affect the attainable timing accuracy of CelFSync. We show that by an appropriate combination of CelFSync with the timing advance procedure, which aligns uplink transmissions of UTs to arrive simultaneously at the BS, a timing accuracy within a fraction of the inter-BS propagation delay is retained.

Copyright © 2009 A. Tyrrell and G. Auer. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

Slot synchronization is an enabling component for cellular systems. It is a prerequisite for advanced intercellular cooperation schemes, such as interference suppression between neighboring cells, as well as multicast and broadcasting services. The problem of intercell slot synchronization is to align the internal timing references of all nodes, so that base stations (BSs) and user terminals (UTs) agree on a common reference instant that marks the start of a transmission slot. In the context of cellular systems, a slot is composed of a number of successive uplink and downlink frames, referred to as a superframe.

Network synchronization in cellular systems is commonly performed in a master-slave manner: BSs synchronize to an external timing reference, known as the primary reference clock, and transfer this timing to UTs. This reference clock can be acquired through the global positioning system (GPS) or through the backbone connection. The first method requires the installation of a GPS receiver at each BS, which increases costs and, more importantly, does not work in environments where GPS signals cannot be received. For high accuracy, the second method requires precise delay compensation, and the accuracy severely decreases when clocks are chained [1].

Over-the-air decentralized intercell slot synchronization that avoids the need for an external timing reference was pioneered in [2] and further elaborated in [3, 4]. Its basic principle is summarized as follows: a BS emits a pulse indicating its own timing reference and is receptive to pulses from surrounding BSs; internal timing references are adjusted based on the power-weighted average of received pulses. Conditions for convergence were derived in [5], which reveals that convergence and stability are tightly linked to the intersite propagation delays between neighboring BSs. This is a critical issue, as inter-BS propagation delays are not known a priori. Furthermore, in [2], direct communication between BSs is required, and for the exchange of synchronization pulses, a separate frequency band is assumed to be available.

In the present paper a different approach is taken based on the theory of pulse-coupled oscillators (PCOs), which is commonly used to describe self-organized synchronization of biological systems such as swarms of fireflies, heart cells, or neurons. Mirollo and Strogatz [6] derived a theoretical framework for the convergence to synchrony. Various aspects regarding the application of the PCO model to wireless networks are addressed in the literature: radio effects such as propagation delays [7], channel attenuation, and noise [8, 9], and allowing for long synchronization words [10]. The rules that govern the PCO synchronization model are intriguingly simple and serve as a basis for inter-BS synchronization.

Figure 1: (a) Uncoupled phase function and (b) phase increment upon reception of a pulse.

The proposed cellular firefly synchronization (CelFSync) algorithm adapts the PCO model to account for constraints of cellular networks. CelFSync operates over-the-air, in a decentralized manner; no constraints are imposed on the availability of an external timing reference. As BSs and UTs typically transmit on successive downlink and uplink frames, two groups need to be distinguished: the BS group transmitting on the downlink and the UT group transmitting on the uplink. To facilitate the formation of two groups, two synchronization words are specified, one associated with BSs and the other with UTs. UTs transmit an uplink sync word based on their internal timing reference, which is received by BSs to update their own timing; in return, UTs adjust their timing reference upon reception of downlink sync words from neighboring BSs. Thus, unlike [2], no separate frequency band is required, as sync words are transmitted in-band with data. Moreover, direct communication among BSs is not mandatory, as synchronization is performed by hopping over UTs. As the downlink sync word is mandatory for conventional cellular systems to align the timing of UTs with the BS, the only overhead for inter-BS synchronization is the insertion of the uplink sync word. Thanks to the proposed strategy, the network is able to synchronize starting from an arbitrary misalignment, and propagation delays only affect the achieved accuracy but do not compromise the convergence to synchrony.

When considering a scenario where BSs are separated by several hundred meters up to a few kilometers, propagation delays severely affect the attainable timing accuracy. We propose to combine CelFSync with the timing advance procedure, which ensures that UT uplink transmissions arrive simultaneously at the BS. Compensating intracell propagation delays with the timing advance procedure, as well as selecting cell edge users to participate in CelFSync, are effective means to substantially improve the achieved inter-base station timing accuracy.

The remainder of the paper is structured as follows. In Section 2 the PCO model and its achieved synchronization accuracy in the presence of delays are presented. In Section 3 CelFSync is developed by adapting the rules that govern the synchronization of PCOs to cellular networks, and Section 4 combines CelFSync with timing advance to compensate the effects of propagation delays. Practical constraints regarding the implementation in cellular networks are addressed in Section 5, and simulation results are presented in Section 6 that investigate the time to convergence and the achieved accuracy for an indoor office environment as well as an urban macrocell deployment composed of hexagonal cells.

2. Synchronization of Pulse-Coupled Oscillators

Pulse-coupled oscillators (PCOs) describe systems where individual nodes periodically emit pulses and adjust their internal time reference upon reception of pulses from neighboring oscillators. In this section the rules that govern the PCO model [6] are summarized, and the achieved accuracy in the stable state is elaborated.

2.1. Phase Function. A PCO is described by its phase function $\phi_i(t)$, $1 \le i \le N$, where $N$ is the number of oscillators. This function evolves linearly over time with natural period $T$:

$$\frac{d\phi_i(t)}{dt} = \frac{1}{T}. \tag{1}$$

Whenever $\phi_i(t) = 1$ at reference instant $t = \tau_i$, the PCO is said to fire: it transmits a pulse and resets its phase to 0. Then $\phi_i(t)$ increases again linearly, and so on. Figure 1(a) plots the evolution of the phase function (1) during one period with initial condition $\phi_i(0) = 0$. The phase function can be seen as an internal counter that dictates the emission of pulses. In the following, we consider that all nodes have the same dynamics, that is, clock jitter is considered negligible.

2.2. Synchronization Rules. The goal of slot synchronization is to align the internal time references of all nodes, so that all PCOs fire simultaneously. To do so, the phase $\phi_i(t)$ is adjusted when a pulse is received. When coupled to others, an oscillator $i$ is receptive to the pulses of its neighbors and adjusts its phase $\phi_i(t)$. When node $j$ fires at instant $\tau_j$, the phase of node $i$ instantly increases by a value $\Delta\phi$ that depends on its current value $\phi_i(\tau_j)$:

$$\phi_i(\tau_j) \longrightarrow \phi_i(\tau_j) + \Delta\phi\bigl(\phi_i(\tau_j)\bigr). \tag{2}$$

The phase increment $\Delta\phi$ is determined by the phase response curve, which in [6] was chosen to be a linear function:

$$\phi + \Delta\phi(\phi) = \min(\alpha\phi + \beta,\ 1), \tag{3}$$

where the coupling parameters $\alpha$ and $\beta$ determine the coupling between oscillators. Figure 1(b) plots the time evolution of the phase when receiving a pulse at $t = \tau_j$. The received pulse causes the oscillator to fire early.

Provided that $\alpha > 1$ and $0 < \beta < 1$, a system of $N$ identical oscillators coupled all-to-all is always able to synchronize, so that all PCOs agree on a common reference instant, independent of initial timing misalignments [6].


Figure 2: Phase representation of the synchronization of N = 30 PCOs (phases $\phi_i$ between 0 and 1 plotted over normalized time $t/T$ from 0 to 7).

2.3. Convergence. An example of the synchronization of pulse-coupled oscillators is shown in Figure 2. Initially all nodes start with a random phase, which increments according to (1) until one phase reaches the threshold. At this instant and each time a phase reaches 1, neighboring nodes increment their phase according to (3). Over time, order emerges from a seemingly chaotic situation where nodes fire randomly, and in Figure 2, all nodes fire in synchrony within five periods.
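The convergence behaviour described above can be reproduced with a small event-driven simulation. The sketch below is an illustration only, not the simulator used for Figure 2. It assumes all-to-all coupling without delays, a period normalized to T = 1, and hypothetical coupling values α = 1.2 and β = 0.05. The function name `simulate_pco` and the rule that a group of k simultaneously firing nodes emits k pulses are likewise assumptions made for this example.

```python
import random


def simulate_pco(n=30, alpha=1.2, beta=0.05, max_periods=500.0, seed=1):
    """Return the time (in units of T) at which all nodes fire together, or None."""
    rng = random.Random(seed)
    phase = [rng.random() for _ in range(n)]
    t = 0.0
    while t < max_periods:
        # Linear phase growth, eq. (1), until the leading node reaches 1.
        dt = 1.0 - max(phase)
        t += dt
        phase = [p + dt for p in phase]
        fired = set()
        newly = {i for i, p in enumerate(phase) if p >= 1.0 - 1e-12}
        while newly:
            fired |= newly
            pulses = len(newly)
            newly = set()
            for i in range(n):
                if i in fired:
                    continue
                # Each received pulse applies the linear response curve, eq. (3).
                for _ in range(pulses):
                    phase[i] = min(alpha * phase[i] + beta, 1.0)
                if phase[i] >= 1.0:
                    newly.add(i)  # absorbed: fires immediately
        for i in fired:
            phase[i] = 0.0
        if len(fired) == n:
            return t
    return None


if __name__ == "__main__":
    print("synchronized after (periods):", simulate_pco())
```

Absorbed nodes are reset together with the node that absorbed them, so they evolve identically afterwards, which is the permanent clustering discussed in this section.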

A key feature in the synchronization of PCOs is that, over time, nodes cluster into groups of oscillators. This phenomenon is referred to as absorption and occurs when a pulse forces nodes to exceed their firing threshold, causing them to fire immediately. The absorption limit $\phi_\ell$ is derived from (3):

$$\phi_\ell = \frac{1-\beta}{\alpha}. \tag{4}$$
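The step from (3) to (4) can be spelled out as follows, writing the absorption limit as $\phi_\ell$. A node is absorbed when the updated phase in (3) saturates at 1. The numerical values $\alpha = 1.2$ and $\beta = 0.05$ in the last line are hypothetical and serve only as an illustration.

```latex
\min(\alpha\phi + \beta,\ 1) = 1
\;\Longleftrightarrow\;
\alpha\phi + \beta \ge 1
\;\Longleftrightarrow\;
\phi \ge \frac{1-\beta}{\alpha} = \phi_\ell ,
\qquad\text{e.g.}\quad
\phi_\ell = \frac{1 - 0.05}{1.2} \approx 0.79 .
```

With these example values, any node whose phase lies in $[0.79, 1]$ when a pulse arrives fires immediately and joins the group of the emitting node.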

As nodes have the same internal dynamics and are coupled all-to-all, absorptions are permanent (see Figure 2). Therefore nodes following the PCO rules first gather into groups that gradually absorb one another and, after some time, always coalesce into one synchronized group.

In [11] Lucarelli and Wang extended the demonstration of [6] to remove the all-to-all assumption. Under weak coupling assumptions, that is, $\alpha$ close to 1 and $\beta$ close to 0 in (3) (no proof for strong coupling exists), equivalent phase deviation variables are derived for each node (each variable represents the mean local interactions over one period) and are shown to asymptotically converge to the same value [11].

Unfortunately the analysis in [11] is not applicable when delays are introduced. Izhikevich showed that there is no equivalent phase deviation variable when interactions are delayed [12]. As the proposed inter-BS synchronization scheme always delays interactions (see Section 3), an analytical convergence study appears infeasible. Convergence is consequently studied through simulations in Section 6.

2.4. Impact of Delays. When delays are introduced, such as propagation delays, the coupling between two nodes $i$ and $j$ is delayed by $\nu_{ij}$. In the presence of coupling delays a network of PCOs may become unstable, and the network is unable to synchronize [13]. Stability is regained by introducing a refractory period of duration $T_{\mathrm{refr}}$ after reference instant $\tau_i$ [7]. In refractory, when $\phi_i(t) < \phi_{\mathrm{refr}}$ with $\phi_{\mathrm{refr}} = T_{\mathrm{refr}}/T$, no phase increment is possible, so that received pulses are not acknowledged. The duration of the refractory period needs to be at least twice the maximum delay between two nodes, so that echoes are not acknowledged [7]:

$$T_{\mathrm{refr}} > \max_{i,j}\, 2\nu_{ij}. \tag{5}$$

Figure 3: Synchronization of three pulse-coupled oscillators with delays.

Because of delays, nodes are no longer able to perfectly align their reference instants $\tau_i$ [7]. Nevertheless nodes converge to a stable state where reference instants are spread within an interval limited only by the coupling delays $\nu_{ij}$, as detailed for networks of two and three nodes in the remainder of this section. Further discussion on the achieved accuracy of the PCO scheme in the presence of delays is available in [14].

2.4.1. Two Nodes. The accuracy for a network of $N = 2$ nodes is bounded by the interval of reference instants leading to a stable state [7]. Suppose that the reference instants of two nodes $i$ and $j$ are aligned such that $\tau_j > \tau_i + \nu_{ij}$; then node $i$ is the forcing node that imposes its delayed reference onto node $j$. After coupling, node $j$ is pulled to the delayed timing of node $i$, $\tau_j = \tau_i + \nu_{ij}$ (as shown for nodes $i = 1$ and $j = 2$ in Figure 3), as long as the pulse of node $i$ falls within the absorption interval (4) of node $j$, that is, $\phi_j(\tau_i + \nu_{ij}) \in [\phi_\ell, 1]$. If $\tau_i > \tau_j + \nu_{ij}$, the roles are reversed: node $j$ imposes its delayed timing onto node $i$, so that after coupling $\tau_i = \tau_j + \nu_{ij}$. On the other hand, if the reference instant of node $i$ is within the range

$$\tau_i \in \bigl[\tau_j - \nu_{ij},\ \tau_j + \nu_{ij}\bigr], \tag{6}$$

the pulses from node $j$ fall into the refractory period of node $i$, and vice versa, and are thus not acknowledged. This corresponds to the stable state where the phases of both nodes are not adjusted. According to (6) the achieved accuracy is bounded by the propagation delay $\nu_{ij}$ and is given by [7]:

$$\varepsilon_{ij} \triangleq \bigl|\tau_i - \tau_j\bigr| \le \nu_{ij}. \tag{7}$$

The introduction of a refractory period thus may result in a state where one node imposes its timing onto the other, in a similar way to a master-slave synchronization scheme. However, the achieved state is random: it depends on the initial condition and on interactions with other nodes in the network. Therefore the role of the forcing node is arbitrary, and PCO synchronization is still considered decentralized.

2.4.2. Three Nodes. The analysis of [7] is extended to a network of $N = 3$ nodes in the following. Two cases are distinguished.

(i) The forcing node is directly connected with all nodes.

(ii) The forcing node is the edge node of a line topology and imposes its timing on the other edge node by hopping over the center node.

Considering (i), suppose that node 1 is the forcing node that imposes its delayed timing onto nodes 2 and 3. This state is shown in Figure 3: node 1 fires at instant $t = \tau_1$, which causes nodes 2 and 3 to increment their phases at instants $\tau_1 + \nu_{12}$ and $\tau_1 + \nu_{13}$, respectively. Assuming that their phases exceed the absorption limit (4), nodes 2 and 3 fire at instants $\tau_2 = \tau_1 + \nu_{12}$ and $\tau_3 = \tau_1 + \nu_{13}$, and subsequently enter refractory. No further phase increments occur because the pulses from nodes 2 and 3 are received while the other nodes are in refractory (5). Therefore the network is in a stable state, and the achieved accuracies of node 1 relative to nodes 2 and 3 amount to $\varepsilon_{12} = \nu_{12}$ and $\varepsilon_{13} = \nu_{13}$, respectively. Interestingly, the accuracy between nodes 2 and 3 is equal to the difference in delays with forcing node 1, that is, $\varepsilon_{23} = |\nu_{12} - \nu_{13}|$. Thus this achieved accuracy does not depend on the direct delay $\nu_{23}$ but on the delay difference with the forcing node 1.

In case (ii) the considered nodes form a line topology, where the edge nodes 1 and 3 cannot communicate directly. Suppose that node 1 is the forcing node that imposes its timing onto node 3 via the center node 2. As the accuracy between adjacent nodes is bounded by (7), that is, $\varepsilon_{12} \le \nu_{12}$ and $\varepsilon_{23} \le \nu_{23}$, the resulting accuracy interval over two hops between edge nodes 1 and 3 amounts to the sum of delays: $\varepsilon_{13} \le \nu_{12} + \nu_{23}$.
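The relations for the refractory period (5) and for the two three-node cases can be collected in a few helper functions. This is a minimal sketch. The function names and the delay values (in microseconds) are hypothetical and chosen only to make the arithmetic visible.

```python
def min_refractory(delays):
    """Lower limit on T_refr from eq. (5): twice the largest pairwise delay."""
    return 2 * max(delays.values())


def star_accuracy(nu12, nu13):
    """Case (i): node 1 forces nodes 2 and 3 directly."""
    return {"eps12": nu12, "eps13": nu13, "eps23": abs(nu12 - nu13)}


def line_accuracy_bound(nu12, nu23):
    """Case (ii): node 1 reaches node 3 via node 2, so the per-hop bounds (7) add up."""
    return nu12 + nu23


# Hypothetical pairwise delays in microseconds.
delays_us = {(1, 2): 1.0, (1, 3): 3.0, (2, 3): 2.0}
print("T_refr must exceed:", min_refractory(delays_us))
print("case (i):", star_accuracy(delays_us[(1, 2)], delays_us[(1, 3)]))
print("case (ii) bound on eps13:", line_accuracy_bound(delays_us[(1, 2)], delays_us[(2, 3)]))
```

In case (i) the misalignment between nodes 2 and 3 depends only on the delay difference toward the forcing node, whereas in case (ii) the delays of both hops accumulate.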

3. Decentralized Intercell Synchronization

This section presents an adaptation of the PCO model to perform intercell synchronization. To facilitate reliable exchange of reference instants in the presence of signal fading, interference, and noise, long synchronization sequences that are transmitted in-band with data are considered instead of pulses. Furthermore, half-duplex transmission is considered, which implies that nodes cannot receive whilst transmitting. As a consequence, when two nodes transmit sync words that partially overlap, neither node is able to detect the sync word sent by the other, an effect referred to as deafness between nodes. Both nodes are then effectively uncoupled, which may severely disrupt intercell synchronization. Further accounting for constraints in cellular systems, the frame structure does not allow for overlapping downlink and uplink slots. Thus synchronized BSs and UTs should not transmit simultaneously.

Figure 4: Synchronization regimes of pulse-coupled oscillators (panels: in-phase synchronization, anti-phase synchronization with firing instants $\tau_1$ and $\tau_2$ spaced by $T/2$, and out-of-phase synchronization with spacing $\Delta$).

The proposed cellular firefly synchronization (CelFSync) scheme takes into account these fundamental constraints by resorting to an out-of-phase synchronization regime, introduced in Section 3.1. CelFSync relies on two synchronization sequences, one transmitted by BSs to adjust timing references of UTs, and a second one transmitted by UTs to adjust timing references of BSs, based on rules that are established in Section 3.2. The detection of the two distinct synchronization sequences in an asynchronous environment is discussed in Section 3.3. For ease of explanation, propagation delays are neglected in this section and are treated specifically in Section 4.

3.1. Synchronization Regimes. A system of PCOs is said to be synchronized when all nodes have reached a stable state where their internal timing references are aligned, constrained to the considered synchronization regime [15]. The synchronization regime is characterized by the phase difference $\Delta = \tau_1 - \tau_2$ between two synchronized groups in the stable state, where members of the same group are perfectly aligned. Depending on the phase difference $\Delta$, three synchronization regimes are distinguished [15], as illustrated in Figure 4. If there is no phase shift, $\Delta = 0$, the regime is said to be in-phase. If the phase shift is exactly equal to half a period, $\Delta = T/2$, nodes have reached an antiphase synchronization regime. Finally, if the phase difference between oscillators is $\Delta \neq 0$ and $\Delta \neq T/2$ between the first and second groups (and $T - \Delta$ between the second and first groups), then oscillators are out-of-phase synchronized.
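The three regimes can be told apart with a simple check on the phase difference. The following sketch is illustrative. The function name `regime` and the numerical tolerance `tol` are assumptions introduced here, since exact equality is not meaningful for measured timing values.

```python
def regime(delta, period, tol=1e-9):
    """Classify the synchronization regime from the phase difference delta."""
    d = delta % period
    if min(d, period - d) <= tol:
        return "in-phase"
    if abs(d - period / 2.0) <= tol:
        return "anti-phase"
    return "out-of-phase"


for delta in (0.0, 0.5, 0.3):
    print(delta, regime(delta, period=1.0))
```

The difference is reduced modulo the period, so that a spacing of Δ between the first and second groups and of T − Δ between the second and first groups yield the same classification.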

The in-phase regime is the most common form of synchronization: pacemaker cells pulse simultaneously to pump the heart, and fireflies emit light at the same time. Antiphase synchronization is also familiar; when walking, our legs are antiphase synchronized: the left foot touches the ground half a period after the right one, and vice versa.

Following the frame structure of cellular systems composed of successive downlink and uplink frames, BSs are to be synchronized out-of-phase with UTs. Out-of-phase synchronization ensures that uplink and downlink transmissions in the steady state do not overlap, so that detrimental effects of deafness between nodes, inherent to half-duplex transmission, are mitigated.

Figure 5: Cellular network topology with two BSs and one UT ($\mathrm{UT}_i$ at the cell boundary, with propagation delays $\nu_{ai}$ to $\mathrm{BS}_a$ and $\nu_{bi}$ to $\mathrm{BS}_b$).

Figure 6: Synchronization principle of CelFSync ($\mathrm{UT}_i$ transmits UL Sync at $\tau_{\mathrm{UT},i}$; after the decoding delay $T_{\mathrm{BS,dec}}$, $\mathrm{BS}_a$ transmits DL Sync at $\tau_{\mathrm{BS},a}$, a time $\Delta$ later; after $T_{\mathrm{UT,dec}}$, $\mathrm{UT}_i$ fires again, $T - \Delta$ after $\mathrm{BS}_a$).

3.2. Cellular Firefly Synchronization. The goal of CelFSync is to synchronize in time the transmission slots of a cellular network, so that neighboring BSs mutually align the start of the superframe preamble. The timing information between BSs is conveyed by implicitly hopping over mobiles close to the cell edge, as exemplified in Figure 5. Hopping over UTs extends the reception range of sync words and thus allows for robust intercell synchronization, even when neighboring base stations do not hear one another.

CelFSync adapts the PCO synchronization model to establish an out-of-phase synchronization regime. The desired stable state is illustrated for one user terminal $\mathrm{UT}_i$ and one base station $\mathrm{BS}_a$ in Figure 6. Unlike the PCO model, instead of pulses, nodes transmit long synchronization sequences denoted by UL Sync and DL Sync of duration $T_{\mathrm{UL,Sync}}$ and $T_{\mathrm{DL,Sync}}$, respectively. For slot synchronization three states are distinguished: transmission of the sync word, the refractory period, and the listen state. Transmission starts when a node fires (see $\tau_{\mathrm{UT},i}$ for $\mathrm{UT}_i$ in Figure 6). Half-duplex transmission is considered: when a node transmits, its receiver is switched off. After transmission of the sync word, nodes enter the refractory period, where detected sync words are not acknowledged. In the listen state nodes maintain a phase function that is adjusted upon detection of a sync word. The separation of nodes into two predefined groups is achieved by three types of interactions, as follows.

UT-BS Coupling. Base station $\mathrm{BS}_a$ estimates the reference instant of $\mathrm{UT}_i$ by detecting its sync word UL Sync; the estimate of this reference instant is denoted by $\tau_{\mathrm{UT},i}$. In order to establish the desired out-of-phase synchronization regime, $\mathrm{BS}_a$ adjusts its phase function $\phi_{\mathrm{BS},a}$ exactly $\Delta$ seconds after $\mathrm{UT}_i$ has fired, at instant $\theta_{\mathrm{UT},i} = \tau_{\mathrm{UT},i} + \Delta$. If the coupling instant $\theta_{\mathrm{UT},i}$ falls within the listen state of $\mathrm{BS}_a$, the receiving BS increments its phase:

$$\phi_{\mathrm{BS},a}(\theta_{\mathrm{UT},i}) \longrightarrow \phi_{\mathrm{BS},a}(\theta_{\mathrm{UT},i}) + \Delta\phi_{\mathrm{BS}}\bigl(\phi_{\mathrm{BS},a}(\theta_{\mathrm{UT},i})\bigr). \tag{8}$$

The phase response curve $\Delta\phi_{\mathrm{BS}}$ is chosen according to (3), such that phase increments are strictly positive:

$$\phi + \Delta\phi_{\mathrm{BS}}(\phi) = \min(\alpha_{\mathrm{BS}}\phi + \beta_{\mathrm{BS}},\ 1). \tag{9}$$

The coupling parameters are chosen in accordance with the PCO synchronization model: $\alpha_{\mathrm{BS}} > 1$ and $0 < \beta_{\mathrm{BS}} < 1$.

The BS decoding delay $T_{\mathrm{BS,dec}}$, shown in Figure 6, specifies the interaction delay between the instant at which UL Sync is detected, $\tau_{\mathrm{UT},i} + T_{\mathrm{UL,Sync}}$, and the coupling instant $\theta_{\mathrm{UT},i} = \tau_{\mathrm{UT},i} + \Delta$. It is an important parameter for two reasons. Firstly, $T_{\mathrm{BS,dec}}$ allows for a processing delay at the receiver in order to perform link-level synchronization. Secondly, $T_{\mathrm{BS,dec}}$ needs to be appropriately chosen, so that the desired out-of-phase synchronization regime is reached. As BSs fire $\Delta$ after UTs, the BS decoding delay is given by

$$T_{\mathrm{BS,dec}} = \Delta - T_{\mathrm{UL,Sync}}. \tag{10}$$

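BS-UT Coupling. The considered user terminal $\mathrm{UT}_i$ estimates $\tau_{\mathrm{BS},a}$, the reference instant of $\mathrm{BS}_a$. If the reception of DL Sync from $\mathrm{BS}_a$ at instant $\theta_{\mathrm{BS},a} = \tau_{\mathrm{BS},a} + T - \Delta$ falls within the listen state of $\mathrm{UT}_i$, the receiving UT increments its phase:

$$\phi_{\mathrm{UT},i}(\theta_{\mathrm{BS},a}) \longrightarrow \phi_{\mathrm{UT},i}(\theta_{\mathrm{BS},a}) + \Delta\phi_{\mathrm{UT}}\bigl(\phi_{\mathrm{UT},i}(\theta_{\mathrm{BS},a})\bigr). \tag{11}$$

Again the phase response curve for BS-UT coupling $\Delta\phi_{\mathrm{UT}}$ is chosen according to (3):

$$\phi + \Delta\phi_{\mathrm{UT}}(\phi) = \min(\alpha_{\mathrm{UT}}\phi + \beta_{\mathrm{UT}},\ 1) \tag{12}$$

with the coupling parameters $\alpha_{\mathrm{UT}} > 1$ and $0 < \beta_{\mathrm{UT}} < 1$. The UT decoding delay that forces UTs to fire $T - \Delta$ after BSs is equal to (see Figure 6):

$$T_{\mathrm{UT,dec}} = T - \Delta - T_{\mathrm{DL,Sync}}. \tag{13}$$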

Thanks to this strategy, the formation of two groups is controlled. Starting from an arbitrary initial misalignment, where all reference instants $\tau_{\mathrm{UT},i}$, $\tau_{\mathrm{BS},a}$ are randomly distributed within $[0, T]$, by following simple coupling rules, reference instants of UTs and BSs separate over time into two groups; all BSs fire $\Delta$ after UTs, and all UTs fire $T - \Delta$ after BSs. This state corresponds to the synchronized state shown in Figure 6. Convergence is verified through simulations in Section 6; by appropriately selecting the coupling parameters, it is shown that synchronization is always accomplished.

To speed up the convergence of CelFSync, two enhancements are possible, namely BS-BS and UT-UT couplings and the selection of active UTs.


BS-BS and UT-UT Coupling. In case BSs can communicate directly or UTs are placed close to one another, convergence may be accelerated by allowing coupling between nodes of the same group. Moreover, the occurrence of deafness between nodes decreases because the number of nodes that are potentially coupled is increased. As half-duplex transmission is considered, BS-BS and UT-UT couplings are useful only during the coarse synchronization phase, that is, among nodes whose reference instants are misaligned by more than the sync word length.

Phase adjustments are made similarly to (8) and (11) for BSs and UTs; however, decoding delays are different, as nodes need to align in time with other nodes from their own group. Therefore the interaction delay upon detection of DL Sync and UL Sync needs to be equal to one period $T$, giving a decoding delay of $T_{\mathrm{BS\text{-}BS,dec}} = T - T_{\mathrm{DL,Sync}}$ for BSs and $T_{\mathrm{UT\text{-}UT,dec}} = T - T_{\mathrm{UL,Sync}}$ for UTs.
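The four decoding delays follow directly from (10), (13), and the same-group rule above. The sketch below gathers them in one hypothetical helper. The superframe and UL Sync durations are the WINNER values quoted in Section 5.1, while the spacing Δ and the DL Sync duration are assumed values chosen for illustration (all in microseconds).

```python
def decoding_delays(period, delta, t_ul_sync, t_dl_sync):
    """Return the decoding delays of CelFSync for the given frame parameters."""
    delays = {
        "BS": delta - t_ul_sync,              # eq. (10): BS fires delta after the UT
        "UT": period - delta - t_dl_sync,     # eq. (13): UT fires T - delta after the BS
        "BS-BS": period - t_dl_sync,          # same-group coupling: one full period
        "UT-UT": period - t_ul_sync,
    }
    if min(delays.values()) < 0:
        raise ValueError("sync words do not fit into the chosen spacing")
    return delays


# T = 5800 us and T_UL,Sync = 45 us as in Section 5.1; delta and T_DL,Sync are assumed.
print(decoding_delays(period=5800, delta=2000, t_ul_sync=45, t_dl_sync=50))
```

A negative delay would mean that the sync word is longer than the spacing it has to fit into, which is why the helper rejects such parameter sets.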

Active UT Selection. Since uplink sync words UL Sync should be heard by multiple BSs, it is reasonable to select a subset of UTs close to the cell boundary to participate in intercell synchronization. Therefore, in each cell, the base station selects the $N_{\mathrm{UT}}$ UTs with the largest propagation delay among $N_{\mathrm{UT,tot}}$ total UTs in the cell. The remaining $N_{\mathrm{UT,tot}} - N_{\mathrm{UT}}$ UTs are not active in CelFSync and follow the timing reference dictated by their closest BS, by aligning their local clocks based on DL Sync.
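The selection rule amounts to ranking the UTs of a cell by their propagation delay. A minimal sketch, assuming the BS already holds a delay estimate per UT; the identifiers and delay values (in microseconds) are hypothetical:

```python
def select_active_uts(delays, n_active):
    """Pick the n_active UTs with the largest propagation delay to the serving BS."""
    ranked = sorted(delays, key=delays.get, reverse=True)
    return ranked[:n_active]


cell_delays_us = {"UT1": 0.4, "UT2": 1.9, "UT3": 1.2, "UT4": 0.1}
print(select_active_uts(cell_delays_us, n_active=2))
```

The UTs that are not returned remain passive and simply track DL Sync of their serving BS.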

3.3. Synchronization Word Detection. CelFSync relies on the detection of transmitted DL Sync and UL Sync sequences. In the following, we assume that uplink and downlink sync words are two different random sequences, each composed of $M$ symbols. Sync word detection is carried out by the link-level synchronization unit, which cross-correlates the received signal stream $x(t)$ with the sync word $s(t)$, where $s(t) = s_{\mathrm{UL}}(t)$ if uplink sync words are to be detected, and $s(t) = s_{\mathrm{DL}}(t)$ otherwise. The output of the link-level synchronization unit $i$ is denoted by $r_i(t) = \int x(t - \tau)\, s^*(\tau)\, d\tau$. The correlator output produces a series of peaks, in a similar way to the emission of pulses in the PCO model, and detection of a sync word is declared when $r_i(t)$ exceeds the detection threshold $R$ [16].
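The threshold detection described above can be illustrated with a discrete sliding correlator. This is a simplified sketch under several assumptions: real-valued symbols, a noise-free stream, and a length-13 Barker code standing in for the random sync word so that the outcome is deterministic. The function names `correlate` and `detect` are hypothetical.

```python
# Length-13 Barker code, used here only as a stand-in for a sync word.
BARKER13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]


def correlate(x, s):
    """Sliding correlation of the received stream x with the sync word s."""
    m = len(s)
    return [sum(x[n + k] * s[k] for k in range(m)) for n in range(len(x) - m + 1)]


def detect(x, s, threshold):
    """Return the offsets at which the correlator output reaches the threshold R."""
    return [n for n, r in enumerate(correlate(x, s)) if r >= threshold]


# The sync word starts at sample 20 of an otherwise silent stream.
stream = [0] * 20 + BARKER13 + [0] * 20
print(detect(stream, BARKER13, threshold=10))  # prints [20]
```

The correlator peaks at the offset where the sync word starts, which plays the role of the pulse in the PCO model. With payload data and noise in the stream, the side peaks grow and the choice of R becomes the trade-off between missed detections and false alarms discussed next.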

Signal fading may attenuate the received signal $x(t)$, which may result in a missed detection. The probability that reference instants $\tau_{\mathrm{UT},i}$ and $\tau_{\mathrm{BS},a}$ are correctly detected is defined as [17]

$$P_d = \Pr\bigl[r_i(t) \ge R \mid H\bigr], \tag{14}$$

where $H$ is the hypothesis that a sync word is present at the receiver. On the other hand, as sync words are transmitted in-band, cross-correlation of $s(t)$ with other sync words, payload data, or noise produces spurious peaks, so that detection of a sync word may be declared although no sync word is present, giving rise to a false alarm. The false alarm probability is defined as [17]

$$P_{\mathrm{fa}} = \Pr\bigl[r_i(t) \ge R \mid \bar{H}\bigr], \tag{15}$$

where $\bar{H}$, the hypothesis that no sync word is present at the receiver, is the complement of $H$.

The Neyman-Pearson criterion is used to design the sync word detector [17]: the detection threshold $R$ is set according to the desired false alarm rate $P_{\mathrm{fa}}$; once $R$ is set, the detection rate $P_d$ is determined. The impact of false alarm and detection rates on an adaptation of the PCO model to ad hoc networks was studied for a multicarrier system in [18]. It was shown that false alarms have a higher impact on the convergence than missed detections, which occur with probability $1 - P_d$. Hence, it is necessary to maintain a sufficiently low false alarm rate [18].

The reliability of the link-level synchronization unit can be enhanced by increasing the length of the sync word $M$. Increasing $M$ improves the detection rate for a given false alarm rate, at the expense of higher overhead [18].
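The interplay between R, the false alarm rate, and M can be made concrete under a simple statistical model. The sketch below is not the detector analyzed in [17, 18]. It assumes a real-valued correlator output, independent Gaussian noise of standard deviation `sigma_n` per symbol, and coherent reception of the sync word with symbol amplitude `amplitude`. All names are hypothetical.

```python
import math
from statistics import NormalDist


def np_threshold(pfa, m, sigma_n=1.0):
    """Threshold R such that noise alone exceeds it with probability pfa."""
    sigma_r = sigma_n * math.sqrt(m)  # std of the correlator output without sync word
    return sigma_r * NormalDist().inv_cdf(1.0 - pfa)


def detection_rate(pfa, m, amplitude=0.5, sigma_n=1.0):
    """Detection rate P_d obtained once R is fixed by the false alarm rate."""
    sigma_r = sigma_n * math.sqrt(m)
    r = np_threshold(pfa, m, sigma_n)
    # With a sync word present the correlator peak has mean m * amplitude.
    return 1.0 - NormalDist(mu=m * amplitude, sigma=sigma_r).cdf(r)


for m in (16, 64, 256):
    print(m, round(np_threshold(1e-3, m), 2), round(detection_rate(1e-3, m), 3))
```

Under these assumptions the peak grows linearly with M while the noise spread grows only with the square root of M, so the detection rate at a fixed false alarm rate increases with the sync word length.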

4. Compensation of Propagation Delays

The accuracy of CelFSync is limited by propagation delays, similarly to the PCO model discussed in Section 2. In an indoor environment where distances between nodes are typically small, propagation delays are negligible. However, for cellular systems where the inter-BS distance is up to a few kilometers, Section 4.1 reveals that propagation delays cannot be ignored. A common procedure to align uplink transmissions is the timing advance procedure, described in Section 4.2. Timing advance is combined with CelFSync in Section 4.3 to achieve a timing accuracy within a fraction of the inter-BS propagation delays.

4.1. Achieved Accuracy in the Stable State. After CelFSync converges and reaches a stable state, reference instants of BSs and UTs are out-of-phase synchronized (see Figure 6), and no phase increments occur. In the following discussion a sufficient refractory period (5) is assumed; then stability is maintained and the achieved timing accuracy in the stable state between any two nodes is bounded by (7). In the presence of propagation delays, the stable state condition (6) in terms of the reference instants of $\mathrm{BS}_a$ and $\mathrm{UT}_i$ translates to

$$\tau_{\mathrm{BS},a} \in \bigl[\tau_{\mathrm{UT},i} + \Delta - \nu_{ai},\ \tau_{\mathrm{UT},i} + \Delta + \nu_{ai}\bigr], \tag{16}$$

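where $\nu_{ai}$ is the propagation delay between $\mathrm{BS}_a$ and $\mathrm{UT}_i$. The upper bound in (16), $\tau_{\mathrm{BS},a} = \tau_{\mathrm{UT},i} + \Delta + \nu_{ai}$, is reached when $\mathrm{UT}_i$ is the forcing node that imposes its timing onto $\mathrm{BS}_a$. Likewise, the lower bound in (16), $\tau_{\mathrm{UT},i} = T - \Delta + \tau_{\mathrm{BS},a} + \nu_{ai}$, is reached when $\mathrm{BS}_a$ is the forcing node that imposes its timing onto $\mathrm{UT}_i$.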

The effect of propagation delays on the achieved inter-BS accuracy in the stable state is analyzed with the aid of a case study, where two BSs are synchronized via one UT, as depicted in Figure 5. This case study resembles the discussion for a network with $N = 3$ nodes presented in Section 2.4.2. Clearly, the worst case inter-BS timing misalignment is encountered when one BS is the forcing node. Then the two end nodes $\mathrm{BS}_a$ and $\mathrm{BS}_b$ synchronize by hopping over $\mathrm{UT}_i$, so that the timing misalignments over two hops add up. Applying the bound (16), the inter-BS accuracy is upper bounded by the sum of the $\mathrm{BS}_a$ to $\mathrm{UT}_i$ and $\mathrm{UT}_i$ to $\mathrm{BS}_b$ propagation delays:

$$\bigl|\tau_{\mathrm{BS},b} - \tau_{\mathrm{BS},a}\bigr| \le \nu_{ai} + \nu_{bi}. \tag{17}$$

Given that in cellular networks the inter-BS distance is up to a few kilometers, propagation delays have a major impact on the achieved accuracy in the stable state.

4.2. Timing Advance Procedure. As UTs are arbitrarily distributed within the cell, the distance $d_{ai}$ between $\mathrm{UT}_i$ and $\mathrm{BS}_a$ varies. Since propagation delays are distance dependent through $\nu_{ai} = d_{ai}/c$, where $c$ is the speed of light, the observed timing references of $\mathrm{BS}_a$ measured at different UTs, denoted $\hat{\tau}_{\mathrm{BS},a} = \tau_{\mathrm{BS},a} + \nu_{ai}$, are mutually different. To ensure that uplink transmissions arrive simultaneously at their own base station, timing advance is a common procedure in current cellular systems [19] and in wired telecommunication systems [20]. For timing advance, $\mathrm{UT}_i$ advances its transmission by $\nu_{ai}$, the propagation delay to its serving BS, taken to be $\mathrm{BS}_a$ (see Figure 5). The uplink reference instant of $\mathrm{UT}_i$ including timing advance is given by

$$\tau_{\mathrm{UTA},i} = \tau_{\mathrm{UT},i} - \nu_{ai}. \tag{18}$$

The propagation delay $\nu_{ai}$ may be determined by estimating the round trip delay between $\mathrm{BS}_a$ and $\mathrm{UT}_i$ [21]. Upon reception of DL Sync from $\mathrm{BS}_a$, $\mathrm{UT}_i$ responds with the transmission of a random access preamble (RAP) at $\tau_{\mathrm{RAP},i} = \tau_{\mathrm{BS},a} + T_{\mathrm{RAP}}$. Since $T_{\mathrm{RAP}}$ is a constant known to $\mathrm{BS}_a$, the round trip delay $2\nu_{ai}$ is determined by detecting the received timing of the RAP at $\mathrm{BS}_a$. In addition, the RAP identifies $\mathrm{UT}_i$, so that $\mathrm{BS}_a$ can distribute the estimate of $\nu_{ai}$ to $\mathrm{UT}_i$.
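The round trip measurement can be sketched in a few lines. The example assumes an idealized, noise-free detection of the RAP arrival time at the BS and that the UT times its response relative to the DL Sync instant it observes. The function names, the distance of 600 m, and the value of the RAP offset are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def estimate_delay(t_rap_rx, tau_bs, t_rap):
    """One-way delay nu_ai from the RAP arrival time observed at the BS."""
    return (t_rap_rx - tau_bs - t_rap) / 2.0


def advanced_instant(tau_ut, nu_ai):
    """Eq. (18): uplink reference instant including timing advance."""
    return tau_ut - nu_ai


nu_true = 600.0 / C        # hypothetical UT located 600 m from its serving BS
tau_bs, t_rap = 0.0, 1e-3  # hypothetical DL Sync instant and RAP offset in seconds
# Downlink delay, then the known offset T_RAP, then the uplink delay.
t_rap_rx = tau_bs + nu_true + t_rap + nu_true
nu_est = estimate_delay(t_rap_rx, tau_bs, t_rap)
print(f"estimated one-way delay: {nu_est * 1e6:.3f} us")
print("advanced uplink instant:", advanced_instant(2e-3, nu_est))
```

Because the offset is known to the BS, everything beyond it in the measured interval is attributed to the two propagation legs, and halving that remainder gives the one-way delay that the BS reports back to the UT.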

4.3. CelFSync with Timing Advance. In order to combat propagation delays, we propose to combine CelFSync with the timing advance procedure. If $\mathrm{UT}_i$ knows the propagation delay to its serving base station $\mathrm{BS}_a$, the corresponding round trip delay of $2\nu_{ai}$ can be compensated. Owing to the multi-point-to-point topology specific to cellular networks, $\mathrm{BS}_a$ of cell $\mathcal{A}$ typically serves several mobiles $\mathrm{UT}_i$, $i \in \mathcal{A}$, each with a specific propagation delay $\nu_{ai}$. Hence, all timing inaccuracies, that is, the propagation delays from $\mathrm{BS}_a$ to $\mathrm{UT}_i$ and back from $\mathrm{UT}_i$ to $\mathrm{BS}_a$, must be compensated for at the mobile $\mathrm{UT}_i$. This is accomplished by advancing both the transmitted UL Sync and the coupling of the received DL Sync at $\mathrm{UT}_i$ by the BS-UT propagation delay $\nu_{ai}$.

For the following discussion, suppose that $\mathrm{UT}_i$ has carried out the timing advance procedure with $\mathrm{BS}_a$, but its UL Sync transmission is received by $\mathrm{BS}_b$.

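UT-BS Coupling. For CelFSync with timing advance, $\mathrm{UT}_i$ sends the uplink sync word UL Sync at the advanced reference instant $\tau_{\mathrm{UTA},i} = \tau_{\mathrm{UT},i} - \nu_{ai}$ in (18). Then a phase increment occurs at $\mathrm{BS}_b$ at instant $\theta_{\mathrm{UTA},i} = \tau_{\mathrm{UTA},i} + \Delta + \nu_{bi}$, so that (8) is transformed to

$$\phi_{\mathrm{BS},b}(\theta_{\mathrm{UTA},i}) \longrightarrow \phi_{\mathrm{BS},b}(\theta_{\mathrm{UTA},i}) + \Delta\phi_{\mathrm{BS}}\bigl(\phi_{\mathrm{BS},b}(\theta_{\mathrm{UTA},i})\bigr) \tag{19}$$

with $\theta_{\mathrm{UTA},i} = \tau_{\mathrm{UT},i} + \Delta + \nu_{bi} - \nu_{ai}$.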

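Figure 7: Combination of CelFSync with timing advance ($\mathrm{UT}_i$ transmits UL Sync at $\tau_{\mathrm{UTA},i}$, advanced by $\nu_{ai}$; $\mathrm{BS}_a$ and $\mathrm{BS}_b$ couple after $T_{\mathrm{BS,dec}} + \nu_{ai}$ and $T_{\mathrm{BS,dec}} + \nu_{bi}$, respectively, and transmit DL Sync; the coupling instant $\theta_{\mathrm{UTA},i}$ is marked on the time axis).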

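BS-UT Coupling. For BS-UT coupling (11), we propose to also advance the coupling by the propagation delay. So given that $\mathrm{UT}_i$ is timing aligned to $\mathrm{BS}_a$, but receives DL Sync from $\mathrm{BS}_b$, the mobile $\mathrm{UT}_i$ advances its coupling by $\nu_{ai}$. Then the received DL Sync from $\mathrm{BS}_b$ leads to a phase increment at $\mathrm{UT}_i$ at instant $\theta_{\mathrm{BSA},b} = \theta_{\mathrm{BS},b} - \nu_{ai}$, so that (11) changes to

$$\phi_{\mathrm{UT},i}(\theta_{\mathrm{BSA},b}) \longrightarrow \phi_{\mathrm{UT},i}(\theta_{\mathrm{BSA},b}) + \Delta\phi_{\mathrm{UT}}\bigl(\phi_{\mathrm{UT},i}(\theta_{\mathrm{BSA},b})\bigr) \tag{20}$$

with $\theta_{\mathrm{BSA},b} = \theta_{\mathrm{BS},b} - \nu_{ai} = \tau_{\mathrm{BS},b} - \Delta + \nu_{bi} - \nu_{ai}$.

Figure 7 summarizes the proposed combination of CelFSync with timing advance: $\mathrm{UT}_i$ starts transmission at $\tau_{\mathrm{UTA},i} = \tau_{\mathrm{UT},i} - \nu_{ai}$, so that the coupling at $\mathrm{BS}_a$ occurs exactly at $\tau_{\mathrm{BS},a} = \tau_{\mathrm{UT},i} + \Delta$; in return, $\mathrm{BS}_a$ starts transmission of its sync word, whose decoding time is reduced at $\mathrm{UT}_i$ by $\nu_{ai}$ so that $\mathrm{UT}_i$ fires exactly $T - \Delta$ after $\mathrm{BS}_a$. Hence, all entities within one cell are perfectly timing aligned, and thus the only remaining source of timing inaccuracies is between entities of neighboring cells.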

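In the synchronized steady state, sync words observed at $\theta_{\mathrm{UTA},i}$ and $\theta_{\mathrm{BSA},b}$ must fall into the refractory period, such that $\tau_{\mathrm{BS},b} \le \theta_{\mathrm{UTA},i} < \tau_{\mathrm{BS},b} + T_{\mathrm{refr}}$ for UT-BS coupling, and $\tau_{\mathrm{UT},i} \le \theta_{\mathrm{BSA},b} < \tau_{\mathrm{UT},i} + T_{\mathrm{refr}}$ for BS-UT coupling. The steady state accuracy between $\mathrm{BS}_b$ and $\mathrm{UT}_i$ is bounded by the two extreme cases when either $\mathrm{BS}_b$ or $\mathrm{UT}_i$ is the forcing node. In case $\mathrm{UT}_i$ is forcing, the observed timing at $\mathrm{BS}_b$ yields $\tau_{\mathrm{BS},b} = \tau_{\mathrm{UT},i} + \Delta + \nu_{bi} - \nu_{ai}$. Otherwise, if $\mathrm{BS}_b$ is forcing, the timing imposed on $\mathrm{UT}_i$ amounts to $\tau_{\mathrm{UT},i} = \tau_{\mathrm{BS},b} - \Delta + \nu_{bi} - \nu_{ai}$. This means that the achieved accuracy in the steady state between $\mathrm{BS}_b$ and $\mathrm{UT}_i$ is bounded by

$$\tau_{\mathrm{BS},b} \in \bigl[\tau_{\mathrm{UT},i} + \Delta - |\nu_{bi} - \nu_{ai}|,\ \tau_{\mathrm{UT},i} + \Delta + |\nu_{bi} - \nu_{ai}|\bigr]. \tag{21}$$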

Therefore combining timing advance with CelFSync always achieves an accuracy that is bounded by the difference of UT-BS propagation delays.

In order to analyze the achieved inter-BS accuracy, the case study depicted in Figure 5 and discussed in Section 4.1 is revisited. Given that $\mathrm{UT}_i$ is time aligned to $\mathrm{BS}_a$, that is, $\tau_{\mathrm{UTA},i} = \tau_{\mathrm{UT},i} - \nu_{ai}$, the only remaining source of inaccuracies is the link from $\mathrm{UT}_i$ to $\mathrm{BS}_b$, so that the UT-BS accuracy bound (21) can be directly applied. Substituting $\tau_{\mathrm{UT},i} = \tau_{\mathrm{BS},a} - \Delta$ into (21), the inter-BS accuracy between $\mathrm{BS}_a$ and $\mathrm{BS}_b$ over two hops is bounded by

$$\bigl|\tau_{\mathrm{BS},b} - \tau_{\mathrm{BS},a}\bigr| \le \bigl|\nu_{ai} - \nu_{bi}\bigr|. \tag{22}$$

Provided that $\mathrm{UT}_i$ is located near the cell boundary, its propagation delays to $\mathrm{BS}_a$ and $\mathrm{BS}_b$ are similar, so that the difference $|\nu_{ai} - \nu_{bi}|$ is much smaller than the individual delays $\nu_{ai}$ and $\nu_{bi}$. This is in sharp contrast to the achieved accuracy without timing advance in (17), which is bounded by the sum of propagation delays. Increasing the UT density per cell $N_{\mathrm{UT,tot}}$ increases the probability of selected UTs being close to the cell edge, which has the appealing effect that the inter-BS accuracy (22) improves. The accuracy bound is extended to multiple UTs in the Appendix.
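The contrast between the bounds (17) and (22) is easy to quantify for a given geometry. In the sketch below the distances of 450 m and 550 m between the UT and the two BSs are hypothetical example values, and the function names are introduced only for this illustration.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def bound_without_ta(d_ai, d_bi):
    """Eq. (17): inter-BS bound without timing advance (sum of delays), in seconds."""
    return (d_ai + d_bi) / C


def bound_with_ta(d_ai, d_bi):
    """Eq. (22): inter-BS bound with timing advance (delay difference), in seconds."""
    return abs(d_ai - d_bi) / C


d_ai, d_bi = 450.0, 550.0  # hypothetical UT-BS distances in meters
print(f"without timing advance: {bound_without_ta(d_ai, d_bi) * 1e6:.2f} us")
print(f"with timing advance:    {bound_with_ta(d_ai, d_bi) * 1e6:.2f} us")
```

For this geometry the bound shrinks by a factor of ten, and it vanishes altogether for a UT that is equidistant from both BSs.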

The working principle of CelFSync including timing advance is summarized as follows.

(i) $\mathrm{UT}_i$ connects to the BS with the strongest received signal strength, assumed to be $\mathrm{BS}_a$.

(ii) $\mathrm{UT}_i$ aligns its timing to $\mathrm{BS}_a$ by carrying out a timing advance procedure, as described in Section 4.2.

(iii) If identified as active, $\mathrm{UT}_i$ emits UL Sync at reference instants $\tau_{\mathrm{UTA},i}$ in (18) and adjusts its phase $\phi_{\mathrm{UT},i}$ upon reception of DL Sync according to (20).

5. Implementation Aspects

In order to integrate CelFSync into a cellular mobile radio standard, several practical constraints need to be taken into consideration. Constraints regarding the frame structure and the chosen duplexing scheme are addressed in this section.

5.1. Frame Structure. CelFSync is implemented and verified based on the frame structure taken from the specifications of the Wireless World Initiative New Radio (WINNER, URL: http://www.ist-winner.org) system concept [22]. Consecutive downlink and uplink slots constitute one frame, and a number of successive frames form one superframe of duration $T$. One uplink and one downlink sync word, UL Sync and DL Sync, are placed into the superframe with a relative spacing of $\Delta$, as illustrated in Figure 4.

The downlink sync word DL Sync allows UTs to synchronize to their BS and is therefore essential for cellular networks. Unlike DL Sync, the insertion of the uplink sync word UL Sync adds overhead, as UL Sync is typically not required in current cellular networks. Fortunately, this overhead is modest as UL Sync is typically transmitted with low rate. For the WINNER system the respective durations for superframe and UL Sync are 5.8 ms and 45 μs. Hence the resulting overhead is less than 1% [22].
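The overhead figure follows directly from the two durations quoted above:

```latex
\frac{T_{\mathrm{UL,Sync}}}{T}
= \frac{45\ \mu\mathrm{s}}{5.8\ \mathrm{ms}}
= \frac{45}{5800}
\approx 0.0078 = 0.78\% < 1\% .
```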

5.2. Acquisition and Tracking Modes. An intrinsic property of PCO synchronization is that coupling between nodes effectively shortens period $T$. However, cellular systems typically rely on a fixed frame structure, which specifies the way uplink and downlink slots are arranged to exchange payload data. Consequently, while the reception of payload data is still ongoing, CelFSync may shorten the interval between two successive reference instants to $T' \le T$, which effectively shortens the duration of the superframe.

As long as the effective period $T'$ is only slightly shortened, such that $T - T' \le \epsilon$, insertion of a guard time with duration $T_G > \epsilon$ ensures that reception of payload data is completed before a sync word is transmitted. The condition $T - T' \le \epsilon$ corresponds to the tracking mode in the steady synchronization state, where small offsets due to clock skews, leading to deviations of the natural oscillation period $T$ between nodes, are compensated.

In case of coarse timing misalignments between cells, so that $T - T' > \epsilon$, the network is in acquisition mode. Potential conflicts in acquisition mode are avoided by

(i) suspending payload data transmission while intercell synchronization is in progress;

(ii) shortening the superframe duration to $T_{\mathrm{sf}} < T$.

Scheme (i) does not allow for exchange of payload data before CelFSync has reached a steady state. Given that a steady state is likely to be maintained for hours or even days, while CelFSync typically converges within a fraction of a second or so, the loss in system throughput due to suspended data transmissions may be acceptable. For instance, scheme (i) is applied to facilitate the synchronization procedure in the wireless LAN standard 802.11 [23, 24]: periodically, data transfer is preempted, and the access point transfers its clock value, known as timing synchronization function (TSF), to the network participants.

Scheme (ii) avoids conflicts by forcing the effective period T′ to be at least as long as Tsf. By doing so, continuous exchange of payload data is maintained, at the expense of reducing the throughput during acquisition by about (T − Tsf)/T.
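The mode conditions and the throughput penalty of scheme (ii) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the numerical values other than T = 5.89 ms are made up for the example.

```python
def sync_mode(T, T_eff, eps):
    """Classify the operating mode from the nominal period T and the
    effective (shortened) period T_eff, using the threshold eps."""
    if T_eff > T:
        raise ValueError("coupling only shortens the period, so T_eff <= T is expected")
    return "tracking" if (T - T_eff) <= eps else "acquisition"


def guard_time_sufficient(T_guard, eps):
    """Tracking mode is safe when the guard time exceeds eps (TG > eps)."""
    return T_guard > eps


def acquisition_throughput_loss(T, T_sf):
    """Approximate relative throughput loss of scheme (ii): (T - Tsf) / T."""
    return (T - T_sf) / T


# Illustrative values in milliseconds (only T = 5.89 ms is a default of the paper).
print(sync_mode(5.89, 5.88, eps=0.05))          # small shortening
print(sync_mode(5.89, 5.00, eps=0.05))          # coarse misalignment
print(acquisition_throughput_loss(5.89, 5.30))  # fraction of throughput lost
```

The sketch keeps the two decisions separate on purpose: the mode depends only on T − T′, whereas the throughput loss of scheme (ii) depends only on the chosen Tsf.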

5.3. Duplexing Scheme. CelFSync is applicable to both time division duplex (TDD) and frequency division duplex (FDD). Nodes adjust their internal clocks based on received sync words; whether the uplink and downlink sync words are transmitted on different frequency bands or not is irrelevant. The discussion in this paper targets half-duplex transmission, where nodes cannot receive and transmit at the same time, applicable to TDD and half-duplex FDD. Full-duplex FDD benefits CelFSync, since nodes can transmit and receive simultaneously, which eliminates deafness due to missed sync words whilst transmitting.

5.4. Imposing a Global Timing Reference. An inherent problem of any distributed synchronization procedure is that nodes agree on a relative time reference, that is, one that is valid only among the considered nodes and has no external tie. Such a relative reference is opposed to a global time reference such as the Coordinated Universal Time, which is provided by GPS for example. Furthermore, as the size of the network increases, it becomes increasingly difficult to synchronize the entire network in a completely decentralized manner. To avoid this difficulty, in [25] a scenario was considered where only a few nodes have access to a global time reference. The PCO model was extended such that these master nodes impose a global time reference on the entire network, even though the number of master nodes was only a small fraction of the total number of nodes in the network. Furthermore, the behavior of normal nodes that do not have access to a global time reference is not modified at all.

Applied to CelFSync, a subset of BSs gets access to a global time reference. These master BSs emit downlink sync words DL Sync with a slightly shortened period Tma < T, and are not receptive to sync words from other nodes [25]. Neighboring cells then align their reference instants following the synchronization rules outlined in Section 3.2. It was demonstrated in [25] that for 0.9T ≤ Tma < T, arbitrarily large networks are reliably synchronized. By doing so, the problem of synchronizing large networks with a distributed algorithm is reduced to synchronizing a number of cells (typically up to 2 or 3 tiers) around a master BS.

6. Performance Evaluation

To evaluate the performance of CelFSync, two deployment scenarios are considered: first, an indoor office scenario in Section 6.1; and second, a macrocell deployment modeled by a hexagonal cell structure in Section 6.2 [26]. All nodes transmit with the same power Pt. The propagation channel between nodes i and j is modeled as a distance-dependent pathloss channel. Node j receives the transmission of a node i at a distance $d_{ij}$ with power $P_t d_{ij}^{-\chi}$, where χ is the pathloss exponent. The signal-to-interference-plus-noise ratio (SINR) of a received sync word is the received power of the sync word divided by the level of interference plus thermal noise with power N0. The detection threshold is set for a given false alarm rate, which enables the computation of the detection probability Pd for each received sync word as a function of the current SINR (see Section 3.3). Unless otherwise stated, the parameters shown in Table 1 are used in the simulations.
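The pathloss and SINR definitions above can be illustrated with a short Python sketch. It assumes that distances are expressed in metres relative to a 1 m reference (so that the received power is simply Pt·d^(−χ) in linear units); that normalization, the helper names, and the example distance are assumptions made here, not details from the paper.

```python
import math


def dbm_to_mw(p_dbm):
    """Convert a power level from dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)


def mw_to_dbm(p_mw):
    """Convert a power level from milliwatts to dBm."""
    return 10.0 * math.log10(p_mw)


def received_power_mw(pt_dbm, distance_m, chi):
    """Distance-dependent pathloss: received power Pt * d^(-chi), in mW."""
    return dbm_to_mw(pt_dbm) * distance_m ** (-chi)


def sinr_db(signal_mw, interferers_mw, noise_dbm):
    """SINR in dB: signal power over (sum of interference + thermal noise)."""
    denominator = sum(interferers_mw) + dbm_to_mw(noise_dbm)
    return 10.0 * math.log10(signal_mw / denominator)


# Default values Pt = 10 dBm, chi = 4 (indoor), N0 = -93 dBm; the 10 m link and
# the 20 m interferer are hypothetical.
signal = received_power_mw(10.0, 10.0, 4)
interferer = received_power_mw(10.0, 20.0, 4)
print(mw_to_dbm(signal))
print(sinr_db(signal, [], -93.0))            # noise-limited case
print(sinr_db(signal, [interferer], -93.0))  # one concurrent transmitter
```

Working in linear milliwatts for the sum and converting to dB only at the end avoids the common mistake of adding interference powers in the logarithmic domain.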

Both environments impose different strains on CelFSync. In the indoor environment, sync words are subject to a high level of interference from other transmitting UTs. In the outdoor environment, the large distance between UTs and BSs results in higher channel attenuations, creating a more sparsely connected network, which implies that network synchronization is to be carried out over multiple hops.

In both scenarios, Monte-Carlo simulations are conducted for 5000 sets of initial conditions: all BSs initially commence with uniformly distributed internal timing references, while UTs are locally synchronized to their closest BS. Synchronization is declared when two groups have formed, so that reference instants of UTs are aligned and out-of-phase synchronized with reference instants of BSs, with a relative timing difference of Δ.

Table 1: Default simulation parameters.

Parameter               Symbol      Default value (Indoor / Macrocell)
Transmit power          Pt          10 dBm
Pathloss exponent       χ           4 / 3
Noise level             N0          −93 dBm
False alarm rate        Pfa         10^−4
Sync word length        M           32 symbols
Superframe duration     T           5.89 ms
Out-of-phase offset     Δ           0.11 ms
BS refractory           TBS,refr    2.33 ms
BS coupling             αBS         1.15
                        βBS         0.01
UT refractory           TUT,refr    2.33 ms
UT coupling             αUT         1.3
                        βUT         0.01
Number of BSs           NBS         4 BSs / 19 BSs
Number of active UTs    NUT         15 UTs / 3 UTs/cell

6.1. Indoor Office Environment. An indoor office with two corridors and ten offices on each side is considered. This setting was defined for the local area scenario in WINNER [27]. The network topology with NBS = 4 BSs and NUT = 15 UTs participating in CelFSync is depicted in Figure 8. The selected UTs (marked as bold circles) can communicate directly with all BSs (marked as squares). UTs that do not participate in the network synchronization procedure do not transmit UL Sync and adjust their slot oscillator based on received DL Sync.

Figure 8: Considered indoor network topology. [Plot of BS and UT positions; axes X (m), 0 to 100, and Y (m), 0 to 50.]

Results plotted in Figure 9 elaborate on the time taken for the entire network to synchronize. The time to synchrony Tsync is normalized to the duration of a superframe T. Figure 9 plots the cumulative distribution function (CDF) of the normalized time to synchrony for different values of the BS-UT coupling factor αUT.

Figure 9: CDF of the normalized time to synchrony in the considered indoor environment when varying the BS-UT coupling. [Plot of the CDF of Tsync versus Tsync/T, 0 to 10, for αUT = 1.25, 1.3, 1.4, and 1.5.]

The performance of the proposed inter-BS synchronization scheme can be controlled by the coupling factor αUT. For a high coupling value, αUT > 1.3, synchronization is reached quickly, but convergence to a synchronized stable state is not always achieved. The fraction of initial conditions that do not converge to this state is due to deafness among nodes: some part of the network transmits partially overlapping DL Sync and UL Sync sequences, and due to the half-duplex assumption, some nodes are thus not able to synchronize. The deafness probability increases with the coupling factor αUT, and for αUT = 1.5, it is approximately 10%. If the coupling is low, αUT ≤ 1.3, synchronization is always reached within Tsync = 10 periods, and for αUT = 1.3, 80% of initial conditions lead to synchrony within Tsync = 5 periods. This is encouraging given the fact that deafness among nodes does not occur when αUT ≤ 1.3, even though nodes start with a random initial timing reference. Setting αUT sufficiently low reduces the absorption limit (4), which allows nodes to receive more sync words in the synchronization phase. This lowers the deafness probability, and enables the network to synchronize starting from any initial timing misalignment.
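To make the reading of such CDF curves concrete, the following Python sketch shows one way a synchronization rate could be computed from Monte-Carlo outcomes, counting runs that never synchronize (for example, due to deafness) in the denominator so that the curve saturates below 1. The function name and the sample data are hypothetical and are not results from the paper.

```python
def sync_rate_within(times_to_sync, n_runs, horizon):
    """Fraction of all n_runs Monte-Carlo runs that reached synchrony within
    `horizon` periods. Runs that never synchronize are absent from
    `times_to_sync` but still count in `n_runs`."""
    if n_runs <= 0 or len(times_to_sync) > n_runs:
        raise ValueError("n_runs must be positive and cover all recorded runs")
    return sum(1 for t in times_to_sync if t <= horizon) / n_runs


# Hypothetical outcome of 10 runs (normalized times Tsync/T); 2 runs never converged.
observed = [2, 3, 3, 4, 5, 6, 7, 9]
for horizon in (5, 10):
    print(horizon, sync_rate_within(observed, n_runs=10, horizon=horizon))
```

With this convention, the gap between the plateau of the curve and 1 is the fraction of initial conditions that never reach synchrony.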

6.2. Macrocell Deployment. For cellular networks, a hexagonal cell structure is considered as shown in Figure 10. One or two tiers of BSs are placed around a center BS, resulting in a network of NBS = 7 and NBS = 19 BSs, respectively, each with a cell radius of dcell = 1 km. The number of active UTs per cell, NUT, specifies the number of UTs that participate in CelFSync. Among the NUT,tot UTs randomly placed in each cell, the NUT UTs closest to the cell edge are selected as active.

6.2.1. Time to Synchrony. In a similar manner to Figure 9, results plotted in Figure 11 depict the time to synchrony of CelFSync in a hexagonal cell deployment for NBS = 7 BSs and NBS = 19 BSs. Coupling among UTs is also considered with strength αUT-UT = 1.05.

Figure 10: Macrocell network topology composed of NBS = 7 hexagonal cells with NUT = 3 active UTs per cell. [Plot of base stations, active UTs, and other UTs; axes X (m) and Y (m), 0 to 6000.]

Figure 11: CDF of the normalized time to synchrony for a hexagonal cell deployment scenario with NBS = 7 and NBS = 19 base stations. [Plot of the CDF of Tsync versus Tsync/T, 0 to 50, for NUT = 3, 5, and 7 UTs/cell.]

As expected, networks of NBS = 19 BSs converge less rapidly than smaller networks of NBS = 7 BSs. This degradation is due to the increase in network diameter from 4 hops to 8 hops. Moreover, the number of UTs per cell participating in CelFSync, NUT, does not significantly change the time to synchrony, and a synchrony rate of 80% is achieved within 12T when NBS = 7 BSs and within 25T when NBS = 19 BSs. In all cases, a synchronization rate of 100% is achieved within Tsync = 50 periods, which means that deafness between nodes, due to partially overlapping sync words, does not corrupt the convergence of CelFSync.

6.2.2. Achieved Inter-BS Accuracy. While in an indoor environment propagation delays are typically negligible, the opposite is true for the macrocell deployment (17). The achieved inter-BS accuracy εab = |τBS,b − τBS,a| of CelFSync including timing advance is verified in Figure 12 for various node densities NUT,tot. Simulations are conducted over 100 random network topologies, each with 200 sets of initial conditions. It is assumed that UTs are timing aligned with their closest BS, and that the number of active UTs per cell is set to NUT = 3 UTs per cell.

Figure 12: Achieved inter-BS accuracy for NBS = 19 BSs with timing advance and NUT = 3 active UTs. [Plot of the CDF of εab versus εab (μs), 0 to 1.5, for NUT,tot = 5, 10, 25, and 50 UTs/cell.]

As the accuracy bound (22) suggests, the inter-BS accuracy εab is significantly improved as the node density NUT,tot increases. Augmenting NUT,tot increases the probability for selected UTs to be close to the cell edge, which decreases the delay difference νbi − νai in (22). For a UT density of NUT,tot ≥ 25 UTs per cell, the achieved accuracy is bounded by εab < 0.5 μs. This is a significant achievement, as the propagation delay for an inter-BS distance of 2 dcell = 2 km is νab ≈ 6.67 μs.
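The quoted propagation delay is simply distance divided by the speed of light. A minimal Python check, with a hypothetical helper name, is given below; it also relates an accuracy of 1 μs to that delay.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def propagation_delay_us(distance_m):
    """Line-of-sight propagation delay in microseconds for a given distance."""
    return distance_m / SPEED_OF_LIGHT_M_PER_S * 1e6


delay_2km = propagation_delay_us(2000.0)
print(round(delay_2km, 2))        # inter-BS delay in microseconds
print(round(0.5 / delay_2km, 3))  # 0.5 us accuracy as a fraction of that delay
print(round(1.0 / delay_2km, 3))  # 1 us accuracy as a fraction of that delay
```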

7. Conclusion

This paper studied the application of self-organized synchronization inspired by the theory of pulse-coupled oscillators to cellular systems. The original algorithm was modified to align the timing references of base stations to simultaneously transmit on downlink frames, and of user terminals to simultaneously transmit on uplink frames. With the proposed decentralized cellular firefly synchronization (CelFSync) algorithm, a local area wireless network composed of 4 base stations and 15 user terminals is always able to synchronize within 10 periods. In large-scale networks where propagation delays are typically non-negligible, the timing advance procedure, common in current cellular networks, was combined with CelFSync to combat the effect of propagation delays. By compensating intra-cell propagation delays with timing advance together with selecting cell edge users to participate in CelFSync, the detrimental effects of large propagation delays are substantially reduced. Simulation results demonstrated that the achieved inter-BS timing accuracy is always below 1 μs when at least 10 users are randomly distributed per cell, which corresponds to approximately 15% of the direct propagation delay for an inter-BS spacing of 2 km.

Appendix

Achieved Accuracy for Multiple UTs

In the following, the inter-BS accuracy bound (22) is extended to multiple UTs. Active UTs that are timing aligned to BSa and BSb are associated to cells A and B, respectively. Entities within cells A and B are perfectly timing aligned, such that τBS,a = τUT,i + Δ, ∀i ∈ A, and τBS,b = τUT,i + Δ, ∀i ∈ B. In line with the discussion in Section 4.3, timing misalignments between entities belonging to different cells are bounded by four extreme cases: either UTs in cell A or B are forcing by imposing their timing reference τUT,i on the neighboring BS; alternatively, either BSa or BSb forces UTs in neighboring cells.

If UTs in cell A are forcing, then UTi, i ∈ A, with the earliest timing reference τUT,i imposes its time reference on BSb, such that $\tau_{BS,b} = \min_{i \in A}\{\tau_{UT,i} + \Delta + \nu_{bi} - \nu_{ai}\}$. Since τBS,a = τUT,i + Δ is valid for all entities within cell A, the timing reference of BSb yields

$$ \tau_{BS,b} = \tau_{BS,a} + \min_{i \in A}\{\nu_{bi} - \nu_{ai}\}. \tag{A.1} $$

Now consider the case when BSb forces UTs in cell A. For BS-UT coupling (20), the reference instant of BSb causes a phase adjustment at UTi at instant θBSA,b = τBS,b − Δ + νbi − νai. Since τBS,a = τUT,i + Δ generally holds for all entities in cell A, the UTi, i ∈ A, whose UT-BS propagation delays minimize the difference νbi − νai receives the earliest θBSA,b. This UT then triggers BSa and in turn the remaining UTs of cell A, and hence determines the accuracy between BSb and the UTs in cell A. When BSb is forcing UTs in cell A, the timing reference of BSa therefore yields

$$ \tau_{BS,a} = \tau_{BS,b} + \min_{i \in A}\{\nu_{bi} - \nu_{ai}\}. \tag{A.2} $$

Due to symmetry, the remaining two cases, when either UTs of cell B force BSa or BSa forces UTs of cell B, are obtained by exchanging a with b, and A with B in (A.1) and (A.2). This yields the inter-BS accuracy bound for CelFSync with timing advance between two cells:

$$ \left|\tau_{BS,b} - \tau_{BS,a}\right| \le \max\left\{ \left|\min_{i \in B}\{\nu_{ai} - \nu_{bi}\}\right|,\; \left|\min_{i \in A}\{\nu_{bi} - \nu_{ai}\}\right| \right\}. \tag{A.3} $$
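The bound (A.3) can be evaluated numerically for a given geometry. The Python sketch below assumes line-of-sight propagation delays computed from 2-D coordinates in metres; the function names and the example positions are hypothetical and serve only to illustrate how adding a UT closer to the cell edge tightens the bound.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def delay(p, q):
    """Propagation delay in seconds between two points given in metres."""
    return math.dist(p, q) / SPEED_OF_LIGHT_M_PER_S


def inter_bs_accuracy_bound(bs_a, bs_b, uts_a, uts_b):
    """Right-hand side of (A.3) for active UTs in cell A (uts_a) and cell B (uts_b)."""
    term_a = min(delay(bs_b, ut) - delay(bs_a, ut) for ut in uts_a)  # min over i in A
    term_b = min(delay(bs_a, ut) - delay(bs_b, ut) for ut in uts_b)  # min over i in B
    return max(abs(term_b), abs(term_a))


# Two BSs 2 km apart on a line; UT positions are made up for illustration.
bs_a, bs_b = (0.0, 0.0), (2000.0, 0.0)
uts_a = [(900.0, 0.0), (500.0, 0.0)]
uts_b = [(1200.0, 0.0), (1800.0, 0.0)]
print(inter_bs_accuracy_bound(bs_a, bs_b, uts_a, uts_b) * 1e6)  # bound in microseconds

# Adding a UT of cell B closer to the cell edge reduces the bound.
print(inter_bs_accuracy_bound(bs_a, bs_b, uts_a, uts_b + [(1010.0, 0.0)]) * 1e6)
```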

If UTs are timing aligned to the BSs with the shortest distance, the difference νbi − νai, for i ∈ A, and νai − νbi, for i ∈ B, will always be positive. Hence, the bound (A.3) improves with growing numbers of UTs per cell |A| and |B|. Asymptotically, when |A|, |B| → ∞, the accuracy approaches zero, so that the effect of propagation delays is perfectly compensated. This trend is confirmed by the simulation results presented in Section 6.2.2, which show that the achieved inter-BS accuracy significantly improves as the number of users per cell NUT,tot increases.


Acknowledgments

This work has been performed in the framework of the IST project IST-4-027756 World Wireless Initiative New Radio (WINNER), which is partly funded by the European Union. This paper was presented in part at the IEEE Vehicular Technology Conference (VTC 2008 Fall), Calgary, Canada, September 2008.

References

[1] S. Bregni, Synchronization of Digital Telecommunications Networks, John Wiley & Sons, New York, NY, USA, 1st edition, 2002.

[2] Y. Akaiwa, H. Andoh, and T. Kohama, “Autonomous decentralized inter-base-station synchronization for TDMA microcellular systems,” in Proceedings of the 41st IEEE Vehicular Technology Conference (VTC ’91), pp. 257–262, St. Louis, Mo, USA, May 1991.

[3] X. Lagrange and P. Godlewski, “Autonomous inter base station synchronisation via a common broadcast control channel,” in Proceedings of the 44th IEEE Vehicular Technology Conference (VTC ’94), vol. 2, pp. 1050–1054, Stockholm, Sweden, June 1994.

[4] S. Izumi, A. Hirukawa, and H. Takanashi, “PHS inter-base-station frame synchronization technique using UW with experimental results,” in Proceedings of the 6th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC ’95), vol. 3, pp. 1128–1132, Toronto, Canada, September 1995.

[5] F. Tong and Y. Akaiwa, “Theoretical analysis of interbase-station synchronization systems,” IEEE Transactions on Communications, vol. 46, no. 5, pp. 590–594, 1998.

[6] R. E. Mirollo and S. H. Strogatz, “Synchronization of pulse-coupled biological oscillators,” SIAM Journal on Applied Mathematics, vol. 50, no. 6, pp. 1645–1662, 1990.

[7] R. Mathar and J. Mattfeldt, “Pulse-coupled decentral synchronization,” SIAM Journal on Applied Mathematics, vol. 56, no. 4, pp. 1094–1106, 1996.

[8] Y.-W. Hong and A. Scaglione, “A scalable synchronization protocol for large scale sensor networks and its applications,” IEEE Journal on Selected Areas in Communications, vol. 23, no. 5, pp. 1085–1099, 2005.

[9] O. Simeone, U. Spagnolini, Y. Bar-Ness, and S. H. Strogatz, “Distributed synchronization in wireless networks: global synchronization via local connections,” IEEE Signal Processing Magazine, vol. 25, no. 5, pp. 81–97, 2008.

[10] A. Tyrrell, G. Auer, and C. Bettstetter, “Biologically inspired synchronization for wireless networks,” in Advances in Biologically Inspired Information Systems, pp. 47–62, Springer, New York, NY, USA, 2007.

[11] D. Lucarelli and I.-J. Wang, “Decentralized synchronization protocols with nearest neighbor communication,” in Proceedings of the 2nd International Conference on Embedded Networked Sensor Systems (SenSys ’04), pp. 62–68, Baltimore, Md, USA, November 2004.

[12] E. M. Izhikevich, “Phase models with explicit time delays,” Physical Review E, vol. 58, no. 1, pp. 905–908, 1998.

[13] U. Ernst, K. Pawelzik, and T. Geisel, “Synchronization induced by temporal delays in pulse-coupled oscillators,” Physical Review Letters, vol. 74, no. 9, pp. 1570–1573, 1995.

[14] A. Tyrrell, G. Auer, and C. Bettstetter, “On the accuracy of firefly synchronization with delays,” in Proceedings of the 1st International Symposium on Applied Sciences on Biomedical and Communication Technologies (ISABEL ’08), pp. 1–5, Aalborg, Denmark, October 2008.

[15] E. M. Izhikevich, “Weakly pulse-coupled oscillators, FM interactions, synchronization, and oscillatory associative memory,” IEEE Transactions on Neural Networks, vol. 10, no. 3, pp. 508–526, 1999.

[16] E. Sourour and M. Nakagawa, “Mutual decentralized synchronization for intervehicle communications,” IEEE Transactions on Vehicular Technology, vol. 48, no. 6, pp. 2015–2027, 1999.

[17] H. L. Van Trees, Detection, Estimation, and Modulation Theory, Part I, John Wiley & Sons, New York, NY, USA, 2nd edition, 2001.

[18] L. Sanguinetti, A. Tyrrell, M. Morelli, and G. Auer, “On the performance of biologically-inspired slot synchronization in multicarrier ad hoc networks,” in Proceedings of the 67th IEEE Vehicular Technology Conference (VTC ’08), pp. 21–25, Marina Bay, Singapore, May 2008.

[19] 3GPP TS 05.10, “Radio subsystem synchronization,” 3GPP Std. 3GPP TS 05.10, 1995.

[20] W. C. Lindsey, F. Ghazvinian, W. C. Hagmann, and K. Dessouky, “Network synchronization,” Proceedings of the IEEE, vol. 73, no. 10, pp. 1445–1467, 1985.

[21] 3GPP, “Evolved Universal Terrestrial Radio Access (E-UTRA); physical channels and modulation,” 3GPP Std. 3GPP TS 36.211, 2008.

[22] IST-4-027756 WINNER II, “D6.13.14: WINNER II System Concept Description,” December 2007.

[23] IEEE Std. 802.11a, “Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications,” 1999.

[24] D. Zhou and T.-H. Lai, “Analysis and implementation of scalable clock synchronization protocols in IEEE 802.11 ad hoc networks,” in Proceedings of IEEE International Conference on Mobile Ad-Hoc and Sensor Systems, pp. 255–263, Fort Lauderdale, Fla, USA, October 2004.

[25] A. Tyrrell and G. Auer, “Imposing a reference timing onto firefly synchronization in wireless networks,” in Proceedings of the 65th IEEE Vehicular Technology Conference (VTC ’07), pp. 222–226, Dublin, Ireland, April 2007.

[26] IST-4-027756 WINNER II, “D1.1.2: WINNER II Channel Models,” December 2007.

[27] IST-4-027756 WINNER II, “D6.13.7: Test Scenarios and Calibration Cases Issue 2,” December 2006.


Research Article

Discrete-Time Second-Order Distributed Consensus Time Synchronization Algorithm for Wireless Sensor Networks

Gang Xiong and Shalinee Kishore

Department of Electrical and Computer Engineering, Lehigh University, Bethlehem, PA 18015, USA

Correspondence should be addressed to Shalinee Kishore, [email protected]

Received 14 April 2008; Accepted 7 September 2008


This paper proposes a novel discrete-time second-order distributed consensus time synchronization (SO-DCTS) algorithm for wireless sensor networks. The consensus properties and convergence rates of the SO-DCTS algorithm are analyzed for both directed and undirected networks. Additionally, the convergence region and optimal convergence rate of the SO-DCTS algorithm are determined for undirected networks, and this convergence rate is shown to be superior to that of the first-order DCTS (FO-DCTS) algorithm under careful algorithm design. Furthermore, the asymptotic expectation and mean square synchronization error are investigated for the SO-DCTS algorithm when there is Gaussian delay between network nodes. Finally, simulation results are provided to verify these analytical results.

Copyright © 2009 G. Xiong and S. Kishore. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

Wireless sensor networks are typically comprised of inexpensive, small-sized, power-limited terminals. In a variety of applications, such as moving object acquisition and tracking, habitat monitoring, reconnaissance and surveillance, environmental monitoring, and traffic control, these sensor nodes are collectively required to maintain accurate time synchronization [1]. This necessitates network algorithms that achieve and maintain global time synchronization at all network nodes, that is, algorithms that align all network nodes to a common notion of time.

Due to imperfections in low-cost hardware nodes and the decentralized nature of wireless sensor networks, global time synchronization has been recognized as a particularly challenging task. Conventional synchronization protocols such as time-synchronization protocol for sensor networks (TPSN) [2], reference broadcast synchronization (RBS) [3], and flooding time synchronization protocol (FTSP) [4] aim to perform centralized global synchronization for all nodes in a wireless sensor network [5]. These protocols achieve synchronization via time-stamped packet exchanges with a root node or a data fusion center and are thus vulnerable to failure of these central nodes.

Recently, several distributed time synchronization algorithms have been proposed. One important class of such algorithms is referred to as distributed consensus time synchronization (DCTS) [6]. In the DCTS approach, a global time consensus can be reached within a connected network by averaging pair-wise local time information at network nodes. In [7], Olfati-Saber et al. established a theoretical framework for the analysis of consensus synchronization algorithms. Later, a fully distributed, asynchronous DCTS algorithm was proposed in [8]; this scheme was designed to reach agreement on time offset and skew offset between network nodes using media access control (MAC) layer time-stamped packet exchanges. As an alternative, a physical layer-based DCTS algorithm was introduced in [9] by modeling sensor nodes as coupled discrete-time oscillators. In particular, the algorithm adopts instantaneous received powers as weighted coefficients when updating local clocks.

To the best of our knowledge, existing literature on DCTS methods assumes that the local timing update at each node is done using only current timing information, that is, via a first-order DCTS (FO-DCTS) approach. In contrast, a second-order DCTS (SO-DCTS) algorithm would utilize not only current timing information but also that available from the previous iteration of the algorithm to update local clocks. Such an extension to the basic consensus algorithm was first reported for a continuous time approach in [10]. Subsequent papers have analyzed this second-order continuous time consensus method assuming fixed network topologies [11], time delay [12], and switching topologies [13]. In this paper, we apply the principles of the second-order consensus approach to the distributed timing synchronization problem in wireless sensor networks. Specifically, we propose a novel discrete-time SO-DCTS algorithm for wireless sensor networks and examine its convergence properties.

The major contribution of this paper is the theoretical analysis of the convergence characteristics of the SO-DCTS algorithm for both directed and undirected networks. Moreover, we investigate the convergence region and optimal convergence rate of the SO-DCTS algorithm in undirected networks and claim that the optimal convergence rate of the SO-DCTS algorithm is superior to that of the FO-DCTS algorithm under an appropriate algorithm design. Finally, we present the asymptotic expectation and mean square synchronization error of the SO-DCTS algorithm when the timing offset between network nodes is Gaussian distributed.

This paper is outlined as follows. Section 2 describes the system model and background for the proposed SO-DCTS algorithm. Section 3 presents the convergence properties of the SO-DCTS algorithm in directed and undirected networks. Section 4 discusses the convergence region and optimal convergence rate of the SO-DCTS algorithm in undirected networks. In Section 5, we present some convergence results for the SO-DCTS method when network nodes have Gaussian delay between each other. Simulation results are presented in Section 6, and we conclude our discussion in Section 7.

2. Background and System Model

2.1. Proposed SO-DCTS Algorithm. Timing information between network nodes can be exchanged either using time-stamped packets at the MAC layer or by estimating arrival times of neighboring nodes’ physical layer pulse signals. In the following, we describe the SO-DCTS method regardless of whether it is implemented at the MAC or physical layers. In each iteration of the SO-DCTS algorithm, each node processes and decodes the time-stamped message from its neighbors or estimates the arrival time of its neighbors’ pulse signals. Each node then updates its local clock using the weighted average of the current time differences between itself and its neighboring nodes as well as stored time differences from the previous iteration of the algorithm. It should be noted that in the SO-DCTS algorithm, each node needs to store time information from its neighbor nodes for both the previous and current iterations; this is in contrast to the FO-DCTS approach, where only the current time information is processed in the current iteration.

The timing update rule of the SO-DCTS algorithm at each node $i$ is given as

$$ t_i(k) = t_i(k-1) + \varepsilon \sum_{j \in N_i} \left[ t_j(k-1) - t_i(k-1) \right] - \gamma\varepsilon \sum_{j \in N_i} \left[ t_j(k-2) - t_i(k-2) \right], \tag{1} $$

where $t_i(k)$ is the local time at node $i$ during iteration $k$; $N_i$ is the set of neighboring nodes that can communicate reliably with node $i$; $\varepsilon$ is a constant step size; and $\gamma$ is a constant coefficient that weights the term from the previous iteration. We assume that the initial conditions of the SO-DCTS algorithm are $t_i(-1) = t_i(0) = z_i$, where $z_i$ is the initial time offset for node $i$. It is worth mentioning that when $\gamma = 0$, the SO-DCTS algorithm reduces to the FO-DCTS algorithm.
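The update rule (1) can be simulated directly on a small network. The following Python sketch is illustrative only: the function names, the three-node path topology, the initial offsets, and the parameter values ε = 0.3 and γ = 0.1 are assumptions chosen so that the iteration converges; they are not values taken from the paper.

```python
def so_dcts_step(t_prev, t_prev2, neighbors, eps, gamma):
    """Apply update rule (1) once at every node.

    t_prev  : local times at iteration k-1
    t_prev2 : local times at iteration k-2
    neighbors[i] : list of nodes j in N_i
    """
    t_new = []
    for i, nbrs in enumerate(neighbors):
        current = sum(t_prev[j] - t_prev[i] for j in nbrs)     # differences at k-1
        previous = sum(t_prev2[j] - t_prev2[i] for j in nbrs)  # differences at k-2
        t_new.append(t_prev[i] + eps * current - gamma * eps * previous)
    return t_new


def so_dcts(z, neighbors, eps, gamma, iterations):
    """Run SO-DCTS from the initial conditions t(-1) = t(0) = z."""
    t_prev2, t_prev = list(z), list(z)
    for _ in range(iterations):
        t_prev, t_prev2 = so_dcts_step(t_prev, t_prev2, neighbors, eps, gamma), t_prev
    return t_prev


# Undirected path 0 - 1 - 2 with hypothetical initial offsets (average 5.0).
neighbors = [[1], [0, 2], [1]]
z = [1.0, 4.0, 10.0]
print(so_dcts(z, neighbors, eps=0.3, gamma=0.1, iterations=200))
print(so_dcts(z, neighbors, eps=0.3, gamma=0.0, iterations=200))  # FO-DCTS special case
```

Each node only needs its neighbors' times from the two most recent iterations, which is why the sketch carries exactly two state vectors.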

2.2. Network Model and Some Preliminaries. In the following, we model a wireless sensor network as a graph $G = (V, E)$, consisting of a set of $n$ nodes $V = \{1, 2, \ldots, n\}$ and a set of edges $E$. Each edge is denoted as $e = (i, j) \in E$, where $i \in V$ and $j \in V$ are the head and tail of the edge $e$, respectively. In a wireless sensor network, the presence of an edge $(i, j)$ indicates that node $i$ can communicate with node $j$ reliably. We assume here a connected graph; that is, there exists a directed path connecting any pair of distinct nodes in the network. Throughout our discussion, we assume a time-invariant and connected network unless otherwise stated.

Given this network model, we denote $A = [a_{ij}]$ as the adjacency matrix of $G$ such that

$$ a_{ij} = \begin{cases} 1, & (i, j) \in E, \\ 0, & \text{otherwise}. \end{cases} \tag{2} $$

Then, the in-degree and out-degree of a node $i$ (denoted as $c_i$ and $d_i$, resp.) are given as $c_i = \sum_{j=1}^{n} a_{ji}$ and $d_i = \sum_{j=1}^{n} a_{ij}$. Specifically, $d_i$ is also equal to the number of neighbors of node $i$ from which it can receive information reliably, that is, $d_i = |N_i|$.

Next, we let $L$ be the graph Laplacian matrix of $G$, which is defined as $L = D - A$, where $D = \mathrm{diag}(d_1, d_2, \ldots, d_n)$ is the degree matrix of $G$. Given this matrix $L$, we can show that $L\mathbf{1} = \mathbf{0}$, where $\mathbf{1} = [1, 1, \ldots, 1]^T$ and $\mathbf{0} = [0, 0, \ldots, 0]^T$. In particular, for a connected graph, the rank of $L$ is $n - 1$. Furthermore, for a balanced directed network, the in-degree and out-degree of a node are equal, that is, $c_i = d_i$; thus we see that $\mathbf{1}^T L = \mathbf{0}^T$.

For an undirected network, the presence of an edge $(i, j)$ indicates that nodes $i$ and $j$ can communicate with each other reliably. Under this condition, we can also show that $\mathbf{1}^T L = \mathbf{0}^T$. Additionally, in this case, $L$ is a symmetric positive semidefinite matrix (implying that its eigenvalues are nonnegative), and its eigenvalues can be arranged in increasing order as $0 = \lambda_1(L) < \lambda_2(L) \le \cdots \le \lambda_n(L)$ [14].
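The Laplacian properties stated above are easy to verify on small examples. The Python sketch below builds L = D − A from an adjacency matrix with zero diagonal and checks the row sums (L1 = 0) and column sums (1ᵀL = 0ᵀ for undirected or balanced graphs). The helper names and the two example graphs are hypothetical.

```python
def laplacian(adjacency):
    """Graph Laplacian L = D - A, with d_i = sum_j a_ij (zero diagonal assumed)."""
    n = len(adjacency)
    L = [[0] * n for _ in range(n)]
    for i in range(n):
        degree = sum(adjacency[i])
        for j in range(n):
            L[i][j] = (degree if i == j else 0) - adjacency[i][j]
    return L


def row_sums(matrix):
    """Entries of M * 1."""
    return [sum(row) for row in matrix]


def column_sums(matrix):
    """Entries of 1^T * M."""
    return [sum(col) for col in zip(*matrix)]


# Undirected path 0 - 1 - 2: both L*1 = 0 and 1^T*L = 0^T hold.
A_undirected = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
L_u = laplacian(A_undirected)
print(L_u, row_sums(L_u), column_sums(L_u))

# Directed, unbalanced example: rows still sum to zero, columns do not.
A_directed = [[0, 1, 0], [0, 0, 1], [1, 1, 0]]
L_d = laplacian(A_directed)
print(L_d, row_sums(L_d), column_sums(L_d))
```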

Let us define $\vec{t}(k) = [t_1(k), t_2(k), \ldots, t_n(k)]^T$. The evolution of the SO-DCTS algorithm in (1) can be written as

$$ \vec{t}(k) = (I_n - \varepsilon L)\,\vec{t}(k-1) + \gamma\varepsilon L\,\vec{t}(k-2), \tag{3} $$

with the initial conditions $\vec{t}(-1) = \vec{t}(0) = \vec{z}$, where $\vec{z} = [z_1, z_2, \ldots, z_n]^T$. Here, $I_n$ denotes the $n \times n$ identity matrix.

3. Convergence Properties of the SO-DCTS Algorithm

In this section, we investigate the consensus properties of the SO-DCTS algorithm in directed and undirected networks. Additionally, we discuss the convergence rate of the SO-DCTS algorithm in such networks.

3.1. Consensus Analysis of the SO-DCTS Algorithm. The main result regarding the average consensus property of the SO-DCTS algorithm in directed networks is stated in the following theorem.

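Theorem 1. For a time-invariant, connected, directed network, consider the SO-DCTS algorithm,

$$ \vec{t}(k) = (I_n - \varepsilon L)\,\vec{t}(k-1) + \gamma\varepsilon L\,\vec{t}(k-2), \tag{4} $$

with initial conditions $\vec{t}(-1) = \vec{t}(0) = \vec{z}$. Define

$$ H = \begin{bmatrix} I_n - \varepsilon L & \gamma\varepsilon L \\ I_n & 0_{n \times n} \end{bmatrix}, \qquad J = \begin{bmatrix} K & 0_{n \times n} \\ K & 0_{n \times n} \end{bmatrix}, \tag{5} $$

where $K = \mathbf{1}\vec{\beta}^T / (\vec{\beta}^T \mathbf{1})$ and $\vec{\beta}$ is the left eigenvector of $L$ associated with $\lambda_1(L) = 0$. When $\rho(H - J) < 1$, a global consensus is achieved asymptotically, or equivalently,

$$ \lim_{k \to \infty} t_i(k) = \frac{\vec{\beta}^T \vec{z}}{\vec{\beta}^T \mathbf{1}}, \quad \forall i \in V, \tag{6} $$

where $\rho(\cdot)$ denotes the spectral radius of a matrix.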

Proof. The proof of this theorem is similar to [11, 15]. Here, we present a sketch of the proof. Let us define $\vec{\psi}(k) = [\vec{t}(k)^T \;\; \vec{t}(k-1)^T]^T$. Then, the SO-DCTS algorithm in (4) can be rewritten as

$$ \vec{\psi}(k) = H\,\vec{\psi}(k-1), \tag{7} $$

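which implies $\vec{\psi}(k) = H^k \vec{\psi}(0)$. To calculate the eigenvalues of $H$, we have [16]

$$ \det(H - \lambda I_{2n}) = \det\!\left( \lambda^2 I_n + (\varepsilon L - I_n)\lambda - \gamma\varepsilon L \right) = \prod_{i=1}^{n} \left[ \lambda^2 + (\varepsilon\lambda_i(L) - 1)\lambda - \gamma\varepsilon\lambda_i(L) \right] = 0. \tag{8} $$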

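Thus, the eigenvalues of $H$ are

$$ \lambda_k(H) = \frac{1}{2} \left[ 1 - \varepsilon\lambda_i(L) \pm \sqrt{ (1 - \varepsilon\lambda_i(L))^2 + 4\gamma\varepsilon\lambda_i(L) } \right]. \tag{9} $$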

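For a time-invariant, connected, directed network, $L$ has exactly one zero eigenvalue, $\lambda_1(L) = 0$. Substituting it into (9), we know that $H$ has the two eigenvalues $\lambda_1(H) = 1$ and $\lambda_2(H) = 0$. Additionally, the eigenvalues of $H - J$ agree with those of $H$ except that $\lambda_1(H) = 1$ is replaced by $\lambda_1(H - J) = 0$ [16]. Since $\rho(H - J) < 1$, we see that the eigenvalues of $H$ stay inside the unit circle except for $\lambda_1(H) = 1$. Thus, we have

$$ \lim_{k \to \infty} H^k = V \lim_{k \to \infty} \begin{bmatrix} 1 & 0_{1 \times (2n-1)} \\ 0_{(2n-1) \times 1} & \Lambda^k \end{bmatrix} V^{-1} = V \begin{bmatrix} 1 & 0_{1 \times (2n-1)} \\ 0_{(2n-1) \times 1} & 0_{(2n-1) \times (2n-1)} \end{bmatrix} V^{-1} = \vec{w}_r \vec{w}_l^T, \tag{10} $$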

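where $\Lambda$ is the Jordan form matrix corresponding to eigenvalues $\lambda_i(H) \ne 1$ [16], $\vec{w}_l$ and $\vec{w}_r$ are left and right eigenvectors of $H$ corresponding to $\lambda_1(H) = 1$, respectively, and $\vec{w}_r^T \vec{w}_l = 1$. In particular, $\vec{w}_l = (1/\vec{\beta}^T \mathbf{1})\,[\vec{\beta}^T \;\; \mathbf{0}^T]^T$ and $\vec{w}_r = [\mathbf{1}^T \;\; \mathbf{1}^T]^T$. Plugging $\vec{w}_l$ and $\vec{w}_r$ into (10) and considering the SO-DCTS algorithm in (7), we have

$$ \lim_{k \to \infty} \vec{\psi}(k) = \frac{1}{\vec{\beta}^T \mathbf{1}} \begin{bmatrix} \mathbf{1}\vec{\beta}^T & 0_{n \times n} \\ \mathbf{1}\vec{\beta}^T & 0_{n \times n} \end{bmatrix} \begin{bmatrix} \vec{t}(0) \\ \vec{t}(-1) \end{bmatrix}, \tag{11} $$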

which indicates that

$$ \lim_{k \to \infty} t_i(k) = \frac{\vec{\beta}^T \vec{z}}{\vec{\beta}^T \mathbf{1}}. \tag{12} $$

This completes the proof.
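The statement of Theorem 1 can be checked numerically on a small directed network. The Python sketch below is an illustration under stated assumptions: the three-node unbalanced Laplacian, the initial offsets, and the parameters ε = 0.3 and γ = 0.1 are hypothetical, and the left eigenvector is approximated by power iteration on (I − εL)ᵀ, which is valid here because ε·max dᵢ < 1 and the example graph is strongly connected. It compares the simulated limit with the predicted weighted value, which differs from the plain average.

```python
def left_null_vector(L, eps, iterations=500):
    """Approximate beta with beta^T L = 0^T (normalized to sum to 1) by
    repeatedly applying beta^T <- beta^T (I - eps * L)."""
    n = len(L)
    beta = [1.0 / n] * n
    for _ in range(iterations):
        beta = [beta[j] - eps * sum(beta[i] * L[i][j] for i in range(n))
                for j in range(n)]
        total = sum(beta)
        beta = [b / total for b in beta]
    return beta


def so_dcts_matrix(z, L, eps, gamma, iterations):
    """Iterate t(k) = (I - eps*L) t(k-1) + gamma*eps*L t(k-2), t(-1) = t(0) = z."""
    n = len(z)
    t_prev2, t_prev = list(z), list(z)
    for _ in range(iterations):
        Lt1 = [sum(L[i][j] * t_prev[j] for j in range(n)) for i in range(n)]
        Lt2 = [sum(L[i][j] * t_prev2[j] for j in range(n)) for i in range(n)]
        t_new = [t_prev[i] - eps * Lt1[i] + gamma * eps * Lt2[i] for i in range(n)]
        t_prev2, t_prev = t_prev, t_new
    return t_prev


# Unbalanced directed example: node 0 listens to 1, node 1 to 2, node 2 to 0 and 1.
L = [[1, -1, 0], [0, 1, -1], [-1, -1, 2]]
z = [0.0, 4.0, 12.0]
beta = left_null_vector(L, eps=0.3)
predicted = sum(b * zi for b, zi in zip(beta, z)) / sum(beta)
print(beta)                                 # proportional to [1, 2, 1]
print(predicted, sum(z) / len(z))           # weighted consensus value vs. plain average
print(so_dcts_matrix(z, L, 0.3, 0.1, 300))  # simulated limit of t(k)
```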

According to Theorem 1, we see that in general, although average consensus is not achieved for directed networks, all nodes in the network can still reach a global agreement. By “average consensus” we mean that all nodes converge to the same timing, which is given by the average of the nodes’ initial time offsets. However, when the SO-DCTS algorithm is employed in either an undirected network or a balanced directed network, average consensus can be achieved asymptotically. We show this via the following theorem.

Theorem 2. Consider the SO-DCTS algorithm in (4) in a time-invariant, connected, directed balanced network or a time-invariant, connected, undirected network, with initial conditions $\vec{t}(-1) = \vec{t}(0) = \vec{z}$. When $\rho(H - J) < 1$, an average consensus is achieved asymptotically, or equivalently,

$$\lim_{k\to\infty} t_i(k) = \frac{1}{n}\mathbf{1}^T\vec{z}, \quad \forall i \in \mathcal{V}. \tag{13}$$

We know that in a time-invariant, connected, directed balanced or undirected network, $\vec{\beta} = \mathbf{1}$ and $K = (1/n)\mathbf{1}\mathbf{1}^T$. The rest of the proof is similar to that of Theorem 1 and is thus omitted here.
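As an illustration of Theorem 2, the sketch below iterates the noise-free second-order recursion on an eight-node ring and compares the result with the initial average. The ring topology, the values of ε and γ, and the function names are assumptions chosen for this example; the initial offsets follow the (i − 1/2)T/n pattern used later in the simulations.

```python
import numpy as np

def ring_laplacian(n):
    """Laplacian of an undirected ring with n >= 3 nodes."""
    A = np.zeros((n, n))
    i = np.arange(n)
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

def so_dcts(L, z, eps, gamma, steps):
    """Iterate t(k) = (I - eps L) t(k-1) + gamma eps L t(k-2) with t(-1) = t(0) = z."""
    prev, cur = z.copy(), z.copy()
    for _ in range(steps):
        prev, cur = cur, cur - eps * (L @ cur) + gamma * eps * (L @ prev)
    return cur

n, T = 8, 1000.0                          # T in microseconds
z = (np.arange(1, n + 1) - 0.5) * T / n   # initial time offsets
t_final = so_dcts(ring_laplacian(n), z, eps=0.3, gamma=-0.2, steps=500)
print(t_final)      # every entry should be close to the initial average
print(z.mean())
```

Because the columns of an undirected Laplacian sum to zero, the network average is preserved at every step, which is why the common limit equals the initial mean.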

3.2. Convergence Rate for SO-DCTS Algorithm. One of the most important measures of any distributed iterative algorithm is its convergence speed. As we show next, the convergence speed of the SO-DCTS algorithm is determined by the spectral radius of $H - J$, which is similar to the FO-DCTS algorithm [17].

Let us define the global consensus value in each iteration as $m(k) = (1/\vec{\beta}^T\mathbf{1})\vec{\beta}^T\vec{t}(k)$. In the SO-DCTS algorithm, this value remains invariant during each iteration since

$$m(k) = \frac{1}{\vec{\beta}^T\mathbf{1}}\vec{\beta}^T\left[(I_n - \varepsilon L)\vec{t}(k-1) + \gamma\varepsilon L\vec{t}(k-2)\right] = m(k-1) = \cdots = m(0). \tag{14}$$

We now define the disagreement vector as $\vec{\delta}(k) = \vec{t}(k) - m(k)\mathbf{1}$, which indicates the difference between the updated times and the global consensus times of the network nodes. Then, the evolution of the disagreement vector is obtained as

$$\vec{\delta}(k) = (I_n - \varepsilon L)\vec{\delta}(k-1) + \gamma\varepsilon L\vec{\delta}(k-2). \tag{15}$$

Given this dynamic of the disagreement vector, we note the following lemma.

Lemma 1. For the SO-DCTS algorithm in (4) in a time-invariant, connected network with initial conditions $\vec{t}(-1) = \vec{t}(0) = \vec{z}$ and $\alpha = \rho(H - J) < 1$, a global consensus is exponentially reached in the following form:

$$\frac{\|\vec{\delta}(k)\|^2 + \|\vec{\delta}(k-1)\|^2}{\|\vec{\delta}(0)\|^2} \le 2\alpha^{2k}, \tag{16}$$

where $\|\cdot\|$ denotes the 2-norm of a vector.

Proof. Let us define the error vector as $\vec{e}(k) = [\vec{\delta}^T(k) \; \vec{\delta}^T(k-1)]^T$, which can be obtained from $\vec{e}(k) = \vec{\psi}(k) - J_1\vec{\psi}(k)$, where

$$J_1 = \begin{bmatrix} K & 0_{n\times n} \\ 0_{n\times n} & K \end{bmatrix}. \tag{17}$$

Based on this definition, we see that the error vector evolves as

$$\vec{e}(k) = (H - J_1 H)\vec{\psi}(k-1) = (H - J)\left[\vec{\psi}(k-1) - J_1\vec{\psi}(k-1)\right] = (H - J)\vec{e}(k-1). \tag{18}$$

The above equation is valid because $(H - J)J_1 = 0_{2n\times 2n}$ and $J_1 H = J$. Then, we have

$$\|\vec{e}(k)\|^2 = \|(H - J)\vec{e}(k-1)\|^2 \le \alpha^2\|\vec{e}(k-1)\|^2 \le \cdots \le \alpha^{2k}\|\vec{e}(0)\|^2, \tag{19}$$

which is equivalent to (16). This completes the proof.

Therefore, we see that the convergence rate for the SO-DCTS algorithm in both directed and undirected networks is determined by the spectral radius of $H - J$.

Figure 1: Convergence region for the SO-DCTS algorithm in undirected networks: three-dimensional view. (Axes: $\varepsilon\lambda_n(L)$, $\gamma$, and $|\lambda_k(H)|$; the convergence region is marked.)

4. Convergence Region and Optimal Convergence Rate for SO-DCTS Algorithm in Undirected Networks

In this section, we investigate more specific convergence results (i.e., the convergence region and optimal convergence rate) for the SO-DCTS algorithm in undirected networks. Without loss of generality, we assume that $\varepsilon$ and $\gamma$ are real values, and $\varepsilon > 0$.

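4.1. Convergence Region for SO-DCTS Algorithm in Undirected Networks. From Theorem 2, we know that when $\rho(H - J) < 1$, the SO-DCTS algorithm in an undirected network can achieve average consensus asymptotically. Let us define the convergence region $\mathcal{R}$ as the set of pairs $(\varepsilon, \gamma)$ that satisfy $\rho(H - J) < 1$. After some algebraic derivations (outlined in Appendix A), the convergence region for the SO-DCTS algorithm in undirected networks is

$$\mathcal{R} = \mathcal{R}^{\dagger} \cup \mathcal{R}^{\ddagger}, \tag{20}$$

where $\mathcal{R}^{\dagger} = \{-1/(\varepsilon\lambda_n(L)) < \gamma < 1,\ 0 < \varepsilon < 1/\lambda_n(L)\}$ and $\mathcal{R}^{\ddagger} = \{-1/(\varepsilon\lambda_n(L)) < \gamma < 2/(\varepsilon\lambda_n(L)) - 1,\ 1/\lambda_n(L) \le \varepsilon < 3/\lambda_n(L)\}$.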

The convergence region of the SO-DCTS algorithm in undirected networks is shown in Figures 1 and 2 using a three-dimensional and a two-dimensional perspective, respectively. We see that compared to the FO-DCTS algorithm, where the range of the step size $\varepsilon$ is $(0, 2/\lambda_n(L))$, the range of $\varepsilon$ in the SO-DCTS approach increases to $(0, 3/\lambda_n(L))$.

4.2. Optimal Convergence Rate for SO-DCTS Algorithm in Undirected Networks. Next, we investigate the fastest convergence rate of the SO-DCTS algorithm based on $\varepsilon$ and $\gamma$. Recall that in the FO-DCTS algorithm, the constant step size $\varepsilon_{\text{opt,FO}}$ which minimizes convergence time is given as [15]

$$\varepsilon_{\text{opt,FO}} = \frac{2}{\lambda_n(L) + \lambda_2(L)}. \tag{21}$$

Additionally, the convergence rate for $\varepsilon_{\text{opt,FO}}$ is determined by the second largest absolute eigenvalue of the Perron matrix [18], that is,

$$\alpha_{\text{opt,FO}} = \frac{\lambda_n(L) - \lambda_2(L)}{\lambda_n(L) + \lambda_2(L)}. \tag{22}$$

As we show next, the convergence rate of the SO-DCTS algorithm can be superior to that of the FO-DCTS algorithm by choosing suitable $\varepsilon$ and $\gamma$. However, as stated in the following lemma, the convergence rate of the FO-DCTS algorithm is faster under some circumstances.

Lemma 2. For the SO-DCTS algorithm in (4) in a time-invariant, connected, undirected network with initial conditions $\vec{t}(-1) = \vec{t}(0) = \vec{z}$ and $(\varepsilon, \gamma) \in \mathcal{R}$ in (20), if $\gamma > 0$, the convergence rate of the SO-DCTS algorithm is less than that of the FO-DCTS algorithm with the optimal constant step size in (21).

The proof of this lemma is omitted here since it can be readily extended from the following result. Consider two real values $a$ and $b$ with $b > 0$; then $\max\{(1/2)|a + \sqrt{a^2 + b}|, (1/2)|a - \sqrt{a^2 + b}|\} > a$. Thus, we have $|\lambda_k(H)| > 1 - \varepsilon\lambda_i(L)$, which implies $|\lambda_k(H)| > \alpha_{\text{opt,FO}}$.

Based on the above lemma, we see that there may exist choices of $\varepsilon$ and $\gamma$ (e.g., when $\gamma < 0$) such that the convergence rate of the SO-DCTS method is faster than that of the FO-DCTS algorithm. To see this, we formulate the following spectral radius minimization problem to find the optimal $\varepsilon$ and $\gamma$ for the SO-DCTS algorithm:

$$\text{minimize } \rho(H - J) \quad \text{subject to } (\varepsilon, \gamma) \in \mathcal{R},\ \gamma < 0. \tag{23}$$

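Using the steps outlined in Appendix B, the optimal $\varepsilon$ and $\gamma$ that solve (23) can be obtained as

$$\varepsilon_{\text{opt,SO}} = \frac{3\lambda_n(L) + \lambda_2(L)}{\lambda_n(L)\left[\lambda_n(L) + 3\lambda_2(L)\right]}, \qquad \gamma_{\text{opt,SO}} = -\frac{\left[\lambda_n(L) - \lambda_2(L)\right]^2}{\left[\lambda_n(L) + 3\lambda_2(L)\right]\left[3\lambda_n(L) + \lambda_2(L)\right]}. \tag{24}$$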

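It is worth noting that $(\varepsilon_{\text{opt,SO}}, \gamma_{\text{opt,SO}}) \in \mathcal{R}^{\ddagger}$. Recall that the convergence rate for the SO-DCTS algorithm in undirected networks is determined by the spectral radius of $H - J$, that is,

$$\alpha_{\text{opt,SO}} = \frac{\lambda_n(L) - \lambda_2(L)}{\lambda_n(L) + 3\lambda_2(L)}. \tag{25}$$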

We see that $\alpha_{\text{opt,SO}} \le \alpha_{\text{opt,FO}}$, and $\alpha_{\text{opt,SO}} = \alpha_{\text{opt,FO}}$ only when $\lambda_2(L) = \lambda_n(L)$. Thus, we have the following theorem for the convergence rate of the SO-DCTS algorithm.

Theorem 3. For the SO-DCTS algorithm in (4) in a time-invariant, connected, undirected network with initial conditions $\vec{t}(-1) = \vec{t}(0) = \vec{z}$ and $(\varepsilon, \gamma) \in \mathcal{R}$ in (20), there exists a pair of $\varepsilon$ and $\gamma$ such that the convergence rate of the SO-DCTS algorithm is greater than or equal to that of the FO-DCTS algorithm with the optimal constant step size in (21).
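The closed forms (21), (22), (24), and (25) depend only on λ2(L) and λn(L), so they are easy to evaluate for a given topology. The following sketch evaluates them for 16-node ring, path, and star graphs; the helper names are hypothetical, and the rate ν = −log(α) is assumed to use the natural logarithm.

```python
import numpy as np

def ring(n):
    A = np.zeros((n, n)); i = np.arange(n)
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
    return A

def path(n):
    A = np.zeros((n, n)); i = np.arange(n - 1)
    A[i, i + 1] = A[i + 1, i] = 1.0
    return A

def star(n):
    A = np.zeros((n, n))
    A[:n - 1, n - 1] = A[n - 1, :n - 1] = 1.0
    return A

def optimal_parameters(A):
    """Evaluate (21), (22), (24), (25) from the Laplacian spectrum of adjacency A."""
    lam = np.linalg.eigvalsh(np.diag(A.sum(axis=1)) - A)   # ascending order
    lam2, lamn = lam[1], lam[-1]
    return {
        "eps_fo": 2.0 / (lamn + lam2),
        "alpha_fo": (lamn - lam2) / (lamn + lam2),
        "eps_so": (3 * lamn + lam2) / (lamn * (lamn + 3 * lam2)),
        "gamma_so": -(lamn - lam2) ** 2 / ((lamn + 3 * lam2) * (3 * lamn + lam2)),
        "alpha_so": (lamn - lam2) / (lamn + 3 * lam2),
    }

for name, A in [("R16", ring(16)), ("P16", path(16)), ("S16", star(16))]:
    p = optimal_parameters(A)
    print(name, round(p["alpha_fo"], 4), round(-np.log(p["alpha_fo"]), 4),
          round(p["alpha_so"], 4), round(-np.log(p["alpha_so"]), 4))
```

Under these assumptions the printed values can be compared directly with the entries of Table 1.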

Figure 2: Convergence region for the SO-DCTS algorithm in undirected networks: two-dimensional view. (Horizontal axis: $\varepsilon\lambda_n(L)$ from 0 to 3; vertical axis: $\gamma$ from −5 to 2; the convergence region is marked.)

5. SO-DCTS Algorithm with Gaussian Delay in Undirected Networks

In this section, we investigate the convergence properties of the SO-DCTS algorithm in undirected networks when there is both deterministic and random (Gaussian) delay between network nodes during local time information exchange. In [19], we motivate why the Gaussian assumption is appropriate to model the nondeterministic timing differences between nodes exchanging either MAC layer or physical layer timing information. We do not reiterate those arguments here but rather present convergence results for the SO-DCTS algorithm when such timing differences exist. We have separately examined the performance of the SO-DCTS algorithm under alternate delay distributions, for example, an exponential delay distribution [20]. The results show performance bounds similar to those presented in this paper for the Gaussian assumption. For this reason, we constrain our discussion here to the more common Gaussian delay model.

With Gaussian delay, the timing update rule of the SO-DCTS algorithm at each node $i$ is given as

$$t_i(k) = t_i(k-1) + \varepsilon\sum_{j\in\mathcal{N}_i}\left[\hat{t}_j(k-1) - t_i(k-1)\right] - \gamma\varepsilon\sum_{j\in\mathcal{N}_i}\left[\hat{t}_j(k-2) - t_i(k-2)\right], \tag{26}$$

where $\hat{t}_j(k) = t_j(k) + T_{\text{delay}} = t_j(k) + T_c + L_{ij}/c + v_j(k)$; $T_c$ is a constant (deterministic) delay; $L_{ij}$ is the distance between nodes $i$ and $j$; $c$ is the speed of light (thus, $L_{ij}/c$ is the propagation delay between nodes $i$ and $j$); and $v_j(k)$ are independent and identically distributed (i.i.d.) Gaussian random variables with zero mean and variance $\sigma^2$. Local time information exchange between nodes $i$ and $j$ under this delay model is shown in Figure 3.


Figure 3: SO-DCTS algorithm with Gaussian delay during local time information exchange. (Timeline of nodes $i$ and $j$ between iterations $k$ and $k + 1$, with exchange delays $T_c + L_{ij}/c + v_j(k)$ and $T_c + L_{ij}/c + v_i(k+1)$.)

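The SO-DCTS algorithm in (26) can be rearranged as

$$t_i(k) = t_i(k-1) + \varepsilon\sum_{j\in\mathcal{N}_i}\left[t_j(k-1) - t_i(k-1)\right] - \gamma\varepsilon\sum_{j\in\mathcal{N}_i}\left[t_j(k-2) - t_i(k-2)\right] + n_i(k-1), \tag{27}$$

where $n_i(k-1) = (1-\gamma)\varepsilon\sum_{j\in\mathcal{N}_i}(T_c + L_{ij}/c) + \varepsilon\sum_{j\in\mathcal{N}_i}\left[v_j(k-1) - \gamma v_j(k-2)\right]$.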

Let us define the noise vector $\vec{n}(k) = [n_1(k), n_2(k), \ldots, n_n(k)]^T$. Based on this definition, the evolution of the SO-DCTS algorithm in (27) can be written as

$$\vec{t}(k) = (I_n - \varepsilon L)\vec{t}(k-1) + \gamma\varepsilon L\vec{t}(k-2) + \vec{n}(k-1). \tag{28}$$

We now define $\vec{v}(k) = [v_1(k), v_2(k), \ldots, v_n(k)]^T$ and $\vec{u} = [u_1, u_2, \ldots, u_n]^T$, where $u_i = \sum_{j\in\mathcal{N}_i}(T_c + L_{ij}/c)$. Then, the noise vector in (28) is given as $\vec{n}(k-1) = \varepsilon\left[(1-\gamma)\vec{u} + A(\vec{v}(k-1) - \gamma\vec{v}(k-2))\right]$.
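The recursion in (28) with this noise vector is straightforward to simulate. The following sketch is one possible implementation under stated assumptions: A is a symmetric 0/1 adjacency matrix, the per-hop deterministic delay is 10 microseconds, the topology is a five-node path, and the function name and parameter values are illustrative only.

```python
import numpy as np

def simulate_with_delay(A, u, z, eps, gamma, sigma, steps, rng):
    """Iterate (28) with n(k-1) = eps[(1 - gamma) u + A (v(k-1) - gamma v(k-2))]."""
    n = len(z)
    L = np.diag(A.sum(axis=1)) - A
    t_prev, t_cur = z.copy(), z.copy()
    v_prev = sigma * rng.standard_normal(n)          # plays the role of v(k-2)
    for _ in range(steps):
        v_cur = sigma * rng.standard_normal(n)       # v(k-1)
        noise = eps * ((1.0 - gamma) * u + A @ (v_cur - gamma * v_prev))
        t_next = t_cur - eps * (L @ t_cur) + gamma * eps * (L @ t_prev) + noise
        t_prev, t_cur, v_prev = t_cur, t_next, v_cur
    return t_cur

n = 5
A = np.zeros((n, n)); i = np.arange(n - 1)
A[i, i + 1] = A[i + 1, i] = 1.0                      # path graph
u = 10.0 * A.sum(axis=1)                             # u_i = sum over neighbours of (Tc + Lij/c)
z = (np.arange(1, n + 1) - 0.5) * 1000.0 / n
rng = np.random.default_rng(0)

t_det = simulate_with_delay(A, u, z, 0.3, -0.2, sigma=0.0, steps=1000, rng=rng)
delta_det = t_det - t_det.mean()                     # disagreement without random delay
t_rnd = simulate_with_delay(A, u, z, 0.3, -0.2, sigma=1.0, steps=1000, rng=rng)
print(delta_det)
print(t_rnd - t_rnd.mean())
```

With σ = 0 the disagreement settles at a fixed vector δ satisfying Lδ = u − mean(u)1, while the network average keeps drifting; with σ > 0 the disagreement fluctuates around that vector.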

Let us additionally define $\vec{\zeta}(k) = [\vec{n}(k)^T \; \mathbf{0}^T]^T$. Then, (28) can be rewritten as

$$\vec{\psi}(k) = H\vec{\psi}(k-1) + \vec{\zeta}(k-1). \tag{29}$$

Recall that for undirected networks, the average value in each iteration is $m(k) = (1/n)\mathbf{1}^T\vec{t}(k)$. The mean and variance of the average value $m(k)$ are given in the following lemma.

Lemma 3. For the SO-DCTS algorithm in (28), the mean and variance of the average value $m(k)$ are given as

$$E[m(k)] = m(0) + \frac{k}{n}\sum_{i=1}^{n} u_i, \qquad \operatorname{var}[m(k)] = \frac{k\varepsilon^2\sigma^2(1 + \gamma^2)}{n^2}\sum_{i=1}^{n} d_i^2. \tag{30}$$

The proof of this lemma is straightforward and thus omitted from this paper. It can be seen that both the mean and the variance in (30) increase linearly with the time index $k$, that is, as the algorithm evolves. Furthermore, the variance of $m(k)$ increases linearly with the variance of the random Gaussian delay, $\sigma^2$. As we will see in our numerical results, although the average value $m(k)$ grows linearly with iteration time when there is Gaussian delay in the network, an average consensus may still be achievable under certain network topologies.

5.1. Expectation and Second Central Moment of Error Vector. In order to understand the convergence property of the SO-DCTS algorithm with Gaussian delay, we first quantify the overall impact of uncertainty by computing the first two moments of the disagreement vector.

With Gaussian delay, we see that the error vector $\vec{e}(k)$ evolves as

$$\vec{e}(k) = P\vec{e}(k-1) + Q\vec{\zeta}(k-1), \tag{31}$$

where $P = H - J$ and $Q = I_{2n} - J_1$. Then, we have the following lemma.

Lemma 4. For the SO-DCTS algorithm in (28), the expectation of the error vector $\vec{e}(k)$ is given by

$$E[\vec{e}(k)] = P^k\vec{e}(0) + (1-\gamma)\varepsilon\sum_{l=0}^{k-1} P^l Q\vec{u}_1, \quad (k \ge 1), \tag{32}$$

where $\vec{u}_1 = [\vec{u}^T \; \mathbf{0}^T]^T$.

The proof of this lemma is straightforward and thus omitted from this paper.

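Let us define the second central moment of the error vector as $\kappa_e(k) = E\{(\vec{e}(k) - E[\vec{e}(k)])^T(\vec{e}(k) - E[\vec{e}(k)])\}$ and the covariance matrix of the error vector as $\Sigma_e(k) = E\{(\vec{e}(k) - E[\vec{e}(k)])(\vec{e}(k) - E[\vec{e}(k)])^T\}$. It is worth mentioning that $\kappa_e(k) = \operatorname{tr}(\Sigma_e(k))$, where $\operatorname{tr}(\cdot)$ denotes the trace of a matrix. Additionally, let us denote the covariance matrix of $\vec{\zeta}(k)$ as $\Sigma_\zeta = E\{(\vec{\zeta}(k) - E[\vec{\zeta}(k)])(\vec{\zeta}(k) - E[\vec{\zeta}(k)])^T\}$, which is given as

$$\Sigma_\zeta = \varepsilon^2(1 + \gamma^2)\sigma^2\begin{bmatrix} A^2 & 0_{n\times n} \\ 0_{n\times n} & 0_{n\times n} \end{bmatrix}. \tag{33}$$

Given these definitions, we next note Lemma 5.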

Lemma 5. For the SO-DCTS algorithm in (28), the covariance matrix of the error vector $\vec{e}(k)$ is given as

$$\Sigma_e(k) = P^k\vec{e}(0)\vec{e}(0)^T(P^T)^k + \sum_{l=0}^{k-1} P^l Q\Sigma_\zeta Q(P^T)^l, \quad (k \ge 1), \tag{34}$$

and the second central moment of the error vector $\vec{e}(k)$ is given as

$$\kappa_e(k) = \vec{e}(0)^T(P^T)^k P^k\vec{e}(0) + \operatorname{tr}\left(Q\sum_{l=0}^{k-1}(P^T)^l P^l Q\Sigma_\zeta\right), \quad (k \ge 1). \tag{35}$$

The proof of this lemma is similar to [19] and thus omitted from the paper.


5.2. Asymptotic Expectation of Global Synchronization Error. Using Lemma 4, we see that the steady state of the expectation of the error vector $\vec{e}(k)$ is

$$\lim_{k\to\infty} E[\vec{e}(k)] = (1-\gamma)\varepsilon(I_{2n} - P)^{-1}Q\vec{u}_1. \tag{36}$$

The above equation holds because $\lim_{k\to\infty} P^k = \lim_{k\to\infty}(H^k - J) = 0$. Before we investigate the convergence property of the SO-DCTS algorithm with Gaussian delay, we give the following lemma for block matrix inversion.

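Lemma 6. Consider $n \times n$ matrices $A_1$, $A_2$, $A_3$, and $A_4$. When $A_4$ and $C = A_1 - A_2 A_4^{-1} A_3$ are nonsingular, then [16]

$$\begin{bmatrix} A_1 & A_2 \\ A_3 & A_4 \end{bmatrix}^{-1} = \begin{bmatrix} C^{-1} & -C^{-1}A_2 A_4^{-1} \\ -A_4^{-1}A_3 C^{-1} & A_4^{-1} + A_4^{-1}A_3 C^{-1}A_2 A_4^{-1} \end{bmatrix}. \tag{37}$$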

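Based on this lemma, the steady state of the expectation of the error vector $\vec{e}(k)$ is

$$\lim_{k\to\infty} E[\vec{e}(k)] = (1-\gamma)\varepsilon\begin{bmatrix} W_1 & \gamma\varepsilon W_1 L \\ G W_1 & I_n + \gamma\varepsilon G W_1 L \end{bmatrix}\begin{bmatrix} G\vec{u} \\ \mathbf{0} \end{bmatrix} = (1-\gamma)\varepsilon\begin{bmatrix} W_1 G\vec{u} \\ W_1 G\vec{u} \end{bmatrix}, \tag{38}$$

where $G = I_n - K$ and $W_1 = [(1-\gamma)\varepsilon L + K]^{-1}$. The above equation is valid because $K W_1 = K$, which implies $K W_1 G = 0_{n\times n}$, which in turn implies $G W_1 G = W_1 G$. Specifically, we see that the eigenvalues of $W_1$ are $\lambda_1(W_1) = 1$ and $\lambda_i(W_1) = 1/[(1-\gamma)\varepsilon\lambda_i(L)]$, $i = 2, \ldots, n$. Additionally, the steady state of the expectation of the disagreement vector $\vec{\delta}(k)$ is the upper half of the vector in (38), that is,

$$\vec{\mu}(\infty) \triangleq \lim_{k\to\infty} E[\vec{\delta}(k)] = (1-\gamma)\varepsilon W_1 G\vec{u}. \tag{39}$$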

For this $\vec{\mu}(\infty)$, we can show the following theorem.

Theorem 4. In an undirected network with fixed connected topology, $\vec{\mu}(\infty)$ in (39) is a constant vector independent of the constant values of $\varepsilon$ and $\gamma$.

Proof. Let us denote the eigenvectors of $W_1$ as $w_i$. It is easy to check that the eigenvector corresponding to $\lambda_1(W_1) = 1$ is $w_1 = \mathbf{1}$. $\vec{\mu}(\infty)$ in (39) can thus be rewritten as

$$\vec{\mu}(\infty) = (1-\gamma)\varepsilon\mathbf{1}\mathbf{1}^T G\vec{u} + (1-\gamma)\varepsilon\left[\sum_{i=2}^{n}\lambda_i(W_1) w_i w_i^T\right] G\vec{u} = (L + K)^{-1}G\vec{u}. \tag{40}$$

Therefore, $\vec{\mu}(\infty)$ does not depend on $\varepsilon$ and $\gamma$. This completes the proof.
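Theorem 4 can be illustrated numerically by evaluating (39) for different parameter pairs and comparing the result with the parameter-free expression in (40). The sketch below assumes a six-node star with a 10-microsecond per-hop deterministic delay; these choices and the function name are for illustration only.

```python
import numpy as np

def mu_inf(L, u, eps, gamma):
    """Evaluate (39): (1 - gamma) eps W1 G u with W1 = [(1 - gamma) eps L + K]^{-1}."""
    n = L.shape[0]
    K = np.full((n, n), 1.0 / n)
    G = np.eye(n) - K
    W1 = np.linalg.inv((1.0 - gamma) * eps * L + K)
    return (1.0 - gamma) * eps * W1 @ G @ u

n = 6
A = np.zeros((n, n))
A[:n - 1, n - 1] = A[n - 1, :n - 1] = 1.0            # star graph, hub is the last node
L = np.diag(A.sum(axis=1)) - A
u = 10.0 * A.sum(axis=1)                             # assumed per-hop delay of 10 microseconds

K = np.full((n, n), 1.0 / n)
reference = np.linalg.solve(L + K, (np.eye(n) - K) @ u)   # right-hand side of (40)
print(mu_inf(L, u, eps=0.1, gamma=-0.3))
print(mu_inf(L, u, eps=0.4, gamma=0.2))
print(reference)
```

All three printed vectors should coincide up to round-off, since the factor (1 − γ)ε cancels against the nonzero eigenvalues of W1.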

Thus, for constant $\varepsilon$ and $\gamma$, the steady state of the expectation of the disagreement vector is a constant vector regardless of $\varepsilon$ and $\gamma$. In other words, in an undirected network with fixed topology, the expectation of the global synchronization error is the same regardless of the speed of synchronization. We observed the same phenomenon in the FO-DCTS algorithm with Gaussian delay [19]. Let us now define the asymptotic expectation of the pair-wise synchronization error as

$$\Delta t_{i,j} = \lim_{k\to\infty} E\left[t_i(k) - t_j(k)\right] = \mu_i(\infty) - \mu_j(\infty), \quad \forall i, j \in \mathcal{V}. \tag{41}$$

Hence, the maximum asymptotic expectation of the global synchronization error between any two nodes is $\Delta t_{\max} = \max\{|\Delta t_{i,j}|\}$. Then, we have the following definition.

Definition 1. A connected network is called “average consensus achievable with tolerable synchronization error” if the maximum asymptotic expectation of the global time synchronization error is less than a predefined threshold $\Delta t_{\text{Th}}$ when applying the SO-DCTS algorithm in (28), that is, when $\Delta t_{\max} < \Delta t_{\text{Th}}$.

Similar to [19], we have Definition 2.

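Definition 2. A network is called a “time delay balanced network” if the delay satisfies

$$\sum_{j\in\mathcal{N}_i}(T_c + L_{ij}/c) = \sum_{m\in\mathcal{N}_k}(T_c + L_{km}/c), \quad (i, j) \in \mathcal{E},\ (k, m) \in \mathcal{E}, \tag{42}$$

or equivalently, $\Delta t_{\max} = 0$.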
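Definitions 1 and 2 can be made concrete by computing Δtmax from (40) and (41) for specific topologies. The sketch below does this for 16-node ring, path, and star networks with equal link distances, assuming Tc + Lc/c = 10 microseconds as in the simulation setup; the function names are hypothetical.

```python
import numpy as np

def adjacency(kind, n):
    """Adjacency matrix of an n-node ring, path, or star (hub is the last node)."""
    A = np.zeros((n, n))
    if kind == "ring":
        i = np.arange(n)
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
    elif kind == "path":
        i = np.arange(n - 1)
        A[i, i + 1] = A[i + 1, i] = 1.0
    elif kind == "star":
        A[:n - 1, n - 1] = A[n - 1, :n - 1] = 1.0
    else:
        raise ValueError(f"unknown topology: {kind}")
    return A

def delta_t_max(A, hop_delay):
    """Largest |mu_i - mu_j| with mu = (L + K)^{-1} G u and u_i = hop_delay * degree_i."""
    n = A.shape[0]
    L = np.diag(A.sum(axis=1)) - A
    K = np.full((n, n), 1.0 / n)
    u = hop_delay * A.sum(axis=1)
    mu = np.linalg.solve(L + K, (np.eye(n) - K) @ u)
    return mu.max() - mu.min()

for kind in ("ring", "path", "star"):
    print(kind, delta_t_max(adjacency(kind, 16), hop_delay=10.0))
```

In the ring every node has the same total delay, so the condition in (42) holds and Δtmax vanishes; the path and star are unbalanced and yield nonzero values that can be compared with Table 2.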

5.3. Asymptotic Mean Square Time Synchronization Error. Using Lemma 5, the steady state of the second central moment of the error vector is

$$\kappa_e(\infty) \triangleq \lim_{k\to\infty}\kappa_e(k) = \operatorname{tr}(Q W_2 Q\Sigma_\zeta), \tag{43}$$

where $W_2 = \sum_{l=0}^{\infty}(P^T)^l P^l$. Note that $W_2$ satisfies the following condition:

$$I + P^T W_2 P = W_2. \tag{44}$$
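Condition (44) suggests a simple way to compute W2 numerically: repeating the update W ← I + PᵀWP from W = I reproduces the partial sums of the series. The following sketch assumes P = H − J with the block forms used for undirected networks, on an eight-node ring with illustrative parameters; a dedicated discrete Lyapunov solver could be used instead.

```python
import numpy as np

def build_P(L, eps, gamma):
    """P = H - J under the assumed block forms for undirected networks."""
    n = L.shape[0]
    I, Z, K = np.eye(n), np.zeros((n, n)), np.full((n, n), 1.0 / n)
    H = np.block([[I - eps * L, gamma * eps * L], [I, Z]])
    J = np.block([[K, Z], [K, Z]])
    return H - J

def solve_W2(P, tol=1e-10, max_iter=100000):
    """Sum the series for W2 via the fixed-point iteration W <- I + P^T W P."""
    eye = np.eye(P.shape[0])
    W = eye.copy()
    for _ in range(max_iter):
        W_next = eye + P.T @ W @ P
        if np.abs(W_next - W).max() < tol:
            return W_next
        W = W_next
    raise RuntimeError("series did not converge; check that rho(P) < 1")

n = 8
A = np.zeros((n, n)); i = np.arange(n)
A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0          # ring graph
L = np.diag(A.sum(axis=1)) - A
P = build_P(L, eps=0.3, gamma=-0.2)
W2 = solve_W2(P)
print(np.abs(np.eye(2 * n) + P.T @ W2 @ P - W2).max())   # residual of (44)
```

The iteration converges only when ρ(P) < 1, which is the same condition that defines the convergence region.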

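Let us denote the covariance matrix and second central moment of the disagreement vector as $\Sigma_\delta(k)$ and $\kappa_\delta(k)$, respectively. We see that

$$\operatorname{tr}(\Sigma_e(k)) = \operatorname{tr}(\Sigma_\delta(k)) + \operatorname{tr}(\Sigma_\delta(k-1)). \tag{45}$$

Therefore, as $k \to \infty$, the steady state of the second central moment of the disagreement vector is

$$\kappa_\delta(\infty) \triangleq \lim_{k\to\infty}\kappa_\delta(k) = \frac{\kappa_e(\infty)}{2} = \frac{\operatorname{tr}(Q W_2 Q\Sigma_\zeta)}{2}. \tag{46}$$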

We now define the asymptotic mean square time synchronization error as

$$\sigma_{\Delta t}^2 = \lim_{k\to\infty}\sum_{i=1}^{n} E\left[|t_i(k) - m(k)|^2\right], \tag{47}$$

which indicates the amount of error by which the updated time at each node differs from the average value over all $n$ nodes. We see that

$$\sigma_{\Delta t}^2 = \vec{u}^T Q(L + K)^{-2}Q\vec{u} + \frac{\operatorname{tr}(Q W_2 Q\Sigma_\zeta)}{2}. \tag{48}$$


In the following simulation results, we assume that the initial time offset of node $i$ is $(i - 1/2)T/n$, $i = 1, \ldots, n$, where $T = 1000$ microseconds unless otherwise stated. Trends similar to the ones noted below were observed when the initial time offsets between nodes were arbitrary (e.g., when they were uniformly distributed over $[0, T]$); we use this fixed offset assumption here for comparison purposes.

6.1. Structured Networks. In our simulations, we examine the convergence performance of the FO-DCTS and SO-DCTS algorithms for several structured, undirected networks. Specifically, we study the following network topologies.

Definition 3. “A Ring Network with Equal Distance ($R_n$)”: A ring network is a network that consists of a single cycle. The ring network with equal distance is a ring network that has $n$ nodes, $n$ edges, and $L_c = L_{ij} = L_{km}$ for $(i, j) \in \mathcal{E}$ and $(k, m) \in \mathcal{E}$.

Definition 4. “A Path Network with Equal Distance ($P_n$)”: A path network is a network that consists of the edge set $\{(i, i+1), 1 \le i < n\}$. The path network with equal distance is a path network that has $n$ nodes, $n - 1$ edges, and $L_c = L_{ij} = L_{km}$ for $(i, j) \in \mathcal{E}$ and $(k, m) \in \mathcal{E}$.

Definition 5. “A Star Network with Equal Distance ($S_n$)”: A star network is a network that consists of the edge set $\{(i, n), 1 \le i < n\}$. The star network with equal distance is a star network that has $n$ nodes, $n - 1$ edges, and $L_c = L_{ij} = L_{km}$ for $(i, j) \in \mathcal{E}$ and $(k, m) \in \mathcal{E}$.
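The three topologies in Definitions 3 to 5 translate directly into adjacency matrices. The sketch below builds them with 0-based node indices, so node n of the definitions corresponds to index n − 1; the function names are hypothetical.

```python
import numpy as np

def ring_network(n):
    """R_n: a single cycle over n >= 3 nodes, giving n edges."""
    A = np.zeros((n, n), dtype=int)
    i = np.arange(n)
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
    return A

def path_network(n):
    """P_n: edge set {(i, i+1), 1 <= i < n}, giving n - 1 edges."""
    A = np.zeros((n, n), dtype=int)
    i = np.arange(n - 1)
    A[i, i + 1] = A[i + 1, i] = 1
    return A

def star_network(n):
    """S_n: edge set {(i, n), 1 <= i < n}, giving n - 1 edges with node n as the hub."""
    A = np.zeros((n, n), dtype=int)
    A[:n - 1, n - 1] = A[n - 1, :n - 1] = 1
    return A

def edge_count(A):
    """Number of undirected edges of a symmetric adjacency matrix."""
    return int(A.sum()) // 2

for name, A in [("R8", ring_network(8)), ("P5", path_network(5)), ("S8", star_network(8))]:
    print(name, "edges:", edge_count(A), "degrees:", A.sum(axis=1))
```

Since all links have the same length, the degree vector alone determines the per-node total delay, which is what makes the ring balanced and the path and star unbalanced.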

Figure 4 shows examples of these networks: a ring network $R_8$, a path network $P_5$, and a star network $S_8$. Based on Definition 2, we see that $R_n$ is a “time delay balanced network” and $\Delta t_{\max} = 0$. We now explore the convergence properties of the SO-DCTS algorithm for these structured networks via simulation.

Optimal Convergence Rate. First, we compare the convergence speeds of the SO-DCTS and FO-DCTS algorithms for the above structured networks, assuming that the convergence rate is defined as $\nu = -\log(\alpha)$ and that there is no Gaussian delay between nodes. Table 1 gives the numerical values of the optimal convergence rate for the SO-DCTS and FO-DCTS algorithms under the $R_{16}$, $P_{16}$, and $S_{16}$ topologies. As expected, the SO-DCTS algorithm converges faster than the FO-DCTS algorithm in all three cases. Specifically, we see that the optimal convergence rate of the SO-DCTS algorithm is nearly twice that of the FO-DCTS algorithm for all three types of networks.

Table 1: Numerical results comparing convergence rates of FO-DCTS and SO-DCTS algorithms in R16, P16, S16.

                     R16                P16                S16
                αopt     νopt      αopt     νopt      αopt     νopt
FO-DCTS Alg.    0.9267   0.0762    0.9808   0.0194    0.8824   0.1252
SO-DCTS Alg.    0.8634   0.1469    0.9623   0.0384    0.7895   0.2364

Table 2: Asymptotic results for the SO-DCTS algorithm in structured networks with Gaussian delay.

                R16         P16      S16
Δtmax (μs)      0           35       8.75
σ²Δt            305.8075    13329    84.2996

Convergence Properties of SO-DCTS Algorithm with Gaussian Delay. In our simulations of the SO-DCTS algorithm with Gaussian delay, we assume $T_c + L_c/c = 10$ microseconds and use the optimal values $\varepsilon_{\text{opt,SO}}$ and $\gamma_{\text{opt,SO}}$ from (24). The simulation results and the asymptotic mean square time synchronization errors for the $R_{16}$, $P_{16}$, and $S_{16}$ networks are shown in Figure 5. For each network topology, the asymptotic mean square time synchronization error $\sigma_{\Delta t}^2$ is calculated from (48). It can be seen that as the time index increases, the mean square time synchronization error approaches the steady-state value when utilizing the SO-DCTS algorithm with Gaussian delay. Additionally, we see that the SO-DCTS algorithm performs worst in a path network, where it has the largest value of $\sigma_{\Delta t}^2$ and the slowest convergence speed. This is not surprising, since in such networks information flow from node 1 to node $n$ requires $n - 1$ hops.

Table 2 summarizes the asymptotic results of the SO-DCTS algorithm for structured networks. As expected, the maximum asymptotic expectation of global time synchronization error for $R_n$ is 0 since $R_n$ is a time delay balanced network. Furthermore, the SO-DCTS algorithm in $P_n$ has the largest $\Delta t_{\max}$ because of its highly unbalanced time delay structure. It is worth mentioning that the SO-DCTS algorithm in star networks $S_n$ has relatively small values of $\Delta t_{\max}$ and $\sigma_{\Delta t}^2$. In fact, the SO-DCTS algorithm for a star network can be seen as a type of centralized time synchronization algorithm in which a root node determines and propagates the average of the local time information of all other nodes in the network.

In Figure 6, we show the asymptotic value of $\sigma_{\Delta t}^2$ as a function of the number of nodes in these structured networks. It can be seen that when using the optimal $\varepsilon_{\text{opt,SO}}$ and $\gamma_{\text{opt,SO}}$, the asymptotic mean square time synchronization error for a star network is nearly constant as the number of nodes increases. However, $\sigma_{\Delta t}^2$ is an increasing function of the number of nodes for both path and ring networks.

Figure 4: Structured networks: (a) R8, (b) P5, (c) S8.

Figure 5: $\sigma_{\Delta t}^2$ as a function of the iteration time index for the SO-DCTS algorithm in structured networks (R16, P16, S16) with Gaussian delay. (Logarithmic vertical axis: $\sigma_{\Delta t}^2$; horizontal axis: iteration time index, 0 to 150; steady-state and simulation curves for each topology.)

6.2. Random Networks. We also present here simulation results for a random network comprised of $n$ nodes that were randomly generated with uniform distribution over a unit square kilometer; two nodes were assumed connected if the distance between them was less than $\eta$, a predefined threshold. One realization of such a network with 16 nodes is shown in Figure 7. We assume that the average distance between two nodes is 0.5 km.
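A random network of this kind can be generated in a few lines. The sketch below is a minimal illustration, assuming node positions in kilometers on the unit square and a connectivity check based on the second-smallest Laplacian eigenvalue; the function names and the seed are arbitrary.

```python
import numpy as np

def random_network(n, eta, rng):
    """Place n nodes uniformly on the unit square and link pairs closer than eta."""
    pos = rng.random((n, 2))
    diff = pos[:, None, :] - pos[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    A = ((dist < eta) & ~np.eye(n, dtype=bool)).astype(float)
    return pos, dist, A

def is_connected(A):
    """A graph is connected when the second-smallest Laplacian eigenvalue is positive."""
    L = np.diag(A.sum(axis=1)) - A
    return bool(np.linalg.eigvalsh(L)[1] > 1e-9)

rng = np.random.default_rng(1)
pos, dist, A = random_network(256, 0.25, rng)
print("connected:", is_connected(A), "mean degree:", A.sum(axis=1).mean())
```

Because the convergence results require a connected topology, disconnected realizations would typically be discarded and redrawn before running the synchronization algorithm.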

Figure 8 shows the simulation results for the convergence rates of the FO-DCTS and SO-DCTS algorithms in random networks with 256 nodes when $\eta = 0.25$, without Gaussian delay between network nodes. Specifically, we plot the mean square time synchronization error (defined as $(1/n)\|\vec{\delta}(k)\|^2$). In simulating random networks, we average results over 5000 network realizations. To obtain these results, we chose $\varepsilon_{\text{opt,FO}}$ for the FO-DCTS algorithm and $\varepsilon_{\text{opt,SO}}$ and $\gamma_{\text{opt,SO}}$ for the SO-DCTS algorithm. In Figure 8, we observe that the optimal convergence rate of the SO-DCTS algorithm is faster than that of the FO-DCTS algorithm. In addition to the results shown here, we ran this simulation setup for various realizations of random networks, assuming both $n = 256$ and a smaller network with $n = 16$. Overall, the results show a similar trend, that is, the convergence rate of the SO-DCTS algorithm exceeds that of the FO-DCTS algorithm.

Figure 9 shows the simulation results when the SO-DCTS algorithm is implemented in the random network of Figure 7, assuming Gaussian delay between network nodes. As expected, we see here that an asymptotic global synchronization error persists between some pairs of nodes, that is, $\Delta t_{\max} = 26.4130$ microseconds for this random network. If we specify a threshold $\Delta t_{\text{Th}}$ greater than this $\Delta t_{\max}$, we call this network “average consensus achievable with tolerable synchronization error,” as described in Definition 1.

Figure 6: $\sigma_{\Delta t}^2$ as a function of the number of nodes for the SO-DCTS algorithm in structured networks with Gaussian delay. (Logarithmic vertical axis: $\sigma_{\Delta t}^2$; horizontal axis: number of nodes, 10 to 60; steady-state curves for ring, path, and star networks.)

Figure 7: Random network with 16 nodes used to obtain simulation results in Figure 9. (Node positions on the unit square.)

Figure 8: Evolutions of the FO-DCTS and SO-DCTS algorithms in random network with 256 nodes when $\eta = 0.25$ without Gaussian delay between network nodes. (Logarithmic vertical axis: mean square error; one curve per algorithm.)

Figure 9: Evolution of the average disagreement of the SO-DCTS algorithm in random network (see Figure 7) with Gaussian delay between network nodes. (Vertical axis: $E[\delta_i(k)]$; the steady-state spread $\Delta t_{\max}$ is marked.)

7. Conclusions

In this paper, we propose a novel discrete-time SO-DCTS algorithm to address the global timing synchronization problem in wireless sensor networks. We analyze several important convergence characteristics of the SO-DCTS algorithm for directed and undirected networks. Additionally, we investigate the convergence region and optimal convergence rate of the SO-DCTS algorithm in undirected networks and show that the optimal convergence rate of the SO-DCTS algorithm is superior to that of the FO-DCTS algorithm under an appropriate algorithm design. Furthermore, we investigate the asymptotic expectation and mean square synchronization error of the SO-DCTS algorithm when there is Gaussian delay between network nodes. In the future, we intend to investigate the effects of skew, link failure, and other practical conditions when utilizing the SO-DCTS algorithm in wireless sensor networks.

Appendices

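A. Convergence Region for SO-DCTS Algorithm in Undirected Networks

Let us denote the pairs of eigenvalues of $H$ corresponding to $\lambda_i(L)$ as $\lambda_{i'}(H)$ and $\lambda_{i''}(H)$, that is,

$$\lambda_{i'}(H) = \frac{1}{2}\left[1 - \varepsilon\lambda_i(L) + \sqrt{(1 - \varepsilon\lambda_i(L))^2 + 4\gamma\varepsilon\lambda_i(L)}\right], \qquad \lambda_{i''}(H) = \frac{1}{2}\left[1 - \varepsilon\lambda_i(L) - \sqrt{(1 - \varepsilon\lambda_i(L))^2 + 4\gamma\varepsilon\lambda_i(L)}\right]. \tag{A.1}$$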

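Now, we examine the convergence region for the SO-DCTS algorithm based on the conditions $|\lambda_{i'}(H)| < 1$, $1 < i' \le n$, and $|\lambda_{i''}(H)| < 1$, $1 < i'' \le n$.

Case 1. When $\lambda_{i'}(H)$ and $\lambda_{i''}(H)$ are real values: in this case, we have $(1 - \varepsilon\lambda_i(L))^2 + 4\gamma\varepsilon\lambda_i(L) \ge 0$, that is,

$$\gamma \ge -\frac{\left[1 - \varepsilon\lambda_i(L)\right]^2}{4\varepsilon\lambda_i(L)}, \quad 1 < i \le n. \tag{A.2}$$

In the following, we assume that $1 < i, i', i'' \le n$ unless otherwise stated.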

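(1) First, we consider the convergence region for $|\lambda_{i'}(H)| < 1$. After some manipulations, we can show that the convergence region is

$$\left\{\gamma < 1,\ 0 < \varepsilon < \frac{3}{\lambda_i(L)}\right\} \cup \left\{\frac{2 - \varepsilon\lambda_i(L)}{\varepsilon\lambda_i(L)} < \gamma < 1,\ \varepsilon > \frac{3}{\lambda_i(L)}\right\}. \tag{A.3}$$

(2) Then, we consider the convergence region for $|\lambda_{i''}(H)| < 1$, which is given as

$$\left\{\gamma < \frac{2 - \varepsilon\lambda_i(L)}{\varepsilon\lambda_i(L)},\ 0 < \varepsilon < \frac{3}{\lambda_i(L)}\right\}. \tag{A.4}$$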

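Combining the convergence regions for $\lambda_{i'}(H)$ and $\lambda_{i''}(H)$ with (A.2), the convergence region $\mathcal{R}_1$ for this case is

$$\mathcal{R}_1 = \left\{-\frac{\left[1 - \varepsilon\lambda_i(L)\right]^2}{4\varepsilon\lambda_i(L)} \le \gamma < 1,\ 0 < \varepsilon < \frac{1}{\lambda_i(L)}\right\} \cup \left\{-\frac{\left[1 - \varepsilon\lambda_i(L)\right]^2}{4\varepsilon\lambda_i(L)} \le \gamma < \frac{2 - \varepsilon\lambda_i(L)}{\varepsilon\lambda_i(L)},\ \frac{1}{\lambda_i(L)} \le \varepsilon < \frac{3}{\lambda_i(L)}\right\}. \tag{A.5}$$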

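Case 2. When $\lambda_{i'}(H)$ and $\lambda_{i''}(H)$ are complex values: in this case, we have $(1 - \varepsilon\lambda_i(L))^2 + 4\gamma\varepsilon\lambda_i(L) < 0$, that is,

$$\gamma < -\frac{\left[1 - \varepsilon\lambda_i(L)\right]^2}{4\varepsilon\lambda_i(L)}. \tag{A.6}$$

Here, $\Re\{\lambda_{i'}(H)\} = \Re\{\lambda_{i''}(H)\}$ and $\Im\{\lambda_{i'}(H)\} = -\Im\{\lambda_{i''}(H)\}$. Thus, we only need to examine the convergence region for $|\lambda_{i'}(H)|$. In order to satisfy the conditions, we have

(1) the magnitude of the real part of $\lambda_{i'}(H)$ should be less than 1, that is, $|\Re\{\lambda_{i'}(H)\}| < 1$; then we have

$$0 < \varepsilon < \frac{3}{\lambda_i(L)}; \tag{A.7}$$

(2) the magnitude of the imaginary part of $\lambda_{i'}(H)$ should be less than 1, that is, $|\Im\{\lambda_{i'}(H)\}| < 1$; then we have

$$-\frac{4 + \left[1 - \varepsilon\lambda_i(L)\right]^2}{4\varepsilon\lambda_i(L)} < \gamma < -\frac{\left[1 - \varepsilon\lambda_i(L)\right]^2}{4\varepsilon\lambda_i(L)}; \tag{A.8}$$

(3) the absolute value of $\lambda_{i'}(H)$ should be less than 1, that is, $\Re^2\{\lambda_{i'}(H)\} + \Im^2\{\lambda_{i'}(H)\} < 1$; then we have

$$\gamma > -\frac{1}{\varepsilon\lambda_i(L)}. \tag{A.9}$$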

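Combining the above results, the convergence region $\mathcal{R}_2$ for this case is

$$\mathcal{R}_2 = \left\{-\frac{1}{\varepsilon\lambda_i(L)} < \gamma < -\frac{\left[1 - \varepsilon\lambda_i(L)\right]^2}{4\varepsilon\lambda_i(L)},\ 0 < \varepsilon < \frac{3}{\lambda_i(L)}\right\}. \tag{A.10}$$

By taking the union of $\mathcal{R}_1$ in (A.5) and $\mathcal{R}_2$ in (A.10) and considering the increasing order of $\lambda_i(L)$, the convergence region for the SO-DCTS algorithm in (20) is obtained.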

B. Solution for Minimization Problem

Here, we give a sketch of the solution to the spectral radius minimization problem in (23). Since $\lambda_2(L) \le \cdots \le \lambda_n(L)$, the optimization problem is equivalent to minimizing

$$\max\{|\lambda_{2'}(H)|, |\lambda_{2''}(H)|, |\lambda_{n'}(H)|, |\lambda_{n''}(H)|\}. \tag{B.1}$$

(1) First, we find the optimal $\gamma$ given $\varepsilon$ to minimize (B.1). Here, we consider four different cases depending on whether $\lambda_{2'}(H)$, $\lambda_{2''}(H)$, $\lambda_{n'}(H)$, $\lambda_{n''}(H)$ are real values or complex values. After algebraic derivations, we can show that the minimum of (B.1) given $\varepsilon$ can be achieved when $\lambda_{2'}(H)$ and $\lambda_{2''}(H)$ are real values and $\lambda_{n'}(H)$ and $\lambda_{n''}(H)$ are complex values. Additionally, the following equation should be satisfied:

$$|\lambda_{2'}(H)| = |\lambda_{n'}(H)| = |\lambda_{n''}(H)|. \tag{B.2}$$

Thus, we have

$$\gamma = -\frac{\lambda_n(L)\left[1 - \varepsilon\lambda_2(L)\right]^2}{\varepsilon\left[\lambda_2(L) + \lambda_n(L)\right]^2}. \tag{B.3}$$

(2) Next, we find the optimal $\varepsilon$ given $\gamma$ to minimize (B.1). Again, this can be achieved by taking $\Im\{\lambda_{n'}(H)\} = 0$. Then, we have the following relationship between $\varepsilon$ and $\gamma$:

$$\gamma = -\frac{\left[1 - \varepsilon\lambda_n(L)\right]^2}{4\varepsilon\lambda_n(L)}. \tag{B.4}$$

Combining (B.3) with (B.4), we get (24).

Acknowledgment

This work was supported in part by research grants from Thales Communications, Inc., Md, USA, and the National Science Foundation.

References

[1] D. Culler, D. Estrin, and M. Srivastava, “Overview of sensor networks,” Computer, vol. 37, no. 8, pp. 41–49, 2004.

[2] S. Ganeriwal, R. Kumar, and M. B. Srivastava, “Timing-sync protocol for sensor networks,” in Proceedings of the 1st International Conference on Embedded Networked Sensor Systems (SenSys ’03), pp. 138–149, Los Angeles, Calif, USA, November 2003.

[3] J. Elson, L. Girod, and D. Estrin, “Fine-grained network time synchronization using reference broadcasts,” in Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI ’02), pp. 147–163, Boston, Mass, USA, December 2002.

[4] M. Maroti, B. Kusy, G. Simon, and A. Ledeczi, “The flooding time synchronization protocol,” in Proceedings of the 2nd International Conference on Embedded Networked Sensor Systems (SenSys ’04), pp. 39–49, Baltimore, Md, USA, November 2004.

[5] F. Sivrikaya and B. Yener, “Time synchronization in sensor networks: a survey,” IEEE Network, vol. 18, no. 4, pp. 45–50, 2004.

[6] A. Giridhar and P. R. Kumar, “Distributed clock synchronization over wireless networks: algorithms and analysis,” in Proceedings of the 45th IEEE Conference on Decision and Control (CDC ’06), pp. 4915–4920, San Diego, Calif, USA, December 2006.

[7] R. Olfati-Saber, J. A. Fax, and R. M. Murray, “Consensus and cooperation in networked multi-agent systems,” Proceedings of the IEEE, vol. 95, no. 1, pp. 215–233, 2007.

[8] L. Schenato and G. Gamba, “A distributed consensus protocol for clock synchronization in wireless sensor network,” in Proceedings of the 46th IEEE Conference on Decision and Control (CDC ’07), pp. 2289–2294, New Orleans, La, USA, December 2007.

[9] O. Simeone and U. Spagnolini, “Distributed time synchronization in wireless sensor networks with coupled discrete-time oscillators,” EURASIP Journal on Wireless Communications and Networking, vol. 2007, Article ID 57054, 13 pages, 2007.

[10] H. G. Tanner, A. Jadbabaie, and G. J. Pappas, “Flocking in fixed and switching networks,” IEEE Transactions on Automatic Control, vol. 52, no. 5, pp. 863–868, 2007.

[11] W. Ren and E. Atkins, “Second-order consensus protocols in multiple vehicle systems with local interactions,” in Proceedings of the AIAA Guidance, Navigation, and Control Conference (GN&C ’05), pp. 3689–3701, San Francisco, Calif, USA, August 2005.

[12] P. Lin, Y. Jia, J. Du, and S. Yuan, “Distributed consensus control for second-order agents with fixed topology and time-delay,” in Proceedings of the 26th Chinese Control Conference (CCC ’07), pp. 577–581, Zhangjiajie, China, July-June 2007.

[13] W. Ren, “Second-order consensus algorithm with extensions to switching topologies and reference models,” in Proceedings of the American Control Conference (ACC ’07), pp. 1431–1436, New York, NY, USA, July 2007.

[14] R. A. Horn and C. R. Johnson, Matrix Analysis, Cambridge University Press, Cambridge, UK, 1985.

[15] L. Xiao and S. Boyd, “Fast linear iterations for distributed averaging,” in Proceedings of the 42nd IEEE Conference on Decision and Control, vol. 5, pp. 4997–5002, Maui, Hawaii, USA, December 2003.

[16] C. D. Meyer, Matrix Analysis and Applied Linear Algebra, Society for Industrial and Applied Mathematics, Philadelphia, Pa, USA, 2001.

[17] R. Olfati-Saber and R. M. Murray, “Consensus problems in networks of agents with switching topology and time-delays,” IEEE Transactions on Automatic Control, vol. 49, no. 9, pp. 1520–1533, 2004.

[18] S. Kar and J. M. F. Moura, “Topology for global average consensus,” in Proceedings of the 40th Asilomar Conference on Signals, Systems and Computers (ACSSC ’06), pp. 276–280, Pacific Grove, Calif, USA, October-November 2006.

[19] G. Xiong and S. Kishore, “Analysis of distributed consensus time synchronization with Gaussian delay over wireless sensor networks,” submitted to EURASIP Journal on Wireless Communications and Networking.

[20] H. S. Abdel-Ghaffar, “Analysis of synchronization algorithms with time-out control over networks with exponentially symmetric delays,” IEEE Transactions on Communications, vol. 50, no. 10, pp. 1652–1661, 2002.
