Cooperation Without Synchronization: Practical Cooperative Relaying …xyzhang.ucsd.edu/papers/Zhang_TMC14_DAC.pdf · · 2017-08-26Cooperation Without Synchronization: Practical

1

Cooperation Without Synchronization: PracticalCooperative Relaying for Wireless Networks

Xinyu Zhang and Kang G. Shin

Abstract—Cooperative relay aims to realize the capacity of multi-antenna arrays in a distributed manner. However, the symbol-levelsynchronization requirement among distributed relays limits its use in practice. We propose to circumvent this barrier with a cross-layer protocol called Distributed Asynchronous Cooperation (DAC). With DAC, multiple relays can schedule concurrent transmissionswith packet-level (hence coarse) synchronization. The receiver then extracts multiple versions of each relayed packet via a collision-resolution algorithm, thus realizing the diversity gain of cooperative communication. We demonstrate the feasibility of DAC byprototyping and testing it on the GNURadio/USRP software radio platform. To explore its relevance at the network level, we introducea DAC-based medium access control (MAC) protocol, and a generic approach to integration of the DAC MAC/PHY layer into a typicalrouting algorithm. Considering the use of DAC for multiple network flows, we analyze the fundamental tradeoff between the improvementin diversity gain and the reduction in multiplexing opportunities. DAC is shown to improve the throughput and delay performance oflossy networks with intermediate link quality. Our analytical results have also been confirmed via network-level simulation with ns-2.

Index Terms—Relay network, cooperative communications, asynchronous cooperative relaying, cross-layer design, diversity-multiplexing tradeoff.

F

1 INTRODUCTION

It has been well-understood in information theory thatrelays’ cooperation can improve the rate and reliability ofwireless links [1]. A typical cooperative communicationprotocol allows a relay to overhear the source’s transmis-sion, and then forward the data to the desired receiverin case the direct delivery attempt fails. Such a two-stagecooperative relay protocol establishes a virtual antennaarray among multiple distributed transmitters, each witha single antenna, in order to boost the capacity of the linkfrom the source to the destination.

Orthogonal space-time codes [2], originally designedfor point-to-point MIMO (multiple-input-multiple-output) links, have been proposed to exploit theadditional degrees of freedom (referred to as the diversitygain) offered by relay nodes. Non-orthogonal schemes [3]that allow relays and the source to transmit concurrentlyin the forwarding stage can achieve the same level ofdiversity gain as MIMO. In these information-theoreticapproaches, perfect time synchronization among relaysis assumed. However, unlike point-to-point MIMOlinks, cooperative communication, by its nature, occursasynchronously as there is no global clock shared amongthe relays. The randomness resulting from propagationdelay and higher-layer operations typically generates atime offset/skew in the order of several microsecondsor more [4]. In contrast, the typical symbol durationof wireless communication standards (e.g., 802.11) iswell below 1 µs, and even half a symbol shift in timewill offset the advantage of synchronous cooperativecommunications [4].

• The work reported in this paper was supported in part by the NSF undergrants CNS-1160775, CNS-1317411 and CNS-1318292. Xinyu Zhang iswith the Department of Electrical and Computer Engineering, Universityof Wisconsin-Madison. Kang G. Shin is with the Department of ElectricalEngineering and Computer Science, The University of Michigan. The au-thors’ email addresses are: [email protected], [email protected]

R

DS

(a) Orthogonal

11

2

2

R

DS

1

1

2

(b) DAC, an asynchronous, non-(a) Orthogonal cooperative relaying

(b) DAC, an asynchronous, non-orthogonal relay protocol

Fig. 1. Comparison between traditional relaying and DAC.The shaded tags denote the order of transmission.

In this paper, we overcome the synchronization bar-rier with a cross-layer relay protocol called DistributedAsynchronous Cooperation (DAC). In contrast with or-thogonal relay protocols (Fig. 1(a)), DAC allows twonodes (or the source and one relay) to concurrently for-ward the same packet to the destination (Fig. 1(b)). Evenif one of them fails, the other can still be decoded withoutrequiring additional channel access. Hence, DAC im-proves the link reliability by exploiting spatial diversityvia co-located relays.

Unlike the non-orthogonal relaying in informationtheory [3], DAC only needs to maintain coarse-grainedpacket-level synchronization among relays, which isachieved via MAC-layer sensing and scheduling. Atthe PHY layer, it extracts multiple versions of a packetfrom different relays using a collision resolution algorithm.Specifically, DAC takes advantage of the natural timeoffset among these packet copies to decode clean bits inthe packet. It bootstraps an iterative cancellation proce-dure that recovers the symbol positions where differentbits collide, by re-modeling the known bits and cancelingthem from collided symbols.

To realize the above idea, we design and implementthe DAC PHY-layer on the GNURadio/USRP softwareradio platform [5]. The core components in our designinclude packet-offset identification, channel parameterestimation, and sample-level signal modeling and can-cellation, which are detailed in Sec. 3. Our experimenta-tion on a small relay network testbed shows that DACcan indeed make a diversity gain for typical SNR ranges.

2

To translate this advantage into network performanceenhancement, we design a MAC protocol that extendsthe widely-used CSMA/CA and integrates the DACPHY with it. A key idea in our design is to use cut-through relaying to maintain maximal compatibility withthe 802.11-style mechanism. Specifically, the relays for-ward a packet immediately (without buffering it) uponoverhearing or seeing a retransmission header from theoriginal source node. Hence, the relays make transpar-ent contributions without disrupting the retransmission,carrier sensing and exponential backoff decisions madeby the source.

We also introduce a generic approach that can use theDAC MAC/PHY to improve existing routing protocols.In applying this approach to multiple network flows, weidentify an important tradeoff between the diversity gainprovided by concurrent relays, and the multiplexing lossdue to the expanded interference region. Our analysisshows that DAC improves network throughput whenthe link loss rate is below a certain threshold, which canbe accurately profiled for simplified topologies. There-fore, DAC is best applicable to lossy wireless networks(such as unplanned mesh networks [6]), where it canenhance the network throughput by improving the reli-ability of bottleneck links with a low reception rate.

Due to the inherent limitation of the software radioplatform (i.e., USRP), we cannot directly implement theDAC-based MAC and routing protocols. Therefore, wedevelop an analytical model with closed-form charac-terization of DAC’s achievable bit error rate (BER) andpacket error rate (PER). We modify the ns-2 PHY withthis new packet-reception model, and implement theDAC MAC and routing protocols based on it. Our sim-ulation results show that DAC can significantly improvethe throughput and delay performance of existing loss-aware routing protocols. Hence, DAC has potential fornon-orthogonal cooperation in wireless relay networks.

Our main contributions can be summarized as follows.• Design and implementation of an asynchronous,

non-orthogonal relaying scheme and its evaluationon an actual software ratio platform. We have char-acterized the PHY-layer BER and PER analytically.

• Design of a MAC protocol that exploits distributedasynchronous relays with minimal signalling over-head and maximal transparency to the 802.11 MAC.

• Proposal of a generic approach that incorporatesthe DAC-based relaying scheme into existing rout-ing protocols. Based on an asymptotic analysis intractable network models, we profile the sufficientcondition when DAC improves the performance ofexisting routing protocols.

The rest of this paper is organized as follows. Sec. 2discusses the work related to wireless relay networks.Sec. 3 describes the design and implementation of theDAC PHY. The DAC-based MAC and routing protocolsare presented in Sec. 4 and Sec. 5, respectively. Sec. 6analyzes the BER, PER, and network-level asymptoticperformance for DAC. Further simulation experiments

are presented in Sec. 7 to validate DAC’s performance.Finally, Sec. 8 concludes the paper.

2 RELATED WORK

Cooperative diversity was originally proposed in infor-mation theory to realize the capacity of MIMO systems.The distributed space-time code [2] for two-stage co-operative communications has been explored widely toimprove the performance of relay networks (see [7] fora survey). One important progress is attributed to Azar-ian et al. [3] who showed non-orthogonal cooperationschemes to approximate the performance of centralizedMIMO systems through multiple relays. However, thesecooperative relay protocols assume perfect time synchro-nization among relay nodes. Recently, Wei [8] and Liet al. [9] reduced the synchronization constraint to asub-symbol level, but assumed known and controllabletime offsets between relays. Wang et al. [10] surveyedrecent research along this line of work. Bletsas et al. [11]also studied the PHY-layer realization of beamformingmechanisms that eliminate the need for perfect carrierphase synchronization. However, they still need time syn-chronization, with a precision of orders of magnitudesmaller than symbol period. Such a timing requirementis still quite challenging to meet for distributed trans-mitters, but is not needed in DAC. Zhang et al. [12]demonstrated the feasibility of implementing a coop-erative communication scheme on a USRP testbed, bysynchronizing multiple nodes with GPS clocks. Space-time coded cooperation for OFDM communication sys-tems was proposed in [13], which leverages the cyclicprefix to tolerate sub-symbol level timing offset betweencooperative relays. Rahul et al. [14] further implementedthe scheme in [13]. With a custom-built FPGA platform,they were able to achieve synchronization accuracy of50ns and demonstrate a substantial diversity gain overnon-cooperative communications. DAC’s diversity gainis incomparable with these synchronized schemes, andit only allows for two concurrent relays. However, tothe best of our knowledge, it is the first non-orthogonalrelaying protocol without any symbol-level timing con-straint.

Recently, the higher-layer implication of cooperativerelaying has been studied. Jakllari et al. [15] directlyapplied the synchronized space-time code to establishvirtual MISO links for routing. Sundaresan et al. [16]showed that a more practical two-phase orthogonal re-laying scheme (Fig. 1(a)), driven by the retransmissiondiversity from relays equipped with smart antennas, canmake a remarkable throughput gain.

An alternative approach to exploiting diversity gain isthe orthogonal opportunistic relaying [4], which selects thebest among all relays that overheard the source’s packet,based on instantaneous channel feedback. In Sec. 4, weshow that DAC can complement opportunistic relaying.By allowing two relays, it provides redundancy acrossindependently-faded packets, thus improving the linkreliability.

3

The feasibility of allowing concurrent transmissionshas also been explored in distributed beamforming [17].Beamforming protocols synchronize the relays, such thattheir signals can combine coherently at the receiver.However, they require strict frequency, phase, and timesynchronization at the symbol level, among distributedtransmitters, which remains an open problem [17], dueto the limited time resolution at wireless nodes, andthe variation of wireless channels. In [18], transmitbeamforming is directly tested on 802.11 wireless cards.However, the overlapping signals are not guaranteedto be combined coherently due to the lack of explicitsynchronization. Therefore, both gains and losses areobserved when compared with single-packet transmis-sions.

The advent of high-performance software radios hasprompted signal-processing-based solutions to overcomethe deficiency of the CSMA/CA-based 802.11 MAC. Forexample, the ZigZag protocol [19] overcomes the hiddenterminal problem in WLANs by identifying repeatedcollisions of two hidden transmitters. It then treats eachcollided packet as a sum over two packets. The twooriginal packets are recovered from two known sums,similarly to solving a linear system of equations. DAC’scollision resolution PHY is similar to ZigZag, but aimsto resolve packets from a single collision with sample-level estimation and cancellation. DAC aims to improve theend-to-end throughput and delay performance of lossywireless mesh networks, where it exploits cooperativediversity, based on cut-through relaying and optimalrelay selection.

3 COLLISION RESOLUTION PHY IN DACThe core component of DAC PHY lies in the signalprocessing module at the receiver, which can decodetwo overlapping packets carrying the same data. Herewe focus on the design and implementation of thiscustomized receiver module.

3.1 An Overview of Iterative Collision ResolutionSuppose in the second stage of the basic DAC relayingscheme (Fig. 1(b)), the relay and the source transmitthe same packet towards the destination. Due to therandomness introduced by the transmitters’ higher-layeroperations, the probability that the two versions of thepacket being time-aligned perfectly is very low. Thereceiver identifies the natural offset between these twopacket copies by detecting a preamble attached in theirheaders. It first decodes the clean symbols in the offsetregion, and then iteratively subtracts decoded symbolsfrom the collided ones, thereby obtaining the desiredsymbols.

For instance, in Fig. 2, two packets (named head packetP1 and tail packet P2, respectively, according to theirarrival order) overlap at the receiver. We first decode theclean symbols A and B in P1. Symbol C is corruptedas it collides with A′ in P2, resulting in a combinedsymbol S. To recover C, we note that symbols A′ and A

A

A'

B

B'

C

C'

S=A' + C

head packet P1

tail packet P2

D E

D' E' Y' Z'

Y Z

S=A' + C

Fig. 2. Iteratively decoding two collided packets carryingthe same information, coming from two relays (or onesource and one relay), respectively.

carry the same bit, but their analog forms are differentdue to the independent channel distortion. Therefore,we need to reconstruct an image of A′ by emulatingthe channel distortion over the corresponding bit thatis already known from A.

After reconstructing the image of A′, we subtract theemulated A′ from S, thus obtaining a decision symbolfor C. Then, we normalize the decision symbol using thechannel estimation for P1, and use a slicer to decide ifthe bit in C is 0 or 1. For BPSK, the slicer outputs 0 ifthe normalized decision symbol has a negative real part,and 1 otherwise. The decoded bit in C is then used toreconstruct C ′ and decode E. This process iterates untilthe end of the packet. Likewise, the iteration for othercollided symbols proceeds.

The above description assumes the head and tail pack-ets are aligned at symbol level. In practice, each symbolcontains multiple samples (e.g., 11 in 802.11b). Since thecorresponding transmitters are asynchronous, there isno guarantee that the head/tail symbols align perfectlywith each other. However, DAC’s iterative decodingalgorithm still works under such misalignment. In effect,the actual decoding and channel estimation algorithmruns at sample-level (Sec. 3.2 and Sec. 3.4), thus a mis-aligned symbol can be reconstructd from samples of twoadjacent symbols that were already decoded.

3.2 DAC Transceiver Design

The transmitter module in DAC (Fig. 3) is similar tolegacy 802.11b, except that it adds a DAC preamble thatassists packet detection. The transmitter maps a digitalbit to a symbol according to a complex constellation(“1” and “0” are mapped to 1 and -1, respectively). Thesymbol then passes through a root raised cosine (RRC)filter, which interpolates the symbol into I samples (weadopt a typical value I = 8) to alleviate inter-symbolinterference. The RRC-shaped symbol is the final outputfrom the transmitter.

The receiver module in DAC is also illustrated inFig. 3. In the normal case of decoding a single headpacket, the receiver acts like a typical 802.11b receiver.Upon detecting a tail packet immersed in a head packet,the receiver identifies the exact start of the tail packet,rolls back to its first symbol, and starts the iterativecancellation algorithm. The receiver needs to replay thebit-to-samples transformation at the transmitter, as wellas the channel distortion, when reconstructing a symbolin the tail packet. The channel distortion, includingamplitude attenuation, phase shift, frequency offset, andtiming offset, must be estimated and updated dynami-

4

Packet bits (0,1) Constellation (-1,1)

RRC filter

USRP TX

RRC channel head pkt

packet 1 pkt

to higher layersUSRP

RXRRC filter

channel estimation

Costas loop

MM circuit

tail pktpacket

detector

remodel tail pkt’s samples

tail pkt channel, freq, timing recovery

iterative cancellation

1 pkt

2 pkts

slicer

bits

layers

Fig. 3. Flow-chart for DAC transmitter (upper) and re-ceiver (lower).

cally, since channel parameters vary during the decodingprocedure, and the estimation error can accumulate,eventually corrupting the entire packet.

The main challenge in implementing DAC lies in iden-tifying the exact offset between the two packets, and re-modeling the symbols in the tail packet based on chan-nel estimation. Unlike interference cancellation [20], wemust deal with the common case where collided packetshave comparable RSS. Otherwise, the weak packet maybe captured and offers no diversity gain. Unlike thesymbol cancellation algorithm in ZigZag [19], the chan-nel parameters must be estimated in a single collision.To obtain accurate estimation and reconstruction of thesymbols, we extensively use sample-level correlation, re-modeling, and cancellation, as discussed next.

3.3 Packet Detection and Offset Estimation

The original 802.11b PHY detects the start of a packetby identifying a sequence of known bits from the sliceroutput. In DAC, we need to detect the presence of one ormore packets before feeding the symbols into the slicer.This is achieved by using a combination of energy andfeature detection.

Energy detection estimates a packet’s arrival by locat-ing a burst in the magnitude and phase of the receivedsymbol. According to our experiment, a data symboltypically has at least 8dB SNR in order to be decodederror-free. Therefore, it is easy to identify the first sym-bol of the head packet. When the tail packet arrivesand overlaps with the head packet, their correspondingcomplex samples add up. The magnitude and phase ofthe resulting symbol thus deviates from the previoussymbols, which are relatively stable. DAC uses thisdeviation as a hint for packet collision.

Energy detection can provide a symbol-level offsetestimation, while DAC necessitates sample-level estima-tion accuracy, since the overlapping symbols do not alignperfectly. In addition, energy detection’s false positiverate increases when ambient noise raises the RSS vari-ation. Therefore, we combine it with feature detectionto reduce false positives. Specifically, we correlate theraw decoded symbols with a 256-bit known preambleto confirm the packet arrival event. We use differentialcorrelation, i.e., correlating the phase difference of adja-cent symbols with the known difference obtained fromthe preamble, in order to cancel out the transmitter–

Samples in preamble Coarse channel estimation

(Amplitude/phase distortion)

Frequency offsetestimation

Timing recovery

Remodeling transmitter RRCSamples in

payload

Refine channelestimation

Fig. 4. Channel-estimation procedure for the tail packetthat is immersed in the signals of the head packet.

receiver frequency offset. The correlator outputs a peakwhenever a packet arrives. The threshold configurationfor peak detection is similar to that in [19]. Note thatthe correlation peak is 256 bits behind the first symbol,and therefore DAC maintains a circular buffer storingthe latest 256 symbols and their samples, and rolls backto the first symbol before cancellation.

The energy and feature detections confirm the packetarrival and indicate the symbol-level offset. The posi-tion of exact sample-level collision is then identifiedby correlating the samples near the beginning of thetail packet with the known samples in the first 16 bitsof the preamble (hence 128 known samples in total).The position where the maximum correlation magni-tude occurs indicates the start of useful samples. Toisolate channel distortion from transceiver distortion, theknown samples are obtained offline from the output ofa transmit filter.

3.4 Channel EstimationWe use the collision-free symbols in the beginning ofthe head packet to estimate its channel. The beginningof the tail packet is immersed in strong noise (i.e., thesignals in the head packet), and hence, a direct esti-mation is greatly biased. To overcome this problem, weobtain coarse estimation of the tail packet by correlatingand canceling the known preamble, and then refiningthe estimation on-the-fly. Fig. 4 illustrates the flow ofoperations for channel estimation and below we describeits components in detail.

3.4.1 Amplitude and phase distortionA coarse estimation of the channel can be obtained viasample-level correlation. Suppose the known samplesare x(t),∀t ∈ [1,Ks] (Ks = 128, as discussed above), thenthe received complex samples after channel distortionshould be: y(t) = Ax(t)ejθ+j2π∆ft + n(t), where n(t) isthe noise process; A and θ are the channel amplitude andphase distortion; ∆f is the frequency offset between thetransmitter and the receiver. After correlation, we getY = A

∑Kst=1[x(t)ejθ+j2π∆ft + n(t)]x(t). The phase error

due to frequency offset is typically on the order of 10−4

rad per sample1, and thus, its accumulating effect overKs samples is negligible. Further, the ambient noise plusthe random samples from the head packet can partlycancel out, resulting in

∑Kst=1 x

2(t) �∑Kst=1 x(t)n(t).

Therefore, we approximate the complex channel distor-tion as Cd = Y (

∑Kst=1 x

2(t))−1.

1. This number was obtained from our experiments with a USRPnode, which has a frequency stability (around 25ppm) comparable toa typical WiFi chipset (e.g., Atheros 802.11 chipset [21])

5

3.4.2 Frequency offset estimationWe use the Costas loop [22] to estimate the residualfrequency error in the received baseband signals, whichis also the frequency offset between the transmitter andthe receiver. The Costas loop calculates the phase changebetween two adjacent symbols, and then updates thefrequency error via first-order differentiation: δf = δf +ω · (p(t + 1) − p(t)), where p(t) is the symbol phase attime t, and ω is an update parameter, typically set onthe order of 10−5.

3.4.3 Timing recoveryIdeally, a receiver should align its sampling time withthe transmitter to achieve maximum SNR. In practice,the sampling time may deviate from the peak position ofthe RRC-shaped sample envelop, reducing the effectiveSNR. A widely-adopted method to correct for samplingoffset is the MM circuit [23], which uses a nonlinear hill-climbing algorithm to tune the received signals, suchthat the sample point is asymptotically aligned with theoptimal sampling time.

Note that the MM circuit works only when adjacentsymbols have a comparable magnitude, which holds forsingle-packet decoding. For DAC, the collided symbolshave large variations since they consist of symbols fromdifferent channels. Hence, we enable the MM circuittiming update only after the symbol cancellation. Wealso need to freeze the MM circuit, i.e., fix its samplingstep, whenever an energy burst is detected, indicatinga potential collision. We re-enable it for each symbol inthe head packet after the corresponding symbol in thetail packet is subtracted.

3.4.4 Transmitter distortionBeside the channel distortion, the transmitter also pre-processes the signals using the RRC filter to combatmulti-path fading. The RRC converts a symbol (1 or -1) into I = 8 samples as:

si(t) = x(t− 1)F (3I

2+ i) + x(t)F (

I

2+ i), i ∈ [0,

I

2)

si(t) = x(t)F (I

2+ i) + x(t+ 1)F (i− I

2), i ∈ [

I

2, I)

where F (i) denotes the i-th filter coefficients. At thereceiver side, this filtering process is replayed for thetail packet, observing that the digital bits x(t) are alreadyknown from prior decoded bits in the head packet.

3.4.5 Correcting channel-estimation errorsRecall the initial correlation only provides coarse esti-mation of the channel gain in the tail packet. During theiterative cancellation procedure, we need to refine the es-timation using a simple feedback algorithm. Specifically,we reconstruct an image of symbols in the head packet,and subtract these symbols to get a refined estimation ofsymbols in the tail packet. We use the difference betweenthis refined estimation and the original reconstructedimage to calculate the channel estimation error, and thenupdate the frequency and time offsets similarly to the

above estimation for the head packet. Observing that thechannel gain remains relatively stable for one packet, weuse a moving average approach to update the channelamplitude and phase distortion for the tail packet.

One observation from our implementation is that thecollision offset identification may also deviate from theexact collision position by one or two samples, especiallywhen SNR is low. We exploit the MM circuit outputto compensate for this error. When the MM circuitoutputs a sampling step larger than I , it indicates thatthe collision position is likely to be larger than initiallyestimated. Our algorithm then increases a credit valueby ∆t (0 < ∆t < 1). When ∆t > 1, we update the packetoffset by 1. A symmetric update procedure is used whenthe sampling step is smaller than I .

3.5 Harvest Diversity with Packet Selection

Beside the iterative decoding in the forward direction,DAC can also work backward, starting from the cleansymbols in the tail packet (symbol Y ′ and Z ′ in Fig. 2),until reaching its beginning, thus obtaining a different es-timation of the packet. Since these two packets arrive atthe receiver via two independent links, even if one failsin decoding, the other may still be correctly decoded.This is the basis of DAC’s diversity gain, and will berigorously evaluated in our analysis and experiments.

Note that the diversity gain comes at the expenseof additional overhead, including the preamble and theextended reception time due to the packets’ offset. How-ever, the preamble length we use is only Kb = 256 bits,and the offset time can be easily confined within theduration of tens of bits, with state-of-the-art softwareradios [24]. In contrast, a typical data payload is around1K bytes. Therefore, the additional overhead of DAC isonly on the order of 1%.

Also, note that the channel estimation, sample re-modeling and cancellation only involves linear-time op-erations. The correlation has Θ(n2) complexity (n isthe correlation length), but is only needed for aroundKb symbols after the energy detection is triggered. Inaddition, the implementation of DAC is built on BPSK.However, the estimation, reconstruction and cancellationfor higher-order modulation schemes, such as M-PSK(M=4, 8, 16, 64), can be realized likewise, except thatthe signal constellation is mapped to different complexvectors [19].

3.6 Discussion

Alternative decoding mechanisms. DAC’s PHY-layermechanism can be considered as a zero-forcing (ZF)interference cancellation, which re-models the decodedsymbols and eliminates their interference to the symbolsto be decoded. DAC’s performance can be enhanced byadvanced signal processing mechanisms like minimummean square error (MMSE) decoding and by integratingwith error control codes. Such an extension, however, isbeyond the scope of this paper.

6

4 CSMA/CR: DAC-BASED MACWe now introduce a MAC protocol that extends the802.11-style CSMA, but adds optimal relay selection andCut-through Relaying to support DAC’s Collision Reso-lution PHY, and hence, it is referred to as CSMA/CR.CSMA/CR exploits the retransmission diversity simi-larly to typical two-stage relay protocols (Fig. 1). It is,however, different from others in that at the secondstage, the relay can transmit immediately after overhear-ing a retransmission indicator and packet header fromthe source. The packets from the source and the relaypartially overlap at the receiver, but the collision can beresolved and exploited by the DAC PHY.

One challenge in realizing CSMA/CR is how to selectthe best relay to maximize the diversity gain. Anotherproblem is compatibility with 802.11—we would like tointroduce the least overhead and modification to 802.11.We propose the following solutions to meet this goal.

4.1 Protocol OperationsThe basic MAC-level operations of DAC is illustratedin Fig. 5. Suppose a direct source-destination link hasalready been established by a routing protocol. Thesource makes a first attempt to transmit the data packet,which can be heard by both the relay and the destination.If the packet reaches the destination, then CSMA/CRproceeds like normal CSMA/CA. If the source receivesno ACK from the destination (i.e., upon failure), it sched-ules a retransmission and sets an indicator bit in theheader of the retransmitted packet. When the packet isretransmitted by the source, the relay will forward thesame packet it overheard, immediately after decodingthe retransmission bit and the packet’s identity (flowid, sequence number, and transmitter id), which areincluded in its header. This cut-through relaying intro-duces an offset between the arrival times of the source’sretransmission and the relay’s forwarding, and providesthe necessary condition for collision resolution at thereceiver.

This asynchrony will make the source sense a busychannel immediately after the source’s retransmission iscompleted. It thereby extends the ACK timeout by theduration between current time and the end of this busyperiod, which is also the offset between the source’sretransmission and the relay’s forwarding. This proce-dure repeats until the source receives an ACK from thedestination. To improve the reliability of ACK, the relayalso schedules a cut-through relaying of the ACK, whenit overhears the header of the ACK packet from thedestination.

One interesting point is that the relay facilitates theretransmission only when it asserts that the source bethe only active transmitter within its sensing range.This decision is made by looking into the NAV field in802.11 MAC, which indicates activities in a neighboringregion, and by looking into the carrier sensing recordright before the source’s retransmission. If the relaysenses a busy channel but cannot decode the identity

of the transmitter, then it remains as a normal 802.11transceiver.

The advantage of CSMA/CR is that the retransmissiondecision is made solely by the source node, and itneed not know whether the relay has overheard thefirst transmission. The relay’s forwarding decision isalso made locally, according to the header it overhearsfrom the source. The idea of allowing relay operationin the middle of the source’s transmission has longbeen adopted in wireline networks [25]. It has not beenadopted in wireless networks, which typically operateson time-orthogonal mode, schedules transmissions on aper-packet basis, and allows only one transmitter withinthe carrier sensing range. However, emerging high per-formance software radios makes it viable in wirelessnetworks. For example, Sora [24] achieves a scheduling-granularity comparable to the high-rate wireless stan-dards (such as 802.11a) via programmable software andreconfigurable hardware.

For radio devices incapable of cut-through relay-ing, we adopt the following scheme built atop the802.11 RTS/CTS mechanism. Before a retransmission, thesource sends an RTS packet, piggy-backing the retrans-mission bit and the packet’s identity information in it.Upon overhearing this RTS and the subsequent CTS,both the source and the relay transmit the data packet. Incurrent wireless transceivers, the decision-making timeis typically on the order of several microseconds [4].This randomness is sufficient to offer an offset of severalbits between the two transmissions, thus allowing forcollision resolution at the PHY. For transceivers withhigher time resolution, randomness can be introducedby allowing the source and the relay to randomly backoff before starting a retransmission.

4.2 Optimal Relay SelectionThe best relay should have high-quality links to boththe source and the destination. Using a simple analysison the three-node relaying network n Fig. 1(b), wecan profile the packet delay as follows. Let psd be thereception probability of link S → D, and qsd = 1− psd (asimilar notation is used for link S → R and R→ D); andZ, D the packet size and data rate, respectively. Then,we have:

Proposition 1 When relay r is selected, the expected per-packet transmission delay is:

Tr =Z

D· 1 + qsdpsr(1− qsdqrd)−1

1− qsrqsdProof: The outcome of S’s first transmission attempt canbe one of three cases.

Case 1. The direct link S → D succeeds, whichhappens with probability psd, and takes Z

D time unitsto finish this transmission.

Case 2. Only S → R succeeds, which happens withprobability (1− psd)psr, and takes Z

D . Both S and R willthen transmit simultaneously to D using the cut-through

7

source

destination ACK

DATA DATA

collision resolution

source

relay DATA

DATA DATA

DATA

sense preamble

ACK

Fig. 5. DAC-based MAC opera-tions.

'iR secondary

relay

primary relay

1+iR1−iRiR

S D

relay

Fig. 6. Improving anexisting routing proto-col using DAC.

relaying. As will be proven in Theorem 2, the PER of thetwo combined links S → D and R→ D equals 1− (1−psd)(1− prd). Therefore, the expected remaining time toreach D is T ′r = 1

1−(1−psd)(1−prd) .Case 3. Neither S → R nor S → D succeeds, which

happens with probability 1−(1−psr)(1−psd), and wastesZD . The transmission then starts again from S, taking Trtime in expectation.

Overall, the expected time for a packet to reach D fromS is:

Tr = psdZ

D+ (1− psd)psr(

Z

D+ T ′r)

+(1− psr)(1− psd)[Tr +Z

D]

from which we get:

Tr =Z

D·

1 + (1− psd)psr 11−(1−psd)(1−prd)

1− (1− psd)(1− psr)

=Z

D· 1 + qsdpsr(1− qsdqrd)−1

1− qsrqsdthus completing the proof. ut

It follows immediately that the optimal relay incurringminimal delay should satisfy: R∗ = arg minr Tr.

To simplify the DAC relaying protocol, we haveadopted the average link reception rate for selecting asingle fixed relay, instead of per-packet SNR feedback[4]. DAC can be extended to complement opportunisticrelaying [4] by dynamically selecting the best relay thatoverheard the source packet. This can be realized byallowing the relay candidates to set a backoff counterthat is inversely proportional to the quality of theirlink to the destination, in a way similar to [4]. Furtherinvestigation of this approach is a matter of our futurework.

DAC’s MAC-level relay selection algorithm can be ex-tended further to the multi-hop case, to enhance existingrouting protocols, as discussed next.

5 INTEGRATING DAC WITH ROUTING PROTO-COLS

A joint design of DAC relay selection and routing canachieve optimal end-to-end delay performance. For sim-plicity and to stress the advantage of non-orthogonalcooperation, however, we restrict our attention to ageneric relay-selection approach that integrates DACwith existing routing protocols, given that the routeshad already been selected. Specifically, we use the ETX

routing [26] as a basis, and show how to improve itsreliability and throughput using the DAC mechanism.

Observing that real-world mesh networks tend to havea majority of links with intermediate quality [6], theETX protocol adopts a loss-aware link metric, whichis the expected number of transmissions needed forsuccessfully delivering a packet on each link. This metricis used to find the shortest path for each data session (asource-destination pair).

Our basic idea is to optimize the ETX route on a per-hop basis. As shown in Fig. 6, suppose a primary path(S · · ·Ri−1 → Ri → Ri+1 · · ·D) consisting of primaryrelays has been established by ETX. For each primaryrelay Ri, we decide whether to add a secondary relay toit, and select the best secondary relay R′i, according tothe potential performance gain in terms of reducing thedelay from the previous hop Ri−1 to the next hop Ri+1.Note that if a node already serves as a secondary relayfor Ri, then it cannot serve for Ri+1. This rule is enforcedduring the relay-selection procedure through signalingbetween relays.

Before analyzing the potential gain, we first introducethe cooperation between the primary and secondaryrelays. Take the scenario in Fig. 6 as an example. In thenormal mode, Ri−1 makes a first attempt to forward apacket to Ri. Upon successful reception, either Ri or R′ior both of them can return an ACK. The DAC collision-resolution PHY ensures no ACK collision happens. Fromthe perspective of Ri−1, it proceeds to the next packet aslong as it can decode an ACK.

If only Ri receives the packet, then it schedules theforwarding following a normal DAC MAC, regardingR′i as the relay. If both of them receive the packet, thenR′i will perform the cut-through relaying immediatelyafter it senses Ri transmitting the packet it overheard. Aprimary relay piggybacks the session ID (represented bythe source-destination of the path), sequence, and senderID in the forwarded packet’s header, so that it can berecognized in time by the secondary relay. An exceptionhappens when only the secondary relay R′i receives thepacket. R′i estimates the occurrence of such an eventvia the absence of Ri’s ACK header that is intended forRi−1. In this case, R′i sends the ACK immediately, andthen temporarily takes the position of Ri, serving as theprimary forwarder, forming a typical 3-node local relaynetwork together with Ri, following the DAC MAC. Thecontrol goes back to the primary relay Ri in the nextsuccessful packet transmission from Ri−1 to Ri.

The above protocol operations allow us to derive amodel for analyzing the expected transmission delay,and selecting the optimal relay that incurs the minimumdelay. Specifically, we model the progress of a packetas a Markov chain, driven by the transmission, coop-eration and forwarding operations among primary andsecondary relays. Following notations similar to those inSec. 4, we have the following proposition.

8

i

'ii

'i

1−i 1+i',1,1 iiii qq −−

1,'1,1 ++− iiii qq

iiii qq ,'1, +

'1,' iiii qq +

1, +iip

1,' +iip

',1,1 iiii qp −−

iiii qp ,1',1 −−

',1,1 iiii pp −−

iiii pq ,'1, +

'1,' iiii pq +

11,'1, ++ iiii qq

Fig. 7. Modeling the packet propagation in the DACprimary-secondary relay-selection algorithm as a Markovchain.

Proposition 2 The expected delay in delivering a packet fromRi−1 to Ri+1 is:

T = (1− qi−1,iqi−1,i′)−1[ZD−1 + pi−1,iqi−1,i′Ti′

+pi−1,ipi−1,i′Ti,i′ + qi−1,ipi−1,i′Ti]

where Ti′ = ZD ·

1+(qi′,i+1pi,i′ )(1−qi,i+1qi′,i+1)−1

1−qi′,i+1qi,i′, Ti,i′ =

ZD(1−qi,i+1qi′,i+1) , Ti = Z

D ·1+(qi,i+1pi′,i)(1−qi,i+1qi′,i+1)−1

1−qi,i+1qi′,i. The

best relay should have minimal delay T ∗ among all secondaryrelay candidates.

Proof: We model the propagation of a data packet as aMarkov chain, as shown in Fig. 7. Each state denotes thecurrent holder of the packet. State i represents the factthat Ri has received the packet but R′i has not. State ii′

denotes the fact that both Ri and R′i have received thepacket and the cut-through relaying starts. The expectedtransmission delay is essentially the first passage timefrom Ri−1 to Ri+1, denoted as Ti−1. Similarly, the ex-pected first passage time from state i, i′, ii′ to i + 1 aredenoted as Ti, Ti′ and Tii′ , respectively. Then, followingsimilar reasoning as in the proof for Proposition 1, wehave:

Ti−1 =Z

D+ pi−1,iqi−1,i′Ti + qi−1,ipi−1,i′Ti′

+pi−1,ipi−1,i′Tii′ + qi−1,iqi−1,i′Ti−1.

When the packet is in state i, it may proceed with threepossible outcomes. First, Ri succeeds in delivering it toRi+1 directly, which happens with probability pi,i+1.

Second, the direct delivery fails, but Ri′ overhears thepacket, and consequently the system evolves to state ii′.This happens with probability qi,i+1.

If neither happens, then the system remains in state iand repeats the above trials.

Therefore, the expected transmission time from Ri toRi+1 is:

Ti =Z

Dpi,i+1 + qi,i+1qi′iTi + qi,i+1pi′iTii′ . (1)

Similarly, for state i′, we have:

Ti′ =Z

Dpi,i+1 + qi′,i+1qii′Ti + qi′,i+1pii′Tii′ . (2)

For state ii′, the expected transmission time is theexpectation of a geometric random variable with mean:

Tii′ = (1− qi,i+1qi′,i+1)−1 (3)

and the joint PER when both Ri → Ri+1 and R′i → Ri+1

transmit concurrently is based on Theorem 2.By solving the above equations, we can obtain a

closed-form expression for Ti−1, thus completing theproof for Proposition 2. ut

In the actual implementation of DAC, a relay R′i isincluded in the candidate set of secondary relays onlyif it has a non-zero reception probability with Ri−1, Riand Ri+1. Further, based on the above proposition, wecan obtain a closed-form expression for the cooperationgain using DAC relaying in terms of throughput im-provement: g∗ = D

Z · (p−1i−1,i + p−1

i,i+1) · T ∗−1. We adopta secondary relay only if the potential gain g∗ is largerthan a threshold TD (set to 1.1 in our design).

To reduce the signaling overhead, we again used themean link loss rate as a metric for selecting a fixed sec-ondary relay, instead of adjusting the selection for eachpacket. As shown in existing measurement and routingdesign [6], [26], the mean link loss rate is relatively stableon an hourly basis, and it can be obtained from thedelivery probability of data packets.

The above scheme based on secondary relay selectioncan be used to improve other routing protocols. Forexample, we can improve the traditional routing protocolbased on orthogonal relaying [16] by adding a secondaryrelay for the existing primary relay. A similar idea canbe applied to assist opportunistic routing [27], in whichtwo forwarders who overheard the same packet can bescheduled concurrently, following a negotiation mecha-nism similar to the one in ExOR [27]. The pros and consof using a DAC-based secondary relay will be discussedfurther in our analysis.

6 ASYMPTOTIC PERFORMANCE ANALYSIS

In this section, we analyze the performance of DAC,from both PHY and network layers’ perspectives.

6.1 BER and PER in DAC’s Collision ResolutionThe iterative collision resolution in DAC can cause errorpropagation, due to the correlation between consecu-tively decoded symbols. For example, in Fig. 2, if symbolA produces an erroneous bit, then the error propagatesto A′, which affects subsequent symbols, such as C. For-tunately, such error propagation stops if the actual bitsof A′ and C are identical. In such a case, after subtractingthe error image of A′, we obtain a strengthened symbolthat indicates the correct bit of C. Error propagation alsostops when symbol C has a much greater strength thanA′. Based on these two observations, we prove:2

2. The proofs for Lemma 1, Lemma 2 and Theorem 1 are similar tothe analysis in our previous work [28], [29], and therefore, their detailsare omitted here.

9

Lemma 1 The probability of error propagation in forward-direction decoding can be characterized as:

Ps ≈ Q(√

2γ1 − 2√

2γ2)

where γi denotes the SNR of packet i. The Q-functionis defined as: Q(y) = (2π)−

12

∫∞ye−

x2

2 dx. A symmetricequation holds for backward-direction decoding.

Based on Lemma 1, we further prove that the probabil-ity of error propagation over i bits decays exponentiallyas i increases, as reflected in the following result.

Lemma 2 Let L and F denote the packet length and packetoffset, respectively, then the probability of steady-state errorlength can be characterized as:

πi = π0PePi−1s ,∀i ∈ (1, G], π0 = (1 + Pe ·

1− PGs1− Ps

)−1

where Pe = Q(√

2γWD−1) is the BER of a non-collidedpacket with SNR γ, data rate D and signal bandwidth W .G = bLF c.

With the above lemmas, we can bound the BER inDAC’s iterative collision-resolution algorithm.

Theorem 1 Let P ′e be the BER in forward-direction decodingin DAC, and Pe be the BER of a single head packet withoutcollision, then Pe ≤ P ′e < 2Pe.

A more relevant metric is the packet error rate (PER),which will be used to characterize the gain of DACover CSMA/CA based non-cooperative schemes. Withrespect to PER, we have:

Theorem 2 Let Ph and Pt, respectively, denote the PERwhen the head and tail packets are decoded without collision,then the overall PER in bi-directional collision resolution isPv = PhPt.

Proof: We start with the forward-direction decoding, i.e.,decoding the head packet by subtracting the tail packetfrom it. We assume no error correction code is used, andtherefore, the packet is corrupted once the first bit erroroccurs.

We represent symbols in the complex number form.Suppose at time t, symbol s̃1(t) = a1e

jθ1x1(t) in P1collides with s̃2(t) = a2e

jθ2x2(t) in P2. Let v denotethe receiver noise, then the received symbol s̃(t) =s̃1(t) + s̃2(t) + v. If we decode P1 first (forward-directiondecoding), then x2(t) = x1(t − F ). In addition, thechannel amplitude a2 and phase θ2 can be estimated viacorrelation, which can achieve high accuracy and intro-duces negligible noise [19]. Therefore, we can obtain adecision symbol for x1(t) as: s̃(t)−s̃2(t) = a1e

jθ1x1(t)+v.The resulting SNR level is: γ1 = |a1ejθ1 |2

2δ2 = P1

N , whichequals the SNR when s1(t) is decoded independently.With γ1, we can get the BER for typical fading patternand noise profile [22]. Suppose the relation between BER

and SNR is Pe = f(γ1), then Ph equals the probabilitythat the first L trials succeed for a geometrical randomvariable with mean Pe, i.e., Ph = 1− (1− Pe)L, which isequivalent to the probability of a bit error event in thehead packet when it is decoded alone.

Similarly, we can decode the tail packet with PER Pt.With a selective combination, the overall PER equals theprobability that both forward and backward decodingfail, which is PhPt. ut

Theorem 2 implies that by allowing two relays to transmitconcurrently, PER can be reduced to the PER product of thetwo independent packets. It seems counter-intuitive thaterror propagation does not affect the PER. The reasonsfor this are twofold. First, since the channel estimationfor the tail packet is based on preamble correlation,the estimation error is negligible compared to the biterrors in the head packet caused by channel distortion.Second, we do not use any error correction code, whichis beneficial for single-packet decoding. A joint design oferror correction and collision resolution may also yieldbetter performance for DAC, and this is left as our futurework.

6.2 Throughput Improvement for Multiple Flows

Although DAC improves link reliability via concurrentcooperative relays, it comes at the cost of reducingthe multiple access opportunities of competing networkflows. This essentially reflects the tradeoff between diver-sity gain and multiplexing gain at the network level, andposes a question: “does DAC increase or decrease thetotal network throughput when multiple flows co-exist?”For multihop networks with cooperative relays, the gen-eral capacity-scaling law is still an open problem, andexisting work has characterized it for special topologieswith a single flow only [30]. Here our analysis focuses oncharacterizing the condition when DAC can outperformnon-cooperative routing protocols without calculatingthe exact capacity bound. We start from a simplifiedgrid topology. Let Φc and Φd denote the achievablenetwork throughput of a CSMA-based routing protocoland the corresponding DAC-enhanced routing protocol,respectively, then:

Theorem 3 In a grid network with homogeneous link-reception probability p, Φd > Φc when p < 0.86. Thethroughput gain Φd

Φcdecreases monotonically with p.

Proof: Consider the grid network shown in Fig. 8. Letthe edge length be e, and assume the transmissionrange equals e and interference range equals 2e. SupposeRi−1 → Ri, Ri → Ri+1 are two consecutive links usedalong the path of a flow, which is selected by a CSMA-based routing protocol. Denote I(R) as the interferenceregion of node i. Following the secondary relay selectionrules in Sec. 5, it is easy to see that the only secondaryrelay available is R′i. Similar to the proof of Proposition 1,we can derive a balance equation for the expected packet

10

iR'iR

1+iR

1L

2L

1−iR

Fig. 8. Grid topology with homogeneous link-receptionprobability.

delay from Ri−1 to Ri+1:

Ti−1 = 2pq · 1

p+

p2

1− q2+ q2Ti−1 + 1

Ti−1 =1 + 2q

1− q2+

p2

(1− q2)2(4)

where q = 1 − p and the packet transmission time isnormalized to 1. In comparison, a CSMA-based routingentails 2

p average delay from Ri−1 to Ri+1.On the other hand, when R′i is used by DAC, it trans-

mits concurrently with Ri, expanding the interferenceregion by I(Ri) − I(Ri) ∩ I(R′i) (the shaded region inFig. 8), compared to a CSMA-based orthogonal schedul-ing protocol. Within the expanded interference region,at most two transmitters can be scheduled (L1 and L2)at the same time without interfering with each other.However, on average, a perfect orthogonal schedulingprotocol allocates one fifth of time to each node in thisgrid (because an optimal schedule achieves the minimalcoloring of the nodes, and the minimal coloring of a gridhas chromatic number 5 [31]). Further, note that the twonodes’ transmissions succeed only with probability p,and the secondary relay of DAC is used with probabilityp2 and over 1

1−q2 transmission attempts towards Ri+1.Therefore, on average, the loss of multiplexing time is25 × p×

p2

1−q2 = 2p3

5(1−q2) .DAC is guaranteed to reduce the network delay if the

diversity gain dominates the multiplexing loss, i.e.,

f(p) =2

p− 1 + 2q

1− q2− p2

(1− q2)2− 2p3

5(1− q2)> 0. (5)

We can numerically solve the equation f(p) = 0 andget its solution within (0, 1), which equals 0.86. By takingthe first-order derivative of f(p), it can be seen thatdf(p)

dp > 0,∀p ∈ (0, 1). Therefore, f(p) is monotonicallydecreasing within (0, 1). Hence, the diversity gain ofDAC always dominates its multiplexing loss when p <0.86, thus completing the proof for Theorem 3. ut

Theorem 3 can be extended to a more general case asfollows.

Corollary 1 In an arbitrary network topology with homoge-neous link-reception probability p, a sufficient condition forΦd > Φc is p < 0.64.

Proof: In an arbitrary topology, DAC selects a secondaryrelay only if it is connected to the primary relay, theprevious hop and the next hop. Therefore, the maximuminterference expansion of DAC is I(Ri)−I(Ri)∩I(R′i) <A(R), where A(R) is the area of an equilateral trian-gle with edge length equal to the interference rangeR. Further, the maximum independent set that can bepacked into I(R′i) is a regular hexagon with edge lengthR. Since I(Ri) − A(R) < I(Ri) ∩ I(R′i), at least twovertices of this hexagon fall in I(Ri) ∩ I(R′i). Therefore,the interference region expanded by the secondary relayaffects at most 4 other vertices. Among the 4 vertices,at most two can transmit concurrently under a CSMAscheduler. Therefore, the average loss of multiplexingtime is 2p × p2

1−q2 = 2p3

(1−q2) , and a sufficient conditionfor DAC to have performance gain is its diversity gaindominates multiplexing loss, i.e.,

f(p) =2

p− 1 + 2q

1− q2− p2

(1− q2)2− 2p3

(1− q2)> 0 (6)

which yields p < 0.64 in (0, 1). ut

These analytical results imply that DAC is guaranteedto improve throughput only when the average linkquality is sufficiently low. Note that real-world mesh net-works tend to have a majority of links with intermediatequality [26] because of channel attenuation, and becauseoptimal rate-adaptation schemes may prefer high data-rate links with low quality to low data-rate links withfull reception rate [6].

In a single 802.11-based wireless LAN, at any time,at most one transmitter can be active. Hence, the DACrelaying scheme achieves a diversity gain without reduc-ing the channel access opportunity of any transmitter,and it has a higher throughput than CSMA, as long asthe links have non-zero loss rates.

7 EXPERIMENTAL EVALUATION

We now evaluate and demonstrate the feasibility andperformance of DAC. We have built a small softwareradio network to validate the DAC collision-resolutionPHY. Based on insights gained from our experimentaland analytical results, we implemented the DAC-basedMAC and routing protocols in the ns-2 simulator, andevaluated its effectiveness in a large network.

7.1 DAC Collision-Resolution PHY

We design and prototype the DAC PHY based on theGNURadio/USRP platform [5]. USRP is a software radiotransceiver that converts digital symbols into analogwaves centered around a carrier frequency within theISM band. It can also receive analog signals via its RFfront-end, and down-convert them into the baseband.

11

The baseband digitized raw signals are sent to a general-purpose computer running the Python/C++ based DACPHY modules built atop the GNURadio library.

Our ideal experiments would operate on a software-radio network running DAC-based MAC and PHY.However, the USRP does not yet support MAC opera-tions requiring instantaneous response (e.g., ACK, carriersensing and cut-through relaying), because of the ineffi-cient user-mode signal processing modules and its highcommunication latency with the computer. Therefore,we focus on the core components of the DAC PHYlayer, i.e., the collision-resolution modules. Our testbedenvironment consists of three USRP nodes, which is usedto mimic the typical relay network in Fig. 1(b). Thecenter frequency of USRPs is set to 2.4145GHz, located inbetween the 802.11 channel 1 and 2. The USRP transmit-ter’s sampling rate is 128 MSamples/s and interpolationrate 32. With BPSK, each digit is mapped to one symbol,and each symbol consists of 8 samples after the RRC.Hence, the effective data rate is 128

32×8 = 0.5Mbps. Eachpacket has a 256B payload, which takes the same channeltime as a 1KB packet in an actual 2Mbps-mode 802.11bnetwork.

We use two USRPs as the source and the relay, andallow them to send packets with the same payload. Inthe common case of DAC relaying, the links of thesetwo concurrent transmitters have a comparable strength.Otherwise, the PER reduction is negligible according toTheorem 2 and we can just select a single best relay. Inaddition, the link with much higher SNR may capturethe other, and collision resolution is no longer needed.Therefore, we make coarse adjustments on the SNR be-tween each relay and the shared receiver by varying thetransmit power and link distance, so that the differencein mean SNR falls below 1dB.

The received SNR is measured using the clean partsof the known preamble in the packet. Suppose the re-ceived symbol is y(t), the data bit, noise magnitude, andchannel gain are x(t), n(t) and A, i.e., y(t) = Ax(t)+n(t).Then, we calculate the noise variance for those symbolsrepresenting the bit 1: δ2

1 = var(y2(t)). Similarly, we cancalculate the noise variance for symbols representingbit -1, denoted as δ−1, and the actual noise varianceδ = δ1+δ−1

2 is also the noise power, assuming the actualsignals are mixed with AWGN after going through allthe decoding modules [22]. A similar method is used toestimate the signal power, where the mean magnitudeof bit 1 symbols is calculated by averaging over symbolscontaining bit 1.

We evaluate PER when DAC PHY is used to resolvetwo overlapping packets, and compare it with the de-coding probability of a single non-collided packet. Dueto channel variations, the SNR value cannot be preciselycontrolled. We thus log the decoded packets, group themaccording to the received SNR, and calculate the meanpacket error rate (PER) for packets falling in the sameSNR range (in 1dB unit). The resulting SNR-PER relationis plotted in Fig. 9, where each vertical bar represents

SNR range (dB)

PE

R

(5, 6) (6, 7) (7, 8) (8, 9) (9, 10) (10,11)(11,12)(12,13)(13,14)(14,15)0

0.2

0.4

0.6

0.8

1Collision resolutionSingle−packet decoding

Fig. 9. Comparison between DAC collision resolution andsingle-packet decoding without collision.

104 packets collected over four different time periods.We observed a transition of PER from 1 to 0 when SNRbecomes larger than 8dB. DAC PHY achieves similarPER to the single-packet decoding, verifying our claim“single-direction collision resolution does not increase PER,compared to single-packet decoding, and thus bi-directionalcollision resolution achieves the PER product of the headand tail packets” (Sec. 6.1). Our analysis is based on aGaussian channel model, but the result is consistent withthe testbed experiments in an office environment withrich multipath fading. This is because the RRC filterspartly cancel out the inter-symbol interference, renderingthe noise approximately Gaussian.

7.2 Performance of DAC-Enhanced Routing

7.2.1 Experimental setup

In our asymptotic analysis, we made simplifications,such as fixed transmission range and homogeneous lossprobability, for tractability. To evaluate more realistic sce-narios, we have implemented the DAC-enhanced rout-ing protocol (Sec. 5) in ns-2. The primary path discoveryis the same as the ETX routing (which is built atopexisting ad-hoc routing protocols) [26]. The secondaryrelay-selection algorithm runs on each primary relay,which measures the quality of adjacent links, and ex-changes link-quality information for those links connect-ing secondary relay candidates and their previous andnext hops. The underlying CSMA/CR protocol is imple-mented by modifying the 802.11b MAC in ns-2. We addthe DAC header and preamble to each packet, modifythe carrier sensing and transmission timeout, so as tosupport the direct cut-through relaying, as discussed inSec. 4 and Sec. 5. As for the PHY-layer simulation, wereplace the ns-2 PHY packet-reception model with theanalytical model for DAC PHY in Theorem 2, which hasbeen verified with our experiments.

As another benchmark, we have also implementedan ideal version of opportunistic routing protocol [27],which will be referred to as OR-Oracle. In OR-Oracle,similar to DAC, a primary path is chosen, and twoor more secondary relays are then chosen as poten-tial forwarders. Unlike DAC, when multiple potentialforwarders receive a packet, only the one with short-est distance (in terms of ETX metric) to destinationwill be chosen. Otherwise, if a single relay receivesthe packet, the packet will be forwarded directly. OR-

12

Fig. 10. Distribution of link quality in randomly-generated(simulation) and Roofnet (real) topologies.

Oracle assumes the existence of an oracle that can co-ordinate both relays without incurring any overhead.The coordination in practical opportunistic routing [27]needs to be realized by exchanging messages, whichtends to require a significant channel time, especially inlossy networks. The major difference between OR-Oracleand DAC is that the former only allows two relays toforward packets concurrently, thus improving transmitdiversity. However, this also implies OR-Oracle results inless interference (higher spatial reuse) between differentflows.

Ideally, we would like to use real mesh link-qualitytraces, such as those from Roofnet [6], as a baseline sim-ulation scenario. However, such traces do not specify theinterference relation among nodes. So, we randomly gen-erate a mesh topology with 50 nodes uniformly deployedin a 1km×1km area. We use the log-normal shadow-ing model with pass-loss exponent 4.0 and shadowingdeviation 5.0dB (in consistence with the measurementin [32]). The transmit power and reception thresholdsare configured such that the reception probability is 0.1at 250m. Our topology generator repeatedly producesrandom topologies until we obtain one in which allnodes are connected, and the node degrees range be-tween 2 and 8. Fig. 10 compares a randomly-generatedtopology with the traces (the trace ID is 0606062400) fromRoofnet [33]. The simulation topology has average link-quality 0.51 and median 0.47, which is roughly consistentwith the measurements from Roofnet. Note that we havepruned the links with the reception probability below0.1. This simplification accounts for the deviation of thelink-quality distribution from a real topology. However,since the majority of links have intermediate quality, therelaying protocols would favor routing along long buthigh-quality paths over short but ultra-low quality links.For instance, in ETX, even a four-hop route results inhigher throughput than a single link with PDR 0.1.

7.2.2 Single-unicast scenarioWe evaluate the performance of DAC in comparisonwith the original ETX routing for two scenarios: singleand multiple unicasts. In the first case, a pair of source-destination nodes are randomly selected to start an end-to-end data session. Since there are no other competingflows, we are interested in the average end-to-end packetdelay and reliability (indicated by packet-delivery ratio(PDR)) for each session. This set of experiments revealsthe performance gain of DAC in an unsaturated net-

Fig. 11. Distribution of de-lay for single-unicast ses-sions.

Fig. 12. Distribution ofpacket-delivery ratio (PDR)for single-unicast sessions.

work. We evaluate these two metrics over 100 sessions,with packet size 1KB and source rate 0.2Mbps.

The CDF plotted in Fig. 11 shows that DAC reducesend-to-end delay for all the sessions. The average delayreduction is 27.3%. The delay performance of OR-Oracleexhibits a large variance, mainly because the relays arechosen online and may deviate a lot from the shortestpath.

Fig. 12 shows the distribution of PDR. With DAC,the PDR for a majority of sessions is increased to morethan 90%. It outperforms OR-Oracle for most sessions,because it boosts the reception rate of low-quality linkswith concurrent transmissions from secondary relays.

We also evaluate the saturated throughput of DAC byincreasing the source rate such that the source node’stransmit queue remains backlogged. We use throughputgain, defined as the end-to-end throughput of DAC (OR-Oracle) divided by that of ETX. The throughput gaindistribution for 100 random sessions is plotted in Fig. 13.DAC is shown to be able to achieve a 3x throughputimprovement over ETX, with an average throughputgain 1.73. Although OR-Oracle has a comparable delayas ETX (shown in Fig. 11), it still achieves a higherthroughput (average gain is 1.42), because of the domi-nantly high PDR. Note that the absolute throughput gainof OR-Oracle is less than ExOR [27], mainly becauseExOR targets batch transmissions, whereas OR-Oracletransmits each packet individually.

In a saturated network, throughput is dictated by thebottleneck link, i.e., the link with the lowest quality alongthe selected path. Hence, DAC is most effective for pathscontaining low-quality links. This can be seen from thescatter plot in Fig. 14. Obviously, DAC achieves a higherthroughput gain for those sessions where ETX has abelow-average throughput. These sessions tend to havelinks with high loss rate along their paths.

7.2.3 Multiple unicast sessionsWe now consider DAC’s performance in the presence ofmultiple competing flows, where the fundamental trade-off between diversity and multiplexing gain becomesan important factor in determining the total networkthroughput, as discussed in Sec. 6. We evaluate thenetwork throughput as a function of the traffic load.Specifically, we fix the source rate at 10Kbps and increasethe total number of sessions. As illustrated in Fig. 15,the total network throughput increases with the number

13

Fig. 13. Throughput gain of DAC and OR-Oracle overETX.

3 2 3 4 5 6 7 81

1.5

2

2.5

3

(b) ETX throughput (KB/s)

DA

C th

roug

hput

gai

n

Fig. 14. Throughput gain of DAC over ETX: each pointcorresponding to one session.

of sessions when traffic load is low. In such cases,DAC can make a 2x improvement over ETX routing.As the network becomes congested, the non-orthogonalcooperation may degrade the channel access time ofother concurrent sessions, thus making its advantage lessobvious. Since OR-Oracle only employs one relay at eachhop, it achieves better spatial reuse and may outperformDAC when the network is congested.

While DAC’s higher throughput comes from the di-versity gain, we need to ensure this advantage does notreduce the fairness among sessions. To evaluate fairness,we use the Jain’s fairness index [34] as a metric. Afairness level of 1 indicates all sessions have the samethroughput, whereas a close-to-0 fairness indicates thatsome sessions achieve a higher throughput by starvingothers. One can see from Fig. 16 that DAC alwaysmaintains a much higher level of fairness than ETX andOR-Oracle. This is because it only relieves the bottlenecklinks on low-throughput paths (which is reflected inthe threshold TD in designing DAC routing). Overall,compared to ETX, both the throughput and fairness areimproved by exchanging the multiplexing opportunityof high-throughput sessions for the diversity gain inlow-throughput sessions.3 Note that the fairness of OR-Oracle is even lower than ETX because it tends to over-load the high-quality links, resulting in local congestionand poor throughput performance, especially for thosesessions whose critical link is congested.

In Sec. 6.2 we used a simple model to prove thatDAC improves the throughput of multiple flows onlywhen the network is lossy. To make this analysis moreconcrete, we generate a mesh topology with a majorityof high-quality links (the average link quality is 0.826).Figs. 17 and 18 show the resulting network throughput

3. Note that the traffic load higher than 40 sessions is less relevantsince the fairness level is low, and most sessions are starved.

Fig. 15. Total networkthroughput vs. traffic load.

Fig. 16. Fairness vs. trafficload.

Fig. 17. Throughputvs. traffic load in a networkwith a high reception rate.

Fig. 18. Fairness vs. traf-fic load in a network with ahigh reception rate.

and fairness. Although DAC still maintains a higherlevel of fairness, much less throughput gain is achieved.This is because ETX tends to select high-quality linksif available, which are abundant in such a topology.For DAC, the opportunity of exploiting the diversitygain is scarce. Combined with the previous experimentalresults, this indicates the generality of the analysis inTheorem 3, i.e., as a non-orthogonal relaying scheme,DAC guarantees a throughput gain for networks withintermediate link quality, such as the unplanned meshnetwork Roofnet [6].

8 CONCLUSION

In this paper, we have introduced DAC, a practicalnon-orthogonal approach to cooperative relaying with-out the symbol-level synchronization constraint. Thekey idea behind DAC is that two partially-overlappingpackets carrying the same information from differentrelays can be decoded independently by using an it-erative collision-resolution algorithm at the PHY layer.We provide theoretical and experimental results thatdemonstrate the decoding probability of DAC PHY andverify the potential gain of using non-orthogonal relays.We also design a simple MAC protocol to exploit thebenefit of DAC decoding, and a generic approach that in-corporates DAC relaying into existing routing protocols.Using network-level simulations with ns-2, we show thatDAC can improve the network performance in termsof throughput, delay and fairness, especially for lossywireless mesh networks. As non-orthogonal relayinghas fundamental advantage over traditional orthogonalrelays [3], DAC is shown to be an effective step towardsexploiting the potential of non-orthogonal cooperativecommunications.

REFERENCES[1] D. Tse and P. Viswanath, Fundamentals of Wireless Communication.

Cambridge University Press, 2005.

14

[2] J. N. Laneman and G. W. Wornell, “Distributed Space-TimeCoded Protocols for Exploiting Cooperative Diversity in WirelessNetworks,” IEEE Trans. on Information Theory, vol. 49, no. 10, 2003.

[3] K. Azarian, H. E. Gamal, and P. Schniter, “On the Achiev-able Diversity-Multiplexing Tradeoff in Half-duplex CooperativeChannels,” IEEE Trans. on Information Theory, vol. 51, no. 12, 2005.

[4] A. Bletsas, A. Khisti, S. Member, and D. P. Reed, “A SimpleCooperative Diversity Method Based on Network Path Selection,”IEEE Journal on Selected Areas in Communications, vol. 24, no. 3,2006.

[5] The GNU Software Radio, http://gnuradio.org/trac/wiki.[6] J. Bicket, D. Aguayo, S. Biswas, and R. Morris, “Architecture and

Evaluation of an Unplanned 802.11b Mesh Network,” in Proc. ofACM MobiCom, 2005.

[7] G. Kramer, I. Maric, and R. D. Yates, “Cooperative Communica-tions,” Foundations and Trends in Networking, vol. 1, no. 3, 2006.

[8] S. Wei, “Diversity Multiplexing Tradeoff of Asynchronous Coop-erative Diversity in Wireless Networks,” IEEE Trans. on Informa-tion Theory, vol. 53, no. 11, 2007.

[9] Y. Li and X. Xia, “A Family of Distributed Space-Time TrellisCodes with Asynchronous Cooperative Diversity,” in Proc. of IEEEIPSN, 2005.

[10] H.-M. Wang and X.-G. Xia, “Asynchronous Cooperative Com-munication Systems: A Survey on Signal Designs,” Science ChinaInformation Sciences, vol. 54, no. 8, 2011.

[11] A. Bletsas, A. Lippman, and J. N. Sahalos, “Simple, Zero-Feedback, Distributed Beamforming with Unsynchronized Car-riers,” IEEE J.Sel. A. Commun., vol. 28, no. 7, 2010.

[12] J. Zhang, J. Jia, Q. Zhang, and E. M. K. Lo, “Implementation andEvaluation of Cooperative Communication Schemes in Software-Defined Radio Testbed,” in Proc. of IEEE INFOCOM, 2010.

[13] Z. Li and X.-G. Xia, “A Simple Alamouti Space-Time TransmissionScheme for Asynchronous Cooperative Systems,” IEEE SignalProcessing Letters, vol. 14, no. 11, 2007.

[14] H. Rahul, H. Hassanieh, and D. Katabi, “SourceSync: A Dis-tributed Wireless Architecture for Exploiting Sender Diversity,”in Proc. of ACM SIGCOMM, 2010.

[15] G. Jakllari, S. Krishnamurthy, M. Faloutsos, P. Krishnamurthy,and O. Ercetin, “A Framework for Distributed Spatio-TemporalCommunications in Mobile Ad Hoc Networks,” in Proc. of IEEEINFOCOM, 2006.

[16] K. Sundaresan and R. Sivakumar, “Cooperating with Smartness:Using Heterogeneous Smart Antennas in Ad-Hoc Networks,” inProc. of IEEE INFOCOM, 2007.

[17] R. Mudumbai, D. Brown, U. Madhow, and H. Poor, “DistributedTransmit Beamforming: Challenges and Recent Progress,” IEEECommunications Magazine, vol. 47, no. 2, 2009.

[18] M. Kurth, A. Zubow, and J. P. Redlich, “Cooperative Oppor-tunistic Routing Using Transmit Diversity in Wireless Mesh Net-works,” in Proc. of IEEE INFOCOM, 2008.

[19] S. Gollakota and D. Katabi, “ZigZag Decoding: Combating Hid-den Terminals in Wireless Networks,” in Proc. of ACM SIGCOMM,2008.

[20] D. Halperin, T. Anderson, and D. Wetherall, “Taking the Sting Outof Carrier Sense: Interference Cancellation for Wireless LANs,” inProc. of ACM MobiCom, 2008.

[21] Atheros Communications., “AR5213 Preliminary Datasheet,”2004.

[22] B. Sklar, Digital Communications: Fundamentals and Applications.Prentice Hall, 2001.

[23] G. R. Danesfahani and T. G. Jeans, “Optimisation of ModifiedMueller and Muller Algorithm,” IEEE Electronics Letters, vol. 31,no. 13, 1995.

[24] K. Tan, J. Zhang, J. Fang, H. Liu, Y. Ye, S. Wang, Y. Zhang, H. Wu,W. Wang, and G. M. Voelker, “Sora: High Performance SoftwareRadio Using General Purpose Multi-core Processors,” in Proc. ofNSDI, 2009.

[25] M. Gerla, P. Palnati, and S. Walton, “Multicasting Protocols forHigh-Speed, Wormhole-Routing Local Area Networks,” in Proc. ofACM SIGCOMM, 1996.

[26] D. Couto, D. Aguayo, J. Bicket, and R. Morris, “A high-throughput path metric for multi-hop wireless routing,” in Proc. ofACM MobiCom, 2003.

[27] S. Biswas and R. Morris, “ExOR: Opportunistic Multi-hop Routingfor Wireless Networks,” in Proc. of ACM SIGCOMM, 2005.

[28] X. Zhang and K. G. Shin, “Chorus: Collision Resolution forEfficient Wireless Broadcast,” in Proc. of IEEE INFOCOM, 2010.

[29] ——, “Delay-Optimal Broadcast for Multihop Wireless NetworksUsing Self-Interference Cancellation.” IEEE Transactions on MobileComputing, vol. 12, no. 1, 2013.

[30] K. Sreeram, S. Birenjith, and P. V. Kumar, “DMT of multi-hopcooperative networks-Part II: Layered and multi-antenna net-works,” in Proc. of IEEE ISIT, 2008.

[31] J. Bondy and U. Murthy, Graph Theory With Applications. Elsevier,1976.

[32] J. Camp, J. Robinson, C. Steger, and E. Knightly, “MeasurementDriven Deployment of a Two-tier Urban Mesh Access Network,”in Proc. of ACM MobiSys, 2006.

[33] “Roofnet traces for SIGCOMM’04,” available at:http://pdos.csail.mit.edu/roofnet.

[34] R. Jain, D. Chiu, and W. Hawe, “A Quantitative Measure ofFairness and Discrimination For Resource Allocation in SharedComputer Systems,” DEC Research, Tech. Rep. TR-301, September1984.

Xinyu Zhang is an Assistant Professor in theDepartment of Electrical and Computer Engi-neering, University of Wisconsin-Madison. Hereceived the B.E. degree in 2005 from HarbinInstitute of Technology, China, the M.S. degreein 2007 from the University of Toronto, Canada,and the Ph.D. degree in 2012 from the Univer-sity of Michigan-Ann Arbor. His research inter-est lies in designing and implementing proto-cols that improve the capacity, interoperabilityand energy efficiency of wireless networks. He

is a recipient of ACM MobiCom Best Paper Award in 2011 and NSFCAREER Award in 2014.

Kang G. Shin is the Kevin & Nancy O’ConnorProfessor of Computer Science in the Depart-ment of Electrical Engineering and ComputerScience, The University of Michigan, Ann Ar-bor. His current research focuses on computingsystems and networks as well as on embeddedreal-time and cyber-physical systems, all withemphasis on timeliness, security, and depend-ability. He has supervised the completion of 70PhDs, and authored/coauthored more than 770technical articles (more than 270 of these are in

archival journals), one a textbook and more than 20 patents or inven-tion disclosures, and received numerous best paper awards, includingthe Best Paper Awards from the 2011 ACM International Conferenceon Mobile Computing and Networking (MobiCom2011), the 2011 IEEEInternational Conference on Autonomic Computing, the 2010 and 2000USENIX Annual Technical Conferences, as well as the 2003 IEEECommunications Society William R. Bennett Prize Paper Award andthe 1987 Outstanding IEEE Transactions of Automatic Control PaperAward. He has also received several institutional awards, includingthe Research Excellence Award in 1989, Outstanding AchievementAward in 1999, Distinguished Faculty Achievement Award in 2001,and Stephen Attwood Award in 2004 from The University of Michigan(the highest honor bestowed to Michigan Engineering faculty); aDistinguished Alumni Award of the College of Engineering, SeoulNational University in 2002; 2003 IEEE RTC Technical AchievementAward; and 2006 Ho-Am Prize in Engineering (the highest honorbestowed to Korean-origin engineers). He has chaired several majorconferences, including 2009 ACM MobiCom, 2008 IEEE SECON, 2005ACM/USENIX MobiSys, 2000 IEEE RTAS, and 1987 IEEE RTSS. Heis the fellow of both IEEE and ACM, and served on editorial boards,including IEEE TPDS and ACM Transactions on Embedded Systems.He has also served or is serving on numerous government committees,such as the US NSF Cyber-Physical Systems Executive Committee andthe Korean Government R&D Strategy Advisory Committee. He hasalso co-founded a couple of startups.

Documents

Cooperation Without Synchronization: Practical Cooperative Relaying …xyzhang.ucsd.edu/papers/Zhang_TMC14_DAC.pdf · · 2017-08-26Cooperation Without Synchronization: Practical