

1270 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 21, NO. 4, AUGUST 2013

Distributed Data Collection in Large-Scale Asynchronous Wireless Sensor Networks Under the Generalized Physical Interference Model

Shouling Ji, Student Member, IEEE, ACM, and Zhipeng Cai, Member, IEEE

Abstract: Wireless sensor networks (WSNs) are more likely to be distributed asynchronous systems. In this paper, we investigate the achievable data collection capacity of realistic distributed asynchronous WSNs. Our main contributions include five aspects. First, to avoid data transmission interference, we derive an $R_0$-proper carrier-sensing range ($R_0$-PCR) under the generalized physical interference model, where $R_0$ is the satisfied threshold of data receiving rate. Taking the $R_0$-PCR as its carrier-sensing range, any sensor node can initiate a data transmission with a guaranteed data receiving rate. Second, based on the $R_0$-PCR, we propose a Distributed Data Collection (DDC) algorithm with fairness consideration. Theoretical analysis of DDC surprisingly shows that its achievable network capacity is order-optimal and independent of network size. Thus, DDC is scalable. Third, we discuss how to apply the $R_0$-PCR to the distributed data aggregation problem and propose a Distributed Data Aggregation (DDA) algorithm. The delay performance of DDA is also analyzed. Fourth, to be more general, we study the delay and capacity of DDC and DDA under the Poisson node distribution model. The analysis demonstrates that DDC is also scalable and order-optimal under the Poisson distribution model. Finally, we conduct extensive simulations to validate the performance of DDC and DDA.

Index Terms: Capacity analysis, delay analysis, distributed data aggregation, distributed data collection, wireless sensor networks (WSNs).

Manuscript received January 22, 2012; revised August 17, 2012; accepted September 18, 2012; approved by IEEE/ACM TRANSACTIONS ON NETWORKING Editor G. Xue. Date of publication October 25, 2012; date of current version August 14, 2013. This work was supported in part by the NSF under Grant No. CNS-1152001.
The authors are with the Department of Computer Science, Georgia State University, Atlanta, GA 30303 USA (e-mail: [email protected]; [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TNET.2012.2221165
1063-6692 © 2012 IEEE

I. INTRODUCTION

One of the most important functions provided by wireless sensor networks (WSNs) is directly gathering data from the physical world. Generally, data gathering can be categorized as data collection [7], [9], [10], which gathers all the data from a network without any data aggregation or merging, and data aggregation [1]–[5], which obtains some aggregation values, e.g., MAX, MIN, SUM, etc. To evaluate network performance, network capacity, which reflects the data transmission/collection/broadcast rate, is usually used, e.g., multicast capacity [26], unicast capacity [33], [34], broadcast capacity [44], and data collection capacity [7], [9], [10]. Data collection capacity is defined as the average data receiving rate at the sink, i.e., it reflects how fast data are collected by the sink. Where no confusion arises, we use data collection capacity and network capacity interchangeably throughout the rest of this paper.

Following the seminal work [6] by Gupta and Kumar, many works emerged to study the network capacity issue under various network scenarios, e.g., multicast, unicast, broadcast, and data collection/aggregation. However, to our knowledge, most of the existing works studied the network capacity issue under an ideal assumption that the network time is slotted, and the entire network is strictly synchronized explicitly or implicitly, i.e., they are mainly for centralized synchronous wireless networks. Under the above ideal assumption, many centralized algorithms with nice network capacity bounds are designed and analyzed for various communication modes (e.g., multicast, unicast, broadcast, and data collection/aggregation). In the sense of providing theoretical frameworks/bounds for the design of communication protocols, these works are still sound. However, in practice, wireless networks, especially WSNs, are more likely to be distributed systems. Furthermore, for WSNs, it is difficult and not realistic to achieve ideal strict time synchronization due to unstable deployment environments, clock drift, and other technical limits. Therefore, to comprehensively and profoundly understand the performance of practical WSNs, it is important to investigate the achievable network capacity of distributed asynchronous WSNs. Particularly, we study the achievable data collection capacity for distributed asynchronous WSNs in this paper.

Different from the study in centralized synchronous WSNs, there are many new challenges arising when investigating the data collection capacity issue for distributed asynchronous WSNs. We summarize the main challenges as follows.

C1: Unlike in centralized synchronous WSNs, where we can acquire the overall information of a network and further make an optimized decision for data transmissions, we can only schedule data transmissions according to local information in distributed asynchronous WSNs. For this reason, it is very difficult to find an optimal schedule. Therefore, how to design an effective distributed algorithm for data collection is a challenge.

C2: Since we cannot maintain a uniform time clock for all the sensor nodes in distributed asynchronous WSNs, every node carries out data transmissions based on its own time clock and local information. Intuitively, this kind of communication mode leads to many data collisions and retransmissions, incurring capacity degradation, unfairness among data flows, etc. Thus, how to avoid the disadvantages introduced by an asynchronous time scheme is a primary concern when designing distributed data collection algorithms.

C3: Following challenges C1 and C2, the third challenge is how to theoretically analyze the achievable network capacity bounds for a data collection algorithm in distributed asynchronous WSNs. Since the data collection algorithm works in a distributed manner, it is difficult, sometimes even impossible, to know the exact time when a data transmission occurs, as well as the time duration of a data transmission. Hence, both elegant analysis techniques and a carefully designed data transmission mechanism are important to obtain the achievable data collection capacity.

To address these challenges, we propose a scalable and order-optimal distributed algorithm, named Distributed Data Collection (DDC), with fairness consideration and capacity analysis under the generalized physical interference model. To the best of our knowledge, this is the first attempt to provide detailed protocol design and rigorous capacity analysis for data collection in distributed asynchronous WSNs. DDC works in a CSMA-like manner, but without the RTS/CTS communication mode and without the need to reply with an ACK packet after receiving a data packet. In DDC, when a sensor node has data packets to transmit, it sets up a backoff timer and senses the wireless channel with a predefined carrier-sensing range (CR). If the channel is free when the backoff timer expires, the node conducts a data transmission. In this manner, DDC gathers all of the data in a network at the sink (i.e., the base station). Moreover, we extend our data collection method to the case of data gathering with aggregation, and propose a Distributed Data Aggregation (DDA) algorithm. We summarize the main contributions of this paper as follows.

1) The carrier-sensing range is an important parameter in DDC, which has a significant impact on the performance of data collection. To avoid data transmission collisions/interference, especially the collisions/interference caused by the hidden-node problems, we derive an $R_0$-proper carrier-sensing range ($R_0$-PCR) under the generalized physical interference model for the nodes in a data collection WSN, where $R_0$ is the satisfied threshold of data receiving rate. By taking the $R_0$-PCR as its CR, any node can initiate a data transmission with a guaranteed data receiving rate as long as there are no ongoing transmissions within its CR.

2) Based on the obtained $R_0$-PCR, we propose a scalable and order-optimal DDC algorithm with fairness consideration for asynchronous WSNs. DDC works in a CSMA-like manner and effectively gathers all the data to the sink. Theoretical analysis of DDC surprisingly shows that its asymptotic achievable network capacity is , where is a constant value depending on , and is the bandwidth of a wireless communication channel. Since the upper-bound capacity of data collection is [9], [10], this implies the achievable data collection capacity of DDC is order-optimal. Furthermore, since is independent of network size, DDC is scalable.

3) For completeness, a DDA algorithm for asynchronous WSNs is designed. We show that the number of time-slots induced by DDA is upper-bounded by , where is the number of the sensor nodes in a WSN, is the height of the data aggregation tree, and is a constant value depending on .

4) To be more general, we further study the delay and capacity of DDC and DDA under the Poisson node distribution model. By analysis, we demonstrate that DDC is again scalable and order-optimal, and DDA has a delay performance upper-bounded by , where and are constant values.

5) We also conduct extensive simulations to validate the performance of DDC/DDA in distributed asynchronous WSNs. The simulation results indicate that DDC/DDA can achieve data collection capacity comparable to that of the latest centralized and synchronized data collection algorithm.

The rest of this paper is organized as follows. In Section II, we summarize the related work and highlight the differences between our work and the existing works. In Section III, the considered network model is discussed. In Section IV, the proper carrier-sensing range satisfying a predefined data receiving rate for communication is derived. According to the obtained proper carrier-sensing range, a distributed asynchronous data collection algorithm is proposed in Section V, followed by the theoretical analysis, which demonstrates that the proposed algorithm can achieve the same order-optimal data collection capacity as centralized and synchronized algorithms. Furthermore, how to apply the derived proper carrier-sensing range to data aggregation is discussed in Section VI. To be more general, we study the delay and capacity of DDC and DDA under the Poisson distribution model in Section VII. In Section VIII, we validate the performance and scalability of DDC and DDA by simulations. Finally, this paper is concluded and some possible future research directions are pointed out in Section IX.

II. RELATED WORKS

Following the seminal work [6], extensive works emerged to study the network capacity issue for different kinds of wireless networks. In this section, we summarize the existing works according to the communication mode.

A. Data Collection Capacity

Data collection capacity is studied in [7]–[16] for centralized synchronous wireless networks. In [7], the authors proposed a load-balanced data gathering algorithm. Considering data compression, the authors showed that the network capacity can be improved. In [8], the authors considered the minimum-latency data gathering problem and proposed a family of path scheduling algorithms. The authors in [9]–[11] extended the work in [8] and derived tighter upper and lower data collection bounds. Additionally, the authors of [10] and [11] investigated the achievable capacity bound of continuous data collection in WSNs.

In [12], the authors studied data collection capacity of centralized synchronous WSNs based on a grid partition method. They also obtained the achievable data collection capacity under the protocol interference model. In [13], the authors derived the data collection capacity of WSNs under the physical interference model. Similarly, the authors in [14] and [15] studied the continuous data collection capacity under the physical interference model. By partitioning the network into square cells and interference zones, the nodes without interference can conduct data transmissions concurrently. The worst-case data collection capacity is studied in [16]. In [17], the authors studied the achievable data collection capacity of probabilistic WSNs. In that work, the impact of lossy links on the degradation of data collection capacity is analyzed and derived. In another work [18], the distributed data collection issue in cognitive radio networks (CRNs) is studied. The authors designed an asynchronous distributed data collection algorithm, which minimizes the data collection delay and meanwhile considers the data transmission fairness. In [19], the authors investigated the achievable data aggregation capacity for centralized synchronous WSNs in the extended network case.

B. Multicast Capacity

In [26]–[28], the multicast capacity for centralized synchronous wireless networks is studied. The multicast capacity for wireless ad hoc networks under the protocol interference model is investigated in [26]. In [26], the authors showed that the network multicast capacity is when and is when , where is the bandwidth of a wireless channel, is the number of the nodes in a network, and is the number of the nodes involved in one multicast session. In [27], the authors investigated the optimal multicast capacity and delay tradeoffs in mobile ad hoc networks from a global perspective. The general multicast capacity scaling law is studied and summarized under the generalized physical model in a recent work [28].

C. Unicast/Broadcast Capacity

1) Unicast/Broadcast Capacity for Random Wireless Networks: The achievable capacity of multiple unicasts in centralized synchronous wireless networks is investigated in [29]–[43]. In [29], the impact of the number of channels, the number of interfaces, and the interface switching delay on the capacity of centralized synchronous wireless networks is investigated. In [34], the authors studied the balanced unicast and multicast capacity of a wireless network consisting of randomly placed nodes, and obtained the characterization of the scaling of the -dimensional balanced unicast and -dimensional balanced multicast capacity regions under the Gaussian fading channel model.

In [38] and [39], the authors studied connectivity and capacity of multichannel wireless networks. They considered a multichannel wireless network with some constraints on channel switching, proposed some routing and channel assignment strategies for multiple unicast communications and derived the per-flow capacity. In [40], the authors first proposed a multichannel network architecture, called MC-MDA, where each node is equipped with multiple directional antennas, and then obtained the capacity of multiple unicast communications. Similar to [40], the authors in [41] studied the local sufficient rate constraints that can be constructed at each node to ensure a feasible flow allocation for multiradio multichannel wireless networks. In [42], the throughput capacity of 3-D regular ad hoc networks and 3-D heterogeneous ad hoc networks is derived for the first time under the generalized physical interference model. The capacity scaling of multihop cellular networks is studied in [43], and the authors further extended their method to study the capacity of heterogeneous multihop cellular networks. In [44], the broadcast capacity for ad hoc networks is derived with the fixed data rate channel and the Gaussian channel, respectively.

2) Unicast/Broadcast Capacity for Arbitrary Wireless Networks: In [30], the authors considered the scheduling problem where all the communication requests are single-hop and all the nodes transmit at a fixed power level. They proposed an algorithm to maximize the number of concurrent transmitting links in one time-slot. Unlike [30], the authors in [31] and [32] considered a power-control problem. A family of approximation algorithms were presented to maximize the capacity of arbitrary wireless networks. Considering the problem of characterizing the unicast capacity scaling in arbitrary wireless networks, the authors proposed a general cooperative communication scheme in [33].

3) Unicast Capacity for Mobile Wireless Networks: A general framework to characterize the capacity of wireless ad hoc networks with arbitrary mobility patterns is studied in [35]. By relaxing the "homogeneous mixing" assumption in most existing works, the capacity of a heterogeneous network is analyzed. Another work [36] studies the relationship between the capacity and the delay of mobile wireless ad hoc networks, where the authors studied how much delay must be tolerated under a certain mobile pattern to achieve an improvement of the network capacity. Similar to the work in [36] that considers that Lévy mobility and human mobility share several common features, the authors in [37] studied the delay–capacity tradeoffs for mobile wireless networks with Lévy walks and Lévy flights.

D. Remarks

Compared to the previous works, the following aspects distinguish our work from them.

1) To the best of our knowledge, our work is the first attempt to address the distributed data collection problem with capacity analysis for asynchronous WSNs, which is more complicated but also more practical. As summarized in Section II-A, the existing works study the data collection capacity issue based on centralized and synchronized scheduling/algorithms.

2) More importantly, we propose a scalable and order-optimal asynchronous distributed data collection algorithm. This demonstrates that asynchronous distributed data collection schemes can also achieve order-optimal data collection capacity, as synchronized and centralized algorithms do.

3) We study how to apply our derived proper carrier-sensing range to distributed data aggregation and propose a distributed data aggregation algorithm. We also analyze the performance of the proposed algorithm theoretically and validate its performance by simulations.


III. NETWORK MODEL

In this paper, we consider a connected WSN consisting of one sink node serving as the base station denoted by , and sensor nodes denoted by , respectively, deployed in an area with size , where is a constant. Furthermore, we assume all the nodes are independent and identically distributed (i.i.d.). Each node is equipped with one radio and works with a fixed power . All the data transmissions are conducted over a common wireless channel with bandwidth bits/second. The size of a data packet is bits, and thus the transmission duration of a data packet is seconds. The maximum transmission radius of a sensor node is set to ( is associated with the lowest data transmission rate determined by the generalized physical interference model defined below). Hence, the network can be modeled as a graph , where and includes all the possible links formed by any pair of nodes in . A node is said to be active at time iff is transmitting a data packet to some other node at time . Thus, we use to denote the set of all the active nodes at time .

To capture the wireless interference in wireless networks, the protocol interference model and physical interference model are frequently used. Furthermore, these two models abstract a data transmission as a binary function, with values successful or failed. Instead of modeling a data transmission process as a binary function, the Generalized Physical Interference model (GPI) is more accurate to characterize a practical data transmission. Suppose node is transmitting a data packet to node at time , i.e., , and is the data receiving rate of from at time . Then, under the GPI model, is determined by

(1)

where is the signal-to-interference-plus-noise ratio (SINR) value at associated with and is defined as

(2)

where is the background noise, is the path loss exponent and usually , and is the Euclidean distance between two nodes.

Suppose the time consumption to gather all the data packets produced at is , then the achievable data collection capacity can be defined as , i.e., the data collection capacity reflects how fast data can be gathered by the sink.
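For orientation, the generalized physical interference model is usually written in a Shannon-rate form like the one sketched below. Every symbol here is an assumption chosen to match the verbal definitions above ($W$ for bandwidth, $P$ for the fixed power, $N_0$ for background noise, $\alpha$ for the path loss exponent, $d(\cdot,\cdot)$ for Euclidean distance, $\mathcal{A}_t$ for the set of active nodes at time $t$), because the notation of this section did not survive in the text. The display should be read as a plausible shape for (1) and (2), not as the exact expressions of the paper.

```latex
% Assumed form of (1): data receiving rate of s_j from s_i at time t
R_{i,j}(t) = W \log_2\!\bigl(1 + \mathrm{SINR}_{i,j}(t)\bigr)

% Assumed form of (2): SINR at s_j for the transmission from s_i
\mathrm{SINR}_{i,j}(t) =
  \frac{P \, d(s_i, s_j)^{-\alpha}}
       {N_0 + \sum_{s_k \in \mathcal{A}_t,\; k \neq i} P \, d(s_k, s_j)^{-\alpha}}
```

Under this form the rate is a continuous function of the interference, rather than the binary success/failure outcome of the protocol and physical interference models.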

IV. CARRIER-SENSING RANGE

Since we study data collection in distributed asynchronous WSNs, every node in a WSN senses the activities of other nodes within its CR when it has some data packets for transmission. Only when there are no ongoing data transmissions within its CR can initiate a data transmission. Thus, how to determine the CR for each node, so that all concurrent transmitters outside the CR of each other can simultaneously conduct data transmissions with a data rate no less than a threshold, is crucial for the performance of a distributed data collection scheme. Intuitively, a small CR implies a high degree of spatial reuse, which in turn implies small SINR values and hence low data receiving rates at the receivers. On the other hand, a large CR implies a low degree of spatial reuse, which in turn implies large SINR values and high data receiving rates. Therefore, in this section, we study how to set a PCR for each node to guarantee a satisfied data receiving rate and meanwhile the highest degree of spatial reuse. For clarity, we make some definitions as follows.

Definition 4.1: $R_0$-feasible state: The set of all the active nodes (defined in Section III) is an $R_0$-feasible state if all the nodes in can simultaneously transmit data and the data receiving rate at each of their corresponding receivers is no less than $R_0$. In an $R_0$-feasible state , assume is transmitting a data packet to , then .

Based on Definition 4.1, if the lowest tolerable data transmission rate of a WSN is $R_0$, then the data collection process can be represented as a series of $R_0$-feasible states , where .

Definition 4.2: -set ( ): Assume is the carrier-sensing range represented by . An -set, denoted by , is any maximal subset of that satisfies and .

Definition 4.3: $R_0$-Proper Carrier-Sensing Range ($R_0$-PCR): The carrier-sensing range of a WSN is an $R_0$-proper carrier-sensing range if for any -set , it is always an $R_0$-feasible state.

From Definition 4.3, if is an $R_0$-PCR, then can initiate a data transmission with a guaranteed data receiving rate no less than $R_0$ as long as there are no other active nodes within of . Then, given a threshold of data receiving rate $R_0$, the $R_0$-PCR can be determined by Theorem 1. In the following analysis, as in [45], we assume the background noise is very small compared to the transmission power and thus can be ignored.

Theorem 1: , where is a constant.

Proof: See Appendix A.

From Theorem 1, we know that given a threshold of data receiving rate $R_0$, we can determine an $R_0$-PCR, which is at least a constant times . Since a small CR implies a high degree of spatial reuse, we set . Furthermore, Fig. 1 depicts the relation between and , where the -axis represents the threshold of data receiving rate $R_0$, and the -axis represents the corresponding $R_0$-PCR. From Fig. 1, we can tell that with the increase of $R_0$, the associated $R_0$-PCR increases accordingly for every value. This is because a high data receiving rate requires that the CR be sufficiently large to avoid interference, which also implies a low degree of spatial reuse. Additionally, a large also implies a small $R_0$-PCR. This is because the interference impact decreases quickly with the increase of , which can also be derived from (2).
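The definitions in this section can be made concrete with a small numerical check. The sketch below is illustrative only: it assumes the common Shannon-rate form of the generalized physical interference model (rate = bandwidth × log2(1 + SINR)), and the function names (`sinr`, `rate`, `is_feasible`, `respects_cr`) and default parameter values are hypothetical rather than taken from the paper. It tests whether a set of concurrent links meets a rate threshold and whether the transmitters are pairwise separated by at least a given carrier-sensing range.

```python
import math
from itertools import combinations

def sinr(tx, rx, active, power=1.0, alpha=4.0, noise=0.0):
    """SINR at receiver rx for transmitter tx, given all active transmitters.

    Assumed model: received power decays as distance ** (-alpha).
    """
    signal = power * math.dist(tx, rx) ** (-alpha)
    interference = sum(
        power * math.dist(k, rx) ** (-alpha) for k in active if k != tx
    )
    denom = noise + interference
    return math.inf if denom == 0 else signal / denom

def rate(tx, rx, active, bandwidth=1.0, **model):
    """Data receiving rate under the assumed Shannon-rate form."""
    return bandwidth * math.log2(1.0 + sinr(tx, rx, active, **model))

def is_feasible(links, r0, **model):
    """True if every concurrent (tx, rx) link achieves a rate of at least r0."""
    active = [tx for tx, _ in links]
    return all(rate(tx, rx, active, **model) >= r0 for tx, rx in links)

def respects_cr(links, cr):
    """True if every pair of concurrent transmitters is at least cr apart."""
    return all(
        math.dist(a, b) >= cr for (a, _), (b, _) in combinations(links, 2)
    )

# Two unit-length links whose transmitters are 10 units apart.
links = [((0.0, 0.0), (1.0, 0.0)), ((10.0, 0.0), (11.0, 0.0))]
print(respects_cr(links, 10.0), is_feasible(links, 12.0))
```

In this toy setting, a carrier-sensing range is "proper" for a threshold when every maximal set of transmitters that passes `respects_cr` also passes `is_feasible`; the sketch only checks one given set and does not derive the range of Theorem 1.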


    Fig. 1. vs. .

V. DISTRIBUTED DATA COLLECTION AND CAPACITY

According to the $R_0$-PCR obtained in Section IV, if we set the CR of a WSN as , then all the nodes in an -set can simultaneously transmit data at a guaranteed data receiving rate without interference by letting each node work in the Re-Start (RS) mode [45]. Thus, in this section, we propose a CSMA-like data collection algorithm for distributed asynchronous WSNs, which has an order-optimal capacity.

A. Distributed Data Collection

Before presenting the distributed data collection algorithm, for a WSN represented by , we construct a connected dominating set (CDS)-based data collection tree, denoted by , according to the following steps [1].

1) Construct a breadth-first-search (BFS) tree in beginning at the sink , and obtain a maximal independent set (MIS) according to the search sequence. Note that is a dominating set (DS) of . For the nodes in , we call them dominators. Taking the network shown in Fig. 2(a) as an example, the black nodes are dominators.

2) To form a CDS, we choose as few nodes as possible, named connectors, in to connect all the nodes in . The set of all the connectors is denoted by . As shown in Fig. 2(a), the lightly shaded nodes are the connectors chosen to connect all the dominators. Then, each dominator, except for , has a connector as its parent node in . On the other hand, each connector has a dominator as its parent node in .

3) For any other node in , called a dominatee, randomly choose a dominator within its communication range as its parent node. Then, the CDS-based data collection tree rooted at is formed. For the network shown in Fig. 2(a), the constructed CDS-based data collection tree is shown in Fig. 2(b).
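The three construction steps can be sketched in a few lines of code. This is a simplified illustration under stated assumptions, not the construction of [1]: the function name `build_cds_tree` and the adjacency-dictionary input are hypothetical, the MIS is chosen first-fit in BFS order, each non-sink dominator simply takes its BFS parent as its connector (which does not try to minimize the number of connectors), and dominatees pick their earliest-discovered neighbouring dominator instead of a random one so that the result is reproducible.

```python
from collections import deque

def build_cds_tree(adj, sink):
    """Build a CDS-style collection tree.

    adj: dict mapping each node to the set of its neighbours (connected graph).
    Returns (parent, role), where role is 'dominator', 'connector' or 'dominatee'.
    """
    # Step 1a: BFS from the sink, recording discovery order and BFS parents.
    order, bfs_parent = [sink], {sink: None}
    queue = deque([sink])
    while queue:
        u = queue.popleft()
        for v in sorted(adj[u]):
            if v not in bfs_parent:
                bfs_parent[v] = u
                order.append(v)
                queue.append(v)
    rank = {v: i for i, v in enumerate(order)}

    # Step 1b: first-fit maximal independent set in BFS order (the dominators).
    dominators = set()
    for v in order:
        if not (adj[v] & dominators):
            dominators.add(v)

    def first_dominator(v):
        # Earliest-discovered dominator adjacent to v.
        return min((d for d in adj[v] if d in dominators), key=rank.get)

    parent = {sink: None}
    role = {v: "dominator" for v in dominators}

    # Step 2 (simplified): the BFS parent of each non-sink dominator acts as
    # its connector, and the connector attaches to an earlier dominator.
    for d in order:
        if d in dominators and d != sink:
            c = bfs_parent[d]
            role[c] = "connector"
            parent[d] = c
            parent[c] = first_dominator(c)

    # Step 3: every remaining node is a dominatee attached to a dominator.
    for v in order:
        if v not in role:
            role[v] = "dominatee"
            parent[v] = first_dominator(v)
    return parent, role

# A path 0-1-2-3-4 with an extra leaf 5 attached to the sink 0.
adj = {0: {1, 5}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}, 5: {0}}
parent, role = build_cds_tree(adj, sink=0)
print(parent, role)
```

Because a connector always attaches to a dominator discovered before it, and a dominator attaches to its BFS parent, parent pointers strictly move toward the sink, so the result is a tree in which dominators and connectors alternate along every path to the root.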

Assume is the height of , i.e., the maximum number of hops from to any node, and is the number of hops from node to in . Evidently, according to the construction process of is an even number, and is an odd number. Furthermore, we define . Then, the following lemma [1] shows some properties of .

Fig. 2. CDS-based data collection tree. (a) Network topology and (b) CDS-based data collection tree.

Lemma 1 [1]: 1) is adjacent to at most 12 connectors in ; 2) , is adjacent to at most 11 connectors in .

Based on , we propose a DDC algorithm for asynchronous WSNs as shown in Algorithm 1. In Algorithm 1, is a counter that denotes the number of data packets transmitted by , is the backoff contention window, and is the backoff time set for the transmission of the th data packet at node . As in [45], and for the same reasons, we assume: 1) such that is negligible compared to the data transmission time; and 2) no two transmitters within the CR of each other have their backoff timers expire at the same time instant.¹

    Algorithm 1: The DDC Algorithm

    input: CDS-based data collection tree
    output: a distributed asynchronous data collection plan

    1   ;
    2   sets its CR as according to the required threshold of data receiving rate ;
    3   while has some data packets for transmission do
    4       ;
    5       ;
    6       randomly sets a backoff time for the transmission of the th packet in window ;
    7       if then
    8           ;
    9       while is not countdown to 0 do
    10          senses the channel with ;
    11          if senses that the channel is busy then
    12              stops the countdown process (the backoff timer is frozen) until the channel becomes free again;
    13          if senses that the channel is free then
    14              ;
    15      if , i.e., the backoff timer expires then
    16          transmits the th data packet to its parent node;

    According to Algorithm 1, DDC runs in a CSMA-like manner, except for the RTS/CTS working mode and the necessity to reply with an ACK packet after receiving a data packet. This is because, by properly setting the CR and working in the RS mode, a transmission with a satisfied data receiving rate can be guaranteed, as shown in Section IV.

    ¹ Collisions due to simultaneous countdown-to-zero can be tackled by an exponential backoff mechanism in which the transmission probability of each node is adjusted in a dynamic way based on the network busyness [45].

    In Algorithm 1 (here, taking the algorithm running process at node as an example), lines 1–5 are basic settings. Line 6 randomly sets the backoff time for each data transmission. In lines 7 and 8, the backoff time for each transmission is reset to , and this is mainly for fairness (no node waits too long when it has some data to transmit) as shown in Theorem 2 and Corollary 2 (see Section V-B). Under this setting, a node cannot transmit multiple data packets in a short time period. Actually, each node can transmit up to one data packet during each backoff contention window. In lines 9–14, begins the countdown process and keeps sensing the channel with . If the wireless channel sensed by is busy, the countdown process at will be frozen. In this way, when a data transmission is ongoing, all the other nodes having data packets within the CR of the transmitter will stop their countdown process, i.e., they can share the waiting time. In lines 15 and 16, transmits the th data packet when the backoff timer expires. Since no two transmitters within the CR of each other have their backoff timers expire at the same time instant, the transmission of the th data packet can be carried out successfully.
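    The freeze-and-resume countdown described above can be illustrated with a small event-driven sketch for a single contention domain, i.e., a group of nodes that are all within each other's CR. Everything here is hypothetical scaffolding: the function name `simulate_ddc_domain`, the continuous backoff draws (used so that ties have negligible probability), and in particular the reset rule, which is assumed to be "wait out the remainder of the previous window, then apply a fresh backoff" because the exact expression is not recoverable from the text.

```python
import random


def simulate_ddc_domain(packets, w=16.0, tx_time=1.0, seed=0):
    """Simulate nodes that all sense each other; return a list of (start_time, node).

    packets maps node -> number of queued data packets (hypothetical input).
    """
    rng = random.Random(seed)
    remaining = dict(packets)
    timer, last_draw = {}, {}
    for v, k in packets.items():
        if k > 0:
            last_draw[v] = rng.uniform(0.0, w)
            timer[v] = last_draw[v]

    now, log = 0.0, []
    while timer:
        sender = min(timer, key=timer.get)
        idle = timer[sender]
        # The channel is free: every contender counts down by the same amount.
        for v in timer:
            timer[v] -= idle
        now += idle
        # The sender's timer expired, so it transmits. All other timers stay
        # frozen for the duration of the transmission (channel sensed busy).
        log.append((now, sender))
        now += tx_time
        remaining[sender] -= 1
        if remaining[sender] > 0:
            draw = rng.uniform(0.0, w)
            # ASSUMED reset rule: finish the old window, then back off again.
            timer[sender] = (w - last_draw[sender]) + draw
            last_draw[sender] = draw
        else:
            del timer[sender]
    return log
```

    Under this assumed reset rule and a common start time, a call such as `simulate_ddc_domain({"a": 3, "b": 3, "c": 3})` serves the nodes in rounds: every node sends one packet before any node sends its next one, which is the fairness behaviour the text attributes to lines 7 and 8.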

    B. Capacity Analysis

    In this section, we analyze the achievable data collection capacity of the DDC algorithm. Since the upper-bound capacity of data collection is [9], [10], we investigate the lower-bound capacity of DDC here. First, we study the upper-bound time consumption to collect all data packets at dominatees to the CDS, i.e., the upper-bound time consumption to collect data packets at to .

    Let , where is the CR used in DDC. Then, we have the following lemma, which indicates the average/upper-bound number of the sensor nodes, denoted by , within the CR of a node.

    Lemma 2: Let the random variable denote the number of sensor nodes within the carrier-sensing area of a node. Then, we have the following.
    i) .
    ii) . Thus, it is almost impossible that the carrier-sensing area of a node contains more than nodes, i.e., it is almost sure that .

    Proof: See Appendix B.
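    As a numerical illustration of the kind of quantity Lemma 2 bounds, the sketch below estimates the average number of uniformly placed nodes that fall inside a carrier-sensing disk and compares it with the binomial mean. The function names, the square deployment area, and the choice to centre the disk in the middle of the area (which sidesteps boundary effects) are assumptions made for illustration; the lemma's exact expressions are not reproduced here.

```python
import math
import random


def expected_nodes_in_cr(n, side, cr):
    """Binomial mean n * p with p = (disk area) / (deployment area)."""
    return n * math.pi * cr ** 2 / side ** 2


def simulate_nodes_in_cr(n, side, cr, trials=200, seed=1):
    """Monte Carlo average of nodes within distance cr of the area's centre."""
    rng = random.Random(seed)
    cx = cy = side / 2.0
    total = 0
    for _ in range(trials):
        for _ in range(n):
            x, y = rng.uniform(0.0, side), rng.uniform(0.0, side)
            if (x - cx) ** 2 + (y - cy) ** 2 <= cr ** 2:
                total += 1
    return total / trials
```

    With `n=1000`, `side=10.0`, and `cr=1.0`, the closed-form mean is about 31.4 nodes, and the simulated average should land close to that value.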

    Based on Lemma 2, we can derive the upper-bound time consumption to collect all the data packets at to in DDC.

    Theorem 2: Any node with data packets for transmission can transmit at least one data packet to its parent node within time .

    Proof: According to the DDC algorithm, for any node with data packets for transmission, it will carrier-sense the node activities within its CR. When the backoff timer of expires and meanwhile the channel sensed by is free, can transmit a data packet successfully. Thus, the problem now is how long it takes for until it actually initiates a data transmission in the worst case, i.e., the waiting time of in the worst case. For convenience, assume is any other node within the CR of having data packets for transmission, are the backoff time for the current data transmissions of and , respectively, and , , and are the universal time (standard time), the system time maintained at , and , respectively. Furthermore, if has more than one data packet for transmission, the backoff time for to transmit a subsequent data packet is denoted by . Evidently, the transmission sequence of and follows one of the following three cases (note that no two transmitters within the CR of each other have their backoff timers expired at the same time instant).

    Fig. 3. Transmission sequence of and . (a) Case 1. (b) Case 2. (c) Case 3.

    Case 1: and Share a Synchronized Backoff Contention Window: In this case, as shown in Fig. 3(a), will transmit a data packet before/after transmits a data packet. This is because , where is the backoff time chosen by for the subsequent data transmission according to the DDC algorithm.

    Case 2: and Share an Asynchronous Backoff Contention Window and : In this case, as shown in Fig. 3(b), will transmit a data packet before according to DDC.

    Case 3: and Share an Asynchronous Backoff Contention Window and : In this case, as shown in Fig. 3(c), when tries to transmit a data packet, it sets a backoff time for the packet and carrier-senses the channel. It turns out that the channel is busy since is transmitting some data. Therefore, we conclude that because the time-slots of and have some overlap (otherwise, cannot know that the channel is occupied by when it tries to transmit the data packet). Since , it is possible that . This implies that may transmit two data packets before transmits one data packet. On the other hand, according to the DDC algorithm, we have , where is the time that transmits its third data packet and is the backoff time set by for its third data packet transmission. Consequently, will transmit one data packet before transmits the third data packet.

    In summary, can transmit at most two data packets before transmits one data packet in the worst case. Considering that there are at most sensor nodes within the carrier-sensing area of according to Lemma 2, can transmit at least one data packet to its parent node within time in the worst case in DDC.

    Corollary 1: In DDC, the time consumption of collecting all the data packets at to is at most .

    Proof: Based on the construction process of the data collection tree , every node in has a parent node in . Thus, all the data packets at can be transmitted to the nodes in within time according to Theorem 2.

    After time , all the data packets at will be collected to according to Corollary 1. Subsequently, we investigate the time consumption to collect all the data packets at to the sink .

    Lemma 3 [1]: Assume that is a disk of radius and is a set of points with mutual distance of at least 1. Then, .

    Let . It follows that . Then, we can obtain the following lemma by applying Lemma 3.

    Lemma 4: Assume that is a disk of radius , then , where , i.e., the number of dominators and connectors within the CR of a node is at most in DDC.

    Proof: See Appendix C.

    From Lemma 4, we can obtain the following corollary.

    Corollary 2: After time , every node in with data packets for transmission can transmit at least one data packet to its parent node within time in DDC.

    Proof: According to Lemma 4, there are at most dominators and connectors within the CR of a node. Furthermore, all the nodes in have no data packets for transmission after time according to Corollary 1. Then, by the same technique used to prove Theorem 2, the conclusion of this corollary can be obtained.

    Based on Lemma 4 and Corollary 2, we can obtain the time consumption to collect all the data packets at to the sink as shown in Theorem 3.

    Theorem 3: After time , it takes at most time to collect all the data packets at to the sink in DDC, where is the degree of in the data collection tree .

    Proof: As shown in Corollary 1, after time , all the nodes in have no data packets for transmission, and meanwhile, has received at least data packets according to Theorem 2 since it has child nodes in . Subsequently, receives at least one data packet in every time according to Corollary 2. Thus, it takes at most time to collect all the data packets at to the sink after time .

    Theorem 4: The lower bound of data collection capacity achieved by DDC is , which is scalable and order-optimal.

    Proof: According to Theorems 2 and 3, to collect all the data packets to the sink, the time consumption

    (3)

    (4)

    (5)

    (6)

    Thus, the achievable data collection capacity of DDC is . As mentioned before, the upper-bound capacity of data collection is [9], [10], and is a constant value depending on , which implies the achievable data collection capacity of the DDC algorithm is order-optimal. Furthermore, since is independent of network size , DDC is scalable.

    VI. -BASED DISTRIBUTED DATA AGGREGATION

    As introduced in Section I, data gathering can be categorized as data collection and data aggregation. Therefore, for completeness, in this section we discuss how to apply the derived proper carrier-sensing range to distributed data aggregation in WSNs.

    In data aggregation, multiple data packets can be aggregated into one data packet by applying an aggregation function, e.g., MAX, MIN, SUM, etc. Formally, the data aggregation problem can be defined as follows. Let and . The data of the nodes in is said to be aggregated to the nodes in in a time-slot if all the nodes in can transmit their data packets to the nodes in concurrently and without interference during a time-slot. Here, is called a transmitter set. Then, the data aggregation problem is to seek a data aggregation schedule that consists of a sequence of transmitter sets , such that we have the following.

    1) , .
    2) , where is the latency of this data aggregation schedule.
    3) Data can be aggregated from to during time-slot for .
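    A structural check of such a schedule can be sketched as follows. The three conditions are read here in the form that is standard for the minimum-latency aggregation scheduling literature (transmitter sets pairwise disjoint, their union equal to all non-sink nodes, and each slot's receivers drawn from nodes that have not yet transmitted); this reading is an assumption, since the formulas themselves are not legible above. The function name and the slot representation are hypothetical, and the interference-free requirement is deliberately not checked because it depends on the physical model.

```python
def is_valid_aggregation_schedule(nodes, sink, schedule):
    """Check the set-structure of an aggregation schedule.

    schedule is a list of dicts; the k-th dict maps each transmitter of
    time-slot k to its receiver (hypothetical representation).
    """
    nodes = set(nodes)
    done = set()  # nodes that have already transmitted their aggregate
    for slot in schedule:
        senders = set(slot)
        # Transmitter sets are pairwise disjoint and never contain the sink.
        if senders & done or sink in senders or not senders <= nodes:
            return False
        done |= senders
        # Receivers must still hold data: they have not sent in any slot so far,
        # including the current one.
        if any(r in done or r not in nodes for r in slot.values()):
            return False
    # Every non-sink node transmits exactly once.
    return done == nodes - {sink}
```

    The latency of a schedule that passes this check is simply `len(schedule)`, the number of time-slots it uses.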

    Ever since the data aggregation problem was raised, extensive research has been conducted on this issue ([1], [20]–[25], and references therein), especially for the Minimum-Latency Aggregation Schedule (MLAS) problem, which tries to obtain a data aggregation schedule with the objective to minimize the latency (minimize ). In [1], [20], and [21], several centralized data aggregation algorithms are proposed under the Unit Disk Graph (UDG) model and the protocol interference model. Chen et al. [20] proved that the MLAS problem is NP-hard. Furthermore, they designed a -approximation algorithm for this problem, where is the maximum degree of the topological graph of a network. Subsequently, Huang et al. [21] proposed another data aggregation algorithm that has a better performance. By analysis, they showed that the delay of their algorithm is upper-bounded by ( and is defined in Section V), where is the network radius. Recently, Wan et al. [1] proposed three data aggregation algorithms of latency upper-bounded by , , and , respectively. Xu et al. [22] studied periodic query scheduling for data aggregation with the minimum delay consideration. They designed centralized aggregation scheduling algorithms under various wireless interference models and analyzed the induced delay of each algorithm.

    As explained in Section I, centralized algorithms have many shortcomings in distributed wireless networks. To overcome these shortcomings, some state-of-the-art distributed algorithms are proposed under the UDG model and the protocol interference model [23]–[25]. In [23], Yu et al. proposed a distributed CDS-based data aggregation schedule algorithm with latency upper-bounded by , where is the network diameter. Xu et al. [24] also proposed a distributed data aggregation algorithm with a better latency bound of , where is the inferior network radius that satisfies . The most recently published distributed data aggregation algorithm is [25], in which Li et al. proposed an aggregation scheme of latency upper-bounded by .

    Unlike the previous works, we design an -based DDA algorithm. The main differences between this DDA and the previous works can be summarized as follows. First, DDA is a distributed and asynchronous algorithm, while many previous algorithms (e.g., [1], [20]–[22]) are centralized. Since WSNs tend to be distributed systems, distributed and asynchronous algorithms are more practical and suitable. Second, DDA runs under the generalized physical interference model, while most of the previous works are under the UDG model or the protocol interference model. Compared to the generalized physical interference model, the protocol interference model is simplified and can make the analysis process much easier. On the other hand, the generalized physical interference model considers the aggregated interference in a WSN, which is more practical as well as more complicated.

    The description of our algorithm is shown in Algorithm 2. DDA is similar to DDC. The main difference is that each node only transmits one data packet to its parent node, while in DDC, it may have to transmit multiple data packets to its parent node, i.e., the traffic load of a data collection task is much heavier than that of a data aggregation task.

    Algorithm 2: The DDA Algorithm

    input: CDS-based data collection tree
    output: a distributed asynchronous data aggregation plan

    sets its CR as according to the required threshold of data receiving rate ;
    while has not received the aggregation data do
        if is a leaf node in or has received the aggregation data from all of its children in then
            if is a nonleaf node then
                obtains the aggregation value of its data and the data of its children by applying the aggregation function;
            randomly sets a backoff time for its data transmission in window ;
            while is not countdown to 0 do
                senses the channel with ;
                if senses that the channel is busy then
                    stops the countdown process (the backoff timer is frozen) until the channel becomes free again;
                if senses that the channel is free then
                    ;
            if , i.e., the backoff timer expires then
                transmits the aggregation data to its parent node in .
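    The readiness rule in Algorithm 2 (a node transmits only once it is a leaf or has heard from all of its children, and it transmits exactly one aggregated packet) can be sketched without the channel-contention part. The function name `aggregate_on_tree`, the `parent` and `readings` dictionaries, and the default aggregation function are hypothetical choices for illustration; backoff and carrier sensing are omitted on purpose.

```python
def aggregate_on_tree(parent, readings, sink, func=max):
    """Aggregate readings toward the sink; return (sink_value, send_order).

    parent maps each non-sink node to its parent in the tree, and readings
    maps every node (including the sink) to its local value.
    """
    children = {v: [] for v in readings}
    for v, p in parent.items():
        children[p].append(v)

    pending = {v: len(children[v]) for v in readings}  # children not yet heard
    value = dict(readings)
    ready = [v for v in readings if pending[v] == 0 and v != sink]
    sent = []
    while ready:
        v = ready.pop()
        p = parent[v]
        # The parent folds the received aggregate into its own value.
        value[p] = func(value[p], value[v])
        sent.append(v)
        pending[p] -= 1
        # A non-leaf node becomes eligible once all its children have reported.
        if pending[p] == 0 and p != sink:
            ready.append(p)
    return value[sink], sent
```

    Passing `func=lambda a, b: a + b` computes SUM instead of MAX; in both cases each non-sink node appears exactly once in the returned send order, after all of its children.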

    In Algorithm 2, the routing structure is a CDS-based data collection tree as in DDC, and we also assume no two transmitters within the CR of each other have their backoff timers expire at exactly the same time instant.

    Now, we analyze the delay performance of DDA. Similar to the delay of DDC, we can obtain the upper bound of the time consumption of DDA as shown in Theorem 5.

    Theorem 5: The induced delay of DDA is upper-bounded by time-slots, where is the height of the data collection (aggregation) tree , and is a constant value depending on .

    Proof: From Lemma 2, the upper bound of the number of nodes within a disk of radius is . Therefore, for any node, it waits at most time-slots before transmitting its data to its parent node (minus two means the transmitter and its parent node are not counted). Therefore, it takes at most time to aggregate all the data at to according to the schedule strategy in DDA. After time, there is no data for transmission at nodes in . Based on Lemma 4, the number of dominators and connectors within a disk of radius is upper-bounded by . Consequently, according to DDA, a node in has an opportunity to transmit one data packet within time . Considering the height of the data collection (aggregation) tree is (which implies the number of hops from the sink to any node in is at most ), the number of time-slots consumed by DDA is upper-bounded by

    (7)(8)

    (9)(10)

    where .

    VII. DATA COLLECTION AND AGGREGATION UNDER THE POISSON DISTRIBUTION MODEL

    In Section III, we assume that all the sensor nodes are independent and identically distributed. Based on that network distribution model, we obtain the achievable capacity of the proposed data collection method DDC, which is order-optimal, and the delay upper bound of the designed data aggregation method DDA. To be more general, in this section, we consider another frequently employed non-i.i.d. model, named the Poisson distribution model, and analyze the performances of DDC and DDA.

    Under the Poisson distribution model, we assume that one sink node and sensor nodes are distributed according to a two-dimensional Poisson point process with density in some area with size . To make data collection and aggregation meaningful, we also assume that the network is connected. Then, by the same method in Section V, a CDS-based data collection tree can be constructed. Therefore, we can still exploit DDC and DDA to finish data collection and aggregation tasks under the Poisson distribution model. Now, we analyze the delay performance of DDC and DDA.

    Let . We first analyze the average/upper bound of the number of sensor nodes within the CR of a node as shown in the following lemma.

    Lemma 5: Let the random variable denote the number of sensor nodes within the CR of a node. Then, we have the following.
    a) .
    b) It is almost sure that the number of sensor nodes within the CR of a node is upper-bounded by , where .

    Proof: See Appendix D.
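    To give a feel for why an "almost sure" upper bound of this kind is reasonable, the sketch below uses the standard fact that a two-dimensional Poisson point process with a given density places a Poisson-distributed number of points, with mean equal to density times area, in any fixed region. It computes the exact tail probability and the smallest count whose exceedance probability is below a chosen tolerance. The function names and the tolerance parameter `eps` are hypothetical, and the bound found this way is not claimed to coincide with the expression in Lemma 5.

```python
import math


def poisson_tail(mu, k):
    """Return P(X > k) for X ~ Poisson(mu)."""
    term = math.exp(-mu)  # P(X = 0)
    cdf = term
    for i in range(1, k + 1):
        term *= mu / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def smallest_bound(density, cr, eps):
    """Smallest k with P(more than k nodes in a disk of radius cr) <= eps."""
    mu = density * math.pi * cr ** 2  # mean number of nodes in the disk
    k = 0
    while poisson_tail(mu, k) > eps:
        k += 1
    return k
```

    For example, `smallest_bound(4.0, 1.0, 1e-6)` returns a count somewhat above the mean of roughly 12.6 nodes; tightening `eps` raises the returned bound only slowly, which is the behaviour a concentration bound is meant to capture.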

    Based on Lemma 5, it is reasonable to take as the upper bound of the number of sensor nodes within the CR of a node. Then, we have the following theorem, which indicates the upper bound of the induced delay of our data collection algorithm DDC under the Poisson distribution model.

    Theorem 6: Under the Poisson distribution model, the induced delay of DDC to collect all the data ( data packets) to the sink is upper-bounded by , where .

    Proof: Based on Lemma 5 and by similar methods to Theorem 2 and Corollary 1, it can be proven that the time consumption to collect all the data packets at to is upper-bounded by . Subsequently, similar to Theorem 3, the time consumption to collect all the data packets to the sink is

    (11)(12)

    where .

    Based on Theorem 6, the achievable data collection capacity of DDC can be obtained as shown in the following theorem.

    Theorem 7: Under the Poisson distribution model, the achievable data collection capacity of DDC is lower-bounded by , which is scalable and order-optimal.

    Proof: By a similar method to Theorem 4, this theorem can be proven.

    Now, we analyze the induced delay of DDA under the Poisson distribution model, which is shown in Theorem 8.

    Theorem 8: Under the Poisson distribution model, the induced delay of DDA is upper-bounded by , where and are constant values.

    Proof: By a similar method to Theorem 5, this theorem can be proven.

    VIII. SIMULATION RESULTS

    In this section, we present simulation results to validate the performances of DDC and DDA. In all the simulations, we consider the WSNs consisting of one sink node and sensor nodes that are randomly deployed in a square area with size . Thus, the node density is . Since our primary concern is the achievable capacity and scalability (respectively, induced delay) of DDC (respectively, DDA), we make some simplification and normalization on the simulation settings. The maximum transmission radius of a node is normalized to one, and any node can work on the RS mode with the IPCS technique [45]. During the data collection period, every node produces a data packet whose size is also normalized to one. Furthermore, all the nodes work with the same transmission power and over a common wireless channel with bandwidth normalized to one, which implies the transmission time of a data packet is 1 in the ideal case. Then, we set the backoff contention window for DDC and DDA in all the simulations. For a data transmission, the background noise is negligible compared to the interference brought by concurrent transmissions. Hence, we do not consider the background noise. For other important system parameters, e.g., the network size , the node density , the number of nodes , the path loss exponent , etc., we specify them later in each group of simulations.

    The compared algorithm for DDC is the Multi-Path Scheduling (MPS) algorithm proposed in [10], which is the most recently published centralized and synchronized data collection method under the simplified protocol interference model for WSNs. In MPS, the interference radius , where is a constant and is the communication radius of a node. Thus, in the following simulations, we set , which guarantees that MPS can also initiate data transmissions with a satisfied data receiving rate . The compared algorithm for DDA is the Enhanced Pipelined Aggregation Scheduling (E-PAS) algorithm [1], which is the best and latest centralized data aggregation algorithm. Since E-PAS is also designed under the protocol interference model, we set the interference radius of E-PAS to according to different values. In the following, each group of simulations is repeated 100 times and the results are the average values.

    A. DDC Capacity Versus and

    In this section, we consider the WSNs deployed in a square area with size and the node density is 3. The impacts of and on the capacities of DDC and MPS are shown in Fig. 4. From Fig. 4(a)–(c), we can see that with the increase of , the achievable capacities of both DDC and MPS increase. Although a large implies a large (shown in Fig. 1), which further implies that fewer nodes can conduct transmissions concurrently, on the other hand a large also implies short transmission time of a data packet. Furthermore, with the increase of , the decrease of the transmission time of a data packet is faster than the increase of , i.e., dominates the achievable data collection capacity. It follows that a large leads to a high capacity for both DDC and MPS.

    From Fig. 4(d)–(f), we can see that with the increase of , the achievable capacities of DDC and MPS also increase. This is because, for any transmission, the interference impact from other concurrent transmissions decreases quickly with the increase of . Thus, a large implies a small and results in more nodes being able to initiate transmissions concurrently. Therefore, the achievable data collection capacities of DDC and MPS increase when increases.


    Fig. 4. DDC capacity versus MPS capacity. (a) . (b) . (c) . (d) . (e) . (f) .

    From Fig. 4, we can also see that DDC achieves similar data collection capacity to the centralized and synchronous MPS, although DDC is a distributed and asynchronous data collection algorithm. The reason is that we set a proper CR for DDC. By setting the CR of each node as , as many nodes as possible can initiate data transmissions concurrently with a guaranteed data receiving rate at the receivers. This can also be seen from Theorem 1. From the proof of Theorem 1, by packing all the possible concurrent data transmissions in the densest manner, we obtain a small proper CR maximizing the number of concurrent transmissions. Consequently, as many transmissions as possible can be scheduled simultaneously without interference at any time, inducing high achievable data collection capacity of DDC. Particularly, the average capacity differences between DDC and MPS are 5.25%, 4.99%, and 4.27% when , , and , respectively, which indicates that DDC achieves capacity comparable to that of the centralized and synchronized MPS.

    B. Scalability of DDC

    We examine the scalability of DDC with respect to the number of sensor nodes in a WSN. In the following simulations, we set the path loss exponent to 4, to 1 (i.e., the CR for DDC is 1-PCR), the default network size to 10 × 10, and the default node density to 4. The impacts of the node density and the network size on the scalability and achievable capacities of DDC and MPS are shown in Fig. 5, where we can see that with the increase of the number of sensor nodes [by fixing the network size and increasing the node density in Fig. 5(a) and fixing the node density and increasing the network size in Fig. 5(b)], the achievable capacity of DDC remains as stable as that of the centralized and synchronized MPS, which implies DDC is scalable with respect to , the number of sensor nodes in a WSN. This is because the capacity of DDC only depends on , which is a distance-dependent parameter. Thus, DDC is scalable for WSNs with different network sizes and node densities.

    Fig. 5. (a) DDC/MPS capacity versus node density and (b) versus network size.

    C. Performance of DDA

    In this section, we examine the performance of DDA with respect to and the number of sensor nodes . In all the simulations, we set the node density to 4. The results are shown in Fig. 6.

    Fig. 6. Data aggregation delay of DDA and E-PAS. (a) . (b) . (c) . (d) . (e) . (f) .

    From Fig. 6(a)–(c), we can see that with the increase of the guaranteed data receiving rate , the induced delay by both DDA and E-PAS increases for different values. This is different from the data collection situation, where the capacities of both DDC and MPS increase when increases. This is because of the following.

    1) With the increase of , the corresponding increases as well (which can be seen from Fig. 1). It follows that fewer data transmissions can be conducted simultaneously in DDA and E-PAS. On the other hand, even though a larger implies more data can be transmitted during one data transmission, i.e., fewer transmission times, the induced delay of DDA and E-PAS still increases with the increase of since now plays the dominating role in data aggregation.

    2) Data collection has much more traffic (which is of order of ) than data aggregation (which is of order of ). Therefore, the data transmission rate (decided by ) has more impact on the delay (as well as capacity) of data collection, while the data transmission concurrency (decided by ) has more impact on the delay of data aggregation, i.e., the guaranteed data receiving rate will dominate the delay increase of data collection while the carrier-sensing (interference) range will dominate the delay increase of data aggregation.

    From Fig. 6(a)–(c), we can also see that DDA has similar delay performance to E-PAS although DDA schedules data transmission in a distributed and asynchronous manner. On average, the delay differences between DDA and E-PAS in Fig. 6(a)–(c) are around 3.1%, 3.2%, and 2.6%, respectively, which are quite small.

    The data aggregation delay of DDA and E-PAS in WSNs with different sizes is shown in Fig. 6(d)–(f). From Fig. 6(d)–(f), we can see that the induced delay of DDA and E-PAS increases when the network becomes larger. The reason is straightforward since more sensor nodes imply heavier traffic load. From Fig. 6(d)–(f), we can also see that the delay difference between DDA and E-PAS is very small. Particularly, in Fig. 6(d)–(f), the average delay differences between DDA and E-PAS are about 6.1%, 4.4%, and 3.3%, respectively, which implies DDA has delay performance comparable to that of the best centralized data aggregation algorithm.

    IX. CONCLUSION AND FUTURE WORK

    Since WSNs in practice tend to be distributed asynchronous systems and most of the existing works study the network capacity issues for centralized synchronized WSNs, we investigate the achievable data collection capacity for distributed asynchronous WSNs in this paper. To avoid data transmission collisions/interferences, we derive an -proper carrier-sensing range under the generalized physical interference model. By taking -PCR as its carrier-sensing range, any node can initiate a data transmission with a guaranteed data receiving rate. Subsequently, based on the obtained -PCR, we propose a scalable Distributed Data Collection algorithm with fairness consideration for asynchronous WSNs. Theoretical analysis of DDC surprisingly shows that its achievable data collection capacity is order-optimal, as is that of centralized synchronized algorithms. Moreover, we study how to apply -PCR to distributed data aggregation in asynchronous WSNs and propose a Distributed Data Aggregation algorithm. By analysis, the delay bound of DDA is presented. To be more general, we investigate the delay and capacity of DDC and DDA under the Poisson node distribution model. The analysis again shows that DDC is order-optimal and scalable with respect to achievable data collection capacity. The extensive simulation results demonstrate that DDC has data collection capacity comparable to that of the most recently published centralized and synchronized data collection algorithm, and DDC is scalable in WSNs with different network sizes and node densities. DDA also has similar performance to the latest and best centralized data aggregation algorithm.

    The future work can be conducted along the following directions. First, we would like to apply the derived PCR to other issues in WSNs, e.g., broadcast scheduling, multicast scheduling, etc., and propose efficient distributed solutions for these issues. Second, we study the data collection and aggregation problems for randomly deployed WSNs in this paper. However, it is still an open problem to design an order-optimal data collection algorithm in arbitrarily distributed WSNs. The reason is that the nodes may be distributed according to any model in arbitrary WSNs, and thus there are many challenges in designing an order-optimal data collection algorithm with accurate capacity analysis. Therefore, we will study order-optimal distributed data collection and aggregation issues for arbitrarily distributed WSNs. Finally, there is a tradeoff between network capacity and lifetime. In this paper, we focus on designing a distributed data collection algorithm with the objective to maximize the achievable capacity. In future work, we would like to study how to implement an order-optimal data collection algorithm and meanwhile maximize network lifetime.


    Fig. 7. (a) Link abstraction and (b) hexagon packing.

    APPENDIX A
    PROOF OF THEOREM 1

    Let and . To make any -set always an -feasible state, for , assuming its destination node is , then we have

    (13)(14)

    (15)

    (16)

    (17)

    (18)

    Now, we derive the lower bound of . Evidently, since is the maximum transmission range of a node (defined in Section III). Furthermore, if we abstract a data transmission link as a node as shown in Fig. 7(a), then for the nodes in , the densest packing of nodes is the hexagon packing [45] with edge length as shown in Fig. 7(b). Subsequently, the nodes in can be layered with respect to (abstracted by the transmission link from to ), with the th layer having at most nodes. Furthermore, the distance between and any node at the th layer is no less than . Then, we have

    (19)

    (20)

    (21)

    In (21), , where is the Riemann zeta function. Considering that , then . It follows that . Thus, we have

    (22)

    (23)

    (24)

    (25)

    where . It follows that

    (26)

    Therefore, to make (18) valid, it is sufficient to have

    (27)

    (28)

    (29)

    (30)

    Therefore,.
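
The layering argument in this proof can be checked numerically. The following is a minimal sketch under stated assumptions, not the authors' derivation: it assumes the standard hexagonal-lattice fact that ring l holds 6l points, assumes every ring-l interferer is at distance at least l*d (the exact distance bound used in the proof is not reproduced here), and uses a path-loss law power * dist^(-alpha) with alpha > 2. The names `ring_interference` and `interference_upper_bound` are hypothetical. Under these assumptions the infinite sum equals 6 * power * d^(-alpha) * zeta(alpha - 1), which is where a Riemann zeta term naturally arises.

```python
def ring_interference(power, d, alpha, layers):
    """Sum interference from `layers` hexagonal rings around a receiver.

    Illustrative assumptions (not taken from the paper):
    ring l holds 6*l interferers, each at distance >= l*d.
    """
    return sum(6 * l * power * (l * d) ** (-alpha) for l in range(1, layers + 1))


def interference_upper_bound(power, d, alpha):
    """Closed-form bound that holds for any number of rings when alpha > 2.

    Uses sum_{l>=1} l^(1-alpha) <= 1 + 1/(alpha-2), obtained by comparing
    the series with the integral of x^(1-alpha) over [1, infinity).
    """
    if alpha <= 2:
        raise ValueError("the series diverges unless alpha > 2")
    return 6 * power * d ** (-alpha) * (1 + 1 / (alpha - 2))


if __name__ == "__main__":
    for layers in (1, 10, 100, 10000):
        print(layers, ring_interference(1.0, 2.0, 4.0, layers))
    print("bound", interference_upper_bound(1.0, 2.0, 4.0))
```

The partial sums grow with the number of rings but stay below the closed-form bound, which illustrates why the cumulative interference is bounded independently of how many concurrent transmitters there are.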

    APPENDIX B
    PROOF OF LEMMA 2

    Since all the wireless nodes are i.i.d. in an area with size, then for any node, it is located at the carrier-sensing area

    of a particular node with probability . Then, satisfies the binomial distribution with parameters . Thus, the average number of the nodes within the carrier-sensing area of a node is . Now, we prove the second statement. Let

    . Then, applying the Chernoff bound and for any , we have

    (31)

    (32)

    (33)

    (34)

    (35)

    (36)

    Particularly, let , then

    (37)(38)

    Fig. 8. Number of dominators and connectors within the CR of a node.

    (39)

    (40)

    (41)

    is the Riemann zeta function with parameter 2, and

    . It follows that

    according to the Borel-Cantelli Lemma, i.e., it is almost sure that the carrier-sensing area of a node contains no more than

    nodes. Thus, it is reasonable to use as the upper bound of the number

    of the nodes within the carrier-sensing area of a node, i.e.,.
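
The binomial-plus-Chernoff step of this proof can be illustrated with a small numeric check. This is a sketch under stated assumptions rather than the bound derived in (31)-(41): it uses the textbook multiplicative Chernoff bound P(X >= (1 + delta) * mu) <= (e^delta / (1 + delta)^(1 + delta))^mu, and the function names and the example values n = 500, p = 0.02 are hypothetical.

```python
from math import comb, exp


def binomial_upper_tail(n, p, k):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def chernoff_upper_tail(n, p, delta):
    """Multiplicative Chernoff bound on P(X >= (1 + delta) * n * p), delta > 0."""
    mu = n * p
    return (exp(delta) / (1 + delta) ** (1 + delta)) ** mu


if __name__ == "__main__":
    n, p = 500, 0.02          # n nodes, each in the sensing area with probability p
    print("mean:", n * p)     # average number of nodes in the sensing area
    print("exact tail:", binomial_upper_tail(n, p, 20))
    print("Chernoff  :", chernoff_upper_tail(n, p, 1.0))
```

The exact tail is smaller than the Chernoff value, as expected: the bound is loose but decays exponentially in the mean, which is what makes the almost-sure argument work.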

    APPENDIX C
    PROOF OF LEMMA 4

    Since is a disk of radius , it is possible for some connectors in only connecting some dominators out of disk as shown in Fig. 8. On the other hand, all the dominators adjacent to the connectors in must be located in a concentric disk of with radius , denoted by as shown in Fig. 8. Now, if is normalized to 1, then (respectively, ) is a

    disk of radius (respectively, ), and is a set of nodes with mutual distance of at least 1. Then, by Lemma 3, we have

    (respectively,), i.e., the number

    of the dominators within (respectively, ) is at most (respectively, ). Additionally, according to the aforementioned discussion and the CDS-based data collection tree construction process, each connector in must have a dominator parent located in disk , which implies

    . It follows that is proven.
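
The packing idea behind this proof (nodes with mutual distance at least 1 inside a disk) can be sketched as follows. The bound used here is a cruder area argument substituted for Lemma 3, whose exact statement is not reproduced in this appendix: disks of radius 1/2 around the points are disjoint and fit inside a disk of radius r + 1/2, so there are at most (2r + 1)^2 points. The function names and the greedy sampling procedure are hypothetical and only illustrate the bound.

```python
import math
import random


def area_packing_bound(r):
    """Area-argument bound on points with mutual distance >= 1 in a disk of radius r."""
    return (2 * r + 1) ** 2


def greedy_separated_points(r, trials, seed=0):
    """Greedily keep random points of the radius-r disk that stay >= 1 apart."""
    rng = random.Random(seed)
    kept = []
    for _ in range(trials):
        # Sample uniformly in the disk by rejection from the bounding square.
        x, y = rng.uniform(-r, r), rng.uniform(-r, r)
        if x * x + y * y > r * r:
            continue
        if all(math.hypot(x - a, y - b) >= 1 for a, b in kept):
            kept.append((x, y))
    return kept


if __name__ == "__main__":
    r = 3.0
    pts = greedy_separated_points(r, trials=5000)
    print(len(pts), "separated points; area bound:", area_packing_bound(r))
```

Any set produced this way respects the area bound, which mirrors how the proof caps the number of dominators inside the two concentric disks.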

    APPENDIX D
    PROOF OF LEMMA 5

    1) Since the sensor nodes are distributed according to a two-dimensional Poisson point process with density , we have

    (42)

    (43)

    (44)

    2) Similar to the proof in Lemma 2, applying the Chernoff bound and for any , we have

    (45)

    (46)

    (47)

    (48)

    Since is upper-bounded by , it follows that the number of sensor nodes within the CR of a node is upper-bounded by almost surely, where

    .
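
Both parts of this proof can be illustrated numerically. The sketch below is an illustration under stated assumptions, not the derivation in (42)-(48): it takes the number of nodes in a disk of radius R under a Poisson point process of density lam to be Poisson with mean lam * pi * R^2, and it uses the textbook Chernoff bound P(X >= x) <= exp(-mu) * (e * mu / x)^x for x > mu. All function names and example values are hypothetical.

```python
from math import exp, factorial, pi


def expected_nodes_in_disk(lam, radius):
    """Mean node count in a disk under a 2-D Poisson point process of density lam."""
    return lam * pi * radius**2


def poisson_upper_tail(mu, k):
    """Exact P(X >= k) for X ~ Poisson(mu), computed as 1 - P(X <= k - 1)."""
    return 1.0 - sum(exp(-mu) * mu**i / factorial(i) for i in range(k))


def poisson_chernoff_bound(mu, x):
    """Chernoff bound P(X >= x) <= exp(-mu) * (e * mu / x) ** x, valid for x > mu."""
    if x <= mu:
        raise ValueError("bound requires x > mu")
    return exp(-mu) * (exp(1) * mu / x) ** x


if __name__ == "__main__":
    mu = expected_nodes_in_disk(lam=0.8, radius=2.0)
    print("mean:", mu)
    print("exact tail:", poisson_upper_tail(mu, 25))
    print("Chernoff  :", poisson_chernoff_bound(mu, 25))
```

As in the binomial case of Lemma 2, the Chernoff value upper-bounds the exact tail and shrinks rapidly as the threshold moves above the mean.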

    REFERENCES

    [1] P.-J. Wan, S. C.-H. Huang, L. Wang, Z. Wan, and X. Jia, "Minimum-latency aggregation scheduling in multihop wireless networks," in Proc. MobiHoc, 2009, pp. 185–194.

    [2] C. Liu and G. Cao, "Distributed monitoring and aggregation in wireless sensor networks," in Proc. IEEE INFOCOM, 2010, pp. 1–9.

    [3] M. Yan, J. S. He, S. Ji, and Y. Li, "Multi-regional query scheduling in wireless sensor networks with minimum latency," Wireless Commun. Mobile Comput., 2012, DOI: 10.1002/wcm.2238, accepted for publication.

    [4] S. Ji, A. S. Uluagac, R. Beyah, and Z. Cai, "Practical unicast and convergecast scheduling schemes for cognitive radio networks," J. Combin. Optim., 2012, DOI: 10.1007/s10878-011-9446-7, accepted for publication.

    [5] M. Yan, J. S. He, S. Ji, and Y. Li, "Minimum latency scheduling for multi-regional query in wireless sensor networks," in Proc. IEEE IPCCC, 2011, pp. 1–8.

    [6] P. Gupta and P. R. Kumar, "The capacity of wireless networks," IEEE Trans. Inf. Theory, vol. 46, no. 2, pp. 388–404, Mar. 2000.

    [7] C. Luo, F. Wu, J. Sun, and C. W. Chen, "Compressive data gathering for large-scale wireless sensor networks," in Proc. MobiCom, 2009, pp. 145–156.

    [8] X. Zhu, B. Tang, and H. Gupta, "Delay efficient data gathering in sensor networks," in Proc. MSN, 2005, pp. 380–389.

    [9] S. Chen, S. Tang, M. Huang, and Y. Wang, "Capacity of data collection in arbitrary wireless sensor networks," in Proc. IEEE INFOCOM, 2010, pp. 1–5.

    [10] S. Ji, Y. Li, and X. Jia, "Capacity of dual-radio multi-channel wireless sensor networks for continuous data collection," in Proc. IEEE INFOCOM, 2011, pp. 1062–1070.

    [11] S. Ji, Z. Cai, Y. Li, and X. Jia, "Continuous data collection capacity of dual-radio multi-channel wireless sensor networks," IEEE Trans. Parallel Distrib. Syst., vol. 23, no. 10, pp. 1844–1855, Oct. 2012.

    [12] S. Chen, Y. Wang, X.-Y. Li, and X. Shi, "Order-optimal data collection in wireless sensor networks: Delay and capacity," in Proc. IEEE SECON, 2009, pp. 1–9.

    [13] S. Chen, Y. Wang, X.-Y. Li, and X. Shi, "Data collection capacity of random-deployed wireless sensor networks," in Proc. IEEE GLOBECOM, 2009, pp. 1–6.

    [14] S. Ji, R. Beyah, and Y. Li, "Continuous data collection capacity of wireless sensor networks under physical interference model," in Proc. IEEE MASS, 2011, pp. 222–231.

    [15] S. Ji, J. S. He, A. S. Uluagac, R. Beyah, and Y. Li, "Cell-based snapshot and continuous data collection in wireless sensor networks," Trans. Sensor Netw., 2012, accepted for publication.

    [16] T. Moscibroda, "The worst-case capacity of wireless sensor networks," in Proc. IPSN, 2007, pp. 1–10.

    [17] S. Ji, R. Beyah, and Z. Cai, "Snapshot/continuous data collection capacity for large-scale probabilistic wireless sensor networks," in Proc. IEEE INFOCOM, 2012, pp. 1035–1043.

    [18] Z. Cai, S. Ji, J. S. He, and A. G. Bourgeois, "Optimal distributed data collection for asynchronous cognitive radio networks," in Proc. ICDCS, 2012, pp. 245–254.

    [19] C. Wang, C. Jiang, Y. Liu, X.-Y. Li, S. Tang, and H. Ma, "Aggregation capacity of wireless sensor networks: Extended network case," in Proc. IEEE INFOCOM, 2011, pp. 1701–1709.

    [20] X. Chen, X. Hu, and J. Zhu, "Minimum data aggregation time problem in wireless sensor networks," in Proc. MSN, 2005, pp. 133–142.

    [21] S. C.-H. Huang, P.-J. Wan, C. T. Vu, Y. Li, and F. Yao, "Nearly constant approximation for data aggregation scheduling in wireless sensor networks," in Proc. IEEE INFOCOM, 2007, pp. 366–372.

    [22] X. H. Xu, X.-Y. Li, P.-J. Wan, and S. J. Tang, "Efficient scheduling for periodic aggregation queries in multihop sensor networks," IEEE/ACM Trans. Netw., vol. 20, no. 3, pp. 690–698, Jun. 2012.

    [23] B. Yu, J. Li, and Y. Li, "Distributed data aggregation scheduling in wireless sensor networks," in Proc. IEEE INFOCOM, 2009, pp. 2159–2167.

    [24] X. H. Xu, S. G. Wang, X. F. Mao, S. J. Tang, and X. Y. Li, "An improved approximation algorithm for data aggregation in multi-hop wireless sensor networks," in Proc. FOWANC, 2009, pp. 47–56.

    [25] Y. Li, L. Guo, and S. K. Prasad, "An energy-efficient distributed algorithm for minimum-latency aggregation scheduling in wireless sensor networks," in Proc. IEEE ICDCS, 2010, pp. 827–836.

    [26] X.-Y. Li, S.-J. Tang, and O. Frieder, "Multicast capacity for large scale wireless ad hoc networks," in Proc. MobiCom, 2007, pp. 266–277.

    [27] Y. Wang, X. Chu, X. Wang, and Y. Cheng, "Optimal multicast capacity and delay tradeoffs in MANETs: A global perspective," in Proc. IEEE INFOCOM, 2011, pp. 640–648.

    [28] C. Wang, C. Jiang, X.-Y. Li, S. Tang, and P. Yang, "General capacity scaling of wireless networks," in Proc. IEEE INFOCOM, 2011, pp. 712–720.

    [29] P. Kyasanur and N. H. Vaidya, "Capacity of multi-channel wireless networks: Impact of number of channels and interfaces," in Proc. MobiCom, 2005, pp. 43–57.

    [30] O. Goussevskaia, R. Wattenhofer, M. M. Halldorsson, and E. Welzl, "Capacity of arbitrary wireless networks," in Proc. IEEE INFOCOM, 2009, pp. 1872–1880.

    [31] M. Andrews and M. Dinitz, "Maximizing capacity in arbitrary wireless networks in the SINR model: Complexity and game theory," in Proc. IEEE INFOCOM, 2009, pp. 1332–1340.

    [32] E. I. Asgeirsson and P. Mitra, "On a game theoretic approach to capacity maximization in wireless networks," in Proc. IEEE INFOCOM, 2011, pp. 3029–3037.

    [33] U. Niesen, P. Gupta, and D. Shah, "On capacity scaling in arbitrary wireless networks," IEEE Trans. Inf. Theory, vol. 55, no. 9, pp. 3959–3982, Sep. 2009.

    [34] U. Niesen, P. Gupta, and D. Shah, "The balanced unicast and multicast capacity regions of large wireless networks," IEEE Trans. Inf. Theory, vol. 56, no. 5, pp. 2249–2271, May 2010.

    [35] M. Garetto, P. Giaccone, and E. Leonardi, "On the capacity of ad hoc wireless networks under general node mobility," in Proc. IEEE INFOCOM, 2007, pp. 357–365.

    [36] G. Sharma, R. Mazumdar, and N. B. Shroff, "Delay and capacity trade-offs in mobile ad hoc networks: A global perspective," IEEE/ACM Trans. Netw., vol. 15, no. 5, pp. 981–992, Oct. 2007.

    [37] K. Lee, Y. Kim, S. Chong, I. Rhee, and Y. Yi, "Delay-capacity tradeoffs for mobile networks with Lévy walks and Lévy flights," in Proc. IEEE INFOCOM, 2011, pp. 3128–3136.

    [38] V. Bhandari and N. H. Vaidya, "Connectivity and capacity of multi-channel wireless networks with channel switching constraints," in Proc. IEEE INFOCOM, 2007, pp. 785–793.

    [39] V. Bhandari and N. H. Vaidya, "Capacity of multi-channel wireless networks with random (c, f) assignment," in Proc. MobiHoc, 2007, pp. 229–238.

    [40] H.-N. Dai, K.-W. Ng, R. C.-W. Wong, and M.-Y. Wu, "On the capacity of multi-channel wireless networks using directional antennas," in Proc. IEEE INFOCOM, 2008, pp. 628–636.

    [41] H. Li, Y. Cheng, P.-J. Wan, and J. Cao, "Local sufficient rate constraints for guaranteed capacity region in multi-radio multi-channel wireless networks," in Proc. IEEE INFOCOM, 2011, pp. 990–998.

    [42] P. Li, M. Pan, and Y. Fang, "The capacity of three-dimensional wireless ad hoc networks," in Proc. IEEE INFOCOM, 2011, pp. 1485–1493.

    [43] P. Li, X. Huang, and Y. Fang, "Capacity scaling of multihop cellular networks," in Proc. IEEE INFOCOM, 2011, pp. 2831–2839.

    [44] X.-Y. Li, J. Zhao, Y. Wu, S. Tang, X. Xu, and X. Mao, "Broadcast capacity for wireless ad hoc networks," in Proc. IEEE MASS, 2008, pp. 114–123.

    [45] L. Fu, S. C. Liew, and J. Huang, "Effective carrier sensing in CSMA networks under cumulative interference," in Proc. IEEE INFOCOM, 2010, pp. 1–9.

    Shouling Ji (S'10) received the B.S. and M.S. degrees in computer science from Heilongjiang University, Harbin, China, in 2007 and 2010, respectively, and is currently pursuing the Ph.D. degree in computer science at Georgia State University, Atlanta. His research interests include wireless sensor networks, data management and analysis of cognitive radio networks, and network evolution analysis of social networks.

    Mr. Ji is a student member of the Association for Computing Machinery (ACM) and the IEEE COMSOC.

    Zhipeng Cai (S'06–M'12) received the B.S. degree in computer science from the Beijing Institute of Technology, Beijing, China, in 2001, and the M.S. and Ph.D. degrees in computing science from the University of Alberta, Edmonton, AB, Canada, in 2004 and 2008, respectively. He is currently an Assistant Professor with the Department of Computer Science, Georgia State University, Atlanta. His research interests include wireless networks, bioinformatics, and optimization theory.

    Dr. Cai has served as a Guest Editor for Algorithmica and the Journal of Combinatorial Optimization, and as an Editor for the International Journal of Sensor Networks. He has received a number of awards and honors, including the NSERC Postdoc Fellowship and the Ph.D. Outstanding Research Achievement Award.