
Information Sciences 197 (2012) 91–104


Information Sciences

journal homepage: www.elsevier.com/locate/ins

Distributed estimation over complex networks

Ying Liu (a), Chunguang Li (a,*), Wallace K.S. Tang (b), Zhaoyang Zhang (a)

(a) Department of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, PR China
(b) Department of Electronic Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong

Article info

Article history:
Received 8 March 2011
Received in revised form 8 February 2012
Accepted 13 February 2012
Available online 18 February 2012

Keywords: Complex network; Diffusion LMS; Distributed estimation; Network topology; Scale-free; Small-world

0020-0255/$ - see front matter © 2012 Elsevier Inc. All rights reserved.
doi:10.1016/j.ins.2012.02.008

* Corresponding author. E-mail address: [email protected] (C. Li).

Abstract

Distributed estimation is an appealing technique for in-network signal processing. In this paper, we investigate the impact of network topology on the performance of a distributed estimation algorithm, namely adaptive-then-combine diffusion LMS, based on data with or without the temporal and spatial independence assumptions. The study covers different network models, including the regular, small-world, random and scale-free networks, and the performance is analyzed in terms of mean stability, mean-square error, communication cost and robustness. Simulation results show that the estimation performance depends largely on the topological properties of the networks, such as the average path length, the clustering coefficient and the degree distribution, indicating that network topology indeed plays an important role in distributed estimation. From the design point of view, this study also provides some guidelines on how to design a network so that the quality of the estimates is optimized.

© 2012 Elsevier Inc. All rights reserved.

1. Introduction

As one of the major challenges in networked systems, distributed signal processing has received much attention during the last decade [15,18,23,24,30]. It deals with extracting information from multiple nodes spatially distributed over a geographic area. In contrast to classical centralized processing, in which all nodes convey their information to a central unit for processing, distributed processing performs the task using local computations at each node and the information obtained from its neighboring nodes. Distributed processing is known to significantly improve the flexibility and fault tolerance of the network. It has therefore been widely applied to in-network signal and information processing, precision agriculture, environmental monitoring, military surveillance, and source localization and extraction [11,13,20,33].

Distributed detection and distributed estimation are the two main focuses of distributed signal processing. The former performs event detection in a distributed fashion by minimizing the average probability of error based on the Bayesian rule. It has been studied for some structured networks, such as centralized parallel networks [39], serial or incremental networks [35], and networks with similar structures [6,10]. Distributed estimation, on the other hand, aims to estimate a vector of interest at each node, where the accuracy is improved by accessing the measurements of a subset of its neighbors.

Generally, according to the manner in which the nodes communicate with each other, distributed estimation schemes can be classified into incremental algorithms and diffusion algorithms. In incremental algorithms, such as incremental least-mean squares (LMS) [21,22] and incremental recursive least squares (RLS) [29], information is transmitted sequentially from one node to one of its adjacent nodes. Although the communication overhead is relatively low, a cyclic path through the network is required. If a sensor fails, a new cyclic path has to be re-established. Since such a path-finding problem is NP-hard, the applicability of incremental algorithms to large networks is questionable [31]. Diffusion algorithms, in contrast, allow each node to communicate with all of its neighbors as defined by the network topology. Typical examples include diffusion LMS [9,22,27,34], diffusion RLS [7], and diffusion Kalman filtering [8]. Since no cyclic path is required, these algorithms are preferable in practical engineering.

Yet the diffusion cooperative protocol raises a natural question: how does the network topology affect the performance of distributed signal processing? This question also pertains to designing the network topology so that related properties, such as robustness, communication cost and complexity, are optimized. For example, in wireless sensor networks, an optimal cooperative rule governed by the network topology is critical because it reduces the amount of data transmission, thereby saving bandwidth and energy. Since wireless sensors are commonly battery powered and energy limited, this in turn extends the network lifetime and reduces the network latency [11,13].

The impact of network topology on the performance of distributed detection has been reported in the literature. For instance, Aldosari and Moura investigated the performance of distributed detection in small-world sensor networks [2,3] and found that the small-world topology can significantly improve detection performance. In subsequent work [19], the study was further extended to topology design for distributed inference problems.

However, to the best of our knowledge, the same issue has not yet been studied in the context of distributed estimation. Given the importance of network structure discussed above, it is of interest to know how distributed estimation performs over different kinds of complex networks. In other words, how does the network topology affect distributed estimation? This is also related to network topology design: if the total number of links is fixed, how can we maximize the performance of distributed estimation at a low communication cost? Should we design a regular network or a network with some specific topological structure? Will the performance be enhanced if a few shortcuts are added?

To answer these questions, a systematic study is carried out in this paper to analyze the impact of network topology on the performance of distributed estimation. We focus on adaptive-then-combine (ATC) diffusion LMS, which has been shown to outperform the other diffusion LMS algorithms [9]. The mean stability of this algorithm is analyzed mathematically, while its mean-square performance over different network models, including the regular, small-world [25,37], random [14], and scale-free [4] networks, is compared by numerical simulations.

The rest of the paper is organized as follows. In Section 2, we briefly revisit the ATC diffusion LMS strategy. Its performance in terms of mean stability and mean-square error is then discussed in Section 3. In Section 4, the performance of ATC diffusion LMS over different network models is presented and discussed in detail. Finally, conclusions are drawn in Section 5.

Notation: In what follows, boldface letters denote random quantities and normal letters denote deterministic (non-random) quantities, e.g. $\mathbf{u}$ and $u$, respectively. Capital letters denote matrices and small letters denote vectors or scalars. All vectors are column vectors except for the regression vectors, which are row vectors denoted by $\mathbf{u}_{k,i}$ throughout. The operators $(\cdot)^*$, $\otimes$, $\mathrm{col}\{\cdot\}$, $\mathrm{diag}\{\cdot\}$, $E[\cdot]$, and $\|\cdot\|_2$ denote complex conjugation (conjugate transposition for vectors and matrices), the Kronecker product, stacking the specified entries into a column vector, a (block) diagonal matrix, expectation, and the Euclidean 2-norm, respectively. $I_n$ denotes the $n \times n$ identity matrix. Other notation will be introduced as needed.

2. Distributed estimation based on diffusion LMS

Consider a network of N nodes as shown in Fig. 1. At each time instant i, every node k has access to a scalar measurement $\mathbf{d}_k(i)$ and an M-dimensional row regression vector $\mathbf{u}_{k,i}$. The measurements $\{\mathbf{d}_k(i), \mathbf{u}_{k,i}\}$ follow the standard model

$$\mathbf{d}_k(i) = \mathbf{u}_{k,i} w^o + \mathbf{v}_k(i), \tag{1}$$

where $w^o$ is the unknown M-dimensional vector to be estimated and $\mathbf{v}_k(i)$ is the background noise.

Our objective is to estimate $w^o$ from the data collected at the N nodes of the network using ATC diffusion LMS, which was originally proposed by Lopes and Sayed [22]. The algorithm is briefly introduced below; interested readers can refer to [22] for details.

Fig. 1. Distributed network with N nodes. $\{\mathbf{d}_k(i), \mathbf{u}_{k,i}\}$ denotes the time realization at each node k.

Fig. 2. ATC diffusion LMS algorithm.


Fig. 2 shows schematically the cooperation strategy of ATC diffusion LMS. Assuming that each node exchanges information only with its neighbors (the nodes directly connected to it), ATC diffusion LMS is implemented in two steps, namely adaptation and combination. In the adaptation step, each node k updates its intermediate estimate, denoted $\boldsymbol{\psi}_{k,i}$, with a stochastic steepest-descent (LMS) step on the mean-square error. Then, in the combination step, the node consults the nodes in its neighborhood and linearly combines their intermediate estimates $\{\boldsymbol{\psi}_{l,i};\ l \in \mathcal{N}_k\}$, where $\mathcal{N}_k$ is the neighborhood of node k including k itself, to generate a new estimate $\boldsymbol{\phi}_{k,i}$. Mathematically, the algorithm reads as follows.

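ATC diffusion LMS: Starting with $\boldsymbol{\phi}_{k,-1} = 0$ for each node k, compute

$$\begin{cases}
\boldsymbol{\psi}_{k,i} = \boldsymbol{\phi}_{k,i-1} + \mu_k \mathbf{u}_{k,i}^{*}\big(\mathbf{d}_k(i) - \mathbf{u}_{k,i}\boldsymbol{\phi}_{k,i-1}\big), \\
\boldsymbol{\phi}_{k,i} = \sum_{l \in \mathcal{N}_k} c_{l,k}\, \boldsymbol{\psi}_{l,i},
\end{cases} \tag{2}$$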

for each time instant $i \ge 0$, where $\mu_k > 0$ is the step-size at node k and the coefficients $c_{l,k}$ are real, non-negative constants satisfying

$$\sum_{l \in \mathcal{N}_k} c_{l,k} = 1, \quad \text{and} \quad c_{l,k} = 0 \ \text{for all } l \notin \mathcal{N}_k. \tag{3}$$
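The two steps of (2) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' simulation code: it assumes real-valued data (so $\mathbf{u}_{k,i}^{*}$ is just the transpose), white unit-variance Gaussian regressors, a hypothetical five-node ring, and uniform weights $c_{l,k} = 1/|\mathcal{N}_k|$, chosen only because they satisfy (3). The function names `atc_diffusion_lms` and `ring_neighborhoods` are illustrative.

```python
import math
import random


def atc_diffusion_lms(neighbors, w_o, mu, sigma_v, iters, seed=0):
    """Run ATC diffusion LMS (2) on real-valued data; return the final estimates.

    neighbors[k] is the neighborhood N_k of node k, including k itself.
    Combination weights are uniform, c_{l,k} = 1/|N_k| (an assumption that satisfies (3)).
    """
    rng = random.Random(seed)
    n, m = len(neighbors), len(w_o)
    phi = [[0.0] * m for _ in range(n)]  # phi_{k,-1} = 0
    for _ in range(iters):
        psi = []
        # Adaptation step: each node performs one LMS update with its own data.
        for k in range(n):
            u = [rng.gauss(0.0, 1.0) for _ in range(m)]  # regressor u_{k,i}
            d = sum(a * b for a, b in zip(u, w_o)) + rng.gauss(0.0, sigma_v)  # model (1)
            e = d - sum(a * b for a, b in zip(u, phi[k]))  # d_k(i) - u_{k,i} phi_{k,i-1}
            psi.append([p + mu * a * e for p, a in zip(phi[k], u)])
        # Combination step: convex combination of the neighbors' intermediate estimates.
        phi = [
            [sum(psi[l][j] for l in neighbors[k]) / len(neighbors[k]) for j in range(m)]
            for k in range(n)
        ]
    return phi


def ring_neighborhoods(n):
    """Neighborhoods (including the node itself) of a ring where each node has 2 neighbors."""
    return [[(k - 1) % n, k, (k + 1) % n] for k in range(n)]


w_o = [1 / math.sqrt(2), 1 / math.sqrt(2)]
estimates = atc_diffusion_lms(ring_neighborhoods(5), w_o, mu=0.05, sigma_v=0.03, iters=1000)
msd = sum(sum((a - b) ** 2 for a, b in zip(w_o, est)) for est in estimates) / len(estimates)
```

Replacing the uniform weights with the Metropolis weights of (4) changes only the combination step.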

The coefficients $c_{l,k}$ define the nodes' cooperative rule and are determined by the network topology. Several combination rules have been suggested, including the Metropolis rule [9,22], the relative-degree rule [7], the Laplacian rule [9], and adaptive combiners [29,34]. In this paper, we adopt the Metropolis rule, which has been reported to outperform the others [9]:

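$$c_{l,k} = \begin{cases}
1/\max(n_k, n_l), & \text{if } l \in \mathcal{N}_k \setminus \{k\}, \\
1 - \sum_{j \in \mathcal{N}_k \setminus \{k\}} c_{j,k}, & \text{if } l = k, \\
0, & \text{if } l \notin \mathcal{N}_k,
\end{cases} \tag{4}$$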

where $\mathcal{N}_k \setminus \{k\}$ denotes the neighborhood of node k excluding k itself, and $n_k$ and $n_l$ are the degrees of nodes k and l, respectively.
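The Metropolis rule (4) can be computed directly from an adjacency list. The sketch below is illustrative: it assumes an undirected graph given as neighbor sets, takes the degree $n_k$ to exclude node k itself, and uses exact fractions so that condition (3) can be checked without rounding. `metropolis_weights` and the star example are hypothetical.

```python
from fractions import Fraction


def metropolis_weights(adjacency):
    """Metropolis combination matrix C, with C[l][k] = c_{l,k} from rule (4).

    adjacency[k] is the set of neighbors of node k, excluding k itself, so the
    degree n_k is taken as len(adjacency[k]) (an assumption; see text).
    """
    n = len(adjacency)
    deg = [len(adjacency[k]) for k in range(n)]
    c = [[Fraction(0)] * n for _ in range(n)]  # c_{l,k} = 0 for l not in N_k
    for k in range(n):
        for l in adjacency[k]:
            c[l][k] = Fraction(1, max(deg[k], deg[l]))
        # The self-weight makes column k sum to one, as required by (3).
        c[k][k] = 1 - sum(c[l][k] for l in adjacency[k])
    return c


# Hypothetical 4-node star centered at node 0.
star = [{1, 2, 3}, {0}, {0}, {0}]
C = metropolis_weights(star)
```

Because $1/\max(n_k, n_l)$ is symmetric in k and l, the resulting C is symmetric, so its rows also sum to one.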

3. Performance analysis

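In this section, we analyze the performance of ATC diffusion LMS over different network topologies in terms of mean stability and mean-square error. To express the data model and the algorithm in terms of global quantities, we introduce the following notation:

$$\boldsymbol{\phi}_i \triangleq \mathrm{col}\{\boldsymbol{\phi}_{1,i}, \ldots, \boldsymbol{\phi}_{N,i}\}, \quad \boldsymbol{\psi}_i \triangleq \mathrm{col}\{\boldsymbol{\psi}_{1,i}, \ldots, \boldsymbol{\psi}_{N,i}\},$$

$$\mathbf{U}_i \triangleq \mathrm{diag}\{\mathbf{u}_{1,i}, \ldots, \mathbf{u}_{N,i}\}, \quad \mathbf{d}_i \triangleq \mathrm{col}\{\mathbf{d}_1(i), \ldots, \mathbf{d}_N(i)\},$$

$$\mathbf{v}_i \triangleq \mathrm{col}\{\mathbf{v}_1(i), \ldots, \mathbf{v}_N(i)\}, \quad w^{(o)} \triangleq \mathrm{col}\{w^o, \ldots, w^o\}.$$

The vectors $\boldsymbol{\phi}_i$, $\boldsymbol{\psi}_i$ and $w^{(o)}$ are $NM \times 1$, $\mathbf{d}_i$ and $\mathbf{v}_i$ are $N \times 1$, and $\mathbf{U}_i$ is $N \times NM$.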

Using these quantities, the data model (1) can be rewritten as

$$\mathbf{d}_i = \mathbf{U}_i w^{(o)} + \mathbf{v}_i, \quad i = 0, 1, \ldots \tag{5}$$
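To make the stacked notation concrete, the sketch below builds $U_i = \mathrm{diag}\{u_{1,i}, \ldots, u_{N,i}\}$ for deterministic example values and checks that (5) reproduces the per-node model (1). It also forms $G = C \otimes I_M$ (used in Section 3.1) and checks $G w^{(o)} = w^{(o)}$ for a hypothetical 2-node combination matrix whose rows and columns sum to one. Helper names and numbers are illustrative assumptions.

```python
def block_diag_rows(rows):
    """U_i = diag{u_1, ..., u_N}: an N x NM matrix with row vector u_k in block k."""
    n, m = len(rows), len(rows[0])
    out = []
    for k, u in enumerate(rows):
        line = [0.0] * (n * m)
        line[k * m:(k + 1) * m] = u  # place u_k in columns k*M .. (k+1)*M - 1
        out.append(line)
    return out


def kron(a, b):
    """Kronecker product a (x) b of matrices stored as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def matvec(a, x):
    """Matrix-vector product."""
    return [sum(p * q for p, q in zip(row, x)) for row in a]


# Hypothetical example with N = 2 nodes and M = 2.
w_o = [0.5, -1.0]
w_big = w_o * 2                              # w^(o) = col{w^o, w^o}
U = block_diag_rows([[1.0, 2.0], [3.0, 4.0]])
v = [0.1, -0.2]
d = [x + e for x, e in zip(matvec(U, w_big), v)]  # stacked model (5)
G = kron([[0.5, 0.5], [0.5, 0.5]], [[1.0, 0.0], [0.0, 1.0]])  # G = C (x) I_2
```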

We adopt the following assumptions:

(A.1) The regressor $\mathbf{u}_{k,i}$ is independent and identically distributed (i.i.d.) in time and spatially independent, with $R_{u,k} \triangleq E[\mathbf{u}_{k,i}^{*}\mathbf{u}_{k,i}]$.
(A.2) The noise $\mathbf{v}_k(i)$ is zero-mean, i.i.d. in time and spatially independent, with $\sigma_{v,k}^2 \triangleq E[|\mathbf{v}_k(i)|^2]$. In addition, $\mathbf{v}_k(i)$ is independent of the regressor $\mathbf{u}_{k,i}$.

Remark 1. The temporal and spatial independence assumptions (A.1) and (A.2) are widely adopted in adaptive signal processing [9,16,21,22,27,30,34,38]. Although they may not always hold in practice, they make the analysis mathematically tractable. As shown in the literature [9,21,22,27,30,34], theory and simulations match well for sufficiently small step-sizes $\mu_k$. The temporal independence assumption can also be justified in certain applications, and the noise $\mathbf{v}_k(i)$ is likely to be independent of the data $\mathbf{u}_{k,i}$ [16]. For example, in channel estimation [30], it is justified that the sensing noise is i.i.d. and independent of all other data, including the regression data $\mathbf{u}_{k,i}$. Lastly, even when these assumptions are not satisfied, LMS algorithms are generally observed to achieve good mean-square performance. A simulated example is given in [28], where colored noise consisting of a narrowband signal embedded in white noise is applied. In Section 4, tests with dependent noise are performed to further examine the applicability of our approach.

3.1. Mean stability analysis

The mean stability analysis aims to find sufficient conditions under which the local estimate at each node converges in the mean to the unknown parameter $w^o$. Let $D = \mathrm{diag}\{\mu_1 I_M, \ldots, \mu_N I_M\}$ be the $NM \times NM$ diagonal matrix collecting the local step-sizes. Following procedures similar to those in [9,22], we now derive a condition that ensures mean stability. Based on (2), the state-space model of ATC diffusion LMS can be expressed as

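$$\boldsymbol{\psi}_i = \boldsymbol{\phi}_{i-1} + D\mathbf{U}_i^{*}\big(\mathbf{d}_i - \mathbf{U}_i\boldsymbol{\phi}_{i-1}\big), \tag{6a}$$

$$\boldsymbol{\phi}_i = G\boldsymbol{\psi}_i, \tag{6b}$$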

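where $G = C \otimes I_M$ is the $NM \times NM$ transition matrix and C is the $N \times N$ diffusion combination matrix with entries $c_{l,k}$ given in (4). Since the Metropolis weights (4) are symmetric, $c_{l,k} = c_{k,l}$, no transpose of C is needed in (6b). Substituting (6a) into (6b) gives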

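$$\boldsymbol{\phi}_i = G\boldsymbol{\phi}_{i-1} + GD\mathbf{U}_i^{*}\big(\mathbf{d}_i - \mathbf{U}_i\boldsymbol{\phi}_{i-1}\big). \tag{7}$$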

Now consider the global error vector

$$\tilde{\boldsymbol{\phi}}_i \triangleq w^{(o)} - \boldsymbol{\phi}_i. \tag{8}$$

By condition (3) and the symmetry of C, we have $C\mathbb{1}_N = \mathbb{1}_N$, where $\mathbb{1}_N \triangleq \mathrm{col}\{1, \ldots, 1\}$. Since $G = C \otimes I_M$, it follows that $G w^{(o)} = w^{(o)}$. Then, using (5), (7) and (8), the global error $\tilde{\boldsymbol{\phi}}_i$ evolves according to the recursion

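$$\tilde{\boldsymbol{\phi}}_i = G\big(I_{NM} - D\mathbf{U}_i^{*}\mathbf{U}_i\big)\tilde{\boldsymbol{\phi}}_{i-1} - GD\mathbf{U}_i^{*}\mathbf{v}_i. \tag{9}$$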
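For completeness, the step from (5), (7) and (8) to (9) can be written out as follows, substituting (5) into (7) and using $G w^{(o)} = w^{(o)}$ in the second line:

$$\begin{aligned}
\tilde{\boldsymbol{\phi}}_i
&= w^{(o)} - G\boldsymbol{\phi}_{i-1} - GD\mathbf{U}_i^{*}\big(\mathbf{U}_i w^{(o)} + \mathbf{v}_i - \mathbf{U}_i\boldsymbol{\phi}_{i-1}\big) \\
&= G\big(w^{(o)} - \boldsymbol{\phi}_{i-1}\big) - GD\mathbf{U}_i^{*}\mathbf{U}_i\big(w^{(o)} - \boldsymbol{\phi}_{i-1}\big) - GD\mathbf{U}_i^{*}\mathbf{v}_i \\
&= G\big(I_{NM} - D\mathbf{U}_i^{*}\mathbf{U}_i\big)\tilde{\boldsymbol{\phi}}_{i-1} - GD\mathbf{U}_i^{*}\mathbf{v}_i .
\end{aligned}$$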

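Under assumptions (A.1) and (A.2), $\tilde{\boldsymbol{\phi}}_{i-1}$ is independent of $\mathbf{U}_i$, and $\mathbf{v}_i$ is zero-mean and independent of $\mathbf{U}_i$. Taking expectations of both sides of (9) therefore gives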

$$E[\tilde{\boldsymbol{\phi}}_i] = G\big(I_{NM} - DR_u\big)E[\tilde{\boldsymbol{\phi}}_{i-1}], \tag{10}$$

where $R_u = \mathrm{diag}\{R_{u,1}, \ldots, R_{u,N}\} = E[\mathbf{U}_i^{*}\mathbf{U}_i]$ is an $NM \times NM$ block diagonal matrix. To ensure the convergence of $E[\tilde{\boldsymbol{\phi}}_i]$, it is required that

$$|\lambda(GB)| < 1, \tag{11}$$

where $B = I_{NM} - DR_u$ and $|\lambda(\cdot)|$ denotes the magnitude of any eigenvalue of a matrix. Note that B is the transition matrix without cooperation (i.e. with $C = I_N$). In other words, the spectral radius of GB (its maximum eigenvalue magnitude, denoted by $\rho(GB)$) should be less than one, i.e. all eigenvalues of GB should lie inside the unit circle [16]. Condition (11) also indicates that mean stability depends on the space–time data statistics (represented by B) and on the cooperation matrix C. B is usually fixed once the observed data are given, whereas C is governed by the network topology, which can be designed.

Using the matrix 2-norm, we have

$$\|GB\|_2 \le \|G\|_2 \, \|B\|_2. \tag{12}$$

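Because $R_u$ is block diagonal with Hermitian blocks $R_{u,k}$ and D is a real diagonal matrix, B is Hermitian, so $\|B\|_2 = \rho(B)$. Combining this with $\rho(GB) \le \|GB\|_2$ and (12), we obtain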

$$\rho(GB) \le \|C\|_2 \, \rho(B), \tag{13}$$

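since $G = C \otimes I_M$ implies $\|G\|_2 = \|C\|_2$. Under the Metropolis rule (4), C is symmetric and non-negative with unit column (hence also row) sums, so its eigenvalues lie in $[-1, 1]$ and $\|C\|_2 \le 1$. Inequality (13) then reduces to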

$$\rho(GB) \le \rho(B). \tag{14}$$

As indicated by (14), the spectral radius of GB is never larger than that of B (the non-cooperative case), suggesting that cooperative processing outperforms the non-cooperative case [9,22]. Moreover, (14) suggests that the spectral radius can serve as a measure for evaluating the mean stability of different networks: the smaller $\rho(GB)$, the faster the convergence.
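Condition (11) and bound (14) can be checked numerically on a small example. The sketch estimates $\rho(A)$ with Gelfand's formula $\rho(A) = \lim_{p \to \infty}\|A^p\|^{1/p}$ via repeated squaring; the finite-p estimate approaches $\rho(A)$ from above. This is a swapped-in, dependency-free technique, not the eigenvalue computation behind Fig. 3. It assumes M = 1 (so G = C), a hypothetical three-node path graph with Metropolis weights, and hypothetical values of $\mu_k \sigma_{u,k}^2$.

```python
import math


def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def frobenius(a):
    """Frobenius norm of a matrix stored as lists of rows."""
    return math.sqrt(sum(x * x for row in a for x in row))


def spectral_radius(a, squarings=12):
    """Estimate rho(A) by Gelfand's formula with p = 2**squarings.

    A^p is tracked as exp(log_scale) * m with ||m||_F = 1 to avoid underflow
    and overflow; the returned value is ||A^p||_F ** (1 / p).
    """
    norm = frobenius(a)
    if norm == 0.0:
        return 0.0
    m = [[x / norm for x in row] for row in a]
    log_scale = math.log(norm)
    for _ in range(squarings):
        m = matmul(m, m)
        norm = frobenius(m)
        if norm == 0.0:  # A is nilpotent, so rho(A) = 0
            return 0.0
        m = [[x / norm for x in row] for row in m]
        log_scale = 2.0 * log_scale + math.log(norm)
    return math.exp(log_scale / 2 ** squarings)


# Hypothetical path graph 1-2-3 with Metropolis weights (4); M = 1, so G = C.
C = [[0.5, 0.5, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.5]]
# B = I - D R_u with hypothetical mu_k * sigma_{u,k}^2 = 0.03, 0.06, 0.01.
B = [[0.97, 0.0, 0.0],
     [0.0, 0.94, 0.0],
     [0.0, 0.0, 0.99]]
rho_b = spectral_radius(B)              # non-cooperative case
rho_gb = spectral_radius(matmul(C, B))  # cooperative case, GB = CB
```

Here `rho_gb` comes out below `rho_b` = 0.99, as (14) predicts.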

Remark 2. From the design point of view, the above analysis indicates that optimizing the network topology for distributed estimation can be cast as the following spectral radius minimization problem:

$$\min_{C} \ \rho(GB) \quad \text{subject to} \quad C\mathbb{1}_N = \mathbb{1}_N. \tag{15}$$

Unfortunately, this problem is very difficult, since the objective function, i.e. the spectral radius of a matrix, is generally non-convex and not even Lipschitz continuous. Some related spectral radius minimization problems have also been proved to be NP-hard [5,26]. In this paper, we approach the problem differently: the spectral radius is used as a measure to evaluate the performance of distributed estimation over different network models. In this way, the impact of network topology on distributed estimation can be revealed, and some guidelines can be obtained for designing network topologies that yield better distributed estimation performance.

3.2. Mean-square performance

The performance of diffusion LMS can also be measured by the mean-square deviation (MSD) and the excess mean-square error (EMSE) [7,9,34]. For each node k at time instant i, the MSD and EMSE are defined as

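$$\text{MSD:} \quad \eta_k(i) = E\|w^o - \boldsymbol{\phi}_{k,i}\|_2^2 = E\|\tilde{\boldsymbol{\phi}}_{k,i}\|_2^2, \tag{16a}$$

$$\text{EMSE:} \quad \zeta_k(i) = E\|\mathbf{u}_{k,i}(w^o - \boldsymbol{\phi}_{k,i})\|_2^2 = E\|\mathbf{u}_{k,i}\tilde{\boldsymbol{\phi}}_{k,i}\|_2^2. \tag{16b}$$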

The MSD and EMSE of the whole network are then computed as

$$\text{MSD:} \quad \eta(i) = \frac{1}{N}\sum_{k=1}^{N} \eta_k(i), \tag{17a}$$

$$\text{EMSE:} \quad \zeta(i) = \frac{1}{N}\sum_{k=1}^{N} \zeta_k(i). \tag{17b}$$

From definitions (16) and (17), the MSD and EMSE of each node depend on the error vector $\tilde{\boldsymbol{\phi}}_{k,i}$, and hence those of the whole network depend on $\tilde{\boldsymbol{\phi}}_i$. Since the mean of $\tilde{\boldsymbol{\phi}}_i$ is governed by the spectral radius $\rho(GB)$, as specified by (8)–(11), the mean-square errors can be expected to be related to $\rho(GB)$ as well. Because this effect is very hard, if not impossible, to quantify analytically, it is examined by numerical simulations in the next section.
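Definitions (16) and (17) translate into a short routine. In this sketch, which assumes real-valued data and uses illustrative function names, the expectations are replaced by single-realization values that would then be averaged over independent runs (200 in Section 4). `steady_state` averages the final samples of a learning curve, and `to_db` converts to the dB values used in the figures and Table 1.

```python
import math


def network_msd_emse(w_o, estimates, regressors):
    """Single-realization network MSD and EMSE, i.e. (16)-(17) without the expectation.

    estimates[k] is phi_{k,i} and regressors[k] is u_{k,i} (real-valued here).
    """
    n = len(estimates)
    msd = emse = 0.0
    for phi, u in zip(estimates, regressors):
        err = [a - b for a, b in zip(w_o, phi)]            # tilde-phi_{k,i}
        msd += sum(e * e for e in err)                      # ||tilde-phi_{k,i}||^2
        emse += sum(a * e for a, e in zip(u, err)) ** 2     # |u_{k,i} tilde-phi_{k,i}|^2
    return msd / n, emse / n


def to_db(x):
    """Convert a power-like quantity to decibels."""
    return 10.0 * math.log10(x)


def steady_state(curve, last=200):
    """Mean of the final `last` samples of a learning curve."""
    tail = curve[-last:]
    return sum(tail) / len(tail)


msd, emse = network_msd_emse([1.0, 0.0], [[1.0, 0.0], [0.0, 0.0]], [[1.0, 1.0], [2.0, 0.0]])
```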

4. Simulations

We have performed a series of simulations to investigate the performance of diffusion LMS over different kinds of complex networks in terms of mean stability, mean-square error, communication cost and robustness.

4.1. Network models and data generation

The network models adopted in our study are briefly explained below:

(1) Small-world network: To describe the transition from a regular lattice to a random graph, Watts and Strogatz proposed the small-world network model now termed the WS small-world network [37]. It is generated from a regular nearest-neighbor network of N nodes arranged in a ring, in which each node is connected to its 2K nearest neighbors. Each link is then rewired with probability p by moving one of its ends to another node while keeping the other end unchanged; no two nodes may be connected by more than one link. The network reduces to the original nearest-neighbor network when p = 0 and closely resembles the ER random graph [14] when p = 1.0. The degree distribution of the small-world network (0 < p < 1) follows a Poisson-like distribution [36,37]: it peaks at the average degree and decays exponentially. Such a network is also called a homogeneous network, as every node has nearly the same number of links.

(2) Scale-free network: By considering the growth of real-world networks and the preferential attachment of newly added nodes, Barabási and Albert proposed the scale-free network model termed the BA scale-free network [4]. It starts with a small network of N(0) nodes and L(0) links. At each step, a new node is added and linked to m existing nodes by preferential attachment, i.e. with probability proportional to the degrees of the existing nodes, until a network of N nodes is obtained. After t time steps, the algorithm yields a network with N = N(0) + t nodes and L = L(0) + mt links. Studies of scale-free networks have shown that the node degrees follow a power-law distribution, so the network has no characteristic scale [1,4]. Consequently, it is referred to as a heterogeneous network, because most nodes have very few links while a few hub nodes have many.
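The two generative models above can be sketched as follows. These are illustrative implementations, not the code used in the paper: the WS sketch draws each new endpoint uniformly among nodes that would not create a self-loop or duplicate link, and the BA sketch assumes the initial network of N(0) nodes and L(0) = N(0) links is a ring (the text does not specify the seed graph) and implements preferential attachment by sampling link endpoints. Function names are hypothetical.

```python
import random


def ws_small_world(n, k, p, seed=0):
    """WS sketch: ring of n nodes, each linked to its 2k nearest neighbors; each
    lattice link (i, i+j) is then rewired with probability p by moving its far end.
    Self-loops and duplicate links are never created, so there are always n*k links.
    Returns a list of neighbor sets.
    """
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(1, k + 1):
            adj[i].add((i + j) % n)
            adj[(i + j) % n].add(i)
    for i in range(n):
        for j in range(1, k + 1):
            t = (i + j) % n
            if t in adj[i] and rng.random() < p:
                choices = [v for v in range(n) if v != i and v not in adj[i]]
                if choices:  # keep the link if node i is already linked to all others
                    new = rng.choice(choices)
                    adj[i].remove(t)
                    adj[t].remove(i)
                    adj[i].add(new)
                    adj[new].add(i)
    return adj


def ba_scale_free(n, m, n0=5, seed=0):
    """BA sketch: start from an assumed ring of n0 nodes (n0 links), then add one node
    per step, linking it to m distinct existing nodes chosen with probability
    proportional to their degree. Requires n0 >= 3 and m <= n0.
    """
    rng = random.Random(seed)
    adj = [{(i - 1) % n0, (i + 1) % n0} for i in range(n0)]
    # Each link contributes both end nodes, so a uniform draw from `ends`
    # selects an existing node with probability proportional to its degree.
    ends = [v for v in range(n0) for _ in adj[v]]
    for new in range(n0, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))
        adj.append(set(targets))
        for t in targets:
            adj[t].add(new)
            ends.extend((t, new))
    return adj


ws = ws_small_world(100, 2, 0.1, seed=1)
ba = ba_scale_free(1000, 2)  # N(0) = L(0) = 5, m = 2, t = 995 steps
```

With N(0) = 5, L(0) = 5 and m = 2, the BA sketch yields 5 + 2 × 995 = 1995 links, matching the count given in Section 4.2.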

In our simulations, the regression data are generated by regressors with a shift structure of size M, i.e. $\mathbf{u}_{k,i} = [u_k(i), u_k(i-1), \ldots, u_k(i-M+1)]$, where $u_k(i)$ is a time series generated as

$$u_k(i) = a_k u_k(i-1) + b_k z_k(i), \quad i \ge 0,$$

where $a_k \in [0, 1)$ is the correlation index, $z_k(i)$ is a spatially independent white Gaussian process with unit variance, and $b_k = \sqrt{\sigma_{u,k}^2 (1 - a_k^2)}$. It is straightforward to show that the resulting regressors have Toeplitz covariance matrices $R_{u,k}$ with correlation sequence $r_k(i) = \sigma_{u,k}^2 (a_k)^{|i|}$, $i = 0, \ldots, M-1$. The regressor power $\sigma_{u,k}^2$ lies in $(0, 1]$. The measurement $\mathbf{d}_k(i)$ is generated according to model (1), with the unknown M-dimensional vector set to $w^o = \mathrm{col}\{1, 1, \ldots, 1\}/\sqrt{M}$.
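The data model above can be sketched as follows, assuming real-valued data and zero initial conditions $u_k(i) = 0$ for $i < 0$ (so the earliest regressors are not yet stationary). `shift_regressors` and `measurement` are hypothetical helper names, and the values $a_k = 0.5$ and $\sigma_{u,k}^2 = 0.8$ are illustrative only, since the text does not list them.

```python
import math
import random


def shift_regressors(a, sigma2_u, m, iters, seed=0):
    """Shift-structured regressors u_{k,i} = [u_k(i), ..., u_k(i-m+1)] driven by the
    AR(1) process u_k(i) = a u_k(i-1) + b z_k(i) with b = sqrt(sigma2_u (1 - a^2)).
    """
    rng = random.Random(seed)
    b = math.sqrt(sigma2_u * (1.0 - a * a))
    window = [0.0] * m  # most recent sample first; zeros before time 0 (assumption)
    prev = 0.0
    out = []
    for _ in range(iters):
        prev = a * prev + b * rng.gauss(0.0, 1.0)
        window = [prev] + window[:-1]  # shift in the new sample
        out.append(window)
    return out


def measurement(u, w_o, sigma_v, rng):
    """Scalar measurement d_k(i) = u_{k,i} w^o + v_k(i), model (1)."""
    return sum(x * w for x, w in zip(u, w_o)) + rng.gauss(0.0, sigma_v)


M = 3
w_o = [1.0 / math.sqrt(M)] * M  # w^o = col{1, ..., 1} / sqrt(M), unit norm
regs = shift_regressors(a=0.5, sigma2_u=0.8, m=M, iters=20000)
```

The empirical variance and lag-one correlation of $u_k(i)$ should approach $\sigma_{u,k}^2$ and $a_k \sigma_{u,k}^2$, consistent with the correlation sequence $r_k(i)$ given above.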

4.2. Distributed estimation over different network models

We now apply ATC diffusion LMS to estimate the unknown vector $w^o$ from the data $\{\mathbf{d}_k(i), \mathbf{u}_{k,i}\}$ across all N nodes of different kinds of networks. The small-world networks are generated by the WS algorithm with N = 1000, K = 2 and p = 0.1, which yields a total of L = NK = 2000 links. The initial regular network (p = 0) and the random network (p = 1.0) are also used for comparison. For the BA scale-free network, we set N(0) = 5, L(0) = 5 and m = 2; after t = 995 steps, we have 1000 nodes and 1995 links. Since the number of links affects the network spectrum (e.g. removing an edge may reduce the spectral radius of the Laplacian), which in turn affects the asymptotic convergence rate determined by GB, the same values of N and L must be used for every model for a fair comparison. Therefore, five more links are randomly added to the BA network so that its total number of links also becomes 2000. For such a large network, we assume that this minor change does not affect its scale-free property.
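The text does not say how the five extra links are chosen. A minimal sketch, assuming node pairs are drawn uniformly at random and self-loops and duplicate links are rejected, could look like this; `add_random_links` and the small ring example are hypothetical.

```python
import random


def add_random_links(adj, target_links, seed=0):
    """Add uniformly random new links (no self-loops or duplicates) to the undirected
    graph `adj` (a list of neighbor sets, modified in place) until it has target_links links.
    """
    rng = random.Random(seed)
    n = len(adj)
    links = sum(len(s) for s in adj) // 2
    if target_links > n * (n - 1) // 2:
        raise ValueError("target exceeds the number of possible links")
    while links < target_links:
        i, j = rng.randrange(n), rng.randrange(n)
        if i != j and j not in adj[i]:
            adj[i].add(j)
            adj[j].add(i)
            links += 1
    return adj


# Hypothetical usage: a 5-node ring has 5 links; pad it to 7.
ring = [{(i - 1) % 5, (i + 1) % 5} for i in range(5)]
padded = add_random_links(ring, 7)
```

For the BA network of Section 4.2, the call would pad 1995 links to 2000.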

We first investigate mean stability using the spectral radius criterion (11) for the matrix GB. In the simulation, i.i.d. Gaussian noise with variance $\sigma_{v,k}^2 = 5 \times 10^{-3}$ is considered and a small step-size $\mu_k = 0.03$ is adopted. The regressor length is set to M = 3, so GB has NM = 3000 eigenvalues in total, denoted $\lambda(GB)$. Fig. 3 depicts these eigenvalues for the different networks, where the eigenvalues of B, denoted $\lambda(B)$, correspond to the non-cooperative case. The results are averaged over 200 independent experiments. As illustrated, the spectral radii satisfy $\rho_{Ran} < \rho_{WS} < \rho_{BA} < \rho_{RG} < \rho_{NC} < 1$, so convergence of the estimate in the mean is assured. Moreover, cooperation reduces the eigenvalue magnitudes compared with the non-cooperative scheme, confirming the analysis in Section 3.1.

Fig. 4 plots the transient MSD ($\eta$) and EMSE ($\zeta$) of the entire network in dB for the different network models. From Fig. 4, the regular network has the worst MSD performance. However, by rewiring a certain number of links, the estimation accuracy is significantly improved in the WS small-world network (p = 0.1). The results are even better for p = 1.0, corresponding to the random network. The performance of the BA scale-free network lies between those of the regular and small-world networks. The results in Figs. 3 and 4a are also consistent: as $\rho(GB)$ decreases, the convergence rate increases and the estimation accuracy improves. The impact of network topology on the EMSE, however, is slightly different from that on the MSD. From Fig. 4b, the EMSE of the small-world and random networks is approximately the same, and the EMSE of the regular and scale-free networks is also quite close, probably due to the combination effect of the regressors $\mathbf{u}_{k,i}$. According to complex network theory, the small-world, random and scale-free networks all have shorter average path lengths than the regular network [36]. This suggests that the average path length is one of the key factors affecting the performance of in-network distributed estimation: in general, the shorter the average path length, the better the performance of distributed estimation.

Table 1 lists the means and variances of the steady-state network MSD ($\bar{\eta}$, $\sigma^2_{\bar{\eta}}$) and EMSE ($\bar{\zeta}$, $\sigma^2_{\bar{\zeta}}$) over the above 200 simulations. The results are obtained by averaging the last 200 samples after 1500 iterations. Comparing the statistics of the different networks, the variances of the homogeneous networks, i.e. the regular, WS small-world and random networks, are of the same level, whereas that of the BA scale-free network is much larger. This is because most nodes in a homogeneous network have about the same number of neighbors, so their cooperation coefficients $c_{l,k}$ are similar, leading to similar performance at every node and thus a smaller variance. In contrast, most nodes in the BA scale-free network connect to only a few neighbors, so their accuracy improvement through cooperation is rather weak. However, for the few nodes with large numbers of neighbors, the noise can be greatly reduced and hence a better estimate can be obtained. Therefore, the heterogeneity in the degree distribution of scale-free networks causes a large variance in the mean-square errors.

[Fig. 3 plot: $|\lambda(GB)|$ and $|\lambda(B)|$ versus network eigenmode index (0 to 3000); legend values $\rho_{NC} = 0.9999$, $\rho_{RG} = 0.9537$, $\rho_{WS} = 0.9409$, $\rho_{Ran} = 0.9345$, $\rho_{BA} = 0.9521$.]

Fig. 3. The eigenvalues of different network models with N = 1000, L = 2000, M = 3, $\mu_k = 0.03$ and $\sigma_{v,k}^2 = 5 \times 10^{-3}$. $\rho_{NC}$ denotes the spectral radius (maximum eigenvalue magnitude) of the non-cooperative matrix B, while $\rho_{RG}$, $\rho_{WS}$, $\rho_{Ran}$ and $\rho_{BA}$ denote those of the diffusion cooperative matrices GB for the regular, WS small-world, random and BA scale-free networks, respectively.

[Fig. 4 plot: (a) MSD (dB) and (b) EMSE (dB) versus iterations (0 to 1500) for the regular network (p = 0), WS small-world network (p = 0.1), random network (p = 1.0) and BA scale-free network.]

Fig. 4. Learning behaviors of ATC diffusion LMS over different networks with N = 1000, L = 2000, M = 3, $\mu_k = 0.03$ and $\sigma_{v,k}^2 = 5 \times 10^{-3}$. (a) MSD. (b) EMSE.

Table 1. The steady-state MSD and EMSE of ATC diffusion LMS over different network models.

                  Regular     WS small-world   Random      BA scale-free
  η̄ (dB)          -48.9712    -52.4679         -54.3278    -50.0971
  σ²(η̄)            1.2105      1.3864           1.6589      6.1980
  ζ̄ (dB)          -50.5015    -55.5896         -56.7148    -50.1353
  σ²(ζ̄)           11.1158     12.8974          15.9472     21.0175

To further examine the applicability of our results when assumptions (A.1) and (A.2) are violated, we have also carried out simulations with temporally and spatially dependent noise. Similar to [32], we consider a sinusoidal perturbation with a frequency of 50 Hz. In ATC diffusion LMS (2), each node communicates directly with its neighbors, so it is natural to let the noise at node k be correlated with the data in its neighborhood, i.e. with $\mathbf{u}_{l,i}$ for $l \in \mathcal{N}_k$. Therefore, the noise is set as

$$\mathbf{v}_k(i) = 0.1 \sin\Big(100\pi \sum_{l \in \mathcal{N}_k} c_{l,k}\, \mathbf{u}_{l,i}\Big).$$

The resulting MSD is plotted in Fig. 5, with all other parameter settings the same as in Fig. 4. For a fair comparison, the results for i.i.d. Gaussian noise of a similar level are also presented. Comparing Fig. 5a and b, similar performance is observed when a sufficiently small step-size, $\mu_k = 0.03$, is used.
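The dependent-noise model above can be sketched as follows. Because the text does not specify how the row vector $\mathbf{u}_{l,i}$ enters the sine, this sketch assumes the scalar sample $u_l(i)$ (the first entry of $\mathbf{u}_{l,i}$) is used; the function name and the two-node example are hypothetical.

```python
import math


def dependent_noise(k, neighbors, weights, samples):
    """v_k(i) = 0.1 sin(100 pi sum_{l in N_k} c_{l,k} u_l(i)).

    neighbors[k] is N_k (including k), weights[l][k] is c_{l,k}, and samples[l]
    is the scalar u_l(i), an assumed stand-in for the vector u_{l,i}.
    """
    s = sum(weights[l][k] * samples[l] for l in neighbors[k])
    return 0.1 * math.sin(100.0 * math.pi * s)  # 50 Hz sinusoid, amplitude 0.1


# Hypothetical two-node network with equal weights.
neighbors = [[0, 1], [0, 1]]
weights = [[0.5, 0.5], [0.5, 0.5]]
v0 = dependent_noise(0, neighbors, weights, [0.0, 0.01])
```

Because the argument of the sine is built from neighborhood data, the resulting noise is correlated both in time and across neighboring nodes, which is exactly what (A.1) and (A.2) exclude.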

Furthermore, we investigate the impact of the step-size $\mu_k$ on the estimation performance for different networks. Since similar results are obtained for all network models, only those for the small-world network are presented here for illustration. As shown in Fig. 6, the MSD under the independent and dependent conditions is quite similar. This further supports the applicability of our theoretical results, as discussed in Remark 1. It is also noticed that a larger $\mu_k$ causes a bigger difference, so a small step-size is necessary, in agreement with the conclusions in [9,16,22,30].

[Fig. 5 plot: MSD (dB) versus iterations (0 to 1500); panels (a) and (b).]

Fig. 5. Learning behaviors of ATC diffusion LMS under different conditions. (a) Temporal and spatial independence. (b) Temporal and spatial dependence.

[Fig. 6 plot: MSD (dB) versus step-size $\mu_k$ ($10^{-3}$ to $10^{-1}$, logarithmic axis) for the independent and dependent noise cases.]

Fig. 6. MSD performance versus the step-size $\mu_k$ for a small-world network.


In all of the following simulations, the step-size is set to $\mu_k = 0.03$, and only the temporally and spatially independent case is considered. Similar conclusions are expected to extend to other cases, such as temporally and spatially dependent data.

4.3. Effect of rewiring/adding probability of small-world networks

The results in Section 4.2 show that the estimation performance depends on the rewiring probability (compare the results for the regular, small-world, and random networks). To investigate systematically the effect of link rewiring/adding on distributed estimation, we have carried out a number of link-rewiring/adding experiments.

4.3.1. Link-rewiring experiment

Our simulations start with a regular network of N = 1000 nodes, where each node is coupled to its 2K = 4 nearest neighbors. The rewiring probability p is then varied from 0 (regular) to 1 (random). Fig. 7 depicts the normalized clustering coefficient ($\varepsilon_p/\varepsilon_0$) and average path length ($\xi_p/\xi_0$), where $\varepsilon_p$ and $\xi_p$ denote the values for the network with rewiring probability p, while $\varepsilon_0$ and $\xi_0$ denote those of the initial regular network. From Fig. 7, the initial structured regular network (p = 0) has the highest clustering coefficient and the longest average path length. In contrast, the random network (p = 1) has the lowest clustering coefficient and the shortest average path length. The small-world networks lie between these two extremes.
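The two topological measures in Fig. 7 can be computed as follows. This sketch uses the standard definitions (local clustering averaged over all nodes, with nodes of degree less than two assumed to contribute zero, and breadth-first-search shortest paths averaged over all node pairs of a connected graph). The function names are illustrative, and the N = 20 ring with K = 2 is a hypothetical check case whose clustering coefficient is known to be 3(K−1)/(2(2K−1)) = 0.5.

```python
from collections import deque


def clustering_coefficient(adj):
    """Average local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for nbrs in adj:
        d = len(nbrs)
        if d < 2:
            continue
        # Count links among the neighbors, each unordered pair once.
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += 2.0 * links / (d * (d - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all ordered node pairs (graph assumed connected)."""
    n = len(adj)
    total = 0
    for src in range(n):
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search from src
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        total += sum(dist.values())
    return total / (n * (n - 1))


# Hypothetical regular ring: N = 20 nodes, each linked to its 2K = 4 nearest neighbors.
ring = [{(i + j) % 20 for j in (-2, -1, 1, 2)} for i in range(20)]
```

Normalizing the values for a rewired graph by those of the initial ring gives the ratios $\varepsilon_p/\varepsilon_0$ and $\xi_p/\xi_0$ plotted in Fig. 7.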

Similarly, the ATC diffusion LMS is applied to estimate the vector of interest $w^o$ over the different network models. The corresponding steady-state MSD and EMSE are computed by averaging the last 200 samples after 1500 iterations over 200 experiments, and the results are depicted in Fig. 8. Interestingly, the MSD follows a trend similar to that of the normalized clustering coefficient ($\varepsilon_p/\varepsilon_0$) in Fig. 7. Initially, as the rewiring probability p increases, the MSD decreases only slightly (p < 0.01); when p increases from 0.01 to 0.2, the MSD drops sharply; beyond that, further increasing p does not affect the estimation performance significantly. From Fig. 8, the MSD reaches its minimum at about p = 0.4. A similar trend is observed for the EMSE. However, instead of decreasing monotonically as the MSD does, the EMSE reaches its minimum at around p = 0.4 and then rises slightly. These results are consistent with those of distributed detection reported for small-world sensor networks [2,3]. In conclusion, the small-world topology does contribute to enhancing the performance of distributed signal processing.

Fig. 7. Normalized average clustering coefficient ($\varepsilon_p/\varepsilon_0$) and path length ($\xi_p/\xi_0$) against different link-rewiring probabilities p. $\varepsilon_p$ and $\xi_p$ denote the results with a rewiring probability p, while $\varepsilon_0$ and $\xi_0$ denote the results of the initial regular network. [Plot: normalized value (0 to 1) versus link-rewiring probability p on a log axis from $10^{-4}$ to 1.]

Fig. 8. The mean-square errors against different link-rewiring probabilities p with N = 1000, L = 2000, M = 3, $\mu_k = 0.03$ and $\sigma^2_{v,k} = 10^{-3}$. [Plot: network MSD and network EMSE (dB, about −61 to −56) versus link-rewiring probability p on a log axis from $10^{-4}$ to 1.]
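A minimal pure-Python sketch of the adapt-then-combine recursion behind these experiments is given below. It assumes a linear data model $d_k(i) = u_{k,i} w^o + v_k(i)$ with white unit-variance Gaussian regressors, no exchange of raw measurements in the adaptation step, and uniform combination weights $1/|\mathcal{N}_k|$ over each closed neighborhood. The combination rule and regressor statistics used in the paper are not restated in this section, so these choices (and the function name) are illustrative assumptions. The returned value mirrors how the steady-state network MSD is reported: squared deviation averaged over nodes and over the last `tail` iterations, in dB.

```python
import math
import random


def atc_diffusion_lms(adj, w_o, mu=0.03, sigma2_v=1e-3, iters=1500, tail=200, seed=0):
    """Steady-state network MSD (dB) of ATC diffusion LMS on the graph `adj`.

    adj : dict node -> set of neighbour nodes (undirected graph)
    w_o : true parameter vector (list of floats), length M
    """
    rng = random.Random(seed)
    m = len(w_o)
    hood = {k: sorted(adj[k] | {k}) for k in adj}  # closed neighbourhood N_k
    w = {k: [0.0] * m for k in adj}
    msd_tail = []
    for i in range(iters):
        # Adapt: one local LMS step per node on its own data only (assumption: no data exchange).
        psi = {}
        for k in adj:
            u = [rng.gauss(0.0, 1.0) for _ in range(m)]
            d = sum(a * b for a, b in zip(u, w_o)) + rng.gauss(0.0, math.sqrt(sigma2_v))
            e = d - sum(a * b for a, b in zip(u, w[k]))
            psi[k] = [wj + mu * e * uj for wj, uj in zip(w[k], u)]
        # Combine: uniform average of intermediate estimates over N_k (assumed weights).
        for k in adj:
            nk = hood[k]
            w[k] = [sum(psi[l][j] for l in nk) / len(nk) for j in range(m)]
        # Record the network MSD over the last `tail` iterations.
        if i >= iters - tail:
            dev = [sum((a - b) ** 2 for a, b in zip(w_o, w[k])) for k in adj]
            msd_tail.append(sum(dev) / len(dev))
    return 10.0 * math.log10(sum(msd_tail) / len(msd_tail))


if __name__ == "__main__":
    n = 50
    w_o = [1.0, -0.5, 0.25]  # hypothetical M = 3 target vector
    ring = {k: {(k + j) % n for j in (-2, -1, 1, 2)} for k in range(n)}  # 2K = 4
    isolated = {k: set() for k in range(n)}  # no cooperation
    print("ring lattice MSD (dB):", round(atc_diffusion_lms(ring, w_o), 1))
    print("isolated nodes MSD (dB):", round(atc_diffusion_lms(isolated, w_o), 1))
```

Passing graphs generated with different rewiring probabilities in place of `ring` gives a small-scale analogue of the sweep in Fig. 8; averaging over several seeds plays the role of the independent experiments.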

Remark 3. Although Fig. 8 suggests that the random network (p = 1) is superior to the small-world network in terms of estimation accuracy, it may not always be preferable in practice. In practical network design, cost must be taken into account, and in some situations a certain level of accuracy is sacrificed in return for higher efficiency and lower cost. For example, in the formation of sensor networks, spatial distance must be considered, and a shorter physical link is always preferable because of its lower communication cost and power consumption. As shown in Fig. 7, the clustering coefficient of the random network is quite small. That is, the neighbors of a node are generally not connected to one another but to nodes far away, which implies that long physical links are needed, causing a high cost in setting up the network. Moreover, the accuracy of estimates based on data from a distant node may also be degraded by channel fading. On the other hand, although the accuracy of the small-world network is slightly inferior to that of the random network (Fig. 8), this difference is small compared with the difference in their clustering coefficients (Fig. 7). For example, the mean-square errors of the small-world network (p = 0.2) and the random network (p = 1) differ by less than 1 dB (see Fig. 8), while the clustering coefficient of the former is about 90 times higher than that of the latter (see Fig. 7). Owing to this strong locality, the high cost of constructing long physical links can be avoided. This suggests that not only a short path length but also a reasonable clustering coefficient should be taken into account in the design of real networks. In this sense, the small-world topology is arguably preferable in practice [12,17].
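The two figures quoted in Remark 3 can be put on a common linear scale. Writing $\mathrm{MSD}_{\mathrm{sw}}$ and $\mathrm{MSD}_{\mathrm{rand}}$ for the linear-scale errors of the small-world (p = 0.2) and random (p = 1) networks, and taking the approximate values read from Figs. 7 and 8 as given, a gap of less than 1 dB bounds the accuracy loss as follows:

```latex
10\log_{10}\frac{\mathrm{MSD}_{\mathrm{sw}}}{\mathrm{MSD}_{\mathrm{rand}}} < 1\ \mathrm{dB}
\;\Longrightarrow\;
\frac{\mathrm{MSD}_{\mathrm{sw}}}{\mathrm{MSD}_{\mathrm{rand}}} < 10^{0.1} \approx 1.26,
\qquad\text{whereas}\qquad
\frac{\varepsilon_{0.2}}{\varepsilon_{1}} \approx 90 .
```

In other words, the small-world network gives up less than about 26% in linear-scale MSD in exchange for roughly ninety times more clustering.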

4.3.2. Link-adding experiment

Fig. 8 has demonstrated the significance of the small-world topology generated by link-rewiring in enhancing performance. Since the small-world effects generated by link-rewiring and link-adding are quite similar when a small operational probability is adopted for a large network [25], it is worth checking whether the same performance enhancement can be observed when a small number of shortcuts are added to the network.

To carry out the study, we perform link-adding experiments based on the NW small-world network proposed by Newman and Watts [25], with the other parameters the same as those used for Fig. 8. Unlike the WS network, the NW network does not rewire any connections between nodes. Instead, a link between each pair of nodes is added with probability p. The NW small-world network corresponds to the original nearest-neighbor coupled network when p = 0 and becomes a fully connected network when p = 1.
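The link-adding construction just described can be sketched in a few lines of pure Python. The sketch follows the description in the text literally (start from the ring lattice with 2K nearest neighbors and add each absent link independently with probability p), so p = 0 returns the lattice and p = 1 the complete graph; the function names and the seeding are illustrative assumptions.

```python
import random


def newman_watts(n, k, p, seed=0):
    """Ring lattice (k neighbours per side) plus each absent link added with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(1, k + 1):
            adj[i].add((i + j) % n)
            adj[(i + j) % n].add(i)
    # Link-adding step (pairwise, as described in the text): no lattice link is removed.
    for a in range(n):
        for b in range(a + 1, n):
            if b not in adj[a] and rng.random() < p:
                adj[a].add(b)
                adj[b].add(a)
    return adj


def num_links(adj):
    """Number of undirected links."""
    return sum(len(nb) for nb in adj.values()) // 2


if __name__ == "__main__":
    n, k = 200, 2
    for p in (0.0, 0.001, 0.01, 0.2, 1.0):
        print(p, num_links(newman_watts(n, k, p, seed=1)))
```

Because links are only ever added, every NW graph contains the original lattice, which is why the clustering of the lattice is largely retained for small p.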

Fig. 9 shows the steady-state MSD and EMSE of the whole network against the link-adding probability on a semi-log plot. Roughly speaking, the trend is similar to that of the rewiring experiment, except that the reduction in mean-square errors is much larger, probably owing to the significant reduction in the average path length caused by the added links. As demonstrated in Fig. 9, both the MSD and the EMSE decrease by more than 14 dB relative to the initial regular network at a link-adding probability of p = 0.2. Beyond that, further increases in p do not improve the accuracy drastically.

Remark 4. Fig. 9 also indicates a tradeoff between cost and accuracy. For instance, a fully connected network is obviously the optimal solution in terms of accuracy, but at a high cost in message transmissions and computations. Conversely, if the nodes are all isolated and exchange no information with any other node in the network, no data transmission is needed but the estimation is poor [9]. The results in Fig. 9 suggest that accuracy and complexity can be balanced to a certain extent: by adding about 20% of the total number of possible links (N(N − 1)/2), the estimation performance becomes comparable to that of a fully connected network, while remaining considerably more cost-effective.
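As a rough check of the link budget in Remark 4, take the Fig. 9 setting N = 1000, K = 2 and assume, following the NW description above, that each pair not already linked in the lattice receives a link independently with probability p:

```latex
\begin{aligned}
\frac{N(N-1)}{2} &= \frac{1000 \times 999}{2} = 499\,500 \quad \text{(fully connected)},\\
NK &= 1000 \times 2 = 2000 \quad \text{(initial lattice)},\\
\mathbb{E}[\text{added links}] &= p\left(\frac{N(N-1)}{2} - NK\right) = 0.2 \times 497\,500 = 99\,500 .
\end{aligned}
```

So at p = 0.2 the network carries roughly $1.0 \times 10^5$ links: about one fifth of the 499,500 links of the complete graph, yet still about 50 times the 2000 links of the lattice.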

Fig. 9. The mean-square errors against different link-adding probabilities p with N = 1000, K = 2, M = 3, $\mu_k = 0.03$ and $\sigma^2_{v,k} = 10^{-3}$. [Plot: network MSD and network EMSE (dB, about −80 to −55) versus link-adding probability p on a log axis from $10^{-4}$ to 1.]


4.4. Robustness analysis

Since node and link failures are common in practice, the network in use should be robust to these effects [11]. Thus, in the following, we analyze the robustness of the different networks.

4.4.1. Node failure

In a practical sensor network, a sensor may run out of power, be damaged or be attacked, making its measurements unreliable. If this happens, the sensor observes only noise, which degrades the estimation performance. We refer to this phenomenon as node failure. Once a node fails, it is assumed to remain silent and exchange no information with the others. This is simulated by removing the node together with its links from the network. Two kinds of node failure (node removal) are studied, namely random removal and intentional removal. In random removal, randomly selected nodes are deleted from the network; in intentional removal, the highest-degree nodes are removed instead.
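The two removal strategies can be sketched as follows. The sketch assumes that "highest-degree" refers to degrees in the original, intact network (nodes are ranked once, not re-ranked after each deletion) and that ties are broken by node label; both are illustrative assumptions, as is the function name.

```python
import random


def remove_nodes(adj, pr, intentional=False, seed=0):
    """Return a copy of `adj` with round(pr * N) nodes and all their links deleted.

    intentional=False: the removed nodes are drawn uniformly at random.
    intentional=True:  the highest-degree nodes of the original graph are removed
                       (assumption: ranked once, ties broken by node label).
    """
    n_remove = int(round(pr * len(adj)))
    if intentional:
        ranked = sorted(adj, key=lambda v: (-len(adj[v]), v))
        gone = set(ranked[:n_remove])
    else:
        gone = set(random.Random(seed).sample(sorted(adj), n_remove))
    # A failed node stays silent: drop it and every link that touches it.
    return {v: {w for w in nb if w not in gone} for v, nb in adj.items() if v not in gone}


if __name__ == "__main__":
    star = {0: set(range(1, 10)), **{i: {0} for i in range(1, 10)}}
    print(remove_nodes(star, 0.1, intentional=True))  # the hub goes, leaving isolated leaves
    print(sorted(remove_nodes(star, 0.3, seed=2)))    # 7 surviving nodes
```

The star example is the extreme case of the hub structure discussed below: one intentional removal disconnects every node, whereas a random removal most likely hits a leaf.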

The steady-state network MSD is used as the measure for evaluating the robustness of the different networks. Fig. 10 plots the ratio $\eta(p_r)/\eta(0)$ against the progress of node removal, where $\eta(p_r)$ denotes the MSD of the network after removing nodes with probability $p_r$ and $\eta(0)$ denotes that of the original network. Obviously, when $p_r = 1$, all the nodes are removed and $\eta(p_r)/\eta(0) \to 0$; this case is omitted here, and only results for $p_r$ ranging from 0 to 0.9 are presented for comparison. From Fig. 10a, we find that the homogeneous networks, i.e. the random and the small-world networks, show similar tendencies under random removal, while the robustness of the small-world network is slightly better. However, a large difference is noticed when intentional removal is applied, especially when $p_r$ ranges from 0.2 to 0.7 (see Fig. 10b). This reveals that the robustness of distributed estimation can be enhanced by introducing a certain level of small-world structure into the network, which is similar to the previous results given in [2,3] for distributed detection.

Fig. 10. Robustness of different networks against node removal with N = 1000, L = 2000, M = 3, $\mu_k = 0.03$ and $\sigma^2_{v,k} = 0.01$. (a) Random removal. (b) Intentional removal. $\eta(p_r)$ denotes the results of the networks after removing nodes with probability $p_r$ and $\eta(0)$ denotes that of the original network. [Plots: $\eta(p_r)/\eta(0)$ (0.5 to 1) versus (a) random and (b) intentional node removal probability $p_r$ (0 to 0.9), for the WS small-world network (p = 0.1), the random network (p = 1.0) and the BA scale-free network.]

Fig. 11. Robustness of different networks against link failure with N = 1000, L = 2000, M = 3, $\mu_k = 0.03$ and $\sigma^2_{v,k} = 0.03$. $\eta(p_r)$ denotes the results of the networks after removing links with probability $p_r$ and $\eta(0)$ denotes that of the original network. [Plot: $\eta(p_r)/\eta(0)$ (0.65 to 1) versus link removal probability $p_r$ (0 to 1).]

The BA scale-free network shows the so-called “robust yet fragile” property. On the one hand, it is highly robust against random removal: as shown in Fig. 10a, its ratios $\eta(p_r)/\eta(0)$ are higher than those of the other networks, especially when more nodes are removed. On the other hand, the performance of the BA scale-free network is more sensitive to intentional removal. As shown in Fig. 10b, removing only a small proportion of high-degree nodes severely damages the estimation performance. After about 40% of the nodes are removed, the performance becomes roughly that of the random network, and further node removal does not change the estimation performance much until all the nodes are removed. This property is due to the heterogeneous degree distribution of the network. The few high-degree nodes serve as “fusion centers” of a parallel or centralized architecture. Since randomly removing a fraction of the nodes is unlikely to remove such a fusion center, no significant degradation in estimation performance is observed. In contrast, removing even a very small fraction of the high-degree nodes greatly reduces the number of links and seriously impairs the information exchange in the network. In the worst case, it may cause the entire network to break down.
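The fusion-center argument can be illustrated by counting how many links disappear when 5% of the nodes of a BA network are removed by degree versus at random. The sketch below uses a standard preferential-attachment construction seeded with a complete graph on m + 1 nodes; the seed graph, m = 2 and the 5% fraction are assumptions chosen for illustration, and the sketch measures lost links only, not estimation error.

```python
import random


def barabasi_albert(n, m, seed=0):
    """Each new node links to m distinct nodes chosen with probability proportional to degree."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for a in range(m + 1):  # seed graph: complete graph on m + 1 nodes (assumption)
        for b in range(a + 1, m + 1):
            adj[a].add(b)
            adj[b].add(a)
    ends = [v for v in range(m + 1) for _ in adj[v]]  # node v appears deg(v) times
    for v in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))
        for t in targets:
            adj[v].add(t)
            adj[t].add(v)
            ends.extend((v, t))
    return adj


def fraction_links_lost(adj, removed):
    """Share of links with at least one endpoint in `removed`."""
    total = sum(len(nb) for nb in adj.values()) // 2
    lost = sum(1 for a in adj for b in adj[a] if a < b and (a in removed or b in removed))
    return lost / total


if __name__ == "__main__":
    g = barabasi_albert(1000, 2, seed=1)
    k = 50  # remove 5% of the nodes
    hubs = set(sorted(g, key=lambda v: -len(g[v]))[:k])
    rand = set(random.Random(2).sample(sorted(g), k))
    print("links lost, top-degree removal:", round(fraction_links_lost(g, hubs), 3))
    print("links lost, random removal:   ", round(fraction_links_lost(g, rand), 3))
```

Random removal of 5% of the nodes costs roughly 10% of the links, whereas removing the same number of hubs costs several times more, which is the structural root of the “robust yet fragile” behavior.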

4.4.2. Link failure

Owing to adverse environmental factors, some links may become unreliable and hence affect the availability of information. We refer to this phenomenon as link failure. Once it occurs, the link is assumed to be broken and is removed from the network. Fig. 11 plots the ratios $\eta(p_r)/\eta(0)$ obtained by randomly removing links from the different networks with probability $p_r$. It can be observed that the small-world, the random and the scale-free networks generally share a similar trend. All the ratios $\eta(p_r)/\eta(0)$ initially decrease drastically as the removal probability increases ($p_r < 0.4$). After that, the degradation in accuracy becomes less pronounced, and the result approaches that without cooperation ($p_r = 1$). As in the node failure experiment, the link failure study also confirms the value of small-world structure for robustness. As shown in Fig. 11, the random network is more fragile; after a certain level of locality is introduced, the small-world network becomes more robust. The strong robustness of the scale-free network can again be explained by its degree distribution. The large-degree nodes provide good estimates, and randomly removing one of their links only slightly affects their performance. The nodes with small degrees have relatively poor estimates to begin with and contribute little to the noise reduction of the whole network, so removing their links does not significantly degrade the performance of the whole network. As a result, the BA scale-free network turns out to be more robust.
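Random link failure and its effect on the combination step can be sketched as below. The sketch assumes that each link fails independently with probability $p_r$ and that every node then recomputes uniform combination weights over its surviving closed neighborhood; with $p_r = 1$ each node keeps weight 1 on its own estimate, which is the non-cooperative case mentioned above. The uniform rule and the function names are illustrative assumptions.

```python
import random


def fail_links(adj, pr, seed=0):
    """Copy of `adj` in which each undirected link fails independently with probability pr."""
    rng = random.Random(seed)
    out = {v: set() for v in adj}
    for a in sorted(adj):
        for b in sorted(adj[a]):
            if a < b and rng.random() >= pr:  # link survives
                out[a].add(b)
                out[b].add(a)
    return out


def uniform_weights(adj):
    """Assumed rule: a_lk = 1/|N_k| over N_k = {k} plus the surviving neighbours."""
    return {k: {l: 1.0 / (len(nb) + 1) for l in nb | {k}} for k, nb in adj.items()}


if __name__ == "__main__":
    n = 12
    ring = {k: {(k + j) % n for j in (-2, -1, 1, 2)} for k in range(n)}
    for pr in (0.0, 0.4, 1.0):
        g = fail_links(ring, pr, seed=1)
        print(pr, sum(len(nb) for nb in g.values()) // 2, uniform_weights(g)[0])
```

Renormalizing the weights keeps every node's combination a convex average, so a failed link shrinks the neighborhood rather than biasing the estimate.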

Remark 5. Although only the results for the ATC diffusion LMS algorithm have been presented, the same properties can be observed for other diffusion strategies, such as combine-then-adapt (CTA) diffusion LMS [9,22] and diffusion RLS [7]. Since the results are similar, they are not shown here.

5. Conclusion

In this paper, we have investigated the performance of the ATC diffusion LMS algorithm over different network topologies from the viewpoints of mean stability, mean-square performance, communication cost and robustness. The results show that the estimation performance relies largely on the statistical properties of the networks, including the average path length, the clustering coefficient and the degree distribution, which are determined by the network topology. It is found that: (i) owing to their short average path lengths, the small-world, the random and the scale-free networks outperform the structured regular network in estimation accuracy; similarly, by rewiring or adding a small proportion of links in regular networks, the average path length is reduced drastically and the estimation performance can be improved significantly; (ii) by keeping a certain locality in small-world networks, the number of long physical links can be reduced, thereby saving resources and also improving network robustness; (iii) the scale-free property of the BA scale-free network makes it “robust yet fragile”, i.e. robust to random node and link failures but fragile to the intentional removal of nodes. It is also worth pointing out that similar conclusions can be drawn for data with or without the temporal and spatial independence assumptions.


The study also offers guidance for designing networks for distributed estimation. As illustrated by the simulation results, each network has its own advantages, and the choice of network structure should be determined by which aspect is of primary concern. For instance, if the high cost of constructing long physical links is unacceptable, the small-world network may be a good candidate. However, if the estimation performance is critical, the random network should be chosen instead. Alternatively, if permissible, a small number of links can be added to the network to further improve the estimation accuracy. These connections need not be entirely random; instead, they can be confined to some neighborhood of the nodes so as to retain a certain locality. On the other hand, in adverse or unreliable working environments, robustness must be the main concern, and hence the BA scale-free network can be the winner. In conclusion, this study provides a good understanding of how network topology affects estimation performance, and this information is essential for making the optimal choice of network design.

Acknowledgments

We gratefully acknowledge the anonymous reviewers for their useful comments and suggestions for improving our paper. This work is supported by the National Natural Science Foundation of China (Grant Nos. 61171153 and 61101045), the Foundation for the Author of National Excellent Doctoral Dissertation of P.R. China, the Scientific Research Fund of Zhejiang Provincial Education Department (Grant No. Y201017301), Open Research Grants of the Information Processing and Automation Technology Prior Discipline of Zhejiang Province, and a Grant from City University of Hong Kong (Project No. 7008105).

References

[1] R. Albert, A.L. Barabási, Statistical mechanics of complex networks, Rev. Mod. Phys. 74 (2002) 47–91.
[2] S.A. Aldosari, J.M.F. Moura, Distributed detection in sensor networks: connectivity graph and small-world networks, in: Proc. 39th Asilomar Conf. Signals, Syst., Comput., 2005, pp. 230–234.
[3] S.A. Aldosari, J.M.F. Moura, Topology of sensor network in distributed detection, in: Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2006, pp. 1061–1064.
[4] A.L. Barabási, R. Albert, Emergence of scaling in random networks, Science 286 (15) (1999) 509–512.
[5] V. Blondel, J.N. Tsitsiklis, NP-hardness of some linear control design problems, SIAM J. Control Optim. 35 (6) (1997) 2118–2127.
[6] R.S. Blum, S.A. Kassam, H.V. Poor, Distributed detection with multiple sensors: Part II – Advanced topics, Proc. IEEE 85 (1) (1997) 64–79.
[7] F.S. Cattivelli, C.G. Lopes, A.H. Sayed, Diffusion recursive least-squares for distributed estimation over adaptive networks, IEEE Trans. Signal Process. 56 (5) (2008) 1865–1877.
[8] F.S. Cattivelli, C.G. Lopes, A.H. Sayed, Diffusion strategies for distributed Kalman filtering: formulation and performance analysis, in: Proc. Workshop on Cognitive Inf. Process., Santorini, Greece, 2008, pp. 36–41.
[9] F.S. Cattivelli, A.H. Sayed, Diffusion LMS strategies for distributed estimation, IEEE Trans. Signal Process. 58 (3) (2010) 1035–1048.
[10] J.F. Chamberland, V.V. Veeravalli, Decentralized detection in sensor networks, IEEE Trans. Signal Process. 51 (2) (2003) 407–416.
[11] H.B. Chen, C.K. Tse, J.C. Feng, Impact of topology on performance and energy efficiency in wireless sensor networks for source extraction, IEEE Trans. Parallel Distrib. Syst. 20 (6) (2009) 886–897.
[12] R. Chitradurga, A. Helmy, Analysis of wired short cuts in wireless sensor networks, in: IEEE/ACS International Conference on Pervasive Services, 2004.
[13] A.G. Dimakis, S. Kar, J.M.F. Moura, M.G. Rabbat, A. Scaglione, Gossip algorithms for distributed signal processing, Proc. IEEE 98 (11) (2010) 1847–1864.
[14] P. Erdös, A. Rényi, On the evolution of random graphs, Publ. Math. Inst. Hung. Acad. Sci. 5 (1959) 17–60.
[15] H. Goldingay, J. Mourika, The effect of load on agent-based algorithms for distributed task allocation, Inf. Sci., in press. doi:10.1016/j.ins.2011.06.011.
[16] S. Haykin, Adaptive Filter Theory, Prentice-Hall, Englewood Cliffs, NJ, 2001.
[17] A. Helmy, Small worlds in wireless networks, IEEE Commun. Lett. 7 (10) (2003) 490–492.
[18] S. Ilarri, E. Mena, A. Illarramendi, Using cooperative mobile agents to monitor distributed and dynamic environments, Inf. Sci. 178 (9) (2008) 2105–2127.
[19] S. Kar, S.A. Aldosari, J.M.F. Moura, Topology for distributed inference on graphs, IEEE Trans. Signal Process. 56 (6) (2008) 2609–2613.
[20] J. Liu, M. Chu, J.E. Reich, Multitarget tracking in distributed sensor networks, IEEE Signal Process. Mag. 24 (3) (2007) 36–46.
[21] C.G. Lopes, A.H. Sayed, Incremental adaptive strategies over distributed networks, IEEE Trans. Signal Process. 55 (8) (2007) 4064–4077.
[22] C.G. Lopes, A.H. Sayed, Diffusion least-mean squares over adaptive networks: formulation and performance analysis, IEEE Trans. Signal Process. 56 (7) (2008) 3122–3136.
[23] N. Moghim, S.M. Safavi, M.R. Hashemi, Performance evaluation of a new end-point admission control algorithm in NGN with improved network utilization, Int. J. Innov. Comput. Inf. Control 6 (7) (2010) 3067–3080.
[24] T. Nakashima, T. Sueyoshi, A performance simulation for stationary end nodes in ad hoc networks, Int. J. Innov. Comput. Inf. Control 5 (3) (2009) 707–716.
[25] M.E.J. Newman, D.J. Watts, Renormalization group analysis of the small-world network model, Phys. Lett. A 263 (1999) 341–346.
[26] M.L. Overton, R.S. Womersley, On minimizing the spectral radius of a nonsymmetric matrix function: optimality conditions and duality theory, SIAM J. Matrix Anal. Appl. 9 (4) (1988) 473–498.
[27] A. Rastegarnia, M.A. Tinati, A. Khalihi, A diffusion least-mean-square algorithm for distributed estimation over sensor networks, Int. J. Electr. Comput. Syst. Eng. 2 (1) (2008) 15–19.
[28] M. Reuter, J.R. Zeidler, Nonlinear effects in LMS adaptive equalizers, IEEE Trans. Signal Process. 47 (6) (1999) 1570–1579.
[29] A.H. Sayed, C.G. Lopes, Adaptive processing over distributed networks, IEICE Trans. Fundam. Electron., Commun. Comput. Sci. 90 (8) (2007) 1504–1510.
[30] A.H. Sayed, Fundamentals of Adaptive Filtering, Wiley, New Jersey, 2003.
[31] I.D. Schizas, G. Mateos, G.B. Giannakis, Distributed LMS for consensus-based in-network adaptive processing, IEEE Trans. Signal Process. 57 (6) (2009) 2365–2382.
[32] M.S. Stankovic, K.H. Johansson, D.M. Stipanovic, Distributed seeking of Nash equilibria in mobile sensor networks, in: Proc. CDC, 2010, pp. 5598–5603.
[33] W. Sung, C. Chen, Parallel data fusion for an industrial automatic monitoring system using radial basis function networks, Int. J. Innov. Comput. Inf. Control 6 (6) (2010) 2523–2536.
[34] N. Takahashi, I. Yamada, A.H. Sayed, Diffusion least-mean squares with adaptive combiners: formulation and performance analysis, IEEE Trans. Signal Process. 58 (9) (2010) 4795–4810.
[35] P.K. Varshney, Distributed Detection and Data Fusion, Springer-Verlag, New York, 1996.
[36] X. Wang, G. Chen, Complex network: small-world, scale-free and beyond, IEEE Circ. Syst. Mag. 3 (2) (2003) 6–20.
[37] D.J. Watts, S.H. Strogatz, Collective dynamics of ‘small-world’ networks, Nature 393 (4) (1998) 440–442.
[38] B. Widrow, J.M. McCool, M.G. Larimore, C.R. Johnson, Stationary and nonstationary learning characteristics of the LMS adaptive filter, Proc. IEEE 64 (8) (1976) 1151–1162.
[39] P. Willett, P.F. Swaszek, R.S. Blum, The good, bad and ugly: distributed detection of a known signal in dependent Gaussian noise, IEEE Trans. Signal Process. 48 (12) (2000) 3266–3279.