Mark Coates, Alfred O. Hero III, Robert Nowak, and Bin Yu

Today's Internet is a massive, distributed network which continues to explode in size as e-commerce and related activities grow. The heterogeneous and largely unregulated structure of the Internet renders tasks such as dynamic routing, optimized service provision, service-level verification, and detection of anomalous/malicious behavior increasingly challenging. The problem is compounded by the fact that one cannot rely on the cooperation of individual servers and routers to aid in the collection of network traffic measurements vital for these tasks. In many ways, network monitoring and inference problems bear a strong resemblance to other "inverse problems" in which key aspects of a system are not directly observable. Familiar signal processing problems such as tomographic image reconstruction, system identification, and array processing all have interesting interpretations in the networking context. This article introduces the new field of network

MAY 2002 IEEE SIGNAL PROCESSING MAGAZINE 47    1053-5888/02/$17.00©2002IEEE



tomography, a field which we believe will benefit greatly from the wealth of signal processing theory and algorithms.

Introduction

The Internet has evolved from a small tightly controlled network serving only a few users in the late 1970s to the immense multilayered collection of heterogeneous terminals, routers, and other platforms that we encounter today when web-surfing. Unlike the telephone network which evolved in a slower and more controlled manner, the Internet has evolved very rapidly in a largely unregulated and open environment. The lack of centralized control and the heterogeneous nature of the Internet leads to a very important problem: mapping network connectivity, bandwidth, and performance as functions of space and time. A wide variety of Internet maps have been produced using existing networking tools such as ping and traceroute. Information on these tools, along with a collection of interesting Internet mapping projects, can be found on the CAIDA (Cooperative Association for Internet Data Analysis) website [1]. The popular science book Atlas of Cyberspace [2] contains a survey of many Internet mapping projects and their results. The mapping techniques described in the references above, however, usually provide only a partial picture of the Internet because they do not produce quantitative performance information. The decentralized nature of the Internet makes quantitative assessment of network performance very difficult. One cannot depend on individual servers and routers to freely transmit vital network statistics such as traffic rates, link delays, and dropped packet rates. The collection of network statistics at servers and internal routers can impose an impracticable overhead expense in terms of added computing, communication, and hardware requirements. Even if such statistics can be collected, an Internet service provider (ISP) may regard such information as highly confidential. Moreover, the transmission of statistics to a central processing point may consume considerable bandwidth, adding to network load and congestion.

In certain cases, however, useful network statistics can be indirectly acquired without special-purpose cooperation from servers and routers and with little or no impact on network load. These statistical quantities can reveal hidden network structure and help to detect and isolate congestion, routing faults, and anomalous traffic. The acquisition of the statistics relies on the application of sophisticated methods of active network probing or passive traffic monitoring. These methods do not directly provide the desired information. The problem of extracting the hidden information from active or passive traffic measurements falls in the realm of statistical inverse problems, an area which has long been of interest to signal and image processing researchers. Signal processing know-how, acquired in areas such as image reconstruction, pattern recognition, system identification, and sensor array signal processing, can provide tremendous insight into networking inverse problems.

This article deals with network monitoring and inference for wired networks such as the Internet. The word "inference" is intended to more sharply delineate the field of study addressed in the article, precluding approaches that directly measure network statistics or rely on complete cooperation from the network. The task of inferential network monitoring gives rise to problems that involve the estimation of a potentially very large number of spatially distributed parameters, e.g., link-by-link loss rates, delay distributions, connectivity, and traffic flow. To tackle such large estimation problems, researchers adopt the simplest possible models for network traffic and ignore many intricacies of packet transport such as feedback and latency. These simpler models, although not suitable for fine-grain analysis of individual queuing mechanisms and network traffic behavior, are generally adequate for the inference of gross-level performance characteristics. Focus is shifted from detailed mathematical modeling of network dynamics [3], [4] to careful handling of measurement and probing strategies, large-scale computations, and model validation. The measurement methodologies require software tools for monitoring traffic flow and generating probe traffic, statistical modeling of the measurement process, and sampling strategies for online data collection. The underlying computational science involves complexity-reducing hierarchical statistical models, moment- and likelihood-based estimation, expectation-maximization algorithms, Markov chain Monte Carlo algorithms, and other iterative optimization methods. Model validation includes study of parameter identifiability conditions, feasibility analysis via Cramér-Rao bounds and other bounding techniques, implementation of network simulation software such as the ns-2 network simulation environment [5], and application to real network data.

Many in the network community have long been interested in measuring internal network parameters and in mathematical and statistical characterization of network behavior. Researchers in the fields of computer science,


network measurement, and network protocols have developed software for measuring link delays, detecting intruders and rogue nodes, and isolating routing table inconsistencies and other faults. Researchers from the fields of networking, signal processing, automatic control, statistics, and applied mathematics have been interested in modeling the statistical behavior of network traffic and using these models to infer data transport parameters of the network. Previous work can be divided into three areas: i) development of software tools to monitor/probe the network, ii) probabilistic modeling of networks of queues, and iii) inference from measurements of single or multiple streams of traffic.

Computer scientists and network engineers have developed many tools for active and passive measurement of the network. These tools usually require extra cooperation (in addition to the basic cooperation required for routine packet transmission) among the nodes of the network. For example, in sessions running under RTCP (real-time control protocol), summary sender/receiver reports on packet jitter and packet losses are distributed to all session participants [6]. Active probing tools such as ping, pathchar (pchar), clink, and traceroute measure and report packet transport attributes of the round-trip path (from sender to receiver and back) of a probe (see [1] for a survey of these and other measurement tools). Trajectory sampling [7] is another example of an active probing software tool. These methods depend on accurate reporting by all nodes along the route, and many require special assumptions, e.g., symmetric forward/reverse links, existence of store-and-forward routers, nonexistence of firewalls. As the Internet evolves towards decentralized, uncooperative, heterogeneous administration and edge-based control, these tools will be limited in their capability. In the future, large-scale inference and tomography methods such as those discussed here will become of increasing importance due to their ability to deal with uncooperative networks.

Network queueing theory offers a rich mathematical framework which can be useful for analyzing small-scale networks with a few interconnected servers. See recent books [3] and [4] for overviews of this area. The limitations of queuing network models for analyzing real, large-scale networks can be compared to the limited utility of classical Newtonian mechanics in complex large-scale interacting particle systems: the macroscopic behavior of an aggregate of many atoms appears qualitatively different from what is observed at a microscopic scale with a few isolated atomic nuclei. Furthermore, detailed information on queuing dynamics in the network is probably unnecessary when, by making a few simple approximations, one can obtain reasonably accurate estimates of average link delays, dropped packet probabilities, and average traffic rates directly from external measurements. The much more computationally demanding queuing network analysis becomes necessary when addressing a different set of problems that can be solved off-line. Such problems include calculating accurate estimates of fine-grain network behavior, e.g., the dynamics of node traffic rates, service times, and queue lengths.

The area of statistical modeling of network traffic is a mature and active field [8]-[12]. Sophisticated fractal and multifractal models of single traffic streams can account for long-range dependency, non-Gaussian distributions, and other peculiar behaviors. Such self-similar behavior of traffic rates has been validated for heavily loaded wired networks [13]. For a detailed overview of these and other statistical traffic models we refer the reader to the companion articles in this special issue. To date, these models are too complicated to be incorporated into the large-scale network inference problems discussed in this article. Simplifying assumptions such as spatial and temporal independence are often made to devise practical and scalable inference algorithms. By making these assumptions, a fundamental linear observation model can be used to simplify the inference process. While some progress has been made on incorporating simple first-order spatio-temporal dependency models into large-scale network inference problems [14], much work remains to be done.

This article attempts to be fairly self-contained; only a modest familiarity with networking principles is required, and basic concepts are defined as necessary. For more background information, some recent textbooks [15]-[21] provide an excellent introduction to the field of networking. The article is organized as follows. First we briefly review the area of large-scale network inference and tomography. We then discuss link-level inference from path measurements and focus on two examples: loss rate and delay distribution estimation. We consider the problem of determining the connectivity structure or topology of a network and then turn to origin-destination traffic matrix inference from link measurements in the context of both stationary and nonstationary traffic.


Fig. 1. An arbitrary network topology. Each node represents a computer or a cluster of computers or a router. Each edge in the graph represents a direct link between two nodes. The topology here depicts "clusters" corresponding to local area networks or other subnetworks connected together via the network "backbone." The width of each edge reflects the bandwidth of the corresponding connection (thicker edge implies higher bandwidth).


Network Tomography

Network Tomography Basics

Large-scale network inference problems can be classified according to the type of data acquisition and the performance parameters of interest. To discuss these distinctions, we require some basic definitions. Consider the network depicted in Fig. 1. Each node represents a computer terminal, router, or subnetwork (consisting of multiple computers/routers). A connection between two nodes is called a path. Each path consists of one or more links—direct connections with no intermediate nodes. The links may be unidirectional or bidirectional, depending on the level of abstraction and the problem context. Each link can represent a chain of physical links connected by intermediate routers. Messages are transmitted by sending packets of bits from a source node to a destination node along a path which generally passes through several other nodes.

Broadly speaking, large-scale network inference involves estimating network performance parameters based on traffic measurements at a limited subset of the nodes. Vardi was one of the first to rigorously study this sort of problem and coined the term "network tomography" [22] due to the similarity between network inference and medical tomography. Two forms of network tomography have been addressed in the recent literature: i) link-level parameter estimation based on end-to-end, path-level traffic measurements [23]-[32] and ii) sender-receiver path-level traffic intensity estimation based on link-level traffic measurements [22], [33]-[36].

In link-level parameter estimation, the traffic measurements typically consist of counts of packets transmitted and/or received between nodes or time delays between packet transmissions and receptions. The time delays are due to both propagation delays and router processing delays along the path. The measured path delay is the sum of the delays on the links comprising the path; the link delay comprises both the propagation delay on that link and the queuing delay at the routers lying along that link. A packet is dropped if it does not successfully reach the input buffer of the destination node. Link delays and occurrences of dropped packets are inherently random. Random link delays can be caused by router output buffer delays, router packet servicing delays, and propagation delay variability. Dropped packets on a link are usually due to overload of the finite output buffer of one of the routers encountered when traversing the link, but may also be caused by equipment down-time due to maintenance or power failures. Random link delays and packet losses become particularly substantial when there is a large amount of cross-traffic competing for service by routers along a path.

In path-level traffic intensity estimation, the measurements consist of counts of packets that pass through nodes in the network. In privately owned networks, the collection of such measurements is relatively straightforward. Based on these measurements, the goal is to estimate how much traffic originated from a specified node and was destined for a specified receiver. The combination of the traffic intensities of all these origin-destination pairs forms the origin-destination traffic matrix. In this problem not only are the node-level measurements inherently random, but the parameter to be estimated (the origin-destination traffic matrix) must itself be treated not as a fixed parameter but as a random vector. Randomness arises from the traffic generation itself, rather than perturbations or measurement noise.

The inherent randomness in both link-level and path-level measurements motivates the adoption of statistical methodologies for large-scale network inference and tomography. Many network tomography problems can be roughly approximated by the (not necessarily Gaussian) linear model

y = Aθ + ε,   (1)

where y is a vector of measurements, e.g., packet counts or end-to-end delays, taken at a number of different measurement sites; A is a routing matrix; θ is a vector of packet parameters, e.g., mean delays, logarithms of packet transmission probabilities over a link, or the random origin-destination traffic vector; and ε is a noise term which can result from random perturbations of θ about its mean value and possibly also additive noise in the measured data y (in the origin-destination traffic matrix estimation problem, ε is generally assumed to be zero). Typically, but not always, A is a binary matrix (the (i, j)th element is equal to one or zero) that captures the topology of the network. The problem of large-scale network inference refers to the problem of estimating the network parameters θ given y and either a set of assumptions on the statistical distribution of the noise ε or the introduction of some form of regularization to induce identifiability. Specific examples are discussed below.

What sets the large-scale network inference problem (1) apart from other network inference problems is the potentially very large dimension of A, which can range from half a dozen rows and columns for a few packet parameters and a few measurement sites in a small local area network, to thousands or tens of thousands of rows and columns for a moderate number of parameters and measurement sites in the Internet. The associated high-dimensional problems of estimating θ are specific examples of inverse problems. Inverse problems have a very extensive literature in signal processing [37], statistics [38], and applied mathematics [39]. Solution methods for such inverse problems depend on the nature of the noise ε and the A matrix and typically require iterative algorithms since they cannot be solved directly. In general, A is not full-rank, so that identifiability concerns arise. Either one must be content to resolve linear combinations of the parameters or one must employ statistical means to introduce regularization and induce identifiability. Both tactics are utilized in the examples in later sections of the article. In most of the large-scale



Internet inference and tomography problems studied to date, the components of the noise vector ε are assumed to be approximately independent Gaussian, Poisson, binomial, or multinomial distributed. When the noise is Gaussian distributed with covariance independent of Aθ, methods such as recursive linear least squares can be implemented using conjugate gradient, Gauss-Seidel, and other iterative equation solvers. When the noise is modeled as Poisson, binomial, or multinomial distributed, more sophisticated statistical methods such as reweighted nonlinear least squares, maximum likelihood via expectation-maximization (EM), and maximum a posteriori (MAP) estimation via Markov chain Monte Carlo (MCMC) algorithms can be used. These approaches will be illustrated in later sections.
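The Gaussian case above can be sketched numerically. In the toy example below (the two-path, three-link topology, parameter values, and noise level are all invented for illustration), both the minimum-norm least-squares solution and a ridge-regularized solution reproduce the path-level data, while the link-level parameters are resolved only up to the null space of A:

```python
# Sketch of the linear model y = A·theta + eps for a tiny two-path network.
# All numbers here are illustrative assumptions, not values from the article.
import numpy as np

rng = np.random.default_rng(0)

# Routing matrix: rows = paths (0->2, 0->3), columns = links 1, 2, 3.
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])

theta_true = np.array([-0.02, -0.05, -0.10])   # per-link log success probabilities
y = A @ theta_true + 0.001 * rng.standard_normal(2)  # noisy path-level measurements

# A is rank deficient (rank 2 < 3 links), so least squares is not unique.
# lstsq returns the minimum-norm solution; Tikhonov (ridge) regularization is
# another standard way to induce identifiability.
theta_minnorm, *_ = np.linalg.lstsq(A, y, rcond=None)

lam = 1e-3
theta_ridge = np.linalg.solve(A.T @ A + lam * np.eye(3), A.T @ y)

# Either estimate reproduces the path-level quantities A·theta accurately,
# but link-level parameters are only determined up to null(A).
print(A @ theta_minnorm, A @ theta_ridge)
```

Both estimates fit y; they differ from theta_true (and from each other) only along the unidentifiable null-space direction.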

Examples of Network Tomography

Let us consider three concrete examples of the linear model (1). First, consider the problem of estimating the packet success probability on each link given end-to-end, origin-to-destination (OD) counts of packet losses (the loss probabilities or "loss rates" are simply one minus the probability of successful transmission). Let θ denote the collection of log success probabilities for each link. The OD log success probability is simply Aθ, where A is the binary routing matrix described above. Assuming a known number of packets sent from each source to destination, a binomial measurement model can be adopted [25]. When the numbers of packets sent and received are large, the binomial model can be approximated with a Gaussian likelihood, leading to the classical linear model above (1). Second, suppose that end-to-end, OD delays are measured and the goal is estimation of the delay probability distributions along each link. In this case, let θ be a vector composed of the cumulant generating functions of the delay densities on each link. Then, with appropriate approximation arguments [31], y is again related to θ according to the linear model (1). Third, in the OD traffic matrix estimation case, y are link-level packet count measurements and θ are the OD traffic intensities. Gaussian assumptions are made on the origin-destination traffic with a mean-variance relationship in high count situations in [17], leading to the linear equation (1) without the error term ε. In each of these cases, the noise ε may be correlated and have a covariance structure depending on A and/or θ, leading to less than trivial inference problems. Moreover, in many cases the limited amount of data makes Gaussian approximations inappropriate, and discrete observation models (e.g., binomial) may be more accurate descriptions of the discrete, packetized nature of the data. These discrete observation models necessitate more advanced inference tools such as the EM algorithm and Monte Carlo simulation schemes.
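The binomial-to-Gaussian approximation in the first example can be illustrated quickly in simulation (link success probabilities and probe counts below are invented): with many probes per path, the per-path log empirical success probabilities concentrate tightly around Aθ, so the linear model (1) applies.

```python
# Sketch: with large probe counts, binomial path loss counts yield
# log empirical success probabilities close to A·theta (values invented).
import numpy as np

rng = np.random.default_rng(1)

A = np.array([[1, 1, 0],
              [1, 0, 1]])                      # paths 0->2 and 0->3 over links 1, 2, 3
theta = np.log(np.array([0.98, 0.95, 0.90]))   # per-link log success probabilities

n_probes = 100_000
path_success = np.exp(A @ theta)               # end-to-end success probability per path
counts = rng.binomial(n_probes, path_success)  # packets received on each path
y = np.log(counts / n_probes)                  # observed log success rates

# y = A·theta + eps with eps of order 1/sqrt(n_probes) (delta method)
print(np.abs(y - A @ theta).max())
```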

Let us consider two further embellishments of the basic network inference problem described by the linear model (1). First, all quantities may, in general, be time-varying. For example, we may write

y_t = A_t θ_t + ε_t,   (2)

where t denotes time. The estimation problems now involve tracking time-varying parameters. In fact, the time-varying scenario more accurately reflects the dynamical nature of the true networks. There have been several efforts aimed at tracking nonstationary network behavior which involve analogs of classical Kalman-filtering methods [34], [26]. Another variation on the basic problem (1) is obtained by assuming that the routing matrix A is not known precisely. This leads to the so-called "topology identification" problem [30], [40]-[44] and is somewhat akin to blind deconvolution or system identification problems.
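A hedged sketch of the Kalman-filtering analogs mentioned above, assuming (purely for illustration) random-walk link dynamics, a fixed routing matrix, and known noise variances:

```python
# Toy Kalman filter tracking time-varying link parameters in y_t = A·theta_t + eps_t.
# Dynamics model and noise levels are invented assumptions for this sketch.
import numpy as np

rng = np.random.default_rng(2)

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])        # fixed routing matrix (2 paths, 3 links)
q, r = 1e-4, 1e-4                      # process / measurement noise variances

theta = np.array([-0.02, -0.05, -0.10])    # true initial link parameters
est, P = np.zeros(3), np.eye(3)            # filter state estimate and covariance

errs = []
for t in range(500):
    theta = theta + np.sqrt(q) * rng.standard_normal(3)   # slow random-walk drift
    y = A @ theta + np.sqrt(r) * rng.standard_normal(2)   # path-level measurement

    # Kalman predict (identity dynamics) and update
    P = P + q * np.eye(3)
    S = A @ P @ A.T + r * np.eye(2)
    K = P @ A.T @ np.linalg.inv(S)
    est = est + K @ (y - A @ est)
    P = (np.eye(3) - K @ A) @ P

    errs.append(np.linalg.norm(A @ (est - theta)))   # path-level tracking error

print(np.mean(errs[-100:]))
```

The path-level error settles near the measurement-noise floor; the component of link-level error along null(A) is never reduced, which is the identifiability issue again.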

Link-Level Network Inference

Link-level network tomography is the estimation of link-level network parameters (loss rates, delay distributions) from path-level measurements. Link-level parameters can be estimated from direct measurements when all nodes in a network are cooperative. Many promising tools such as pathchar (pchar), traceroute, clink, and pipechar use Internet control message protocol (ICMP) packets (control packets that request information from routers) to estimate link-level losses, latencies, and bandwidths. However, many routers do not respond to or generate ICMP packets or treat them with very low priority, motivating the development of large-scale link-level network inference methods that do not rely on cooperation (or minimize cooperation requirements).

In this article we discuss methods which require cooperation between a subset of the nodes in the network, most commonly the edge nodes (hosts or ingress/egress routers). Research to date has focused on the parameters of delay distributions, loss rates, and bandwidths, but the general problem extends to the reconstruction of other parameters such as available bandwidths and service disciplines. The Multicast-based Inference of Network-internal Characteristics (MINC) Project at the University of Massachusetts [23] pioneered the use of multicast probing for network tomography and stimulated much of the current work in this area [23]-[27], [29]-[32], [45].




We now outline a general framework for the link-level tomography problems. Consider the network depicted in Fig. 2(a). This illustrates the scenario where packets are sent from a set of sources to a number of destinations. The end-to-end (path-level) behavior can be measured via a coordinated measurement scheme between the sender and the receivers. The sender can record whether a packet successfully reached its destination or was dropped/lost and determine the transmission delay by way of some form of acknowledgment from the receiver to the sender upon successful packet reception. It is assumed that the sender cannot directly determine the specific link on which the packet was dropped nor measure delays or bandwidths on individual links within paths. A network can be logically represented by a graph consisting of nodes connected by links. Potentially, a logical link connecting two nodes represents many routers and the physical links between them, as depicted in Fig. 2.

Each node is numbered j = 0, …, m, and each link is assigned the number of the connected node below it. Let there be n distinct measurement paths (from a sender to a receiver) through the network, enumerated i = 1, …, n. Define a_ij to be the probability that the ith measurement path contains the jth link. In most cases a_ij will take values of zero or one, but it is useful to maintain a level of generality which can handle random routing. A is the routing matrix having ijth element a_ij. The rows of A correspond to paths from the sender to the receivers and the columns correspond to individual links in those paths. Fig. 3 illustrates a simple network consisting of a single sender (node 0), two receivers (the leaves of the tree, nodes 2 and 3), and an internal node representing a router at which the two communication paths diverge (node 1). Only end-to-end measurements are possible, i.e., the paths are (0,2) and (0,3), where (s,t) denotes the path between nodes s and t. There are three links and two paths/receivers, and therefore the matrix A is 2 × 3 dimensional and has the form

    A = [ 1  1  0
          1  0  1 ].   (3)

Note that in this example, A is not full rank. We discussthe ramifications in later sections.
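The rank deficiency is easy to verify numerically; a minimal check (illustrative only):

```python
# Check that the routing matrix of eq. (3) is rank deficient, so the three
# link parameters are not uniquely determined by the two path measurements.
import numpy as np

A = np.array([[1, 1, 0],
              [1, 0, 1]])
print(np.linalg.matrix_rank(A))   # 2, fewer than the 3 links

# Any multiple of a null-space vector can be added to theta without
# changing the path-level quantities A·theta:
v = np.array([1, -1, -1])
print(A @ v)   # [0 0]
```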

A number of key assumptions underpin current link-level network tomography techniques, determining measurement frameworks and mathematical models. The routing matrix is usually assumed to be known and constant throughout the measurement period. Although the routing tables in the Internet are periodically updated, these changes occur at intervals of several minutes. However, the dynamics of the routing matrix may restrict the amount of data that can be collected and used for inference. Most current methodologies assume that performance characteristics on each link are statistically independent of all other links; however, this assumption is clearly violated due to common cross-traffic flowing through the links. Assumptions of temporal stationarity are also made in many cases. In link-level delay tomography, it is generally assumed that synchronized clocks are available at all senders and receivers. Although many of the simplifying assumptions do not strictly hold, such "first-order" approximations have been shown to be reasonable enough for the large-scale inference problems of interest here [23]-[32].

There are two common modes of communication in networks: multicast and unicast. In unicast communication, each packet is sent to one and only one receiver. In multicast communication, the sender effectively sends each packet to a group of subscribing receivers. At internal routers where branching occurs, e.g., node 1 in Fig. 3, each multicast packet is replicated and sent along each branching path. We now overview the different approaches to link-level network tomography that are enabled by the two modes of communication. Subsequently, we provide two detailed examples of link-level network tomography applications.

Multicast Network Tomography

Network tomography based on multicast probing was one of the first approaches to the problem [24]. Consider loss rate tomography for the network depicted in Fig. 3. If a multicast packet is sent by the sender and received by


Fig. 2. Physical and logical networks. The "cloud" indicates portions of the network that are inaccessible by direct measurement. (a) Physical structure for single sender, multiple receiver network. (b) Logical topology.

Fig. 3. Tree-structured topology: sender node 0, internal node 1, and receiver nodes 2 and 3, connected by links 1, 2, and 3.


node 2 but not by node 3, then it can be immediately determined that loss occurred on link 3 (successful reception at node 2 implies that the multicast packet reached the internal node 1). By performing such measurements repeatedly, the rate of loss on the two links 2 and 3 can be estimated; these estimates and the measurements enable the computation of an estimate for the loss rate on link 1.

To illustrate further, let θ_1, θ_2, and θ_3 denote the log success probabilities of the three links in the network, where the subscript denotes the lower node attached to the link. Let p̂_{2|3} denote the ratio of the number of multicast packet probes simultaneously received at both nodes 2 and 3 relative to the total number received at node 3. Thus, p̂_{2|3} is the empirical probability of success on link 2 conditional upon success on link 3, which provides a simple estimate of θ_2. Define p̂_{3|2} in a similar fashion, and also let p̂_i, i = 2, 3, denote the ratio of the total number of packets received at node i over the total number of multicast probes sent to node i. We can then write

    [ log p̂_2     ]     [ 1  1  0 ]
    [ log p̂_3     ]     [ 1  0  1 ] [ θ_1 ]
    [ log p̂_{2|3} ]  ≈  [ 0  1  0 ] [ θ_2 ]
    [ log p̂_{3|2} ]     [ 0  0  1 ] [ θ_3 ].   (4)

A least squares estimate of {θ_i} is easily computed for this overdetermined system of equations. Sophisticated and effective algorithms have been derived for large-scale network tomography in [24], [46], [25], and [47].
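The whole procedure can be sketched in a small simulation (the per-link success probabilities are invented for the demo): generate multicast probes over the tree of Fig. 3, form the empirical probabilities of eq. (4), and solve the overdetermined system by least squares.

```python
# Simulation sketch of multicast loss-rate tomography on the tree of Fig. 3,
# following eq. (4); link success probabilities here are invented.
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
p = {1: 0.99, 2: 0.96, 3: 0.92}              # true per-link success probabilities

# Each multicast probe traverses link 1 once; copies then traverse links 2 and 3.
ok1 = rng.random(n) < p[1]
rx2 = ok1 & (rng.random(n) < p[2])           # probe received at node 2
rx3 = ok1 & (rng.random(n) < p[3])           # probe received at node 3

p2, p3 = rx2.mean(), rx3.mean()
p2_given_3 = (rx2 & rx3).sum() / rx3.sum()   # empirical P(link 2 ok | link 3 ok)
p3_given_2 = (rx2 & rx3).sum() / rx2.sum()   # empirical P(link 3 ok | link 2 ok)

# Overdetermined system of eq. (4): solve for the log success probabilities.
M = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 0],
              [0, 0, 1]])
y = np.log([p2, p3, p2_given_3, p3_given_2])
theta, *_ = np.linalg.lstsq(M, y, rcond=None)

print(np.exp(theta))   # estimated per-link success probabilities
```

With enough probes, exp(theta) recovers the per-link success probabilities closely, including that of the shared link 1, which no single receiver observes in isolation.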

Similar procedures can be conducted in the case of delay distribution tomography. There is a certain minimum propagation delay along each measurement path, which is assumed known. Multicast a packet from node 0 to nodes 2 and 3, and measure the delay to each receiver. The delay on the first link from 0 to 1 is identical for both receivers, and any discrepancy in the two end-to-end delay measurements is solely due to a difference in the delay on link 1 to 2 and the delay on link 1 to 3. This observation allows us to estimate the delay distributions on each individual link. For example, if the measured end-to-end delay to node 2 is equal to the known minimum propagation delay, then any extra delay to node 3 is incurred on link 1 to 3. Collecting delay measurements from repeated experiments in which the end-to-end delay to node 2 is minimal allows construction of a histogram estimate of the delay distribution on link 1 to 3. In larger and more general trees, the estimation becomes more complicated. Advanced algorithms have been developed for multicast-based delay distribution tomography on arbitrary tree-structured networks [29], [46].
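The histogram heuristic just described can be sketched in simulation. The following is illustrative only: link delays are drawn from arbitrary exponential distributions, and the tolerance used to declare the delay to node 2 "minimal" is a made-up parameter.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
min_delay = 1.0        # known minimum propagation delay per link (arbitrary units)

# Illustrative per-link delays: propagation plus exponential queueing delay.
d1 = min_delay + rng.exponential(0.2, n)   # shared link 0 -> 1
d2 = min_delay + rng.exponential(0.3, n)   # link 1 -> 2
d3 = min_delay + rng.exponential(0.5, n)   # link 1 -> 3

y2 = d1 + d2           # end-to-end delay measured at node 2
y3 = d1 + d3           # end-to-end delay measured at node 3

# Keep the probes whose delay to node 2 is (nearly) the minimum possible:
# for those probes the queueing delay on the shared link is negligible, so the
# excess of y3 over the two-link propagation delay is attributed to link 1 -> 3.
tol = 0.05             # made-up tolerance for "minimal" delay
mask = y2 < 2 * min_delay + tol
link3_queueing = y3[mask] - 2 * min_delay

# Histogram estimate of the queueing delay distribution on link 1 -> 3.
hist, edges = np.histogram(link3_queueing, bins=20, density=True)
```

The retained samples concentrate around the mean queueing delay assigned to link 1 to 3, as the argument in the text predicts.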

Unicast Network Tomography

Alternatively, one can tackle loss rate and delay distribution tomography using unicast measurements. Unicast measurements are more difficult to work with than multicast, but since many networks do not support multicast, unicast-based tomography is of considerable practical interest. The difficulty of unicast-based tomography is that although single unicast packet measurements allow one to estimate end-to-end path loss rates and delay distributions, there is not a unique mapping of these path-level parameters to the corresponding individual link-by-link parameters. For example, referring again to Fig. 3, if packets are sent from node 0 to nodes 2 and 3 and $n_k$ and $m_k$ denote the numbers of packets sent to and received by receiver node $k$, $k = 2, 3$, then

\[
\begin{bmatrix} \log \hat{p}_2 \\ \log \hat{p}_3 \end{bmatrix}
\approx
\underbrace{\begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}}_{A}
\begin{bmatrix} \theta_1 \\ \theta_2 \\ \theta_3 \end{bmatrix},
\tag{5}
\]

where $\hat{p}_k = m_k / n_k$ and $\theta_j$, $j = 1, 2, 3$, denotes the log success probability associated with each link. Clearly, a unique solution for $\{\theta_j\}$ does not exist since $A$ is not full rank.

To address this challenge in unicast loss tomography, the authors of [25] and [28] independently proposed methodologies based on measurements made using unicast, back-to-back packet pairs. These measurements provide an opportunity to collect more informative statistics that can help to resolve the link-level loss rates and delay distributions. A packet pair consists of two packets sent one after the other by the sender, possibly destined for different receivers, but sharing a common set of links in their paths. In networks whose queues obey a standard droptail policy, if two back-to-back packets are sent across a common link and one of the pair is successfully transmitted across the link, then it is highly probable that the other packet is also successful. A droptail queuing policy means that a packet is dropped by a queue only if it reaches the queue and there is insufficient space in the buffer. In active queuing strategies, such as random-early-drop (RED) [48], packets can be dropped with a certain probability even if they have already entered the queue. Similarly, the two packets in each pair will experience roughly the same delay through shared links. These observations have been verified experimentally in real networks [27], [49]. If one assumes that the probability of success for one packet conditioned on the success of the other is approximately unity, then the same methodology developed for multicast-based tomography (as described above) can be employed with unicast, packet-pair measurements [27].

In the case of bandwidth tomography, the authors of [50] addressed the challenge of nonuniqueness through clever use of the header fields of unicast packets. The time-to-live (TTL) field in each packet header indicates how many hops the packet should traverse. At each router the packet encounters, the TTL counter is decremented by one, and when the counter reaches zero the next router discards the packet. The nettimer program described in [50] uses "tailgating" to collect measurements: many packet pairs are sent from the source, each consisting of a large packet followed by a small packet. The TTL field of



the large packet is varied during the measurement period so that it is propagated through only a portion of the path. The end-to-end delay measured by the small packet (in a relatively uncongested network) is primarily comprised of the propagation delay experienced by the large packet, enabling inference of the bandwidth of the subpath traversed by the large packet. Referring to the simple triad network in Fig. 3 for illustration, nettimer might send packet pairs from node 0 along links 1 and 2. If the TTL of the large packet is set to one, the tailgating smaller packet measures the propagation delay on link 1.

Unicast measurement can be conducted either actively or passively. In the case of active measurement, probe packets are sent by the senders specifically for the purpose of estimation. In passive monitoring, the sender extracts data from existing communications (e.g., records of TCP sessions) [47], [51]. Data transmission in the Internet is primarily handled by the transmission control protocol (TCP) and Internet protocol (IP). TCP/IP were developed by the Department of Defense to allow cooperating computers to share resources across a network. IP is responsible for moving packets of data from node to node, and TCP coordinates the delivery between the sender and receiver (server and client). Loss rate and delay distribution tomography methods have been developed specifically for unicast packet pairs in [14], [25], [28], and [47]. Unicast packet stripes (triples, quadruples, etc.) have also been investigated for loss rate tomography [27].

Example: Unicast Inference of Link Loss Rates

Link loss rates can be inferred from end-to-end, path-level unicast packet measurements using the approximate linear model given in (1) when the packet counts are large. As stated earlier, however, the discrete process of counting the numbers of sent and received packets suggests the use of a discrete probability distribution in our modeling and analysis. We give a brief introduction and example of this approach here; for more details the interested reader is referred to related papers [25], [26], and [52].

The successful traversal of a single packet across a link can be reasonably modeled as a sequence of Bernoulli events. Associate with the $j$th link in the network a single parameter governing the Bernoulli model. This parameter is the probability (rate) of successful transmission on the link, $\alpha_j$. The complete set for all $m$ logical links in the network is denoted by $\alpha \equiv \{\alpha_j\}_{j=1}^m$, which comprises the success rates that network loss tomography strives to identify.

Measurements are collected by sending $n_k$ single packets along the path to receiver $k$ and recording how many successfully reach the destination, denoted $m_k$. Determination of the success of a given packet is handled by an acknowledgment sent from the receiver back to the sender. For example, such acknowledgments are a built-in feature of TCP. The likelihood of $m_k$ given $n_k$ is binomial (since Bernoulli losses are assumed) and is given by

\[
l(m_k \mid n_k, p_k) = \binom{n_k}{m_k} p_k^{m_k} (1 - p_k)^{n_k - m_k},
\tag{6}
\]

where $p_k = \prod_{j \in P(0,k)} \alpha_j$ and $P(0,k)$ denotes the sequence of nodes in the path from the sender 0 to receiver $k$.

If the routing matrix $A$ is full rank, then unique maximum likelihood estimates of the loss rates can be formed by solving a set of linear equations. If $A$ is not full rank, then there is no unique mapping of the path success probabilities to the success probabilities on individual links (between routers) in the path. To overcome this difficulty, measurements are made using back-to-back packet pairs or sequences of packets, as discussed above [25], [27], [28].
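A quick way to see this identifiability gap is to compare column ranks of the two routing-style matrices. The following sketch (not from the article) checks the $2 \times 3$ single-packet matrix of (5) against a $4 \times 3$ system augmented with conditional statistics in the style of (4).

```python
import numpy as np

# Single-packet unicast routing matrix from (5): two paths, three links.
A_unicast = np.array([[1, 1, 0],
                      [1, 0, 1]])

# Augmented system in the style of (4), where conditional (packet-pair or
# multicast) statistics contribute the extra rows.
A_augmented = np.array([[1, 1, 0],
                        [1, 0, 1],
                        [0, 1, 0],
                        [0, 0, 1]])

rank_unicast = np.linalg.matrix_rank(A_unicast)      # 2 < 3 links: not identifiable
rank_augmented = np.linalg.matrix_rank(A_augmented)  # 3: full column rank
```

The extra rows raise the column rank from two to three, which is exactly what makes the link-level parameters recoverable.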

If two back-to-back packets are sent to node $j$ from its parent node $\rho(j)$, then define the conditional success probability as

\[
\beta_j \equiv \Pr\bigl(\text{1st packet } \rho(j) \to j \mid \text{2nd packet } \rho(j) \to j\bigr),
\]

where $\rho(j) \to j$ is shorthand notation denoting the successful transmission of a packet from $\rho(j)$ to $j$. That is, given that the second packet of the pair is received, the first packet is received with probability $\beta_j$ and dropped with probability $1 - \beta_j$. It is anticipated that $\beta_j \approx 1$ for each $j$, since knowledge that the second packet was successfully received suggests that the queue for link $j$ was not full when the first packet arrived. Evidence for such behavior has been provided by observations of the Internet [49], [53]. Denote the complete set of conditional success probabilities by $\beta \equiv \{\beta_j\}_{j=1}^m$.

Suppose that each sender sends a large number of back-to-back packet pairs in which the first packet is destined for one of its receivers $k$ and the second for another of its receivers $l$. The time between pairs of packets must be considerably larger than the time between the two packets in each pair. Let $n_{k,l}$ denote the number of pairs for which the second packet is successfully received at node $l$, and let $m_{k,l}$ denote the number of pairs for which both the first and second packets are received at their destinations. With this notation, the likelihood of $m_{k,l}$ given $n_{k,l}$ is binomial and is given by

\[
l(m_{k,l} \mid n_{k,l}, p_{k,l}) = \binom{n_{k,l}}{m_{k,l}} p_{k,l}^{m_{k,l}} (1 - p_{k,l})^{n_{k,l} - m_{k,l}},
\]

where $p_{k,l}$ is a product whose factors are $\beta$ elements on the shared links and $\alpha$ elements on the unshared links. The overall likelihood function is given by




\[
l(m \mid n, p) \equiv \prod_k l(m_k \mid n_k, p_k) \times \prod_{k,l} l(m_{k,l} \mid n_{k,l}, p_{k,l}).
\tag{7}
\]

The goal is to determine the vectors $\alpha$ and $\beta$ that maximize (7). Maximizing the likelihood function is not a simple task because the individual likelihood functions $l(m_k \mid n_k, p_k)$ and $l(m_{k,l} \mid n_{k,l}, p_{k,l})$ involve products of the $\beta$ and/or $\alpha$ probabilities. Consequently, numerical optimization strategies are required. The EM algorithm is an especially attractive option that offers a stable, scalable procedure whose complexity grows linearly with network dimension [25]. A closely related EM algorithm can be employed in link-level delay density tomography [26], [54].

The link-level loss inference framework is evaluated in [47] and [52] using the ns-2 network simulation environment [5]. Measurements were collected by passively monitoring existing TCP connections. The experiments involved simulation of the 12-node network topology shown in Fig. 4(a), and the estimated success probabilities determined using the network tomography algorithm above are depicted in Fig. 4. This topology reflects the nature of many networks: a slower entry link from the sender, a fast internal backbone, and then slower exit links to the receivers.

In the simulations, numerous short-lived TCP connections were established between the source (node 0) and the receivers (nodes 5-11). In addition, there is cross-traffic on internal links, such that in total there are approximately 30 TCP connections and 30 user datagram protocol (UDP) connections operating within the network at any one time. (UDP is a simpler protocol than TCP; it simply sends packets and does not receive an acknowledgment from the receiver.) The average utilization of the network is in all cases relatively high. All the TCP connections flowing from the sender to the receivers are used when collecting packet and packet-pair measurements (see [47] for details on the data collection process). Measurements were collected over a 300-s interval.


Fig. 4. Performance of the link-level loss tomography framework examined through ns-2 simulation of the network in (a): a single sender (node 0), four internal routers (nodes 1-4), and seven receivers (nodes 5-11), with the bandwidth in megabits/second (Mb/s) indicated for each link (ranging from 0.66 to 3 Mb/s). Parts (b)-(d) show true and estimated link-level success rates of TCP flows from the source to receivers for several traffic scenarios: (b) Scenario 1, heavy losses on links 2 and 5; (c) Scenario 2, heavy losses on links 2 and 8; (d) Scenario 3, mixed traffic and medium losses. In (b)-(d), the three panels in each display show success probability (vertical axis) versus links 1-11 (horizontal axis): (top) an example of true and estimated success rates with droptail queues, (middle) true and estimated success rates with RED queues, and (bottom) mean absolute error between estimated and true success rates over ten independent trials of a 300-s observation interval.


The experiments were designed to ascertain whether the unicast link-level loss tomography framework is capable of discerning where significant losses are occurring within the network. They assess its ability to determine how extensive the heavy losses are and to provide accurate estimates of loss rates on the better performing links. Three traffic scenarios were explored. In Scenario 1, links 2 and 5 experience substantial losses, thereby testing the framework's ability to separate cascaded losses. In Scenario 2, links 2 and 8 experience substantial loss (testing the ability to resolve distributed losses in different branches of the network). In Scenario 3, many more on-off UDP and on-off TCP connections were introduced throughout the topology. Fig. 4 displays the simulation results for each of the different traffic scenarios.

Example: Unicast Inference of Link Delay Distributions

When the link delays along a path are statistically independent, the end-to-end delay densities are related to the link delay densities through a convolution. Several methods for unraveling this convolution from the end-to-end densities are: 1) transformation of the convolution into a more tractable matrix operator via discretization of the delays [26], [29], [31]; 2) estimation of low-order moments such as link delay variance [46] from end-to-end delay variances, which are additive over the probe paths; 3) nonparametric density estimation methods in combination with EM tomography algorithms [54]; and 4) estimation of the link delay cumulant generating function (CGF) [31] from the end-to-end delay CGFs, which are also additive over the probe paths. Here we discuss the CGF estimation method, from which any set of delay moments can be recovered.
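Method 1 above rests on the fact that, under independence, a discretized end-to-end delay distribution is the convolution of the per-link distributions. A minimal sketch with made-up link pmfs on a shared bin grid:

```python
import numpy as np

# Made-up discretized link delay pmfs on a common bin grid.
p_link1 = np.array([0.7, 0.2, 0.1])   # P(delay = 0, 1, 2 bins)
p_link2 = np.array([0.5, 0.3, 0.2])

# Under spatial independence, the end-to-end delay pmf of a two-link path
# is the convolution of the per-link pmfs; tomography inverts this map.
p_path = np.convolve(p_link1, p_link2)
```

Discretization turns the deconvolution of `p_path` back into `p_link1` and `p_link2` into a (nonlinear) finite-dimensional inverse problem of the kind those algorithms solve.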

Let $Y_i$ denote the total end-to-end delay of a probe sent along the $i$th probe path. Then

\[
Y_i = a_{i1} X_{i1} + \cdots + a_{im} X_{im}, \quad i = 1, \ldots, n,
\tag{8}
\]

where $X_{ij}$ is the delay of the $i$th probe along the $j$th link in the path and $a_{ij} \in \{0, 1\}$ are elements of the routing matrix $A$. Here $\{X_{ij}\}_{i=1}^n$ are assumed to be i.i.d. realizations of a random variable $X_j$ associated with the delay of the $j$th link.

The CGF of a random variable $Y$ is defined as $K_Y(t) = \log E[e^{tY}]$, where $t$ is a real variable. When $Y$ is a sum of a set $\{X_j\}_{j=1}^m$ of statistically independent random variables, the CGF satisfies the additive property $K_Y(t) = \sum_{j=1}^m K_{X_j}(t)$. Therefore, in view of the end-to-end delay representation (8) and assuming independent $X_{i1}, \ldots, X_{im}$ (spatial independence), the vector of CGFs of the end-to-end probe delays $\{Y_i\}_{i=1}^n$ satisfies the linear system of equations

\[
\mathbf{K}_Y(t) = A \, \mathbf{K}_X(t),
\tag{9}
\]

where the $n$-element and $m$-element vector functions of $t$ are, respectively, $\mathbf{K}_Y(t) = [K_{Y_1}(t), \ldots, K_{Y_n}(t)]^T$ and $\mathbf{K}_X(t) = [K_{X_1}(t), \ldots, K_{X_m}(t)]^T$.

The linear equation (9) raises two issues of interest: 1) conditions on $A$ for identifiability of $\mathbf{K}_X(t)$ from $\mathbf{K}_Y(t)$; and 2) good methods of estimation of $\mathbf{K}_X(t)$ from end-to-end delay measurements $Y_i$, $i = 1, \ldots, n$.

When $A$ is not full rank, only linear combinations of those link CGFs lying outside of the null space of $A$ can be determined from (9). We call such a linear combination an identifiable subspace of CGFs. Depending on the routing matrix $A$, identifiable subspaces can correspond to weighted averages of CGFs $\sum_{j=1}^m \alpha_j K_{X_j}(t)$ over a region of the network. This motivates a multiresolution successive refinement algorithm for detecting and isolating bottlenecks, faults, or other spatially localized anomalies. In such an algorithm, large partially overlapping regions of the network are probed with a small number of probes, just sufficient for each of the CGF linear combinations to be sensitive to anomalous behavior of the aggregate regional delay distributions. An example of the anomalous behaviors of interest is a sudden shift of the mass of the delay distribution towards larger delay values, possibly indicating an emerging region of congestion. If one of the regions is identified as a potential site of anomalous behavior, a similar probing process can be repeated on subregions of the suspected region. This process continues down to the single link level within a small region and requires substantially fewer probe paths than would be needed to identify the set of all link delay CGFs.

Estimation of the CGF vector $\mathbf{K}_X(t)$ from an i.i.d. sequence of end-to-end probe delay experiments can be formulated as solving a least squares problem in a linear model analogous to (1):

\[
\hat{\mathbf{K}}_Y(t) = A \, \mathbf{K}_X(t) + \varepsilon(t),
\tag{10}
\]

where $\hat{\mathbf{K}}_Y$ is an empirical estimate of the end-to-end CGF vector and $\varepsilon$ is a residual error. Different methods of solving for $\mathbf{K}_X$ result from assuming different models for the statistical distribution of the error residual. One model,


Fig. 5. Unicast delay estimation probe routing used in the ns-2 simulation: five probe paths over links 1-4. Tailgating can be used to emulate the internal probes 3, 4, and 5.


discussed in [31], is obtained by using a method-of-moments (MOM) estimator for $\mathbf{K}_Y$ and invoking the property that MOM estimators are asymptotically Gaussian distributed as the number of experiments gets large. The bias and covariance of $\hat{\mathbf{K}}_Y$ can then be approximated via bootstrap techniques, and an approximate maximum likelihood estimate of $\mathbf{K}_X$ may be generated by solving (10) using iteratively reweighted least squares (LS). Using other types of estimators of $\mathbf{K}_Y$, e.g., kernel-based density estimation or mixture models with known or approximatable bias and covariance, would lead to different LS solutions for $\mathbf{K}_X$.

The ns-2 network simulator was used to perform a simulation of the four-link network shown in Fig. 5.

Each link was a drop-tail queue with a buffer size of 50 packets. The internal "bottleneck" link, link 3 in Fig. 5, was assigned bandwidth 5 Mb/s with latency 50 ms. Links 1, 2, and 4 were assigned bandwidths of 1 Mb/s and latencies of 10 ms. The background traffic consisted of both exponential on-off UDP traffic and TCP traffic (links 1-4 were assigned different numbers of background UDP and TCP traffic sources in UDP/TCP proportions 6/3, 5/2, 8/4, and 4/2, respectively). Probes were generated as 40-byte UDP packets at each sender node according to a Poisson process with mean interarrival time 16 ms and rate 20 Kb/s. The number of probes per path was 3000. Probe-derived link CGF estimators with and without bias correction were computed and compared with the true link CGFs (computed from direct link measurements of background traffic alone). Differences between the true CGFs and the estimated CGFs can be attributed to both statistical estimation error and systematic bias due to probe-induced perturbations of background traffic. The link CGF estimate without bias correction was obtained by finding the LS fit to the vector $\mathbf{K}_X(t)$ in relation (10), with $\hat{\mathbf{K}}_Y(t)$ obtained by straight empirical averaging over the $N = 3000$ measured probe delays. Specifically, the $i$th element of $\hat{\mathbf{K}}_Y(t)$ is the logarithm of the raw sample average, $\hat{K}_{Y_i}(t) = \log\bigl( N^{-1} \sum_{k=1}^N e^{tY_{ik}} \bigr)$, where $\{Y_{ik}\}_{k=1}^N$ are the probe delays along the $i$th probe path among those indicated in Fig. 5. The bias-corrected link CGF was estimated using the bootstrap procedure described in [31]. In this procedure we aggregated 40 separate estimates of $\hat{\mathbf{K}}_Y(t)$, each computed over a randomly selected subset of 2800 probe delays taken from the 3000 measured probe delays.

Fig. 6 shows the trajectories of the CGF estimates with and without bias correction, in addition to the true CGF, for links 1 and 3. Table 1 shows the average squared error per unit $t$ of the link CGF estimates over the range $t \in [-200, 200]$. These estimates were based on applying ordinary LS to (10) with and without bootstrap bias correction. Note from the table that the average MSE of the bias-corrected CGF estimate is almost 9% lower than the average MSE incurred by the raw CGF estimate.

We next illustrate the application of the CGF estimation technique to bottleneck detection. Define a bottleneck as the event that a link delay exceeds a specified delay threshold. The Chernoff bound specifies an upper bound on the probability of a bottleneck on the $j$th link in terms of the CGF:

\[
P(X_j \geq \delta) \leq \min_{t > 0} e^{-t\delta} e^{K_{X_j}(t)}.
\tag{11}
\]

In Table 2, we show the estimated Chernoff bounds $P_j$ on the bottleneck probability $P(X_j \geq \delta)$. These were estimated by plugging bias-corrected CGF estimates into the right-hand side of (11). Here $\delta = 0.005$ s. Note that the estimated Chernoff bounds correctly identify the bottleneck link (link 3) as the link having probability close to one. In particular, if we set the following criterion for detection of a bottleneck: "the probability that $X_j$ exceeds 0.005 s is at least 0.5," we see that the estimated Chernoff bound correctly identifies link 3 as the bottleneck link.

Fig. 6. Estimates of the CGF function $K_{X_j}(t)$, $t \geq 0$, for links 1 and 3 compared to the true CGF function: (a) link 1; (b) link 3. Each panel shows the true delay CGF, the bias-corrected delay CGF estimate, and the biased delay CGF estimate.

Table 1. MSE of $\hat{K}_{X_j}$ (bias correction) and $\hat{K}'_{X_j}$ (no bias correction) for estimated link CGFs. Link 3 is the bottleneck link.

Link                    | 1        | 2        | 3        | 4
MSE of $\hat{K}_{X_j}$  | 0.000909 | 0.000421 | 0.000974 | 0.000325
MSE of $\hat{K}'_{X_j}$ | 0.001171 | 0.000327 | 0.001026 | 0.000363
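As a numeric sketch of (11), one can minimize $e^{-t\delta + K(t)}$ over a grid of $t$ values. Here the link delay is assumed exponential with a made-up mean of 2 ms, so the true CGF is available in closed form and stands in for an estimated one; only the threshold $\delta = 0.005$ s is taken from the example above.

```python
import numpy as np

# Illustrative link delay model: exponential with mean 2 ms, so the true CGF
# is K(t) = -log(1 - scale * t), valid for t < 1/scale.
scale = 0.002
def K(t):
    return -np.log(1.0 - scale * t)

delta = 0.005                           # bottleneck threshold (s), as in the example
t_grid = np.linspace(1.0, 450.0, 1000)  # stay below 1/scale = 500

# Chernoff bound (11): P(X >= delta) <= min_{t>0} exp(-t*delta + K(t)).
bound = np.min(np.exp(-t_grid * delta + K(t_grid)))
exact_tail = np.exp(-delta / scale)     # exact exponential tail, for comparison
```

The bound is conservative (it exceeds the exact tail probability), but it is cheap to evaluate from a CGF estimate and preserves the ordering needed to flag the bottleneck link.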

Example: Topology Identification

Most of the network tomography problems addressed in earlier sections dealt with the identification of network performance parameters, with full knowledge of the network (routing) topology. The network topology is expressed by the matrix $A$ in (1). Knowledge of $A$ is crucial for most network tomography problems; such knowledge, however, is not always readily available. Most existing tools for network topology mapping, such as traceroute, rely on the cooperation of routers and thus can only reveal those portions of the network that are functioning properly and wish to be known. These cooperative conditions are often not met in practice and may be increasingly uncommon as the network grows and privacy and proprietary concerns increase.

For situations in which common tools such as traceroute are not applicable, a number of methods have been proposed for the identification of network (routing) topology based on end-to-end measurements of the degree of correlation between receivers [30], [40]-[44]. Most of these approaches have concentrated on identifying the tree-structured topology connecting a single sender to multiple receivers. It is assumed that the routes from the sender to the receivers are fixed. With only end-to-end measurements, it is only possible to identify the logical topology defined by the branching points between paths to different receivers.

The key idea in most of the existing topology identification methods is to collect measurements at pairs of receivers that behave (in an average sense) as a monotonic, increasing function of the number of shared links or queues in the paths to the two receivers. A simple example is the case of delay covariance. If two receivers share some portion of their paths, then the covariance between the end-to-end delays to the two receivers reflects the sum of the delay variances on the shared links (assuming the delays are not correlated on unshared links). The more shared links (the larger the shared portion of their paths), the larger the covariance between the two.
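The covariance-monotonicity observation is easy to verify in simulation. A minimal sketch, with arbitrary independent exponential link delays (not taken from the article):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Arbitrary independent link delays; link 1 is shared by both receivers' paths.
d1 = rng.exponential(1.0, n)   # shared link: variance 1.0
d2 = rng.exponential(0.5, n)   # unshared link to receiver 2
d3 = rng.exponential(0.7, n)   # unshared link to receiver 3

y2, y3 = d1 + d2, d1 + d3      # end-to-end delays at the two receivers

# The end-to-end delay covariance estimates the shared-link variance,
# since the unshared delays are independent and contribute nothing.
shared_var_estimate = np.cov(y2, y3)[0, 1]
```

Adding more shared links only adds their variances to this covariance, which is the monotonicity the topology identification algorithms exploit.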

Metrics possessing this type of monotonicity property can be estimated from a number of different end-to-end measurements, including counts of losses, counts of zero delay events (utilization), delay correlations, and delay differences [30], [40]-[44]. Using such metrics, topology identification can be cast as a maximum likelihood estimation problem as follows. The estimated metrics $x \equiv \{x_{i,j}\}$, where the indices $i, j$ refer to different pairs of receivers, can be interpreted as observations of the true metric values $\gamma \equiv \{\gamma_{i,j}\}$ contaminated by some randomness or noise. The estimated metrics are randomly distributed according to a density (whose precise form depends on the contamination model) that is parameterized by the underlying topology $T$ and the set of true metric values, written as $p(x \mid \gamma, T)$. The estimated metrics $x$ are fixed quantities, and hence when $p(x \mid \gamma, T)$ is viewed as a function of $T$ and $\gamma$ it is called the likelihood of $T$ and $\gamma$. The maximum likelihood tree is given by

\[
T^* = \arg\max_{T \in \mathcal{F}} \max_{\gamma \in \mathcal{G}} p(x \mid \gamma, T),
\tag{12}
\]

where $\mathcal{F}$ denotes the forest of all possible tree topologies connecting the sender to the receivers and $\mathcal{G}$ denotes the set of all metrics satisfying the monotonicity property.

The likelihood optimization in (12) is quite formidable, and we are not aware of any method for computing the global maximum except by a brute force examination of each tree in the forest. Consider a network with $N$ receivers. A very loose lower bound on the size of the forest $\mathcal{F}$ is $N!/2$. For example, if $N = 10$ then there are more than $1.8 \times 10^6$ trees in the forest. This explosion of the search space precludes the brute force approach in all but very small (logical) networks. While determining the globally optimal tree is prohibitive in most cases, suboptimal algorithms based on deterministic and Monte Carlo optimization methods can provide good estimates of the topology. As far as deterministic algorithms are concerned, the deterministic binary tree (DBT) classification algorithm proposed in [40] is a representative example. The DBT algorithm is a recursive selection and merging/aggregation process that generates a binary tree from the bottom up (receivers to sender). The greedy nature of the DBT algorithm can lead to very suboptimal results. To avoid this pitfall, an MCMC procedure has been proposed to quickly search through the "topology space," concentrating on regions with the highest likelihood [43]. The most advantageous attribute of the MCMC procedure is that it attempts to identify the topology globally, rather than incrementally (and suboptimally) a small piece at a time.
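The $N!/2$ count quoted above is easy to check:

```python
import math

# Loose lower bound N!/2 on the number of logical tree topologies
# connecting one sender to N receivers.
def forest_lower_bound(n_receivers: int) -> int:
    return math.factorial(n_receivers) // 2

print(forest_lower_bound(10))   # 1814400, i.e., more than 1.8 x 10^6 trees
```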

To illustrate the topology identification problem, consider the network topology depicted in Fig. 7(a). This is the true topology connecting a sender (at Rice University) to a number of other computers in North America and some in Europe. In this case, traceroute was used to obtain the true topology (in many cases this may not be possible, but here it provides a convenient "ground truth" for our experiment). End-to-end measurements using special-purpose unicast probes called "sandwich" probes were used to obtain a set of metrics satisfying the monotonicity property [43]. The sandwich probing


Table 2. Estimated Chernoff bounds $P_j$ on $P(X_j \geq 0.005\ \mathrm{s})$. The bottleneck at link 3 is correctly identified by its high probability of large delay.

Link  | 1     | 2     | 3     | 4
$P_j$ | 0.439 | 0.415 | 0.964 | 0.392


scheme is delay based, but it measures only delay differences, so that no clock synchronization is required. Fig. 7(b) depicts the most commonly identified topology (over many different experiments on different days and at different times of day). The identified topology generally agrees with the true topology.

Origin-Destination Tomography

Origin-destination tomography is essentially the antithesis of link-level network tomography: the goal is the estimation of path-level network parameters from measurements made on individual links. By far the most intensively studied origin-destination network tomography problem is the estimation of OD traffic from measurable traffic at router interfaces. In privately owned networks, the collection of link traffic statistics at routers within the network is often a far simpler task than performing direct measurement of OD traffic. The OD traffic matrix, which indicates the intensity of traffic between all OD pairs in a network, is a key input to any routing algorithm, since the link weights of the open shortest path first (OSPF) routing protocol are related to the traffic on the paths. Ideally, a data-driven OD matrix should be central to the routing optimization program. (OSPF is a link-state routing protocol developed for IP networks that calls for the sending of link-state advertisements to all other routers in the same hierarchical area. A link state takes the form of a weight, effectively the cost of routing via that link.)

There are currently two ways to obtain OD traffic counts. Indirect methods collect sums of OD traffic counts and are considered in [22], [33], [36], and [34]; a direct method to measure OD traffic counts via software such as NetFlow, supported by Cisco routers, is described in [34] and [55]. Both approaches need the cooperation of the routers in the network, but this is not problematic for privately owned networks. Link traffic counts at routers are much easier to collect than direct measurements via NetFlow, and they lead to a linear inverse problem. There are noticeable features about this particular inverse problem worthy of elaboration. First, the OD traffic vector to be estimated is not a fixed parameter vector but a random vector, denoted by $x$; second, the linear equation (1) is used without the error term $\varepsilon$ (stochastic variability is captured in $x$). Although $A$ is singular, as in other cases discussed, the techniques in [22], [33], [34], and [36] use statistical means to induce a regularization enabling the recovery of the entire $x$ (or the traffic intensities underlying $x$). Moreover, the most recent work [34] addressing this problem also deals with the time-varying or nonstationary aspect of the data.

Vardi was the first to investigate the OD network tomography problem. In [22] he studies a network with a general topology, using an independent and identically distributed (i.i.d.) Poisson model for the OD traffic byte counts. He specifies identifiability conditions under the Poisson model and develops a method that uses the EM algorithm on link data to estimate Poisson parameters in both deterministic and Markov routing schemes. To mitigate the difficulty in implementing the EM algorithm under the Poisson model, he proposes a moment method for estimation and briefly discusses the normal model as an approximation to the Poisson. Related work treated the special case involving a single set of link counts and also employed an EM algorithm [36]. A Bayesian formulation and MCMC estimation technique have also been proposed [33].
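The appeal of a moment method under the Poisson model can be sketched numerically. The example below is our illustration (the 2-link/3-flow routing matrix and the rates are invented, and this is a simplified caricature of the approach, not Vardi's exact procedure): for independent Poisson OD counts, both the mean and the covariance of the link counts are linear in the rate vector λ, so stacking the covariance equations on top of the mean equations can restore identifiability that the mean equations alone lack:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical routing matrix: 2 links observing 3 OD flows.
A = np.array([[1, 1, 0],
              [0, 1, 1]])
lam = np.array([5.0, 2.0, 8.0])   # Poisson OD rates (invented)

# Many i.i.d. measurement periods of OD counts and link counts y = A x.
X = rng.poisson(lam, size=(100_000, 3))
Y = X @ A.T

# Moment equations: E[y] = A lam, and for independent Poisson flows
# Cov(y)_{jk} = sum_i A_ji A_ki lam_i -- also linear in lam.
mean_eqs = A                                # 2 equations from E[y]
iu = np.triu_indices(A.shape[0])            # upper-triangular cov entries
cov_eqs = A[iu[0]] * A[iu[1]]               # 3 equations from Cov(y)
B = np.vstack([mean_eqs, cov_eqs])          # 5 equations, rank 3
b = np.concatenate([Y.mean(axis=0), np.cov(Y.T, bias=True)[iu]])

lam_hat, *_ = np.linalg.lstsq(B, b, rcond=None)
print(lam_hat)  # close to [5, 2, 8]
```

Here the mean equations alone (rank 2) cannot separate the three flows, but the covariance of the shared flow on both links supplies the missing constraint.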

Cao et al. [34] use real data to revise the Poisson model and to address the nonstationary aspect of the problem. Their methodology is validated through comparison with direct (but expensive) collection of OD traffic. Cao et al. represent link count measurements as summations of various OD counts that were modeled as independent random variables. (Even though TCP feedback creates dependence, direct measurements of OD traffic indicate that the dependence between traffic in opposite directions is weak. This renders the independence assumption a reasonable approximation.) Time-varying (or nonstationary) traffic matrices estimated from a sequence of link counts were validated on a small subnetwork with four origins/destinations by comparing the estimates with actual OD counts that were collected by running Cisco's NetFlow software on the routers. Such direct point-to-point measurements are often not available because they require additional router CPU resources, which can reduce packet forwarding efficiency, and involve a significant administrative burden when used on a large scale.

Fig. 7. (a) The topology of the network used for Internet experiments. (b) Most commonly estimated topology using the MPL criterion. The link between TX and IND is not detected and an extra common link is associated with the Rice clients, but otherwise the estimated topology is a faithful representation of the true topology.

Let x = (x_1, …, x_n)^T denote the unobserved vector of corresponding byte counts for all OD pairs during a given time interval in the network. Here T indicates transpose and x is the "traffic matrix," even though it is arranged as a column vector for convenience. One natural way to enumerate all the OD variables into a vector is to first enumerate all the routers and then the end nodes, or origin-destination nodes, by 1 through, say, I, and make these indices blocked by routers: the end nodes connected to the first router in the first block, those connected to the second router in the second block, and so forth. Then, to form the OD vector, we put the OD traffic counts in the order (1,1), (1,2), …, (1,I), (2,1), (2,2), …, (2,I), …, (I,1), (I,2), …, (I,I), where (i,j) is the index of the OD traffic from the ith end node to the jth end node. Let y = (y_1, …, y_m)^T denote the observed column vector of incoming/outgoing byte counts measured on each router link interface during a given time interval, again blocked into first the link measurements on the interfaces of the first router and so on. One element of x, for example, corresponds to the number of bytes originating from a specified origin node to a specified destination node, whereas one element of y corresponds to bytes sent from the origin node regardless of their destination. Thus each element of y is a sum of selected elements of x, so

y = Ax    (13)

where A is defined as before, an m × n routing matrix of zeros and ones that is determined by the routing scheme of the network. The orders of elements in x and y determine the positions of the zeros and ones of A accordingly. The work of [34] only considers fixed routing, i.e., there is only one route from an origin to a destination. The unobserved OD byte counts are modeled as

x_i ~ normal(λ_i, φλ_i^c), independently,    (14)

where c is a fixed power constant (its specification is found to be robust in the sense that both c = 1 and c = 2 work well with the Lucent network data, as shown in [34], [35]). This implies

y ~ normal(Aλ, AΣA^T),    (15)

where

λ = (λ_1, …, λ_n)^T and Σ = φ diag(λ_1^c, …, λ_n^c).

Here λ > 0 is the vector of OD mean rates, and φ > 0 is a scale parameter that relates the variance of the counts to their mean, since usually larger counts have larger variance. The mean-variance relationship is necessary to ensure the identifiability of the parameters in the model. Heuristically, under this constraint, the covariances between the ys give the identifiability of the parameters up to the scale parameter φ, which can be determined from the expectation of a y.
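A quick simulation (ours; the small routing matrix and the values of λ, φ, and c are invented) confirms the link-count moments implied by (14) and (15), i.e., that the mean and covariance of y are Aλ and φ A diag(λ^c) A^T:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical routing matrix and model parameters for (14)-(15).
A = np.array([[1, 1, 0],
              [0, 1, 1]])
lam = np.array([400.0, 100.0, 250.0])   # OD mean rates, lam > 0
phi, c = 2.0, 1.0                       # scale parameter and power constant

# x_i ~ normal(lam_i, phi * lam_i**c), independently  -- eq. (14)
X = rng.normal(lam, np.sqrt(phi * lam**c), size=(200_000, 3))
Y = X @ A.T                             # y = A x  -- eq. (13)

# Implied moments -- eq. (15): y ~ normal(A lam, A Sigma A^T),
# with Sigma = phi * diag(lam**c).
Sigma = phi * np.diag(lam**c)
print(Y.mean(axis=0), A @ lam)                  # sample vs. model mean
print(np.cov(Y.T, bias=True), A @ Sigma @ A.T)  # sample vs. model covariance
```

The off-diagonal covariance comes entirely from the flow shared by both links, which is the heuristic reason the covariances of y identify the λ_i (up to φ).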

Cao et al. [34] address the nonstationarity in the data using a local likelihood model (cf. [56]); that is, for any given time t, analysis is based on a likelihood function derived from the observations within a symmetric window of size w = 2h + 1 around t (e.g., in the experiments described below, w = 11 corresponds to observations within about an hour in real time). Within this window, an i.i.d. assumption is imposed (as a simplified and yet practical way to treat the approximately stationary observations within the window). Maximum-likelihood estimation (MLE) is carried out for the parameter estimation via a combination of the EM algorithm and a second-order global optimization routine. The component-wise conditional expectations of the OD traffic, given the link traffic, estimated parameters, and the positivity constraints on the OD traffic, are used as the initial estimates of the OD traffic. The linear equation y = Ax is enforced via the iterative proportional fitting algorithm (cf. [57], [58]) to obtain the final estimates of the OD traffic. The positivity and the linear constraints are very important final steps in obtaining reliable estimates of the OD traffic, in addition to the implicit regularization introduced by the i.i.d. statistical model.
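The final projection step can be sketched as follows. This is a minimal illustration of Deming-Stephan iterative proportional fitting in the spirit of [57], not the exact routine of [34]; the routing matrix, link counts, and initial estimate are invented. Cycling through the link constraints and rescaling the relevant OD entries multiplicatively enforces y = Ax while preserving positivity:

```python
import numpy as np

def ipf(x0, A, y, n_iter=200):
    """Iteratively rescale a positive estimate x so that A x matches the
    observed link counts y (A is a 0/1 routing matrix).  Each pass cycles
    through the link constraints, multiplying the entries of x that feed
    link j by y_j / (A x)_j -- the classic raking adjustment."""
    x = x0.astype(float).copy()
    for _ in range(n_iter):
        for j in range(A.shape[0]):
            s = A[j] @ x
            if s > 0:
                x[A[j] > 0] *= y[j] / s
    return x

A = np.array([[1, 1, 0],
              [0, 1, 1]])
y = np.array([500.0, 350.0])          # observed link counts (invented)
x0 = np.array([450.0, 80.0, 240.0])   # e.g., conditional-mean initial estimate
x = ipf(x0, A, y)
print(A @ x)  # matches y to numerical precision
```

Because every adjustment is a positive multiplicative factor, the iterate stays positive throughout, which is exactly why this projection pairs naturally with the positivity constraint on OD traffic.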

To smooth the parameter estimates, a state-space model is imposed in [34] on the logarithm of the parameters λ and φ over the time windows of size w = 2h + 1 (in our implementation for the simple network of Router 1, we use h = 5, or w = 11). Let θ_t = (λ_t, φ_t) be the parameter vector for the tth time window. We assume the following random walk model for the evolution of the log parameters:

log(θ_t) = log(θ_{t−1}) + v_t,

where v_t ~ normal(0, D), independent for different t, and D is a diagonal matrix obtained using estimates of θ_t in the MLE approach described earlier. Given the parameters, the link counts are assumed i.i.d. as before:

(Y_{t−h}, …, Y_t, …, Y_{t+h}) | θ_t ~ i.i.d. normal(Aλ_t, AΣ_t A^T).


This leads to a two-pass algorithm on the data. For the second pass, inference at time t is carried out in a sequential manner. We first obtain the posterior probability density p(θ_{t−1}) based on the first t − 1 windows of data; then the prior probability density π(θ_t) is updated via the random walk equation; and then the maximum a posteriori estimate of θ_t is computed via numerical optimization using the observations in the tth time window and the prior.
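In one dimension, the flavor of this second pass can be sketched with the random-walk prior in the log domain. This is our simplification, not the optimization routine of [34]: with Gaussian approximations, propagating the prior through log(θ_t) = log(θ_{t−1}) + v_t and combining it with the windowed estimate reduces to a scalar Kalman recursion, and all the numbers below are invented:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate a slowly drifting log-rate and noisy window-by-window
# "MLEs" of it (the per-window estimates play the role of data).
true_log = np.cumsum(rng.normal(0, 0.05, size=100)) + np.log(500.0)
obs_var = 0.04                               # variance of the log-MLE noise
log_mle = true_log + rng.normal(0, np.sqrt(obs_var), size=100)

d = 0.05**2                                  # random-walk (prior) variance
post_mean, post_var = log_mle[0], obs_var
smoothed = [post_mean]
for z in log_mle[1:]:
    prior_var = post_var + d                 # propagate through the random walk
    k = prior_var / (prior_var + obs_var)    # Kalman gain
    post_mean = post_mean + k * (z - post_mean)   # MAP update in log domain
    post_var = (1 - k) * prior_var
    smoothed.append(post_mean)

err_raw = np.mean((log_mle - true_log) ** 2)
err_smo = np.mean((np.array(smoothed) - true_log) ** 2)
print(err_smo < err_raw)  # sequential smoothing reduces the error
```

The point of the sketch is the structure of the recursion: each window's estimate is shrunk toward the random-walk prediction from the previous window, with the amount of shrinkage set by the relative variances.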

This state-space model does improve on the parameter estimates, but not so much on the estimated OD traffic x_t, which implies an insensitivity of the final OD traffic estimates. This insensitivity or robustness to changes in parameter estimates is probably due to the fact that, even in the MLE approach, positivity and linear constraints are imposed on the OD estimates, and these constraints override the improvements brought about by the state-space model.

Example: Time-Varying OD Traffic Matrix Estimation
Fig. 8 is a network at Lucent Technologies considered in [34], [35]. Figs. 9 and 10 are taken from [34]: traffic plots only for the subnetwork around Router 1 with four OD end nodes. These plots show the validation (via NetFlow) and estimated OD traffic based on the link traffic. Fig. 9 gives the full scale, and Fig. 10 is the zoomed-in scale (20×). It is obvious that the estimated OD traffic agrees well with the NetFlow-measured OD traffic for large measurements (> 50 Kbyte/s), but not so well for small measurements (< 20 Kbyte/s), where the Gaussian model is a poor approximation. From the point of view of traffic engineering, it is adequate that the large traffic flows are inferred accurately. Hence, for some purposes, such as planning and provisioning activities, estimates of OD traffic could be used as inexpensive substitutes for direct measurements.

Fig. 8. A network at Lucent Technologies.

Fig. 9. Full-scale time series plots of OD traffic on 22 February 1999 for the Router 1 subnetwork with four origins/destinations. In the lower-left 4 × 4 matrix, the rows (from top down) correspond to corp, local, switch, and fddi, and the columns (from right to left) correspond to corp, local, switch, and fddi. These 4 × 4 main panels correspond to the 16 OD pairs. For example, the (1,2) panel is corp → switch. The eight marginal panels (above and to the right of the main matrix) are the observed link traffic used to infer the 16 OD traffic pairs. The top-right corner shows the total observed link traffic. Xhat is the estimated OD traffic and X is the observed OD traffic. At this time-scale it is impossible to differentiate between estimated and observed OD traffic in most panels of the matrix.

Even though the method described in [34] uses all available information to estimate parameter values and the OD traffic vector x, it does not scale to networks with many nodes. In general, if there are N_e end nodes, the number of floating point operations needed to compute the MLE is at least proportional to N_e^5. A scalable algorithm that relies on a divide-and-conquer strategy to lower the computational cost without losing much of the estimation efficiency is proposed in [35].

Conclusion and Future Directions

This article has provided an overview of the area of large-scale inference and tomography in communication networks. As is evident from the limited scale of the simulations and experiments discussed in this article, the field is only just emerging. Deploying measurement/probing schemes and inference algorithms in larger networks is the next key step. Statistical signal processing will continue to play an important role in this area, and here we attempt to stimulate the reader with an outline of some of the many open issues. These issues can be divided into extensions of the theory and potential networking application areas.

The spatiotemporally stationary and independent traffic and network transport models have limitations, especially in tomographic applications involving heavily loaded networks. Since one of the principal applications of network tomography is to detect heavily loaded links and subnets, relaxation of these assumptions continues to be of great interest. Some recent work on relaxing spatial dependence and temporal independence has appeared in unicast [31] and multicast [24] settings. However, we are far from the point of being able to implement flexible yet tractable models which simultaneously account for long time traffic dependence, latency, dynamic random routing, and spatial dependence. As wireless links and ad hoc networks become more prevalent, spatial dependence and routing dynamics will become dominant.

Fig. 10. Time-series plots of OD traffic as in Fig. 9, except that the scale is zoomed in. At this zoomed-in time-scale it is easier to differentiate between estimated and observed OD traffic in most panels, particularly when there is a small traffic load.

Recently, there have been some preliminary attempts to deal with the time-varying, nonstationary nature of network behavior. In addition to the estimation of time-varying OD traffic matrices discussed, others have adopted a dynamical systems approach to handle nonstationary link-level tomography problems [14]. Sequential Monte Carlo inference techniques are employed in [14] to track time-varying link delay distributions in nonstationary networks. One common source of temporal variability in link-level performance is the nonstationary characteristics of cross-traffic. Fig. 11 illustrates this scenario and displays the estimated delay distributions at different time instances (see [14] for further details).

There is also an accelerating trend toward network security that will create a highly uncooperative environment for active probing: firewalls designed to protect information may not honor requests for routing information, special packet handling (multicast, TTL, etc.), and other network transport protocols required by many current probing techniques. This has prompted investigations into more passive traffic monitoring techniques, for example, based on sampling TCP traffic streams [47]. Furthermore, the ultimate goal of carrying out network tomography on a massive scale poses a significant computational challenge. Decentralized processing and data fusion will probably play an important role in reducing both the computational burden and the high communications overhead of centralized data collection from edge-nodes.

The majority of work reported to date has focused on reconstruction of network parameters which may only be indirectly related to the decision-making objectives of the end-user regarding the existence of anomalous network conditions. An example of this is bottleneck detection, which has been considered in [31] and [32] as an application of reconstructed delay or loss estimation. However, systematic development of large-scale hypothesis testing theory for networks would undoubtedly lead to superior detection performance. Other important decision-oriented applications may be detection of coordinated attacks on network resources, network fault detection, and verification of services.

Finally, the impact of network monitoring, which is the subject of this article, on network control and provisioning could become the application area of most practical importance. Admission control, flow control, service level verification, service discovery, and efficient routing could all benefit from up-to-date and reliable information about link- and router-level performance. The big question is: can signal processing methods be developed which ensure accurate, robust, and tractable monitoring for the development and administration of the Internet and future networks?

Acknowledgments

This work was supported by the National Science Foundation Grants MIP-9701692, ANI-0099148, FD01-12731, and ANI-9734025; the Office of Naval Research Grant N00014-00-1-0390; the Army Research Office Grants DAAD19-99-1-0290, DAAD19-01-1-0643, and DAAH04-96-1-0337; and the Department of Energy Grant DE-FC02-01ER25462. We would also like to acknowledge the invaluable contributions of J. Cao, R. Castro, D. Davis, M. Gadhiok, R. King, E. Rombokas, C. Shih, Y. Tsang, and S. Vander Wiel to the work described in this article.

Fig. 11. Performance of the sequential Monte Carlo tracking of time-varying link delays from end-to-end measurements. (a) Single-source, four-receiver simulated network with nonstationary cross-traffic. (b) True delay distributions (red) and estimates (blue) as a function of time.

Mark Coates received the B.E. degree (first class honors) in 1995 from the University of Adelaide (Australia) and the Ph.D. degree in electrical engineering from the University of Cambridge (UK). He spent 2000-2001 at Rice University, in the first year as the Texas Instruments postdoctoral fellow, and in the second as a research associate and lecturer. He is currently an Assistant Professor at McGill University. His research interests lie in the area of computer networks and statistical signal processing. Particular focuses include network tomography, security, distributed control and estimation, sequential Monte Carlo methods, and time-frequency analysis. He is a member of the IEEE.

Alfred O. Hero III received the B.S. (summa cum laude) from Boston University (1980) and the Ph.D. from Princeton University (1984), both in electrical engineering. Since 1984 he has been a Professor with the University of Michigan, Ann Arbor, where he has appointments in the Departments of Electrical Engineering and Computer Science, Biomedical Engineering, and Statistics. His research interests include estimation and detection, digital communications, bioinformatics, signal processing, and image processing. He was an Associate Editor for the IEEE Transactions on Information Theory and the IEEE Transactions on Signal Processing. He has served on many committees, including Chairman of the Statistical Signal and Array Processing (SSAP) Technical Committee, Treasurer of the Conference Board of the IEEE Signal Processing Society, and the 2002 Workshop on Genomic Signal Processing and Statistics (GENSIPS). He is currently a member of the Signal Processing Theory and Methods (SPTM) Technical Committee and Vice President (Finance) of the IEEE Signal Processing Society. He is Chair of Commission C of the U.S. National Commission of the International Union of Radio Science. He is a Fellow of the IEEE, a member of Tau Beta Pi, the American Statistical Association, the Society for Industrial and Applied Mathematics, and the U.S. National Commission (Commission C) of the International Union of Radio Science. He received the 1998 IEEE Signal Processing Society Meritorious Service Award, the 1998 IEEE Signal Processing Society Best Paper Award, and the IEEE Third Millennium Medal.

Robert Nowak received the B.S. (with highest distinction), M.S., and Ph.D. degrees in electrical engineering from the University of Wisconsin-Madison in 1990, 1992, and 1995, respectively. He spent several summers with General Electric Medical Systems' Applied Science Laboratory, where he received the General Electric Genius of Invention Award and a U.S. patent for his work in 3-D computed tomography. He was an Assistant Professor at Michigan State University from 1996-1999. He is now an Assistant Professor at Rice University, Houston, Texas. His research interests include statistical image and signal processing, multiscale analysis, medical imaging, and communication networks. He received the National Science Foundation CAREER Award in 1997, the Army Research Office Young Investigator Program Award in 1999, the Office of Naval Research Young Investigator Program Award in 2000, and the IEEE Signal Processing Society Young Author Best Paper Award in 2000.

Bin Yu is Professor of Statistics at the University of California, Berkeley, where she teaches and conducts research in statistics, information theory/communications, bioinformatics, and remote sensing. She received the B.S. degree in mathematics from Peking University, China, in 1984 and the M.S. and Ph.D. degrees in statistics from the University of California, Berkeley, in 1987 and 1990, respectively. Before joining the Berkeley faculty in 1993, she held faculty positions at the University of Wisconsin, Madison, and Yale University. She is a Fellow of the IEEE and The Institute of Mathematical Statistics (IMS) and was an IMS Special Invited Lecturer (now Medallion Lecturer) in 1999. She is an Associate Editor for The Annals of Statistics and Statistica Sinica, an Action Editor for the Journal of Machine Learning Research, and is on the Board of the IEEE Information Theory Society and on the Council of IMS. She is the co-editor of a special issue on bioinformatics for Statistica Sinica.

References

[1] CAIDA: Cooperative Association for Internet Data Analysis. Available: http://www.caida.org/Tools/.

[2] M. Dodge and R. Kitchin, Atlas of Cyberspace. Reading, MA: Addison-Wesley, 2001.

[3] F.P. Kelly, S. Zachary, and I. Ziedins, Stochastic Networks: Theory and Applications (Royal Statistical Society Lecture Note Series). Oxford, U.K.: Oxford Science Publications, 1996.

[4] X. Chao, M. Miyazawa, and M. Pinedo, Queuing Networks: Customers, Signals and Product Form Solutions (Systems and Optimization). New York: Wiley, 1999.

[5] UCB/LBNL/VINT network simulator ns (version 2). Available: http://www.isi.edu/nsnam/ns/.

[6] RTP: A transport protocol for real-time applications, Jan. 1996. IETF Internet Request for Comments: RFC 1889.

[7] N. Duffield and M. Grossglauser, "Trajectory sampling for direct traffic observation," in Proc. ACM SIGCOMM 2000, Stockholm, Sweden, Aug. 2000, pp. 271-282.

[8] W. Leland, M. Taqqu, W. Willinger, and D. Wilson, "On the self-similar nature of Ethernet traffic" (extended version), IEEE/ACM Trans. Networking, vol. 2, pp. 1-15, Feb. 1994.

[9] V. Paxson, "End-to-end Internet packet dynamics," in Proc. ACM SIGCOMM, Cannes, France, 1997, pp. 139-152.

[10] R. Riedi, M.S. Crouse, V. Ribeiro, and R.G. Baraniuk, "A multifractal wavelet model with application to TCP network traffic," IEEE Trans. Inform. Theory (Special Issue on Multiscale Statistical Signal Analysis and its Applications), vol. 45, pp. 992-1018, Apr. 1999.

[11] A. Feldmann, A.C. Gilbert, P. Huang, and W. Willinger, "Dynamics of IP traffic: A study of the role of variability and the impact of control," in Proc. ACM SIGCOMM, Cambridge, MA, 1999, pp. 301-313.

[12] A.C. Gilbert, W. Willinger, and A. Feldmann, "Scaling analysis of conservative cascades, with applications to network traffic," IEEE Trans. Inform. Theory, vol. IT-45, pp. 971-991, Mar. 1999.

[13] A. Veres, Z. Kenesi, S. Molnár, and G. Vattay, "On the propagation of long-range dependence in the Internet," in Proc. ACM SIGCOMM 2000, Stockholm, Sweden, Aug. 2000, pp. 243-256.

[14] M. Coates and R. Nowak, "Sequential Monte Carlo inference of internal delays in nonstationary communication networks," IEEE Trans. Signal Processing, vol. 50, pp. 366-376, Feb. 2002.

[15] J. Kurose and K. Ross, Computer Networking: A Top-Down Approach Featuring the Internet. Reading, MA: Addison-Wesley, 2001.

[16] C. Huitema, Routing in the Internet. Englewood Cliffs, NJ: Prentice-Hall, 2000.

[17] S. Keshav, An Engineering Approach to Computer Networking. Reading, MA: Addison-Wesley, 1997.

[18] L. Peterson and B. Davie, Computer Networks: A Systems Approach. San Mateo, CA: Morgan Kaufmann, 2000.

[19] W. Stevens, UNIX Network Programming, vol. 1: Networking APIs: Sockets and XTI. Englewood Cliffs, NJ: Prentice-Hall, 1997.

[20] A. Tanenbaum, Computer Networks. Englewood Cliffs, NJ: Prentice-Hall, 1996.

[21] W. Stevens and G. Wright, TCP/IP Illustrated. Reading, MA: Addison-Wesley, 2002.

[22] Y. Vardi, "Network tomography: Estimating source-destination traffic intensities from link data," J. Amer. Statist. Assoc., vol. 91, no. 433, pp. 365-377, 1996.

[23] Multicast-based inference of network-internal characteristics (MINC). Available: http://gaia.cs.umass.edu/minc.

[24] R. Cáceres, N. Duffield, J. Horowitz, and D. Towsley, "Multicast-based inference of network-internal loss characteristics," IEEE Trans. Inform. Theory, vol. 45, pp. 2462-2480, Nov. 1999.

[25] M. Coates and R. Nowak, "Network loss inference using unicast end-to-end measurement," in ITC Seminar on IP Traffic, Measurement and Modelling, Monterey, CA, Sept. 2000, pp. 28:1-9.

[26] M. Coates and R. Nowak, "Network tomography for internal delay estimation," in Proc. IEEE Int. Conf. Acoust., Speech, and Signal Processing, May 2001, pp. 3409-3412.

[27] N.G. Duffield, F. Lo Presti, V. Paxson, and D. Towsley, "Inferring link loss using striped unicast probes," in Proc. IEEE INFOCOM 2001, Anchorage, Alaska, Apr. 2001, vol. 2, pp. 915-923.

[28] A. Bestavros, K. Harfoush, and J. Byers, "Robust identification of shared losses using end-to-end unicast probes," in Proc. IEEE Int. Conf. Network Protocols, Osaka, Japan, Nov. 2000, pp. 22-33.

[29] F. Lo Presti, N.G. Duffield, J. Horowitz, and D. Towsley, "Multicast-based inference of network-internal delay distributions," Univ. Massachusetts, Amherst, MA, Tech. Rep. 99-55, 1999.

[30] S. Ratnasamy and S. McCanne, "Inference of multicast routing trees and bottleneck bandwidths using end-to-end measurements," in Proc. IEEE INFOCOM 1999, New York, Mar. 1999, vol. 1, pp. 353-360.

[31] M.F. Shih and A.O. Hero, "Unicast inference of network link delay distributions from edge measurements," in Proc. IEEE Int. Conf. Acoust., Speech, and Signal Processing, Salt Lake City, UT, May 2001, pp. 3421-3424.

[32] A.-G. Ziotopolous, A.O. Hero, and K. Wasserman, "Estimation of network link loss rates via chaining in multicast trees," in Proc. IEEE Int. Conf. Acoust., Speech, and Signal Processing, Salt Lake City, UT, May 2001, pp. 2517-2520.

[33] C. Tebaldi and M. West, "Bayesian inference on network traffic using link count data (with discussion)," J. Amer. Statist. Assoc., vol. 93, no. 442, pp. 557-576, June 1998.

[34] J. Cao, D. Davis, S. Vander Wiel, and B. Yu, "Time-varying network tomography: Router link data," J. Amer. Statist. Assoc., vol. 95, pp. 1063-1075, 2000.

[35] J. Cao, S. Vander Wiel, B. Yu, and Z. Zhu, "A scalable method for estimating network traffic matrices from link counts." Available: http://www.stat.berkeley.edu/~binyu/publications.html, 2000.

[36] R.J. Vanderbei and J. Iannone, "An EM approach to OD matrix estimation," Princeton University, Princeton, NJ, Tech. Rep. SOR 94-04, 1994.

[37] R.J. Mammone, "Inverse problems and signal processing," in The Digital Signal Processing Handbook. Boca Raton, FL: CRC Press, 1998.

[38] F. O'Sullivan, "A statistical perspective on ill-posed inverse problems," Statistical Science, vol. 1, no. 4, pp. 502-527, 1986.

[39] F. Natterer, The Mathematics of Computerized Tomography. New York: Wiley, 1986.

[40] N.G. Duffield, J. Horowitz, F. Lo Presti, and D. Towsley, "Multicast topology inference from end-to-end measurements," in ITC Seminar on IP Traffic, Measurement and Modelling, Monterey, CA, Sept. 2000, pp. 27:1-10.

[41] N.G. Duffield, J. Horowitz, F. Lo Presti, and D. Towsley, "Multicast topology inference from measured end-to-end loss," IEEE Trans. Inform. Theory, vol. 48, pp. 26-45, Jan. 2002.

[42] A. Bestavros, J. Byers, and K. Harfoush, "Inference and labeling of metric-induced network topologies," Computer Science Department, Boston Univ., Boston, MA, Tech. Rep. BUCS-2001-010, June 2001.

[43] R. Castro, M.J. Coates, M. Gadhiok, R. King, R. Nowak, E. Rombokas, and Y. Tsang, "Maximum likelihood network topology identification from edge-based unicast measurements," Dept. Electrical and Computer Engineering, Rice Univ., Houston, TX, Tech. Rep. TREE0107, Oct. 2001.

[44] R. Castro, M. Coates, and R. Nowak, "Maximum likelihood identification of network topology from end-to-end measurement," Rice Univ., Houston, TX, Tech. Rep. TREE0109, Feb. 2002.

[45] M. Coates and R. Nowak, "Networks for networks: Internet analysis using Bayesian graphical models," in Proc. 2000 IEEE Neural Networks for Signal Processing Workshop, vol. 2, Sydney, Australia, Dec. 2000, pp. 755-764.

[46] N. Duffield and F. Lo Presti, "Multicast inference of packet delay variance at interior network links," in Proc. IEEE INFOCOM 2000, vol. 3, Tel Aviv, Israel, Mar. 2000, pp. 1351-1360.

[47] Y. Tsang, M. Coates, and R. Nowak, "Passive network tomography using EM algorithms," in Proc. 2001 IEEE Int. Conf. Acoust., Speech, and Signal Processing, vol. 3, May 2001, pp. 1469-1472.

[48] Recommendations on queue management and congestion avoidance in the Internet, Apr. 1998. IETF Internet Request for Comments: RFC 2309.

[49] V. Paxson, "End-to-end Internet packet dynamics," IEEE/ACM Trans. Networking, vol. 7, pp. 277-292, June 1999.

[50] K. Lai and M. Baker, "Measuring link bandwidths using a deterministic model of packet delay," in Proc. ACM SIGCOMM 2000, Stockholm, Sweden, Aug. 2000.

[51] Y. Tsang, M. Coates, and R. Nowak, "Passive unicast network tomography based on TCP monitoring," Rice Univ., Houston, TX, Tech. Rep. TREE-05, 2000.

[52] R. Nowak and M. Coates, "Unicast network tomography using the EM algorithm," Rice Univ., Houston, TX, Tech. Rep. TREE0108, Dec. 2001.

[53] J.-C. Bolot, "End-to-end packet delay and loss behaviour in the Internet," in Proc. ACM SIGCOMM 1993, San Francisco, CA, Sept. 1993, pp. 289-298.

[54] Y. Tsang, M. Coates, and R. Nowak, "Nonparametric Internet tomography," in Proc. IEEE Int. Conf. Acoust., Speech, and Signal Processing, May 2002.

[55] A. Feldmann, A. Greenberg, C. Lund, N. Reingold, J. Rexford, and F. True, "Deriving traffic demands for operational IP networks: Methodology and experience," in Proc. ACM SIGCOMM 2000, Stockholm, Sweden, Aug. 2000, pp. 257-270.

[56] C. Loader, Local Regression and Likelihood. New York: Springer-Verlag, 1999.

[57] W.E. Deming and F.F. Stephan, "On a least squares adjustment of a sampled frequency table when the expected marginal totals are known," Ann. Math. Statist., vol. 11, pp. 427-444, 1940.

[58] I. Csiszár, "Divergence geometry of probability distributions and minimization problems," Ann. Prob., vol. 3, no. 1, pp. 146-158, 1975.
