
PAM 2002: Harfoush, Bestavros, and Byers, Periscope: An Active Measurement API 1

PeriScope

An Active Measurement API

Khaled Harfoush, Azer Bestavros, John Byers

Computer Science Department, Boston University, Boston, MA 02215

Abstract: Growing interest in inference and prediction of network characteristics is justified by its importance for a variety of network-aware applications. One widely adopted strategy to characterize network conditions relies on active, end-to-end probing of the network. Active end-to-end probing techniques differ in (1) the structural composition of the probes they use (e.g., number and size of packets, the destination of various packets, the protocols used, etc.), (2) the entity making the measurements (e.g., sender vs. receiver), and (3) the techniques used to combine measurements in order to infer specific metrics of interest. In this paper, we present Periscope: a Linux API that enables the definition of new probing structures and inference techniques from user space through a flexible interface. Periscope requires no support from clients beyond the ability to respond to ICMP ECHO REQUESTs and is designed to minimize user/kernel crossings and to ensure various constraints (e.g., back-to-back packet transmissions, fine-grained timing measurements). We show how to use Periscope for two different probing purposes, namely the measurement of shared packet losses between pairs of endpoints and the measurement of subpath bandwidth. Results from Internet experiments for both of these goals are also presented.

Keywords: End-to-end measurement; loss rate; bottleneck bandwidth; Bayesian probing; Cartouche probing; network-aware applications.

I. Introduction

Measurement of various network characteristics (e.g., loss, delay, bandwidth) is crucial for many Internet applications and protocols, especially those involving the transfer of large files and those involving the delivery of content with real-time QoS constraints, such as streaming media. Examples of the importance of bandwidth estimation include request routing protocols in Content Distribution Networks (CDNs) [1] or in Peer-to-Peer (P2P) networks [33], network-aware cache/replica placement and maintenance policies [20], [31], flow scheduling and admission control policies at massively-accessed content servers [9], and end-system multicast and overlay network reconfiguration protocols [8], [19], [2], among many others.

This work was partially supported by NSF research grants ANI-9986397, ANI-0095988 and CAREER ANI-0093296.

Current strategies for inferring network characteristics can be classified into two broad categories. The first strategy is to mine the data collected by network-internal resources, such as BGP routing tables, to generate performance reports [16], [25], [7], [17]. This approach is best applied over long time scales to produce aggregated analyses such as Internet weather reports, but does not lend itself well to meeting the timeliness requirements of network-aware applications. The second strategy is statistical inference of network-internal characteristics based on measurements obtained from probing network resources (routers and/or endhosts) [5], [6], [34], [21], [32], [29], [26]. Using this strategy, information is gathered at the appropriate time scale to address timely network-aware application requirements. A probing approach can be further classified as either active, i.e., it introduces additional probe traffic into the network, or passive, i.e., it makes inferences only from existing network traffic. The benefit of the former is flexibility: one can make measurements at those locations and times which are most valuable. The benefit of the latter is that no additional bandwidth and network resources are consumed just for the purpose of data collection.

Paper Contributions: In this paper, we adopt the active probing strategy. Specifically, we present a Linux API (called Periscope¹) that implements the functionality necessary to define, activate and collect measurements for arbitrary probing structures. To the best of our knowledge, there is no publicly available framework that has Periscope's flexibility in defining arbitrary probing structures. For example, by defining a probing structure consisting of a pair of probe packets of the same size destined to the same endpoint, Periscope can infer the bottleneck bandwidth along a path [7], [27], [29], [28]. By directing the probe packets to different endpoints, Periscope can infer the loss rate along the shared path, induced through IP routing, connecting the probing server to the endpoints [14]. By defining a probing structure consisting of a pair of different-size probe packets, Periscope can infer the capacity bandwidth of every physical link along the path [22], [23]. Also, by defining a probing structure consisting of a sequence of packet-pairs of different sizes and destinations, Periscope can efficiently infer the bottleneck bandwidth along arbitrary path segments and along the shared paths between different endpoints [15]. Periscope can also easily be extended to provide additional functionality.

¹ Probing Engine for the Recovery of Internet Subgraphs

Using the Periscope API, we have developed a tool to infer and label the loss, delay and bandwidth topologies connecting a server to a set of clients. These topologies are instantiations of the Metric-Induced Network Topologies (MINT) framework, which aims at providing compact and efficient representations of network resources. A metric-induced topology is a labelled logical topology, parametrized with a sensitivity parameter c which is the minimum value of a label which can be applied to an internal link in the topology. For more details about the MINT framework refer to [4].

Ideally, Periscope functionality should be installed at both the server and the client sides. Client-side Periscope installation is only intended to intercept server probe packets and record their relevant characteristics. However, acquiring control over all potential clients is not always possible. It is thus desirable to communicate probe packet characteristics from the clients back to the server side, which can be done through replies to ICMP ECHO REQUEST probe packets. The problem with this approach is that the expected characteristics of the probe packets that are delivered to the clients may be altered either at the client itself or over the back channel, on the way back to the server. This may result in inaccurate estimates, which is the price one has to pay if Periscope is deployed only at the server side.

In this paper we describe the details of the server-side Periscope architecture which, as discussedabove, is a superset of the client-side architecture.

Paper Overview: The rest of this paper is organized as follows. In sections II and III, we review existing probing literature and provide the probing terminology that we use throughout the paper. In section IV, we discuss the rationale behind the Periscope design, and in sections V and VI, we describe the Periscope architecture and its user level API. In section VII, we describe some of our experiences with Internet validation of Periscope, drawing on examples taken from a large set of experiments that are aimed at inferring from end-to-end measurements both the loss rate and the bottleneck bandwidth along the shared path connecting a server to a set of clients.

II. Related Work

As noted in the introduction, Periscope relies on actively probing network resources in order to estimate network properties. In general, there are two approaches to implementing probing: (1) in user space [24], [11], [23] or (2) in the kernel [12].

Kernel support is desirable because probing techniques require quality of service guarantees from the probing host which cannot be provided from user space. For example, the uniform packet-pair probing technique requires that the packets of each pair be injected back-to-back into the network, which cannot be done easily from user space.

Acquiring probing functionality in the kernel can be achieved using two possible approaches: (a) hand-code kernel functionality [12], or (b) deploy kernel functionality from user space using Operating System abstractions (e.g., QLinux [30] or Dionisys [10]).

Hand-coding the kernel suffers from a lack of flexibility: the resulting functionality is usually not reusable for different probing techniques. Using system abstractions, in turn, requires an environment running an operating system with quality of service guarantees, which is costly. These systems are also inherently complex.

While Periscope is implemented in the kernel, it is not intended to be a tool to estimate a specific property. Instead, it is designed to be a programmable probing framework suitable for any inference technique that relies on active probing. By implementing a rich set of basic probing functions in the kernel, Periscope can provide the necessary quality of service guarantees while avoiding the complexity of general purpose frameworks. By providing a generic user space API, Periscope provides users with the flexibility to define and activate arbitrary probing structures.

III. Probing Terminology

In this section, we describe the basic terminology for the various probing sequences implemented in Periscope, which we use throughout the rest of the paper.

For the purposes of this paper, a probe is a sequence of one or more packets transmitted from a common origin. We say that any contiguous subsequence of packets within a probe is transmitted back-to-back if there is no time separation between transmission of the individual packets within the subsequence. A multi-destination probe is one in which the constituent packets of the probe do not all target the same destination IP address. Multi-destination probes have begun to see wider use as emulations of notional multicast packets: many of the same end-to-end inferences that can be made with multicast packets can be made with multi-destination unicast probes (albeit with added complexity) [13], [14]. A uniform probe is one in which all of the constituent packets are of the same size; likewise, a non-uniform probe consists of packets of different sizes. Finally, we say that an individual packet is hop-limited if its TTL is set to an artificially small value so as not to reach the ostensible destination. Hop-limited packets can be used to trigger an ICMP response from an intermediate router.

Throughout the paper we use various probing techniques that rely on sending sequences of probes. The probing techniques differ in the number of packets constituting a probe, and in the size of and the path traversed by each probe packet. They also differ in the host collecting the probing responses and the function used by this host to perform the required estimation.

In many instances, the process of probing may involve sending a multitude of probe structures to various destinations. For example, consider a multi-destination probe which targets two clients (from a single source). Given a set of clients, it may be necessary to send such a probe to every possible pair of clients in the set. We call such a set of probes a probing round.

Each packet p transmitted within a probe is parameterized by its size s(p) in bytes and its final destination, D(p). In the event that a packet is hop-limited, it has a third parameter, its maximum hop-count, h(p).

To denote a probe, we refer to each probe packet with a distinct lowercase letter, and represent the sequential order in which the packets are transmitted from the probing host by writing them from left to right. We denote interpacket spacing with square brackets. As an example, [pq][pq][r] would denote transmission of a pair of identical two-packet probes followed by a single-packet probe which has different characteristics.

Finally, we use the term interarrival time of packets p and q at a link to denote the time elapsed between the arrival of the last byte of p and the arrival of the last byte of q at that link.
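The terminology of this section can be made concrete with a small sketch in C. The descriptor type pkt_desc and the three helper functions are hypothetical names introduced here for illustration only; they are not part of the Periscope API. The sketch assumes that a destination is an IPv4 address and that a maximum hop-count of 0 stands for "not hop-limited".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet descriptor holding s(p), D(p) and h(p).
 * Assumption: max_hops == 0 means the packet is not hop-limited. */
typedef struct {
    int      size;      /* s(p), in bytes */
    uint32_t dest;      /* D(p), an IPv4 address */
    int      max_hops;  /* h(p); 0 if the packet is not hop-limited */
} pkt_desc;

/* A probe is uniform when all constituent packets have the same size. */
bool probe_is_uniform(const pkt_desc *p, size_t n)
{
    assert(p != NULL && n > 0);
    for (size_t i = 1; i < n; i++)
        if (p[i].size != p[0].size)
            return false;
    return true;
}

/* A probe is multi-destination when its packets do not all share one D(p). */
bool probe_is_multi_destination(const pkt_desc *p, size_t n)
{
    assert(p != NULL && n > 0);
    for (size_t i = 1; i < n; i++)
        if (p[i].dest != p[0].dest)
            return true;
    return false;
}

/* A packet is hop-limited when it carries a maximum hop-count h(p). */
bool packet_is_hop_limited(const pkt_desc *p)
{
    assert(p != NULL);
    return p->max_hops > 0;
}
```

Under these assumptions, a [pp] probe with equal sizes and one destination is uniform and single-destination, while a [pq] probe with D(p) different from D(q) is multi-destination.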

IV. Design Rationale

The design of Periscope was driven by a number of objectives: (1) minimize user/kernel boundary crossings; (2) provide enough primitives to enable the definition of arbitrary probing structures and probing techniques; (3) provide a structured and well-defined interface for applications; and (4) ensure kernel code modularity and restrict changes to the networking stack.

By implementing the scheduling and monitoring functionalities in the kernel, Periscope minimizes user/kernel boundary crossings. The user/kernel boundary is crossed only during initialization operations² and during periodic application callbacks to report inference results. This optimization is valuable for busy servers.

Periscope is designed to be general and friendly in the sense that users can define from user space the structure of their intended probes and various probing characteristics (e.g., probe packet inter-departure times, the characteristics of a probing round, the number of rounds needed, etc.). Also, although some versions of inference and labeling algorithms such as Bayesian Probing [14] and Cartouche Probing [15] are provided inside the kernel, a user can choose to get from the kernel some native results (e.g., arrival times of the responses of all probe packets in a round of probing) and implement their own inference algorithm in user space.

The interface between applications and Periscope is implemented through control sockets. System calls are translated through ioctl calls to perform appropriate actions in the kernel. An application uses the select() system call to receive Periscope callbacks. This approach (control socket + select + ioctl) restricts code changes caused by Periscope to the networking stack and provides a well-defined and flexible interface for applications.

² As we will detail later, this includes group setup and flow registration.

[Figure 1 labels: User Space, Kernel Space, Periscope, API, Manager (Definitions, Logic), Scheduler, Monitor, IP Layer, API Calls, Application Callbacks.]

Fig. 1. Periscope Architecture

V. Periscope Architecture

Periscope server-side functionality allows for: (1) the maintenance of the definitions of probing structures, (2) the orchestration of probe transmissions, (3) the collection of relevant probe packet responses (e.g., accurate arrival times, loss rates), and (4) the execution of the inference processes.

Figure 1 depicts the main components of the Periscope architecture.

The Manager keeps a record of all endpoints (i.e., clients) under consideration. Endpoints are partitioned into application-defined groups: groups of clients specified by applications through an API call. The inference procedures are applied on flows destined to endpoints belonging to a single group. This partitioning of flows into groups is designed mainly to give the flexibility of running more than one probing structure, or running the same probing structure between different sets of endpoints, simultaneously. This may be useful for probing techniques that use more than one probing structure [14] or to diagnose shared congestion between different sets of endpoints, possibly for aggregate congestion control purposes [3]. The Manager associates with each group a probing structure definition that contains the number of probe packets, and the size and the destination (or the number of hops) of each probe packet. It also associates with each group a set of probing parameters (e.g., the number of probes to send and the probing rate). These definitions and parameters are provided by the applications in a generic way through API calls.

The Scheduler uses a timer for each group³. Whenever a timer associated with a group of endpoints expires, a new probe for this group is inserted into the IP stack for transmission. The inserted probe structure conforms to the probe definition associated with the group. The Scheduler ensures that the probe packets are inserted back-to-back at the top of the IP stack. In order to send packets back-to-back, we extended the kernel library with a function that sends packets back-to-back with a single system call.

The Monitor keeps track of the relevant characteristics (e.g., losses and arrival times) of the probe packet responses. These statistics are updated as a result of the receipt of an ECHO REPLY or an ICMP TIME EXCEEDED message. A user may elect to receive the replies in user space and store the response characteristics directly.

VI. Periscope API

In this section, we present some details of Periscope's API⁴.

As we explained earlier, the Periscope API frees an application from having to manage the probing processes it requires. Moreover, it guarantees specific properties concerning the transmission of probe packets from an end-host that cannot be ensured from user space (e.g., guaranteeing that probe packets are transmitted back-to-back or with a specific interdeparture time).

³ Note that the timer expiration time depends on the probing rate parameter associated with the group's probing structure definition.

⁴ For more details about Periscope functionality refer to http://cs-people.bu.edu/harfoush/periscope

Using the Periscope API, an application can define and activate a probing sequence by following the steps below:

1. Create a new group and provide the parameters associated with the group (e.g., the length of the probing sequence, probing rate, frequency of callbacks, sensitivity constant to be used for inference, etc.),

2. Register the endpoints (i.e., set of clients) to be considered as part of this group,

3. Define the probing structure (e.g., the number of probe packets in a probe and, for each probe packet, the flow that it belongs to, its TTL field value⁵ and its size),

4. Activate the group, and

5. Wait for feedback.

Figure 2 shows the user level data structures and API used by Periscope. We discuss these next.

Data Structures: The group_param_t structure contains the different parameters that define the settings for a specific group. A phase is defined as a transmission of one round of probes. The max_phase and the probing_rate fields of the group_param_t structure define the total number of probes and the probing rate that the application wants Periscope to use. Based on the feedback_type value, the user may elect either to receive probe packet replies directly (feedback_type=0) or to use the Periscope feedback mechanism (feedback_type=1); in the latter case the user can fix the feedback frequency through the feedback_rate field. The group_param_t structure includes other group settings. The flow_param_t structure contains parameters relevant to a specific flow within a group. These include an identifier of the group to which the flow belongs and structures that define the flow destination.

⁵ We set the TTL field value to 255 for probe packets that we want delivered to an endpoint.

    /* Per-group settings */
    typedef struct group_param {
        int max_phase;
        int probing_rate;
        int feedback_type;
        int feedback_rate;
        int cartouche_dimension;
        ...
    } group_param_t;

    /* Periscope header carried in the payload of each probe packet */
    typedef struct payload {
        int group_id;
        int phase_no;
        int probe_packet_no;
        struct timeval probing_time;
        struct timeval feedback_time;
    } payload_t;

    /* Attributes of one probe packet */
    typedef struct probe_packet {
        int flow_id;
        int size;
        int TTL;
    } probe_packet_t;

    /* A probe: the list of its probe packet definitions */
    typedef struct probe_structure {
        int n_probe_packets;
        probe_packet_t *probe_packets;
    } probe_structure_t;

    int PS_open(void);
    int PS_new_group(int ctrl_fd, group_param_t params);
    int PS_register_flow(int ctrl_fd, int group_id, u_char *host);
    int PS_register_probe_structure(int ctrl_fd, int group_id,
                                    probe_structure_t p_structure);
    int PS_activate_group(int ctrl_fd, int group_id);

Fig. 2. Periscope Data Structures and User Level API

Periscope probe packets are transmitted as ECHO REQUESTs with a Periscope header in their payload. The payload_t structure defines the structure of the header to be included. A packet payload is timestamped by the kernel when it is posted in the IP stack for transmission (probing_time) and ECHO REPLY messages are timestamped when they reach the server (feedback_time). Periscope manages two sequence numbers that are also included in the payload of the probe packets. These sequence numbers reflect the phase number that this probe packet belongs to (phase_no) and the probe packet position within the probe (probe_packet_no). The sequence numbers are mandatory for the application to compile probe packet replies.

In order to specify a probing construct, Periscope defines two structures: probe_packet_t and probe_structure_t. The former defines the attributes of a probe packet (the group and flow to which it belongs, the size and the TTL field value to be assigned to this probe packet). The second structure has a list of all the probe packet definitions.

API: PS_open() creates a new socket of type SOCK_PS (a Periscope-defined type used for the transmission and receipt of probes) and of protocol IPPROTO_ICMP (the ICMP protocol). The return value is the socket file descriptor ctrl_fd. PS_new_group() allocates a new group under ctrl_fd. The parameters of this group are passed along as arguments. The return value from this function is a group identifier group_id. PS_register_flow() registers a new endpoint with group group_id associated with the ctrl_fd socket. The user should call PS_register_flow() as many times as there are flows to be associated with the group. PS_register_probe_structure() is used to pass a probe structure definition to the Periscope manager in the kernel, and PS_activate_group() activates group_id's probing timer, thus starting the periodic probe transmission and statistics collection procedures for the group.

Probing Feedback: After activating a group, the user waits to receive probe responses. The responses can be either ECHO REPLY packets or ICMP TIME EXCEEDED packets. The former response is received from an endhost replying to an ECHO REQUEST probe packet, and the latter is received from a router on the way to the endhost once the packet's TTL value reaches 0 (as a result of a hop-limited probe packet). The user knows in advance, based on the TTL field value defined in the probe_packet_t structure, whether a probe packet will trigger an ECHO REPLY or an ICMP TIME EXCEEDED message. The most appropriate way for a user to wait for all probe responses is to use a select() call whose read set of socket file descriptors contains only the ctrl_fd value returned by the PS_open() API call. After receiving a response packet, the user should classify it based on the phase_no and probe_packet_no fields in its payload to associate the response with the correct probe packet.

Sample Periscope Code: Figure 3 shows a sample routine, Packet_Pair(), detailing how we can program the uniform packet-pair (PP) technique to infer the end-to-end bottleneck bandwidth using the Periscope API. The PP technique has, using the terminology of section III, the [pp] probing structure. The parameters to the Packet_Pair() routine are the endpoint D(p), the probing parameters to be used, the size of the probe packets, and the timeout value to wait before assuming that no more responses will be received.
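Once the two replies of a [pp] probe have been collected, the bandwidth estimate itself is a one-line computation. The sketch below uses the standard packet-pair relation, in which the bottleneck bandwidth is estimated as the packet size divided by the time dispersion between the two packets. The function name packet_pair_bandwidth_bps is hypothetical, and this is not presented as the estimator implemented inside Periscope. It also assumes that the dispersion observed between the two ECHO REPLY arrivals at the server reflects the dispersion on the forward path, which, as noted in the introduction, need not hold when replies are altered on the way back.

```c
#include <assert.h>
#include <sys/time.h>

/* Hypothetical helper: packet-pair estimate in bits per second.
 * size_bytes is s(p); t_first and t_second are the arrival times of the
 * replies to the first and second packet of the pair.
 * Returns 0.0 when the dispersion is not positive (no usable sample). */
double packet_pair_bandwidth_bps(int size_bytes,
                                 struct timeval t_first,
                                 struct timeval t_second)
{
    assert(size_bytes > 0);
    /* Time dispersion between the two replies, in seconds. */
    double dispersion = (double)(t_second.tv_sec - t_first.tv_sec)
                      + (double)(t_second.tv_usec - t_first.tv_usec) / 1e6;
    if (dispersion <= 0.0)
        return 0.0;
    return (8.0 * (double)size_bytes) / dispersion;
}
```

For example, two 1500-byte packets whose replies arrive 1.2 ms apart yield an estimate of about 10 Mb/s. In practice many samples would be collected over several phases and filtered before reporting a value.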

    /* Requires <stdlib.h>, <sys/select.h>, <sys/time.h> and the
     * Periscope user-level declarations of Fig. 2. */
    void Packet_Pair(u_char *host, group_param_t params,
                     int size, struct timeval timeout)
    {
        int ctrl_fd, group_id, flow_id, r;
        probe_packet_t p_packet;
        probe_structure_t p_structure;
        fd_set rset;
        struct timeval tv;

        ctrl_fd = PS_open();
        group_id = PS_new_group(ctrl_fd, params);
        flow_id = PS_register_flow(ctrl_fd, group_id, host);

        /* Two identical packets to the same endpoint: the [pp] structure */
        p_packet.flow_id = flow_id;
        p_packet.size = size;
        p_packet.TTL = 255;
        p_structure.n_probe_packets = 2;
        p_structure.probe_packets = malloc(2 * sizeof(probe_packet_t));
        p_structure.probe_packets[0] = p_packet;
        p_structure.probe_packets[1] = p_packet;
        PS_register_probe_structure(ctrl_fd, group_id, p_structure);

        PS_activate_group(ctrl_fd, group_id);

        /* Wait for ECHO REPLY messages */
        for (;;) {
            /* select() may modify its arguments, so reset them each pass */
            FD_ZERO(&rset);
            FD_SET(ctrl_fd, &rset);
            tv = timeout;
            r = select(ctrl_fd + 1, &rset, NULL, NULL, &tv);
            if (r <= 0)
                break;    /* timeout expired (or select failed), so exit */
            /* read payload details */
        }
        free(p_structure.probe_packets);
    }

Fig. 3. Programming the [pp] probing sequence of the packet-pair probing technique using the Periscope API.

Subtleties: When the response packet is an ICMP TIME EXCEEDED message, the received packet payload does not contain the payload_t structure as defined by the user and thus does not contain the expected phase_no and probe_packet_no fields. This might cause spurious measurements due to mixing up the responses, especially when probes are lost or when packets can be reordered over the investigated path. To protect against this case, Periscope provides two mechanisms:

1. Reordering Test: This test determines whether probe packets are prone to a change in their order over a path. This reordering can happen due to router policies along the path. For example, in some probing experiments we conducted, we noticed that some routers give preference to small packets over large packets. The test is based on sending all the probe packets to the end host (using a TTL value of 255 for all the packets) for a specific number of phases and checking that, for each phase, the returned ECHO REPLY messages have their assigned probe_packet_no field values in the same order as they were transmitted. If probe packets are prone to reordering, the user is informed and decides whether to probe this path or to abort the probing process.

2. Synchronization Procedure: This procedure makes it simpler, provided that reordering does not happen, to identify the phase to which an ICMP TIME EXCEEDED reply belongs. Note that this procedure is useful if the first and last probe packet replies are expected to be ICMP TIME EXCEEDED messages. The procedure relies on sending a synchronization packet of small size and a TTL value of 255 after the transmission of each probe, and using the ECHO REPLY messages triggered by these packets to separate the responses of different probes.
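The check at the heart of the reordering test can be sketched as follows. The type reply_id and the function replies_in_order are hypothetical names used only for this illustration; the sketch assumes that the application has logged the (phase_no, probe_packet_no) pair of every ECHO REPLY in arrival order, and that probe_packet_no increases with transmission order within a probe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of one received ECHO REPLY, logged in arrival order. */
typedef struct {
    int phase_no;
    int probe_packet_no;
} reply_id;

/* Returns true if, within every phase, replies arrived with strictly
 * increasing probe_packet_no. Lost packets leave gaps, which are allowed;
 * only an inversion within a phase counts as reordering. */
bool replies_in_order(const reply_id *r, size_t n)
{
    assert(r != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        /* Scan backwards for the most recent reply of the same phase. */
        for (size_t j = i; j-- > 0; ) {
            if (r[j].phase_no == r[i].phase_no) {
                if (r[j].probe_packet_no >= r[i].probe_packet_no)
                    return false;
                break;  /* earlier replies were already checked */
            }
        }
    }
    return true;
}
```

The quadratic scan keeps the sketch short; a per-phase "last seen" table would serve the same purpose for long logs.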

User Level Libraries: Periscope is equipped with a set of user space libraries that implement the functionality necessary to build and label metric-induced network topologies instantiated for the loss rate metric.

Recap: In this section, we have described the data structures and the user level API routines that can be used to program, activate and capture responses to generic probing structures using Periscope. We have also pointed out some of the problems that an active probing tool might face in terms of packet reordering and packet loss, and proposed precautions against these problems. In the next section, we turn our attention to validating Periscope's applicability in an Internet setting.

VII. Internet Measurement Experiments

In this section, we describe some of our experiments with Internet validation of Periscope, drawing on examples taken from a large set of experiments that are aimed at using the Periscope API to efficiently: (1) estimate the shared loss rate between a server and a set of clients and then infer and label loss topologies, and (2) estimate the bottleneck bandwidth along an arbitrary segment of a path.

The presented techniques are efficient in the sense that they do not need to probe every physical link along the shared path to determine the shared metric value, in contrast to the less efficient techniques involving the use of traceroute [18] and pchar [24].

A. Loss Topology Inference

    /* Requires <stdlib.h>, <sys/select.h>, <sys/time.h> and the
     * Periscope user-level declarations of Fig. 2. */
    void Bayesian_Probing(u_char *host_A, u_char *host_B,
                          group_param_t params,
                          int size, struct timeval timeout)
    {
        int ctrl_fd, group_id, flow_id_A, flow_id_B, r;
        probe_packet_t p_packet_A, p_packet_B;
        probe_structure_t p_structure;
        fd_set rset;
        struct timeval tv;

        ctrl_fd = PS_open();
        group_id = PS_new_group(ctrl_fd, params);
        flow_id_A = PS_register_flow(ctrl_fd, group_id, host_A);
        flow_id_B = PS_register_flow(ctrl_fd, group_id, host_B);

        /* Same size, different endpoints: the [pq] structure */
        p_packet_A.flow_id = flow_id_A;
        p_packet_A.size = size;
        p_packet_A.TTL = 255;
        p_packet_B.flow_id = flow_id_B;
        p_packet_B.size = size;
        p_packet_B.TTL = 255;
        p_structure.n_probe_packets = 2;
        p_structure.probe_packets = malloc(2 * sizeof(probe_packet_t));
        p_structure.probe_packets[0] = p_packet_A;
        p_structure.probe_packets[1] = p_packet_B;
        PS_register_probe_structure(ctrl_fd, group_id, p_structure);

        PS_activate_group(ctrl_fd, group_id);

        /* Wait for ECHO REPLY messages */
        for (;;) {
            /* select() may modify its arguments, so reset them each pass */
            FD_ZERO(&rset);
            FD_SET(ctrl_fd, &rset);
            tv = timeout;
            r = select(ctrl_fd + 1, &rset, NULL, NULL, &tv);
            if (r <= 0)
                break;    /* timeout expired (or select failed), so exit */
            /* read payload details */
        }
        free(p_structure.probe_packets);
    }

Fig. 4. Programming the [pq] probing sequence of the Bayesian Probing technique using the Periscope API.

In this section, we describe an illustrative example demonstrating Periscope's ability to correctly infer and label loss topologies in an Internet setting. As described in [4], a network topology connecting different endpoints through IP routing can be represented in one of the following ways.

1. Physical topology, which has all routers connecting the endpoints represented as internal nodes of the topology,

2. Logical topology, in which all internal nodes with only one child have been collapsed into their parent recursively. As a result, all internal nodes in a logical topology have at least two downstream endpoints,

3. Metric-induced topology, which is parametrized by a sensitivity parameter c and a metric of interest (e.g., loss rate), and is formed from the logical topology when all internal nodes with parent links of a metric value less than c have been collapsed into their parent. The larger the sensitivity constant, the fewer links will appear in the tree, as links with metric values less than the sensitivity constant are combined with other links of high metric value. That is, the sensitivity constant is used to vary the resolution of the metric-induced topologies.

[Figure 5: three trees rooted at server S with leaves A through G; internal nodes of the logical tree are labeled with router IP addresses.]

Fig. 5. Periscope Validation: Logical tree used as a test case (left), most frequent inferred loss tree (middle) and minimal loss tree spanning all inferred trees (right).
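The collapsing rule that turns a logical topology into a metric-induced topology can be sketched in a few lines of C. The array-based tree encoding and the function name mint_collapse are assumptions made for this illustration and are not taken from Periscope or from [4]. The sketch only rewires parents; how the labels of combined links are merged depends on the metric and is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tree encoding: node 0 is the server (root), parent[i] is
 * the parent of node i, metric[i] labels the link parent[i] -> i, and
 * is_leaf[i] marks endpoints. Assumption: nodes are numbered so that
 * parent[i] < i for every i > 0.
 *
 * On return, new_parent[i] is the nearest ancestor of i that survives
 * when every internal node whose parent link has a metric value less
 * than c is collapsed into its parent. For a collapsed node, the entry
 * names the surviving node it was merged into. */
void mint_collapse(const int *parent, const double *metric,
                   const bool *is_leaf, size_t n, double c,
                   int *new_parent)
{
    assert(parent && metric && is_leaf && new_parent && n > 0);
    new_parent[0] = -1;  /* the root has no parent */
    for (size_t i = 1; i < n; i++) {
        int a = parent[i];
        assert(a >= 0 && (size_t)a < i);
        /* Skip collapsed ancestors: internal, non-root, metric below c. */
        while (a != 0 && !is_leaf[a] && metric[a] < c)
            a = parent[a];
        new_parent[i] = a;
    }
}
```

Raising c collapses more internal nodes and therefore yields a coarser tree, which matches the role of the sensitivity constant described above.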

The technique used to infer the loss topologies relies on inferring the shared loss rate between each pair of clients and then using this information to build and label the loss topologies. In order to estimate the shared loss value, Periscope uses the BP technique [14], which we summarize next.

Bayesian Probing: Let S be a server connected through IP routing to two clients A and B. In order to estimate the loss rate over the shared path en route to clients A and B, the Bayesian probing technique requires that server S sends, using the terminology of Section III, the following probing sequence: [pq], where s(p) = s(q), D(p) = A and D(q) = B. Then, by collecting probe responses and correlating their losses, the shared loss rate can be estimated. For more details about the Bayesian probing technique, refer to [14].
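To give a feel for how correlating the outcomes of a [pq] pair can expose the shared loss rate, the sketch below uses a deliberately simplified estimator; it is not the BP estimator of [14]. It assumes that the two back-to-back packets share the same fate on the shared path, that losses on the two branches are independent, and that responses are not lost on the return path. Under those assumptions, with shared-path success rate x and branch success rates a and b, P(p answered) = xa, P(q answered) = xb, and P(both answered) = xab, so x = P(p) P(q) / P(both). The function name and the counter arguments are hypothetical.

```c
#include <assert.h>

/*
 * Simplified shared-loss estimate from [pq] probe pairs.
 *   n_pairs: number of [pq] pairs sent
 *   n_p:     pairs whose p packet was answered
 *   n_q:     pairs whose q packet was answered
 *   n_both:  pairs in which both packets were answered
 * Returns the estimated loss rate of the shared path in [0, 1],
 * or -1.0 when the estimate is undefined (no pair fully answered).
 */
double shared_loss_estimate(unsigned n_pairs, unsigned n_p,
                            unsigned n_q, unsigned n_both)
{
    double pp, pq, pb, shared_success;

    assert(n_p <= n_pairs && n_q <= n_pairs);
    assert(n_both <= n_p && n_both <= n_q);

    if (n_pairs == 0 || n_both == 0)
        return -1.0;

    pp = (double)n_p / n_pairs;
    pq = (double)n_q / n_pairs;
    pb = (double)n_both / n_pairs;

    shared_success = pp * pq / pb;
    if (shared_success > 1.0)     /* sampling noise can push the ratio above 1 */
        shared_success = 1.0;
    return 1.0 - shared_success;
}
```

For example, if 810 of 1000 p packets, 720 of 1000 q packets, and 648 complete pairs are answered, the ratio is 0.81 x 0.72 / 0.648 = 0.9, giving a shared loss rate of about 0.1.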

Periscope Code for BP Technique: Figure 4 shows a sample routine, Bayesian_Probing(), detailing how we can program the BP technique using the Periscope API. The parameters to the Bayesian_Probing() routine are the endpoints A and B, the probing parameters to be used, the packet size s(p), and the timeout value.

Experimental Setup: The topology used for the illustration connects a local server (Pentium II processor running RedHat Linux version 2.2.14) to a set of seven hand-picked hosts. The seven endpoints were selected to ensure the existence of different lossy paths that are shared between the server and various subsets of endpoints. In addition, by placing the server below a slow uplink, we ensured the existence of a (possibly) lossy path between the server and all endpoints. These choices were all made with the goal of stress-testing the inference and labeling techniques in mind (see footnote 6).

Figure 5 depicts the logical topology between the server and the seven hosts. Intermediate router IP addresses were obtained through the use of traceroute. The server is in the continental U.S. Hosts A, B and C are in China, with hosts A and B on the same LAN of Beijing University of Aeronautics and Astronautics and host C in Northeast China Institute of Electric Power Engineering. Hosts D and E are in Egypt, on the same LAN of the Arab Academy for Science and Technology (AAST). Hosts F and G are in Italy at two different universities: Politecnico di Bari and Universita degli Studi di Bergamo.

Footnote 6: Specifically, validating our tool requires observing loss topologies of appreciable structure, hence our choice of an inter-continental set of endpoints. Internet loss topology characterizations to small sets of random endpoints (from CAIDA/NLANR logs) rarely yielded rich/interesting structures. The depicted topology is meant to be illustrative, not representative. Also, experiments to the selected set of endpoints didn't consistently reveal high losses. Figure 5 (right) was constructed only after integrating consistent inferred loss topologies viewed at different times.

In order to study whether the results conform to what the analysis and simulation in [4] predicted, we need to establish a reference against which we could compare the inferred and labeled loss trees we obtain for a given sensitivity parameter c. The logical tree (shown in Figure 5) is such a reference for c = 0. Obtaining such a reference for a non-zero sensitivity parameter is impossible since it requires knowledge of loss rates on all links of the logical tree. Moreover, loss rates cannot be assumed stationary for the duration of a Periscope experiment and may not always be above the specified sensitivity parameter.

While the logical tree in Figure 5 cannot be used to directly validate loss trees inferred by Periscope, it can be used to check whether the loss trees generated by Periscope are mutually consistent. By mutually consistent, we mean that it is possible to construct the logical tree from the inferred loss tree by adding zero or more links.

We performed 20 experiments using Periscope to infer and label the loss topology to the seven endpoints of the logical topology in Figure 5. These 20 experiments were conducted at different times. Each experiment consisted of 100 probing phases with 64-byte probes. At a probing rate of 5 probes/sec, it takes Periscope about 4 minutes to complete 100 phases of probing, i.e., about 2.4 seconds per phase. Notice that this time could be decreased by reducing the number of phases or by increasing the probing rate. Indeed, in most experiments, the loss topology tree stabilized within less than 10 phases, i.e., less than 24 seconds. However, increasing the probing rate is not desirable because it can lead to correlated behavior among probe packets, which violates the assumptions of the BP approach.

Results: The non-stationarity of losses on the various links in a logical topology makes it unlikely that all of the potentially lossy links will be observable in a given experiment at a given time. Thus, one would expect the loss topologies inferred by Periscope to differ from one run to the next on the tree in Figure 5 (left). Indeed, Periscope inferred six different loss topologies. Over the 20 experiments we conducted, the most frequently inferred loss topology tree is shown in Figure 5 (middle). This tree was inferred 11 times, at times ranging between 3am and 7am EST (consistent with the fact that the lossy paths were the ones connecting our server to the hosts in China).

We also constructed the minimal loss tree [4] spanning all six of the loss topologies inferred by Periscope when c = 0.01. The resulting tree (which itself is not one of the trees inferred by Periscope) is shown in Figure 5 (right). Clearly, that tree is consistent with the logical tree depicted in Figure 5 (left).

B. Bottleneck Bandwidth of Arbitrary Path Segments

We use the term bottleneck bandwidth of a path to refer to the maximum transmission rate that could be achieved between two hosts at the endpoints of that path in the absence of any competing traffic. The bottleneck bandwidth of a path is limited by the minimum link speed along that path. In this section, we program Periscope to infer the bottleneck bandwidth along an arbitrary path segment. In particular, given a path consisting of a sequence of links $L_1, \ldots, L_n$ with base bandwidths $b_1, \ldots, b_n$, our goal is to estimate the bottleneck bandwidth of an arbitrary sequence of links along that path, i.e., estimate $\min_{i \le k \le j} b_k$ for arbitrary $i$ and $j$ such that $i \le j \le n$. We use the shorthand $b_{i,j}$ to denote the bottleneck bandwidth in the interval between links $i$ and $j$ inclusive. The technique we use to do this inference is called cartouche probing [15], which we summarize next.
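The quantity being estimated, the minimum of b_k over i <= k <= j, is straightforward to compute when the link bandwidths are known; the point of the probing technique is to estimate it when they are not. As a reference for the definition only, a minimal sketch (the function name subpath_bottleneck and the array layout are assumptions for the example):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bottleneck bandwidth b_{i,j} of the path segment made of links i..j.
 * b[0..n-1] holds the base bandwidths b_1..b_n; i and j are 1-based
 * link indices with 1 <= i <= j <= n.
 */
double subpath_bottleneck(const double *b, size_t n, size_t i, size_t j)
{
    size_t k;
    double min_bw;

    assert(i >= 1 && i <= j && j <= n);

    min_bw = b[i - 1];            /* start with link i */
    for (k = i; k < j; k++)       /* b[k] is link k+1, so this covers i+1..j */
        if (b[k] < min_bw)
            min_bw = b[k];
    return min_bw;
}
```

For bandwidths {100, 10, 45, 1.5, 155} (in Mbps), the segment 1..3 has bottleneck 10, while the whole path 1..5 has bottleneck 1.5.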

Cartouche Probing: Let L be a path consisting of a sequence of links $L_1, \ldots, L_n$ with base bandwidths $b_1, \ldots, b_n$. In order to estimate $b_{i,j}$, the Cartouche probing technique requires injecting at $L_1$, using the terminology of Section III, the following probing sequences:

(1) $[p\,m\,\{p\,q\}^{r-1}\,p\,m]$, where $s(m) = s(q) \le s(p)$, $D(m) = n$ and $h(p) = h(q) = i$;

(2) $[p_{i-1}\,m\,\{p_{i-1}\,q_{i-1}\}^{r-1}\,p_i\,m\,\{p_i\,q_i\}^{r-1}\,p_{i+1}\,m \ldots p_{j-1}\,m\,\{p_j\,q_j\}^{r-1}\,p_j\,m]$, where $s(p_w)$ and $s(q_w)$ are the same for all $w = i-1, \ldots, j$, $s(m) = s(q_w)$ and $s(p_w) \ge s(m)$, $h(p_w) = w$, $h(q_w) = w$, $h(m) = n$.

The technique then measures the inter-arrival times of the m probe packets and uses them to infer $b_{i,j}$. For more details about the cartouche probing technique, refer to [15].
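The layout of the first probing sequence, p m followed by r-1 repetitions of p q and a closing p m, can be made concrete with a small helper that spells out the packet order for a given cartouche size r. This is only an illustration of the sequence structure; the helper name build_cartouche and the one-letter encoding are assumptions for the example and are not part of the Periscope API.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Writes the packet order of the sequence p m {p q}^(r-1) p m into out
 * as a NUL-terminated string of 'p', 'q' and 'm' characters.
 * Returns the number of packets (2r + 2), or 0 if out is too small.
 */
size_t build_cartouche(unsigned r, char *out, size_t out_size)
{
    size_t len = 2 * (size_t)r + 2;
    size_t pos = 0;
    unsigned k;

    assert(r >= 1);
    if (out_size < len + 1)
        return 0;

    out[pos++] = 'p';             /* opening p m */
    out[pos++] = 'm';
    for (k = 1; k < r; k++) {     /* r - 1 repetitions of p q */
        out[pos++] = 'p';
        out[pos++] = 'q';
    }
    out[pos++] = 'p';             /* closing p m */
    out[pos++] = 'm';
    out[pos] = '\0';
    return len;
}
```

For r = 4 the helper produces the ten-packet order pmpqpqpqpm, and for r = 1 it degenerates to pmpm.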

Periscope Code for CP Technique: Figure 6 shows a sample routine, Cartouche_Probing(), detailing how we can program the $[p\,m\,\{p\,q\}^{r-1}\,p\,m]$ probing sequence of the CP technique using the Periscope API, where $s(m) = s(q) \le s(p)$ and $D(m) = n$. The parameters to the Cartouche_Probing() routine are the probed endhost, the value i of h(p) = h(q), the cartouche size r, the probing parameters to be used, the probe packet sizes s(p) (large size) and s(q) = s(m) (small size), and the timeout value.

Experimental Setup: We installed Periscope on a laptop with a Pentium III processor running RedHat Linux 2.2.14. Four Internet paths connecting the laboratory to four different universities have been handpicked for the experiments. These universities are Georgia Tech in the US, University of British Columbia in Canada, Ecole Normale Superieure in France and Hirosaki University in Japan.


void Cartouche_Probing(u_char *host, int i, int r,
                       group_param_t params,
                       int large_size, int small_size,
                       struct timeval timeout)
{
    int ctrl_fd;
    int group_id;
    int flow_id;
    int n_ready;                  /* result of select() */
    probe_packet_t p_packet_p;
    probe_packet_t p_packet_q;
    probe_packet_t p_packet_m;
    probe_structure_t p_structure;
    fd_set rset;

    ctrl_fd = PS_open();
    group_id = PS_new_group(ctrl_fd, params);
    flow_id = PS_register_flow(ctrl_fd, group_id, host);

    /* p: large packet with TTL i, i.e. h(p) = i */
    p_packet_p.flow_id = flow_id;
    p_packet_p.size = large_size;
    p_packet_p.TTL = i;
    /* q: small packet with TTL i, i.e. h(q) = i */
    p_packet_q.flow_id = flow_id;
    p_packet_q.size = small_size;
    p_packet_q.TTL = i;
    /* m: small packet that travels all the way to the end host */
    p_packet_m.flow_id = flow_id;
    p_packet_m.size = small_size;
    p_packet_m.TTL = 255;

    /* Lay out p m {p q}^(r-1) p m: 2r+2 packets in total */
    p_structure.probe_packets = malloc((2*r+2) * sizeof(probe_packet_t));
    p_structure.probe_packets[0] = p_packet_p;
    p_structure.probe_packets[1] = p_packet_m;
    for (int index = 1; index < r; index++) {
        p_structure.probe_packets[2*index] = p_packet_p;
        p_structure.probe_packets[2*index+1] = p_packet_q;
    }
    p_structure.probe_packets[2*r] = p_packet_p;
    p_structure.probe_packets[2*r+1] = p_packet_m;
    p_structure.n_probe_packets = 2*r+2;
    PS_register_probe_structure(ctrl_fd, group_id, p_structure);

    PS_activate_group(ctrl_fd, group_id);

    /* Wait for ECHO REPLY messages */
    for (;;) {
        struct timeval tv = timeout;  /* select() may modify its timeout */
        FD_ZERO(&rset);               /* select() also rewrites the fd set */
        FD_SET(ctrl_fd, &rset);
        n_ready = select(ctrl_fd + 1, &rset, NULL, NULL, &tv);
        if (n_ready <= 0)
            break;                    /* timeout expired (or select failed), so exit */
        /* otherwise: read payload details */
    }
}

Fig. 6. Programming the $[p\,m\,\{p\,q\}^{r-1}\,p\,m]$ probing sequence of the Cartouche Probing technique using the Periscope API.

In order to provide a preliminary study of whether the results confirm what the analysis and simulation in [15] predicted, we connected Periscope to the Internet in two distinct ways: once through a 10Mbps LAN in the Boston University Computer Science laboratory, and another time over a 56Kbps modem. Then, when using the cartouche probing technique to estimate the bottleneck bandwidth over the first link, we know that the correct bandwidth estimates should be around 10Mbps and 56Kbps, respectively.

In all experiments, we use s(p) = 1500 bytes and s(m) = 60 bytes. Experiments were conducted once per second until we obtain 100 valid results for a given path, recalling that packet reordering or packet losses invalidate an experiment.
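To get a sense of the time scales involved with these packet sizes, the sketch below computes how long a packet takes to be serialized onto a link of a given rate, and the inverse relation used by packet-pair style estimators, which read a bandwidth off an observed spacing between packets. It is a back-of-the-envelope illustration only, not the cartouche estimator of [15]; both function names are assumptions for the example.

```c
#include <assert.h>

/* Seconds needed to put size_bytes onto a link of rate_bps bits per second. */
double serialization_time(unsigned size_bytes, double rate_bps)
{
    assert(rate_bps > 0.0);
    return 8.0 * size_bytes / rate_bps;
}

/* Link rate (bits per second) implied by a packet of size_bytes
 * occupying the link for dispersion_s seconds. */
double rate_from_dispersion(unsigned size_bytes, double dispersion_s)
{
    assert(dispersion_s > 0.0);
    return 8.0 * size_bytes / dispersion_s;
}
```

A 1500-byte packet occupies a 10Mbps link for 1.2 ms and a 56Kbps modem link for roughly 214 ms, which is why the two access links are easy to tell apart from timing measurements.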

Results: Figure 7 shows the histograms we obtained when using Periscope to estimate the bandwidth over the first link of the path connecting the laptop to each of the four universities, both when connected over the 10Mbps LAN, with a histogram bin width of 1Mbps (top row), and over the 56Kbps modem, with a histogram bin width of 1Kbps (bottom row). In both cases the cartouche dimension r is 4. These preliminary results are in keeping with our expectations.

VIII. Conclusion

We have presented a Linux API called Periscope that implements the functionality necessary to define, activate and collect measurements for arbitrary probing structures. Periscope optimizes user/kernel boundary crossings while providing a flexible and well-established interface to applications.

By presenting sample code that implements a variety of different probing techniques from the literature, namely the packet-pair technique, Bayesian probing, and cartouche probing, we have shown that Periscope is easily programmable and is capable of generating different probing structures.

Periscope has proven to be applicable in real Internet settings. We gave evidence of this applicability through two different sets of experiments: one to infer and label loss topologies, and one to infer the bottleneck bandwidth along arbitrary path segments.

Software Availability: Periscope software is available for downloading at: http://cs-people.bu.edu/harfoush/periscope. The web page also includes more technical details, as well as a Periscope tutorial.


[Figure 7: eight histogram panels, one column per destination (Georgia Tech, University of British Columbia, Ecole Normale Superieure, Hirosaki University). Top row: x-axis BW (Mbps), y-axis %. Bottom row: x-axis BW (Kbps), y-axis %.]

Fig. 7. Histograms of bandwidth estimates over the first link connecting Periscope to four universities over a 10Mbps LAN with histogram bin width of 1Mbps (top) and over a 56Kbps modem with histogram bin width of 1Kbps (bottom). Cartouche dimension r = 4.

References

[1] A. Barbir et al. Known CDN Request-Routing Mechanisms. http://www.globecom.net/ietf/draft/draft-cain-cdnp-known-request-routing-01.html, February 2001.

[2] D. Andersen, H. Balakrishnan, M. F. Kaashoek, and R. Morris. Resilient Overlay Networks. In Proceedings of SOSP 2001, Banff, Canada, October 2001.

[3] H. Balakrishnan, H. Rahul, and S. Seshan. An Integrated Congestion Management Architecture For Internet Hosts. In SIGCOMM '99, Cambridge, MA, September 1999.

[4] A. Bestavros, J. Byers, and K. Harfoush. Inference and Labeling of Metric-Induced Network Topologies. In INFOCOM'02, New York, NY, June 2002.

[5] J. C. Bolot. End-to-end Packet Delay and Loss Behavior in the Internet. In SIGCOMM '93, pages 289-298, September 1993.

[6] R. Cáceres, N. G. Duffield, S. B. Moon, and D. Towsley. Inference of Internal Loss Rates in the MBone. In IEEE Global Internet (Globecom), Rio de Janeiro, Brazil, 1999.

[7] Robert L. Carter and Mark E. Crovella. Measuring bottleneck link speed in packet switched networks. In PERFORMANCE '96, the International Conference on Performance Theory, Measurement and Evaluation of Computer and Communication Systems, October 1996.

[8] Y.-H. Chu, S. Rao, and H. Zhang. A Case for End-System Multicast. In ACM SIGMETRICS '00, Santa Clara, CA, June 2000.

[9] M. E. Crovella, R. Frangioso, and M. Harchol-Balter. Connection Scheduling in Web Servers. In Proceedings of 1999 USENIX Symposium on Internet Technologies and Systems (USITS '99), Boulder, CO, October 1999.

[10] Dionisys: End-System QoS Support for Distributed Applications. http://cs-www.bu.edu/fac/richwest/research.html.

[11] Constantinos Dovrolis. Pathrate: Measurement tool for the capacity and load of internet paths. http://www.cis.udel.edu/~dovrolis/bwmeter.html.

[12] Allen B. Downey. Clink: a tool for estimating internet link characteristics. http://rocky.wellesley.edu/downey/clink/.

[13] N. Duffield, F. Lo Presti, V. Paxson, and D. Towsley. Inferring Link Loss Using Striped Unicast Probes. In IEEE INFOCOM 2001, April 2001.

[14] K. Harfoush, A. Bestavros, and J. Byers. Robust Identification of Shared Losses Using End-to-End Unicast Probes. In 8th International Conference on Network Protocols (ICNP), Osaka, Japan, November 2000.

[15] Khaled Harfoush, Azer Bestavros, and John Byers. Measurement of Shared Bottleneck Bandwidth Using End-to-End Cartouche Probing. Technical Report BUCS-TR-2000-016, Boston University, Computer Science Department, July 2001.

[16] IPMA: Internet Performance Measurement and Analysis. http://www.merit.edu/ipma.

[17] V. Jacobson. Pathchar: A Tool to Infer Characteristics of Internet Paths. ftp://ftp.ee.lbl.gov/pathchar.

[18] V. Jacobson. traceroute. ftp://ftp.ee.lbl.gov/traceroute.tar.gz, 1989.

[19] J. Jannotti, D. Gifford, K. Johnson, M. F. Kaashoek, and J. O'Toole, Jr. Overcast: Reliable Multicasting with an Overlay Network. In Proceedings of OSDI 2000, San Diego, CA, October 2000.

[20] J. Kangasharju, J. Roberts, and K. W. Ross. Object Replication Strategies in Content Distribution Networks. In Proceedings of WCW'01: Web Caching and Content Distribution Workshop, Boston, MA, June 2001.

[21] S. Keshav. Congestion Control in Computer Networks. PhD thesis, University of California at Berkeley, September 1991.

[22] K. Lai and M. Baker. Measuring Link Bandwidths Using a Deterministic Model of Packet Delay. In SIGCOMM '00, Stockholm, August 2000.

[23] K. Lai and M. Baker. Nettimer: A tool for Measuring Bottleneck Link Bandwidth. In Proceedings of USITS '01, March 2001.

[24] B. Mah. pchar. http://www.employees.org/~bmah/Software/pchar/, 2000.

[25] Mtrace: Tracing multicast path between a source and a receiver. http://www-itg.lbl.gov/mbone/mtrace.tips.html.

[26] V. Padmanabhan. Optimizing Data Dissemination and Transport in the Internet. Slides presented at the BU/NSF Workshop on Internet Measurement, Instrumentation and Characterization, September 1999.

[27] V. Paxson. End-to-end Routing Behavior in the Internet. In SIGCOMM '96, Stanford, California, August 1996.

[28] V. Paxson. End-to-end Internet Packet Dynamics. In SIGCOMM, 1997.

[29] V. Paxson. Measurements and Analysis of End-to-end Internet Dynamics. PhD thesis, U.C. Berkeley and Lawrence Berkeley Laboratory, 1997.

[30] QLinux: A QoS enhanced Linux Kernel for Multimedia Computing. http://lass.cs.umass.edu/software/qlinux/.

[31] P. Radoslavov, R. Govindan, and D. Estrin. Topology-Informed Internet Replica Placement. In Proceedings of WCW'01: Web Caching and Content Distribution Workshop, Boston, MA, June 2001.

[32] S. Ratnasamy and S. McCanne. Inference of multicast routing trees and bottleneck bandwidths using end-to-end measurements. In Proceedings of IEEE INFOCOM '99, pages 353-60, March 1999.

[33] I. Stoica, R. Morris, D. Karger, F. Kaashoek, and H. Balakrishnan. Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications. In Proceedings of ACM SIGCOMM'01, San Diego, CA, August 2001.

[34] Maya Yajnik, Sue Moon, Jim Kurose, and Don Towsley. Measurement and modelling of the temporal dependence in packet loss. In Proceedings of IEEE INFOCOM '99, pages 345-52, March 1999.