
Enabling New Applications with Optical Circuit-Switched Networks

_____________________________________________________________

A Dissertation

Presented to

the Faculty of the School of Engineering and Applied Science

University of Virginia

_____________________________________________________________

In Partial Fulfillment

of the Requirement for the Degree

Doctor of Philosophy

(Electrical Engineering)

_____________________________________________________________

Xuan Zheng

May 2004

Approval Sheet

The dissertation is submitted in partial fulfillment of the

requirements for the degree of

Doctor of Philosophy (Electrical Engineering)

__________________

Xuan Zheng (Author)

This dissertation has been read and approved by the Examining Committee:

Professor Malathi Veeraraghavan (advisor)_______________________________

Professor Zongli Lin (Chairman) _______________________________

Professor Joanne Bechta Dugan _______________________________

Professor Maite Brandt-Pearce _______________________________

Dr. Wu-chun Feng _______________________________

Accepted for the School of Engineering and Applied Science:

_________________________

Dean, School of Engineering and Applied Science,

University of Virginia.

May, 2004

Abstract

New inventions in optical communications components are driving advances in networking architectures and protocols. However, user needs are not met by current network solutions. Three gaps between user needs and network limitations are identified in this dissertation. To bridge these gaps, we propose an optical circuit-switched solution called Reconfigurable Ethernet/SONET Circuits for End Users (RESCUE). This solution is proposed as an add-on to the primary Internet service already available to end users. As a result, the optical circuit-switched network can be operated in a call-blocking mode, because the primary Internet access path can be used as a fall-back option if the call setup attempt is blocked. In RESCUE service, the circuits connect end users directly to either a service provider router or another end user in an optical circuit-switched network, which allows end-host applications to enjoy direct high-speed Ethernet/SONET circuits.

We propose two types of applications using RESCUE service: (i) Dial-Up service for Internet access, and (ii) end-to-end file transfers. They are proposed to overcome the three gaps between user needs and network limitations. In this dissertation, we describe the architectures and operations of these two applications. The routing decision algorithms for both applications are proposed and quantitatively analyzed based on data-transfer delays and network utilization. Analysis results show that a significant improvement in throughput can be realized for data transfers in these two applications.

To implement applications that use the RESCUE service, we design and implement three modules: a transport protocol module, a routing decision module, and a signaling module. A high-speed transport protocol called Fixed Rate Transport Protocol (FRTP) is proposed to replace TCP over end-to-end RESCUE circuits to achieve better throughput. The design and the implementation of FRTP with rate-based flow control and selective-ARQ error control are presented in this dissertation. The experimental results of this implementation are presented in the context of our local-area testbed network. A routing decision module is proposed to determine whether or not to attempt a RESCUE circuit setup when end hosts have a choice of two communication paths. A signaling module is needed to set up and release the RESCUE circuits.

The configuration of our local-area testbed network and the experiments designed for this testbed network are introduced. A VLAN-based extension for local-area testbed networks is suggested to enhance the RESCUE service.

Finally, we list a number of enhancements that can be made to improve the RESCUE service. These are described in the future work section.

To my wife Jie and my parents for their love and support

Acknowledgements

“Challenge” is the most appropriate word to describe the process of obtaining a doctoral degree. Today, I am glad to be completing my doctoral study, having learned such advanced and interesting concepts in the field of networking. At this point, I would like to thank everybody who helped me during this four-year process.

I would like to thank Prof. Malathi Veeraraghavan, my advisor, for her consistent guidance and support throughout my doctoral program. Her extensive knowledge, enlightening direction, and continuous encouragement made my dissertation work smooth, positive, and enjoyable. Besides the academic research, Prof. Veeraraghavan also provided long hours of counseling, especially in improving my writing skills. She has been and will always be an inspiration and an excellent role model.

I would then like to express my most sincere appreciation to my Ph.D. program committee members, Prof. Joanne Bechta Dugan, Prof. Maite Brandt-Pearce, Dr. Wu-chun Feng, and Prof. Zongli Lin, for their generous help and extensive advice during my pro-

I would also like to thank all the other students in our research group, Anant Padmanath Mudambi, Haobo Wang, Hojun Lee, Tao Li, Xiangfei Zhu, and Zhanxiang Huang, for their friendship and kind help.

Finally, this hard work would have been impossible to finish without the continuous love and support from my dear wife Jie and my parents back in China. I dedicate this dissertation to them.

Contents

Chapter 1 Background and Problem Statement
1.1 Current Optical Switching Technologies
1.2 Current Optical Network Architectures and Applications
1.3 Gaps between User Needs and Current Network Solutions
1.3.1 Access Link Bottleneck Problem
1.3.2 TCP Limitations
1.3.3 Difficulty in Creating End-to-end Connections to Meet Delay/Jitter Requirements of Interactive Real-time Applications
1.4 Problem Statement

Chapter 2 Related Work
2.1 Related Work in the Packet-Switched Networking Community
2.1.1 Related Work to Address the Access Link Bottleneck Problem
2.1.2 Related Work to Address the TCP Limitations
2.1.3 Related Work to Address the Difficulty in Providing End-to-end QoS
2.2 Related Work in the Circuit-Switched Networking Community

Chapter 3 Proposed RESCUE Service
3.1 Enabling Technologies
3.2 Architecture and Operations
3.3 RESCUE as an “Add-on” Service to Primary Internet Access
3.4 Applications

Chapter 4 Application I: Dial-Up Internet access service using RESCUE circuits
4.1 Description
4.2 Analytical Basis for the Routing Decision: Delay Analysis
4.3 Analytical Basis for the Routing Decision: Utilization Analysis
4.4 Chapter Summary

Chapter 5 Application II: End-to-end RESCUE Circuits to Improve File Transfer Delays
5.1 Description
5.2 Analytical Basis for the Routing Decision: Delay Analysis
5.3 Analytical Basis for the Routing Decision: Utilization Analysis
5.4 Chapter Summary

Chapter 6 Implementation of Application II
6.1 Design and Implementation of a High-speed Transport Protocol
6.1.1 Design Rationale
6.1.3 FRTP Specification
6.1.4 An Implementation of FRTP protocol
6.1.5 LAN Experiments
6.1.6 Summary of FRTP implementation and experiments
6.2 Routing Decision Module Design
6.3 Signaling Module Design
6.4 Local-area Testbed Network
6.4.1 Local-area Testbed Network
6.4.2 Extension with VLAN Technique

Chapter 7 Conclusions and Future Research
7.1 Summary and Conclusions
7.2 Future Research
7.2.1 Extension to Multi-protocol Interworking
7.2.2 Wide-area Testbed Network
7.2.3 Call Scheduling in RESCUE
7.2.4 Router Disconnect
7.3 Publications

Bibliography

List of Figures

Figure 1. Towards Advancing the Value of Optical Networks
Figure 2. Current Optical Network Architecture
Figure 3. Partial Topology of Polytechnic University’s Data Network
Figure 4. One Sample Point for Total Usage of Polytechnic University Campus Access Link
Figure 5. SONET Multiservice Provisioning Platform (MSPP) Architecture
Figure 6. Configuration of End Hosts for RESCUE Service
Figure 7. The RESCUE Concept: Share optical network circuit resources on a call-by-call basis and create high-speed Ethernet/SONET circuits on-demand; lines with arrow-heads denote signaling messages; the dashed line denotes the dynamically setup Ethernet/SONET circuit
Figure 8. RESCUE as an “Add-on” Service: the thick dashed lines show Ethernet/SONET circuits set up on-demand between end hosts’ second NICs and routers, or between the second NICs of two distant end hosts. In both cases, these become alternative paths to the primary paths available through the hosts’ primary NICs.
Figure 9. Dial-Up Access Service Architecture
Figure 10. An Extension of the Dial-Up Access Service
Figure 11. Plot of equation (3) with a link rate of 1Gbps, $\rho_{sig} = \rho_{sp} = 0.7$, $k = 4$
Figure 12. Plot of per-circuit utilization for files in the range of (10KB, 50MB) with $p_{dialup} = 0.00001$
Figure 13. Use of RESCUE Circuits for End-to-end File Transfers
Figure 14. Plot of equation (6) with $r_c = r = 100$Mbps, $\rho_{sig} = \rho_{sp} = 0.7$, $k = 20$
Figure 15. Plot of equation (6) with $r_c = r = 1$Gbps, $\rho_{sig} = \rho_{sp} = 0.7$, $k = 20$
Figure 16. Plot of equation (6) with $r_c = 100$Mbps, $r = 1$Gbps, $\rho_{sig} = \rho_{sp} = 0.7$
Figure 17. A Three-link Network Model of RESCUE Service
Figure 18. Plot of Total Utilization on Each Access Link and the Core Link
Figure 19. An End Host Configured for RESCUE Service
Figure 20. The Model of FRTP Connections
Figure 21. Packet Formats in FRTP
Figure 22. The Parameter-Exchange Packet in FRTP
Figure 23. Data Sending/receiving Procedure in FRTP
Figure 24. Feedback Checking and Processing at the FRTP Sender
Figure 25. Feedback Sending at the FRTP Receiver
Figure 26. Packet-Loss Rates and Throughputs vs. the Sending Rate in FRTP Experiments (DATA Packet Size=1500B, UDP Buffer Size=256KB, FRTP Buffer Size=40MB, FRTP Data Block Size=8MB)
Figure 27. An Example of Inter-packet Transmission Times within a FRTP File Transfer (Sending Rate=50Mbps, DATA Packet Size=1500B, UDP Buffer Size=256KB, FRTP Buffer Size=40MB, FRTP Data Block Size=8MB)
Figure 28. CPU Utilization vs. the Sending Rate in FRTP Experiments (DATA Packet Size=1500B, UDP Buffer Size=256KB, FRTP Buffer Size=40MB, FRTP Data Block Size=8MB)
Figure 29. Packet-Loss Rates and Throughputs vs. UDP Buffer Size in FRTP Experiments (DATA Packet Size=1500B, Sending Rate=500Mbps, FRTP Buffer Size=40MB, FRTP Data Block Size=8MB)
Figure 30. Packet-Loss Rates and Throughputs vs. FRTP buffer size in FRTP Experiments (DATA Packet Size=1500B, UDP Buffer Size=256KB, Sending Rate=500Mbps, FRTP Data Block Size=8MB)
Figure 31. Packet Losses and Throughputs vs. DATA Packet Size in FRTP Experiments (MTU=1500B, UDP Buffer Size=256KB, FRTP Buffer Size=40MB, Sending Rate=500Mbps)
Figure 32. Static Routing Decision Module
Figure 33. Local-area Testbed Network Configurations
Figure 34. RESCUE Circuit Extension with VLAN Technique
Figure 35. A Representation of Networks Differentiated by Signaling Capabilities
Figure 36. Configuration of Wide-area Testbed Network
Figure 37. The Concept of Router Disconnect

Acronyms

AAA: Authentication, Authorization and Accounting

ACK: Acknowledgement

AIMD: Additive Increase Multiplicative Decrease

ARQ: Automatic-Repeat-reQuest

ASIC: Application-Specific Integrated Circuit

ASON: Automatic Switched Optical Network

CLI: Command-Line Interface

DiffServ: Differentiated Service

DMA: Direct Memory Access

DNS: Domain Name Service

EoS: Ethernet-over-SONET

FRTP: Fixed Rate Transport Protocol

FTP: File Transfer Protocol

GbE: Gigabit Ethernet

GFP: Generic Framing Procedure

GMPLS: Generalized MultiProtocol Label Switching

HDBP: High-Delay-Bandwidth-Product

HTTP: Hyper-Text Transfer Protocol

IntServ: Integrated Service

ISP: Internet Service Provider

LAN: Local-Area Network

LSP: Label-Switched Path

MAN: Metro-Area Network

MEMS: MicroElectroMechanical System

MPLS: Multi-Protocol Label Switching

MSPP: Multi-Service Provisioning Platform

MTU: Maximum Transmission Unit

NAK: Negative Acknowledgement

NIC: Network Interface Card

OADM: Optical Add/Drop Multiplexers

OCS: Optical Connectivity Service

OFC: Optical Fiber Communications

OGSI: Open Grid Services Infrastructure

OXC: Optical Crossconnects

QoS: Quality of Service

RESCUE: Reconfigurable Ethernet/SONET Circuits for End Users

RPR: Resilient Packet Ring

RSVP-TE: Resource ReSerVation Protocol with Traffic Engineering

RTT: Round-Trip Time

SRP: Spatial Reuse Protocol

ST: Scheduled Transfer

TCP: Transmission Control Protocol

TL1: Transaction Language 1

UCLP: User Controlled LightPath

UDP: User Datagram Protocol

UNI: User-Network Interface

VBLS: Varying-Bandwidth List Scheduling

VBR: Variable Bit-Rate

VC: Virtual Concatenation

VLAN: Virtual LAN

WAN: Wide-Area Network

WAND: Waikato Applied Network Dynamics

XC: CrossConnect

XCP: eXplicit Control Protocol

Chapter 1 Background and Problem Statement

New inventions in optical communications components are driving advances in networking architectures and protocols. Advances in networking architectures and protocols are enabling new applications and bringing more requirements for optical components. New applications, in turn, are motivating new inventions in optical communications components. Figure 1 shows how these three factors interact.

Figure 1. Towards Advancing the Value of Optical Networks (a cycle linking optical communications components, networking architectures & protocols, and applications)

1.1 Current Optical Switching Technologies

The latest developments in optical switching components include programmable tunable transmitters/receivers, Optical Add/Drop Multiplexers (OADMs), Optical CrossConnects (OXCs), and all-optical switches [1]. Tunable transmitters/receivers either have lasers whose output/input wavelength can be tuned as needed, or an array of lasers with different wavelengths that can be selectively enabled. OADMs are programmable if they can be configured to add or drop different wavelengths at different interfaces. The whole multi-channel signal does not need to be demultiplexed in an OADM, unlike in an optical crossconnect, where multiple fibers, each carrying multiple channels, are first terminated on demultiplexers before being crossconnected in a space-division switch fabric [2]. All-optical switches are analog switches in which both the I/O modules and the switch fabric are optical. They offer scalability, bit-rate and protocol independence, and power efficiency [3]. Large-scale all-optical circuit switches using MicroElectroMechanical System (MEMS) technology are now commercially available.

Current-day optical circuit-switched networks are built using the above optical switching components. While commercial interest in all-optical networks is increasing, current optical networks are still hybrid optical/electronic networks, which consist of optical components as well as electronic components. For example, SONET/SDH technology dominates current wide-area optical transport networks. It defines the standards for carrying TDM signals with different data rates using both electronic and optical media [4][5]. In this dissertation we focus on developing new architectures and applications for circuit-switched networks using SONET/SDH technology. However, our basic networking concepts can be readily adapted to future all-optical WDM circuit-switched networks as these become deployed.

1.2 Current Optical Network Architectures and Applications

A representation of current optical network architecture is shown in Figure 2 [6]. The metro access network interconnects multiple geographically distributed enterprise buildings, and connects them to an access service provider MSPP/crossconnect node. The access service provider node also belongs to a metro optical core network that interconnects multiple service provider nodes, such as Internet Service Provider (ISP) and telephone service provider nodes. Metro optical access networks and metro optical core networks can have either a ring or mesh topology. Packet-switched wide-area networks, such as the Internet, then interconnect various metro optical networks by interconnecting various service provider nodes and offer enterprises Wide-Area Network (WAN) connectivity.

One typical service provided by existing optical networks is access service for enterprise users. Leased access circuits at SONET/SDH rates are provisioned from enterprise MSPPs through the access service provider node to the ISP node located on the metro core network. Embedded within these SONET/SDH rate signals are T1s, T3s, or Ethernet signals. Another typical service is the provisioning of inter-switch/inter-router circuits, which provide high-speed connections between switches/routers.

These two kinds of optical circuits are usually leased for long terms, and therefore lack flexibility. To enable “simple, cost-effective, and bandwidth-efficient” services [7], many approaches are being implemented to allow optical networks to be operated in a switched mode, including: (i) web services such as Open Grid Services Infrastructure (OGSI) [8][9] and Sun Microsystems’ Jini/JavaSpaces paradigm for high-availability distributed services [10], and (ii) signaling solutions as in Generalized MultiProtocol Label Switching (GMPLS) [11]-[13], User-Network Interface (UNI) 1.0 specification [14], and Automatic Switched Optical Networks (ASON) [15]. Both GMPLS and UNI 1.0 specifications include a signaling protocol based on the Resource ReSerVation Protocol with Traffic Engineering (RSVP-TE) [13].

Figure 2. Current Optical Network Architecture (elements shown: Ethernet hosts and an Ethernet switch/IP router in an enterprise building; metro optical access network; access service provider MSPP/crossconnect node; metro optical core network; Internet service provider router; leased lines; inter-switch circuits; wide-area optical network; Internet packet-switched backbone network of IP routers interconnecting various networks)

The main applications envisioned for bandwidth-on-demand optical circuit services are fast restoration and rapid provisioning of circuits between IP routers, frame-relay switches, or crossconnects/telephony switches. A request for fast restoration is triggered when a failure occurs. Requests for rapid provisioning of circuits are expected to be generated by network administrators when they identify a need for additional bandwidth between their network switches/routers. Focus has been directed primarily at inter-switch/inter-router circuits in service provider networks because traditionally these links are the ones that require the high-bandwidth capability of optical circuits.

1.3 Gaps between User Needs and Current Network Solutions

The applications/services in current optical networks, including long-term enterprise leased access circuits and service providers’ bandwidth-on-demand circuits, essentially serve inter-switch/inter-router connections. However, by extending our attention to enterprise end hosts (note: not residential end hosts), we identify three gaps between user needs and current network solutions.

1.3.1 Access Link Bottleneck Problem

An end-to-end data communication path for an enterprise user typically consists of three types of segments: (i) Local Area Network (LAN) segments within enterprise buildings, (ii) access segments from enterprise buildings to service provider buildings, and (iii) wide-area segments across WANs. In recent years LAN and WAN network segments have experienced a tremendous increase in data rates: links in LANs are evolving from 10/100 Base-T Ethernet to Gigabit Ethernet (GbE) and even 10GbE, and links in service provider wide-area backbone networks are evolving to OC48 (2.5Gbps), OC192 (10Gbps), or even OC768 (40Gbps) [6]. At the same time, the evolution of data rates in access networks has been slow, with the capacity still limited to data rates on the order of a few megabits per second [16][17]. The reason is cost: leased access circuits are, by definition, not shared, which makes them expensive. These leased access circuits, as we will show, are usually heavily utilized. We identify them as the bottleneck links on end-to-end communication paths.
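The effect of a slow access segment can be illustrated with a minimal Python sketch. It assumes, as a simplification not made in the text, that the achievable end-to-end rate is bounded by the slowest segment on the path and that no other traffic competes for it. The segment names, the 1 GB file size, and the helper functions are hypothetical choices for illustration; the rates are the nominal values quoted above.

```python
def bottleneck_rate(segment_rates_bps):
    """Upper bound on the end-to-end rate: the slowest segment on the path."""
    return min(segment_rates_bps.values())


def transfer_time_seconds(file_bytes, rate_bps):
    """Idealized transfer time, ignoring protocol overhead and competing traffic."""
    return file_bytes * 8 / rate_bps


# Nominal rates from the text; the path itself is a hypothetical example.
path = {
    "LAN (GbE)": 1e9,
    "access (T3)": 45e6,
    "WAN (OC48)": 2.5e9,
}

rate = bottleneck_rate(path)
t_access = transfer_time_seconds(1e9, rate)         # 1 GB file limited by the T3 link
t_lan = transfer_time_seconds(1e9, path["LAN (GbE)"])  # same file at the LAN rate

print(f"bottleneck rate: {rate / 1e6:.0f} Mbps")
print(f"1 GB at bottleneck rate: {t_access:.1f} s, at GbE rate: {t_lan:.1f} s")
```

Under these assumptions the access segment alone determines the transfer time, no matter how fast the LAN and WAN segments become.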

To determine if access circuits are indeed a bottleneck, we conducted a measurement study on the T3 access link that connects Polytechnic University’s data network to the Internet. We show a partial topology of Polytechnic University’s data network in Figure 3 with two of its buildings, Dibner and Rogers Hall. Each building consists of a number of subnets. Each subnet has an Ethernet switch. The Ethernet switches from various floors are connected to Core routers located in the basements of the two buildings as shown in Figure 3. All traffic from Dibner and other buildings, as well as from Rogers Hall hosts, destined for the WAN feeds into the Core router (Core router 2 in Figure 3) in the Rogers Hall basement. This router forwards all these packets to an access router via a 100Mbps Ethernet link. Similarly, in the opposite direction, all packets arriving from the WAN pass through the access router and the Rogers Hall Core router.

Figure 3. Partial Topology of Polytechnic University’s Data Network (elements shown: Dibner Building, Rogers Hall Building, Ethernet switch, Core router 1, Core router 2, Gigabit Ethernet, LAN 1, Kuidas, access router, T3 access link, links from other buildings/campuses)

The access router has only two interfaces, the 100Mbps interface and the T3 link to the WAN. This makes the 100Mbps Ethernet interface on the access router an ideal snooping point for access link traffic. We connected a Sun Sparc4 workstation called “kuidas” to this 100Mbps Ethernet LAN (LAN 1 in Figure 3). Its Network Interface Card (NIC) was set to operate in a promiscuous mode allowing it to capture all packets on the link. Considering the difference in data rates between the 100Mbps Ethernet link and the T3 access link, our measurements could be slightly higher for the outgoing traffic* but accurate in the incoming direction.

*Some outgoing traffic might be dropped at the access router when the total outgoing traffic exceeds the 45Mbps of the T3 access link.

Running tcpdump on the workstation, we captured all packets involving wide-area communications, but because of storage and privacy considerations, we only saved the first 68 bytes of each packet. This was adequate to capture the protocol layer headers for IP/ICMP, IP/TCP and IP/UDP packets. We stored these packet headers in raw trace files to be reduced later. We created one trace file for approximately 30 minutes of traffic to avoid huge file sizes. A 30-minute period produced approximately 2 GB of trace data. Those trace files were downloaded during light traffic hours to another machine with a larger storage capacity than kuidas. We collected a total of 37 trace files. The total volume of these IP packet header traces is more than 80 Gigabytes (more than one billion IP packet headers). This database is available to other researchers [18].
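To show why 68 bytes per packet is sufficient, the following minimal Python sketch parses the link-layer, IPv4, and transport-layer port fields from a truncated frame. It is an illustration only, not the analysis code used in this work: it assumes Ethernet II framing without VLAN tags, and the function name `parse_truncated_frame` and the synthetic test frame are hypothetical. The key point is that the IPv4 total-length field survives truncation, so the original packet size can still be counted toward bandwidth usage.

```python
import struct

SNAPLEN = 68            # bytes saved per packet, as stated in the text
ETH_HEADER_LEN = 14     # assumption: Ethernet II framing, no VLAN tag
ETHERTYPE_IPV4 = 0x0800


def parse_truncated_frame(frame):
    """Extract IPv4 header fields (and TCP/UDP ports) from a truncated frame."""
    if len(frame) < ETH_HEADER_LEN + 20:
        return None  # too short to hold a minimal IPv4 header
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_IPV4:
        return None
    ip = frame[ETH_HEADER_LEN:]
    header_len = (ip[0] & 0x0F) * 4            # IHL field, in bytes
    (total_length,) = struct.unpack("!H", ip[2:4])
    info = {
        "protocol": ip[9],                     # 1 = ICMP, 6 = TCP, 17 = UDP
        "ip_total_length": total_length,       # original size, even if truncated
        "src": ".".join(str(b) for b in ip[12:16]),
        "dst": ".".join(str(b) for b in ip[16:20]),
    }
    if ip[9] in (6, 17) and len(ip) >= header_len + 4:
        ports = struct.unpack("!HH", ip[header_len:header_len + 4])
        info["src_port"], info["dst_port"] = ports
    return info


# Synthetic 1500-byte TCP packet, truncated to SNAPLEN as a capture would be.
eth = bytes(12) + struct.pack("!H", ETHERTYPE_IPV4)
ip_hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 1500, 0, 0, 64, 6, 0,
                     bytes([10, 0, 0, 1]), bytes([192, 0, 2, 7]))
tcp_hdr = struct.pack("!HH", 1234, 80) + bytes(16)
frame = (eth + ip_hdr + tcp_hdr + bytes(1460))[:SNAPLEN]

info = parse_truncated_frame(frame)
print(info)
```

With a 14-byte Ethernet header, a 20-byte IPv4 header, and a 20-byte TCP header, 54 bytes are needed, which leaves room within 68 bytes for IP or TCP options of modest length.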

We wrote C++ programs to analyze the data. Our first goal was to determine the average usage on the access link from the tcpdump trace files. We show a 10-minute-long sample of total bandwidth usage on the access link in Figure 4 for the outgoing and incoming directions. The time precision of these time-bandwidth curves is 1 second. Table 1 shows the average value and 90% confidence interval of the bandwidth usage in the two directions (incoming and outgoing) for ten of the 37 trace files collected between April 24, 2002 and May 14, 2002 (part of the Spring 2002 semester, which allowed for peak measurements).
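The reduction from packet headers to a bandwidth curve with 1-second precision can be sketched as follows. This is a minimal Python illustration rather than the C++ analysis code: the function names and the sample packets are hypothetical, and the 90% confidence interval uses a normal approximation (z = 1.645) that treats the per-second samples as independent, which is an assumption the text does not state.

```python
import math
from collections import defaultdict


def per_second_mbps(packets):
    """Bin (timestamp_seconds, length_bytes) pairs into a 1-second Mbps series."""
    bins = defaultdict(int)
    for timestamp, length in packets:
        bins[int(timestamp)] += length
    if not bins:
        return []
    first, last = min(bins), max(bins)
    # Seconds with no packets count as zero usage.
    return [bins.get(s, 0) * 8 / 1e6 for s in range(first, last + 1)]


def mean_and_ci90(samples):
    """Sample mean and half-width of a 90% confidence interval (normal approx.)."""
    n = len(samples)
    mean = sum(samples) / n
    if n < 2:
        return mean, 0.0
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return mean, 1.645 * math.sqrt(variance / n)


# Hypothetical packets: (capture time in seconds, IP total length in bytes).
packets = [(0.1, 1250000), (0.9, 1250000), (1.5, 2500000), (3.2, 1250000)]
series = per_second_mbps(packets)
mean, half_width = mean_and_ci90(series)
print(series, mean, half_width)
```

Dividing the mean by the 45Mbps capacity of the T3 link would then give the average utilization of the access link in each direction.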

The average bandwidth usage in both directions is close to 30Mbps or higher, sometimes even reaching close to 45Mbps. In the outgoing direction, we show an average of 49.6Mbps for the April 24, 16:01 trace. This value exceeds the 45Mbps capacity of the T3 link because we captured packets on the 100Mbps Ethernet link (see Figure 3) and not on the T3 link itself. These high loading conditions indicate that Polytechnic University’s access link is heavily utilized and is potentially the bottleneck link on some end-to-end paths. We also note that the data collected by the Waikato Applied Network Dynamics (WAND) group on the University of Auckland access link showed similar results [19].

Figure 4. One Sample Point for Total Usage of Polytechnic University Campus Access Link

Table 1. Bandwidth Usage on Polytechnic University’s T3 Access Link (Spring 2002)

Trace start time    Trace length (seconds)    Average incoming bandwidth usage (Mbps)    Average outgoing bandwidth usage (Mbps)
Apr. 24, 13:07      647                       31.47                                      40.09
Apr. 24, 13:50      658                       30.2                                       39.8
Apr. 24, 16:01      317                       36.1                                       49.63
Apr. 24, 17:01      710                       30.17                                      33.62
May 13, 13:15       634                       34.09                                      38.14
May 13, 14:29       582                       42.43                                      36.12
May 13, 15:55       640                       28.9                                       36.36
May 14, 13:25       685                       29.43                                      30.71
May 14, 17:01       885                       23.45                                      21.9

1.3.2 TCP Limitations

Even if the access link bottleneck problem can be partially solved by increasing the data rate of leased access circuits or by other methods, high-speed end-to-end communication remains a big challenge given the limitations of TCP. TCP is widely used over the Internet but is suboptimal for High-Delay-Bandwidth-Product (HDBP) networks because of its slow start and Additive Increase Multiplicative Decrease (AIMD) congestion control scheme [20]-[31]. First, TCP uses a congestion window, or CWND, to determine how many packets can be sent at one time. The larger the congestion window, the higher the throughput. To achieve a steady-state throughput of 10 Gbps, a standard TCP connection with a 100ms round-trip time requires an average congestion window of 83,333 segments and a packet-loss rate lower than $2 \times 10^{-10}$ [22]. This is not realistic. Second, TCP uses the same protocol irrespective of the end-to-end paths [32], which results in poor performance in wide-area environments [30]. This is because TCP senders do not adapt their rate buildup scheme based on the features of the end-to-end path and hence will incur an initial delay before the congestion window reaches a “streaming” state. Third, bit errors are possible even on optical circuits; while optical fiber bit error rates are very low, dust and poor connectors can increase these rates to the $10^{-8}$ level. The TCP sender misinterprets bit errors as congestion signals, which unnecessarily reduces the sending rate and requires a long time to recover.

These limitations make it hard for current IP-based networks, where TCP is the most widely used transport protocol, to meet the bandwidth requirements of large file transfers, such as terabyte- and petabyte-sized file transfers in particle physics, earth observation, bioinformatics, radio astronomy, and other scientific studies [33].

1.3.3 Difficulty in Creating End-to-end Connections to Meet Delay/Jitter

Requirements of Interactive Real-time Applications

In addition to the bandwidth requirements, many interactive real-time applications have

strict Quality of Service (QoS) requirements for transfer delay and jitter, such as distrib-

uted collaborative visualization, remote computational steering, and/or remote instrument

control [34]. Given the connectionless nature of today’s Internet, it is hard to meet the

delay/jitter requirements of interactive real-time applications.

1.4 Problem Statement

Therefore, the problem statement of this work is as follows: Design new network architectures and solutions by exploiting and improving already-deployed circuit-switched networks to bridge the gaps between user needs and network limitations. More specifically,

our work consists of three objectives:

1. Design a new network solution to solve the enterprise access bottleneck problem. The

design criterion is to successfully enable a new application providing enterprise end

users with high-speed Internet access paths, which allow applications to enjoy lower

packet-loss rates on the access path.

2. Design a new network solution to overcome TCP limitations in end-to-end data com-

munications. The design criterion is to successfully provide enterprise end hosts with

high-speed end-to-end connectivity and allow end hosts to enjoy a much better data-

transfer throughput than with current TCP/IP.

3. In addition to achieving high data-transfer throughput, we are designing our new net-

work solutions to overcome the difficulty of providing end-to-end QoS guarantees in

today’s Internet. The design criterion is to successfully provide end-to-end QoS guar-

antees, such as rate guarantee and delay/jitter guarantee to meet the requirements of

applications.

Given the difficulty in meeting the above network design criteria with IP-based networks (as we will show in the next chapter), we develop our solution based on circuit-switched

networks. We take into account two major constraints in current optical networks while

developing our solution. First, SONET technology dominates current wide-area optical

transport networks. Originally designed for the public telephone network, SONET was

developed in the mid-1980s into a standard for optical telecommunications transport. It

defines a technology for carrying many signals of different capacities through a synchronous, flexible, optical hierarchy. In recent years, advances in application-specific integrated circuit (ASIC) technology have helped reduce the price of SONET switches greatly. Meanwhile, a new set of enhancements has been proposed to drive SONET’s evolution toward increased efficiency and flexibility in carrying data signals. As a result, SONET switches equipped with signaling engines and advanced transport capabilities have begun to be deployed within enterprise networks. Noting that a SONET-based circuit-switched

network is an ideal platform to provide connection-oriented service, the question for us is

how to enable new network services by exploiting and improving the already-deployed

SONET network infrastructure. Second, Ethernet technology dominates current LAN

environments due to its low costs. This leaves us with another question of how to leverage

this Ethernet dominance to make our circuit-switched network solution more feasible and

easier to implement.

In this dissertation, we address three generic problems in creating a new network solution:

1. What is the mechanism for sharing the network resources? In circuit-switched net-

works, network resources are represented by circuits. Thus, our first mission is to

address the question of how to provision and share circuits by considering network

scalability and utilization as criteria.

2. What is the mechanism for transferring the data over the network? In other words,

what transport protocol should be used on end-to-end circuits? On an end-to-end cir-

cuit, contention for resources is resolved during circuit setup on a call-by-call basis.

Once the circuit is established, the full circuit bandwidth is dedicated for the session.

No congestion control scheme is needed to adjust the sending rate on the circuit.

Sending rates should in fact be matched as closely as possible to the circuit rates to

keep the pipe full. Therefore, our second mission is to design and implement a trans-

port protocol that can maintain a constant transfer rate.

3. What are the prospective applications? A good candidate should be one that can pro-

duce a high traffic load and fully utilize the circuit bandwidth. From the service pro-

vider’s perspective, the applications should be as common as possible to produce a

high traffic load and therefore achieve high network utilization, which translates to

low costs. From the user’s perspective, the application suited for dedicated circuits

should be able to fully utilize the circuit bandwidth and therefore achieve a low trans-

fer delay. In this dissertation, we propose two prospective applications and provide

corresponding numerical analysis on their performance.
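To make the second question concrete: a sender that matches the circuit rate only needs a fixed inter-packet gap rather than a congestion window. The sketch below is a hypothetical illustration of that idea and is not the transport protocol developed in this dissertation. The function names and the 1500-byte packet size are assumptions.

```python
def inter_packet_gap_s(circuit_rate_bps, packet_bytes=1500):
    """Time between packet departures that keeps the circuit exactly full."""
    return packet_bytes * 8 / circuit_rate_bps


def send_schedule(num_packets, circuit_rate_bps, packet_bytes=1500):
    """Departure times (seconds from start) for a constant-rate sender."""
    gap = inter_packet_gap_s(circuit_rate_bps, packet_bytes)
    return [i * gap for i in range(num_packets)]


gap = inter_packet_gap_s(1e9)      # 1 Gbps circuit
schedule = send_schedule(4, 1e9)   # departure times of the first four packets
print(gap, schedule)
```

Because the schedule depends only on the circuit rate agreed at setup time, no feedback from the network is needed to choose the sending rate.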

Chapter 2 Related Work

2.1 Related Work in the Packet-Switched Networking Community

2.1.1 Related Work to Address the Access Link Bottleneck Problem

To address the first gap identified in Section 1.3, the most common solution is to

increase the data rate of the wide-area access links leased by the enterprises. The deployment of

optical fibers and ADMs has made such increases easier to implement. Nevertheless, costs

for high-bandwidth links remain quite high. For example, consider average costs for edu-

cational sites cited by NetworkVirginia [35]: a T3 link (45Mbps) has an annual cost of

about $53K, while an OC3 (155Mbps) has an annual cost of $133K. Reference [36] lists

annual costs of about $110K for an OC3, $320K for OC12, and $495K for an OC48 cir-

cuit. This is because leased links are, by definition, not shared, which translates to high

costs. Also, it is difficult for an enterprise to obtain leased-link bandwidth “on-demand”

(i.e., with short turn-around times).

Another solution to the access link bottleneck problem, currently under development, is

the Resilient Packet Ring (RPR) [37], one example of which is Cisco’s Spatial Reuse Protocol (SRP) [38]. The motivation for creating this new protocol is that packet-switched rings

will be better suited for bursty Internet data traffic than the current circuit-switched

SONET/SDH/WDM rings. Due to the increased level of sharing possible on these packet-

switched rings, access link costs for high-speed access are likely to be lower than with

leased circuits. Therefore, the enterprises can more readily increase their WAN access cir-

cuit rates.

In these two approaches, while access link rates to an enterprise can be increased, the

more significant factor of packet losses on an individual flow is not addressed because

both operate in a packet-by-packet sharing mode. We show with the analysis below that

increasing the access link rate does not help if the packet-loss rate remains high, which can

happen because users often quickly fill up link capacity even as this is increased.

To study the impact of packet loss on end-to-end TCP delays, we use the analytical

model proposed by Padhye et al. [39], along with the extensions by Cardwell et al. [40],

which have been validated with both experimental and simulation results. These models

include all the complex steps of TCP data transfers: the time spent in slow start $E[T_{ss}]$, the expected cost of a recovery following the first loss $E[T_{loss}]$, the time spent in congestion avoidance $E[T_{ca}]$, and the time to delay the acknowledgement (ACK) for the initial segment $E[T_{delack}]$:

$$E[T_{tcp}] = E[T_{ss}] + E[T_{loss}] + E[T_{ca}] + E[T_{delack}] \qquad (1)$$

The reader is referred to [39][40] for the detailed closed-form expressions for each term on the right hand side of (1). These expressions are functions of three key parameters: the bottleneck link rate $r$, the packet-loss rate $P_{loss}$, and the round-trip time (RTT) on the TCP/IP path. We set the time for delayed ACKs to 0 because we assume a starting initial window size of 2 [41] and the ACK-every-other-segment strategy.

Setting different numerical values for these three parameters as shown in Table 2 below, we evaluate $E[T_{tcp}]$ using (1) and the expressions provided in [40]. We compute $RTT$ from the round-trip propagation delay $T_{prop}$ by adding a rough estimate of packet queueing delay using $P_{loss}$ as a parameter, along with packet emission time. The $RTT$ values

presented can be thought of as the input parameters themselves, with the low values representative of metro-area paths (0.1-1ms) and the higher values (50ms) representative of wide-area paths. The bottleneck link rate $r$ and $RTT$ are used to compute $W_{max} = RTT \times r$, which is the size of the congestion window at which the TCP flow reaches a “streaming” state. When the congestion window reaches $W_{max}$, any further increase is irrelevant because the sender does not even complete (or just about completes) emitting its current congestion window before ACKs that increase the congestion window are received.

Table 2. Mean TCP Transfer Delays for a 100MB File
(Input parameters: p, r, Tprop. Intermediate derived results: queuing delay plus service time, RTT, Wmax. Final results: E[Ttcp].)

p        r (Mbps)   Tprop (ms)   Queuing delay plus   RTT (ms)   Wmax         E[Ttcp] for a
                                 service time (ms)               (segments)   100MB file (s)
10^-5    45         0.1          0.398                0.498      1.868        18.261
10^-5    45         5            0.398                5.398      20.242       18.296
10^-5    45         50           0.398                50.398     188.993      19.219
10^-5    100        0.1          0.179                0.279      2.325        8.220
10^-5    100        5            0.179                5.179      43.158       8.263
10^-5    100        50           0.179                50.179     418.158      10.010
10^-5    1000       0.1          0.018                0.118      9.833        0.822
10^-5    1000       5            0.018                5.018      418.167      1.001
10^-5    1000       50           0.018                50.018     4168.167     4.925
0.001    45         0.1          0.576                0.676      2.535        18.395
0.001    45         5            0.576                5.576      20.91        20.859
0.001    45         50           0.576                50.58      189.66       129.13
0.001    100        0.1          0.38                 0.359      2.992        8.292
0.001    100        5            0.38                 5.259      43.825       13.501
0.001    100        50           0.38                 50.26      418.825      128.276
0.001    1000       0.1          0.038                0.126      10.5         0.860
0.001    1000       5            0.038                5.026      418.833      12.828
0.001    1000       50           0.038                50.02      4168.833     127.682

From the numerical results presented in Table 2, we note the following. First, as $p$ increases, the mean transfer delay $E[T_{tcp}]$ increases significantly if $T_{prop}$ is high. For example, when $p$ increases from $10^{-5}$ (lightly-loaded path) to 0.001, assuming a bottleneck link rate of 45Mbps (as in Polytechnic University’s access link), the delay for a 100MB transfer goes up from 19.219sec to over 2 minutes (129.13sec) if the transfer is over a wide-area path (say across the USA, where $T_{prop}$ is 50ms). This means that for wide-area paths, it is especially important to maintain a low $p$. Second, we note that for such wide-area paths, there is little benefit to be gained by increasing the bottleneck link rate if $p$ stays the same. For example, increasing the bottleneck link rate from 100Mbps to 1Gbps decreases the mean file-transfer delay only from 128.276sec to 127.682sec when $T_{prop}$ is 50ms and $p$ is 0.001. The implication of this result is that even if access link rates are increased, if the link sharing mode is still packet-by-packet “socialistic” sharing, then the link bandwidth could be filled up with traffic from other users (especially in university access links with students being heavy users of the Internet), causing $p$ to stay the same. Therefore, we conclude that it is more important to reduce $p$ than it is to increase the access link rate beyond a certain level.
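The intermediate columns of Table 2 follow from simple arithmetic, which the sketch below reproduces for the three wide-area rows at the lower loss rate. It assumes that RTT is the sum of the propagation delay and the queuing-plus-service delay, and that the window is expressed in 1500-byte segments. The segment size is not stated in the text; it is inferred here because it reproduces the tabulated window sizes. The function names are illustrative.

```python
SEGMENT_BITS = 1500 * 8  # assumed segment size (inferred from Table 2, not stated in the text)


def rtt_ms(t_prop_ms, queue_plus_service_ms):
    """Round-trip time as propagation delay plus queuing and service delay."""
    return t_prop_ms + queue_plus_service_ms


def w_max_segments(rate_mbps, rtt_in_ms):
    """Window (in segments) at which the sender keeps the bottleneck link full."""
    bits_in_flight = rate_mbps * 1e6 * rtt_in_ms * 1e-3  # rate x RTT
    return bits_in_flight / SEGMENT_BITS


# (r in Mbps, Tprop in ms, queuing delay plus service time in ms)
rows = [(45, 50, 0.398), (100, 50, 0.179), (1000, 50, 0.018)]
for rate, t_prop, queue in rows:
    rtt = rtt_ms(t_prop, queue)
    print(rate, round(rtt, 3), round(w_max_segments(rate, rtt), 3))
```

The final column of the table, the mean transfer delay, requires the closed-form expressions of [39][40] and is not reproduced here.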

2.1.2 Related Work to Address the TCP Limitations

To address the second gap of TCP limitations, many researchers are proposing enhance-

ments to TCP’s congestion control [22]-[24] and/or flow control [25]-[27]. These

enhancements propose upgrades to the TCP congestion control algorithm at end hosts to better fit HDBP environments. Not requiring router upgrades makes them easier to implement. On the other hand, solutions requiring upgrades to routers have also been proposed,

such as eXplicit Control Protocol (XCP) and Jumbo Frame. XCP is a feedback-based con-

gestion control system that uses direct, explicit, feedback from routers to avoid congestion

in the network [28], while Jumbo Frame was proposed to use a larger Maximum Trans-

mission Unit (MTU) in both end hosts and routers [29].

These enhancements are essentially designed to achieve high end-to-end throughput in the future high-capacity Internet. However, they do not change the assumption of a shared connectionless packet-switched Internet and still take network utilization and fairness as the first-priority consideration. Like standard TCP, these enhancements use a congestion-control mechanism to adjust sending rates during data transfers based on congestion levels. Fairness is achieved by sharing the network resources in a “socialistic” manner among all data flows. They lack the capability to provide end-to-end QoS, such as bandwidth and delay guarantees, for end users. Therefore it is hard to implement a “pay more, get more” service in IP-based networks.

2.1.3 Related Work to Address the Difficulty in Providing End-to-end QoS

To address the third gap, efforts have been made to add connection-oriented characteristics to IP-based networks. Solutions include Integrated Services (IntServ) [42] and Differentiated Services (DiffServ) [43]. IntServ provides end-to-end per-flow QoS by making

resource reservation end-to-end through RSVP signaling. However, the scalability prob-

lem due to the complexity of per-flow reservation and per-packet handling makes IntServ

inapplicable when the number of flows is large. DiffServ, instead, provides differentiated

QoS for a small number of classes by maintaining a separate queue for each class of

aggregate traffic. However, it lacks the capability to provide per-flow and end-to-end QoS.

Another solution of setting up high-speed circuits for end user sessions has been exam-

ined in a proposal called TCP switching [44]. The concept is to classify TCP flows at IP

routers and initiate requests for dynamic circuit setup for individual TCP flows through

optical circuit-switched networks. Advances in network processor technology have been

targeted at enabling high-speed flow classification needed to trigger circuit setup/release.

Nevertheless these approaches have remained difficult to realize in practice because of

scalability reasons.

2.2 Related Work in the Circuit-Switched Networking Community

Circuit-switched networking is ideal for providing “pay more, get more” service and end-to-end QoS because of its connection-oriented nature. An increasing number of optical

circuit-oriented testbeds are being deployed, e.g., CANARIE's CA*net 4 [45], Starlight

[46], SURFnet [47], UKLight [48], etc. DOE's Ultranet [49] will include the ability to

offer end-to-end circuits. Some of these networks use all-optical switches with the granu-

larity of a circuit being a single wavelength, while others use hybrid electronic/optical

switches that provide sub-lambda granularity.

Most research efforts in these optical circuit-oriented testbeds are focused on how to

provision the circuits. For example, the User Controlled Lightpaths (UCLP) project [50], supported by the CANARIE network, aims to provide user-controlled end-to-end optical circuits

to meet the QoS requirements. However, UCLP and other related projects use a central-

ized approach to provision circuits, in which the network inventory, topology, and routing

information are stored in a global database, and the circuit setup requests are processed by

a central management system. The complexity of such a centralized management system

makes fast provisioning hard to implement and limits the scalability of networks. Further-

more, proposed applications for these optical testbeds are limited to the very large data

transfers and other eScience applications within a small community instead of commodity

applications for a wide (Internet-scale) community. This results in small traffic loads, with

which it is hard to achieve high network utilization.

Chapter 3 Proposed RESCUE Service

To fill the three gaps identified in Section 1.3 between user needs and network solutions,

we propose an end-to-end optical networking solution called Reconfigurable Ethernet/

SONET Circuits for End Users (RESCUE). The concept is to provide end hosts with

high-speed, end-to-end circuit connectivity on a call-by-call shared basis, where a “cir-

cuit” consists of Ethernet segments at the ends that are mapped into Ethernet-over-SONET

long-distance circuits.

At first glance it appears that to extend the services of optical networks to end hosts, we

somehow need to extend the reach of optical networks all the way to desktops. Attempts to

extend optical networks to the desktop were made in the mid-to-late nineties with ATM-to-the-desktop projects, most of which failed. However, three major advances have occurred

since the late nineties that allow us to implement a solution for extending optical network

services to end hosts without actually dropping fiber to desktops.

First, optical fiber has been deployed extensively within both Metro-Area Networks

(MANs) and enterprise/university campuses. Second, Fast Ethernet and GbE technologies

have been deployed at end hosts using existing twisted-pair copper wires and these end

host links are not bottlenecks on end-to-end paths. The bottleneck for enterprise users is

the enterprise access link rather than the drop to the desktop. Third, a new system called

Multi-Service Provisioning Platform (MSPP) has been defined, developed, and more

importantly, already deployed within enterprises. Among its functions, which we will

review in Section 3.1, MSPPs provide a means for crossconnecting Ethernet signals from

end hosts to equivalent Ethernet-over-SONET (EoS) signals on wide-area access links.

Currently, service providers such as Verizon [35] offer “Ethernet access services” to enter-

prises through MSPPs. These Ethernet/SONET circuits (i) are leased for long durations,

and (ii) originate/terminate at routers. We propose to (i) use these hybrid circuits in a

dynamic mode, and (ii) extend them to end hosts.

Leveraging these advances, we propose an optical network service called RESCUE.

With this service end hosts should be able to dynamically request reconfigurable Ethernet/

SONET circuits for durations as short as a few milliseconds. We describe factors that

enable RESCUE service in Section 3.1, and the basic RESCUE architecture and opera-

tions in Section 3.2. In Section 3.3, we describe an important aspect of our approach,

which is to use additional NICs at end hosts for the RESCUE service so that it is an “add-

on” service to basic Internet access rather than a replacement. Two types of applications

using the RESCUE solution are addressed in Section 3.4.

3.1 Enabling Technologies

As stated earlier, MSPPs have already been deployed in enterprises. The primary reason

for this deployment is to integrate T1s from PBXs carrying voice traffic and T1s/T3s/

Ethernet signals from wide-area-access IP routers carrying data traffic on to a SONET/

SDH/WDM signal used for wide-area access (hence the term “multi-service”). For our

proposal, the multiplexing aspect of MSPPs is not relevant. Instead we exploit the ability

of MSPPs to encapsulate Ethernet frames into SONET frames using EoS specifications,

such as Generic Framing Procedure (GFP) [51], along with Virtual Concatenation (VC)

[52], a technique for allowing arbitrary-bandwidth SONET signals to be created to reduce

wasted bandwidth [53], e.g., a 100Mbps Ethernet signal can be carried on two OC1 cir-

cuits instead of an OC3 circuit.
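The bandwidth saving from Virtual Concatenation in the example above can be illustrated with a rough calculation. The sketch below uses the nominal OC-1 line rate of 51.84 Mbps and deliberately ignores SONET overhead, so it is a simplification: the usable payload of each member is somewhat lower in practice. The function names are hypothetical.

```python
import math

OC1_MBPS = 51.84  # nominal OC-1 line rate; SONET overhead is ignored in this sketch


def oc1_members_needed(signal_mbps):
    """Number of OC-1 members in a virtually concatenated group for the signal."""
    return math.ceil(signal_mbps / OC1_MBPS)


def wasted_mbps(signal_mbps, carrier_mbps):
    """Carrier capacity left unused by the signal."""
    return carrier_mbps - signal_mbps


n = oc1_members_needed(100)                  # members for a 100 Mbps Ethernet signal
vc_waste = wasted_mbps(100, n * OC1_MBPS)    # unused capacity with virtual concatenation
oc3_waste = wasted_mbps(100, 3 * OC1_MBPS)   # unused capacity on a full OC3
print(n, round(vc_waste, 2), round(oc3_waste, 2))
```

With these nominal rates, two OC1 members leave only a few Mbps unused, whereas a full OC3 leaves more than a third of its capacity idle.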

The architecture of a typical MSPP [54]-[56] is shown in Figure 5. Nodes within an

enterprise are connected to interface cards, such as Ethernet (10Mbps/100Mbps), T1, T3,

and Gbps Ethernet. The Ethernet cards encapsulate Ethernet frames into SONET frames

using EoS devices. The CrossConnect (XC) card is used to crossconnect signals from

incoming ports to outgoing ports. The control card typically has a processor and imple-

ments management software to control the MSPP. Communication with the control card is

through its own Ethernet and/or serial interface. The wide-area access link card is a high-

rate SONET, SDH and/or WDM interface. Typically, Ethernet, T1, T3 signals from the

interface cards connected to nodes within the enterprise are crossconnected through the

XC card to equivalent-rate signals on the wide-area access SONET link.

Increasingly, optical crossconnects and MSPPs now implement control-plane signaling

protocols to enable the dynamic setup and release of optical circuits across the network.

For example, a signaling protocol interoperability test involving many vendors’ products

was demonstrated at the Optical Fiber Communications (OFC) Conference in March 2003

Figure 5. SONET Multiservice Provisioning Platform (MSPP) Architecture
[Diagram labels: interface cards (Ethernet 10/100, T1, T3) with cables to the enterprise switches/routers/PBXs; XC: cross-connect card; control card with Ethernet and/or serial interface; OC12, OC48, OC192 cards on the SONET/SDH/WDM wide-area access link]

3.2 Architecture and Operations

Details of the end host configuration and the MSPP within an enterprise are shown in

Figure 6.

RESCUE hardware configuration requires:

1. Second Ethernet NICs in end hosts, which are connected to the Ethernet ports of a

signaling-capable enterprise MSPP.

2. A high-speed optical circuit with multiple channels should be leased from the enter-

prise MSPP to either a wide-area signaling-capable network switch or another signal-

ing-capable enterprise/service-provider MSPP.

3. Software enhancement is needed at end hosts to generate call setup/release requests

for applications that can benefit from high-bandwidth RESCUE service. The details

of end host RESCUE software will be discussed in Chapter 6.

Figure 6. Configuration of End Hosts for RESCUE Service
[Diagram labels: enterprise building; Ethernet hosts with NIC 1 and NIC 2 running the application + RESCUE software; Ethernet switch/IP router; primary Internet leased access circuit to the ISP's router; MSPP with an Ethernet interface (also serving other end hosts) and a SONET interface; RESCUE circuit with multiple channels; optical circuit-switched network to the ISP's router or another signaling-capable network switch]

RESCUE operation is as follows. A RESCUE circuit consists of multiple channels and allows end hosts to share these channels on a call-by-call basis. A call setup request for a

RESCUE Ethernet/SONET circuit is generated by end-host software. This request is

received by the control software on the enterprise MSPP to which the requesting-host’s

second NIC is connected. The control software locates a free equivalent-rate SONET cir-

cuit on its access circuit and crossconnects the Ethernet signal from the requesting end

host to this SONET circuit. The enterprise MSPP’s control software then forwards the

call-setup request to the next switch on the path. Circuit setup proceeds hop-by-hop in this

manner. Once set up, the circuit is held and used for a short duration, and then released using a similar hop-by-hop circuit release procedure. Subsequently, a different communication session can reuse the same resources. The advantage of this dynamic, distributed circuit-provisioning approach with signaling over the traditional centralized approach using management systems is apparent: it is scalable because of the much lower processing complexity at each hop. Very fast provisioning becomes possible by implementing hardware-accelerated signaling [58] at circuit switches, which is key to achieving high network utilization.
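The hop-by-hop setup and release procedure described above can be sketched as follows. This is a simplified, hypothetical model: each switch holds a pool of free channels, a call reserves one channel at every switch on the path, and a failure at any hop rolls back the earlier reservations. The class and function names are illustrative, and the signaling messages themselves are not modeled.

```python
class Switch:
    """A signaling-capable switch with a pool of free channels (hypothetical model)."""

    def __init__(self, name, free_channels):
        self.name = name
        self.free_channels = set(free_channels)
        self.crossconnects = {}  # call_id -> channel programmed for that call

    def reserve(self, call_id):
        """Locate a free channel and crossconnect the call to it."""
        if not self.free_channels:
            return False
        channel = min(self.free_channels)
        self.free_channels.remove(channel)
        self.crossconnects[call_id] = channel
        return True

    def release(self, call_id):
        """Return the call's channel to the free pool, if it holds one."""
        channel = self.crossconnects.pop(call_id, None)
        if channel is not None:
            self.free_channels.add(channel)


def setup_circuit(path, call_id):
    """Process the setup request hop by hop; undo earlier hops if one blocks."""
    reserved = []
    for switch in path:
        if not switch.reserve(call_id):
            for earlier in reversed(reserved):
                earlier.release(call_id)
            return False  # call is blocked
        reserved.append(switch)
    return True


def release_circuit(path, call_id):
    """Release the circuit hop by hop so that other calls can reuse the channels."""
    for switch in path:
        switch.release(call_id)


path = [Switch("enterprise-mspp", [0, 1]), Switch("metro-xc", [0]), Switch("provider-mspp", [0])]
first = setup_circuit(path, "call-A")   # succeeds
second = setup_circuit(path, "call-B")  # blocked: no free channel at metro-xc
release_circuit(path, "call-A")
third = setup_circuit(path, "call-B")   # succeeds after call-A is released
print(first, second, third)
```

Each switch makes its decision using only local state, which is the property that keeps per-hop processing simple in the distributed approach.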

Figure 7 illustrates an example of circuit setup in metro-area networks. An end host in

enterprise 1 requests an Ethernet/SONET circuit to a router in service provider M’s net-

work. Call setup proceeds hop-by-hop with the signaling messages (RSVP-TE Path mes-

sage in the forward direction and RSVP-TE Resv message in the opposite direction) being

processed at each intermediate node. If resources are available on links L1, L2, L4, and

L6, the MSPPs and optical crossconnects en route will be programmed for the circuit (the

dashed line represents the dynamically setup circuit). RESCUE Ethernet/SONET circuits

can be set up from a host to a router/switch or another host.

RESCUE service can be introduced gradually by interconnecting signaling-capable

switches/MSPPs via leased circuits. For example, two buildings of a single organization

located within a metro area may lease a multi-channel circuit between the buildings but

share these channels on a call-by-call basis. In this scenario, only the enterprise MSPPs

would need to be signaling-capable.

3.3 RESCUE as an “Add-on” Service to Primary Internet Access

We illustrate how RESCUE service is configured as an “add-on” service to primary

Internet access in Figure 8. The primary NICs in end hosts are connected through the usual

LAN Ethernet switches/IP routers to the enterprise MSPP, which in turn is connected to an

Internet router by a leased circuit passing through the enterprise MSPP. For example, in

Figure 8, Leased circuit I is the primary Internet access link for enterprise building 1.

Hosts requiring access to RESCUE service will be equipped with second NICs as shown

in Figure 8. These second NICs are connected to ports on the enterprise MSPP’s Ethernet interface card allowing them to be crossconnected on-demand to equivalent EoS circuits at the MSPP. For communication between two entities that can be connected by a direct EoS circuit, there is a choice of two paths: the primary TCP/IP path and an Ethernet/SONET circuit. For example, an end host in enterprise building I with a second Ethernet NIC configured for RESCUE service has two paths to Router I in Figure 8: (i) the primary leased circuit I reachable through its primary NIC, enterprise Ethernet switches/IP routers, and its MSPP (see solid line marked “Leased circuit I”), and (ii) an on-demand Ethernet/SONET circuit through its second NICs and MSPP (see dashed line from the MSPP in enterprise building I to Router I).

Figure 7. The RESCUE Concept: Share optical network circuit resources on a call-by-call basis and create high-speed Ethernet/SONET circuits on-demand; lines with arrowheads denote signaling messages; the dashed line denotes the dynamically set up Ethernet/SONET circuit
[Diagram labels: end hosts and hosts/routers in Enterprise 1 through Enterprise N, each behind an MSPP with Ethernet interfaces; metro-area optical circuit-switched network consisting of electronic/all-optical crossconnects/add-drop multiplexers; links L1 to L6; Service provider 1 (e.g., ISP) through Service provider M; wide-area network (WAN); (1)-(6): Path message; (7)-(12): Resv message]

Figure 8. RESCUE as an “Add-on” Service: the thick dashed lines show Ethernet/SONET circuits set up on-demand between end hosts’ second NICs and routers, or between the second NICs of two distant end hosts. In both cases, these become alternative paths to the primary paths available through the hosts’ primary NICs.
[Diagram labels: Internet - packet switched (IP routers interconnecting various networks); packet-switched networks (MPLS, Frame Relay, ATM, etc.); optical circuit-switched networks consisting of SONET/SDH/WDM add/drop multiplexers (ADMs) and crossconnects; different types of access networks (PDH, CATV, wireless, FTTH, etc.); enterprises and homes; enterprise buildings I and II and an enterprise/MDU building, with Ethernet hosts, Ethernet switches/IP routers, T1/T3/Ethernet links, and MSPPs; primary NICs and second NICs; Router I; leased circuits I and II; Ethernet over SONET (EoS) circuits requested on-demand; Set 1, Set 2, Set 3. NIC: Network Interface Card]

The presence of two such paths raises the question of which path an end-host application should choose. We recognize that it is not appropriate to attempt a circuit setup for all

communication sessions. For example, for a small-file transfer (file size is on the order of

a few KB), the total delay incurred in setting up a circuit and then transferring the file

could be larger than the delay incurred in directly using the TCP/IP path. Thus, a routing

decision needs to be made at end hosts with access to RESCUE. We provide a thorough

analysis of the routing decision in Chapters 4 and 5, and the details of the routing decision

implementation in Chapter 6.
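The small-file example above can be turned into a toy decision rule. The sketch below is a deliberately simplified assumption and not the analysis of Chapters 4 and 5: it compares the circuit setup delay plus transmission time at the circuit rate against an externally supplied estimate of the TCP/IP transfer delay. The function names and the 50ms setup delay are hypothetical; the 19.219s TCP estimate is the Table 2 value for a 100MB file on a 45Mbps wide-area path.

```python
def circuit_delay_s(file_bytes, circuit_rate_bps, setup_delay_s):
    """Total delay on the circuit: setup delay plus transmission at the circuit rate."""
    return setup_delay_s + file_bytes * 8 / circuit_rate_bps


def choose_path(file_bytes, circuit_rate_bps, setup_delay_s, expected_tcp_delay_s):
    """Pick the path with the lower expected delay (toy rule; ignores call blocking)."""
    if circuit_delay_s(file_bytes, circuit_rate_bps, setup_delay_s) < expected_tcp_delay_s:
        return "circuit"
    return "tcp"


# A few-KB file: the setup delay dominates, so the TCP/IP path wins.
small = choose_path(5_000, 100e6, setup_delay_s=0.05, expected_tcp_delay_s=0.02)
# A 100MB file: the setup delay is negligible relative to the transfer time.
large = choose_path(100_000_000, 100e6, setup_delay_s=0.05, expected_tcp_delay_s=19.219)
print(small, large)
```

A fuller rule would also account for the probability that the call is blocked and the transfer must fall back to the TCP/IP path.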

Having the option to fall back to the primary TCP/IP path allows for a RESCUE service

provider to operate the circuit-switched network at a high enough call-blocking probabil-

ity to achieve satisfactory utilization. As is well known, resource utilization and call-

blocking probability operate at cross purposes. Without the option of falling back to the

primary TCP/IP path, the circuit-switched network would need to be engineered to operate

at a low call-blocking probability at the expense of utilization. It would make it more diffi-

cult to achieve “cost-effective, bandwidth-efficient” optical networks. The presence of the

dual path also allows applications to take advantage of both paths during a file transfer as

will be explained in Chapter 5.
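The tension between utilization and call-blocking probability can be illustrated with the classical Erlang-B formula. Using Erlang-B here is an assumption made for illustration (Poisson call arrivals, blocked calls cleared); this section does not name a specific traffic model. The function names are illustrative.

```python
def erlang_b(offered_load_erlangs, num_circuits):
    """Call-blocking probability from the Erlang-B recursion."""
    blocking = 1.0  # with zero circuits every call is blocked
    for m in range(1, num_circuits + 1):
        blocking = offered_load_erlangs * blocking / (m + offered_load_erlangs * blocking)
    return blocking


def utilization(offered_load_erlangs, num_circuits):
    """Fraction of circuit capacity occupied by carried (non-blocked) traffic."""
    carried = offered_load_erlangs * (1 - erlang_b(offered_load_erlangs, num_circuits))
    return carried / num_circuits


for load in (5, 10):
    print(load, round(erlang_b(load, 10), 4), round(utilization(load, 10), 4))
```

On ten circuits, raising the offered load increases both the blocking probability and the utilization, which is why tolerating a higher blocking probability (made acceptable by the TCP/IP fall-back path) allows the network to run at higher utilization.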

The RESCUE concept and its applications are novel in three ways. First, the RESCUE

architecture is a “parallel-hybrid” solution in contrast to today’s “sequential-hybrid net-

works” where different types of switches could exist sequentially on an end-to-end path.

RESCUE is proposed as an “add-on” service to existing Internet connectivity that extends

all the way to end hosts giving end hosts an option between two paths: (i) an Internet

packet-switched path, and (ii) a dedicated circuit. End-host applications need to make a

routing decision on which path to use. In current-day networks, such routing decisions are

typically made only at switches, not at end hosts. Drawing an analogy to transportation systems, we note that in some situations (e.g., travelling between New York and Boston),

people have a choice of multiple transportation options. In this dissertation, we illustrate

the advantages and costs of such an approach. Second, not only is our proposal to extend

bandwidth-on-demand high-speed (e.g., Gbps, 10Gbps) circuit services to end hosts new

to the optical networking research community, but our proposal to enable these networks

to support calls with very small holding times (e.g., on the order of milliseconds for single

data transfers within a file transfer application) pushes the envelope of bandwidth-on-

demand thinking. By introducing small data transfers as well as elephant data transfers as

applications, we are aiming to create a large-scale circuit-switched network providing

commodity services. The high network utilization required by a scalable network is possi-

ble with high traffic loads from commodity services, which translates to the low costs seen

by users.

3.4 Applications

Next, we consider the question of how to use RESCUE circuits for applications. In this

dissertation, we address two applications that can use RESCUE circuits: (i) Dial-Up ser-

vice to dynamically set up high-speed circuits bypassing enterprise access links to the

ISP’s routers, and (ii) an end-to-end file-transfer application.

In Dial-Up service, RESCUE circuits would essentially connect end hosts directly to the

ISP routers serving the enterprise. It would bypass the shared access circuit of the enter-

prise and thus allow the end host applications to enjoy lower packet loss. This addresses

the access link bottleneck problem (the first gap) described in Section 1.3. In the end-to-

end file transfer application, end-to-end RESCUE circuits are established between end

hosts located on optical circuit-switched networks. By using a high-speed transport proto-

col instead of TCP on RESCUE circuits, low file-transfer delays can be achieved. This

addresses the TCP limitations (the second gap) described in Section 1.3.

Chapter 4 Application I: Dial-Up Internet access service

using RESCUE circuits

In Section 1.3.1, we identified the enterprise access link bottleneck problem. Further-

more, in Section 2.1 we noted that the absolute data rates of these bottleneck links are less

important than packet-loss rates, which are usually high on the heavily congested access

circuits. Given these results, we propose a Dial-Up Internet access service that bypasses

the shared access link of the enterprise and thus allows the end host application to enjoy

lower packet-loss rates. It not only increases the access bottleneck link rate, but more

importantly decreases the probability of loss experienced by a single TCP flow to signifi-

cantly improve file-transfer delays. Section 4.1 describes the operational steps in using

RESCUE for Dial-Up service. Sections 4.2 and 4.3 describe our analytical basis for the

routing decision.

4.1 Description

The architecture of our Dial-Up service is shown in Figure 9. In Dial-Up service, an end

host with a second Ethernet NIC can request a direct high-speed Ethernet/SONET circuit

to the ISP’s IP router. This is comparable to current-day Dial-Up telephone service but at a

significantly higher bandwidth. The RESCUE software will make a routing decision on

whether to use the host’s primary NIC or whether to request an Ethernet/SONET circuit

through the host’s second NIC to the ISP’s IP router. The circuits are held only for the

duration of a single transfer within a TCP session (note that within the holding time of a

TCP connection, there can be many data transfers). This allows for increased sharing of

resources. Without such an implementation, the SONET resources on the access link will

lie unused during a user’s “think” time. In other words, sharing of the access link set up

for RESCUE access service will be on a call-by-call basis rather than a packet-by-packet

basis as is done on current access links. This results in better performance for the call that

does successfully obtain a RESCUE circuit, but the price paid is access-link utilization.

We will discuss this trade-off along with many other details in the following subsections.

When an end host’s RESCUE software requests a circuit by generating a signaling mes-

sage, the SONET/SDH optical circuit-switched network consisting of the enterprise

MSPP, access network switches and ISP’s MSPP (see Figure 9) may or may not be able to

accommodate the request. If no spare circuit is available on this path, then the call setup

request is blocked. The end host software is programmed to handle call blocking by then switching to the primary NIC and using the shared WAN access link. If, however, a circuit is available, the MSPP will crossconnect the host’s Ethernet signal to an equivalent-rate SONET/SDH circuit established via the access network to the ISP’s IP router.

Figure 9. Dial-Up Access Service Architecture. [Figure: within the enterprise building, Ethernet hosts (user space, Ethernet interface) connect through NIC 1 to an Ethernet switch/IP router and through NIC 2 to a SONET MSPP that also serves other end hosts. The RESCUE circuit for Dial-Up service runs from the enterprise MSPP across the optical circuit-switched access network to a SONET MSPP and Ethernet switch/IP router at the Internet service provider. A Dial-Up server (signaling + configuration software) updates the router's ARP table (map MAC addresses to the newly set up RESCUE circuit) and routing table (map IP address to the newly set up RESCUE circuit).]

Since the ISP’s IP router is the terminating endpoint for the Ethernet/SONET circuit, it

should be capable of receiving signaling messages and accepting/rejecting circuit setup

requests. However, given the difficulty in adding new software to routers, we propose an

external “Dial-up server,” which consists of (i) software to terminate signaling messages

on behalf of the IP routers, and (ii) software to configure the routers. The signaling com-

ponent of the software will respond to the signaling messages issued by MSPPs/access

network switches. The configuration part of the software will have administrative user

permissions to write into the routing table and ARP table of the ISP’s IP router. This step

is required to create a mapping of the IP address and MAC address of the host connected

temporarily via the newly established Ethernet/SONET circuit to the corresponding inter-

face on the ISP’s IP router because at different time instants, different Ethernet hosts are

reachable through the same interface of the IP router. It allows the router to route packets

arriving from the Internet destined to end hosts by consulting the updated routing and

ARP tables.

Different arrangements are possible for increasing levels of sharing with the RESCUE

access service:

1. An enterprise could lease bandwidth from its MSPP directly to the ISP’s IP router for

RESCUE Dial-Up access service. This model is similar to today’s wide-area Internet

access link where a leased circuit is obtained by an enterprise to terminate directly on

the ISP’s IP router. This leased circuit is shared on packet-by-packet basis. In con-

trast, the RESCUE leased circuit would be shared on a transfer-by-transfer basis.

Multiple simultaneous flows can be accommodated if the leased circuit bandwidth

can support multiple Ethernet-rate circuits. For example, with an OC12 leased cir-

cuit, six simultaneous Fast Ethernet RESCUE circuits can be supported. This

arrangement is easy to introduce within today’s Internet; however, with the limited

sharing, utilization may be compromised to achieve low call blocking probability.

2. In addition to the previous arrangement, if the ISP’s MSPP supports signaling proto-

cols, then the interfaces on the ISP’s IP router can be shared. In the previous arrange-

ment, for each enterprise that leases say Ethernet-rate circuits, the ISP’s IP router

needs an equivalent number of interfaces connecting it to the MSPP. If the ISP’s

MSPP supports signaling, there can be aggregation with call-level sharing of a small

set of ISP’s router interfaces among many enterprises. This will increase sharing and

hence lower costs.

3. A more advanced arrangement requires the access network provider to upgrade their

SONET/SDH/WDM switches with signaling capability. The RESCUE access service

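becomes truly shared among many enterprises reaching many ISPs via the access network.

Figure 10. An Extension of the Dial-Up Access Service. [Figure: Ethernet switches serving Subnet 1 (128.239.5) and Subnet 2 (156.78.5), with attached PCs, connect to the enterprise MSPP, which reaches the ISP MSPP and the IP router through the access network. The IP routing table at the router lists destination 128.239.5 on interface le0 and destination 156.78.5 on interface le0.]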
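The utilization cost of keeping call blocking low under the first arrangement can be illustrated with the classical Erlang B formula. This is a sketch under assumptions that are not taken from the dissertation: Poisson call arrivals, blocked calls cleared (here they fall back to the primary path), and an offered load of 1.909 Erlangs chosen only for illustration. The function names are hypothetical.

```python
def erlang_b(circuits: int, offered_load: float) -> float:
    """Blocking probability for `circuits` servers at `offered_load` Erlangs.

    Uses the standard recursion B(0) = 1, B(n) = a*B(n-1) / (n + a*B(n-1)).
    """
    b = 1.0
    for n in range(1, circuits + 1):
        b = offered_load * b / (n + offered_load * b)
    return b


def carried_utilization(circuits: int, offered_load: float) -> float:
    """Fraction of circuit capacity kept busy by calls that are not blocked."""
    return offered_load * (1.0 - erlang_b(circuits, offered_load)) / circuits


# Six Fast Ethernet RESCUE circuits on one OC12 leased circuit (from the text);
# the offered load below is an assumed value, not a measured one.
circuits = 6
load = 1.909
print(f"P_b = {erlang_b(circuits, load):.4f}")
print(f"utilization = {carried_utilization(circuits, load):.3f}")
```

Under these assumptions, holding blocking near 1% on six circuits leaves the leased circuit busy only about a third of the time, which is the kind of trade-off the first arrangement implies.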

Another extension of the Dial-Up access service concept is to not only allow end hosts

to connect their second NICs directly into MSPP ports, but also allow Ethernet switches

serving small subnets to be connected to the enterprise MSPP ports for RESCUE access

service as shown in Figure 10. When the first host on an Ethernet switch requests a RES-

CUE circuit, the MSPP sets up this circuit (assuming resources are available). If a second

host on the same Ethernet switch initiates an application that causes a RESCUE circuit

request, the enterprise MSPP’s signaling software can respond saying the RESCUE circuit

is already established. This requires a small amount of extra book-keeping on the part of

the enterprise MSPP’s signaling software. It needs to know the MAC addresses of all the

hosts hanging off the Ethernet switches connected to each of its RESCUE ports. Another

change is in the last step of the RESCUE circuit setup, which involves writing the routing

table and ARP table at the ISP’s IP router. Here instead of one IP address and one MAC

address, the addresses of all the hosts connected to the Ethernet switch on the RESCUE

circuit should be written into the ISP’s IP router tables. For example, in Figure 10, if the

Ethernet signal from the Ethernet switch with subnet number 128.239.5 is first crosscon-

nected to an equivalent-rate SONET signal, which is then released and the same SONET

signal crossconnected to the Ethernet switch with subnet number 156.78.5, then the IP

routing table entry for the former should be replaced with an entry for the latter.
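The table update described above can be sketched as follows. This is only an in-memory model of the bookkeeping, not the Dial-Up server's actual implementation: the class name, method names, host addresses, and MAC addresses are hypothetical, and a real server would write to the router through whatever administrative interface the router offers.

```python
class DialUpServerTables:
    """Toy model of the routing and ARP entries kept for RESCUE interfaces."""

    def __init__(self):
        self.routes = {}  # destination prefix -> router interface
        self.arp = {}     # host IP address -> (MAC address, router interface)

    def release_circuit(self, interface):
        """Drop every entry that points at `interface`."""
        self.routes = {p: i for p, i in self.routes.items() if i != interface}
        self.arp = {ip: v for ip, v in self.arp.items() if v[1] != interface}

    def bind_circuit(self, interface, prefix, hosts):
        """Replace the entries on `interface` with those of the new circuit.

        `hosts` maps each reachable host IP address to its MAC address.
        """
        self.release_circuit(interface)
        self.routes[prefix] = interface
        for ip, mac in hosts.items():
            self.arp[ip] = (mac, interface)


tables = DialUpServerTables()
# The SONET signal is first crossconnected to the 128.239.5 subnet ...
tables.bind_circuit("le0", "128.239.5", {"128.239.5.22": "00:00:5e:00:53:01"})
# ... then released and crossconnected to the 156.78.5 subnet.
tables.bind_circuit("le0", "156.78.5", {"156.78.5.7": "00:00:5e:00:53:02"})
print(tables.routes)
```

Keying the cleanup on the interface rather than on the subnet reflects the point made in the text: different hosts are reachable through the same router interface at different times.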

4.2 Analytical Basis for the Routing Decision: Delay Analysis

In this section, we create an analytical model and obtain numerical values to provide a quantitative basis for the routing decision. Let $E[T_{dialup}]$ be the mean transfer delay incurred if a Dial-Up circuit setup is attempted prior to a data transfer within a TCP session:

$$E[T_{dialup}] = (1 - P_b)\left(E[T_{setup}] + E[T_{tcp}^{dialup}]\right) + P_b\left(E[T_{fail}] + E[T_{tcp}^{primary}]\right) \qquad (2)$$

where $P_b$ is the call-blocking probability on the circuit-switched network, $E[T_{setup}]$ is the mean call-setup delay of a successful circuit setup, $E[T_{fail}]$ is the mean delay incurred in a failed call-setup attempt, $E[T_{tcp}^{dialup}]$ is the mean time to transfer the file using the Dial-Up circuit for access, and $E[T_{tcp}^{primary}]$ is the mean time to transfer the file using the primary access link. If the call is not blocked, mean delay experienced is $E[T_{setup}] + E[T_{tcp}^{dialup}]$, but if it is blocked, then after incurring a cost $E[T_{fail}]$, the end host has to use the TCP/IP path and hence will incur the $E[T_{tcp}^{primary}]$ delay. If a circuit setup is not attempted, the mean delay is simply $E[T_{tcp}^{primary}]$.

We compare $E[T_{dialup}]$ from (2) with $E[T_{tcp}^{primary}]$ to determine whether the Dial-Up end host should directly resort to the primary path or whether it should attempt a Dial-Up circuit setup. Approximating $E[T_{fail}]$ to be equal to $E[T_{setup}]$, we get:

$$\begin{cases} \text{if } \dfrac{E[T_{setup}]}{1 - P_b} \geq E[T_{tcp}^{primary}] - E[T_{tcp}^{dialup}] & \text{resort directly to the TCP/IP path} \\[2ex] \text{if } \dfrac{E[T_{setup}]}{1 - P_b} < E[T_{tcp}^{primary}] - E[T_{tcp}^{dialup}] & \text{attempt circuit setup} \end{cases} \qquad (3)$$

$E[T_{tcp}^{primary}]$ and $E[T_{tcp}^{dialup}]$ are computed using the analytical models of TCP presented in [39] and [40] (equation (1) in Section 1.3.1) with different packet-loss rates, $p_{primary}$ and $p_{dialup}$, different bottleneck link rates, $r_{primary}$ and $r_{dialup}$, respectively, but the same end-to-end propagation delay, $T_{prop}$. The mean call-setup delay $E[T_{setup}]$ is derived by counting mean signaling message transmission delays, mean call-processing delays (to process signaling protocol messages), and a round-trip propagation delay between the Dial-Up end host and the ISP’s IP router, $T_{prop}^{dialup}$:

$$E[T_{setup}] = \frac{m_{sig}}{r_s} \times \left(1 + \frac{\rho_{sig}}{2(1 - \rho_{sig})}\right) \times (k + 1) + T_{sp} \times \left(1 + \frac{\rho_{sp}}{2(1 - \rho_{sp})}\right) \times k + T_{prop}^{dialup} \qquad (4)$$

where $m_{sig}$ is the cumulative size of signaling messages used in call setup, $r_s$ is the signaling link rate, $k$ is the number of switches on the Dial-Up circuit path (between the Dial-Up end host and the ISP’s IP router), and $T_{sp}$ is the call-processing delay incurred at each switch. We approximate the queueing delay for the signaling link with an M/D/1 queue at a load $\rho_{sig}$, and the queueing delay for the call processor also with an M/D/1* queue at a load $\rho_{sp}$.

*M/D/1 queueing models are quite accurate here since inter-arrival times between file transfers have been shown to be exponentially distributed [59], and signaling message lengths and call-processing delays are more-or-less constant.

Numerical results:

In addition to the numerical values shown in Table 2, we compute additional values in Table 3.

Table 3

p         Link rate  T_prop  (header  RTT     W_max       E[T_tcp] for a
          (Mbps)     (ms)    lost)    (ms)    (segments)  100MB file (s)
5x10^-5   (lost)     0.1     0.436    0.536   2.01        18.269
                     5                5.436   20.385      18.381
                     50               50.436  189.135     26.039
          100        0.1     0.197    0.297   2.475       8.222
                     5                5.197   43.308      8.412
                     50               50.197  418.308     23.200
          1000       0.1     0.020    0.120   10          0.824
                     5                5.020   418.333     2.320
                     50               50.020  4168.333    21.320
0.0001    (lost)     0.1     0.460    0.560   2.1         18.274
                     5                5.460   20.475      18.508
                     50               50.46   189.225     37.572
          100        0.1     0.21     0.31    2.558       8.226
                     5                5.21    43.392      8.632
                     50               50.2    418.392     36.030
          1000       0.1     0.021    0.121   10.083      0.825
                     5                5.021   418.417     3.603
                     50               50.02   4168.417    35.107
0.0005    (lost)     0.1     0.532    0.632   2.37        18.324
                     5                5.532   20.745      19.545
                     50               50.53   189.495     88.701
          100        0.1     0.239    0.339   2.825       8.253
                     5                5.239   43.658      10.408
                     50               50.24   418.658     88.876
          1000       0.1     0.024    0.124   10.333      0.840
                     5                5.024   418.667     8.888
                     50               50.02   4168.667    88.463
(lost)    (lost)     0.1     0.731    0.831   3.116       19.124
                     5                5.731   21.491      34.356
                     50               50.73   190.241     303.936
          100        0.1     0.329    0.429   3.575       8.670
                     5                5.329   44.408      31.930
                     50               50.33   419.408     301.528
          1000       0.1     0.033    0.133   11.083      1.025
                     5                5.033   419.417     30.153
                     50               50.03   4169.417    299.754
(lost)    (lost)     0.1     0.833    0.933   3.499       20.210
                     5                5.833   21.874      51.130
                     50               50.833  190.624     445.549
          100        0.1     0.375    0.475   3.958       9.229
                     5                5.375   44.792      47.112
                     50               50.375  419.792     441.535
          1000       0.1     0.038    0.138   11.5        1.243
                     5                5.038   419.833     44.158
                     50               50.038  4169.833    438.581

[Only the p, RTT, W_max, and E[T_tcp] headers survive in the source. The link-rate and T_prop labels are inferred from the values; the header of the fourth column, the first link rate in each block, and the p labels of the last two blocks are missing.]

We plot the two sides of (3) in Figure 11 assuming that both $r_{primary}$ and $r_{dialup}$ are 1Gbps, and the following parameter values for $E[T_{setup}]$: $\rho_{sp} = \rho_{sig} = 0.7$, $k = 4$, $m_{sig} = 100$B, $r_s = 10$Mbps, $T_{prop}^{dialup} = 0.1$ms, and $T_{sp} = 0.004$ms. The 4 µs call-processing delay is based on our work on hardware-accelerated signaling protocol implementations [60]. $T_{prop}$ is set to either 0.1ms or 50ms. Note that $T_{prop}$ is the end-to-end round-trip propagation delay between the two end hosts participating in the TCP connection, while $T_{prop}^{dialup}$ is the round-trip propagation delay between the end host invoking a RESCUE circuit and its ISP’s IP router. The latter is likely to be local and hence we only use the 0.1ms number.

Figure 11. Plot of Equation (3) with a Link Rate of 1Gbps, $\rho_{sig} = \rho_{sp} = 0.7$, $k = 4$. (a) $T_{prop}$ is 0.1ms (b) $T_{prop}$ is 50ms

For the three horizontal lines in Figure 11 on which $P_b$ values are listed, the y-axis is the left-hand side of (3), i.e., $E[T_{setup}] / (1 - P_b)$. For the remaining three lines, which are marked with $p_{primary}$ values and $p_{dialup}$ values, the y-axis is the right-hand side of (3), i.e., $E[T_{tcp}^{primary}] - E[T_{tcp}^{dialup}]$, which we refer to as the “difference” curves. Under some circumstances, there are crossovers between the difference curves and the horizontal lines. For transfers of sizes below the crossover size, the end-host software should resort directly to using the primary path, and for file sizes larger than the crossover size, the software should attempt a Dial-Up circuit setup.
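The call-setup delay of equation (4) and the decision rule of (3) can be evaluated numerically as sketched below. The setup-delay parameters are the ones quoted in the text for Figure 11; the two TCP delays passed to the decision rule are hypothetical inputs, since the TCP model of [39] and [40] is not reproduced here. Function names are illustrative only.

```python
def md1_factor(rho: float) -> float:
    """Mean M/D/1 time in system, as a multiple of the (constant) service time."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("load must satisfy 0 <= rho < 1")
    return 1.0 + rho / (2.0 * (1.0 - rho))


def mean_setup_delay(m_sig_bits, r_s_bps, k, t_sp, rho_sig, rho_sp, t_prop_dialup):
    """Mean call-setup delay E[T_setup] in seconds, following equation (4)."""
    signaling = (m_sig_bits / r_s_bps) * md1_factor(rho_sig) * (k + 1)
    processing = t_sp * md1_factor(rho_sp) * k
    return signaling + processing + t_prop_dialup


def attempt_circuit_setup(e_t_setup, p_b, e_t_tcp_primary, e_t_tcp_dialup):
    """Rule (3): True means attempt a circuit, False means use the TCP/IP path."""
    return e_t_setup / (1.0 - p_b) < (e_t_tcp_primary - e_t_tcp_dialup)


# Parameter values quoted in the text: 100B of signaling, 10Mbps signaling link,
# k = 4 switches, 4 microsecond call processing, loads of 0.7, 0.1ms propagation.
e_t_setup = mean_setup_delay(
    m_sig_bits=100 * 8, r_s_bps=10e6, k=4, t_sp=4e-6,
    rho_sig=0.7, rho_sp=0.7, t_prop_dialup=0.1e-3)
print(f"E[T_setup] = {e_t_setup * 1e3:.3f} ms")

# Hypothetical mean TCP delays (seconds) on the primary and Dial-Up paths.
print(attempt_circuit_setup(e_t_setup, p_b=0.3,
                            e_t_tcp_primary=35.0, e_t_tcp_dialup=21.0))
```

With these parameters the mean setup delay comes to roughly 1 ms, so the left-hand side of (3) stays in the millisecond range even at 30% blocking, and the decision hinges on how much transfer time the lower loss rate saves.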

We computed crossover sizes for various combinations of $p_{primary}$, $p_{dialup}$, $r_{primary}$, and $r_{dialup}$. The crossover file sizes for different conditions are listed in Tables 4 and 5. The first column in Tables 4 and 5 shows the values of the packet loss rate on the end-to-end path, $p_{primary}$, if the primary access link is used, and the packet loss rate on the end-to-end path, $p_{dialup}$, if a RESCUE access circuit is set up. For different values of this pair of parameters, we compute the crossover file size above which a RESCUE circuit setup should be attempted for a data transfer under different operating conditions (call blocking probabilities) of the access network.

$T_{prop}$, the round-trip propagation delay for the file transfer (from client to server), is set to either 0.1ms or 50ms. The round-trip propagation delay between the enterprise building and the ISP’s router, $T_{prop}^{dialup}$, incurred when setting up a RESCUE circuit, as in (4), is 0.1ms. The bottleneck link rates on the primary path and on the path with the RESCUE circuit are both set to either 1Gbps (Table 4) or 100Mbps (Table 5).

Table 4. Crossover File Sizes when $r_{primary} = r_{dialup} = 1$Gbps

                       T_prop = 0.1ms                 T_prop = 50ms
p_primary, p_dialup    P_b=0.01  P_b=0.1  P_b=0.3     P_b=0.01  P_b=0.1  P_b=0.3
0.0001, 0.00001        40MB      43MB     52MB        349KB     361KB    396KB
0.0001, 0.00005        58MB      63MB     79MB        387KB     403KB    449KB
0.001, 0.00001         3.9MB     4MB      4.8MB       84KB      85KB     90KB
0.001, 0.0005          5.5MB     6MB      7.5MB       78KB      81KB     88KB
0.01, 0.00001          318KB     344KB    424KB       20.7KB    20.9KB   21.1KB
0.01, 0.005            491KB     534KB    671KB       16.2KB    16.5KB   17.1KB

Table 5. Crossover File Sizes when $r_{primary} = r_{dialup} = 100$Mbps

                       T_prop = 0.1ms                 T_prop = 50ms
p_primary, p_dialup    P_b=0.01  P_b=0.1  P_b=0.3     P_b=0.01  P_b=0.1  P_b=0.3
0.0001, 0.00001        21MB      23MB     28MB        318KB     331KB    367KB
0.0001, 0.00005        27.5MB    30MB     37MB        372KB     388KB    435KB
0.001, 0.00001         2MB       2.2MB    2.8MB       75KB      78KB     82KB
0.001, 0.0005          3.2MB     3.6MB    4MB         75KB      78KB     85KB
0.01, 0.00001          128KB     141KB    182KB       18.6KB    18.9KB   20.4KB
0.01, 0.005            218KB     236KB    292KB       14.9KB    15.2KB   16.8KB

From these results we see that if the file transfer is across a wide area (high round-trip propagation delays), a RESCUE circuit should be attempted even for small files (on the order of KB) unless the end host knows that its access link is not the bottleneck link on the path. For metro-area transfers ($T_{prop}$ of 0.1ms), a RESCUE circuit should be attempted for files in the KB range if the end host knows that upon setting up a RESCUE circuit it can clearly lower the end-to-end loss rate from, say, 1% to 0.001%, or even to half its value, 0.5%. If the bottleneck link rates are lower at 100Mbps, the crossover file sizes beyond which a RESCUE circuit should be attempted are even lower than when this link rate is 1Gbps.

Finally, we give crossover file sizes by assuming different bottleneck link rates for the RESCUE path and the primary path. In general one can expect the primary leased link bandwidth to be higher than an individual host’s RESCUE circuit. For example, enterprises may lease OC3 links for primary access, while RESCUE hosts may only have 100Mbps Ethernet NICs as their second (RESCUE) NICs. Even if the latter link rate is lower, there are crossover file sizes above which attempting a RESCUE circuit is beneficial if the packet loss rate of the RESCUE path is lower than on the primary path. As noted in Section 2.1, link rates sometimes have little impact on total delay. Table 6 shows these crossover file sizes.

Table 6. Crossover File Sizes when $T_{prop} = 50$ms, $r_{primary} = 155$Mbps, and $r_{dialup} = 100$Mbps

p_primary, p_dialup    P_b=0.01  P_b=0.1  P_b=0.3
0.0001, 0.00001        405KB     416KB    447KB
0.0001, 0.00005        487KB     500KB    538KB
0.001, 0.00001         85KB      87KB     91.3KB
0.001, 0.0005          90KB      92KB     99.5KB
0.01, 0.00001          18.9KB    19KB     20.4KB
0.01, 0.005            16.2KB    16.4KB   17.2KB

4.3 Analytical Basis for the Routing Decision: Utilization Analysis

Sharing of the access circuit for Dial-Up service is on a call-by-call basis rather than a packet-by-packet basis on current access circuits. This results in better performance for the call that does successfully obtain a RESCUE circuit, but the price paid is utilization. Per-circuit utilization $u_c$ if a RESCUE circuit is used is given by:

$$u_c = \frac{f / r_{dialup}}{E[T_{tcp}^{dialup}] + E[T_{setup}]} \qquad (5)$$

where $f$ is the size of the file being transferred. We plot the numerical results for per-circuit utilization $u_c$ in Figure 12 for different values of round-trip propagation delay, $T_{prop}$, and RESCUE circuit link rate, $r_{dialup}$.

We first note that as file size $f$ is increased, the per-circuit utilization increases. Once a TCP connection is established, it begins with the initial slow start phase, where it slowly increases its congestion window. Thus the corresponding utilization begins with a low value and increases slowly. When the file size is small, the data transfer is completed before the TCP connection enters the streaming state, which leads to low utilization [58]. If the file size is large enough (e.g. 200KB when $T_{prop}$ is 0.1ms and 50MB when $T_{prop}$ is 50ms), the TCP connection will finally reach the streaming state, where data packets will effectively be transmitted continuously, and a higher utilization will be seen.

The second observation is that the propagation delay has a significant impact on utiliza-

tion. A higher utilization is achieved in low-propagation-delay environments than in high-

propagation-delay environments. For example, with a 100Mbps link rate and $10^{-5}$ packet

loss rate, a 10MB file transfer over a RESCUE circuit results in a 97.2% per-circuit utili-

zation when the propagation delay is 0.1ms, but only 57.1% when the propagation delay is

50ms. From our file transfer delay analysis, we found that RESCUE circuits should be

attempted even for small files across a wide area (high round-trip propagation delays), but

from a utilization perspective we see the need to place a lower bound on file sizes for

wide-area transfers.
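As a worked illustration of the per-circuit utilization formula of this section, the sketch below evaluates it for a 10MB file on a 100Mbps RESCUE circuit. The mean TCP transfer delay (0.822 s) and setup delay (1 ms) are hypothetical inputs, chosen only so that the result lands near the 97.2% quoted above for the low-propagation-delay case; they are not values taken from the dissertation, and 10MB is assumed to mean 10^7 bytes.

```python
def per_circuit_utilization(file_bits, r_dialup_bps, e_t_tcp_dialup, e_t_setup):
    """Ideal transmission time f / r_dialup divided by the total circuit holding time."""
    return (file_bits / r_dialup_bps) / (e_t_tcp_dialup + e_t_setup)


# Assumed inputs: 10MB file, 100Mbps circuit, hypothetical delays in seconds.
u_c = per_circuit_utilization(
    file_bits=10e6 * 8, r_dialup_bps=100e6,
    e_t_tcp_dialup=0.822, e_t_setup=0.001)
print(f"u_c = {u_c:.3f}")
```

The numerator is fixed by the file size and circuit rate, so any time the circuit is held without streaming at full rate (setup, slow start, long round trips) shows up directly as lost utilization.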

Finally, in Figure 12 we see that increasing the link rate of RESCUE circuits from 100Mbps to 155Mbps results in a drop in per-circuit utilization when the propagation delay is 50ms. For example, for a 50MB file transfer, the drop is from 80% to 67%. As noted in Section 2.1, increasing link rates beyond some level has little positive impact on file transfer delays especially when the propagation delay is high. Therefore it is not always beneficial to increase link rates.

Figure 12. Plot of Per-circuit Utilization for Files in the Range of (10KB, 50MB) with $p_{dialup} = 0.00001$

4.4 Chapter Summary

Enterprise access links remain bottlenecks even as LAN and WAN link rates increase. In

this chapter, we proposed a Dial-Up Internet access service in which end hosts run device

drivers that request Dial-Up RESCUE circuits from the end host to an ISP IP router,

bypassing congested shared access links. This results in lowered packet-loss rates and thus

translates into lower mean file-transfer delays (or increased TCP throughputs). This is

especially true for wide-area TCP paths when round-trip propagation delays are high. The

circuits leased for Dial-Up service are shared on a call-by-call basis unlike Internet access

lines that are shared on a packet-by-packet basis. This helps improve user performance.

The trade-off is in utilization. To keep utilization high, we propose that a Dial-Up circuit be

held only for the duration of a single data transfer.

Chapter 5 Application II: End-to-end RESCUE Circuits

to Improve File Transfer Delays

There is a growing interest in improving current protocols or developing new ones to

increase the effective throughput of file transfers on the Internet [61][62]. Of particular

interest is the effective throughput of transfers of large files, for which current TCP has

been shown to be inadequate [21]. Contrary to the conventional thinking of video-stream-

ing transfers being the prime contributor to high-bandwidth applications, we note that file

transfers can enjoy any amount of bandwidth. The higher the rate, the lower the file-trans-

fer delay. This is unlike video-streaming applications, which with compression technolo-

gies often require only a few Mbps but long durations. Increasingly the Grid community is

recognizing the value of optical circuit-switched networks to carry out transfers of very

large files created by scientists [62]. Thus, we propose to use high-speed RESCUE circuits

for the end-to-end file transfer application.

Section 5.1 describes the operational steps in using end-to-end RESCUE circuits for file

transfers. Sections 5.2 and 5.3 describe our routing decision algorithm based on delay and

utilization analysis.

5.1 Description

Figure 13 shows the architecture for the proposed end-to-end file transfer applications.

In the end-to-end file transfer applications, high-speed, end-to-end RESCUE circuits are

requested automatically by end-host software when file-transfer applications on the end

host require high-throughput end-to-end communication. The circuit consists of a concat-

enation of Ethernet signals from end hosts to MSPPs within enterprises and Ethernet-over-

SONET signals between enterprise MSPPs across wide-area optical circuit switched net-

works.

File-transfer applications based on Hyper-Text Transfer Protocol (HTTP) and File

Transfer Protocol (FTP) typically involve the exchange of small messages prior to the

actual data transfer. Exploiting the presence of the dual paths (another one of our reasons

for RESCUE being an “add-on” service), we propose using the primary TCP/IP path for

these small messages. RESCUE circuits are used only for the actual data transfer. To

achieve high circuit utilization, we propose that the circuit be held only for the duration of

the actual data transfer and released immediately upon completion. Furthermore, we rec-

ommend that the EoS circuit be unidirectional from the sender to the receiver.

Figure 13. Use of RESCUE Circuits for End-to-end File Transfers. [Figure: in each of two enterprise buildings, Ethernet hosts (user space, kernel space, Ethernet interface) connect through NIC 1 to the Internet (packet switches: IP routers interconnecting various networks) and through NIC 2 to a SONET MSPP that also serves other end hosts. The RESCUE circuit for end-to-end file transfer service joins the two MSPPs across optical circuit-switched networks.]

Optical Connectivity Service (OCS): When a sending host is ready to transfer a file, it has to determine whether the correspondent end can be reached by a direct Ethernet/SONET circuit. We propose a service called Optical Connectivity Service (OCS), similar

to the Domain Name Service (DNS), to maintain connectivity information. As with DNS,

information can be cached to reduce delay overhead incurred in determining whether the

correspondent end host is reachable with RESCUE. Alternatively, OCS service can be

implemented in a centralized fashion as a part of a carrier network management system.

For example, it can be combined with an Authentication, Authorization and Accounting

(AAA) service [63]. OCS is important to enable a gradual growth of RESCUE users. If an

end host with RESCUE capability wants to communicate with an end host without such

capability, it will simply use the Internet. If, through OCS, it determines that the corre-

spondent host also has RESCUE capability, and furthermore it is connected via the same

optical circuit-switched network, it can use a RESCUE circuit.

Transport protocol on end-to-end RESCUE circuits: For the actual data transfer on

RESCUE circuits, we recommend using a combination of a rate-based transport protocol

on the unidirectional end-to-end Ethernet/SONET circuit from the server to the client and

a TCP connection for the reverse direction through the IP network. Standard TCP is not

well-suited for end-to-end circuits, i.e., paths on which there are no packet switches,

because of the congestion-control mechanisms built into Standard TCP. This functionality

is not only unnecessary if the end-to-end path is a circuit, it is also detrimental because bit

errors will be interpreted as congestion losses causing variations in the sending rate. For

full utilization of the circuit what we need is a transport protocol that uses rate-based flow

control and constantly sends data. As for error control, we do expect losses both as a result

of bit errors on links (which are likely to be rare because of the high quality of optical fiber

transmission), and receive-buffer overflows resulting from mismatches in the instanta-

neous rate at which the sending NIC emits data on to the circuit and the rate at which the

receiving end host software moves data to disk. A negative acknowledgement (NAK)

based error-control scheme is well suited for end-to-end circuits where data blocks are

received in sequence. Retransmissions can be sent on the Ethernet/SONET circuit if they

occur in the middle of the transfer and on the TCP connection if they occur at the end. The

reason for the latter is to avoid having to hold the circuit open after completion of the ini-

tial file transfer while waiting for the final acknowledgment confirming completion.
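The receiver side of a NAK-based scheme of the kind described above can be sketched as gap detection over in-sequence block numbers. This is an illustrative sketch, not the transport protocol proposed in the dissertation: the class and method names are hypothetical, blocks are assumed to carry consecutive sequence numbers starting at 0, and loss of the final blocks is not detected here (that case would surface through the end-of-transfer exchange on the TCP connection).

```python
class NakReceiver:
    """Tracks in-sequence data blocks and reports gaps as NAKs."""

    def __init__(self):
        self.next_expected = 0  # lowest sequence number not yet seen or NAKed
        self.missing = set()    # blocks NAKed and still awaiting retransmission

    def on_block(self, seq):
        """Process block `seq`; return the sequence numbers to NAK now."""
        if seq in self.missing:       # a retransmission filled a gap
            self.missing.discard(seq)
            return []
        if seq < self.next_expected:  # duplicate of a block already held
            return []
        naks = list(range(self.next_expected, seq))  # blocks skipped over
        self.missing.update(naks)
        self.next_expected = seq + 1
        return naks

    def complete(self, total_blocks):
        """True once every block up to `total_blocks` has been received."""
        return self.next_expected >= total_blocks and not self.missing


rx = NakReceiver()
for seq in (0, 1, 4):
    print(seq, rx.on_block(seq))
```

Because a circuit delivers blocks in order, a jump in sequence number is enough evidence of loss, so the receiver stays silent while the transfer is clean and the reverse TCP connection carries very little traffic.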

5.2 Analytical Basis for the Routing Decision: Delay Analysis

In this section we use the analytical model described in Section 4.2 to provide a quanti-

tative basis for the routing decision in the file-transfer application. Using similar reasoning

to that presented in (2) and (3), we can base the routing decision on whether to attempt a circuit setup for an end-to-end file transfer as follows:

$$\begin{cases} \text{if } \dfrac{E[T_{setup}]}{1 - P_b} \geq E[T_{tcp}] - T_{transfer} & \text{resort directly to the TCP/IP path} \\[2ex] \text{if } \dfrac{E[T_{setup}]}{1 - P_b} < E[T_{tcp}] - T_{transfer} & \text{attempt circuit setup} \end{cases} \qquad (6)$$

$T_{transfer}$ is the actual file-transfer delay on the end-to-end Ethernet/SONET circuit:

$$T_{transfer} = \frac{f}{r_c} + \frac{T_{prop}}{2} \qquad (7)$$

where $f$ is the size of the file being transferred and $r_c$ is the data rate of the circuit. $E[T_{tcp}]$ is the mean time to transfer the file using the primary TCP/IP path and computed using the analytical models of TCP presented in [39] and [40] (equation (1) in Section 1.3.1). We have not included retransmission delays here because on Ethernet/SONET circuits, retransmissions required due to random bit errors and/or receive-buffer overflows are needed in both the TCP path and the Ethernet/SONET end-to-end circuit. Since the

routing decision is based on comparing delays on the two paths before deciding whether

or not to attempt a circuit setup, we have omitted retransmission delays on both paths.

Including this delay would in fact favor using the Ethernet/SONET circuit. This is because

bit errors on the TCP/IP path would be misinterpreted as packet losses caused by conges-

tion leading to a reduction in the sending rate.
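The end-to-end decision can be sketched numerically as follows: the circuit transfer delay is the file size over the circuit rate plus a one-way propagation delay, and a circuit is attempted when the setup delay scaled by 1/(1 - P_b) is smaller than the time saved relative to TCP. The mean TCP delays and setup delays used below are hypothetical inputs, since the TCP model of [39] and [40] is not reproduced here, and the function names are illustrative.

```python
def circuit_transfer_delay(file_bits, r_c_bps, t_prop):
    """Transfer delay on the circuit: f / r_c plus half the round-trip delay."""
    return file_bits / r_c_bps + t_prop / 2.0


def attempt_end_to_end_circuit(e_t_setup, p_b, e_t_tcp, t_transfer):
    """True means attempt a circuit setup, False means use the TCP/IP path."""
    return e_t_setup / (1.0 - p_b) < (e_t_tcp - t_transfer)


# Wide-area case: 100MB file, 1Gbps circuit, 50ms round-trip propagation delay.
t_wide = circuit_transfer_delay(file_bits=100e6 * 8, r_c_bps=1e9, t_prop=50e-3)
print(f"T_transfer = {t_wide:.3f} s")
# Hypothetical E[T_tcp] = 20 s and E[T_setup] = 54 ms.
print(attempt_end_to_end_circuit(0.054, 0.3, 20.0, t_wide))

# Metro case: 100KB file, 100Mbps circuit, 0.1ms round-trip propagation delay.
t_metro = circuit_transfer_delay(file_bits=100e3 * 8, r_c_bps=100e6, t_prop=0.1e-3)
# Hypothetical E[T_tcp] = 10 ms and E[T_setup] = 3.9 ms.
print(attempt_end_to_end_circuit(0.0039, 0.3, 0.010, t_metro))
```

In the metro case the few milliseconds that the circuit would save are smaller than the expected cost of setting it up, which is why small transfers on low-propagation-delay paths stay on the TCP/IP path.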

The key difference between (6) and (3) is that we have $T_{transfer}$ in (6) instead of the $E[T_{tcp}^{dialup}]$ term in (3). This is because in the Dial-Up application, since only the access link is being bypassed, TCP is still required end-to-end, while on the end-to-end circuit, the transfer time after the circuit is set up is given by (7). A second difference is in the term $E[T_{setup}]$. In (4), we used the term $T_{prop}^{dialup}$ to denote the round-trip propagation delay between the RESCUE end host and its ISP’s IP router. To obtain the numerical results, we assumed this delay to be typically small, and hence set it to 0.1ms. In the file-transfer application, $E[T_{setup}]$ will include an end-to-end round-trip propagation delay between the two hosts, which could have a small or large value depending on the distance between the two hosts. For example, we use 0.1ms for a metro-area path and 50ms for a wide-area path.

Numerical results:

We provide two sets of numerical results. In sub-section A, we consider the case when the circuit rate is the same as the bottleneck link rate on the primary TCP/IP path. In sub-section B, we consider the case when the circuit rate is only 100Mbps while the bottleneck rate on the TCP/IP path is 1Gbps.

A. Discussion of the routing decision (6) if $r_c = r$

We plot the two sides of (6) in Figure 14 and Figure 15 assuming that both $r_c$ and $r$ are 100Mbps and 1Gbps respectively. The parameter values used to compute $E[T_{setup}]$ are the same as those used in Section 4.2 for the Dial-Up application except for $k$, the number of switches, and $T_{prop}$. We increase $k$ from 4 to 20 since the end-to-end circuit between hosts could consist of more circuit switches than the Dial-Up path from a host to its ISP’s IP router. $T_{prop}$ is set to either 0.1ms or 50ms as previously mentioned.

Figure 14. Plot of Equation (6) with $r_c = r = 100$Mbps, $\rho_{sig} = \rho_{sp} = 0.7$, $k = 20$

Figure 15. Plot of Equation (6) with $r_c = r = 1$Gbps, $\rho_{sig} = \rho_{sp} = 0.7$, $k = 20$

For wide-area paths, when $T_{prop}$ is 50ms, for the entire file range (100KB, 1GB), a RESCUE circuit setup should be attempted for the $P_b$ and $P_{loss}$ values shown. This is because $E[T_{setup}]/(1 - P_b)$ is always less than the difference $E[T_{tcp}] - T_{transfer}$ (see (6)). However, on low-propagation delay paths (Figure 14(a) and Figure 15(a), in which $T_{prop} = 0.1$ms), we see that there are crossover file sizes below which an end host should resort directly to the TCP/IP path and above which it should attempt an Ethernet/SONET circuit setup. These crossover file sizes are listed in Table 7 and Table 8. As an example, if $r_c = r = 100$Mbps, the call-blocking probability on the optical circuit-switched network is 30% and the packet-loss rate on the TCP/IP path is 1%, then for calls in which the RESCUE circuit traverses 20 switches, 650KB is a crossover file size in low-propagation delay environments, i.e., when the end-to-end path is within a single metro area. For files below this size, the end host application software should directly resort to the TCP/IP path.
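The comparison behind these crossover file sizes can be sketched in a few lines. This is a minimal illustration, assuming that decision (6) has the form quoted above (attempt the circuit when $E[T_{setup}]/(1 - P_b) < E[T_{tcp}] - T_{transfer}$). The function names and the numeric value used for the mean TCP delay are hypothetical; the TCP delay model itself is not reproduced here and is passed in as an input.

```python
def transfer_time(file_bytes, rate_bps):
    """Time in seconds to send file_bytes at a fixed circuit rate."""
    return file_bytes * 8.0 / rate_bps


def should_attempt_circuit(e_t_setup, p_b, e_t_tcp, t_transfer):
    """Return True if a circuit setup should be attempted first.

    Assumed form of the routing decision: attempt the circuit when
    E[T_setup] / (1 - P_b) < E[T_tcp] - T_transfer.
    All times are in seconds; p_b is the call-blocking probability.
    """
    if not 0.0 <= p_b < 1.0:
        raise ValueError("p_b must be in [0, 1)")
    return e_t_setup / (1.0 - p_b) < e_t_tcp - t_transfer


# Hypothetical inputs: 100KB file, 100Mbps circuit, 50.158ms mean setup delay,
# 30% blocking, and an assumed (not computed) mean TCP delay of 0.5s.
t_transfer = transfer_time(100_000, 100e6)
attempt = should_attempt_circuit(50.158e-3, 0.3, 0.5, t_transfer)
```

When the difference $E[T_{tcp}] - T_{transfer}$ is negative, as in the rate-mismatch case of sub-section B, the function returns False for any setup delay.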

Table 7. Crossover File Sizes when $r_c = r = 100$Mbps and $T_{prop} = 0.1$ms

Columns give the number of switches on the circuit ($k$) and the measure of loading on the circuit-switched network ($P_b$); rows give the measure of loading on the TCP/IP path ($P_{loss}$).

| TCP/IP path | $k = 4$, $P_b = 0.01$ | $k = 4$, $P_b = 0.1$ | $k = 4$, $P_b = 0.3$ | $k = 20$, $P_b = 0.01$ | $k = 20$, $P_b = 0.1$ | $k = 20$, $P_b = 0.3$ |
| $P_{loss} = 0.0001$ | 610KB | 640KB | 840KB | 2.4MB | 2.65MB | 3.4MB |
| $P_{loss} = 0.001$ | 490KB | 550KB | 730KB | 2MB | 2.2MB | 2.8MB |
| $P_{loss} = 0.01$ | 120KB | 140KB | 180KB | 500KB | 550KB | 650KB |

Table 8. Crossover File Sizes when $r_c = r = 1$Gbps and $T_{prop} = 0.1$ms

| TCP/IP path | $k = 4$, $P_b = 0.01$ | $k = 4$, $P_b = 0.1$ | $k = 4$, $P_b = 0.3$ | $k = 20$, $P_b = 0.01$ | $k = 20$, $P_b = 0.1$ | $k = 20$, $P_b = 0.3$ |
| $P_{loss} = 0.0001$ | 4.8MB | 5.4MB | 7.2MB | 22MB | 24MB | 30MB |
| $P_{loss} = 0.001$ | 3.2MB | 2.4MB | 2.2MB | 9MB | 10MB | 12MB |

B. Plots of the routing decision (6) if $r_c = 100$Mbps and $r = 1$Gbps

Our reason for considering this option is as follows. Consider the case when the leased

circuit for the primary Internet access of an enterprise is increased to 1Gbps. Say, the line

leased for RESCUE service is also 1Gbps. In this case, if a RESCUE circuit for a single-

file transfer is allocated the full 1Gbps rate of the RESCUE leased circuit, all other calls

will be blocked. For increased sharing of the RESCUE leased circuit, each RESCUE cir-

cuit may only be allocated 100Mbps. This is the classical difference between bandwidth

sharing modes of circuit-switched and packet-switched networks. In this section, we con-

sider the question of whether there is any value for the circuit-switched path even with this

ten-fold handicap in the data rate.

We plot the two sides of (6) in Figure 16 assuming that $r_c = 100$Mbps and $r = 1$Gbps. When $T_{prop} = 50$ms, for the entire file range (100KB, 1GB), a RESCUE circuit should be attempted first as shown in Figure 16(b). However, when $T_{prop} = 0.1$ms, for the entire file range (100KB, 1GB), the TCP/IP path should be used directly if there is such a rate mismatch. For the $T_{prop} = 0.1$ms case, the difference $E[T_{tcp}] - T_{transfer}$ is not only smaller than $E[T_{setup}]/(1 - P_b)$, but also negative. This implies that the mean transfer delay on the higher-rate TCP path is smaller than the time to transfer on the lower-rate circuit.

In summary, from this file-transfer delay analysis, we conclude that a circuit setup should be attempted if $T_{prop}$ is 50ms for files 100KB or larger even with gross rate mismatches, while in low propagation-delay environments, the decision depends upon the file size, the rates on the two paths, and the loading conditions on the two paths.

Table 8 (continued). Crossover File Sizes when $r_c = r = 1$Gbps and $T_{prop} = 0.1$ms

| TCP/IP path | $k = 4$, $P_b = 0.01$ | $k = 4$, $P_b = 0.1$ | $k = 4$, $P_b = 0.3$ | $k = 20$, $P_b = 0.01$ | $k = 20$, $P_b = 0.1$ | $k = 20$, $P_b = 0.3$ |
| $P_{loss} = 0.01$ | 300KB | 360KB | 500KB | 1.2MB | 1.4MB | 1.8MB |

5.3 Analytical Basis for the Routing Decision: Utilization Analysis

While file-transfer delay is an important user measure for making the routing decision

of whether or not to attempt a circuit setup, service provider measures such as utilization

should also be considered since utilization ultimately does impact users through prices

charged. Total network utilization has two components: per-circuit utilization, $u_c$, and aggregate circuit utilization, $u_a$.

Figure 16. Plot of Equation (6) with $r_c = 100$Mbps, $r = 1$Gbps, $\rho_{sig} = \rho_{sp} = 0.7$, $k = 20$

Per-circuit utilization $u_c$ is given by:

$$u_c = \frac{E[T_{transfer}]}{E[T_{setup}] + E[T_{transfer}]}, \quad \text{where} \quad E[T_{transfer}] = \frac{E[X]}{r_c}, \qquad (8)$$

and $E[X]$ is the average file size. Even though we hold circuits only for the duration of the transfer, and only set up unidirectional circuits, given that call holding times are short, call-setup delays lower utilization. This is because switches hold resources for a call as its setup procedure proceeds end-to-end. When $T_{prop}$ is large (e.g., 50ms), there should be a

minimum file size below which circuit setup should not be attempted. Without such a min-

imum, per-circuit utilization can be poor. For example, consider a file size of 100KB. For

the 50ms propagation delay environments, we concluded from our file-transfer delay anal-

ysis in Section 5.2 that circuits should be attempted for all file sizes in the range (100KB,

1GB). However, for a 100KB file transfer on a 100Mbps circuit with 4 switches on the

end-to-end path, we need 50.158ms setup time and 8ms total transfer time. As a result, the

utilization per circuit is only 13.7%. From our file-transfer delay analysis, we found cross-

over file sizes for low-propagation delay environments, but from a utilization perspective

we see that such crossover file sizes are necessary for large-propagation delay environ-

ments.
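The 100KB example above follows directly from (8). The sketch below reproduces it and, as an illustration that goes slightly beyond the text, inverts (8) to find the smallest file that reaches a chosen per-circuit utilization. The function names and the 90% target are assumptions made for this example only.

```python
def per_circuit_utilization(file_bytes, rate_bps, setup_s):
    """u_c = T_transfer / (T_setup + T_transfer), with T_transfer = X / r_c."""
    t_transfer = file_bytes * 8.0 / rate_bps
    return t_transfer / (setup_s + t_transfer)


def min_file_bytes(target_u, rate_bps, setup_s):
    """Smallest file size (bytes) whose per-circuit utilization reaches target_u.

    Solving u = T / (S + T) for T gives T = S * u / (1 - u).
    """
    if not 0.0 < target_u < 1.0:
        raise ValueError("target_u must be in (0, 1)")
    t_transfer = setup_s * target_u / (1.0 - target_u)
    return t_transfer * rate_bps / 8.0


# Wide-area example from the text: 100KB file, 100Mbps circuit, 50.158ms setup.
u_wide = per_circuit_utilization(100_000, 100e6, 50.158e-3)

# Hypothetical target: file size needed for 90% per-circuit utilization.
x_min = min_file_bytes(0.9, 100e6, 50.158e-3)
```

Here `u_wide` evaluates to 8ms / 58.158ms, in line with the utilization quoted above, while `x_min` comes out at several megabytes, which illustrates why a minimum file size matters on wide-area paths.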

To obtain aggregate circuit utilization $u_a$, we model the three-link scenario shown in Figure 17. In general, depending upon the number of switches equipped with signaling engines, an Ethernet/SONET circuit could traverse many links. In the scenario shown in Figure 17, each switch connects $N$ enterprises. Assume that each access link has $m_{access}$ circuits and the core link has $m_{core}$ circuits. We further assume that each enterprise generates the same offered call load $\nu$ to the network, of which a fraction $f$ represents local traffic (i.e., calls between two enterprises within the same metro network) and the remaining fraction $1 - f$ represents long distance traffic (to enterprises located in the other metro network). Both local and long distance call arrival processes are assumed to be Poisson.

Figure 17. A Three-link Network Model of RESCUE Service (two switches joined by a core link, each connecting $N$ enterprises over access links with $m_{access}$ circuits; local traffic and long distance traffic)

We use the reduced load approximation [64], also well known as Erlang's fixed point

method, for call-blocking probabilities, which is known to be quite accurate except under

very high offered call loads. Nevertheless we validated our analytical results with simula-

tions and found a close match.

Call-blocking probabilities for the access links and core links are given by:

$$P_b^{core} = Erl(\nu_{core}, m_{core}) \quad \text{and} \quad P_b^{access} = Erl(\nu_{access}, m_{access}) \qquad (9)$$

where

$$Erl(\rho, m) = \frac{\rho^m / m!}{\sum_{k=0}^{m} \rho^k / k!}$$

is the Erlang-B formula and

$$\nu_{core} = N \nu (1 - f)(1 - P_b^{access})^2 \qquad (10)$$

and

$$\nu_{access} = \nu f (1 - P_b^{access}) + \nu (1 - f)(1 - P_b^{core})(1 - P_b^{access}) \qquad (11)$$

characterize the thinning effect of the load due to blocking on other links. $\nu$ is the fractional offered call load of each enterprise created by only allowing transfers with files larger than some crossover file size, $\chi$, to request a RESCUE circuit. Using the Pareto distribution for file sizes [65], we compute the fractional offered load as

$$\nu = \frac{\rho}{E(X)} P(X \ge \chi) E[X \mid X \ge \chi] = \rho \, \frac{\alpha - 1}{\alpha k} \left(\frac{k}{\chi}\right)^{\alpha} \frac{\alpha \chi}{\alpha - 1} = \rho \left(\frac{k}{\chi}\right)^{\alpha - 1} \qquad (12)$$

where $\alpha$, the shape parameter, is 1.06 and $k$, the scale parameter, is 1000 bytes as computed in [65], and $\rho$ is the total offered load.

The fixed-point reduced load approximation requires an iteration of (9)-(11) until there is convergence to a single point. We start the iteration assuming $P_b^{access} \leftarrow 0$ and $P_b^{core} \leftarrow 0$. When the iterations converge, the end-to-end call-blocking probabilities become:

$$P_b^{local} = 1 - (1 - P_b^{access})^2 \quad \text{and} \quad P_b^{long\text{-}dist} = 1 - (1 - P_b^{access})(1 - P_b^{core}) \qquad (13)$$

$$P_b = P_b^{local} f + P_b^{long\text{-}dist} (1 - f) \qquad (14)$$

and corresponding link utilizations are:

$$u_a^{access} = \frac{(1 - P_b^{local}) \nu f + (1 - P_b^{long\text{-}dist}) \nu (1 - f)}{m_{access}} \qquad (15)$$

$$u_a^{core} = \frac{(1 - P_b^{long\text{-}dist}) N \nu (1 - f)}{m_{core}} \qquad (16)$$

Combining the two components of utilization ($u_c$ in (8) and $u_a$ in (15)~(16)), we obtain total utilization for the two types of links:

$$u_{access} = u_a^{access} \times \frac{E[X \mid X \ge \chi] / r_c}{E[T_{setup}] + E[X \mid X \ge \chi] / r_c}$$

$$u_{core} = u_a^{core} \times \frac{E[X \mid X \ge \chi] / r_c}{E[T_{setup}] + E[X \mid X \ge \chi] / r_c}$$

Numerical results:

To obtain numerical results, we assume the following input parameter values: $N = 100$, $f = 0.8$, and $m_{core} = 10 m_{access}$. While the two switches shown in Figure 17 could both belong to the same metro network, for purposes of understanding the impact of propagation delay, we assume that these two switches are located in distant metro networks with the round-trip propagation delay between the two switches being 50ms. We assume intra-area round-trip propagation delay to be 0.1ms. Furthermore we assume local calls pass through 3 switches (MSPPs at each enterprise and one of the two switches shown in Figure 17) and long-distance calls pass through 4 switches (2 enterprise MSPPs and the two switches shown in Figure 17).
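The fixed-point procedure described above can be sketched as follows. This is an illustrative implementation, not the code used to produce the results in this chapter: it assumes plain successive substitution starting from zero blocking, uses the standard numerically stable Erlang-B recursion in place of the factorial form, and the load and circuit counts in the usage lines are hypothetical values chosen only to exercise the functions.

```python
def erlang_b(load, m):
    """Erlang-B blocking probability for offered load `load` and m circuits,
    computed with the recursion B_j = load*B_{j-1} / (j + load*B_{j-1})."""
    b = 1.0
    for j in range(1, m + 1):
        b = load * b / (j + load * b)
    return b


def pareto_fractional_load(rho, chi, alpha=1.06, k=1000.0):
    """Fractional offered load nu = rho * (k / chi)^(alpha - 1), for chi >= k."""
    return rho * (k / chi) ** (alpha - 1.0)


def solve_fixed_point(nu, n, f, m_access, m_core, tol=1e-12, max_iter=10_000):
    """Iterate (9)-(11) from zero blocking until both probabilities settle."""
    pb_access, pb_core = 0.0, 0.0
    for _ in range(max_iter):
        nu_core = n * nu * (1 - f) * (1 - pb_access) ** 2                # (10)
        nu_access = (nu * f * (1 - pb_access)
                     + nu * (1 - f) * (1 - pb_core) * (1 - pb_access))   # (11)
        new_core = erlang_b(nu_core, m_core)                             # (9)
        new_access = erlang_b(nu_access, m_access)                       # (9)
        delta = max(abs(new_core - pb_core), abs(new_access - pb_access))
        pb_access, pb_core = new_access, new_core
        if delta < tol:
            return pb_access, pb_core
    raise RuntimeError("fixed-point iteration did not converge")


def link_metrics(nu, n, f, m_access, m_core):
    """End-to-end blocking (13), overall blocking, and aggregate utilizations."""
    pb_access, pb_core = solve_fixed_point(nu, n, f, m_access, m_core)
    pb_local = 1 - (1 - pb_access) ** 2
    pb_long = 1 - (1 - pb_access) * (1 - pb_core)
    pb = pb_local * f + pb_long * (1 - f)
    ua_access = ((1 - pb_local) * nu * f
                 + (1 - pb_long) * nu * (1 - f)) / m_access
    ua_core = (1 - pb_long) * n * nu * (1 - f) / m_core
    return {"pb": pb, "pb_local": pb_local, "pb_long": pb_long,
            "ua_access": ua_access, "ua_core": ua_core}


# Hypothetical inputs: nu = 5 Erlangs per enterprise, m_access = 10 circuits;
# N = 100, f = 0.8 and m_core = 10 * m_access follow the text.
metrics = link_metrics(nu=5.0, n=100, f=0.8, m_access=10, m_core=100)
```

Successive substitution is not guaranteed to converge at very high loads; a damped update is a common remedy in that regime.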

We plot the numerical results of $u_{access}$ and $u_{core}$ in Figure 18(a) and Figure 18(b) respectively, for different call-blocking probabilities $P_b$ and different values of $\rho$. As the crossover file size $\chi$ is increased, $P_b$ is kept constant by increasing $m_{core}$ and $m_{access}$ correspondingly. The “zigzag” pattern of the plots occurs because $m_{core}$ and $m_{access}$ have to be integers.

Figure 18. Plot of Total Utilization on Each Access Link and the Core Link: (a) Access link utilization $u_{access}$; (b) Core link utilization $u_{core}$

As the crossover file size $\chi$ is increased, the access link aggregate utilization, $u_a^{access}$, decreases because offered load decreases. Since the $E[T_{setup}]$ for local calls ($T_{prop} = 0.1$ms and $k = 3$) is small (0.8ms), and 80% of the calls are assumed to be local calls, per-circuit utilization is high even for small files, e.g., 91% for a 100KB file. Increasing the crossover file size will not improve the per-circuit utilization significantly. As a result, the total access link utilization $u_{access}$ in Figure 18(a) decreases slightly as crossover file size is increased.

In Figure 18(b), we plot $u_{core}$, the utilization of the core link. All calls passing through this link are long-distance calls ($T_{prop} = 50$ms and $k = 4$). As the crossover file size $\chi$ is increased, the plots show total utilization increasing because per-circuit utilization increases. However, beyond a critical crossover file size, the drop in the offered call load and the corresponding drop in the aggregate utilization $u_a^{core}$ slows the increase of the total utilization, making it stable at some value below 1 or even dropping it slightly. For example, the optimal crossover file size is 2.7MB when $\rho$ is 5 and $P_b$ is 30%.

Another observation is that high utilization is possible by operating the network at a high call-blocking probability (30%). For example, with $\rho = 10$ and a blocking probability of 30%, we can achieve a 93% total utilization on the core link using a crossover file size of 500KB, while at a blocking probability of 1%, we can only achieve 84% utilization.

5.4 Chapter Summary

In this chapter, we proposed to improve delay performance of file transfers by using

end-to-end RESCUE circuits where possible. To achieve high circuit utilization, we pro-

posed a unidirectional end-to-end RESCUE circuit from the server to the client and a rate-

based transport protocol on the RESCUE circuit. If the circuit setup is successful, there is

a huge advantage in total delay especially in wide-area environments. For example, a 1TB

file requires only 2.2 hours on a 1Gbps end-to-end circuit but could take more than 4 days

on a TCP/IP path in a WAN environment. We analyzed the conditions under which a cir-

cuit setup should be attempted. For WAN environments, it is clear that a circuit setup

should be attempted for large and medium-sized file transfers. For small-size file transfers

(on the order of KB), we see the need to place a lower bound for file sizes. Without such a

lower bound, network utilization can be poor. In lower propagation-delay environments,


one should consider the loading conditions on the two paths, probability of packet loss on

the TCP/IP path and call blocking probability through the circuit-switched network,

before deciding whether or not to attempt the circuit setup.

Chapter 6 Implementation of Application II

To take advantage of the benefits brought by the RESCUE service, software enhance-

ments are needed at end hosts. We identify three basic modules for the end-host RESCUE

software: a routing decision module, a signaling module, and a high-speed transport proto-

col module for end-to-end file-transfer applications. An overview of the end-host RES-

CUE software architecture is shown in Figure 19. The user application shown in Figure 19

interacts with the RESCUE routing decision module to decide whether or not to attempt a

circuit setup. If it decides to attempt a circuit setup, the RESCUE signaling module ini-

tiates a call-setup request to the signaling-enabled network switches. If the circuit setup is

successful, RESCUE software will direct the user application to initiate data transfers on

the Ethernet/SONET circuit through the end host’s second NIC. Depending upon the

application, TCP or some other transport protocol could be used on the circuit. If, on the contrary, the routing decision module determines that the primary TCP/IP path is preferred, or if the circuit setup fails, the user application will be directed to the primary TCP/IP path through the end host’s primary NIC.

Figure 19. An End Host Configured for RESCUE Service (application; end host RESCUE software with routing decision and signaling modules; TCP over NIC I to the primary TCP/IP path; TCP/other transport protocols over NIC II to the dynamically configured Ethernet/EoS circuit)

Details of the high-speed transport protocol module are presented in Section 6.1 along with experimental results. Sections 6.2 and 6.3 describe the basic functionality and features of the routing decision module and the signaling module.

6.1 Design and Implementation of a High-speed Transport Protocol

In Section 1.3.2, we show that TCP is a poor choice for dedicated high-speed end-to-end

circuits because of its slow start and congestion avoidance algorithms. Also, TCP’s win-

dow-based flow control and positive-ACK based error control scheme are not well suited

for dedicated end-to-end circuits. In this section, we consider the question of what trans-

port protocol to use on the end-to-end high-speed RESCUE circuits for the file-transfer

application.

6.1.1 Design Rationale

To design a transport protocol for the end-to-end RESCUE file transfers, we start by

considering the purpose and role of transport protocols. Transport protocols perform the

functions necessary to achieve reliable transfer of data on an end-to-end basis. The end-to-

end paths in the file-transfer application will consist of Ethernet, GbE, or 10GbE segments

at the ends connected via wide-area SONET/SDH circuits and/or WDM all-optical light-

paths. Since resources are reserved in a dedicated manner for these circuits, congestion is

handled during circuit setup. Once the circuit is successfully provisioned, congestion con-

trol functionality is not required during the data transfer. There is no contention for

resources during the actual data transfers, and hence, no possibility of data loss at the cir-

cuit-based network switches unlike in packet-based network switches. Nevertheless losses

can occur even on these RESCUE circuits due to (i) link errors and (ii) receive-buffer

overflows. Link errors arise from bit and burst errors on the physical media. Even though

optical fiber, the physical medium of circuit-switched networks, is fairly reliable, link

errors are unavoidable. We will describe error-control solutions for recovery from link

errors and receive-buffer overflows.

A. Flow Control

There are three well-known flow-control methods: ON/OFF, window-based, and rate-based. In ON/OFF and window-based flow control schemes, the receiver sends messages

to control the behavior of the sender dynamically. These receiver-based flow control

schemes are suitable for networks, such as the Internet, where the available bandwidth and

receiver rates are unknown to the sender. However, when used in circuit-switched net-

works, they leave open the possibility of the circuit lying idle while a sender awaits an ON

or “window-open” signal from the receiver, which could lead to poor circuit utilization. In

RESCUE service, since the circuit rate assignment is known a priori to the sender, it is

possible to set the sending rate to match the circuit rate/receiver rate. Therefore, the flow-

control scheme in our transport protocol is rate-based. Rate-based flow control uses the

available bandwidth in an efficient manner. It can be implemented by setting the inter-

packet generation time at the sender.

However, implementation of a rate-based flow control is more complicated than the

above discussion indicates because of receive-buffer overflows. What are these and why

do they occur? At first glance, it appears that with a dedicated circuit, one could simply

match the sending rate to the receiver rate (which is ideally equal to the circuit rate), thus

eliminating the need for receiver buffers. For example, on end-to-end telephone circuits, senders (speakers) generate audio data at the same rate at which receivers (listeners)

consume data. However, unlike telephones that perform their single dedicated function,

general-purpose computers that are typically involved in file transfers engage in several

other activities concurrent with file transfers. This requires operating systems to schedule

various tasks in and out of the processor as needed, which implies that data received on a

NIC is not moved at a guaranteed constant rate from the NIC to the disk. Furthermore disk

access rates are not constant. There can be significant variability based on the location to

which data needs to be written. Variability also arises at the sender. Based on when the

operating system at the sender schedules the network-related kernel threads and drivers,

data is moved from disk to memory (user-space and/or kernel-space) to the NIC at varying

rates. This variability at the receiver and at the sender leads to the interesting question of

how to select an appropriate circuit rate (transfer rate). If a pessimistic rate is chosen to

avoid (at all costs) the possibility of the sender sending data faster than the rate at which

the receiver can move the data, the overall transfer delay could be higher than if an opti-

mistic rate is chosen allowing for losses and subsequent retransmissions. Thus rate-based

flow control schemes are not trivial to implement.

B. Error Control

To counter losses from link errors and receive-buffer overflows, we propose to use the

selective-Automatic-Repeat-reQuest (selective-ARQ) scheme to achieve a high efficiency.

NAKs can be used to indicate packet losses instead of requiring the sender to maintain

timers and await positive acknowledgements (ACKs) because of the guaranteed in-

sequence delivery of data blocks on dedicated circuits. However, ACKs are still needed

because the sender needs to update its retransmission buffers. Retransmission buffers are

required at the sender because of the high delay overheads involved in disk accesses. Ide-

ally, if one could implement a transport protocol using OS-bypass, where a software pro-

gram on a processor on the NIC or hardware circuitry moves data directly from the disk to

the NIC for transmission, then no retransmission buffers will be needed at the sender. If a

block of data is lost, it can be re-extracted directly from the disk and retransmitted. How-

ever, in software implementations on general-purpose hosts using standard Ethernet NICs,

we will need retransmission buffers at the sender where data is held until acknowledged to

avoid repeated disk accesses. Retransmission buffers cannot be too large because of the

limited memory size at end hosts. Therefore we need ACKs to confirm the delivery of

packets and allow the release of the corresponding space in the retransmission buffer.

There is a trade-off between the retransmission buffer size and the frequency of sending

positive-ACKs. The longer the time interval between two consecutive ACKs, the less the

overhead incurred. But the retransmission buffer has to be correspondingly larger to store

more unacknowledged packets. We need further experimentation to select appropriate val-

ues for the retransmission buffer size. Thus, we choose to use a combination of NAKs for

immediate packet-loss indications with timer-based ACKs to clear the retransmission

buffer. A NAK will also be treated as a cumulative ACK for all segments prior to the one

being requested in the NAK.
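The interplay between the retransmission buffer, periodic ACKs, and NAKs can be illustrated with a small sketch. It is a simplified model under stated assumptions rather than the protocol implementation: the class and method names are hypothetical, the buffer capacity is counted in packets, a NAK carrying several sequence numbers is treated as a cumulative ACK for everything before the smallest one, and sequence-number wraparound is ignored.

```python
from collections import OrderedDict


class RetransmissionBuffer:
    """Holds sent-but-unacknowledged packets so that retransmissions
    do not require another disk access."""

    def __init__(self, capacity):
        self.capacity = capacity          # maximum number of buffered packets
        self._packets = OrderedDict()     # sequence number -> payload

    def __len__(self):
        return len(self._packets)

    def is_full(self):
        return len(self._packets) >= self.capacity

    def store(self, seq, payload):
        """Keep a copy of a newly transmitted packet until it is acknowledged."""
        if self.is_full():
            raise BufferError("retransmission buffer is full")
        self._packets[seq] = payload

    def on_ack(self, next_expected_seq):
        """Cumulative ACK: release every packet below next_expected_seq.
        Returns the number of packets released."""
        released = [s for s in self._packets if s < next_expected_seq]
        for s in released:
            del self._packets[s]
        return len(released)

    def on_nak(self, lost_seqs):
        """NAK: release everything before the first lost packet, then return
        the (seq, payload) pairs that should be retransmitted."""
        if not lost_seqs:
            return []
        self.on_ack(min(lost_seqs))
        return [(s, self._packets[s]) for s in sorted(lost_seqs)
                if s in self._packets]
```

A longer ACK interval means `on_ack` is called less often, so `capacity` has to grow to avoid stalling the sender, which is the trade-off noted above.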

Resequencing buffers are similarly needed at the receiver to accumulate Ethernet frames into a block before performing a write access to the disk*. Accessing the disk for each Ethernet frame would result in excess overhead.

*Again, with an OS-bypass implementation, resequencing buffers can be eliminated.

C. Use of dual communication paths

As described in Section 3.3, the system architecture leverages dual paths: (i) a dedicated

end-to-end high-speed circuit and (ii) a TCP/IP path. The high-speed dedicated circuit is

used only for the actual data transfer and is held open only for as long as user data flows

from the server to the client. Next consider whether to send retransmissions on the dedi-

cated end-to-end circuit or on the TCP/IP path. Our answer is to use the dedicated end-to-

end circuits for retransmissions unless these are needed at the end of the transfer. The rea-

son for not wanting to use the dedicated circuit for any retransmissions needed at the end

of a transfer is that the circuit will have to lie idle while the sender awaits acknowledg-

ment of data reception from the receiver. On the other hand, if an aggressive transfer rate

is used, we can expect a fair number of retransmissions to handle receive-buffer over-

flows. Hence we recommend using the dedicated end-to-end circuits for most of the

retransmissions.

The RESCUE circuit set up for the file transfer would be a unidirectional circuit for uti-

lization reasons. This raises the question of how to transport reverse-path control mes-

sages, such as ACKs and NAKs. If the RESCUE circuit is used for this exchange,

utilization will suffer. Given the end hosts have two communication paths in RESCUE

service, we propose using the TCP/IP path for such exchanges.

In summary, the transport protocol on end-to-end high-speed RESCUE circuits should use a rate-based flow control scheme and a selective-ARQ based error control scheme with both NAKs and ACKs.

6.1.2 Related Work

A number of new transport protocols have been proposed for high-speed networks using

the easy-to-use UDP socket API offered by end-host systems and implemented as applica-

tion-level processes. Examples include SABUL [66], UDT [67], Tsunami [68], and

RBUDP [69]. Others have enhanced TCP [22]-[24] and implemented these enhancements

in the kernel space. Most of these enhancements are for high-speed packet-switched networks, which means they run congestion-control mechanisms to adjust sending rates during data transfers based on congestion levels, and are therefore not suited for end-to-end circuits.

RBUDP is an exception, being specifically targeted at photonic networks and implemented with rate-based flow control. In RBUDP, the sender transmits the entire user data at a fixed rate and then retransmits all lost packets at the end of the user-data transfer according to the response from the receiver. This results in poor circuit utilization during the retransmission phase because the circuit has to be held open after completion of the initial user-data transfer while waiting for the final acknowledgment confirming completion.

Another class of transport protocols, such as Scheduled Transfer (ST) [70] and RDDP [71], has been designed for OS-bypass implementations. They provide sufficient hooks to allow for a high-speed, OS-bypass implementation, a feature that is necessary to achieve true high-speed end-to-end throughput. ST does this by having the sender specify a receiver memory address in the data block header, which causes the receiving NIC to simply write the received payload using Direct Memory Access (DMA) into the specified memory location. This results in a low end-host transport layer delay. However, to gain this advantage, ST requires a programmable processor on the NIC, which makes it hard to implement.

Following the design rationale in Section 6.1.1 and using some concepts from the above protocols, we developed our own transport solution called Fixed Rate Transport Protocol (FRTP). We specify the FRTP protocol in Section 6.1.3. The implementation of FRTP is presented in Section 6.1.4 and some preliminary experimental results are shown in Section 6.1.5 in the context of our local-area testbed network.

6.1.3 FRTP Specification

A. The Model of FRTP Connections

According to design rationale C, “use of dual communication paths”, we design a two-

channel model for FRTP. Each FRTP session has two channels: a data channel and a con-

trol channel. The data channel is used for actual user-data transfer on the end-to-end RES-

CUE circuit from the sender to the receiver. The control channel is used for control-

information (control messages for flow control and error control) transfer on the primary

TCP/IP path between the receiver and the sender.

Figure 20. The Model of FRTP Connections (the sender and the receiver each run a control process and a data transfer process; the data channel runs over RESCUE circuits and the control channel runs over the primary TCP/IP path)

B. Packet Formats in FRTP

All user data and control information are encapsulated into either FRTP DATA packets or FRTP control packets, as shown in Figure 21. The DATA packet carries the user-data payload along with a unique 32-bit sequence number for error control. The payload length in DATA packets should fit within the path MTU to avoid IP fragmentation (1472 bytes in our implementation). Taking simplicity as a design principle, we define the minimal number of control packet types required by the error control functionality. The control packets include ACK and ERR (NAK) packets, which carry the control information required by error control. Each control packet includes a 32-bit packet type field (1 for ACK and 2 for ERR), a 32-bit attribute field (the next expected sequence number if it is an ACK packet, or the total number of lost packets reported if it is an ERR packet), and an optional variable-length (up to 1440 bytes) attribute value field, present only in ERR packets, that carries the sequence numbers of lost packets. The ACK packets are cumulative positive acknowledgments telling the sender that the receiver has received all the DATA packets prior to the sequence number it carries, while the ERR packets are negative acknowledgements that carry the sequence numbers of lost packets.

Figure 21. Packet Formats in FRTP

DATA packet: sequence number (32 bits) | payload

Control packet: packet type (32 bits) | attribute (32 bits) | attribute value (only for ERR, up to 1440 bytes)

ACK: packet type 1; attribute is the next expected SN

ERR: packet type 2; attribute is the number of lost packets reported; attribute value carries the SNs of lost packets

In addition, one special packet is used to exchange the FRTP parameters between the

sender and the receiver prior to the actual data transfer. We show the format of this packet

in Figure 22. Currently, the packet contains only two parameters, the receiver’s data chan-

nel identifier and the user-specified sending rate. Each parameter is carried in a 16-bit

field. The packet can be readily extended to carry more FRTP parameters in future ver-

sions.
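The packet layouts described above map naturally onto fixed-width binary fields. The sketch below encodes them with Python's `struct` module. Several details are assumptions made for illustration and are not stated in the text: fields are packed in network byte order, each lost sequence number in an ERR packet occupies 32 bits (so at most 360 fit in 1440 bytes), and the unit of the 16-bit sending-rate field is left to the caller. All function names are hypothetical.

```python
import struct

ACK = 1                      # packet type for cumulative positive ACKs
ERR = 2                      # packet type for negative acknowledgements
MAX_ERR_SEQS = 1440 // 4     # assumed: 32-bit sequence numbers in ERR packets


def pack_data(seq, payload):
    """DATA packet: 32-bit sequence number followed by the payload bytes."""
    return struct.pack("!I", seq) + payload


def unpack_data(packet):
    (seq,) = struct.unpack_from("!I", packet)
    return seq, packet[4:]


def pack_ack(next_expected_seq):
    """ACK packet: type 1, attribute = next expected sequence number."""
    return struct.pack("!II", ACK, next_expected_seq)


def pack_err(lost_seqs):
    """ERR packet: type 2, attribute = count, value = lost sequence numbers."""
    if len(lost_seqs) > MAX_ERR_SEQS:
        raise ValueError("too many lost sequence numbers for one ERR packet")
    return struct.pack(f"!II{len(lost_seqs)}I", ERR, len(lost_seqs), *lost_seqs)


def unpack_control(packet):
    """Return (packet_type, attribute, lost_seqs) for an ACK or ERR packet."""
    ptype, attr = struct.unpack_from("!II", packet)
    if ptype == ACK:
        return ptype, attr, []
    if ptype == ERR:
        return ptype, attr, list(struct.unpack_from(f"!{attr}I", packet, 8))
    raise ValueError(f"unknown control packet type {ptype}")


def pack_params(data_channel_id, sending_rate):
    """Parameter-exchange packet: two 16-bit fields."""
    return struct.pack("!HH", data_channel_id, sending_rate)


def unpack_params(packet):
    return struct.unpack("!HH", packet)
```

A 16-bit rate field can only hold values up to 65535, so a real implementation would need to fix the unit (for example Mbps) as part of the protocol definition.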

C. Flow Control and Error Control in FRTP

According to design rationale A, FRTP uses a rate-based flow-control scheme. The receiver notifies the sender of a user-specified sending rate prior to the actual data transfer (using the parameter-exchange packet in Figure 22). The sender then sets the actual sending rate, which stays unchanged during the whole data transfer. The constant sending rate is implemented by setting the inter-packet generation time at the sender. This time is computed by the sender using the sending rate and DATA packet size.
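As an illustration of this computation, the inter-packet generation time is simply the packet size in bits divided by the sending rate. The sketch below also shows one possible pacing loop. It is an assumption-laden example rather than the FRTP implementation: the function names are hypothetical, the packet size is taken to be the length of the bytes handed to the socket (headers added by lower layers are ignored), and the clock and sleep functions are injectable so the loop can be exercised without real delays.

```python
import time


def inter_packet_time(rate_bps, packet_bytes):
    """Seconds between packet departures for a constant sending rate."""
    return packet_bytes * 8.0 / rate_bps


def paced_send(packets, rate_bps, send, clock=time.monotonic, sleep=time.sleep):
    """Send packets at a constant rate.

    Departures are scheduled against absolute deadlines so that a late
    wake-up on one packet does not accumulate as drift on later packets.
    """
    next_deadline = clock()
    for pkt in packets:
        delay = next_deadline - clock()
        if delay > 0:
            sleep(delay)
        send(pkt)
        next_deadline += inter_packet_time(rate_bps, len(pkt))


# Example: 1472-byte packets on a 1Gbps circuit leave every 11.776 microseconds.
gap = inter_packet_time(1e9, 1472)
```

At gigabit rates the gap is far below the granularity of `time.sleep` on typical operating systems, so a practical sender would likely busy-wait on a high-resolution clock or send small bursts instead.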

Figure 22. The Parameter-Exchange Packet in FRTP (data channel identifier: 16 bits; the sending rate: 16 bits)

According to design rationale B, FRTP error control is a selective-ARQ scheme, in which only the lost packets are retransmitted. Both the sender and the receiver detect packet losses. The receiver keeps track of the next expected sequence number for loss detection. Since sequenced delivery is guaranteed on RESCUE circuits, any mismatch between the currently received sequence number and the expected sequence number indicates a packet loss. The sequence numbers of lost packets are then sent back to the sender via ERR packets. The sender uses a time-out scheme to detect packet losses: it keeps a retransmission timer for outstanding DATA packets and, when the timer expires, assumes that the expired packets are lost. To reduce the overheads involved in disk accesses, the sender maintains a retransmission buffer to hold unacknowledged DATA packets. ACK packets are sent periodically by the FRTP receiver to confirm the delivery of DATA packets and allow the release of the corresponding space in the retransmission buffer.
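The receiver-side loss detection described above can be sketched as a small state machine. This is a simplified illustration with hypothetical names. It assumes that sequence numbers start at 0 and increase by one per DATA packet, that a packet arriving with a sequence number below the next expected one is a retransmission, and that the periodic ACK carries the lowest sequence number still missing. Wraparound of the 32-bit sequence space is not handled.

```python
class LossDetector:
    """Tracks the next expected sequence number and the receiver's loss list."""

    def __init__(self):
        self.next_expected = 0
        self.loss_list = set()

    def on_data(self, seq):
        """Process a received DATA sequence number.

        Returns the list of newly detected lost sequence numbers, which the
        receiver would report to the sender in an ERR packet.
        """
        if seq == self.next_expected:
            self.next_expected += 1
            return []
        if seq > self.next_expected:
            # In-sequence delivery is guaranteed, so a gap means loss.
            newly_lost = list(range(self.next_expected, seq))
            self.loss_list.update(newly_lost)
            self.next_expected = seq + 1
            return newly_lost
        # seq < next_expected: a retransmission (or duplicate) has arrived.
        self.loss_list.discard(seq)
        return []

    def cumulative_ack(self):
        """Value for a periodic ACK: every packet below it has been received."""
        return min(self.loss_list) if self.loss_list else self.next_expected
```

For example, receiving sequence numbers 0 and then 3 reports packets 1 and 2 as lost, and the cumulative ACK stays at 1 until the retransmission of packet 1 arrives.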

6.1.4 An Implementation of FRTP protocol

Our implementation of FRTP is being developed based on SABUL. Overall, it is implemented as an application-level process using a combination of UDP and TCP.

According to the design rationale, the data channel is an end-to-end dedicated circuit carrying Ethernet packets. In theory, the file being transferred can be segmented into FRTP DATA packets and carried directly in Ethernet packets. In practice, to simplify the FRTP implementation, we use a UDP/IP socket. The UDP/IP protocol layers do not add any functionality in the transfer of FRTP DATA packets on dedicated end-to-end Ethernet/SONET circuits. However, the use of a UDP socket greatly simplifies programming: it helps us avoid the kernel-level programming needed for a direct transfer of FRTP DATA packets in Ethernet packets. The FRTP data channel identifier is thus a UDP port number.

Following the design rationale described in Part C of Section 6.1.1, a TCP connection is required via the primary IP path between the sender and the receiver. Therefore, as shown in Figure 23, an FRTP session starts with a TCP connection establishment between the sender and the receiver. After the initiation, the FRTP sender opens a TCP listening port and waits for any incoming connection attempt. A TCP connection is established upon receipt of a request from the FRTP receiver. The next step is FRTP parameter exchange. The sender and receiver exchange a set of FRTP parameters via the TCP socket, such as the user-specified sending rate and data channel identifier (the UDP data channel’s port number).

After a successful TCP control channel establishment and FRTP parameter exchange,

the sender starts the actual data-transfer via the UDP socket on the end-to-end circuit*.

During the whole data transfer, the sender is responsible for data transmission and retrans-

missions based on feedback from the receiver. A select/poll mechanism is used by the

*UDP is a connectionless protocol, so there is no connection establishment procedure for UDP data channel.

Figure 23. Data Sending/receiving Procedure in FRTP (flowchart: after initiation, the sender listens, and the sender and receiver establish the TCP control channel and exchange FRTP parameters. At the sender, the disk-I/O thread copies one block of data at a time into the retransmission buffer; the network-I/O thread checks and processes feedback from the receiver, picks up a lost packet if the loss list is not empty or otherwise encapsulates a new DATA packet, transmits it over the UDP channel, and waits one inter-packet time. At the receiver, the network-I/O thread receives DATA packets, updates the loss list and the next expected sequence number and sends an ERR packet if an error is detected, and sends feedback to the sender if necessary; the disk-I/O thread moves one block of data at a time out of the resequencing buffer.)

sender to handle two threads, a disk-I/O thread and a network-I/O thread, simultaneously.

The disk-I/O thread copies the data from disk (in disk-to-disk transfers) or upper-layer

application buffer (in memory-to-memory transfers) and places the data into the FRTP

sending buffer. This buffer is also used as a retransmission buffer for error control pur-

poses. The disk-I/O thread copies the user data block by block into the retransmission

buffer in a loop unless the buffer is full, which could happen due to excessive

packet losses. In this case, the disk-I/O thread waits for a fixed time interval before trying

again. Meanwhile, a separate network-I/O thread reads the user data in the retransmission

buffer and encapsulates them into FRTP DATA packets. It sends one DATA packet every

inter-packet generation time via the UDP socket on to the end-to-end circuit. The sender

maintains a list to record the sequence numbers of lost DATA packets. Every inter-packet

generation time, the sender checks the loss list first. If it is not empty, the DATA packet

with the minimal sequence number in the loss list is retrieved from the retransmission

buffer and retransmitted. Otherwise, the network-I/O thread encapsulates a new DATA

packet and sends it out to the network. Newly transmitted data are kept in the FRTP

retransmission buffer till the corresponding acknowledgement is received. Every inter-

packet generation time, the network-I/O thread also checks ACK and ERR packets from

the receiver, as illustrated in Figure 24. The DATA packets that are acknowledged by

ACK packets are released from the FRTP retransmission buffer, while the sequence num-

bers of lost packets reported by ERR packets are inserted into the loss list. The sender

itself also detects packet losses by keeping an EXP timer (retransmission timer). Instead of

maintaining timers for each DATA packet, the FRTP sender maintains only one timer for

all newly transmitted DATA packets for simplicity. If an ACK or ERR is not received before

the EXP timer expires (1s time-out value in our implementation), all outstanding DATA

packets will be inserted into the loss list and retransmitted.
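The sender-side bookkeeping described above can be sketched as follows. This is an illustrative model, not the FRTP implementation: the class and method names are hypothetical, the ACK is interpreted as cumulative (it carries the next expected sequence number), and the pacing itself is left to the caller, which would invoke `next_packet()` once per inter-packet generation time.

```python
import heapq


class SenderState:
    """Tracks new, outstanding, and lost DATA packets at the sender."""

    def __init__(self):
        self.next_new_seq = 0   # sequence number of the next new DATA packet
        self.unacked = set()    # packets still held in the retransmission buffer
        self.loss_heap = []     # min-heap over the loss list
        self.loss_set = set()   # membership test for the loss list

    def report_loss(self, seqs):
        """Process an ERR packet: insert the reported SNs into the loss list."""
        for s in seqs:
            if s in self.unacked and s not in self.loss_set:
                self.loss_set.add(s)
                heapq.heappush(self.loss_heap, s)

    def acknowledge(self, next_expected):
        """Process an ACK: release every packet below the next expected SN."""
        for s in [s for s in self.unacked if s < next_expected]:
            self.unacked.discard(s)
            self.loss_set.discard(s)  # stale heap entries are skipped later

    def on_exp_timeout(self):
        """EXP timer expired: all outstanding packets go into the loss list."""
        self.report_loss(sorted(self.unacked))

    def next_packet(self):
        """Return (seq, is_retransmission) for the current sending slot."""
        while self.loss_heap:
            s = heapq.heappop(self.loss_heap)
            if s in self.loss_set:      # smallest SN still considered lost
                self.loss_set.discard(s)
                return s, True
        s = self.next_new_seq           # loss list empty: send new data
        self.next_new_seq += 1
        self.unacked.add(s)
        return s, False
```

A min-heap is used here only so that the packet with the minimal sequence number is retrieved cheaply; any ordered structure would serve the same purpose.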

Similarly, at the receiver side, a network-I/O thread receives and decapsulates DATA

packets. The data payloads are written into the FRTP resequencing buffer to be accumu-

lated together into blocks. The data blocks in the resequencing buffer are then copied to

the disk (in disk-to-disk transfers) or the upper-layer application buffer (in memory-to-

memory transfers) by a disk-I/O thread. The receiver also maintains a packet loss list. The

Figure 24. Feedback Checking and Processing at the FRTP Sender (flowchart: the sender checks the TCP control channel for an incoming packet; on an ACK it releases the ACKed data in the retransmission buffer and removes the SNs of ACKed packets from the loss list; on an ERR it inserts the SNs of lost packets into the loss list; if the EXP timer has expired, it inserts the SNs of all outstanding packets into the loss list)

sequence number of each lost packet is kept in the loss list until a correct retransmitted

copy is received. ACK packets are sent back to the sender periodically (every 100ms in

our implementation) or immediately when the FRTP resequencing buffer is full*. The

sequence number carried in ACK packets is the next expected sequence number, which

equals the largest received sequence number plus 1 or the minimum sequence number in

the loss list if the loss list is not empty. Packet losses are detected by comparing the

received sequence number to the next expected sequence number. If a packet loss is

detected, the sequence numbers of the lost DATA packets are inserted into the loss list, and

*This could happen when the sender sends the data in a burst at a rate higher than the specified sending rate.

Figure 25. Feedback Sending at the FRTP Receiver (flowchart: if the ACK timer has expired, the receiver sends an ACK based on the largest received SN when the loss list is empty, or on the smallest SN in the loss list otherwise; if the ERR timer has expired and the loss list is not empty, it sends an ERR carrying all SNs in the loss list)

an ERR packet containing the sequence numbers of the lost DATA packets is sent back to

the sender immediately. Since the retransmitted DATA packets could also be lost due to

link errors and receive-buffer overflows, to ensure the reliable delivery of DATA packets,

the whole loss list is sent back to the sender by ERR packets periodically (every 20ms in

our implementation) if it is not empty.
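The receiver-side loss detection and the value carried in ACK packets can be sketched in the same spirit. The names below are hypothetical, and the periodic ACK and ERR timers (100ms and 20ms in the text) are omitted; the sketch only models how the loss list and the next expected sequence number evolve.

```python
class ReceiverState:
    """Models the receiver's loss list and next expected sequence number."""

    def __init__(self):
        self.largest_received = -1
        self.loss_list = set()

    def on_data(self, seq):
        """Handle a DATA packet; return the SNs newly detected as lost.

        A non-empty result corresponds to the immediate ERR packet.
        """
        newly_lost = []
        if seq > self.largest_received:
            # A gap between the previous largest SN and this one means loss.
            newly_lost = list(range(self.largest_received + 1, seq))
            self.loss_list.update(newly_lost)
            self.largest_received = seq
        else:
            # A retransmitted copy arrived: remove it from the loss list.
            self.loss_list.discard(seq)
        return newly_lost

    def next_expected(self):
        """Sequence number to carry in the next ACK packet."""
        if self.loss_list:
            return min(self.loss_list)
        return self.largest_received + 1
```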

6.1.5 LAN Experiments

As its name indicates, one of the key features of FRTP is that it produces a constant

sending rate, which allows us to set the corresponding circuit rate. The constant sending rate

is realized by controlling the inter-packet transmission time with rate-based flow control.

To test the effectiveness of rate-based flow control in the FRTP implementation, we conducted

experiments to measure the performance of FRTP.
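As a worked example of rate-based flow control, the inter-packet transmission time follows directly from the packet size and the target sending rate. The sketch below assumes that the whole packet size counts toward the sending rate; whether the implementation counts headers this way is not stated in the text.

```python
def inter_packet_time(packet_size_bytes: int, sending_rate_bps: float) -> float:
    """Seconds between the starts of consecutive DATA packets at a fixed rate."""
    if sending_rate_bps <= 0:
        raise ValueError("sending rate must be positive")
    return packet_size_bytes * 8 / sending_rate_bps


# 1500B packets at a 50Mbps sending rate
gap = inter_packet_time(1500, 50e6)
print(f"{gap * 1e6:.0f} microseconds between packets")
```

For 1500B packets at 50Mbps this gives 12000 bits / 50Mbps = 240 microseconds between packets.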

Another question about FRTP is how to select an appropriate sending rate (circuit rate).

We realize that a higher sending rate does not always produce a higher throughput due to

the limitation of the end hosts. To select an appropriate rate, we must take account of

many factors, such as the scheduling scheme of operating systems, the hard disk access

rate, UDP buffer size, MTU size, and other FRTP related parameters. Although some of

these factors, such as the scheduling scheme of operating systems and the hard disk access

rate, cannot be controlled, other parameters can be adjusted to achieve better performance.

Hence, we test the impacts of these parameters in experiments.

In our experiments, we connected two Dell Precision 650 workstations via a Dell PowerConnect

Gigabit Ethernet switch. Each Dell workstation has a 2.4-GHz Intel Xeon™

CPU connected to a 533-MHz front-side bus (34Gbps CPU bandwidth), an E7505 chipset

with 512MB of DDR 266MHz memory (17Gbps memory bandwidth), an 80GB ATA/100

7200 RPM EIDE disk drive with 2MB cache (400Mbps average writing rate measured by

Bonnie [72]), and a 64bit/100MHz PCI-X bus for the GbE NIC (6.4Gbps network bandwidth).

The operating system on both workstations is RedHat Linux 9 with the version

2.4.20-30.9 kernel. Tcpdump is used to better facilitate the analysis of data transfers [73].

Our experiments focus on the performance of bulk data transfers. We ran FRTP applica-

tions on both workstations and transferred a 127MB file between them.

A. Results with Default Settings

We began the experiments with the default FRTP settings: 256KB UDP buffer size,

1500-byte MTU size, 40MB FRTP buffer size, and 8MB block size for disk I/O operations.

The sending rate was increased from 50Mbps to 1Gbps. Figure 26 plots packet-loss

rate and transfer throughput (note that we use the term “throughput” to denote “goodput”

throughout Section 6.1.5) versus sending rate.

Figure 26. Packet-Loss Rates and Throughputs vs. the Sending Rate in FRTP Experiments (DATA Packet Size=1500B, UDP Buffer Size=256KB, FRTP Buffer Size=40MB, FRTP Data Block Size=8MB): (a) FRTP throughput; (b) FRTP packet-loss rate

As the sending rate is increased from a low value, the plots show the throughput increas-

ing as expected with zero or a small packet-loss rate. When the sending rate reaches a cer-

tain level (~200Mbps in Figure 26), the packet-loss rate becomes significant. This happens

because of the limitations of the end-host hardware and software modules that are

involved in moving blocks of the file from disk into the FRTP buffer, UDP buffer, and

finally into the NIC at the sender, and similarly through the NIC, UDP buffer, FRTP

buffer, and disk at the receiver. The major bottleneck in our experiment configuration is

the hard disk access rate, especially the writing rate. Even though the measured average

writing rate of our disks is around 400Mbps, its worst-case writing rate might be much

lower than the average value. For example, the disk drive requires a certain amount of

time to switch between cylinders and heads. During this switching time, the disk read/

write operations have to be suspended. To better reflect the disk's real-time performance,

the term “disk sustained transfer rate” was introduced. This rate is dependent on the disk's

media transfer rate, but includes the overheads required for cylinder switching time and

head switching time. Based on the manufacturer’s specifications of our disks and other similar

products, a rough approximation of the sustained transfer rate of our disks is 200Mbps.

This explains the significant packet-loss rate when the sending rate is larger than

200Mbps.

As expected, increasing the sending rate beyond 200Mbps leads to excessive packet

losses because of receive-buffer overflows, causing the retransmissions to reduce overall

throughput. As a result, the throughput slowly reaches an “optimal” value (~370Mbps in

Figure 26) at a 590Mbps sending rate and then decreases. This “optimal” value approxi-

mates the 400 Mbps average disk access rate, the expected bottleneck in the experiment.

The small difference is caused by the additional processing overhead incurred in handling

packet losses. The results for the memory-to-memory transfer experiments are also shown

in Figure 26. By removing the effects of disk access, which is the bottleneck in the disk-

to-disk transfer, we can achieve a higher throughput (up to 910Mbps) without incurring

significant packet losses.

To verify the effectiveness of the FRTP rate-based flow control algorithm, we captured

several trace files while running FRTP file transfers. The actual inter-packet transmission

times seen on the link were retrieved and measured from trace files. Figure 27 shows an

example of inter-packet transmission times within a FRTP file transfer at a 50Mbps sending

Figure 27. An Example of Inter-packet Transmission Times within a FRTP File Transfer (Sending Rate=50Mbps, DATA Packet Size=1500B, UDP Buffer Size=256KB, FRTP Buffer Size=40MB, FRTP Data Block Size=8MB)

rate. The plot shows that the variance of actual inter-packet transmission times is very

small. The standard variance of inter-packet transmission times is only 0.00005, which is

quite acceptable considering the inevitable variability at the sender. Due to the limitation

of the machine running Tcpdump, we did not collect the measurement data at very high

sending rates. But we expect a similar behavior even under very high sending rates.

Another important measurement is CPU utilization of FRTP. As an application-level

implementation, FRTP consumes a large amount of CPU resources; thus its performance

is easily compromised by other concurrent processes. In all experiments presented in this

section, we carefully disabled all other possible user processes while running FRTP appli-

cations. Figure 28 plots CPU utilization versus sending rate in FRTP file transfers.

The plot shows that CPU utilization of the sender is always greater than 60%, sometimes

Figure 28. CPU Utilization vs. the Sending Rate in FRTP Experiments (DATA Packet Size=1500B, UDP Buffer Size=256KB, FRTP Buffer Size=40MB, FRTP Data Block Size=8MB)

reaching close to 90%. On the receiver side, CPU utilization is relatively lower, but

also greater than 60% when the sending rate is 500Mbps or higher. Requiring such a high

CPU utilization is a major drawback of application-level implementations. They require a

lot of CPU cycles, thus leaving little time for the application to do any computation

(though this is a non-issue for bulk-data transfer) and not allowing other concurrent CPU-intensive

applications. For example, when we started a Matlab process while running FRTP

with a 500Mbps sending rate, the FRTP throughput immediately dropped from 380Mbps

to 80Mbps.

B. Impact of UDP Buffer Size

UDP buffer size has a large impact on FRTP’s performance. It is well known that TCP

throughput is improved by properly selecting TCP buffer size. Similarly, better perfor-

mance can be achieved by increasing UDP buffer size in FRTP. At the FRTP sender, a

small UDP sending buffer increases the number of memory copies from the FRTP buffer

to the UDP buffer. This causes a serious degradation of FRTP performance in environments

where the CPU and/or the bus speed is the bottleneck. At the receiver, a small UDP

receiving buffer also incurs unnecessary CPU overhead and data movement delays. Fur-

thermore, a small UDP receiving buffer increases the possibility of receive-buffer over-

flows due to variations in the data movement rate from the UDP buffer to the FRTP

buffers. When the operating system is temporarily unable to schedule system resources to

move the received data out of the UDP buffer, a small UDP receiving buffer can overflow,

especially in high-speed data transfers.

In this set of experiments, we fixed the sending rate at 500Mbps and observed FRTP

performance under different UDP buffer sizes. UDP buffer size was changed from 64KB

to 4MB by calling the setsockopt() system call. All other parameters in FRTP used

default values. We plot packet loss and transfer throughput versus UDP buffer size in Fig-

ure 29.
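The UDP buffer adjustment mentioned above can be sketched with the standard socket options SO_RCVBUF and SO_SNDBUF. The helper name is an assumption, and the FRTP implementation itself presumably makes the equivalent call from its own code. Note that the kernel may adjust the requested value: on Linux, for instance, the size is capped by the net.core.rmem_max and net.core.wmem_max settings and the reported value is doubled to account for bookkeeping overhead, so reading the value back is advisable.

```python
import socket


def set_udp_buffer_size(sock: socket.socket, size_bytes: int) -> tuple:
    """Request send/receive buffer sizes and return what the kernel granted."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size_bytes)
    granted_rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    granted_snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    return granted_rcv, granted_snd


if __name__ == "__main__":
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Request a 2MB buffer, the size beyond which the text reports no gain.
        print(set_udp_buffer_size(udp, 2 * 1024 * 1024))
    finally:
        udp.close()
```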

As the UDP buffer size is increased, Figure 29 shows that FRTP throughput increases

while the packet-loss rate decreases as expected. For example, with a 2MB UDP buffer,

the average throughput at a 500Mbps sending rate increases to 386Mbps (a 20.6%

improvement from 320Mbps with the 64KB UDP buffer) and the loss rate drops from

16.96% to 0.03%. By removing the UDP buffer size limitation, the throughput gradually

approaches the theoretical optimal value, i.e. the average disk writing rate. The highest

throughput value seen in the experiment is 400.03Mbps, which matches the 400Mbps

average disk writing rate measured by Bonnie very well. Compare these results with those

of our experiment with the default settings, in which we could only achieve a maximum of

370Mbps at a 590Mbps sending rate with a 19% packet-loss rate. We conclude that a

larger UDP buffer size does help improve FRTP performance.

Figure 29. Packet-Loss Rates and Throughputs vs. UDP Buffer Size in FRTP Experiments (DATA Packet Size=1500B, Sending Rate=500Mbps, FRTP Buffer Size=40MB, FRTP Data Block Size=8MB)

However, increasing the UDP buffer size does not always bring us benefits. There is no

obvious improvement seen when we further increase the UDP buffer size beyond 2MB.

This is understandable because with a large UDP buffer, the only bottleneck is the disk

writing rate, and thus, increasing the UDP buffer size will not help improve throughput.

UDP buffer size should be tailored for each transfer. In our particular case, a UDP buffer

size slightly higher than 2MB produced “optimal” results.

C. Impact of FRTP Buffer Size

In this set of experiments, the sending rate was fixed at 500Mbps. All parameters were

set to default values except the FRTP buffer size. We changed the FRTP buffer size from

9MB to 40MB and observed FRTP performance under different FRTP buffer sizes. Figure

30 plots packet-loss rate and transfer throughput versus FRTP buffer size.

As FRTP buffer size increases from 9MB to 40MB, FRTP throughput increases from

305Mbps to 342Mbps, a 12.1% improvement. In our particular experiment, the “optimal”

FRTP buffer size is around 16MB although a slightly higher throughput value can be seen

Figure 30. Packet-Loss Rates and Throughputs vs. FRTP Buffer Size in FRTP Experiments (DATA Packet Size=1500B, UDP Buffer Size=256KB, Sending Rate=500Mbps, FRTP Data Block Size=8MB)

with larger FRTP buffer sizes. Increasing the FRTP buffer size beyond 16MB brings little

benefit because of the dramatic increase in packet-loss rate. This is reasonable because an

overly large FRTP buffer will consume a large amount of system resources for memory management.

Again, the “optimal” value of FRTP buffer size depends on the particular hardware

and software configurations, and should be tailored for each transfer.

D. Impact of FRTP DATA Packet Size

The last parameter that we tested in our experiments is the DATA packet size. The

throughput improves with larger packets. This is because the time needed for packet

encapsulation and decapsulation is smaller. To quantify the impact of packet size, we

repeated the previous experiments with a similar configuration. However, since the Dell

PowerConnect Ethernet switch that we used in previous experiments does not support

larger MTUs, we connected the two Dell Precision 650 (P650) workstations via a direct

1Gbps Ethernet link. We fixed the sending rate at 500Mbps and observed FRTP performance

Figure 31. Packet Losses and Throughputs vs. DATA Packet Size in FRTP Experiments (MTU=1500B, UDP Buffer Size=256KB, FRTP Buffer Size=40MB, Sending Rate=500Mbps)

under different packet sizes. We increased the DATA packet size from the default

value of 1472B to 14972B. All other parameters in FRTP were set to default values. To

avoid IP fragmentation, we changed the system MTU size correspondingly. We plot

packet loss and transfer throughput versus DATA packet size in Figure 31.

As the FRTP DATA packet size is increased from 1500B to 6500B, FRTP throughput

increases from 342Mbps to 381Mbps, while the packet-loss rate drops from 14.8% to

8.68%. But the improvement is not apparent when we increased the packet size further.

This is because the possibility of receive-buffer overflows increases with packet size,

which then offsets a part of the benefits brought in by the reduction of encapsulation/

decapsulation overhead.

To avoid IP fragmentation, we set the path MTU size to be the same as FRTP DATA

packet size. However, we expect that the benefits gained with larger FRTP DATA packet

sizes will be offset by IP fragmentation overhead if there are switches on the end-to-end

paths that do not support larger MTUs. For this reason, we suggest a cautious use of large

packet sizes in FRTP, especially on paths where the support for large MTUs is unknown.
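The relation between the path MTU and the largest DATA packet that avoids IP fragmentation can be checked with a short calculation. The sketch assumes IPv4 without options (20-byte header) and the 8-byte UDP header; under that assumption the 1472B default quoted above corresponds to a 1500B MTU, and 14972B to a 15000B MTU.

```python
IPV4_HEADER_BYTES = 20  # assumes no IP options
UDP_HEADER_BYTES = 8


def max_udp_payload(mtu_bytes: int) -> int:
    """Largest UDP payload that fits in one IP packet of the given MTU."""
    payload = mtu_bytes - IPV4_HEADER_BYTES - UDP_HEADER_BYTES
    if payload <= 0:
        raise ValueError("MTU too small for an IPv4/UDP packet")
    return payload


def needs_fragmentation(payload_bytes: int, mtu_bytes: int) -> bool:
    """True if a UDP datagram with this payload exceeds the MTU."""
    return payload_bytes > max_udp_payload(mtu_bytes)
```

A sender that cannot confirm large-MTU support along the path could use such a check to fall back to the payload size that fits a 1500B MTU.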

6.1.6 Summary of FRTP implementation and experiments

In this section, we presented the design and the implementation of a transport protocol

for dedicated end-to-end circuits, called FRTP. FRTP consists of rate-based flow control and

selective-ARQ error control. We implemented FRTP as an application-level process and

conducted a series of experiments.

The experimental results showed that FRTP successfully produces a constant sending

rate during the data transfer. The inter-packet transmission times seen on the wire are quite

accurate and constant at different sending rates. This indicates that the rate-based flow control

in FRTP is very effective and that the most important objective of our transport protocol

work, a fixed sending rate, is successfully achieved by FRTP.

The experimental results also show that FRTP is able to achieve very high throughput.

In our disk-to-disk experiments, FRTP successfully achieved the theoretical maximum

throughput, the 400Mbps disk access rate in our experimental configuration. In memory-to-memory

experiments, by removing the disk access bottleneck, FRTP could achieve a

throughput up to 910Mbps without significant packet loss.

We also notice from the experiments that several configurable parameters, such as UDP

buffer size, FRTP buffer size, and packet size, have great impacts on the performance of

FRTP. To obtain the optimal results, these parameters must be carefully tuned. In most

cases, better performance is seen with larger buffer and packet sizes.

However, due to the variability of the receiving capability, the receive-buffer overflows

and corresponding packet losses cannot be completely eliminated. This affects the performance

of FRTP and keeps FRTP throughput always lower than the sending rate. The

higher the sending rate, the higher the FRTP throughput but the lower the bandwidth utilization

(bandwidth utilization is defined as throughput divided by sending rate). Therefore,

the appropriate circuit rate should be chosen by weighing the trade-off between

throughput and circuit utilization.
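The trade-off can be made concrete with two operating points reported in Section 6.1.5. The sketch below simply applies the definition of bandwidth utilization; the function name is an assumption.

```python
def bandwidth_utilization(throughput_mbps: float, sending_rate_mbps: float) -> float:
    """Bandwidth utilization = throughput / sending rate."""
    if sending_rate_mbps <= 0:
        raise ValueError("sending rate must be positive")
    return throughput_mbps / sending_rate_mbps


# (sending rate, throughput) pairs in Mbps, taken from Section 6.1.5
operating_points = {
    "default settings": (590, 370),
    "2MB UDP buffer": (500, 386),
}
for label, (rate, throughput) in operating_points.items():
    print(f"{label}: {bandwidth_utilization(throughput, rate):.1%}")
```

With these numbers, the 590Mbps circuit is roughly 63% utilized, while the 500Mbps circuit with the tuned UDP buffer is roughly 77% utilized.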

One disadvantage of FRTP is its high CPU utilization. One should avoid using FRTP

with other CPU-intensive processes concurrently. The variability of the receiving capabil-

ity is so severe due to the resource contention between concurrent processes that the

throughput and circuit utilization would drop to an unacceptable level. This is the biggest

problem of running application-level implementations on general-purpose end hosts. In

contrast, transport protocols designed for OS-bypass implementations, such as ST, use

very little CPU resources, which makes them more attractive in environments where other

CPU-intensive processes are running. We will explore those OS-bypass transport proto-

cols in future work.

6.2 Routing Decision Module Design

Given that the RESCUE service is configured as an “add-on” service to primary Internet

access, for communication between two entities that can be connected by a direct Ether-

net/SONET circuit, there is a choice of two paths: the primary TCP/IP path and an Ether-

net/SONET circuit. The routing decision module determines whether or not to attempt a

circuit setup based on network parameters required by the routing decision algorithm.

The analysis in Sections 4.2 and 5.2 shows that ideally the routing decision software at

end hosts should use dynamically obtained values of RTT, call-blocking probability $P_b$ on

the circuit-switched path, packet-loss rate $P_{loss}$ on the TCP/IP path, bottleneck link rate $r$

on the TCP/IP path, circuit rate $r_c$ on the circuit-switched path, and other such measures. These

parameters can be estimated by using measurement tools. For example, Pchar [74] is

a tool to characterize the bandwidth and packet-loss rate along an end-to-end path through

the Internet, Pathrate [75] is a tool to estimate the bottleneck link rate on an

end-to-end TCP path, and tomography experiments [76] have shown that $P_{loss}$ can be

estimated by end hosts. However, such a dynamic algorithm integrated with measurement

tools can be complex.

Since the benefit of using RESCUE is not significant under some circumstances (e.g.,

for small file transfers), a simpler alternative is to use values for these parameters under

nominal operating conditions of the two networks and program static values for the decision

points. As an example, say we determine using such tomography experiments that

$P_{loss}$ is 0.01, the service provider wants $P_b$ to be 0.3 (to achieve a given utilization), and

the traffic load is determined to be 5 Erlangs (dependent upon the number of end hosts connected to

enterprise MSPPs at each enterprise and the file-transfer generation rate per host). For

these numerical values, the static crossover file size should be set to 2.7MB for long-dis-

tance transfers and 650KB for local calls in end-host application software. The former

comes from a utilization consideration and the latter from a delay consideration (see Table

7). Whether a call is local or long-distance can be determined from the RTT

measurement taken during TCP connection establishment (as stated in Section 5.1, all

transfers require a TCP connection for short message exchanges).

An implementation of the routing decision module will consist of the routing decision

algorithm described in Chapter 5, and a database maintaining all the network parameters

required by the routing decision algorithm. The database can be dynamic, in which case

network parameters are measured periodically to reflect the latest network state, or static,

in which case parameters under nominal operating conditions of the two networks are

stored in the database.

We present a design for the routing decision module in Figure 32. There are three com-

ponents: (i) database, (ii) pre-computation module, and (iii) run-time module. The

database has a structure similar to that of the forwarding database in an IP router. Each

entry in the database corresponds to a destination IP address or a group of destination IP

addresses. The columns correspond to network parameters along the path between the

local host and the destination host. The pre-computation module executes measurement

tools to obtain network parameter values (though some may be programmed in at the start

and left unchanged). At each update, the pre-computation module populates the

database entries and computes the crossover file size for each entry. Upon receiving a

query from the user application, the run-time module consults the database to retrieve the

crossover file size corresponding to the requested destination IP address. The crossover

file size $f_c$ is compared with the requested transfer size $f$. The routing decision is made

as a result of this comparison as shown in Figure 32.
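The run-time lookup and comparison can be sketched as below. This is a minimal illustration under stated assumptions: the class and field names are hypothetical, the database holds only the pre-computed crossover file size per destination prefix, the crossover values reuse the 2.7MB and 650KB figures quoted above (with decimal megabytes assumed), and the assignment of those values to the two example addresses is arbitrary. The case where the file size equals the crossover size is not specified in the text and is mapped to the TCP/IP path here.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class PathEntry:
    network: ipaddress.IPv4Network   # destination address or group of addresses
    crossover_bytes: int             # pre-computed crossover file size


class RoutingDecision:
    def __init__(self, entries):
        self.entries = list(entries)

    def decide(self, file_size_bytes: int, dest: str) -> str:
        """Return "circuit" to attempt a circuit setup, else "tcp"."""
        addr = ipaddress.ip_address(dest)
        matches = [e for e in self.entries if addr in e.network]
        if not matches:
            return "tcp"  # unknown destination: stay on the primary path
        # Longest-prefix match, as in an IP forwarding table.
        entry = max(matches, key=lambda e: e.network.prefixlen)
        return "circuit" if file_size_bytes > entry.crossover_bytes else "tcp"


database = RoutingDecision([
    PathEntry(ipaddress.ip_network("192.168.0.2/32"), 2_700_000),  # long-distance
    PathEntry(ipaddress.ip_network("192.168.0.8/32"), 650_000),    # local
])
print(database.decide(127_000_000, "192.168.0.2"))
```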

6.3 Signaling Module Design

To support RESCUE service, the end host must be equipped with a signaling module,

which is able to send/receive signaling messages to/from signaling-capable network

switches and process signaling messages according to the signaling standards. The signal-

ing protocol specifications include IETF’s GMPLS, OIF’s UNI, and ITU-T’s ASON.

These specifications are designed as a common control plane (signaling and routing) for

many connection-oriented networks. GMPLS and ASON support both RSVP-TE and CR-

LDP as signaling protocols, while UNI only supports RSVP-TE signaling. Since RSVP-TE

is the protocol implemented by many network switch vendors, e.g., Sycamore and

Ciena, we chose RSVP-TE for our end-host signaling module.

Figure 32. Static Routing Decision Module (block diagram: the pre-computation module populates the database and the crossover file size of each entry; on QUERY(f, dest), the run-time module performs a table lookup and a file size comparison, attempting circuit setup if f > fc and using the TCP/IP path if f < fc). Database contents shown in the figure:

Dest IP        Ploss    Pb     Tprop    r          rc
192.168.0.2    0.01     10%    30ms     100Mbps    100Mbps
...            ...      ...    ...      ...        ...
192.168.0.8    0.001    10%    30ms     10Mbps     100Mbps
...            ...      ...    ...      ...        ...

6.4 Local-area Testbed Network

6.4.1 Local-area Testbed Network

The goal of our experiments is to demonstrate the end-host RESCUE software and the

file-transfer application on the RESCUE circuit. The experiments will be performed in a

local-area environment within UVA. The testbed network configuration is shown in Fig-

ure 33.

End hosts in the configuration are high-performance workstations (Dell Precision

650). Each workstation has a 2.4-GHz Intel Xeon™ CPU connected to a 533-MHz front-side

bus, an E7505 chipset with 512MB of DDR 266MHz memory, an 80GB ATA/100

7200 RPM EIDE disk drive with 2MB cache, and a 64bit/100MHz PCI-X bus for peripheral

devices. The operating system on the workstations is RedHat Linux 9 with the version

2.4.20-30.9 kernel to allow flexible reconfiguration. Each workstation is equipped

Figure 33. Local-area Testbed Network Configurations (diagram: Dell workstations, each running the application, FRTP, TCP, and the RESCUE software with its routing decision and signaling modules, and each equipped with NIC I and NIC II; the workstations connect to an Ethernet switch, Cisco MSPPs, and a Sycamore switch; TL1 messages are sent to the MSPPs and RSVP-TE messages to the Sycamore switch)

with two NICs. One is an Intel 82545EM Gigabit Ethernet card and the other is an Intel

82544EI Gigabit Ethernet card. Both Ethernet cards are 64-bit PCI-X copper interface

cards, which are capable of transmitting and receiving data at Gbps rates. The purpose of workstation

3 is to demonstrate sharing, i.e., after the communication session ends between

workstation 1 and workstation 2, workstation 3 can set up a call to workstation 2 reusing

the emulated wide-area RESCUE link.

The packet-switched Internet is emulated by a Dell PowerConnect 16-port Gigabit

Ethernet switch. It connects to the end hosts through their primary NICs, the control cards

on MSPPs, and the control cards on the circuit-switched crossconnects. All the control

messages, including signaling messages for RESCUE circuit setup and FRTP-related mes-

sages, are routed through this Ethernet switch.

Each Dell workstation has a NIC connecting to the Ethernet card on an MSPP, which is a

Cisco ONS 15454 as shown in Figure 33. The MSPPs in turn are connected to the optical

circuit-switched network, which is emulated by one or more Sycamore SN 16000 switches

equipped with a GMPLS/UNI/NNI signaling engine.

To test dynamic RESCUE circuit setup, we are implementing a signaling module, a sub-

set of RSVP-TE, at the end hosts. This module will generate signaling messages according

to the GMPLS RSVP-TE signaling standard. Signaling messages are carried within IP dat-

agrams and routed to the control cards of the signaling-capable switches. However, the

current version of Cisco ONS 15454 MSPP control software only implements UNI client-

side (UNI-C). The purpose of UNI-C is to generate circuit setup requests. It does not allow

for the provisioning of connections through the MSPP via UNI/GMPLS signaling. How-

ever, it does offer a Transaction Language 1 (TL1) [77] interface for circuit provisioning.

Therefore, an end-to-end RESCUE circuit is established as two segments: the crosscon-

nection between the Ethernet port connecting the secondary NIC on workstations and the

SONET port on the Cisco MSPP (dashed line in Figure 33), and the wide-area optical circuit

across the Sycamore circuit switches (solid line in Figure 33). The crossconnection

within the MSPP is set up/released by issuing TL1 messages to the control card on the

MSPP. The optical circuit across the circuit-switched network is then established using

RSVP-TE signaling.

When a user application at workstation 1 requires a communication path to workstation

2, it first sends a query to the routing decision module of the RESCUE software. The rout-

ing decision module sends back an acknowledgement to notify the user application

whether a RESCUE circuit should be set up or not. If the acknowledgement is positive, the

end-host signaling module will send TL1 messages to the MSPP’s control card to set up a

crossconnection between the Ethernet interface card connected to workstation 1’s secondary

NIC and the SONET interface card (an OC3 SONET card in our experiment). At

the remote side (workstation 2), a similar TL1 session should be triggered to set up the crossconnection

upon receiving the notification from workstation 1. Meanwhile, the end-host signaling

module on workstation 1 sends signaling messages (RSVP-TE messages) to the Sycamore

switch to trigger a GMPLS circuit setup. If both the MSPP crossconnection setup and

GMPLS circuit setup are successful, the end-host RESCUE software on workstation 1 and

2 will direct the user application to the secondary NICs and start the actual data transfer on

the end-to-end RESCUE circuit across the enterprise MSPP and optical circuit. The high-speed

transport protocol, FRTP, will be used for data transfers to achieve high end-to-end

transfer throughputs. On the other hand, if either the routing decision module replies with

a negative acknowledgement or the circuit setup fails, the primary NIC and packet-

switched path (the Ethernet switch in Figure 33) will be used.
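The setup-with-fallback logic described above can be summarized in a short sketch. Every callable passed in is a hypothetical placeholder for the corresponding step (routing decision query, TL1 crossconnection setup at each MSPP, RSVP-TE circuit setup); none of them reflects an actual TL1 or RSVP-TE API. The steps run sequentially here for simplicity, whereas the text describes the RSVP-TE signaling as proceeding in parallel with the remote TL1 session, and releasing partially established segments on failure is not modeled.

```python
from typing import Callable


def establish_path(
    routing_decision: Callable[[], bool],
    setup_local_crossconnect: Callable[[], bool],
    setup_remote_crossconnect: Callable[[], bool],
    setup_optical_circuit: Callable[[], bool],
) -> str:
    """Return "rescue" if the end-to-end circuit is ready, else "primary"."""
    if not routing_decision():
        return "primary"  # negative acknowledgement: use the TCP/IP path
    steps = (
        setup_local_crossconnect,   # TL1 session with the local MSPP
        setup_remote_crossconnect,  # TL1 session with the remote MSPP
        setup_optical_circuit,      # RSVP-TE signaling to the circuit switch
    )
    # all() stops at the first failing step, after which we fall back.
    if all(step() for step in steps):
        return "rescue"
    return "primary"
```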

6.4.2 Extension with VLAN Technique

One extension of the local-area RESCUE testbed network is to not only allow end hosts

to connect their secondary NICs directly into MSPP ports, but also allow Ethernet

switches serving small subnets to be connected to the enterprise MSPP ports for RESCUE

service as shown in Figure 34. In Figure 34, the secondary NICs on end hosts are con-

nected to the enterprise MSPP Ethernet ports through an Ethernet switch with advanced

Virtual LAN (VLAN) function. VLAN is a technique allowing networks to be segmented

logically without having to be physically rewired [78]. By bundling two ports on the

switch into a logical (virtual) subnet, we can effectively establish a dedicated circuit

between two ports. The Extreme Summit4 switch [79] is an Ethernet switch that supports

this VLAN capability. The Extreme switch provides a Command-Line Interface (CLI) to

allow a network administrator to manage its VLAN configurations. For example, PC 1 in Figure 34 could issue a CLI command to set up a VLAN (VLAN 1 in Figure 34) associating port 1 and port 3. By isolating the Ethernet ports of VLAN 1 from the other ports on the switch, we effectively establish a dedicated data path between port 1 and port 3, that is, a direct circuit between PC 1 and the enterprise MSPP. The path to the MSPP through port 3 can be shared among enterprise end hosts by updating the port assignment of VLAN 1. For example, after PC 1’s communication session ends, we could replace port 1 in VLAN 1 with port 2, allowing PC 2 to set up a dedicated data path between port 2 and port 3.

Figure 34. RESCUE Circuit Extension with VLAN Technique

[Figure labels: Ethernet switch; Ethernet switch with VLAN; CLI commands; table of VLAN # and associated ports (1 & 3, 2 & 3).]

By introducing VLAN switches into RESCUE network configurations, the end hosts are

no longer required to be connected to the enterprise MSPP Ethernet ports directly. This

allows for the sharing of limited Ethernet port resources on MSPPs among a large number

of end hosts. The VLAN setup procedure can be combined with the MSPP crossconnec-

tion setup procedure described in the previous section when establishing an end-to-end

RESCUE circuit.
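The port-reassignment idea in this section can be illustrated with a toy model of VLAN membership. This is only a sketch: the class `VlanSwitch` and its methods are hypothetical names introduced for illustration, and the code does not issue any real switch CLI commands.

```python
class VlanSwitch:
    """Toy model of VLAN port membership on a single Ethernet switch."""

    def __init__(self):
        self.vlans = {}  # VLAN id -> set of member port numbers

    def set_ports(self, vlan_id, ports):
        """Create or overwrite a VLAN with the given member ports."""
        self.vlans[vlan_id] = set(ports)

    def replace_port(self, vlan_id, old_port, new_port):
        """Swap one member port for another, keeping the rest unchanged."""
        members = self.vlans[vlan_id]
        members.remove(old_port)
        members.add(new_port)

    def can_reach(self, port_a, port_b):
        """Two ports can exchange frames only if they share a VLAN."""
        return any(port_a in m and port_b in m for m in self.vlans.values())


switch = VlanSwitch()
switch.set_ports(1, [1, 3])      # PC 1 (port 1) gets the path to the MSPP (port 3)
pc1_connected = switch.can_reach(1, 3)
switch.replace_port(1, 1, 2)     # PC 1's session ends; PC 2 (port 2) takes over
pc2_connected = switch.can_reach(2, 3)
```

After the swap, port 1 is no longer in VLAN 1, so the single MSPP-facing port serves one end host at a time.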

Chapter 7 Conclusions and Future Research

In the following sections, the contributions of this research are summarized and directions for future research are introduced. This research resulted in eight publications, which are listed at the end of this chapter.

7.1 Summary and Conclusions

In this dissertation, we proposed extending the services of optical networks to end hosts.

This is feasible today given the deployment of fiber to enterprises, MSPPs in enterprises,

and EoS technologies within these MSPPs. Our proposed service, called Reconfigurable

Ethernet/SONET Circuits for End Users (RESCUE), offers a means for setting up and

releasing on-demand circuits consisting of Ethernet LAN segments and Ethernet-over-

SONET metro- and/or wide-area segments. RESCUE is proposed as an add-on service to

the currently available Internet access. This allows end host applications to attempt Ether-

net/SONET circuit setup, and if the attempt fails due to a lack of resources, the applica-

tions can fall back to the basic Internet service.

RESCUE service provides an effective way to overcome the three gaps identified in

Section 1.3. First, the dial-up Internet access service using RESCUE circuits enables end

hosts to bypass an enterprise’s heavily shared leased access links and therefore enjoy

much lower packet-loss rates on end-to-end communication paths. Second, end-to-end

QoS guarantees, which are hard to implement in the existing Internet, can be provided by

end-to-end RESCUE circuits. Different QoS requirements can be met by simply setting

different circuit rates. Third, by using a new transport protocol on end-to-end RESCUE

circuits, TCP limitations in HDBP environments can be overcome. Significant improve-

ments can be achieved on data-transfer throughput.

RESCUE circuits are shared on a call-by-call basis, which makes it easy to implement a

“pay more, get more” service. To use RESCUE, end hosts need an additional NIC and a

software upgrade. We carried out a detailed analysis of how the dial-up service and file

transfers can take advantage of RESCUE service in Chapter 4 and Chapter 5 respectively.

The analysis results showed that the end host will enjoy a much shorter file-transfer delay

on RESCUE circuits than on the TCP/IP path if the circuit setup is successful.

The RESCUE concept introduces the idea of leveraging the Internet in developing a circuit-switched service. We realized that a pure circuit-switched network service is hard to deploy in a standalone mode for the following reasons: (i) not all types of applications are suited for RESCUE circuits, for example small-file transfers and Variable Bit-Rate (VBR) applications; (ii) without the Internet as a fallback option, the circuit-switched service would have to be operated at a low call-blocking probability, and achieving a low call-blocking probability means sacrificing network utilization, especially during the service growth period when traffic load is low; and (iii) without the Internet path for reverse-direction control message transport, the circuit-switched network would need to support both low-rate and high-rate circuits, making the switches more expensive.
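The trade-off in reason (ii) can be made concrete with the classical Erlang-B formula. The sketch below assumes Poisson call arrivals and a single link carrying a fixed number of equal-rate circuits; this traffic model and the function names are assumptions made for illustration only.

```python
def erlang_b(offered_load, circuits):
    """Call-blocking probability for the given offered load (in Erlangs).

    Uses the standard recurrence B(0) = 1,
    B(k) = a * B(k-1) / (k + a * B(k-1)).
    """
    blocking = 1.0
    for k in range(1, circuits + 1):
        blocking = offered_load * blocking / (k + offered_load * blocking)
    return blocking


def utilization(offered_load, circuits):
    """Fraction of link capacity occupied by the calls that are accepted."""
    carried = offered_load * (1.0 - erlang_b(offered_load, circuits))
    return carried / circuits


# Same 10-circuit link at a light and at a heavy offered load.
light = (erlang_b(4.0, 10), utilization(4.0, 10))
heavy = (erlang_b(8.0, 10), utilization(8.0, 10))
```

Comparing `light` with `heavy` shows both quantities rising together: keeping the blocking probability low forces the link to run at a low utilization, which is affordable only when blocked calls have somewhere else to go.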

The RESCUE solution calls for a revolutionary combined usage of two types of networks, a circuit-switched network and a packet-switched network. RESCUE proposes a “parallel-hybrid” network architecture in contrast to today’s “sequential-hybrid” network architecture. In this “parallel-hybrid” network, the primary connectionless packet-switched Internet is not only used as a backup path for those applications that fail to obtain a RESCUE circuit, but can also be used to carry control messages for the data transfers on end-to-end RESCUE circuits. These are the two key features that make our network architecture feasible to introduce and grow while constantly achieving high utilization. To our knowledge, this “parallel-hybrid” solution has not been proposed elsewhere.

The long-term objective of our research work is to create a large-scale circuit-switched

network providing commodity services. Scalability and network utilization are two

widely-used network design criteria. In RESCUE service, the network scalability problem

is addressed by using dynamic, distributed end-to-end circuit provisioning with signaling

protocols. In contrast to the centralized approach, signaling protocols enable distributed

provisioning, and therefore allow the network to grow to any size. The network utilization

problem is addressed by creating commodity applications, which will help increase traffic

loads, and by using the Internet as a back-up path. By allowing both small data transfers

and large data transfers, we envision the creation of high traffic load and corresponding

higher network utilization, which translates to low costs for users. Per-circuit utilization is also addressed by using superfast provisioning and rate-based flow control in the RESCUE service. Superfast provisioning is possible with the distributed signaling approach, which does not entail human and/or central management intervention. Hardware-accelerated signaling can be used to further speed up the signaling processing capability of network switches. A transport protocol with rate-based flow control should be used on end-to-end RESCUE circuits to achieve 100% bandwidth utilization.

In the implementation chapter, we discussed the key features needed in a transport pro-

tocol that works in conjunction with end-to-end file transfer applications using RESCUE

service. We called this protocol Fixed Rate Transport Protocol (FRTP), one that uses a

rate-based flow control scheme and a selective-ARQ based error control scheme. An

application-level implementation of FRTP based on UDP sockets was then presented

along with experimental results. Our work extended previous work on transport protocols significantly. Unlike TCP and other transport protocols designed for IP-based networks, FRTP is designed for data transfers on dedicated end-to-end circuits. The goal is to generate a constant sending rate that matches the circuit rate, and thereby achieve high circuit utilization.
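The idea of matching the sending rate to the circuit rate can be sketched as a pacing schedule. The functions below are a minimal illustration with hypothetical names, not the FRTP implementation; they only compute when each fixed-size packet should leave the sender.

```python
def inter_packet_gap(circuit_rate_bps, packet_bytes):
    """Seconds between packet departures so that the sending rate
    equals the circuit rate."""
    return packet_bytes * 8 / circuit_rate_bps


def departure_times(num_packets, circuit_rate_bps, packet_bytes, start=0.0):
    """Scheduled departure time of each packet, measured from `start`.

    Times are computed from the start of the transfer rather than from the
    previous packet, so a late packet does not delay all later ones.
    """
    gap = inter_packet_gap(circuit_rate_bps, packet_bytes)
    return [start + i * gap for i in range(num_packets)]


# Example: 1250-byte packets on a 100 Mb/s circuit.
schedule = departure_times(3, 100e6, 1250)
```

A sender built on this schedule would wait until each departure time before handing the next packet to the UDP socket; how accurately it can do so depends on the timer resolution and CPU load of the end host.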

From the experimental results, we concluded that the rate-based flow control in the FRTP

implementation is effective when no other processes are running at the end hosts. FRTP

successfully produces a constant and accurate sending rate during the data transfer. The

experimental results also showed that FRTP is able to achieve a very high throughput, lim-

ited only by the end-host configurations. A better performance can be achieved by care-

fully tuning several parameters, such as UDP buffer size, FRTP buffer size, and packet

However, we also noticed that the performance of FRTP degrades when there are

other concurrent CPU-intensive processes running on the end hosts. This is because FRTP,

as an application-level implementation, needs a lot of CPU cycles when the sending rate is

high, and therefore leaves little CPU time for other processes. We realized that this situa-

tion is hard to avoid given the current-day general-purpose end hosts and non-realtime

operating systems. This problem not only adds a constraint for the usage of FRTP, but also

reminds us of a shortcoming of our circuit-switched solution, which is as follows: a cir-

cuit-switched network is not adaptive to changes in data-transfer rates caused by the vari-

ability of data processing at end hosts. In a packet-switched network, this is not a problem

because the bandwidth gap left by a reduction of the sending rate in one data flow can be

filled by other data flows. However, in a circuit-switched network, since the circuit band-

width allocation remains unchanged throughout the transfer, any reduction in FRTP

throughput would result in poor circuit utilization. We will explore solutions to this prob-

lem in our future work.

7.2 Future Research

7.2.1 Extension to Multi-protocol Interworking

Ideally, if all signaling-capable switches along the end-to-end path support the same sig-

naling specification, the circuit provisioning procedure will be quite simple and standard.

However, such a standard signaling-driven circuit-switched network does not exist end-to-

end today. First, Ethernet switches and IP routers dominate today’s local-area networks.

They are generally connectionless switches. Second, in the circuit switch equipment

industry, different vendors support different signaling specifications based on their own

considerations. Usually one vendor’s switch is not compatible with a switch from a different vendor unless they have protocol conversion capabilities. Although vendors are making efforts to allow for the interconnection of their networking products, a fully signaling-interoperable network is still under development.

Differentiated by signaling capabilities, the whole network can be divided into multiple

autonomous sets, as illustrated in Figure 35. At first glance it appears that a dedicated end-

to-end circuit can only be established between two hosts connected by a single circuit-

switched network, e.g. host 1 and host 3 in Figure 35 connected by a GMPLS network. In

this case, a client-side GMPLS signaling implementation at end hosts is sufficient. How-

ever, for communications between two hosts connected by different types of networks, a

simple signaling solution will not work. For example, for communications between end

host 1 and end host 2 in Figure 35, GMPLS signaling alone is not sufficient because the

other networks on the end-to-end path do not support GMPLS.

To solve this problem, we propose an external “signaling agent”, which works like a

coordinator and a translator between different types of networks. The signaling agent is

able to talk to different networks because it will consist of multiple signaling modules.

When a circuit setup involves an end-to-end path across different networks, the signaling agent sends signaling messages to each autonomous network to initiate the intra-area circuit setup. It also coordinates the neighboring networks to set up the inter-area circuits between edge nodes.
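The coordinating role of the signaling agent can be sketched as a dispatcher that holds one signaling module per network type. Everything named here (`SignalingAgent`, `register`, `setup_end_to_end`, and the segment tuples) is hypothetical; the sketch shows only the dispatch logic and does not release segments that were already set up when a later segment fails.

```python
class SignalingAgent:
    """Toy coordinator holding one signaling module per network type."""

    def __init__(self):
        self.modules = {}  # network type -> setup function

    def register(self, network_type, setup_fn):
        """Add a signaling module; setup_fn(ingress, egress) returns bool."""
        self.modules[network_type] = setup_fn

    def setup_end_to_end(self, segments):
        """Set up each (network_type, ingress, egress) segment in order.

        Returns (success, segments_established_so_far).
        """
        established = []
        for network_type, ingress, egress in segments:
            setup_fn = self.modules.get(network_type)
            # Fail if no module speaks this network's signaling protocol
            # or if the network rejects the request.
            if setup_fn is None or not setup_fn(ingress, egress):
                return False, established
            established.append((network_type, ingress, egress))
        return True, established


agent = SignalingAgent()
for kind in ("VLAN", "GMPLS", "MPLS"):
    agent.register(kind, lambda ingress, egress: True)

path = [("VLAN", "host1", "edgeA"), ("GMPLS", "edgeA", "edgeB"),
        ("MPLS", "edgeB", "host2")]
ok, done = agent.setup_end_to_end(path)
```

Returning the list of established segments alongside the failure flag is a design choice that would let a fuller implementation tear down the partial path before falling back to the Internet.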

With the signaling agent, the remaining question is how to set up dedicated data paths

within each autonomous packet-switched network. In Section 6.4.2, we already noted that

a dedicated data path can be set up in Ethernet-based LANs by using the VLAN technol-

ogy. Multi-Protocol Label Switching (MPLS) has been implemented in some IP routers.

Figure 35. A Representation of Networks Differentiated by Signaling Capabilities

[Figure labels: end hosts 1, 2, and 3, each with a signaling module; circuit-switched network 1 (GMPLS); circuit-switched network 2 (UNI); packet-switched network 1 (MPLS); packet-switched network 2 (VLAN); signaling agent with GMPLS, UNI, MPLS, and VLAN modules.]

For example, IP routers in the Abilene backbone network of Internet2 [80] implement MPLS. MPLS is a connection-oriented packet-switched technology [81]. It supports traffic-engineered Label-Switched Path (LSP) setup and provides service-level guarantees for LSPs. By associating a strict bandwidth guarantee with the LSP, a dedicated data path can effectively be set up across a network of IP routers/MPLS switches.

With the help of all the above technologies, it is possible to set up an end-to-end RES-

CUE circuit by sequentially concatenating multiple network segments along the end-to-

end path, such as SONET circuits, MPLS LSPs, and VLAN Ethernet paths.

7.2.2 Wide-area Testbed Network

After completing the local-area experiments, we plan to run the same experiments over a wide area. Based on costs, we plan to use either point-to-point links between UVA, CUNY and ORNL, or a star configuration, as illustrated in Figure 36. The reason we have two MSPPs per campus is that external circuits typically arrive at one or two buildings on campus (labeled “Telcom Building” in Figure 36), while researchers have laboratories in other buildings (labeled “Research Building” in Figure 36). For example, at UVA, we found that there is sufficient fiber installed, but connecting our laboratory to UVA’s main telecommunications building will cost about $2,500 a year (on-campus fiber leasing service [82]). Since GbE signals travel limited distances even over single-mode fiber (on the order of 3 miles), we propose using an MSPP within the Research Building to carry the Ethernet frames over long distances using SONET. In the star configuration, a collocation service can be obtained to place a SONET crossconnect at a central location. For example, the Switch and Data company [83] provides a collocation facility in Reston, VA, to which we could lease OC48 links from each of the three campuses shown in Figure 36 to test the network configuration.

Figure 36. Configuration of Wide-area Testbed Network

[Figure labels: UVA, ORNL, and CUNY campuses, each with end hosts, a Research Building, and a Telcom Building, with OC48 across campus; point-to-point option (similar circuits can be set up between UVA and ORNL or CUNY and ORNL); star option with a collocation facility.]

7.2.3 Call Scheduling in RESCUE

In RESCUE file-transfer applications, optical link resources are shared in a call-blocking mode, in which the link capacity is typically subdivided among the application streams that share the link. Once the bandwidth allocation is made at the start of a transfer, it remains unchanged throughout the transfer. We refer to this mode of bandwidth usage as “fixed-bandwidth circuit switching.” In contrast, packet-switched networks share the capacity packet by packet: all files are divided into packets, and packets are sent one after another using the full capacity of the link. If multiple streams are sending packets, then effectively each stream receives the same share of the link capacity as in circuit-switched networks. However, when one or more traffic streams complete their file transfers, the remaining transfers can take advantage of the bandwidth made available by the completed transfers. The consequence of fixed bandwidth partitioning is that transfers experience larger average delays with fixed-bandwidth circuit switching than with packet switching.
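The delay difference can be seen in a small worked example. The sketch below assumes an idealized fluid model in which all transfers start at the same time, the fixed-bandwidth case splits the link equally, and packet switching behaves as perfect equal sharing among the transfers still in progress. Sizes are in bits and capacity is in bits per second; the function names are hypothetical.

```python
def fixed_bandwidth_delays(sizes, capacity):
    """Each transfer keeps an equal, unchanging share of the link."""
    share = capacity / len(sizes)
    return [size / share for size in sizes]


def packet_switched_delays(sizes, capacity):
    """Active transfers share the link equally; when one finishes,
    the remaining transfers speed up."""
    order = sorted(range(len(sizes)), key=lambda i: sizes[i])
    delays = [0.0] * len(sizes)
    now = 0.0          # current time
    sent = 0.0         # bits each still-active transfer has sent so far
    remaining = len(sizes)
    for i in order:
        # Time for every active transfer to send (sizes[i] - sent) more bits.
        now += (sizes[i] - sent) * remaining / capacity
        sent = sizes[i]
        delays[i] = now
        remaining -= 1
    return delays


# Two files of 1 and 3 units on a link of capacity 1 unit per second.
fixed = fixed_bandwidth_delays([1, 3], 1)
shared = packet_switched_delays([1, 3], 1)
```

In this example the small file finishes at the same time in both cases, but the large file finishes at 6 seconds with a fixed half-link share and at 4 seconds when it inherits the freed bandwidth, so the average delay drops from 4 to 3 seconds.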

Noticing that the fundamental source of poor average delay performance of fixed-band-

width circuit switching for file transfers is its inability to take advantage of bandwidth

that becomes available subsequent to the start of a transfer, we propose a scheme in which

the capacity allocated for a file transfer varies from time range to time range. We call this

scheme Varying-Bandwidth List Scheduling (VBLS) [84]. This is unlike the fixed-

bandwidth allocation mode where a fixed assignment of bandwidth is made for the entire

duration of the transfer. With information on the size of the file that an end host wants to

transfer, the network can fit this file into time ranges when bandwidth is available based

on the capacity allocations of ongoing transfers. This allows the network to offer an

incoming file transfer an increased amount of bandwidth for future time ranges whenever

there are fewer competing transfers. We will explore more details of VBLS in future work.
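The idea of fitting a file into time ranges with available bandwidth can be illustrated with a simplified allocation routine. This is not the VBLS heuristic of [84]; it is a minimal sketch that assumes the network already knows the bandwidth left over by ongoing transfers in each future time range, and the function name and data layout are hypothetical.

```python
def fit_transfer(file_bits, ranges):
    """Allocate a file across time ranges with varying available bandwidth.

    `ranges` is a list of (start_s, end_s, available_bps) tuples, sorted by
    start time and non-overlapping. Returns a list of (start_s, end_s,
    rate_bps) allocations, or None if the ranges cannot carry the file.
    """
    allocations = []
    remaining = file_bits
    for start, end, rate in ranges:
        if remaining <= 0:
            break
        if rate <= 0:
            continue  # nothing available in this range
        capacity = rate * (end - start)
        if capacity >= remaining:
            # The file completes partway through this range.
            finish = start + remaining / rate
            allocations.append((start, finish, rate))
            remaining = 0
        else:
            # Use the whole range and carry the rest forward.
            allocations.append((start, end, rate))
            remaining -= capacity
    return allocations if remaining <= 0 else None


# 300 bits: 10 b/s is free for the first 10 s, then 40 b/s for the next 10 s.
plan = fit_transfer(300, [(0, 10, 10), (10, 20, 40)])
```

In this example the transfer runs at 10 b/s until a competing transfer ends at 10 s and then at 40 b/s, finishing at 15 s; a fixed 10 b/s allocation made at the start would instead take 30 s.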

7.2.4 Router Disconnect

We realized that there is a practical problem in deploying RESCUE. While providing end hosts a choice of a second path, with a distinctly different service quality from that of the packet-by-packet shared IP path, has obvious advantages, the costs lie in the additional infrastructure needed to support such a deployment.

Access link costs are especially of concern. It is expensive for an enterprise to lease a sec-

ond access link for RESCUE service, one that terminates on a signaling-capable SONET/

SDH/WDM switch. We are currently exploring a solution to this problem in which such

an additional access link is not necessary. The solution is to use the enterprise-router-to-

ISP-router access link to carry Internet traffic in default mode and then dynamically reas-

sign its capacity (or a part of its capacity) as needed for RESCUE circuits.

We show an experimental setup of two Cisco 12008 routers [85] along with Cisco 15454 MSPPs in Figure 37. The MSPP OC3 interface cards have four ports, as do the 12008 router OC3 cards. We can connect these ports as shown in Figure 37. In default mode, two leased circuits are set up between the two routers, and the forwarding tables at these routers are set up to use both circuits (by setting some of the “east” host addresses to be reached via one port at the west-side router with others using the second port, and vice versa at the east-side router). When a RESCUE circuit setup request is generated, the RESCUE software will first send a message to these routers to disable one of these OC3 circuits and then set up the end-to-end Ethernet-EoS-Ethernet circuit illustrated by the dashed line in Figure 37. Normally, disabling an interface on a router triggers the OSPF routing protocol to update the router’s routing table and leads to packet losses during the update period. Cisco 12008 routers, however, support link bundling, a technique that groups multiple links into one logical link to provide higher bandwidth, redundancy, and load sharing between links [86]. For example, in Figure 37, the two OC3 enterprise leased circuits can be bundled into one logical link. When one of these OC3 circuits is disabled for RESCUE service, the router automatically routes the Internet traffic to the remaining circuit. No packets would be lost. The removed OC3 circuit can be restored for default Internet traffic after the RESCUE circuit is released. Questions of how quickly these updates happen and whether such a dynamic bandwidth change of router-to-router circuits causes problems in TCP flows will be answered with the experiment.

Figure 37. The Concept of Router Disconnect

[Figure labels: hosts on 10/100 Ethernet and Gb/s Ethernet; two 12008 routers (west and east), each with GRP, switch fabric, and OC3; MSPP I and MSPP II, each with control, XC, and OC3; SONET network (emulated by fibers).]
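The behavior expected from link bundling in this experiment can be pictured with a toy model. The class below is purely illustrative: it is not how the router implements bundling, and the class name, member names, and the modulo-based flow mapping are all assumptions.

```python
class LinkBundle:
    """Toy model of one logical link made of several member circuits."""

    def __init__(self, members):
        self.members = list(members)   # fixed member order
        self.active = set(members)     # members currently carrying traffic

    def disable(self, member):
        """Take a member out of service, e.g. to lend it to RESCUE."""
        if member in self.active and len(self.active) == 1:
            raise ValueError("cannot disable the last active member")
        self.active.discard(member)

    def enable(self, member):
        """Return a member to Internet service."""
        if member in self.members:
            self.active.add(member)

    def pick(self, flow_id):
        """Map a flow to an active member (simple modulo load sharing)."""
        usable = [m for m in self.members if m in self.active]
        return usable[flow_id % len(usable)]


bundle = LinkBundle(["oc3-a", "oc3-b"])
before = {bundle.pick(f) for f in range(4)}    # both circuits carry flows
bundle.disable("oc3-a")                        # circuit handed to RESCUE
during = {bundle.pick(f) for f in range(4)}    # all flows on the survivor
bundle.enable("oc3-a")                         # RESCUE circuit released
after = {bundle.pick(f) for f in range(4)}
```

Because the logical link stays up throughout, no routing update is triggered in this model; whether the real routers behave this way, and how quickly, is what the planned experiment is meant to measure.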

7.3 Publications

As a result of our research, the following papers were presented at or submitted to international conferences, journals, and magazines:

• M. Veeraraghavan and X. Zheng, “A Reconfigurable Ethernet/SONET Circuit Based

Metro Network Architecture,” IEEE JSAC on Advances in Metropolitan Optical

Networks (Architectures and Control), 2004.

• M. Veeraraghavan, X. Zheng, W. Feng, H. Lee, E. Chong, and H. Li, “Scheduling

and transport for file transfers on high-speed optical circuits,” Journal of Grid

Computing on High Performance Networking, 2004.

• X. Zheng, M. Veeraraghavan, and H. Lee, “Using Dial-Up Optical Circuits to Address

the Access Link Bottleneck Problem,” Under revision based on reviews from Infocom

• Best student paper award, M. Veeraraghavan, X. Zheng, H. Lee, M. Gardner, and W.

Feng, “CHEETAH: Circuit-switched High-speed End-to-End Transport

ArcHitecture,” Proceeding of Opticomm 2003, Dallas, TX, Oct. 13-16, 2003.

• M. Veeraraghavan, D. Logothetis, and X. Zheng, “Using dynamic optical networking

for high-speed access,” Optical Networks Magazine, special issue on “Dynamic

Optical Networking around the Corner or Light Years Away?”, vol. 4, no. 5, pp. 30-40,

Sep. 2003.

• M. Veeraraghavan, X. Zheng, W. Feng, H. Lee, E. Chong, and H. Li, “Scheduling and

Transport for File Transfers on High-speed Optical Circuits,” PFLDnet 2004, Chicago,

Feb. 16-17, 2004.

• M. Veeraraghavan, H. Lee, and X. Zheng, “File transfers across optical circuit-switched

networks,” PFLDnet 2003, Geneva, Switzerland, Feb. 3-4, 2003.

• T. Moors, M. Veeraraghavan, Z. Tao, X. Zheng, and R. Badri, “Experiences in

automating the testing of SS7 Signaling Transfer Points,” International Symposium on

Software Testing and Analysis (ISSTA), Via di Ripetta, Rome - Italy, July 22-24, 2002.

Bibliography

[1] M. Sakaguchi and K. Kaede, “Optical switching device technologies,” IEEE Commu-

nications Magazine, vol. 25, pp. 27-32, May 1987.

[2] M. Veeraraghavan, M. Karol, R. Karri, R. Grobler, and T. Moors, “Architectures and

protocols that enable new applications on optical networks,” IEEE Communications

Magazine, vol. 39, pp. 118-127, March 2001.

[3] S. Yao, B. Mukherjee, and S. Dixit, “Advances in photonic packet switching: an over-

view,” IEEE Communications Magazine, vol. 38, pp. 84-94, February 2000.

[4] Bellcore Publication GR-253-Core “Synchronous Optical Network (SONET) Trans-

port Systems: Common Generic Criteria,” January 1999.

[5] ITU-T, “Recommendation G.784: Synchronous Digital Hierarchy (SDH) manage-

ment,” June 1999.

[6] M. Kuznetsov, M. M. Froberg, S. R. Henion, H. G. Rao, J. Korn, K. A. Rauschenbach,

E. H. Modiano, and V. W. S. Chan, “A Next-Generation Optical Regional Access Net-

work,” IEEE Communications Magazine, vol. 38, pp. 66-72, January 2000.

[7] I. Habib, D. Awduche, and A. Fumagalli, “Advances in Metropolitan Optical Net-

works (Architectures and Control),” IEEE JSAC Call for Papers, http://www.argreen-

house.com/society/J-SAC/Calls/met_optical.html.

[8] OGSI, “Open Grid Services Infrastructure v1.0 (Draft 29),” http://www.gridforum.org/

ogsi-wg/, April 5, 2003.

[9] OGSI, “Grid Service Specification (Draft 8),” http://www.gridforum.org/ogsi-wg/,

February 2, 2003.

[10] M. Sampson, “World's First Working Prototypes of User Control of Lightpaths Dem-

onstrated,” http://www.canarie.ca/canet4/obgp/index.html, May 27, 2003.

[11] E. Mannie, “GMPLS Architecture,” IETF Internet Draft, http://www.ietf.org/internet-

drafts/draft-ietf-ccamp-gmpls-architecture-07.txt, May 2003.

[12] P. Ashwood-Smith, et al. “Generalized MPLS - Signaling Functional Description,”

IETF Internet Draft, http://www.ietf.org/proceedings/01dec/I-D/draft-ietf-mpls-gen-

eralized-signaling-07.txt, November 2001.

[13] P. Ashwood-Smith, et al. “Generalized MPLS - RSVP-TE Extensions,” IETF RFC

3473, January 2003.

[14] OIF Architecture, OAM&P, PLL, & Signaling Working Groups, “User Network Inter-

face (UNI) 1.0 Signaling Specification,” http://www.oiforum.com/public/documents/

OIF-UNI-01.0.pdf, October 2001.

[15] ITU-T, “Recommendation G.8080/Y.1304: Architecture for Automatic Switched Op-

tical Networks (ASON),” http://www.itu.int/itudoc/itu-t/aap/sg15aap/history/g8080/.

[16] A. Parikh, “Ethernet enlightens optical access,” Network World, Oct. 9, 2000, http://

www.nwfusion.com/news/tech/2000/1009tech.html.

[17] T. Brooks, “Optical networks: At your service,” Network World, Apr. 10, 2000, http:/

/www.nwfusion.com/columnists/2000/0410brooks.html.

[18] X. Zheng, “Internet traffic measurement experiments,” http://www.ece.virginia.edu/

~xz3y/research/measurements/measureindex.html.

[19] J. Michael and I. Graham, “The Auckland data set: an access link observed,” Proceed-

ing of the 14th ITC Specialists Seminar on Access Networks and Systems, April 2001.

[20] First International Workshop on Protocols for Fast Long-Distance Networks, PFLDnet

2003, http://datatag.web.cern.ch/datatag/pfldnet2003/, Geneva, Switzerland, February

3-4, 2003.

[21] W. Feng and P. Tinnakornsrisuphap, “The Failure of TCP in High-Performance Computational Grids,” Proceeding of SC2000: High-Performance Network and Computing Conference, Dallas, TX, November 2000.

[22] S. Floyd, “HighSpeed TCP for Large Congestion Windows,” IETF RFC 3649, De-

cember 2003.

[23] C. Jin, D. X. Wei, and S. H. Low, “FAST TCP: motivation, architecture, algorithms,

performance,” Proceeding of IEEE Infocom 2004, March 2004.

[24] T. Kelly, “Scalable TCP: Improving Performance in HighSpeed Wide Area Net-

works,” PFLDnet 2003, http://datatag.web.cern.ch/datatag/pfldnet2003/, February 3-

4, 2003, Geneva, Switzerland.

[25] J. Semke, J. Mahdavi, and M. Mathis, “Automatic TCP Buffer Tuning,” Proceeding

of ACM SIGCOMM 1998, pp. 315-323, October 1998.

[26] W. Feng, M. Gardner, M. Fisk, and E. Weigle, “Automatic Flow-Control Adaptation

for Enhancing Network Performance in Computational Grids,” Journal of Grid Com-

puting, vol. 1, pp. 63-74, 2003.

[27] M. Gardner, W. Feng, and M. Fisk, “Dynamic Right-Sizing in FTP (drsFTP): An Au-

tomatic Technique for Enhancing Grid Performance,” Proceeding of the 11th IEEE

Symposium on High-Performance Distributed Computing, Edinburgh, Scotland, July

[28] D. Katabi, M. Handley, and C. Rohrs. “Internet congestion control for high bandwidth-

delay product network,” Proceeding of ACM SIGCOMM, Pittsburgh, August 2003.

[29] M. Mathis, “Raising the Internet MTU,” http://www.psc.edu/~mathis/MTU/.

[30] N. S. V. Rao and W. C. Feng, “Performance trade-offs of TCP adaptation methods,”

Proceeding of Intl. Conf. Networking, 2002.

[31] T. Dunigan, M. Mathis, and B. Tierney, “A TCP Tuning Daemon,” Proceeding of the

2002 ACM/IEEE conference on Supercomputing, Baltimore, Maryland, July 2002.

[32] D. Comer, “Internetworking with TCP/IP, Volume I: Principles, Protocols, and Archi-

tecture,” Prentice Hall, 1991.

[33] DOE Office Of Science High Performance Network Planning Workshop, http://do-

ecollaboratory.pnl.gov/meetings/hpnpw/workshopdescription.pdf, August 13-15,

[34] “End-To-End Provisioned Optical Network Testbed for Large-Scale eScience Appli-

cations,” http://www.ece.virginia.edu/~mv/html-files/ein-home.html.

[35] “About NetworkVirginia,” http://www.networkvirginia.net, information posted in

March 2003.

[36] UCAID costs, http://ncne.nlanr.net/training/techs/2000/000515/Talks/love1-

jt05152000/tsld014.htm.

[37] IEEE 802.17, Resilient Packet Ring Working Group, http://grouper.ieee.org/groups/

802/17/documents.htm.

[38] D. Tsiang and G. Suwala, “The Cisco SRP MAC Layer Protocol,” IETF RFC 2892,

August 2000.

[39] J. Padhye, V. Firoiu, D. Towsley, and J. Kurose, “Modeling TCP Throughput: A Sim-

ple Model and its Empirical Validation,” IEEE/ACM Transaction on Networking, vol.

9, pp. 31-46, February 2001.

[40] N. Cardwell, S. Savage, and T. Anderson, “Modeling TCP Latency,” Proceeding of

IEEE Infocom, vol. 3, pp. 1742-1751, Tel-Aviv, Israel, March 2000.

[41] M. Allman, V. Paxson, and W. Stevens, “TCP Congestion Control”, IETF RFC 2581,

April 1999.

[42] R. Braden, D. Clark, and S. Shenker, “Integrated Services in the Internet Architecture:

an Overview,” IETF RFC 1633, June 1994.

[43] K. Nichols, S. Blake, F. Baker, and D. Black, “An Architecture for Differentiated Ser-

vices,” IETF RFC 2474, December 1998.

[44] P. Molinero-Fernandez and N. McKeown, “TCP switching: exposing circuits to IP,”

IEEE Micro, vol. 22, pp. 82-89, January-February 2002.

[45] “CANARIE's CA*net 4”, http://www.canarie.ca/canet4/.

[46] “Starlight”, http://www.startap.net/starlight/.

[47] “SURFnet”, http://www.surfnet.nl/en/.

[48] “UKlight”, http://www.ja.net/development/UKLight/.

[49] “DOE UltraScience Net”, http://www.csm.ornl.gov/ultranet/.

[50] Canarie network, “User Controlled Lightpaths (UCLP),” http://www.canarie.ca/

canet4/uclp/.

[51] ITU-T, “Recommendation G.7041: Generic Framing Procedure (GFP),” October

[52] ITU-T, “Recommendation G.707: Network Node Interface for the Synchronous Digital Hierarchy,” October 2000.

[53] Special Issue of IEEE Communications Magazine on “Generic Framing Procedure

(GFP) and Data over SONET/SDH and OTN,” May 2002.

[54] Fujitsu, “FLM 150 ADM: Flexible OC-3 and OC-12 Add/Drop Multiplexer,” http://

us.fujitsu.com/services/Telecom/ByCateg/MetroEdgeNAccess/.

[55] Cisco, “Cisco ONS 15454 Optical Transport Platform,” http://www.cisco.com/en/US/

products/hw/optical/ps2006/ps2010/index.html.

[56] Ciena, “CIENA MultiWave MetroDirector K2™ Next-Generation Multi-Service Ac-

cess and Switching Platform,” http://www.ciena.com/products/k2/k2.htm.

[57] G. Beranano, et al., “Achieving UNI and NNI Interoperability,” OIF Forum, http://

www.oiforum.com/public/documents/OFC03_WP.pdf.

[58] M. Veeraraghavan, H. Lee, and R. Grobler, “A low-load comparison of TCP/IP and

end-to-end circuits for file transfers,” INET 2002, Arlington, VA, June 18-21 2002.

[59] V. Paxson and S. Floyd, “Wide-Area Traffic: The Failure of Poisson Modeling,” IEEE/

ACM Transaction on Networking, vol. 3, pp. 226-244, June 1995.

[60] H. Wang, M. Veeraraghavan, and R. Karri, “A hardware implementation of a signaling

protocol,” Proceeding of Opticomm 2002, Boston, MA, July 29-August 2, 2002.

[61] DOE Office Of Science High Performance Network Planning Workshop, http://do-

ecollaboratory.pnl.gov/meetings/hpnpw/workshopdescription.pdf, August 13-15,

[62] M. D. Brown, “Blueprint for the future of high-performance networking introduction,”

Communications of ACM, Vol. 46, No. 11, pp. 30-33, November 2003.

[63] C. de Laat, G. Gross, L. Gommans, J. Vollbrecht, and D. Spence, “Generic AAA Ar-

chitecture,” IETF RFC 2903, August 2000.

[64] W. Whitt, “Blocking when service is required from several facilities simultaneously,”

AT&T Technical Journal, vol. 64, pp. 1807-1856, October 1985.

[65] M. E. Crovella and A. Bestavros, “Self-similarity in World Wide Web Traffic Evi-

dence and Possible Causes,” IEEE/ACM Transaction on Networking, vol. 5, pp. 835-

846, December 1997.

[66] Y. Gu, X. Hong, M. Mazzucco, and R. L. Grossman, “SABUL: A High Performance

Data Transfer Protocol,” submitted to IEEE COMMUNICATIONS LETTERS.

[67] Y. Gu and R. L. Grossman, “End-to-End Congestion Control for High Performance

Data Transfer,” submitted to IEEE/ACM Transaction on Networking.

[68] “Tsunami,” http://www.indiana.edu/~anml/anmlresearch.html.

[69] E. He, J. Leigh, O. Yu, and T. A. DeFanti, “Reliable Blast UDP: Predictable High Performance Bulk Data Transfer,” Proceedings of IEEE Cluster Computing 2002, pp. 317-324, Chicago, Illinois, September 23-26, 2002.

[70] ANSI, “Information Technology - Scheduled Transfer Protocol (ST),” T11.1/Proj.

1245-M/Rev 4.0, October 2000.

[71] IETF, “Remote Direct Data Placement (RDDP),” http://www.ietf.org/html.charters/rddp-charter.html.

[72] “Bonnie,” http://www.textuality.com/bonnie/.

[73] “TCPDUMP Public Repository,” http://www.tcpdump.org.

[74] B. Mah, “Pchar: A Tool for Measuring Internet Path Characteristics,” http://www.employees.org/~bmah/Software/pchar/.

[75] C. Dovrolis and R. Prasad, “Pathrate: A measurement tool for the capacity of network

paths,” http://www.cc.gatech.edu/fac/Constantinos.Dovrolis/pathrate.html.

[76] T. Bu, N.G. Duffield, F. Lo Presti, and D. Towsley, “Network Tomography on General

Topologies,” Proceedings of ACM SIGMETRICS 2002, pp. 21-30, Marina Del Rey,

California, 2002.

[77] Cisco ONS 15454 TL1 Command Guide, http://www.cisco.com/en/US/products/hw/optical/ps2006/products_command_reference_book09186a00801a42d3.html.

[78] IEEE, “Standard 802.1Q: Virtual Bridged Local Area Networks,” May 2003.

[79] Extreme Networks, “Extreme Summit Ethernet switch,” http://www.extremenetworks.com/products/summit/.

[80] “Abilene backbone network,” http://abilene.internet2.edu/.

[81] D. Awduche, L. Berger, T. Li, V. Srinivasan, and G. Swallow, “RSVP-TE: Extensions

to RSVP for LSP Tunnels,” IETF RFC 3209, December 2001.

[82] ITC UVA, “Fiber Leasing Service,” http://www.itc.virginia.edu/netops/fiber-leasing.html.

[83] Switch and Data Colocation Service, http://www.switchanddata.com/Locations/ListOfLocations/3101.

[84] M. Veeraraghavan, H. Lee, E. K. P. Chong, and H. Li, “A varying-bandwidth list

scheduling heuristic for file transfers,” Proceedings of IEEE ICC 2004, Paris, France,

June 20-24, 2004.

[85] Cisco, “Cisco 12008 Gigabit Switch Router,” http://www.cisco.com/en/US/products/hw/routers/ps167/ps191/index.html.

[86] Cisco, “Link Bundling on Cisco 12000 Series Internet Routers,” http://www.cisco.com/en/US/products/sw/iosswrel/ps1829/products_feature_guide09186a0080103708.html.
