Enabling New Applications with Optical Circuit-
Switched Networks
_____________________________________________________________
A Dissertation
Presented to
the Faculty of the School of Engineering and Applied Science
University of Virginia
_____________________________________________________________
In Partial Fulfillment
of the Requirement for the Degree
Doctor of Philosophy
(Electrical Engineering)
_____________________________________________________________
by
Xuan Zheng
May 2004
Approval Sheet
The dissertation is submitted in partial fulfillment of the
requirements for the degree of
Doctor of Philosophy (Electrical Engineering)
__________________
Xuan Zheng (Author)
This dissertation has been read and approved by the Examining Committee:
Professor Malathi Veeraraghavan (advisor)_______________________________
Professor Zongli Lin (Chairman) _______________________________
Professor Joanne Bechta Dugan _______________________________
Professor Maite Brandt-Pearce _______________________________
Dr. Wu-chun Feng _______________________________
Accepted for the School of Engineering and Applied Science:
_________________________
Dean, School of Engineering and Applied Science,
University of Virginia.
May, 2004
Abstract
New inventions in optical communications components are driving advances in net-
working architectures and protocols. However, user needs are not met by current network
solutions. Three gaps between user needs and network limitations are identified in this dis-
sertation. To bridge these gaps, we propose an optical circuit-switched solution called
Reconfigurable Ethernet/SONET Circuits for End Users (RESCUE). This solution is pro-
posed as an add-on to the primary Internet service already available to end users. It allows
the optical circuit-switched network to be operated in a call-blocking mode because the
primary Internet access path can be used as a fall-back option if the call setup attempt is
blocked. In RESCUE service, the circuits would essentially connect the end users directly
to either a service provider router or another end user in an optical circuit-switched net-
work. It allows end-host applications to enjoy direct high-speed Ethernet/SONET circuits.
We propose two types of applications using RESCUE service: (i) Dial-Up service for
Internet access, and (ii) end-to-end file transfers. They are proposed to overcome the three
gaps between user needs and network limitations. In this dissertation, we describe archi-
tectures and operations of these two applications. The routing decision algorithms for both
applications are proposed and quantitatively analyzed based on data-transfer delays and
network utilization. Analysis results show that a significant improvement in throughput
can be realized for data transfers in these two applications.
To implement applications that use the RESCUE service, we design and implement
three modules: a transport protocol module, a routing decision module, and a signaling
module. A high-speed transport protocol called Fixed Rate Transport Protocol (FRTP) is
proposed to substitute TCP over end-to-end RESCUE circuits to achieve better through-
put. The design and the implementation of FRTP with rate-based flow control and selec-
tive-ARQ error control are presented in this dissertation. The experimental results of this
implementation are presented in the context of our local-area testbed network. A routing
decision module is proposed to determine whether or not to attempt a RESCUE circuit
setup when end hosts have a choice of two communication paths. A signaling module is
needed to set up/release the RESCUE circuits.
The configuration of our local-area testbed network and the experiments designed for
this testbed network are introduced. A VLAN-based extension for local-area testbed net-
works is suggested to enhance the RESCUE service.
Finally, we list a number of enhancements that can be made to improve the RESCUE
service. These are described in the future work section.
To my wife Jie and my parents for their love and support
Acknowledgements
“Challenge” is the most appropriate word to describe the process of obtaining a doctoral
degree. Today, I am so glad that I am completing my doctoral study and I have learned
such advanced and interesting concepts in the field of networking. At this point, I would
like to thank everybody who helped me during this four-year process.
I would like to thank Prof. Malathi Veeraraghavan, my advisor, for her consistent guid-
ance and support throughout my doctoral program. Her extensive knowledge, enlighten-
ing direction, and continuous encouragement made my dissertation work smooth, positive,
and enjoyable. Besides the academic research, Prof. Veeraraghavan also provided long
hours of counseling, especially in improving my writing skills. She has been and will
always be an inspiration and an excellent role model.
I would then like to express my most sincere appreciation to my Ph.D. program committee
members, Prof. Joanne Bechta Dugan, Prof. Maite Brandt-Pearce, Dr. Wu-chun
Feng, and Prof. Zongli Lin, for their generous help and advice during my program.
I would also like to thank all the other students in our research group, Anant Padmanath
Mudambi, Haobo Wang, Hojun Lee, Tao Li, Xiangfei Zhu, and Zhanxiang Huang, for
their friendship and kind help.
Finally, this work would have been impossible to finish without the continuous love and support
from my dear wife Jie and my parents back in China. I dedicate this dissertation to them.
Contents
Chapter 1 Background and Problem Statement
1.1 Current Optical Switching Technologies
1.2 Current Optical Network Architectures and Applications
1.3 Gaps between User Needs and Current Network Solutions
1.3.1 Access Link Bottleneck Problem
1.3.2 TCP Limitations
1.3.3 Difficulty in Creating End-to-end Connections to Meet Delay/Jitter Requirements of Interactive Real-time Applications
1.4 Problem Statement
Chapter 2 Related Work
2.1 Related Work in the Packet-Switched Networking Community
2.1.1 Related Work to Address the Access Link Bottleneck Problem
2.1.2 Related Work to Address the TCP Limitations
2.1.3 Related Work to Address the Difficulty in Providing End-to-end QoS
2.2 Related Work in the Circuit-Switched Networking Community
Chapter 3 Proposed RESCUE Service
3.1 Enabling Technologies
3.2 Architecture and Operations
3.3 RESCUE as an “Add-on” Service to Primary Internet Access
3.4 Applications
Chapter 4 Application I: Dial-Up Internet access service using RESCUE circuits
4.1 Description
4.2 Analytical Basis for the Routing Decision: Delay Analysis
4.3 Analytical Basis for the Routing Decision: Utilization Analysis
4.4 Chapter Summary
Chapter 5 Application II: End-to-end RESCUE Circuits to Improve File Transfer Delays
5.1 Description
5.2 Analytical Basis for the Routing Decision: Delay Analysis
5.3 Analytical Basis for the Routing Decision: Utilization Analysis
5.4 Chapter Summary
Chapter 6 Implementation of Application II
6.1 Design and Implementation of a High-speed Transport Protocol
6.1.1 Design Rationale
6.1.3 FRTP Specification
6.1.4 An Implementation of FRTP protocol
6.1.5 LAN Experiments
6.1.6 Summary of FRTP implementation and experiments
6.2 Routing Decision Module Design
6.3 Signaling Module Design
6.4 Local-area Testbed Network
6.4.1 Local-area Testbed Network
6.4.2 Extension with VLAN Technique
Chapter 7 Conclusions and Future Research
7.1 Summary and Conclusions
7.2 Future Research
7.2.1 Extension to Multi-protocol Interworking
7.2.2 Wide-area Testbed Network
7.2.3 Call Scheduling in RESCUE
7.2.4 Router Disconnect
7.3 Publications
Bibliography
List of Figures
Figure 1. Towards Advancing the Value of Optical Networks
Figure 2. Current Optical Network Architecture
Figure 3. Partial Topology of Polytechnic University’s Data Network
Figure 4. One Sample Point for Total Usage of Polytechnic University Campus Access Link
Figure 5. SONET Multiservice Provisioning Platform (MSPP) Architecture
Figure 6. Configuration of End Hosts for RESCUE Service
Figure 7. The RESCUE Concept: Share optical network circuit resources on a call-by-call basis and create high-speed Ethernet/SONET circuits on-demand; lines with arrow-heads denote signaling messages; the dashed line denotes the dynamically setup Ethernet/SONET circuit
Figure 8. RESCUE as an “Add-on” Service: the thick dashed lines show Ethernet/SONET circuits set up on-demand between end hosts’ second NICs and routers, or between the second NICs of two distant end hosts. In both cases, these become alternative paths to the primary paths available through the hosts’ primary NICs.
Figure 9. Dial-Up Access Service Architecture
Figure 10. An Extension of the Dial-Up Access Service
Figure 11. Plot of equation (3) with a link rate of 1Gbps, ρ_sig = ρ_sp = 0.7, k = 4
Figure 12. Plot of per-circuit utilization for files in the range of (10KB, 50MB) with p_dialup = 0.00001
Figure 13. Use of RESCUE Circuits for End-to-end File Transfers
Figure 14. Plot of equation (6) with r_c = r = 100Mbps, ρ_sig = ρ_sp = 0.7, k = 20
Figure 15. Plot of equation (6) with r_c = r = 1Gbps, ρ_sig = ρ_sp = 0.7, k = 20
Figure 16. Plot of equation (6) with r_c = 100Mbps, r = 1Gbps, ρ_sig = ρ_sp = 0.7, k = 20
Figure 17. A Three-link Network Model of RESCUE Service
Figure 18. Plot of Total Utilization on Each Access Link and the Core Link
Figure 19. An End Host Configured for RESCUE Service
Figure 20. The Model of FRTP Connections
Figure 21. Packet Formats in FRTP
Figure 22. The Parameter-Exchange Packet in FRTP
Figure 23. Data Sending/receiving Procedure in FRTP
Figure 24. Feedback Checking and Processing at the FRTP Sender
Figure 25. Feedback Sending at the FRTP Receiver
Figure 26. Packet-Loss Rates and Throughputs vs. the Sending Rate in FRTP Experiments (DATA Packet Size=1500B, UDP Buffer Size=256KB, FRTP Buffer Size=40MB, FRTP Data Block Size=8MB)
Figure 27. An Example of Inter-packet Transmission Times within a FRTP File Transfer (Sending Rate=50Mbps, DATA Packet Size=1500B, UDP Buffer Size=256KB, FRTP Buffer Size=40MB, FRTP Data Block Size=8MB)
Figure 28. CPU Utilization vs. the Sending Rate in FRTP Experiments (DATA Packet Size=1500B, UDP Buffer Size=256KB, FRTP Buffer Size=40MB, FRTP Data Block Size=8MB)
Figure 29. Packet-Loss Rates and Throughputs vs. UDP Buffer Size in FRTP Experiments (DATA Packet Size=1500B, Sending Rate=500Mbps, FRTP Buffer Size=40MB, FRTP Data Block Size=8MB)
Figure 30. Packet-Loss Rates and Throughputs vs. FRTP buffer size in FRTP Experiments (DATA Packet Size=1500B, UDP Buffer Size=256KB, Sending Rate=500Mbps, FRTP Data Block Size=8MB)
Figure 31. Packet Losses and Throughputs vs. DATA Packet Size in FRTP Experiments (MTU=1500B, UDP Buffer Size=256KB, FRTP Buffer Size=40MB, Sending Rate=500Mbps)
Figure 32. Static Routing Decision Module
Figure 33. Local-area Testbed Network Configurations
Figure 34. RESCUE Circuit Extension with VLAN Technique
Figure 35. A Representation of Networks Differentiated by Signaling Capabilities
Figure 36. Configuration of Wide-area Testbed Network
Figure 37. The Concept of Router Disconnect
Acronyms
AAA: Authentication, Authorization and Accounting
ACK: Acknowledgement
AIMD: Additive Increase Multiplicative Decrease
ARQ: Automatic-Repeat-reQuest
ASIC: Application-Specific Integrated Circuit
ASON: Automatic Switched Optical Network
CLI: Command-Line Interface
DiffServ: Differentiated Service
DMA: Direct Memory Access
DNS: Domain Name Service
EoS: Ethernet-over-SONET
FRTP: Fixed Rate Transport Protocol
FTP: File Transfer Protocol
GbE: Gigabit Ethernet
GFP: Generic Framing Procedure
GMPLS: Generalized MultiProtocol Label Switching
HDBP: High-Delay-Bandwidth-Product
HTTP: Hyper-Text Transfer Protocol
IntServ: Integrated Service
ISP: Internet Service Provider
LAN: Local-Area Network
LSP: Label-Switched Path
MAN: Metro-Area Network
MEMS: MicroElectroMechanical System
MPLS: Multi-Protocol Label Switching
MSPP: Multi-Service Provisioning Platform
MTU: Maximum Transmission Unit
NAK: Negative Acknowledgement
NIC: Network Interface Card
OADM: Optical Add/Drop Multiplexers
OCS: Optical Connectivity Service
OFC: Optical Fiber Communications
OGSI: Open Grid Services Infrastructure
OXC: Optical Crossconnects
QoS: Quality of Service
RESCUE: Reconfigurable Ethernet/SONET Circuits for End Users
RPR: Resilient Packet Ring
RSVP-TE: Resource ReSerVation Protocol with Traffic Engineering
RTT: Round-Trip Time
SRP: Spatial Reuse Protocol
ST: Scheduled Transfer
TCP: Transmission Control Protocol
TL1: Transaction Language 1
UCLP: User Controlled LightPath
UDP: User Datagram Protocol
UNI: User-Network Interface
VBLS: Varying-Bandwidth List Scheduling
VBR: Variable Bit-Rate
VC: Virtual Concatenation
VLAN: Virtual LAN
WAN: Wide-Area Network
WAND: Waikato Applied Network Dynamics
XC: CrossConnect
XCP: eXplicit Control Protocol
Chapter 1 Background and Problem Statement
New inventions in optical communications components are driving advances in net-
working architectures and protocols. Advances in networking architectures and protocols
are enabling new applications and bringing more requirements for optical components.
New applications, in turn, are motivating new inventions in optical communications com-
ponents. Figure 1 shows how these three factors interact.
1.1 Current Optical Switching Technologies
The latest developments in optical switching components include programmable tun-
able transmitters/receivers, Optical Add/Drop Multiplexers (OADMs), Optical CrossCon-
nects (OXC), and all-optical switches [1]. Tunable transmitters/receivers either have
lasers whose output/input wavelength can be tuned as needed, or an array of lasers with
different wavelengths that can be selectively enabled. OADMs are programmable if they
can be configured to add or drop different wavelengths at different interfaces. The whole
multi-channel signal does not need to be demultiplexed in an OADM, unlike in an optical
crossconnect, where multiple fibers, each carrying multiple channels, are first terminated
on demultiplexers before being crossconnected in a space-division switch fabric [2].
All-optical switches are analog switches in which both the I/O modules and the switch
fabric are optical. They benefit from scalability, bit-rate and protocol independence, and
power efficiency [3]. Large-scale all-optical circuit switches using MicroElectroMechanical System
(MEMS) technology are now commercially available.
Figure 1. Towards Advancing the Value of Optical Networks (elements shown: optical communications components; networking architectures & protocols; applications)
Current-day optical circuit-switched networks are built using the above optical switching
components. While commercial interest in all-optical networks is increasing, current opti-
cal networks are still hybrid optical/electronic networks, which consist of optical compo-
nents as well as electronic components. For example, SONET/SDH technology dominates
current wide-area optical transport networks. It defines the standards for carrying TDM
signals with different data rates using both electronic and optical media [4][5]. In this dis-
sertation we focus on developing new architectures and applications for circuit-switched
networks using SONET/SDH technology. However, our basic networking concepts can be
readily adapted to future all-optical WDM circuit-switched networks as these become
deployed.
1.2 Current Optical Network Architectures and Applications
A representation of current optical network architecture is shown in Figure 2 [6]. The
metro access network interconnects multiple geographically-distributed enterprise buildings,
and connects them to an access service provider MSPP/crossconnect node. The
access service provider node also belongs to a metro optical core network that intercon-
nects multiple service provider nodes, such as Internet Service Provider (ISP) and tele-
phone service provider nodes. Metro optical access networks and metro optical core
networks can have either a ring or mesh topology. Packet-switched wide-area networks,
such as the Internet, then interconnect various metro optical networks by interconnecting
various service provider nodes and offer enterprises Wide-Area Network (WAN) connec-
tivity.
One typical service provided by existing optical networks is access service for enter-
prise users. Leased access circuits at SONET/SDH rates are provisioned from enterprise
MSPPs through the access service provider node to the ISP node located on the metro core
network. Embedded within these SONET/SDH rate signals are T1s, T3s, or Ethernet sig-
nals. Another typical service is the provisioning of inter-switch/inter-router circuits, which
provide high-speed connections between switches/routers.
These two kinds of optical circuits are usually leased for long terms, and therefore lack
flexibility. To enable “simple, cost-effective, and bandwidth-efficient” services [7], many
approaches are being implemented to allow optical networks to be operated in a switched
mode, including: (i) web services such as Open Grid Services Infrastructure (OGSI) [8]
[9] and Sun Microsystems' Jini/JavaSpaces paradigm for high-availability distributed services
[10], and (ii) signaling solutions as in Generalized MultiProtocol Label Switching
(GMPLS) [11]-[13], User-Network Interface (UNI) 1.0 specification [14], and Automatic
Switched Optical Networks (ASON) [15]. Both GMPLS and UNI 1.0 specifications
include a signaling protocol based on the Resource ReSerVation Protocol with Traffic
Engineering (RSVP-TE) [13].
Figure 2. Current Optical Network Architecture (elements shown: Ethernet hosts and Ethernet switch/IP router in an enterprise building; enterprise MSPPs; metro optical access network; access service provider MSPP/crossconnect; metro optical core network; Internet service provider router; leased lines; inter-switch circuits; wide-area optical network; Internet packet-switched backbone network of IP routers interconnecting various networks)
The main applications envisioned for bandwidth-on-demand optical circuit services are
fast restoration and rapid provisioning of circuits between IP routers, frame-relay
switches, or crossconnects/telephony switches. A request for fast restoration is triggered
when a failure occurs. Requests for rapid provisioning of circuits are expected to be gener-
ated by network administrators when they identify a need for additional bandwidth
between their network switches/routers. Focus has been directed primarily at inter-switch/
inter-router circuits in service provider networks because traditionally these links are the
ones that require the high-bandwidth capability of optical circuits.
1.3 Gaps between User Needs and Current Network Solutions
The applications/services in current optical networks, including long-term enterprise
leased access circuits and service providers’ bandwidth-on-demand circuits, are essen-
tially working for inter-switch/inter-router connections. However, by extending our atten-
tion to the enterprise end hosts (note: not residential), we identify three gaps between user
needs and current network solutions.
1.3.1 Access Link Bottleneck Problem
An end-to-end data communication path for an enterprise user typically consists of three
types of segments: (i) Local Area Network (LAN) segments within enterprise buildings,
(ii) access segments from enterprise buildings to service provider buildings, and (iii) wide-area
segments across WANs. In recent years LAN and WAN network segments have experienced
a tremendous increase in data rates: links in LANs are evolving from 10/100 Base-T
Ethernet to Gigabit Ethernet (GbE) and even 10GbE, and links in service provider wide-
area backbone networks are evolving to OC48 (2.5Gbps), OC192 (10Gbps), or even
OC768 (40Gbps) [6]. At the same time, the evolution of data rates in access networks has
been slow, with the capacity still limited to data rates on the order of a few megabits per
second [16][17] due to the high costs of leased circuits. This is because leased access cir-
cuits are, by definition, not shared, which translates to high costs. These leased access cir-
cuits, as we will show, are usually heavily utilized. They are identified to be the bottleneck
links on end-to-end communication paths.
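To put the mismatch between backbone and access rates in numbers, the short sketch below computes SONET OC-n line rates and compares them with a T3. It is an illustration only: the constants are the standard OC-1 (51.84 Mbps) and DS3/T3 (44.736 Mbps) line rates, and the function name is made up for this example.

```python
# Standard line rates, in Mbps.
OC1_MBPS = 51.84      # SONET OC-1 / STS-1
T3_MBPS = 44.736      # DS3 (T3) line rate


def oc_rate_mbps(n: int) -> float:
    """Line rate of a SONET OC-n signal in Mbps (n times the OC-1 rate)."""
    return n * OC1_MBPS


if __name__ == "__main__":
    for n in (48, 192, 768):
        rate = oc_rate_mbps(n)
        # Ratio of the backbone line rate to a single T3 access link.
        print(f"OC-{n}: {rate / 1000:.2f} Gbps, about {rate / T3_MBPS:.0f} T3 links")
```

The nominal figures quoted in the text (2.5, 10, and 40 Gbps) are rounded versions of these line rates.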
To determine if access circuits are indeed a bottleneck, we conducted a measurement
study on the T3 access link that connects Polytechnic University’s data network to the
Internet. We show a partial topology of Polytechnic University’s data network in Figure 3
with two of its buildings, Dibner and Rogers Hall. Each building consists of a number of
subnets. Each subnet has an Ethernet switch. The Ethernet switches from various floors
are connected to Core routers located in the basements of the two buildings as shown in
Figure 3. All traffic from Dibner and other buildings, as well as from Rogers Hall hosts,
destined for the WAN feeds into the Core router (Core router 2 in Figure 3) in the Rogers
Hall basement. This router forwards all these packets to an access router via a 100Mbps
Ethernet link. Similarly, in the opposite direction, all packets arriving from the WAN pass
through the access router and the Rogers Hall Core router.
Figure 3. Partial Topology of Polytechnic University’s Data Network (elements shown: Ethernet switches in the Dibner Building and the Rogers Hall Building; Core router 1 and Core router 2; Gigabit Ethernet; links from other buildings/campuses; kuidas on LAN 1; access router; T3 access link)
The access router has only two interfaces, the 100Mbps interface and the T3 link to the
WAN. This makes the 100Mbps Ethernet interface on the Access router an ideal snooping
point for access link traffic. We connected a Sun Sparc4 workstation called “kuidas” to
this 100Mbps Ethernet LAN (LAN 1 in Figure 3). Its Network Interface Card (NIC) was
set to operate in a promiscuous mode allowing it to capture all packets on the link. Consid-
ering the difference in data rates between the 100Mbps Ethernet link and the T3 access
link, our measurements could be slightly higher for the outgoing traffic* but accurate in
the incoming direction.
Running tcpdump on the workstation, we captured all packets involving wide-area com-
munications, but because of storage and privacy considerations, we only saved the first 68
bytes of all packets. This was adequate to capture the protocol layer headers for IP/ICMP,
IP/TCP and IP/UDP packets. We stored these packet headers in raw trace files to be
reduced later. We created one trace file for approximately 30 minutes of traffic to avoid
huge file sizes. A 30-minute period produced approximately 2 GB of trace data. Those
trace files were downloaded during light traffic hours to another machine with a larger
storage capacity than kuidas. We have collected a total of 37 trace files. The total volume
of these IP packet header traces is more than 80 Gigabytes (more than one billion IP packet
headers). This database is available for other researchers in [18].
*Some traffic might be dropped at the access router when the total outgoing traffic exceeds the 45Mbps of the T3 access link.
We wrote C++ programs to analyze the data. Our first goal was to determine the average
usage on the access link from the tcpdump trace files. We show a 10-minute-long sample
of total bandwidth usage on the access link in Figure 4 for the outgoing and incoming
directions. The time precision of these time-bandwidth curves is 1 second. Table 1 shows
the average value and 90% confidence interval of the bandwidth usage in the two direc-
tions (incoming and outgoing) for ten of the 37 trace files collected between April 24,
2002 and May 14, 2002 (part of the Spring 2002 semester, which allowed for peak mea-
surements).
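The averaging step described above can be sketched as follows. This is a minimal illustration in Python, not the C++ programs used in the study. The function names, the (timestamp, IP length) record format, and the normal-approximation interval (z = 1.645) are assumptions made for this example; the dissertation does not state how its confidence intervals were computed. Because only the first 68 bytes of each packet were saved, the sketch assumes the byte count comes from the length field of the IP header rather than from the captured size.

```python
import math
from collections import defaultdict


def per_second_mbps(packets):
    """Bin packets into 1-second intervals and return usage in Mbps.

    packets: iterable of (timestamp_seconds, ip_length_bytes) pairs for
    one direction (incoming or outgoing). Hypothetical record format.
    """
    bits = defaultdict(int)
    for ts, length in packets:
        bits[int(ts)] += length * 8
    first, last = min(bits), max(bits)
    # Seconds with no packets count as zero usage.
    return [bits.get(s, 0) / 1e6 for s in range(first, last + 1)]


def mean_and_ci90(samples):
    """Sample mean and half-width of a 90% confidence interval.

    Uses the normal approximation (z = 1.645) and treats the per-second
    samples as independent, which real traffic only roughly satisfies.
    """
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return mean, 1.645 * math.sqrt(var / n)
```

Running `mean_and_ci90(per_second_mbps(records))` separately on the incoming and outgoing records of one trace would give one row of averages of the kind reported in Table 1.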
The average bandwidth usage in both directions is close to 30Mbps or higher, sometimes
even reaching close to 45Mbps. In the outgoing direction, we show an average of
49.6Mbps for the April 24, 16:01 trace. This value exceeds the 45Mbps T3 rate because we
captured packets on the 100Mbps Ethernet link (see Figure 3) and not on the T3 link itself.
These high loading conditions indicate that Polytechnic University’s access link is heavily
utilized and is potentially the bottleneck link on some end-to-end paths. We also note that
the data collected by the Waikato Applied Network Dynamics (WAND) group on the University of
Auckland access link showed similar results [19].
Figure 4. One Sample Point for Total Usage of Polytechnic University Campus Access Link
Table 1. Bandwidth Usage on Polytechnic University’s T3 Access Link (Spring 2002)
Trace start time    Trace length (seconds)    Average incoming bandwidth usage (Mbps)    Average outgoing bandwidth usage (Mbps)
Apr. 24, 13:07      647                       31.47                                      40.09
Apr. 24, 13:50      658                       30.2                                       39.8
Apr. 24, 16:01      317                       36.1                                       49.63
Apr. 24, 17:01      710                       30.17                                      33.62
May 13, 13:15       634                       34.09                                      38.14
May 13, 14:29       582                       42.43                                      36.12
May 13, 15:55       640                       28.9                                       36.36
May 14, 13:25       685                       29.43                                      30.71
May 14, 17:01       885                       23.45                                      21.9
1.3.2 TCP Limitations
Even though the access link bottleneck problem can be partially solved by increasing
the data rate of leased access circuits or by other methods, high-speed end-to-end communication
remains a major challenge given the limitations of TCP. TCP is widely used
over the Internet but is suboptimal for High-Delay-Bandwidth-Product (HDBP) networks
because of its slow start and Additive Increase Multiplicative Decrease (AIMD) congestion
control scheme [20]-[31]. First, TCP uses a congestion window, or CWND, to determine
how many packets can be sent at one time. The larger the congestion window size,
the higher the throughput. To achieve a steady-state throughput of 10 Gbps, a standard
TCP connection with a 100ms round-trip time requires an average congestion window of
83,333 segments and a packet-loss rate lower than 2×10^-10 [22]. This is not realistic.
Second, TCP uses the same protocol irrespective of the end-to-end paths [32], which
results in poor performance in wide-area environments [30]. This is because TCP senders
do not adapt their rate buildup scheme based on the features of the end-to-end path and
hence will incur an initial delay before the congestion window reaches a “streaming” state.
Third, bit errors are possible even on optical circuits; while optical fiber bit error rates are
very low, dust and poor connectors can increase these rates to the 10^-8 level. Bit errors
will be misinterpreted by the TCP sender as congestion signals, which unnecessarily
reduces the sending rate and requires a long time to recover.
These limitations make it hard for current IP-based networks, where TCP is the most widely
used transport protocol, to meet the bandwidth requirements of large file transfers, such as
terabyte and petabyte sized file transfers in particle physics, earth observation,
bioinformatics, radio astronomy, and other scientific studies [33].
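The congestion-window requirement quoted above can be checked with a few lines of arithmetic. The sketch below is an illustration under two assumptions that are not stated in the text: a segment size of 1500 bytes, and the commonly used steady-state approximation for standard TCP in which the average window w (in segments) and the loss rate p are related by w ≈ 1.2/sqrt(p). The function names are made up for this example.

```python
def required_window_segments(rate_bps: float, rtt_s: float,
                             segment_bytes: int = 1500) -> float:
    """Average window (in segments) needed to sustain rate_bps over rtt_s.

    One window of data is sent per round trip, so
    window = rate * RTT / segment size. The 1500-byte segment is an assumption.
    """
    return rate_bps * rtt_s / (segment_bytes * 8)


def max_loss_rate(window_segments: float) -> float:
    """Largest loss rate that still allows the given average window.

    Inverts the approximation w = 1.2 / sqrt(p) for standard TCP,
    which is assumed here rather than taken from the text.
    """
    return (1.2 / window_segments) ** 2


if __name__ == "__main__":
    w = required_window_segments(10e9, 0.100)   # 10 Gbps, 100 ms RTT
    p = max_loss_rate(w)
    print(f"window = {w:,.0f} segments, loss rate below {p:.1e}")
```

Under these assumptions the window comes out at about 83,333 segments and the tolerable loss rate at roughly 2×10^-10, that is, about one loss in every five billion packets, which illustrates why the text calls the requirement unrealistic.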
1.3.3 Difficulty in Creating End-to-end Connections to Meet Delay/Jitter
Requirements of Interactive Real-time Applications
In addition to the bandwidth requirements, many interactive real-time applications, such as
distributed collaborative visualization, remote computational steering, and remote instrument
control, have strict Quality of Service (QoS) requirements for transfer delay and jitter
[34]. Given the connectionless nature of today’s Internet, it is hard to meet the
delay/jitter requirements of interactive real-time applications.
1.4 Problem Statement
Therefore, the problem statement of this work is as follows: Design new network archi-
tectures and solutions by exploiting and improving already-deployed circuit-switched net-
2 10 10–×
10 8–
10
works to bridge the gaps between user needs and network limitations. More specifically,
our work consists of three objectives:
1. Design a new network solution to solve the enterprise access bottleneck problem. The
design criterion is to successfully enable a new application providing enterprise end
users with high-speed Internet access paths, which allow applications to enjoy lower
packet-loss rates on the access path.
2. Design a new network solution to overcome TCP limitations in end-to-end data com-
munications. The design criterion is to successfully provide enterprise end hosts with
high-speed end-to-end connectivity and allow end hosts to enjoy a much better data-
transfer throughput than with current TCP/IP.
3. Design a new network solution to overcome the difficulty of providing end-to-end QoS
guarantees in today’s Internet, in addition to achieving high data-transfer throughput.
The design criterion is to successfully provide end-to-end QoS guarantees, such as rate
and delay/jitter guarantees, to meet the requirements of applications.
Given the difficulty in meeting the above network design criteria with IP-based networks
(as we will show in the next chapter), we develop our solution based on circuit-switched
networks. We take into account two major constraints in current optical networks while
developing our solution. First, SONET technology dominates current wide-area optical
transport networks. Originally designed for the public telephone network, SONET was
developed in the mid-1980s into a standard for optical telecommunications transport. It
defines a technology for carrying many signals of different capacities through a synchronous,
flexible, optical hierarchy. In recent years, advances in application-specific integrated
circuit (ASIC) technology have helped reduce the price of SONET switches greatly.
Meanwhile, a new set of enhancements has been proposed to drive SONET’s evolution
toward increased efficiency and flexibility in carrying data signals. As a result,
SONET switches equipped with signaling engines and advanced transport capabilities have
begun to be deployed within enterprise networks. Given that a SONET-based circuit-switched
network is an ideal platform for providing connection-oriented service, the question for us is
how to enable new network services by exploiting and improving the already-deployed
SONET network infrastructure. Second, Ethernet technology dominates current LAN
environments due to its low cost. This leaves us with another question of how to leverage
this Ethernet dominance to make our circuit-switched network solution more feasible and
easier to implement.
In this dissertation, we address three generic problems in creating a new network solu-
tion:
1. What is the mechanism for sharing the network resources? In circuit-switched net-
works, network resources are represented by circuits. Thus, our first mission is to
address the question of how to provision and share circuits by considering network
scalability and utilization as criteria.
2. What is the mechanism for transferring the data over the network? In other words,
what transport protocol should be used on end-to-end circuits? On an end-to-end cir-
cuit, contention for resources is resolved during circuit setup on a call-by-call basis.
Once the circuit is established, the full circuit bandwidth is dedicated for the session.
No congestion control scheme is needed to adjust the sending rate on the circuit.
Sending rates should in fact be matched as closely as possible to the circuit rates to
keep the pipe full. Therefore, our second mission is to design and implement a transport
protocol that can maintain a constant transfer rate.
3. What are the prospective applications? A good candidate should be one that can pro-
duce a high traffic load and fully utilize the circuit bandwidth. From the service pro-
vider’s perspective, the applications should be as common as possible to produce a
high traffic load and therefore achieve high network utilization, which translates to
low costs. From the user’s perspective, the application suited for dedicated circuits
should be able to fully utilize the circuit bandwidth and therefore achieve a low trans-
fer delay. In this dissertation, we propose two prospective applications and provide
corresponding numerical analysis on their performance.
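As a small illustration of the second problem above (matching the sending rate to the circuit rate), the following sketch computes the packet spacing a sender would use to emit data at exactly the circuit rate. It is only a sketch under stated assumptions: the function names and the 1500-byte packet size are hypothetical and are not taken from the transport protocol developed in this dissertation.

```python
def inter_packet_gap_s(circuit_rate_bps: float, packet_bytes: int = 1500) -> float:
    """Spacing between packet start times so that the sending rate equals the circuit rate.

    packet_bytes is an assumed, illustrative packet size.
    """
    if circuit_rate_bps <= 0:
        raise ValueError("circuit rate must be positive")
    return packet_bytes * 8 / circuit_rate_bps


def transfer_time_s(file_bytes: int, circuit_rate_bps: float) -> float:
    """Ideal transfer time when the pipe is kept full for the whole transfer."""
    if circuit_rate_bps <= 0:
        raise ValueError("circuit rate must be positive")
    return file_bytes * 8 / circuit_rate_bps


# Example: a 1 Gbps circuit and a 100 MB (100e6-byte) file.
gap = inter_packet_gap_s(1e9)
duration = transfer_time_s(100_000_000, 1e9)
print(f"gap = {gap * 1e6:.1f} microseconds, transfer time = {duration:.2f} s")
```

Because no other flow contends for the circuit once it is set up, a fixed spacing of this kind is sufficient in principle; no window adjustment is involved.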
Chapter 2 Related Work
2.1 Related Work in the Packet-Switched Networking Community
2.1.1 Related Work to Address the Access Link Bottleneck Problem
To address the first gap identified in Section 1.3, the most common solution is to
increase the data rate of the wide-area access links leased by the enterprises. The deployment of
optical fibers and ADMs has made such increases easier to implement. Nevertheless, costs
for high-bandwidth links remain quite high. For example, consider average costs for edu-
cational sites cited by NetworkVirginia [35]: a T3 link (45Mbps) has an annual cost of
about $53K, while an OC3 (155Mbps) has an annual cost of $133K. Reference [36] lists
annual costs of about $110K for an OC3, $320K for OC12, and $495K for an OC48 cir-
cuit. This is because leased links are, by definition, not shared, which translates to high
costs. Also, it is difficult for an enterprise to obtain leased-link bandwidth “on-demand”
(i.e., with short turn-around times).
Another solution to the access link bottleneck problem, currently under development, is
the Resilient Packet Ring (RPR) [37], one example of which is Cisco’s Spatial Reuse Protocol
(SRP) [38]. The motivation for creating this new protocol is that packet-switched rings
will be better suited for bursty Internet data traffic than the current circuit-switched
SONET/SDH/WDM rings. Due to the increased level of sharing possible on these packet-switched
rings, access link costs for high-speed access are likely to be lower than with
leased circuits. Therefore, the enterprises can more readily increase their WAN access circuit
rates.
In these two approaches, while access link rates to an enterprise can be increased, the
more significant factor of packet losses on an individual flow is not addressed because
both operate in a packet-by-packet sharing mode. We show with the analysis below that
increasing the access link rate does not help if the packet-loss rate remains high, which can
happen because users often quickly fill up link capacity even as this is increased.
To study the impact of packet loss on end-to-end TCP delays, we use the analytical
model proposed by Padhye et al. [39], along with the extensions by Cardwell et al. [40],
which have been validated with both experimental and simulation results. These models
include all the complex steps of TCP data transfers: the time spent in slow start $E[T_{ss}]$,
the expected cost of a recovery following the first loss $E[T_{loss}]$, the time spent in
congestion avoidance $E[T_{ca}]$, and the time to delay the acknowledgement (ACK) for the initial
segment $E[T_{delack}]$:

$$E[T_{tcp}] = E[T_{ss}] + E[T_{loss}] + E[T_{ca}] + E[T_{delack}] \qquad (1)$$

The reader is referred to [39][40] for the detailed closed-form expressions for each term
on the right hand side of (1). These expressions are functions of three key parameters: the
bottleneck link rate $r$, the packet-loss rate $P_{loss}$, and the round-trip time (RTT) on the
TCP/IP path. We set the time for delayed ACKs to 0 because we assume an initial
window size of 2 [41] and the ACK-every-other-segment strategy.
Setting different numerical values for these three parameters as shown in Table 2 below,
we evaluate $E[T_{tcp}]$ using (1) and the expressions provided in [40]. We compute $RTT$
from the round-trip propagation delay $T_{prop}$ by adding a rough estimate of packet queueing
delay using $P_{loss}$ as a parameter, along with packet emission time. The $RTT$ values
presented can be thought of as the input parameters themselves, with the low values
representative of metro-area paths (0.1-1ms) and the higher values (50ms) representative of
wide-area paths. The bottleneck link rate $r$ and $RTT$ are used to compute $W_{max} = RTT \times r$,
which is the size of the congestion window at which the TCP flow reaches a “streaming”
state. When the congestion window reaches $W_{max}$, any further increase is irrelevant
because the sender does not even complete (or just about completes) emitting its current
congestion window before ACKs that increase the congestion window are received.
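The window size at which a flow reaches the “streaming” state is the bandwidth-delay product of the path. The short sketch below computes it in segments. The function name is hypothetical, and the 1500-byte segment size is an assumption; it is chosen here because it reproduces the window values listed in Table 2, not because the text states it.

```python
def wmax_segments(rate_mbps: float, rtt_ms: float, segment_bytes: int = 1500) -> float:
    """Bandwidth-delay product (RTT x r) expressed in segments.

    segment_bytes = 1500 is an assumed segment size, not a value stated in the text.
    """
    bits_in_flight = rate_mbps * 1e6 * rtt_ms * 1e-3
    return bits_in_flight / (segment_bytes * 8)


# A few (r, RTT) pairs taken from Table 2.
for r, rtt in [(45, 0.498), (100, 5.179), (1000, 50.018)]:
    print(f"r = {r} Mbps, RTT = {rtt} ms -> {wmax_segments(r, rtt):.3f} segments")
```

Under the same assumption, a 10 Gbps path with a 100ms round-trip time gives a window of about 83,333 segments, which agrees with the figure quoted in Section 1.3.2.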
Table 2. Mean TCP Transfer Delays for a 100MB File

Input parameters                | Intermediate derived results                       | Final results
p       r (Mbps)   Tprop (ms)   | Queuing delay plus   RTT (ms)   Wmax = RTT x r     | E[Ttcp] for a
                                | service time (ms)               (segments)         | 100MB file (s)
10^-5   45         0.1            0.398                0.498      1.868                18.261
10^-5   45         5              0.398                5.398      20.242               18.296
10^-5   45         50             0.398                50.398     188.993              19.219
10^-5   100        0.1            0.179                0.279      2.325                8.220
10^-5   100        5              0.179                5.179      43.158               8.263
10^-5   100        50             0.179                50.179     418.158              10.010
10^-5   1000       0.1            0.018                0.118      9.833                0.822
10^-5   1000       5              0.018                5.018      418.167              1.001
10^-5   1000       50             0.018                50.018     4168.167             4.925
0.01    45         0.1            0.576                0.676      2.535                18.395
0.01    45         5              0.576                5.576      20.91                20.859
0.01    45         50             0.576                50.58      189.66               129.13
0.01    100        0.1            0.38                 0.359      2.992                8.292
0.01    100        5              0.38                 5.259      43.825               13.501
0.01    100        50             0.38                 50.26      418.825              128.276
0.01    1000       0.1            0.038                0.126      10.5                 0.860
0.01    1000       5              0.038                5.026      418.833              12.828
0.01    1000       50             0.038                50.02      4168.833             127.682

From the numerical results presented in Table 2, we note the following. First, as the
packet-loss rate $p$ increases, the mean transfer delay $E[T_{tcp}]$ increases significantly if
$T_{prop}$ is high. For example, when $p$ increases from $10^{-5}$ (lightly-loaded path) to 0.001,
assuming a bottleneck link rate of 45Mbps (as in Polytechnic University’s access link), delay
goes up from 19.219s for a 100MB transfer to over 2 minutes (129.13s) if the transfer is on a
wide area (say across the USA where $T_{prop}$ is 50ms). This means for wide-area paths, it is
especially important to maintain a low $p$. Second, we note that for such wide-area paths,
there is little benefit to be gained by increasing the bottleneck link rate if $p$ stays the same.
For example, increasing the bottleneck link rate from 100Mbps to 1Gbps results in decreasing
the mean file-transfer delay from 128.276s to 127.682s, when $T_{prop}$ is 50ms and $p$ is
0.001. The implication of this result is that even if access link rates are increased, if the
link sharing mode is still packet-by-packet “socialistic” sharing, then the link bandwidth
could be filled up with traffic from other users (especially in university access links with
students being heavy users of the Internet) causing $p$ to stay the same. Therefore, we
conclude that it is more important to drop $p$ than it is to increase access link rate beyond a
certain level.
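The insensitivity to link rate can be seen with a much cruder estimate than the model of [39][40]. The sketch below uses the well-known square-root approximation of steady-state TCP throughput, (MSS/RTT) x sqrt(3/(2bp)), capped at the link rate. This is not the model used to produce Table 2: it ignores slow start and timeouts. The function names, the 1460-byte segment payload, b = 2 (one ACK per two segments), and the reading of 100MB as 100e6 bytes are all assumptions made for illustration.

```python
import math


def loss_limited_rate_bps(rtt_s: float, p: float, mss_bytes: int = 1460, b: int = 2) -> float:
    """Square-root approximation of steady-state TCP throughput (assumed MSS and b)."""
    return (mss_bytes * 8 / rtt_s) * math.sqrt(3.0 / (2.0 * b * p))


def approx_transfer_delay_s(file_bytes: float, link_rate_bps: float, rtt_s: float, p: float) -> float:
    """Rough transfer delay: the flow runs at the smaller of link rate and loss-limited rate."""
    rate = min(link_rate_bps, loss_limited_rate_bps(rtt_s, p))
    return file_bytes * 8 / rate


# Wide-area path (RTT of about 50 ms) with a loss rate of 0.001, at two link rates.
for link_rate in (100e6, 1e9):
    delay = approx_transfer_delay_s(100e6, link_rate, 0.050, 1e-3)
    print(f"link = {link_rate / 1e6:.0f} Mbps -> about {delay:.0f} s")
```

With these inputs the loss-limited rate is far below either link rate, so both link rates give the same estimate, on the order of two minutes. The estimate falls only when the loss rate falls.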
2.1.2 Related Work to Address the TCP Limitations
To address the second gap of TCP limitations, many researchers are proposing enhance-
ments to TCP’s congestion control [22]-[24] and/or flow control [25]-[27]. These
enhancements propose upgrades to the TCP congestion control algorithm at end hosts to better
fit HDBP environments. Not requiring router upgrades makes them easier to implement.
On the other hand, solutions requiring upgrades to routers have also been proposed,
such as eXplicit Control Protocol (XCP) and Jumbo Frame. XCP is a feedback-based con-
gestion control system that uses direct, explicit, feedback from routers to avoid congestion
in the network [28], while Jumbo Frame was proposed to use a larger Maximum Trans-
mission Unit (MTU) in both end hosts and routers [29].
These enhancements are essentially designed to achieve high end-to-end throughput in the
future high-capacity Internet. However, they do not change the assumption of a shared
connectionless packet-switched Internet and still take network utilization and fairness as
the first-priority consideration. Like standard TCP, these enhancements use congestion-control
mechanisms to adjust sending rates during data transfers based on congestion levels.
Fairness is achieved by sharing the network resources in a “socialistic” manner
among all data flows. They lack the capability to provide end-to-end QoS, such as bandwidth
and delay guarantees, for end users. Therefore, it is hard to implement “pay
more, get more” service in IP-based networks.
2.1.3 Related Work to Address the Difficulty in Providing End-to-end QoS
To address the third gap, efforts have been made to add connection-oriented characteristics
to IP-based networks. Solutions include Integrated Services (IntServ) [42] and Differentiated
Services (DiffServ) [43]. IntServ provides end-to-end per-flow QoS by reserving
resources end-to-end through RSVP signaling. However, the scalability problem
due to the complexity of per-flow reservation and per-packet handling makes IntServ
inapplicable when the number of flows is large. DiffServ, instead, provides differentiated
QoS for a small number of classes by maintaining a separate queue for each class of
aggregate traffic. However, it lacks the capability to provide per-flow and end-to-end QoS.
Another solution of setting up high-speed circuits for end user sessions has been exam-
ined in a proposal called TCP switching [44]. The concept is to classify TCP flows at IP
routers and initiate requests for dynamic circuit setup for individual TCP flows through
optical circuit-switched networks. Advances in network processor technology have been
targeted at enabling high-speed flow classification needed to trigger circuit setup/release.
Nevertheless, these approaches have remained difficult to realize in practice for
scalability reasons.
2.2 Related Work in the Circuit-Switched Networking Community
Circuit-switched networking is ideal for providing “pay more, get more” service and end-to-end
QoS because of its connection-oriented nature. An increasing number of optical
circuit-oriented testbeds are being deployed, e.g., CANARIE's CA*net 4 [45], Starlight
[46], SURFnet [47], UKLight [48], etc. DOE's Ultranet [49] will include the ability to
offer end-to-end circuits. Some of these networks use all-optical switches with the granu-
larity of a circuit being a single wavelength, while others use hybrid electronic/optical
switches that provide sub-lambda granularity.
Most research efforts in these optical circuit-oriented testbeds are focused on how to
provision the circuits. For example, the User Controlled Lightpaths (UCLP) project [50],
supported by the CANARIE network, aims to provide user-controlled end-to-end optical circuits
to meet the QoS requirements. However, UCLP and other related projects use a centralized
approach to provision circuits, in which the network inventory, topology, and routing
information are stored in a global database, and the circuit setup requests are processed by
a central management system. The complexity of such a centralized management system
makes fast provisioning hard to implement and limits the scalability of networks. Furthermore,
proposed applications for these optical testbeds are limited to very large data
transfers and other eScience applications within a small community instead of commodity
applications for a wide (Internet-scale) community. This results in small traffic loads, with
which it is hard to achieve high network utilization.
Chapter 3 Proposed RESCUE Service
To fill the three gaps identified in Section 1.3 between user needs and network solutions,
we propose an end-to-end optical networking solution called Reconfigurable Ethernet/
SONET Circuits for End Users (RESCUE). The concept is to provide end hosts with
high-speed, end-to-end circuit connectivity on a call-by-call shared basis, where a “cir-
cuit” consists of Ethernet segments at the ends that are mapped into Ethernet-over-SONET
long-distance circuits.
At first glance it appears that to extend the services of optical networks to end hosts, we
somehow need to extend the reach of optical networks all the way to desktops. Attempts to
extend optical networks to the desktop were made in the mid-to-late nineties with
ATM-to-the-desktop projects, most of which failed. However, three major advances have occurred
since the late nineties that allow us to implement a solution for extending optical network
services to end hosts without actually dropping fiber to desktops.
First, optical fiber has been deployed extensively within both Metro-Area Networks
(MANs) and enterprise/university campuses. Second, Fast Ethernet and GbE technologies
have been deployed at end hosts using existing twisted-pair copper wires and these end
host links are not bottlenecks on end-to-end paths. The bottleneck for enterprise users is
the enterprise access link rather than the drop to the desktop. Third, a new system called
Multi-Service Provisioning Platform (MSPP) has been defined, developed, and more
importantly, already deployed within enterprises. Among its functions, which we will
review in Section 3.1, MSPPs provide a means for crossconnecting Ethernet signals from
end hosts to equivalent Ethernet-over-SONET (EoS) signals on wide-area access links.
Currently, service providers such as Verizon [35] offer “Ethernet access services” to enter-
prises through MSPPs. These Ethernet/SONET circuits (i) are leased for long durations,
and (ii) originate/terminate at routers. We propose to (i) use these hybrid circuits in a
dynamic mode, and (ii) extend them to end hosts.
Leveraging these advances, we propose an optical network service called RESCUE.
With this service end hosts should be able to dynamically request reconfigurable Ethernet/
SONET circuits for durations as short as a few milliseconds. We describe factors that
enable RESCUE service in Section 3.1, and the basic RESCUE architecture and opera-
tions in Section 3.2. In Section 3.3, we describe an important aspect of our approach,
which is to use additional NICs at end hosts for the RESCUE service so that it is an “add-on”
service to basic Internet access rather than a replacement. Two types of applications
using the RESCUE solution are addressed in Section 3.4.
3.1 Enabling Technologies
As stated earlier, MSPPs have already been deployed in enterprises. The primary reason
for this deployment is to integrate T1s from PBXs carrying voice traffic and T1s/T3s/
Ethernet signals from wide-area-access IP routers carrying data traffic on to a SONET/
SDH/WDM signal used for wide-area access (hence the term “multi-service”). For our
proposal, the multiplexing aspect of MSPPs is not relevant. Instead we exploit the ability
of MSPPs to encapsulate Ethernet frames into SONET frames using EoS specifications,
such as Generic Framing Procedure (GFP) [51], along with Virtual Concatenation (VC)
[52], a technique for allowing arbitrary-bandwidth SONET signals to be created to reduce
wasted bandwidth [53], e.g., a 100Mbps Ethernet signal can be carried on two OC1 circuits
instead of an OC3 circuit.
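The bandwidth saving in this example can be checked with simple arithmetic. The sketch below is a deliberate simplification: it uses the nominal OC1 line rate of 51.84Mbps and ignores SONET overhead and payload-capacity details, which an actual mapping would have to account for. The function and variable names are hypothetical.

```python
import math

OC1_MBPS = 51.84  # nominal OC1 line rate; SONET overhead is ignored in this sketch


def oc1_members_needed(client_mbps: float) -> int:
    """Smallest number of OC1 circuits whose combined nominal rate covers the client signal."""
    return math.ceil(client_mbps / OC1_MBPS)


def unused_mbps(client_mbps: float, carrier_mbps: float) -> float:
    """Carrier capacity left unused by the client signal."""
    return carrier_mbps - client_mbps


n = oc1_members_needed(100)                  # virtually concatenated group size
vcat_waste = unused_mbps(100, n * OC1_MBPS)  # unused capacity with n OC1 circuits
oc3_waste = unused_mbps(100, 3 * OC1_MBPS)   # unused capacity with a full OC3
print(n, round(vcat_waste, 2), round(oc3_waste, 2))
```

Under this simplification, two OC1 circuits leave a few Mbps unused, whereas an OC3 leaves more than half of its nominal capacity unused.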
The architecture of a typical MSPP [54]-[56] is shown in Figure 5. Nodes within an
enterprise are connected to interface cards, such as Ethernet (10Mbps/100Mbps), T1, T3,
and Gigabit Ethernet (GbE). The Ethernet cards encapsulate Ethernet frames into SONET frames
using EoS devices. The CrossConnect (XC) card is used to crossconnect signals from
incoming ports to outgoing ports. The control card typically has a processor and imple-
ments management software to control the MSPP. Communication with the control card is
through its own Ethernet and/or serial interface. The wide-area access link card is a high-
rate SONET, SDH and/or WDM interface. Typically, Ethernet, T1, T3 signals from the
interface cards connected to nodes within the enterprise are crossconnected through the
XC card to equivalent-rate signals on the wide-area access SONET link.
Increasingly, optical crossconnects and MSPPs now implement control-plane signaling
protocols to enable the dynamic setup and release of optical circuits across the network.
For example, a signaling protocol interoperability test involving many vendors’ products
was demonstrated at the Optical Fiber Communications (OFC) Conference in March 2003
[57].
Figure 5. SONET Multiservice Provisioning Platform (MSPP) Architecture
[Figure omitted. Labeled elements: interface cards (Ethernet 10/100, GbE, T1, T3) with cables to the enterprise switches/routers/PBXs; XC (cross-connect) card; control card with Ethernet and/or serial interface; OC12, OC48, OC192 cards on the SONET/SDH/WDM wide-area access link.]
3.2 Architecture and Operations
Details of the end host configuration and the MSPP within an enterprise are shown in
Figure 6.
The RESCUE configuration requires:
1. Second Ethernet NICs in end hosts, connected to the Ethernet ports of a
signaling-capable enterprise MSPP.
2. A high-speed optical circuit with multiple channels, leased from the enterprise
MSPP to either a wide-area signaling-capable network switch or another
signaling-capable enterprise/service-provider MSPP.
3. Software enhancements at end hosts to generate call setup/release requests
for applications that can benefit from the high-bandwidth RESCUE service. The details
of the end host RESCUE software are discussed in Chapter 6.
Figure 6. Configuration of End Hosts for RESCUE Service
[Figure omitted. Labeled elements: Ethernet hosts in the enterprise building running the application + RESCUE software over the OS, each with NIC 1 and NIC 2; NIC 1 connects through the Ethernet switch/IP router and the primary Internet leased access circuit to the ISP's router; NIC 2 connects to the MSPP Ethernet interface (shared with other end hosts); the MSPP SONET interface carries the RESCUE circuit with multiple channels across the optical circuit-switched network to the ISP's router or another signaling-capable network switch.]
RESCUE operation is as follows. A RESCUE circuit consists of multiple channels and
allows end hosts to share these channels on a call-by-call basis. A call setup request for a
RESCUE Ethernet/SONET circuit is generated by end-host software. This request is
received by the control software on the enterprise MSPP to which the requesting-host’s
second NIC is connected. The control software locates a free equivalent-rate SONET cir-
cuit on its access circuit and crossconnects the Ethernet signal from the requesting end
host to this SONET circuit. The enterprise MSPP’s control software then forwards the
call-setup request to the next switch on the path. Circuit setup proceeds hop-by-hop in this
manner. Once set up, the circuit is held and used for a short duration, and then released
using a similar hop-by-hop circuit release procedure. Subsequently, a different communication
session can reuse the same resources. The advantage of this dynamic, distributed
circuit-provisioning approach with signaling is apparent when compared with the traditional
centralized approach using management systems. It is scalable given the much lower processing
complexity at each hop. Very fast provisioning becomes possible by implementing
hardware-accelerated signaling [58] at circuit switches, which is key to achieving high network
utilization.
Figure 7 illustrates an example of circuit setup in metro-area networks. An end host in
enterprise 1 requests an Ethernet/SONET circuit to a router in service provider M’s net-
work. Call setup proceeds hop-by-hop with the signaling messages (RSVP-TE Path mes-
sage in the forward direction and RSVP-TE Resv message in the opposite direction) being
processed at each intermediate node. If resources are available on links L1, L2, L4, and
L6, the MSPPs and optical crossconnects en route will be programmed for the circuit (the
dashed line represents the dynamically set up circuit). RESCUE Ethernet/SONET circuits
can be set up from a host to a router/switch or another host.
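The hop-by-hop procedure can be sketched as a toy model in which each link holds a count of free channels. This is only an illustration of the control flow, not an RSVP-TE implementation: the `Link` class and function names are hypothetical, channels are reserved on the forward pass for simplicity, and the reverse-direction confirmation is reduced to a return value.

```python
from dataclasses import dataclass


@dataclass
class Link:
    name: str
    free_channels: int


def setup_circuit(path: list) -> bool:
    """Reserve one channel on every link of the path, hop by hop.

    If any hop has no free channel, the call is blocked and the
    channels reserved on upstream hops are released.
    """
    reserved = []
    for link in path:                    # forward direction (Path message)
        if link.free_channels == 0:
            for held in reserved:        # blocked: roll back upstream reservations
                held.free_channels += 1
            return False
        link.free_channels -= 1
        reserved.append(link)
    return True                          # the reverse-direction Resv would confirm here


def release_circuit(path: list) -> None:
    """Hop-by-hop release so that another session can reuse the channels."""
    for link in path:
        link.free_channels += 1


path = [Link("L1", 1), Link("L2", 1), Link("L4", 1), Link("L6", 1)]
first = setup_circuit(path)    # succeeds and uses the last free channel on each link
second = setup_circuit(path)   # blocked until the first call is released
release_circuit(path)
print(first, second)
```

Each node decides using only the state of its own outgoing link, which is the property that keeps per-hop processing simple in the distributed approach.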
RESCUE service can be introduced gradually by interconnecting signaling-capable
switches/MSPPs via leased circuits. For example, two buildings of a single organization
located within a metro area may lease a multi-channel circuit between the buildings but
share these channels on a call-by-call basis. In this scenario, only the enterprise MSPPs
would need to be signaling-capable.
Figure 7. The RESCUE Concept: Share optical network circuit resources on a call-by-call basis and create high-speed Ethernet/SONET circuits on-demand; lines with arrowheads denote signaling messages; the dashed line denotes the dynamically set up Ethernet/SONET circuit
[Figure omitted. Labeled elements: end hosts and hosts/routers in Enterprise 1 through Enterprise N and in Service provider 1 (e.g., ISP) through Service provider M, each attached through an MSPP (Ethernet on the host side, SONET on the network side) to a metro-area optical circuit-switched network consisting of electronic/all-optical crossconnects/add-drop multiplexers; links L1 to L7; a connection to the wide-area network (WAN); messages (1)-(6): Path message, (7)-(12): Resv message.]
3.3 RESCUE as an “Add-on” Service to Primary Internet Access
We illustrate how RESCUE service is configured as an “add-on” service to primary
Internet access in Figure 8. The primary NICs in end hosts are connected through the usual
LAN Ethernet switches/IP routers to the enterprise MSPP, which in turn is connected to an
Internet router by a leased circuit. For example, in
Figure 8, Leased circuit I is the primary Internet access link for enterprise building I.
Hosts requiring access to RESCUE service will be equipped with second NICs as shown
in Figure 8. These second NICs are connected to ports on the enterprise MSPP’s Ethernet
interface card, allowing them to be crossconnected on-demand to equivalent EoS circuits
at the MSPP. For communication between two entities that can be connected by a direct
EoS circuit, there is a choice of two paths: the primary TCP/IP path and an Ethernet/SONET
circuit. For example, an end host in enterprise building I with a second Ethernet
NIC configured for RESCUE service has two paths to Router I in Figure 8: (i) the primary
leased circuit I reachable through its primary NIC, enterprise Ethernet switches/IP routers,
and its MSPP (see solid line marked “Leased circuit I”), and (ii) an on-demand Ethernet/SONET
circuit through its second NIC and MSPP (see dashed line from the MSPP in
enterprise building I to Router I).
Figure 8. RESCUE as an “Add-on” Service: the thick dashed lines show Ethernet/SONET circuits set up on-demand between end hosts’ second NICs and routers, or between the second NICs of two distant end hosts. In both cases, these become alternative paths to the primary paths available through the hosts’ primary NICs.
[Figure omitted. Labeled elements: Ethernet hosts with primary NICs and second NICs (NIC: Network Interface Card) in Enterprise building I and Enterprise building II; Ethernet switches/IP routers; an MSPP in each building with T1/T3/Ethernet inputs; Leased circuit I and Leased circuit II; Router I; Ethernet over SONET (EoS) circuits requested on-demand; optical circuit-switched networks consisting of SONET/SDH/WDM Add/Drop Multiplexers (ADMs) and crossconnects; packet-switched networks (MPLS, Frame Relay, ATM, etc.); different types of access networks (PDH, CATV, wireless, FTTH, etc.) serving enterprises and homes; the packet-switched Internet (IP routers interconnecting various networks).]
The presence of two such paths raises the question of which path an end-host application
should choose. We recognize that it is not appropriate to attempt a circuit setup for all
communication sessions. For example, for a small-file transfer (file size is on the order of
a few KB), the total delay incurred in setting up a circuit and then transferring the file
could be larger than the delay incurred in directly using the TCP/IP path. Thus, a routing
decision needs to be made at end hosts with access to RESCUE. We provide a thorough
analysis of the routing decision in Chapters 4 and 5, and the details of the routing decision
implementation in Chapter 6.
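The small-file argument above can be illustrated with a simple delay comparison. The sketch below is not the decision rule derived in Chapters 4 and 5; it only contrasts a circuit path that pays a setup delay with a TCP/IP path that does not. The function name and all parameter values (setup delay, circuit rate, effective TCP throughput) are hypothetical.

```python
def choose_path(file_bytes: float, circuit_rate_bps: float,
                setup_delay_s: float, tcp_rate_bps: float) -> str:
    """Pick the path with the smaller estimated total delay.

    tcp_rate_bps is an assumed effective throughput on the TCP/IP path.
    """
    circuit_delay = setup_delay_s + file_bytes * 8 / circuit_rate_bps
    tcp_delay = file_bytes * 8 / tcp_rate_bps
    return "circuit" if circuit_delay < tcp_delay else "tcp"


# Hypothetical numbers: 1 Gbps circuit, 50 ms setup delay, 10 Mbps effective TCP throughput.
small = choose_path(4_000, 1e9, 0.05, 10e6)        # a few KB: setup delay dominates
large = choose_path(100_000_000, 1e9, 0.05, 10e6)  # 100 MB: transfer time dominates
print(small, large)
```

For the small file, the setup delay alone exceeds the whole TCP/IP transfer time, so the TCP/IP path is chosen; for the large file, the circuit is chosen.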
Having the option to fall back to the primary TCP/IP path allows a RESCUE service
provider to operate the circuit-switched network at a high enough call-blocking probabil-
ity to achieve satisfactory utilization. As is well known, resource utilization and call-
blocking probability operate at cross purposes. Without the option of falling back to the
primary TCP/IP path, the circuit-switched network would need to be engineered to operate
at a low call-blocking probability at the expense of utilization. This would make it more
difficult to achieve “cost-effective, bandwidth-efficient” optical networks. The presence of the
dual path also allows applications to take advantage of both paths during a file transfer as
will be explained in Chapter 5.
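The tension between utilization and call-blocking probability can be illustrated with the classical Erlang-B formula. The sketch below assumes Poisson call arrivals and blocked calls that leave the circuit-switched network (here, by falling back to the TCP/IP path); this traffic model is an assumption made for illustration and is not stated in the text above. The function names are hypothetical.

```python
def erlang_b(channels: int, offered_load_erlangs: float) -> float:
    """Call-blocking probability from the standard Erlang-B recursion."""
    b = 1.0
    for k in range(1, channels + 1):
        b = offered_load_erlangs * b / (k + offered_load_erlangs * b)
    return b


def utilization(channels: int, offered_load_erlangs: float) -> float:
    """Fraction of channel capacity occupied by carried (non-blocked) calls."""
    carried = offered_load_erlangs * (1.0 - erlang_b(channels, offered_load_erlangs))
    return carried / channels


# Ten channels under increasing offered load.
for load in (5.0, 8.0, 12.0):
    print(f"load = {load:4.1f} Erlangs: blocking = {erlang_b(10, load):.3f}, "
          f"utilization = {utilization(10, load):.3f}")
```

As the offered load grows, utilization rises together with the blocking probability. A network that must keep blocking very low therefore has to run at lower utilization, whereas one whose blocked calls can fall back to another path can be loaded more heavily.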
The RESCUE concept and its applications are novel in three ways. First, the RESCUE
architecture is a “parallel-hybrid” solution in contrast to today’s “sequential-hybrid net-
works” where different types of switches could exist sequentially on an end-to-end path.
RESCUE is proposed as an “add-on” service to existing Internet connectivity that extends
all the way to end hosts giving end hosts an option between two paths: (i) an Internet
packet-switched path, and (ii) a dedicated circuit. End-host applications need to make a
routing decision on which path to use. In current-day networks, such routing decisions are
typically made only at switches, not at end hosts. Drawing an analogy to transportation
systems, we note that in some situations (e.g., travelling between New York and Boston),
people have a choice of multiple transportation options. In this dissertation, we illustrate
the advantages and costs of such an approach. Second, not only is our proposal to extend
bandwidth-on-demand high-speed (e.g., Gbps, 10Gbps) circuit services to end hosts new
to the optical networking research community, but our proposal to enable these networks
to support calls with very small holding times (e.g., on the order of milliseconds for single
data transfers within a file transfer application) pushes the envelope of bandwidth-on-
demand thinking. By introducing small data transfers as well as elephant data transfers as
applications, we are aiming to create a large-scale circuit-switched network providing
commodity services. The high network utilization required by a scalable network is possi-
ble with high traffic loads from commodity services, which translates to the low costs seen
by users.
3.4 Applications
Next, we consider the question of how to use RESCUE circuits for applications. In this
dissertation, we address two applications that can use RESCUE circuits: (i) Dial-Up ser-
vice to dynamically set up high-speed circuits bypassing enterprise access links to the
ISP’s routers, and (ii) an end-to-end file-transfer application.
In Dial-Up service, RESCUE circuits would essentially connect end hosts directly to the
ISP routers serving the enterprise. It would bypass the shared access circuit of the enter-
prise and thus allow the end host applications to enjoy lower packet loss. This addresses
the access link bottleneck problem (the first gap) described in Section 1.3. In the end-to-end
file transfer application, end-to-end RESCUE circuits are established between end
hosts located on optical circuit-switched networks. By using a high-speed transport protocol
instead of TCP on RESCUE circuits, low file-transfer delays can be achieved. This
addresses the TCP limitations (the second gap) described in Section 1.3.
Chapter 4 Application I: Dial-Up Internet Access Service Using RESCUE Circuits
In Section 1.3.1, we identified the enterprise access link bottleneck problem. Further-
more, in Section 2.1 we noted that the absolute data rates of these bottleneck links are less
important than packet-loss rates, which are usually high on the heavily congested access
circuits. Given these results, we propose a Dial-Up Internet access service that bypasses
the shared access link of the enterprise and thus allows the end host application to enjoy
lower packet-loss rates. It not only increases the access bottleneck link rate, but more
importantly decreases the probability of loss experienced by a single TCP flow to signifi-
cantly improve file-transfer delays. Section 4.1 describes the operational steps in using
RESCUE for Dial-Up service. Sections 4.2 and 4.3 describe our analytical basis for the
routing decision.
4.1 Description
The architecture of our Dial-Up service is shown in Figure 9. In Dial-Up service, an end
host with a second Ethernet NIC can request a direct high-speed Ethernet/SONET circuit
to the ISP’s IP router. This is comparable to current-day Dial-Up telephone service but at a
significantly higher bandwidth. The RESCUE software will make a routing decision on
whether to use the host’s primary NIC or whether to request an Ethernet/SONET circuit
through the host’s second NIC to the ISP’s IP router. The circuits are held only for the
duration of a single transfer within a TCP session (note that within the holding time of a
TCP connection, there can be many data transfers). This allows for increased sharing of
resources. Without such an implementation, the SONET resources on the access link will
lie unused during a user’s “think” time. In other words, sharing of the access link set up
for RESCUE access service will be on a call-by-call basis rather than a packet-by-packet
basis as is done on current access links. This results in better performance for the call that
does successfully obtain a RESCUE circuit, but the price paid is access-link utilization.
We will discuss this trade-off along with many other details in the following subsections.
When an end host’s RESCUE software requests a circuit by generating a signaling mes-
sage, the SONET/SDH optical circuit-switched network consisting of the enterprise
MSPP, access network switches and ISP’s MSPP (see Figure 9) may or may not be able to
accommodate the request. If no spare circuit is available on this path, then the call setup
request is blocked. The end host software is programmed to handle call blocking by then
switching to the primary NIC and using the shared WAN access link. If however a circuit
is available, the MSPP will crossconnect the host’s Ethernet signal to an equivalent-rate
SONET/SDH circuit established via the access network to the ISP’s IP router.

Figure 9. Dial-Up Access Service Architecture
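The fall-back behavior described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: the `AccessNetwork` class, its circuit counter, and the `choose_path` function are hypothetical names, and a real implementation would exchange signaling messages with the MSPP rather than decrement a counter.

```python
from dataclasses import dataclass


@dataclass
class AccessNetwork:
    """Toy model of the circuit-switched access path (hypothetical, for illustration)."""
    free_circuits: int

    def request_circuit(self) -> bool:
        """Return True if a circuit is set up, False if the call is blocked."""
        if self.free_circuits > 0:
            self.free_circuits -= 1
            return True
        return False

    def release_circuit(self) -> None:
        """Release a circuit at the end of a single data transfer."""
        self.free_circuits += 1


def choose_path(network: AccessNetwork) -> str:
    """Use the second NIC if the circuit is granted, else fall back to the primary NIC."""
    if network.request_circuit():
        return "NIC 2 (RESCUE circuit)"
    return "NIC 1 (primary access link)"
```

Because a blocked call simply falls back to the primary NIC, the application never has to wait for a circuit to become free, which is what allows the access network to be run in call-blocking mode.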
Since the ISP’s IP router is the terminating endpoint for the Ethernet/SONET circuit, it
should be capable of receiving signaling messages and accepting/rejecting circuit setup
requests. However, given the difficulty in adding new software to routers, we propose an
external “Dial-up server,” which consists of (i) software to terminate signaling messages
on behalf of the IP routers, and (ii) software to configure the routers. The signaling com-
ponent of the software will respond to the signaling messages issued by MSPPs/access
network switches. The configuration part of the software will have administrative user
permissions to write into the routing table and ARP table of the ISP’s IP router. This step
maps the IP address and MAC address of the host temporarily connected via the newly
established Ethernet/SONET circuit to the corresponding interface on the ISP’s IP router.
The mapping is needed because different Ethernet hosts are reachable through the same
router interface at different times. With the updated routing and ARP tables, the router
can forward packets arriving from the Internet to the correct end host.
Different arrangements are possible for increasing levels of sharing with the RESCUE
access service:
1. An enterprise could lease bandwidth from its MSPP directly to the ISP’s IP router for
RESCUE Dial-Up access service. This model is similar to today’s wide-area Internet
access link where a leased circuit is obtained by an enterprise to terminate directly on
the ISP’s IP router. This leased circuit is shared on packet-by-packet basis. In con-
trast, the RESCUE leased circuit would be shared on a transfer-by-transfer basis.
Multiple simultaneous flows can be accommodated if the leased circuit bandwidth
can support multiple Ethernet-rate circuits. For example, with an OC12 leased cir-
cuit, six simultaneous Fast Ethernet RESCUE circuits can be supported. This
arrangement is easy to introduce within today’s Internet; however, with the limited
sharing, utilization may be compromised to achieve low call blocking probability.
2. In addition to the previous arrangement, if the ISP’s MSPP supports signaling proto-
cols, then the interfaces on the ISP’s IP router can be shared. In the previous arrange-
ment, for each enterprise that leases say Ethernet-rate circuits, the ISP’s IP router
needs an equivalent number of interfaces connecting it to the MSPP. If the ISP’s
MSPP supports signaling, there can be aggregation with call-level sharing of a small
set of ISP’s router interfaces among many enterprises. This will increase sharing and
hence lower costs.
3. A more advanced arrangement requires the access network provider to upgrade their
SONET/SDH/WDM switches with signaling capability. The RESCUE access service
becomes truly shared among many enterprises reaching many ISPs via the access net-
work.
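The trade-off between call blocking and utilization mentioned in the first arrangement can be made concrete with a small calculation. The sketch below assumes Poisson call arrivals and uses the classical Erlang-B loss formula; that modeling choice and the function names are assumptions made here for illustration, not something stated in this section.

```python
def erlang_b(offered_load: float, circuits: int) -> float:
    """Blocking probability for `offered_load` Erlangs offered to `circuits` circuits."""
    blocking = 1.0  # with zero circuits every call is blocked
    for n in range(1, circuits + 1):
        # Standard Erlang-B recursion on the number of circuits.
        blocking = offered_load * blocking / (n + offered_load * blocking)
    return blocking


def carried_utilization(offered_load: float, circuits: int) -> float:
    """Fraction of circuit capacity occupied by calls that are not blocked."""
    return offered_load * (1.0 - erlang_b(offered_load, circuits)) / circuits
```

For the six Fast Ethernet circuits of an OC12 leased circuit, an offered load of 2 Erlangs gives a blocking probability of about 1.2% under this model, while only about 33% of the circuit capacity is in use. Pooling more circuits, as in the second and third arrangements, lets the same blocking target be met at higher utilization.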
Figure 10. An Extension of the Dial-Up Access Service (two enterprise subnets, 128.239.5 and
156.78.5, each behind an Ethernet switch attached to the enterprise MSPP; the IP routing
table at the ISP’s IP router maps destinations 128.239.5 and 156.78.5 to interface le0)
Another extension of the Dial-Up access service concept is to not only allow end hosts
to connect their second NICs directly into MSPP ports, but also allow Ethernet switches
serving small subnets to be connected to the enterprise MSPP ports for RESCUE access
service as shown in Figure 10. When the first host on an Ethernet switch requests a RES-
CUE circuit, the MSPP sets up this circuit (assuming resources are available). If a second
host on the same Ethernet switch initiates an application that causes a RESCUE circuit
request, the enterprise MSPP’s signaling software can respond saying the RESCUE circuit
is already established. This requires a small amount of extra book-keeping on the part of
the enterprise MSPP’s signaling software. It needs to know the MAC addresses of all the
hosts hanging off the Ethernet switches connected to each of its RESCUE ports. Another
change is in the last step of the RESCUE circuit setup, which involves writing the routing
table and ARP table at the ISP’s IP router. Here instead of one IP address and one MAC
address, the addresses of all the hosts connected to the Ethernet switch on the RESCUE
circuit should be written into the ISP’s IP router tables. For example, in Figure 10, if the
Ethernet signal from the Ethernet switch with subnet number 128.239.5 is first crosscon-
nected to an equivalent-rate SONET signal, which is then released and the same SONET
signal crossconnected to the Ethernet switch with subnet number 156.78.5, then the IP
routing table entry for the former should be replaced with an entry for the latter.
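The table update in the last step can be sketched as follows. This is a minimal illustration, assuming the routing and ARP tables are modeled as plain dictionaries; the function name `rebind_interface`, the host addresses, and the MAC addresses are hypothetical, and a real Dial-up server would issue configuration commands to the router instead.

```python
def rebind_interface(routing_table, arp_table, interface, hosts):
    """Point `interface` at a new set of hosts.

    routing_table: dict mapping IP address -> router interface name
    arp_table:     dict mapping IP address -> MAC address
    hosts:         dict mapping IP address -> MAC address for the hosts that are
                   now reachable over the newly set up circuit
    """
    # Drop every entry that still points at the interface from the previous circuit.
    stale = [ip for ip, ifc in routing_table.items() if ifc == interface]
    for ip in stale:
        del routing_table[ip]
        arp_table.pop(ip, None)
    # Install entries for all hosts behind the newly connected Ethernet switch.
    for ip, mac in hosts.items():
        routing_table[ip] = interface
        arp_table[ip] = mac


routing, arp = {}, {}
# Hypothetical host on subnet 128.239.5, then the circuit is reused for subnet 156.78.5.
rebind_interface(routing, arp, "le0", {"128.239.5.11": "00:00:5e:00:53:01"})
rebind_interface(routing, arp, "le0", {"156.78.5.21": "00:00:5e:00:53:02"})
```

After the second call only the 156.78.5 host remains mapped to le0, which mirrors the replacement of the routing entry described for Figure 10.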
4.2 Analytical Basis for the Routing Decision: Delay Analysis
In this section, we create an analytical model and obtain numerical values to provide a
quantitative basis for the routing decision. Let $E[T_{dialup}]$ be the mean transfer delay
incurred if a Dial-Up circuit setup is attempted prior to a data transfer within a TCP session:

$$E[T_{dialup}] = (1 - P_b)\left(E[T_{setup}] + E[T_{tcp}^{dialup}]\right) + P_b\left(E[T_{fail}] + E[T_{tcp}^{primary}]\right) \tag{2}$$

where $P_b$ is the call-blocking probability on the circuit-switched network, $E[T_{setup}]$ is
the mean call-setup delay of a successful circuit setup, $E[T_{fail}]$ is the mean delay
incurred in a failed call-setup attempt, $E[T_{tcp}^{dialup}]$ is the mean time to transfer the file
using the Dial-Up circuit for access, and $E[T_{tcp}^{primary}]$ is the mean time to transfer the file
using the primary access link. If the call is not blocked, mean delay experienced is
$E[T_{setup}] + E[T_{tcp}^{dialup}]$, but if it is blocked, then after incurring a cost $E[T_{fail}]$, the end
host has to use the TCP/IP path and hence will incur the $E[T_{tcp}^{primary}]$ delay. If a circuit
setup is not attempted, the mean delay is simply $E[T_{tcp}^{primary}]$.
We compare $E[T_{dialup}]$ from (2) with $E[T_{tcp}^{primary}]$ to determine whether the Dial-Up
end host should directly resort to the primary path or whether it should attempt a Dial-Up
circuit setup. Approximating $E[T_{fail}]$ to be equal to $E[T_{setup}]$, we get:

$$\begin{cases}
\text{if } \dfrac{E[T_{setup}]}{1 - P_b} \ge E[T_{tcp}^{primary}] - E[T_{tcp}^{dialup}], & \text{resort directly to the TCP/IP path} \\[2ex]
\text{if } \dfrac{E[T_{setup}]}{1 - P_b} < E[T_{tcp}^{primary}] - E[T_{tcp}^{dialup}], & \text{attempt circuit setup}
\end{cases} \tag{3}$$

$E[T_{tcp}^{primary}]$ and $E[T_{tcp}^{dialup}]$ are computed using the analytical models of TCP
presented in [39] and [40] (equation (1) in Section 1.3.1) with different packet-loss rates,
$p_{primary}$ and $p_{dialup}$, different bottleneck link rates, $r_{primary}$ and $r_{dialup}$, respectively,
but the same end-to-end propagation delay, $T_{prop}$. The mean call-setup delay $E[T_{setup}]$ is
derived by counting mean signaling message transmission delays, mean call-processing
delays (to process signaling protocol messages), and a round-trip propagation delay
between the Dial-Up end host and the ISP’s IP router, $T_{prop}^{dialup}$:

$$E[T_{setup}] = \frac{m_{sig}}{r_s} \times \left(1 + \frac{\rho_{sig}}{2(1 - \rho_{sig})}\right) \times (k + 1) + T_{sp} \times \left(1 + \frac{\rho_{sp}}{2(1 - \rho_{sp})}\right) \times k + T_{prop}^{dialup} \tag{4}$$

where $m_{sig}$ is the cumulative size of signaling messages used in call setup, $r_s$ is the
signaling link rate, $k$ is the number of switches on the Dial-Up circuit path (between the
Dial-Up end host and the ISP’s IP router), and $T_{sp}$ is the call-processing delay incurred at
each switch. We approximate the queueing delay for the signaling link with an M/D/1
queue at a load $\rho_{sig}$, and the queueing delay for the call processor also with an M/D/1*
queue at a load $\rho_{sp}$.
*M/D/1 queueing models are quite accurate here since inter-arrival times between file
transfers have been shown to be exponentially distributed [59], and signaling message
lengths and call-processing delays are more-or-less constant.
Numerical results:
In addition to the numerical values shown in Table 2, we compute additional values in
Table 3.

Table 3. Mean TCP Transfer Delays for a 100MB File
(Input parameters: p, r, Tprop. Intermediate derived results: queuing delay plus service
time, RTT, Wmax. Final results: E[Ttcp] for a 100MB file.)

p        r (Mbps)  Tprop (ms)  Queuing delay plus  RTT (ms)  Wmax        E[Ttcp] for a
                               service time (ms)             (segments)  100MB file (s)
0.00005  45        0.1         0.436               0.536     2.01        18.269
0.00005  45        5           0.436               5.436     20.385      18.381
0.00005  45        50          0.436               50.436    189.135     26.039
0.00005  100       0.1         0.197               0.297     2.475       8.222
0.00005  100       5           0.197               5.197     43.308      8.412
0.00005  100       50          0.197               50.197    418.308     23.200
0.00005  1000      0.1         0.020               0.120     10          0.824
0.00005  1000      5           0.020               5.020     418.333     2.3201
0.00005  1000      50          0.020               50.020    4168.333    21.320
0.0001   45        0.1         0.460               0.560     2.1         18.274
0.0001   45        5           0.460               5.460     20.475      18.508
0.0001   45        50          0.460               50.46     189.225     37.572
0.0001   100       0.1         0.21                0.31      2.558       8.226
0.0001   100       5           0.21                5.21      43.392      8.632
0.0001   100       50          0.21                50.2      418.392     36.030
0.0001   1000      0.1         0.021               0.121     10.083      0.825
0.0001   1000      5           0.021               5.021     418.417     3.603
0.0001   1000      50          0.021               50.02     4168.417    35.107
0.0005   45        0.1         0.532               0.632     2.37        18.324
0.0005   45        5           0.532               5.532     20.745      19.545
0.0005   45        50          0.532               50.53     189.495     88.701
0.0005   100       0.1         0.239               0.339     2.825       8.253
0.0005   100       5           0.239               5.239     43.658      10.408
0.0005   100       50          0.239               50.24     418.658     88.876
0.0005   1000      0.1         0.024               0.124     10.333      0.840
0.0005   1000      5           0.024               5.024     418.667     8.888
0.0005   1000      50          0.024               50.02     4168.667    88.463
0.005    45        0.1         0.731               0.831     3.116       19.124
0.005    45        5           0.731               5.731     21.491      34.356
0.005    45        50          0.731               50.73     190.241     303.936
0.005    100       0.1         0.329               0.429     3.575       8.670
0.005    100       5           0.329               5.329     44.408      31.930
0.005    100       50          0.329               50.33     419.408     301.528
0.005    1000      0.1         0.033               0.133     11.083      1.025
0.005    1000      5           0.033               5.033     419.417     30.153
0.005    1000      50          0.033               50.03     4169.417    299.754
0.01     45        0.1         0.833               0.933     3.499       20.210
0.01     45        5           0.833               5.833     21.874      51.130
0.01     45        50          0.833               50.833    190.624     445.549
0.01     100       0.1         0.375               0.475     3.958       9.229
0.01     100       5           0.375               5.375     44.792      47.112
0.01     100       50          0.375               50.375    419.792     441.535
0.01     1000      0.1         0.038               0.138     11.5        1.243
0.01     1000      5           0.038               5.038     419.833     44.158
0.01     1000      50          0.038               50.038    4169.833    438.581

We plot the two sides of (3) in Figure 11 assuming that both $r_{primary}$ and $r_{dialup}$ are
1Gbps, and the following parameter values for $E[T_{setup}]$: $\rho_{sp} = \rho_{sig} = 0.7$, $k = 4$,
$m_{sig} = 100$B, $r_s = 10$Mbps, $T_{prop}^{dialup} = 0.1$ms, and $T_{sp} = 0.004$ms. The 4 µs
call-processing delay is based on our work on hardware-accelerated signaling protocol
implementations [60]. $T_{prop}$ is set to either 0.1ms or 50ms. Note that $T_{prop}$ is the end-to-end
round-trip propagation delay between the two end hosts participating in the TCP connection,
while $T_{prop}^{dialup}$ is the round-trip propagation delay between the end host invoking a
RESCUE circuit and its ISP’s IP router. The latter is likely to be local and hence we only use
the 0.1ms number.

Figure 11. Plot of Equation (3) with a Link Rate of 1Gbps, $\rho_{sig} = \rho_{sp} = 0.7$, $k = 4$:
(a) Tprop is 0.1ms; (b) Tprop is 50ms

For the three horizontal lines in Figure 11 on which $P_b$ values are listed, the y-axis is the
left-hand side of (3), i.e., $E[T_{setup}] / (1 - P_b)$. For the remaining three lines, which are
marked with $p_{primary}$ values and $p_{dialup}$ values, the y-axis is the right-hand side of (3),
i.e., $E[T_{tcp}^{primary}] - E[T_{tcp}^{dialup}]$, which we refer to as the “difference” curves. Under some
circumstances, there are crossovers between the difference curves and the horizontal lines.
For transfers of sizes below the crossover size, the end-host software should resort directly
to using the primary path, and for file sizes larger than the crossover size, the software
should attempt a Dial-Up circuit setup.
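Equations (2) through (4) are simple enough to evaluate directly, and a short sketch may help when reading Figure 11. The function names below are our own, 100B is read as 100 bytes (an assumption about the unit), and the two TCP transfer delays passed to the decision rule are hypothetical placeholders, since the TCP model of equation (1) is not reproduced in this chapter.

```python
def md1_delay_factor(rho: float) -> float:
    """Mean M/D/1 time in system, in units of the (constant) service time."""
    return 1.0 + rho / (2.0 * (1.0 - rho))


def mean_setup_delay(m_sig_bits, r_s_bps, k, t_sp, rho_sig, rho_sp, t_prop_dialup):
    """Equation (4): mean call-setup delay in seconds."""
    signaling = (m_sig_bits / r_s_bps) * md1_delay_factor(rho_sig) * (k + 1)
    processing = t_sp * md1_delay_factor(rho_sp) * k
    return signaling + processing + t_prop_dialup


def mean_dialup_delay(p_b, t_setup, t_fail, t_tcp_dialup, t_tcp_primary):
    """Equation (2): mean delay when a circuit setup is attempted first."""
    return (1 - p_b) * (t_setup + t_tcp_dialup) + p_b * (t_fail + t_tcp_primary)


def should_attempt_circuit(p_b, t_setup, t_tcp_dialup, t_tcp_primary):
    """Equation (3), with E[T_fail] approximated by E[T_setup]."""
    if p_b >= 1.0:
        return False  # every call is blocked, so an attempt can only add delay
    return t_setup / (1 - p_b) < t_tcp_primary - t_tcp_dialup


# Parameter values quoted in the text for Figure 11; 100B is read as 100 bytes.
t_setup = mean_setup_delay(m_sig_bits=100 * 8, r_s_bps=10e6, k=4, t_sp=4e-6,
                           rho_sig=0.7, rho_sp=0.7, t_prop_dialup=0.1e-3)

# Hypothetical transfer delays (seconds), chosen only to exercise the rule.
attempt = should_attempt_circuit(p_b=0.3, t_setup=t_setup,
                                 t_tcp_dialup=0.50, t_tcp_primary=0.60)
```

With these parameter values the mean setup delay works out to roughly 1 ms, so the left-hand side of (3) stays in the low-millisecond range for the blocking probabilities considered; a circuit attempt pays off whenever the bypass saves more transfer time than that.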
We computed crossover sizes for various combinations of $p_{primary}$, $p_{dialup}$, $r_{primary}$,
and $r_{dialup}$. The crossover file sizes for different conditions are listed in Tables 4 and 5.
The first column in Tables 4 and 5 shows the values of the packet loss rate on the end-to-end
path, $p_{primary}$, if the primary access link is used and the packet loss rate on the end-to-end
path, $p_{dialup}$, if a RESCUE access circuit is set up. For different values of this pair of
parameters, we compute the crossover file size above which a RESCUE circuit setup
should be attempted for a data transfer under different operating conditions (call blocking
probabilities) of the access network.
$T_{prop}$, the round-trip propagation delay for the file transfer (from client to server), is set
to either 0.1ms or 50ms. The round-trip propagation delay between the enterprise building
and the ISP’s router, $T_{prop}^{dialup}$, incurred when setting up a RESCUE circuit, as in (4), is
0.1ms. The bottleneck link rates on the primary path and on the path with the RESCUE
circuit are both set to either 1Gbps (Table 4) or 100Mbps (Table 5).

Table 4. Crossover File Sizes when $r_{primary} = r_{dialup} = 1$Gbps

                      Tprop = 0.1ms                  Tprop = 50ms
pprimary, pdialup     Pb=0.01   Pb=0.1   Pb=0.3      Pb=0.01   Pb=0.1   Pb=0.3
0.0001, 0.00001       40MB      43MB     52MB        349KB     361KB    396KB
0.0001, 0.00005       58MB      63MB     79MB        387KB     403KB    449KB
0.001, 0.00001        3.9MB     4MB      4.8MB       84KB      85KB     90KB
0.001, 0.0005         5.5MB     6MB      7.5MB       78KB      81KB     88KB
0.01, 0.00001         318KB     344KB    424KB       20.7KB    20.9KB   21.1KB
0.01, 0.005           491KB     534KB    671KB       16.2KB    16.5KB   17.1KB

Table 5. Crossover File Sizes when $r_{primary} = r_{dialup} = 100$Mbps

                      Tprop = 0.1ms                  Tprop = 50ms
pprimary, pdialup     Pb=0.01   Pb=0.1   Pb=0.3      Pb=0.01   Pb=0.1   Pb=0.3
0.0001, 0.00001       21MB      23MB     28MB        318KB     331KB    367KB
0.0001, 0.00005       27.5MB    30MB     37MB        372KB     388KB    435KB
0.001, 0.00001        2MB       2.2MB    2.8MB       75KB      78KB     82KB
0.001, 0.0005         3.2MB     3.6MB    4MB         75KB      78KB     85KB
0.01, 0.00001         128KB     141KB    182KB       18.6KB    18.9KB   20.4KB
0.01, 0.005           218KB     236KB    292KB       14.9KB    15.2KB   16.8KB

From these results we see that if the file transfer is across a wide area (high round-trip
propagation delays), a RESCUE circuit should be attempted even for small files (on the
order of KB) unless the end host knows that its access link is not the bottleneck link on the
path. For metro-area transfers ($T_{prop}$ of 0.1ms), a RESCUE circuit should be attempted
for files in the KB range if the end host knows that upon setting up a RESCUE circuit it can
clearly lower the end-to-end loss rate from say 1% to 0.001%, or even to half its value,
0.5%. If the bottleneck link rates are lower at 100Mbps, the crossover file sizes beyond
which a RESCUE circuit should be attempted are even lower than when this link rate is
1Gbps.
Finally, we give crossover file sizes by assuming different bottleneck link rates for the
RESCUE path and the primary path. In general one can expect the primary leased link
bandwidth to be higher than an individual host’s RESCUE circuit. For example, enterprises
may lease OC3 links for primary access, while RESCUE hosts may only have
100Mbps Ethernet NICs as their second (RESCUE) NICs. Even if the latter link rate is
lower, there are crossover file sizes above which attempting a RESCUE circuit is beneficial
if the packet loss rate of the RESCUE path is lower than on the primary path. As noted
in Section 2.1, link rates sometimes have little impact on total delay. Table 6 shows these
crossover file sizes.

Table 6. Crossover File Sizes when $T_{prop} = 50$ms, $r_{primary} = 155$Mbps, and
$r_{dialup} = 100$Mbps

pprimary, pdialup     Pb=0.01   Pb=0.1   Pb=0.3
0.0001, 0.00001       405KB     416KB    447KB
0.0001, 0.00005       487KB     500KB    538KB
0.001, 0.00001        85KB      87KB     91.3KB
0.001, 0.0005         90KB      92KB     99.5KB
0.01, 0.00001         18.9KB    19KB     20.4KB
0.01, 0.005           16.2KB    16.4KB   17.2KB

4.3 Analytical Basis for the Routing Decision: Utilization Analysis
Sharing of the access circuit for Dial-Up service is on a call-by-call basis rather than the
packet-by-packet basis used on current access circuits. This results in better performance for
the call that does successfully obtain a RESCUE circuit, but the price paid is utilization.
Per-circuit utilization $u_c$ if a RESCUE circuit is used is given by:

$$u_c = \frac{f / r_{dialup}}{E[T_{tcp}^{dialup}] + E[T_{setup}]} \tag{5}$$

where $f$ is the size of the file being transferred. We plot the numerical results for per-circuit
utilization $u_c$ in Figure 12 for different values of round-trip propagation delay, $T_{prop}$, and
RESCUE circuit link rate, $r_{dialup}$.
We first note that as file size $f$ is increased, the per-circuit utilization increases. Once a
TCP connection is established, it begins with the initial slow start phase, where it slowly
increases its congestion window. Thus the corresponding utilization begins with a low
value and increases slowly. When the file size is small, the data transfer is completed
before the TCP connection enters the streaming state, which leads to low utilization [58].
If the file size is large enough (e.g. 200KB when $T_{prop}$ is 0.1ms and 50MB when $T_{prop}$ is
50ms), the TCP connection will finally reach the streaming state, where data packets will
effectively be transmitted continuously, and a higher utilization will be seen.
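The per-circuit utilization in (5) is the ratio of the ideal transmission time of the file at the circuit rate to the time the circuit is actually held. A minimal sketch follows; the function name is ours, and the delay values in the example are hypothetical placeholders rather than outputs of the TCP model.

```python
def per_circuit_utilization(file_bits, r_dialup_bps, t_tcp_dialup, t_setup):
    """Equation (5): ideal transmission time divided by circuit holding time."""
    return (file_bits / r_dialup_bps) / (t_tcp_dialup + t_setup)


# Hypothetical example: a 10MB file (taken as 8e7 bits) on a 100Mbps circuit,
# with an assumed TCP transfer delay of 0.999 s and a setup delay of 1 ms.
u_c = per_circuit_utilization(file_bits=8e7, r_dialup_bps=100e6,
                              t_tcp_dialup=0.999, t_setup=0.001)
```

Here the ideal transmission time is 0.8 s and the circuit is held for 1 s, so the sketch yields a utilization of 80%. Anything that stretches the transfer beyond the ideal time, such as slow start on a long round-trip path, lowers the ratio.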
The second observation is that the propagation delay has a significant impact on utilization.
A higher utilization is achieved in low-propagation-delay environments than in
high-propagation-delay environments. For example, with a 100Mbps link rate and $10^{-5}$ packet
loss rate, a 10MB file transfer over a RESCUE circuit results in a 97.2% per-circuit utili-
zation when the propagation delay is 0.1ms, but only 57.1% when the propagation delay is
50ms. From our file transfer delay analysis, we found that RESCUE circuits should be
attempted even for small files across a wide area (high round-trip propagation delays), but
from a utilization perspective we see the need to place a lower bound on file sizes for
wide-area transfers.
Figure 12. Plot of Per-circuit Utilization for Files in the Range of (10KB, 50MB) with
$p_{dialup} = 0.00001$

Finally, in Figure 12 we see that increasing the link rate of RESCUE circuits from
100Mbps to 155Mbps results in a drop in per-circuit utilization when the propagation
delay is 50ms. For example, for a 50MB file transfer, the drop is from 80% to 67%. As
noted in Section 2.1, increasing link rates beyond some level has little positive impact on
file transfer delays especially when the propagation delay is high. Therefore it is not
always beneficial to increase link rates.
4.4 Chapter Summary
Enterprise access links remain bottlenecks even as LAN and WAN link rates increase. In
this chapter, we proposed a Dial-Up Internet access service in which end hosts run device
drivers that request Dial-Up RESCUE circuits from the end host to an ISP IP router,
bypassing congested shared access links. This results in lowered packet-loss rates and thus
translates into lower mean file-transfer delays (or increased TCP throughputs). This is
especially true for wide-area TCP paths when round-trip propagation delays are high. The
circuits leased for Dial-Up service are shared on a call-by-call basis unlike Internet access
lines that are shared on a packet-by-packet basis. This helps improve user performance.
The trade-off is in utilization. To keep utilization high, we propose that a Dial-Up circuit be
held only for the duration of a single data transfer.
Chapter 5 Application II: End-to-end RESCUE Circuits
to Improve File Transfer Delays
There is a growing interest in improving current protocols or developing new ones to
increase the effective throughput of file transfers on the Internet [61][62]. Of particular
interest is the effective throughput of transfers of large files, for which current TCP has
been shown to be inadequate [21]. Contrary to the conventional thinking of video-stream-
ing transfers being the prime contributor to high-bandwidth applications, we note that file
transfers can enjoy any amount of bandwidth. The higher the rate, the lower the file-trans-
fer delay. This is unlike video-streaming applications, which with compression technolo-
gies often require only a few Mbps but long durations. Increasingly the Grid community is
recognizing the value of optical circuit-switched networks to carry out transfers of very
large files created by scientists [62]. Thus, we propose to use high-speed RESCUE circuits
for the end-to-end file transfer application.
Section 5.1 describes the operational steps in using end-to-end RESCUE circuits for file
transfers. Sections 5.2 and 5.3 describe our routing decision algorithm based on delay and
utilization analysis.
5.1 Description
Figure 13 shows the architecture for the proposed end-to-end file transfer applications.
In the end-to-end file transfer applications, high-speed, end-to-end RESCUE circuits are
requested automatically by end-host software when file-transfer applications on the end
host require high-throughput end-to-end communication. The circuit consists of a concat-
enation of Ethernet signals from end hosts to MSPPs within enterprises and Ethernet-over-
SONET signals between enterprise MSPPs across wide-area optical circuit switched net-
works.
File-transfer applications based on Hyper-Text Transfer Protocol (HTTP) and File
Transfer Protocol (FTP) typically involve the exchange of small messages prior to the
actual data transfer. Exploiting the presence of the dual paths (another one of our reasons
for RESCUE being an “add-on” service), we propose using the primary TCP/IP path for
these small messages. RESCUE circuits are used only for the actual data transfer. To
achieve high circuit utilization, we propose that the circuit be held only for the duration of
the actual data transfer and released immediately upon completion. Furthermore, we rec-
ommend that the EoS circuit be unidirectional from the sender to the receiver.
Figure 13. Use of RESCUE Circuits for End-to-end File Transfers

Optical Connectivity Service (OCS): When a sending host is ready to transfer a file, it
has to determine whether the correspondent end host can be reached by a direct Ethernet/
SONET circuit. We propose a service called Optical Connectivity Service (OCS), similar
to the Domain Name System (DNS), to maintain connectivity information. As with DNS,
information can be cached to reduce delay overhead incurred in determining whether the
correspondent end host is reachable with RESCUE. Alternatively, OCS service can be
implemented in a centralized fashion as a part of a carrier network management system.
For example, it can be combined with an Authentication, Authorization and Accounting
(AAA) service [63]. OCS is important to enable a gradual growth of RESCUE users. If an
end host with RESCUE capability wants to communicate with an end host without such
capability, it will simply use the Internet. If, through OCS, it determines that the corre-
spondent host also has RESCUE capability, and furthermore it is connected via the same
optical circuit-switched network, it can use a RESCUE circuit.
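The caching behavior borrowed from DNS can be sketched as below. This is only an illustration: the class name `OcsCache`, the `query_ocs` callable standing in for a request to an OCS server, and the cache lifetime are all hypothetical, since the text does not define an OCS interface.

```python
import time
from typing import Callable, Dict, Tuple


class OcsCache:
    """Cache of OCS answers, in the spirit of a DNS resolver cache (illustrative)."""

    def __init__(self, query_ocs: Callable[[str, str], bool],
                 ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._query_ocs = query_ocs      # hypothetical call to the OCS server
        self._ttl = ttl_seconds          # assumed cache lifetime
        self._clock = clock
        self._entries: Dict[Tuple[str, str], Tuple[bool, float]] = {}

    def reachable_by_circuit(self, local_host: str, remote_host: str) -> bool:
        """Return True if the two hosts can be joined by a RESCUE circuit."""
        key = (local_host, remote_host)
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[1] < self._ttl:
            return cached[0]             # fresh cached answer, no server query
        answer = self._query_ocs(local_host, remote_host)
        self._entries[key] = (answer, now)
        return answer
```

A sending host would consult such a cache before each transfer and use the Internet path whenever the answer is negative, which keeps the lookup overhead off the critical path for repeated transfers to the same correspondent.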
Transport protocol on end-to-end RESCUE circuits: For the actual data transfer on
RESCUE circuits, we recommend using a combination of a rate-based transport protocol
on the unidirectional end-to-end Ethernet/SONET circuit from the server to the client and
a TCP connection for the reverse direction through the IP network. Standard TCP is not
well-suited for end-to-end circuits, i.e., paths on which there are no packet switches,
because of the congestion-control mechanisms built into Standard TCP. This functionality
is not only unnecessary if the end-to-end path is a circuit, it is also detrimental because bit
errors will be interpreted as congestion losses causing variations in the sending rate. For
full utilization of the circuit what we need is a transport protocol that uses rate-based flow
control and constantly sends data. As for error control, we do expect losses both as a result
of bit errors on links (which are likely to be rare because of the high quality of optical fiber
transmission), and receive-buffer overflows resulting from mismatches in the instanta-
neous rate at which the sending NIC emits data on to the circuit and the rate at which the
receiving end host software moves data to disk. A negative acknowledgement (NAK)
based error-control scheme is well suited for end-to-end circuits where data blocks are
received in sequence. Retransmissions can be sent on the Ethernet/SONET circuit if they
occur in the middle of the transfer and on the TCP connection if they occur at the end. The
reason for the latter is to avoid having to hold the circuit open after completion of the ini-
tial file transfer while waiting for the final acknowledgment confirming completion.
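Because blocks arrive in sequence on a circuit, a receiver can detect a loss simply by noticing a gap in sequence numbers. The sketch below illustrates that idea under our own assumptions: blocks carry consecutive integer sequence numbers starting at 0, and the class and method names are hypothetical rather than taken from a specific protocol.

```python
class NakReceiver:
    """Tracks in-sequence data blocks and reports gaps as NAKs (illustrative)."""

    def __init__(self) -> None:
        self.next_expected = 0      # lowest sequence number not yet reached in order
        self.missing = set()        # blocks already NAKed and still outstanding

    def on_block(self, seq: int) -> list:
        """Process one received block; return the sequence numbers to NAK."""
        if seq >= self.next_expected:
            gap = list(range(self.next_expected, seq))  # blocks skipped over
            self.missing.update(gap)
            self.next_expected = seq + 1
            return gap
        # An earlier block: treat it as a retransmission filling a gap.
        self.missing.discard(seq)
        return []

    def complete(self, total_blocks: int) -> bool:
        """True once every block 0..total_blocks-1 has been received."""
        return self.next_expected >= total_blocks and not self.missing
```

A gap-based scheme cannot detect the loss of the final blocks by itself, which is one reason an end-of-transfer exchange over the TCP connection, as described above, is still needed.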
5.2 Analytical Basis for the Routing Decision: Delay Analysis
In this section we use the analytical model described in Section 4.2 to provide a
quantitative basis for the routing decision in the file-transfer application. Using similar
reasoning to that presented in (2) and (3), we can base the routing decision on whether to
attempt a circuit setup for an end-to-end file transfer as follows:

$$\begin{cases}
\text{if } \dfrac{E[T_{setup}]}{1 - P_b} \ge E[T_{tcp}] - T_{transfer}, & \text{resort directly to the TCP/IP path} \\[2ex]
\text{if } \dfrac{E[T_{setup}]}{1 - P_b} < E[T_{tcp}] - T_{transfer}, & \text{attempt circuit setup}
\end{cases} \tag{6}$$

$T_{transfer}$ is the actual file-transfer delay on the end-to-end Ethernet/SONET circuit:

$$T_{transfer} = \frac{f}{r_c} + \frac{T_{prop}}{2} \tag{7}$$

where $f$ is the size of the file being transferred and $r_c$ is the data rate of the circuit.
$E[T_{tcp}]$ is the mean time to transfer the file using the primary TCP/IP path and is computed
using the analytical models of TCP presented in [39] and [40] (equation (1) in Section
1.3.1). We have not included retransmission delays here because on Ethernet/SONET
circuits, retransmissions required due to random bit errors and/or receive-buffer overflows
are needed in both the TCP path and the Ethernet/SONET end-to-end circuit. Since the
routing decision is based on comparing delays on the two paths before deciding whether
or not to attempt a circuit setup, we have omitted retransmission delays on both paths.
Including this delay would in fact favor using the Ethernet/SONET circuit. This is because
bit errors on the TCP/IP path would be misinterpreted as packet losses caused by conges-
tion leading to a reduction in the sending rate.
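The end-to-end decision in (6) and (7) can be evaluated in the same way as the Dial-Up rule. In the sketch below the function names are ours, the setup delay is an assumed placeholder, and the TCP delay is the Table 3 entry for a 100MB file with r = 1000Mbps, Tprop = 50ms, and p = 0.0001; the file size of 100MB is taken as 8e8 bits.

```python
def circuit_transfer_time(file_bits, r_c_bps, t_prop):
    """Equation (7): transmission time plus a one-way propagation delay."""
    return file_bits / r_c_bps + t_prop / 2


def should_attempt_e2e_circuit(p_b, t_setup, t_tcp, t_transfer):
    """Equation (6): attempt a circuit only if the expected saving exceeds the setup cost."""
    if p_b >= 1.0:
        return False  # every call is blocked
    return t_setup / (1 - p_b) < t_tcp - t_transfer


# 100MB file on a 1Gbps circuit across a wide-area path (Tprop = 50 ms).
t_transfer = circuit_transfer_time(file_bits=8e8, r_c_bps=1e9, t_prop=0.05)

# t_setup = 0.06 s is an assumed value; t_tcp = 35.107 s is read from Table 3.
attempt = should_attempt_e2e_circuit(p_b=0.3, t_setup=0.06,
                                     t_tcp=35.107, t_transfer=t_transfer)
```

The circuit moves the file in about 0.825 s against roughly 35 s on the TCP/IP path, so under these assumptions the rule favors a circuit attempt even at a 30% blocking probability.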
The key difference between (6) and (3) is that we have $T_{transfer}$ in (6) instead of the
$E[T_{tcp}^{dialup}]$ term in (3). This is because in the Dial-Up application, since only the access
link is being bypassed, TCP is still required end-to-end, while on the end-to-end circuit,
the transfer time after the circuit is set up is given by (7). A second difference is in the
term $E[T_{setup}]$. In (4), we used the term $T_{prop}^{dialup}$ to denote the round-trip propagation
delay between the RESCUE end host and its ISP’s IP router. To obtain the numerical
results, we assumed this delay to be typically small, and hence set it to 0.1ms. In the
file-transfer application, $E[T_{setup}]$ will include an end-to-end round-trip propagation delay
between the two hosts, which could have a small or large value depending on the distance
between the two hosts. For example, we use 0.1ms for a metro-area path and 50ms for a
wide-area path.
Numerical results:
We provide two sets of numerical results. In sub-section A, we consider the case when
the circuit rate is the same as the bottleneck link rate on the primary TCP/IP path. In
sub-section B, we consider the case when the circuit rate is only 100Mbps while the bottleneck
rate on the TCP/IP path is 1Gbps.
A. Discussion of the routing decision (6) if $r_c = r$
We plot the two sides of (6) in Figure 14 and Figure 15 assuming that $r_c$ and $r$ are both
100Mbps (Figure 14) and both 1Gbps (Figure 15). The parameter values used to compute
$E[T_{setup}]$ are the same as those used in Section 4.2 for the Dial-Up application except for
$k$, the number of switches, and $T_{prop}$. We increase $k$ from 4 to 20 since the end-to-end
circuit between hosts could consist of more circuit switches than the Dial-Up path from a
host to its ISP’s IP router. $T_{prop}$ is set to either 0.1ms or 50ms as previously mentioned.

Figure 14. Plot of Equation (6) with $r_c = r = 100$Mbps, $\rho_{sig} = \rho_{sp} = 0.7$, $k = 20$:
(a) Tprop is 0.1ms; (b) Tprop is 50ms
Figure 15. Plot of Equation (6) with $r_c = r = 1$Gbps, $\rho_{sig} = \rho_{sp} = 0.7$, $k = 20$:
(a) Tprop is 0.1ms; (b) Tprop is 50ms

For wide-area paths, when $T_{prop}$ is 50ms, for the entire file range (100KB, 1GB), a
RESCUE circuit setup should be attempted for the $P_b$ and $P_{loss}$ values shown. This is
because $E[T_{setup}] / (1 - P_b)$ is always less than the difference $E[T_{tcp}] - T_{transfer}$ (see
(6)). However, on low-propagation delay paths (Figure 14(a) and Figure 15(a) in which
$T_{prop} = 0.1$ms), we see that there are crossover file sizes below which an end host
should resort directly to the TCP/IP path and above which it should attempt an Ethernet/
SONET circuit setup. These crossover file sizes are listed in Table 7 and Table 8. As an
example, if $r_c = r = 100$Mbps, the call-blocking probability $P_b$ on the optical
circuit-switched network is 30% and the packet-loss rate $P_{loss}$ on the TCP/IP path is 1%, then
for calls in which the RESCUE circuit traverses 20 switches, 650KB is a crossover file size in
low-propagation delay environments, i.e., when the end-to-end path is within a single
metro area. For files below this size, the end host application software should directly
resort to the TCP/IP path.

Table 7. Crossover File Sizes when $r_c = r = 100$Mbps and $T_{prop} = 0.1$ms
(Columns: number of switches on the circuit, k, and the measure of loading on the
circuit-switched network, Pb. Rows: the measure of loading on the TCP/IP path, Ploss.)

                  k = 4                          k = 20
Ploss             Pb=0.01   Pb=0.1   Pb=0.3      Pb=0.01   Pb=0.1   Pb=0.3
0.0001            610KB     640KB    840KB       2.4MB     2.65MB   3.4MB
0.001             490KB     550KB    730KB       2MB       2.2MB    2.8MB
0.01              120KB     140KB    180KB       500KB     550KB    650KB

Table 8. Crossover File Sizes when $r_c = r = 1$Gbps and $T_{prop} = 0.1$ms
(Columns: number of switches on the circuit, k, and the measure of loading on the
circuit-switched network, Pb. Rows: the measure of loading on the TCP/IP path, Ploss.)

                  k = 4                          k = 20
Ploss             Pb=0.01   Pb=0.1   Pb=0.3      Pb=0.01   Pb=0.1   Pb=0.3
0.0001            4.8MB     5.4MB    7.2MB       22MB      24MB     30MB
0.001             3.2MB     2.4MB    2.2MB       9MB       10MB     12MB

B. Plots of the routing decision (6) if and
Our reason for considering this option is as follows. Consider the case when the leased
circuit for the primary Internet access of an enterprise is increased to 1Gbps. Say, the line
leased for RESCUE service is also 1Gbps. In this case, if a RESCUE circuit for a single-
file transfer is allocated the full 1Gbps rate of the RESCUE leased circuit, all other calls
will be blocked. For increased sharing of the RESCUE leased circuit, each RESCUE cir-
cuit may only be allocated 100Mbps. This is the classical difference between bandwidth
sharing modes of circuit-switched and packet-switched networks. In this section, we con-
sider the question of whether there is any value for the circuit-switched path even with this
ten-fold handicap in the data rate.
We plot the two sides of (6) in Figure 16 assuming that and
. When , for the entire file range (100KB, 1GB), a RESCUE
circuit should be attempted first as shown in Figure 16(b). However, when
, for the entire file range (100KB, 1GB), the TCP/IP path should be used
directly if there is such a rate mismatch. For the case, the difference
is not only smaller than , but also negative.
This implies that the mean transfer delay on the higher-rate TCP path is smaller than the
time to transfer on the lower-rate circuit.
In summary, from this file-transfer delay analysis, we conclude that a circuit setup
300KB 360KB 500KB 1.2MB 1.4MB 1.8MB
Table 8. Crossover File Sizes when and
Number of switches on the circuit Number of switches on the circuit
rc r 1Gbps= = Tprop 0.1ms=
Measure of loading on
Ckt. sw.network
TCP/IP path
k 4= k 20=
Pb 0.01= Pb 0.1= Pb 0.3= Pb 0.01= Pb 0.1= Pb 0.3=
Ploss 0.01=
rc 100Mbps= r 1Gbps=
rc 100Mbps=
r 1Gbps= Tprop 50ms=
Tprop 0.1ms=
Tprop 0.1ms=
E Ttcp[ ] Ttransfer– E Tsetup[ ] 1 Pb–( )⁄
51
should be attempted if is 50ms for files 100KB or larger even with gross rate mis-
matches, while in low propagation-delay environments, the decision depends upon the file
size, the rates on the two paths, and the loading conditions on the two paths.
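As an illustration, the comparison of the two sides of (6) can be written as a small helper.
This is a minimal sketch, assuming that (6) has the form used in the discussion above, i.e.,
a circuit setup is attempted when $E[T_{setup}]/(1 - P_b) < E[T_{tcp}] - T_{transfer}$. The
function name, its arguments, and the sample delay values are hypothetical and are not taken
from the figures or tables.

```python
def should_attempt_circuit(e_setup, p_b, e_tcp, file_bytes, rc_bps):
    """Return True if a circuit setup should be attempted under rule (6).

    e_setup    -- mean call-setup delay E[Tsetup], in seconds
    p_b        -- call-blocking probability on the circuit-switched network
    e_tcp      -- mean transfer delay E[Ttcp] on the TCP/IP path, in seconds
    file_bytes -- file size in bytes
    rc_bps     -- circuit rate rc, in bits per second
    """
    if not 0.0 <= p_b < 1.0:
        raise ValueError("p_b must be in [0, 1)")
    t_transfer = file_bytes * 8 / rc_bps      # transfer time on the circuit
    return e_setup / (1.0 - p_b) < e_tcp - t_transfer


# Hypothetical inputs: 1MB file, 100Mbps circuit, Pb = 0.3, E[Ttcp] = 0.5s.
print(should_attempt_circuit(0.05, 0.3, 0.5, 1_000_000, 100e6))
# If E[Ttcp] - Ttransfer is negative, the circuit is never attempted.
print(should_attempt_circuit(0.001, 0.01, 0.01, 1_000_000, 100e6))
```

In practice $E[T_{tcp}]$ would come from a TCP delay model or from measurements; here it is
simply passed in as a number.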
Figure 16. Plot of Equation (6) with $r_c$ = 100Mbps, $r$ = 1Gbps, $\rho_{sig} = \rho_{sp}$ = 0.7, $k$ = 20
(a) $T_{prop}$ is 0.1ms (b) $T_{prop}$ is 50ms
5.3 Analytical Basis for the Routing Decision: Utilization Analysis
While file-transfer delay is an important user measure for making the routing decision
of whether or not to attempt a circuit setup, service provider measures such as utilization
should also be considered since utilization ultimately does impact users through prices
charged. Total network utilization $u$ has two components: per-circuit utilization, $u_c$,
and aggregate circuit utilization, $u_a$.
Per-circuit utilization $u_c$ is given by:
$$u_c = \frac{E[T_{transfer}]}{E[T_{setup}] + E[T_{transfer}]}, \quad \text{where} \quad E[T_{transfer}] = \frac{E[X]}{r_c} \qquad (8)$$
and $E[X]$ is the average file size. Even though we hold circuits only for the duration of
the transfer, and only set up unidirectional circuits, given that call holding times are
short, call-setup delays lower utilization. This is because switches hold resources for a
call as its setup procedure proceeds end-to-end. When $T_{prop}$ is large (e.g., 50ms), there
should be a minimum file size below which circuit setup should not be attempted. Without
such a minimum, per-circuit utilization can be poor. For example, consider a file size of
100KB. For the 50ms propagation delay environments, we concluded from our file-transfer
delay analysis in Section 5.2 that circuits should be attempted for all file sizes in the
range (100KB, 1GB). However, for a 100KB file transfer on a 100Mbps circuit with 4 switches
on the end-to-end path, we need 50.158ms setup time and 8ms total transfer time. As a
result, the utilization per circuit is only 13.7%. From our file-transfer delay analysis, we
found crossover file sizes for low-propagation delay environments, but from a utilization
perspective we see that such crossover file sizes are necessary for large-propagation delay
environments.
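The 100KB example above can be checked directly from (8). The following is a minimal sketch;
the function name is hypothetical, and the inputs are the values quoted in the text (100KB
taken as 100,000 bytes, which matches the stated 8ms transfer time).

```python
def per_circuit_utilization(file_bytes, rc_bps, e_setup):
    """Equation (8): uc = E[Ttransfer] / (E[Tsetup] + E[Ttransfer])."""
    t_transfer = file_bytes * 8 / rc_bps      # seconds on the circuit
    return t_transfer / (e_setup + t_transfer)


# 100KB file, 100Mbps circuit, 50.158ms setup time.
u_c = per_circuit_utilization(100_000, 100e6, 0.050158)
print(f"per-circuit utilization: {u_c:.4f}")
```

The result is about 0.1376, in line with the 13.7% quoted above.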
To obtain aggregate circuit utilization $u_a$, we model the three-link scenario shown in
Figure 17. In general, depending upon the number of switches equipped with signaling
engines, an Ethernet/SONET circuit could traverse many links. In the scenario shown in
Figure 17, each switch connects $N$ enterprises. Assume that each access link has
$m_{access}$ circuits and the core link has $m_{core}$ circuits. We further assume that each
enterprise generates the same offered call load to the network of which a fraction $f$
represents local traffic (i.e., calls between two enterprises within a same metro network)
and the remaining fraction $1 - f$ representing long distance traffic (to enterprises
located in the other metro networks). Both local and long distance call arrival processes
are assumed to be Poisson.
Figure 17. A Three-link Network Model of RESCUE Service
We use the reduced load approximation [64], also well known as Erlang's fixed point
method, for call-blocking probabilities, which is known to be quite accurate except under
very high offered call loads. Nevertheless we validated our analytical results with simula-
tions and found a close match.
Call-blocking probabilities for the access links and core links are given by:
$$P_b^{core} = \mathrm{Erl}(\nu_{core}, m_{core}) \quad \text{and} \quad P_b^{access} = \mathrm{Erl}(\nu_{access}, m_{access}) \qquad (9)$$
$$\nu_{core} = N \nu (1 - f)(1 - P_b^{access})^2 \qquad (10)$$
$$\nu_{access} = \nu f (1 - P_b^{access}) + \nu (1 - f)(1 - P_b^{core})(1 - P_b^{access}) \qquad (11)$$
where $\mathrm{Erl}(\rho, m) = \left(\rho^m / m!\right) / \left(\sum_{k=0}^{m} \rho^k / k!\right)$
is the Erlang-B formula and (10) and (11) characterize the thinning effect of the load due
to blocking on other links. $\nu$ is the fractional offered call load of each enterprise
created by only allowing for transfers with files larger than some crossover file size,
$\chi$, requesting a RESCUE circuit. Using the Pareto distribution for file sizes [65], we
compute the fractional offered load $\nu$ as
$$\nu = \frac{\rho}{E(X)} P(X \ge \chi) E[X \mid X \ge \chi] = \rho \frac{\alpha - 1}{\alpha k} \left(\frac{k}{\chi}\right)^{\alpha} \frac{\alpha \chi}{\alpha - 1} = \rho \left(\frac{k}{\chi}\right)^{\alpha - 1} \qquad (12)$$
where $\alpha$, the shape parameter, is 1.06 and $k$, the scale parameter, is 1000 bytes as
computed in [65], and $\rho$ is the total offered load.
The fixed-point reduced load approximation requires an iteration of (9)-(11) until there
is convergence to a single point. We start the iteration assuming $P_b^{access} \leftarrow 0$
and $P_b^{core} \leftarrow 0$. When the iterations converge, the end-to-end call-blocking
probabilities become:
$$P_b^{local} = 1 - (1 - P_b^{access})^2 \quad \text{and} \quad P_b^{long-dist} = 1 - (1 - P_b^{access})(1 - P_b^{core}) \qquad (13)$$
$$P_b = P_b^{local} f + P_b^{long-dist} (1 - f) \qquad (14)$$
and corresponding link utilizations are:
$$u_a^{access} = \frac{(1 - P_b^{local}) \nu f + (1 - P_b^{long-dist}) \nu (1 - f)}{m_{access}} \qquad (15)$$
$$u_a^{core} = \frac{(1 - P_b^{long-dist}) N \nu (1 - f)}{m_{core}} \qquad (16)$$
Combining the two components of utilization ($u_c$ in (8) and $u_a$ in (15)~(16)), we obtain
total utilization $u$ for the two types of links:
$$u_{access} = u_a^{access} \times \frac{E[X \mid X \ge \chi] / r_c}{E[T_{setup}] + E[X \mid X \ge \chi] / r_c} \qquad (17)$$
$$u_{core} = u_a^{core} \times \frac{E[X \mid X \ge \chi] / r_c}{E[T_{setup}] + E[X \mid X \ge \chi] / r_c} \qquad (18)$$
Numerical results:
To obtain numerical results, we assume the following input parameter values:
$N = 100$, $f = 0.8$, and $m_{core} = 10 m_{access}$. While the two switches shown in Figure
17 could both belong to the same metro network, for purposes of understanding the impact
of propagation delay, we assume that these two switches are located in distant metro
networks with the round-trip propagation delay between the two switches being 50ms. We
assume intra-area round-trip propagation delay to be 0.1ms. Furthermore we assume local
calls pass through 3 switches (MSPPs at each enterprise and one of the two switches
shown in Figure 17) and long-distance calls pass through 4 switches (2 enterprise MSPPs
and the two switches shown in Figure 17).
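The fixed-point iteration of (9)-(11) and the quantities in (12)-(16) can be sketched as
follows. This is an illustrative sketch rather than the code used to produce the numerical
results: the function names, the convergence tolerance, the iteration cap, and the sample
values of $\rho$, $\chi$, and $m_{access}$ are assumptions, and the long-distance blocking
expression follows (13) as printed. Plain successive substitution is used, with an
iteration cap in case it does not settle.

```python
def erlang_b(load, m):
    """Erlang-B blocking probability Erl(load, m), via the usual recursion."""
    b = 1.0
    for j in range(1, m + 1):
        b = load * b / (j + load * b)
    return b


def fractional_load(rho, chi, alpha=1.06, k=1000.0):
    """Equation (12): nu = rho * (k / chi) ** (alpha - 1), for chi >= k."""
    return rho * (k / chi) ** (alpha - 1.0)


def reduced_load(nu, f, n, m_access, m_core, tol=1e-12, max_iter=10_000):
    """Iterate (9)-(11), starting from Pb_access = Pb_core = 0."""
    pb_access = pb_core = 0.0
    for _ in range(max_iter):
        nu_core = n * nu * (1 - f) * (1 - pb_access) ** 2               # (10)
        nu_access = (nu * f * (1 - pb_access)
                     + nu * (1 - f) * (1 - pb_core) * (1 - pb_access))  # (11)
        new_core = erlang_b(nu_core, m_core)                            # (9)
        new_access = erlang_b(nu_access, m_access)                      # (9)
        done = (abs(new_core - pb_core) < tol
                and abs(new_access - pb_access) < tol)
        pb_core, pb_access = new_core, new_access
        if done:
            break
    return pb_access, pb_core


def blocking_and_utilization(nu, f, n, m_access, m_core):
    """Return (Pb, ua_access, ua_core) from equations (13)-(16)."""
    pb_access, pb_core = reduced_load(nu, f, n, m_access, m_core)
    pb_local = 1 - (1 - pb_access) ** 2                                 # (13)
    pb_long = 1 - (1 - pb_access) * (1 - pb_core)                       # (13)
    pb = pb_local * f + pb_long * (1 - f)                               # (14)
    ua_access = ((1 - pb_local) * nu * f
                 + (1 - pb_long) * nu * (1 - f)) / m_access             # (15)
    ua_core = (1 - pb_long) * n * nu * (1 - f) / m_core                 # (16)
    return pb, ua_access, ua_core


# Hypothetical inputs: rho = 5, crossover size 500KB, m_access = 10.
nu = fractional_load(rho=5.0, chi=500_000.0)
print(blocking_and_utilization(nu, f=0.8, n=100, m_access=10, m_core=100))
```

Multiplying the two aggregate utilizations by the per-circuit factor in (17) and (18) then
gives the total utilization on each type of link.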
We plot the numerical results of $u_{access}$ and $u_{core}$ in Figure 18(a) and Figure 18(b)
respectively, for different call-blocking probabilities $P_b$ and different values of
$\rho$. As crossover file size $\chi$ is increased, $P_b$ is kept constant by increasing
$m_{core}$ and $m_{access}$ correspondingly. The “zigzag” pattern of the plots occurs because
$m_{core}$ and $m_{access}$ have to be integers.
As the crossover file size $\chi$ is increased, the access link aggregate utilization,
$u_a^{access}$, decreases because offered load decreases. Since the $E[T_{setup}]$ for local
calls ($T_{prop}$ = 0.1ms and $k$ = 3) is small (0.8ms), and 80% of the calls are assumed to
be local calls, per-circuit utilization is high even for the small files, e.g., 91% for a
100KB file. Increasing the crossover file size will not improve the per-circuit utilization
significantly. As a result, the total access link utilization $u_{access}$ in Figure 18(a)
decreases slightly as crossover file size $\chi$ is increased.
Figure 18. Plot of Total Utilization on Each Access Link and the Core Link
(a) Access link utilization $u_{access}$ (b) Core link utilization $u_{core}$
In Figure 18(b), we plot $u_{core}$, the utilization of the core link. All calls passing
through this link are long-distance calls ($T_{prop}$ = 50ms and $k$ = 4). As the crossover
file size $\chi$ is increased, the plots show total utilization increasing because
per-circuit utilization increases. However, beyond a critical crossover file size, the drop
in the offered call load and the corresponding drop in the aggregate utilization
$u_a^{core}$ slows the increase of the total utilization, making it stable at some value
below 1 or even dropping it slightly. For example, the optimal crossover file size is 2.7MB
when $\rho$ is 5 and $P_b$ is 30%.
Another observation is that high utilization is possible by operating the network at a
high call-blocking probability (30%). For example, with $\rho$ = 10 and a blocking
probability of 30%, we can achieve a 93% total utilization on the core link using a
crossover file size of 500KB, while at a blocking probability of 1%, we can only achieve 84%
utilization.
5.4 Chapter Summary
In this chapter, we proposed to improve delay performance of file transfers by using
end-to-end RESCUE circuits where possible. To achieve high circuit utilization, we proposed
a unidirectional end-to-end RESCUE circuit from the server to the client and a rate-based
transport protocol on the RESCUE circuit. If the circuit setup is successful, there is
a huge advantage in total delay especially in wide-area environments. For example, a 1TB
file requires only 2.2 hours on a 1Gbps end-to-end circuit but could take more than 4 days
on a TCP/IP path in a WAN environment. We analyzed the conditions under which a circuit
setup should be attempted. For WAN environments, it is clear that a circuit setup
should be attempted for large and medium-sized file transfers. For small-size file transfers
(on the order of KB), we see the need to place a lower bound for file sizes. Without such a
lower bound, network utilization can be poor. In lower propagation-delay environments,
one should consider the loading conditions on the two paths, probability of packet loss on
the TCP/IP path and call blocking probability through the circuit-switched network,
before deciding whether or not to attempt the circuit setup.
Chapter 6 Implementation of Application II
To take advantage of the benefits brought by the RESCUE service, software enhance-
ments are needed at end hosts. We identify three basic modules for the end-host RESCUE
software: a routing decision module, a signaling module, and a high-speed transport proto-
col module for end-to-end file-transfer applications. An overview of the end-host RES-
CUE software architecture is shown in Figure 19. The user application shown in Figure 19
interacts with the RESCUE routing decision module to decide whether or not to attempt a
circuit setup. If it decides to attempt a circuit setup, the RESCUE signaling module ini-
tiates a call-setup request to the signaling-enabled network switches. If the circuit setup is
successful, RESCUE software will direct the user application to initiate data transfers on
the Ethernet/SONET circuit through the end host’s second NIC. Depending upon the
application, TCP or some other transport protocols could be used on the circuit. If, on the
contrary, the routing decision module determines the primary TCP/IP path is preferred, or
if the circuit setup fails, the user application will be directed to the primary TCP/IP path
through the end host’s primary NIC.
Figure 19. An End Host Configured for RESCUE Service
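The interaction between the routing decision module and the signaling module described
above can be sketched as follows. This is only an illustration of the control flow; the
names `Path`, `select_path`, `prefer_circuit`, and `setup_circuit` are hypothetical and do
not correspond to an actual RESCUE software interface.

```python
from enum import Enum


class Path(Enum):
    CIRCUIT = "Ethernet/SONET circuit via the second NIC"
    TCP_IP = "primary TCP/IP path via the primary NIC"


def select_path(prefer_circuit, setup_circuit):
    """Choose the path for one file transfer.

    prefer_circuit -- outcome of the routing decision module (bool)
    setup_circuit  -- callable standing in for the signaling module;
                      returns True if the call setup succeeded
    """
    if prefer_circuit and setup_circuit():
        return Path.CIRCUIT
    # Either the TCP/IP path was preferred or the call setup failed.
    return Path.TCP_IP


print(select_path(True, lambda: True).name)    # circuit set up successfully
print(select_path(True, lambda: False).name)   # call blocked, fall back
print(select_path(False, lambda: True).name)   # no setup attempted
```

Signaling is invoked only when the routing decision favors the circuit, so a transfer that
is routed to the TCP/IP path incurs no call-setup delay.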
Details of the high-speed transport protocol module are presented in Section 6.1 along
with experimental results. Sections 6.2 and 6.3 describe the basic functionality and features
for the routing decision module and the signaling module.
6.1 Design and Implementation of a High-speed Transport Protocol
In Section 1.3.2, we show that TCP is a poor choice for dedicated high-speed end-to-end
circuits because of its slow start and congestion avoidance algorithms. Also, TCP’s win-
dow-based flow control and positive-ACK based error control scheme are not well suited
for dedicated end-to-end circuits. In this section, we consider the question of what trans-
port protocol to use on the end-to-end high-speed RESCUE circuits for the file-transfer
application.
6.1.1 Design Rationale
To design a transport protocol for the end-to-end RESCUE file transfers, we start by
considering the purpose and role of transport protocols. Transport protocols perform the
functions necessary to achieve reliable transfer of data on an end-to-end basis. The end-to-
end paths in the file-transfer application will consist of Ethernet, GbE, or 10GbE segments
at the ends connected via wide-area SONET/SDH circuits and/or WDM all-optical light-
paths. Since resources are reserved in a dedicated manner for these circuits, congestion is
handled during circuit setup. Once the circuit is successfully provisioned, congestion con-
trol functionality is not required during the data transfer. There is no contention for
resources during the actual data transfers, and hence, no possibility of data loss at the cir-
cuit-based network switches unlike in packet-based network switches. Nevertheless losses
can occur even on these RESCUE circuits due to (i) link errors and (ii) receive-buffer
overflows. Link errors arise from bit and burst errors on the physical media. Even though
optical fiber, the physical medium of circuit-switched networks, is fairly reliable, link
errors are unavoidable. We will describe error-control solutions for recovery from link
errors and receive-buffer overflows.
A. Flow Control
There are three well-known, flow-control methods: ON/OFF, window-based, and rate-
based. In ON/OFF and window-based flow control schemes, the receiver sends messages
to control the behavior of the sender dynamically. These receiver-based flow control
schemes are suitable for networks, such as the Internet, where the available bandwidth and
receiver rates are unknown to the sender. However, when used in circuit-switched net-
works, they leave open the possibility of the circuit lying idle while a sender awaits an ON
or “window-open” signal from the receiver, which could lead to poor circuit utilization. In
RESCUE service, since the circuit rate assignment is known a priori to the sender, it is
possible to set the sending rate to match the circuit rate/receiver rate. Therefore, the flow-
control scheme in our transport protocol is rate-based. Rate-based flow control uses the
available bandwidth in an efficient manner. It can be implemented by setting the inter-
packet generation time at the sender.
However, implementation of a rate-based flow control is more complicated than the
above discussion indicates because of receive-buffer overflows. What are these and why
do they occur? At first glance, it appears that with a dedicated circuit, one could simply
match the sending rate to the receiver rate (which is ideally equal to the circuit rate), thus
eliminating the need for receiver buffers. For example, on end-to-end telephone circuits,
senders (speakers) generate audio data at the same rate at which receivers (listeners)
consume data. However, unlike telephones that perform their single dedicated function,
general-purpose computers that are typically involved in file transfers engage in several
other activities concurrent with file transfers. This requires operating systems to schedule
various tasks in and out of the processor as needed, which implies that data received on a
NIC is not moved at a guaranteed constant rate from the NIC to the disk. Furthermore disk
access rates are not constant. There can be significant variability based on the location to
which data needs to be written. Variability also arises at the sender. Based on when the
operating system at the sender schedules the network-related kernel threads and drivers,
data is moved from disk to memory (user-space and/or kernel-space) to the NIC at varying
rates. This variability at the receiver and at the sender leads to the interesting question of
how to select an appropriate circuit rate (transfer rate). If a pessimistic rate is chosen to
avoid (at all costs) the possibility of the sender sending data faster than the rate at which
the receiver can move the data, the overall transfer delay could be higher than if an opti-
mistic rate is chosen allowing for losses and subsequent retransmissions. Thus rate-based
flow control schemes are not trivial to implement.
B. Error Control
To counter losses from link errors and receive-buffer overflows, we propose to use the
selective-Automatic-Repeat-reQuest (selective-ARQ) scheme to achieve a high efficiency.
NAKs can be used to indicate packet losses instead of requiring the sender to maintain
timers and await positive acknowledgements (ACKs) because of the guaranteed in-
sequence delivery of data blocks on dedicated circuits. However, ACKs are still needed
because the sender needs to update its retransmission buffers. Retransmission buffers are
required at the sender because of the high delay overheads involved in disk accesses. Ide-
ally, if one could implement a transport protocol using OS-bypass, where a software pro-
gram on a processor on the NIC or hardware circuitry moves data directly from the disk to
the NIC for transmission, then no retransmission buffers will be needed at the sender. If a
block of data is lost, it can be re-extracted directly from the disk and retransmitted. How-
ever, in software implementations on general-purpose hosts using standard Ethernet NICs,
we will need retransmission buffers at the sender where data is held until acknowledged to
avoid repeated disk accesses. Retransmission buffers cannot be too large because of the
limited memory size at end hosts. Therefore we need ACKs to confirm the delivery of
packets and allow the release of the corresponding space in the retransmission buffer.
There is a trade-off between the retransmission buffer size and the frequency of sending
positive-ACKs. The longer the time interval between two consecutive ACKs, the less the
overhead incurred. But the retransmission buffer has to be correspondingly larger to store
more unacknowledged packets. We need further experimentation to select appropriate val-
ues for the retransmission buffer size. Thus, we choose to use a combination of NAKs for
immediate packet-loss indications with timer-based ACKs to clear the retransmission
buffer. A NAK will also be treated as a cumulative ACK for all segments prior to the one
being requested in the NAK.
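The sender-side bookkeeping implied by this combination of NAKs and timer-based ACKs can be
sketched as below. This is a minimal sketch under stated assumptions: the class and method
names are hypothetical, a NAK that lists several packets is treated as a cumulative ACK up
to the smallest listed sequence number, and sequence-number wraparound is ignored.

```python
class RetransmissionBuffer:
    """Holds unacknowledged packets, keyed by sequence number."""

    def __init__(self, capacity):
        self.capacity = capacity      # maximum number of buffered packets
        self.packets = {}             # seq -> payload awaiting an ACK

    def is_full(self):
        return len(self.packets) >= self.capacity

    def store(self, seq, payload):
        if self.is_full():
            raise BufferError("retransmission buffer is full")
        self.packets[seq] = payload

    def on_ack(self, next_expected):
        """Cumulative ACK: release every packet below next_expected."""
        for seq in [s for s in self.packets if s < next_expected]:
            del self.packets[seq]

    def on_nak(self, lost_seqs):
        """NAK: acknowledge everything before the first loss, then
        return the (seq, payload) pairs that must be retransmitted."""
        lost = sorted(lost_seqs)
        if not lost:
            return []
        self.on_ack(lost[0])
        return [(s, self.packets[s]) for s in lost if s in self.packets]


buf = RetransmissionBuffer(capacity=8)
for seq in range(5):
    buf.store(seq, b"x")
resend = buf.on_nak([3])          # packets 0..2 are released as a side effect
print(resend, sorted(buf.packets))
```

A longer ACK interval means fewer control messages but more packets held in `packets` at
any time, which is the trade-off discussed above.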
Resequencing buffers are similarly needed at the receiver to accumulate Ethernet frames
into a block before performing a write access to the disk*. Accessing the disk for each
Ethernet frame will result in excess overhead.
*Again, with an OS-bypass implementation, resequencing buffers can be eliminated.
C. Use of dual communication paths
As described in Section 3.3, the system architecture leverages dual paths: (i) a dedicated
end-to-end high-speed circuit and (ii) a TCP/IP path. The high-speed dedicated circuit is
used only for the actual data transfer and is held open only for as long as user data flows
from the server to the client. Next consider whether to send retransmissions on the dedi-
cated end-to-end circuit or on the TCP/IP path. Our answer is to use the dedicated end-to-
end circuits for retransmissions unless these are needed at the end of the transfer. The rea-
son for not wanting to use the dedicated circuit for any retransmissions needed at the end
of a transfer is that the circuit will have to lie idle while the sender awaits acknowledg-
ment of data reception from the receiver. On the other hand, if an aggressive transfer rate
is used, we can expect a fair number of retransmissions to handle receive-buffer over-
flows. Hence we recommend using the dedicated end-to-end circuits for most of the
retransmissions.
The RESCUE circuit set up for the file transfer would be a unidirectional circuit for uti-
lization reasons. This raises the question of how to transport reverse-path control mes-
sages, such as ACKs and NAKs. If the RESCUE circuit is used for this exchange,
utilization will suffer. Given the end hosts have two communication paths in RESCUE
service, we propose using the TCP/IP path for such exchanges.
In summary, the transport protocol on end-to-end high-speed RESCUE circuits should use a
rate-based flow control scheme and a selective-ARQ based error control scheme with both
NAKs and ACKs.
6.1.2 Related Work
A number of new transport protocols have been proposed for high-speed networks using
the easy-to-use UDP socket API offered by end-host systems and implemented as
application-level processes. Examples include SABUL [66], UDT [67], Tsunami [68], and
RBUDP [69]. Others have enhanced TCP [22]-[24] and implemented these enhancements
in the kernel space. Most of these enhancements are for high-speed packet-switched
networks, which means they run congestion-control mechanisms to adjust sending rates
during data transfers based on congestion levels, and are therefore not suited for
end-to-end circuits.
RBUDP is an exception, being specifically targeted at photonic networks and implemented
with rate-based flow control. In RBUDP, the sender transmits the entire user data at a
fixed rate and then retransmits all lost packets at the end of the user-data transfer
according to the response from the receiver. This results in poor circuit utilization
during the retransmission phase because the circuit has to be held open after completion of
the initial user-data transfer while waiting for the final acknowledgment confirming
completion.
Another class of transport protocols, such as Scheduled Transfer (ST) [70] and RDDP
[71], has been designed for OS-bypass implementations. They provide sufficient hooks to
allow for a high-speed, OS-bypass implementation, a feature that is necessary to achieve
true high-speed end-to-end throughput. ST does this by having the sender specify a receiver
memory address in the data block header, which causes the receiving NIC to simply write
the received payload using Direct Memory Access (DMA) into the specified memory
location. This results in a low end-host transport layer delay. However, to gain this
advantage, ST requires a programmable processor on the NIC, which makes it hard to
implement.
Following the design rationale in Section 6.1.1 and using some concepts from the above
protocols, we developed our own transport solution called Fixed Rate Transport Protocol
(FRTP). We specify the FRTP protocol in Section 6.1.3. The implementation of the FRTP
protocol is presented in Section 6.1.4 and some preliminary experimental results are shown
in Section 6.1.5 in the context of our local-area testbed network.
6.1.3 FRTP Specification
A. The Model of FRTP Connections
According to design rationale C, “use of dual communication paths”, we design a two-
channel model for FRTP. Each FRTP session has two channels: a data channel and a con-
trol channel. The data channel is used for actual user-data transfer on the end-to-end RES-
CUE circuit from the sender to the receiver. The control channel is used for control-
information (control messages for flow control and error control) transfer on the primary
TCP/IP path between the receiver and the sender.
Figure 20. The Model of FRTP Connections
B. Packet Formats in FRTP
All the user-data and control information are encapsulated into either FRTP DATA
packets or FRTP control packets, as shown in Figure 21. The DATA packet carries the
user-data payload along with a unique 32-bit sequence number for error control. The
payload length in DATA packets should fit within the path MTU to avoid IP fragmentation
(1472Bytes in our implementation). Taking simplicity as a design principle, we
define the minimal number of control packets required by the error control functionality.
The control packets include ACK and ERR (NAK) packets, which carry the control
information required by error control. Each control packet includes a 32-bit packet type
field (1 for ACK and 2 for ERR), a 32-bit attribute field (the next expected sequence
number if it is an ACK packet, or the total number of lost packets reported if it is an
ERR packet), and an optional variable-length (up to 1440Bytes) attribute value field only
for ERR packets to carry the sequence numbers of lost packets. The ACK packets are
cumulative positive acknowledgments telling the sender that the receiver has received all
the DATA packets prior to the sequence number it carries, while the ERR packets are
negative acknowledgements that carry the sequence numbers of lost packets.
Figure 21. Packet Formats in FRTP
(DATA packet: sequence number, 32 bits | payload. Control packet: packet type, 32 bits |
attribute, 32 bits | attribute value, only for ERR, up to 1440Bytes. ACK: type 1, attribute
is the next expected SN. ERR: type 2, attribute is the number of lost packets reported,
attribute value holds the SNs of lost packets.)
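The packet layouts described above can be illustrated with Python's `struct` module. This is
a sketch, not the FRTP implementation: network byte order, 32-bit sequence numbers inside
the ERR attribute value field, and all function names are assumptions. The text also does
not say whether the 1472-byte figure includes the 4-byte sequence number; the sketch applies
it to the payload, as the text does.

```python
import struct

ACK, ERR = 1, 2            # packet type values given in the text
MAX_PAYLOAD = 1472         # DATA payload limit quoted for the implementation
MAX_ERR_VALUE = 1440       # maximum size of the ERR attribute value, in bytes


def pack_data(seq, payload):
    """DATA packet: 32-bit sequence number followed by the payload."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload too large for one DATA packet")
    return struct.pack("!I", seq) + payload


def unpack_data(packet):
    (seq,) = struct.unpack_from("!I", packet)
    return seq, packet[4:]


def pack_ack(next_expected):
    """ACK packet: type 1, attribute = next expected sequence number."""
    return struct.pack("!II", ACK, next_expected)


def pack_err(lost_seqs):
    """ERR packet: type 2, attribute = count, value = lost sequence numbers."""
    if 4 * len(lost_seqs) > MAX_ERR_VALUE:
        raise ValueError("too many sequence numbers for one ERR packet")
    return struct.pack(f"!II{len(lost_seqs)}I", ERR, len(lost_seqs), *lost_seqs)


def unpack_control(packet):
    """Return (kind, attribute, lost sequence numbers)."""
    ptype, attr = struct.unpack_from("!II", packet)
    if ptype == ACK:
        return "ACK", attr, []
    if ptype == ERR:
        return "ERR", attr, list(struct.unpack_from(f"!{attr}I", packet, 8))
    raise ValueError(f"unknown packet type {ptype}")


print(unpack_data(pack_data(7, b"hello")))
print(unpack_control(pack_ack(8)))
print(unpack_control(pack_err([5, 9])))
```

Under the 32-bit assumption, one ERR packet can report at most 360 lost packets
(1440 / 4).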
In addition, one special packet is used to exchange the FRTP parameters between the
sender and the receiver prior to the actual data transfer. We show the format of this packet
in Figure 22. Currently, the packet contains only two parameters, the receiver’s data chan-
nel identifier and the user-specified sending rate. Each parameter is carried in a 16-bit
field. The packet can be readily extended to carry more FRTP parameters in future ver-
sions.
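The parameter-exchange packet can be sketched in the same way. The field order follows
Figure 22 (data channel identifier first, then the sending rate); network byte order, the
function names, and the unit of the sending rate (taken here as Mbps so that it fits in 16
bits) are assumptions, since the text does not state them.

```python
import struct


def pack_params(data_channel_id, sending_rate):
    """Two 16-bit fields: data channel identifier, then sending rate."""
    return struct.pack("!HH", data_channel_id, sending_rate)


def unpack_params(packet):
    """Return (data_channel_id, sending_rate)."""
    return struct.unpack("!HH", packet)


pkt = pack_params(5000, 1000)     # hypothetical identifier and rate values
print(len(pkt), unpack_params(pkt))
```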
C. Flow Control and Error Control in FRTP
According to design rationale A, FRTP uses a rate-based flow-control scheme. The
receiver notifies the sender with a user-specified sending rate prior to the actual data trans-
fer (using the parameter-exchange packet in Figure 22). The actual sending rate then is set
by the sender, which stays unchanged during the whole data transfer. The constant sending
rate is implemented by setting the inter-packet generation time at the sender. This time is
computed by the sender using the sending rate and DATA packet size.
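A minimal sketch of this pacing computation follows. The function names and the `send`,
`clock`, and `sleep` parameters are hypothetical. Note that `time.sleep` is far too coarse
for the microsecond-scale gaps needed at gigabit rates, so a real sender would need a finer
timing mechanism; the sketch only shows the arithmetic and the loop structure.

```python
import time


def inter_packet_time(rate_bps, packet_bytes):
    """Seconds between packet departures for a constant sending rate."""
    return packet_bytes * 8 / rate_bps


def paced_send(packets, rate_bps, send, clock=time.monotonic, sleep=time.sleep):
    """Send packets at a fixed rate; `send` is a caller-supplied callable."""
    next_departure = clock()
    for pkt in packets:
        delay = next_departure - clock()
        if delay > 0:
            sleep(delay)                  # wait until the scheduled departure
        send(pkt)
        next_departure += inter_packet_time(rate_bps, len(pkt))


# 1250-byte packets at 100Mbps leave every 0.1ms.
print(inter_packet_time(100e6, 1250))
sent = []
paced_send([b"a" * 1250] * 3, 100e6, sent.append)
print(len(sent))
```

Scheduling departures against `next_departure`, rather than sleeping a fixed interval after
each send, keeps the average rate from drifting when individual sends are delayed.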
Figure 22. The Parameter-Exchange Packet in FRTP
(Data channel identifier, 16 bits | The sending rate, 16 bits)
According to design rationale B, FRTP error control is a selective-ARQ scheme, in which
only the lost packets are retransmitted. Both the sender and the receiver detect packet
losses. The receiver keeps track of the next expected sequence number for loss detection
purposes. Since sequenced delivery is guaranteed on RESCUE circuits, any mismatch
between the currently received sequence number and the expected sequence number
indicates a packet loss. The sequence numbers of lost packets are then sent back to the
sender via ERR packets. A time-out scheme is used by the sender to detect packet losses. The
sender keeps a retransmission timer for outstanding DATA packets. When the timer
expires, the sender will assume that the expired packets are lost. To reduce the overheads
involved in disk accesses, the sender maintains a retransmission buffer to hold
unacknowledged DATA packets. ACK packets are sent periodically by the FRTP receiver to
confirm the delivery of DATA packets and allow the release of the corresponding space in the
retransmission buffer.
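Receiver-side loss detection as described above can be sketched as follows. The class and
method names are hypothetical, sequence numbers are assumed to start at 0, and the choice
of the cumulative ACK value while losses are outstanding (the smallest missing sequence
number) is an assumption made so that the ACK stays consistent with its cumulative meaning.

```python
class LossDetector:
    """Tracks the next expected sequence number at the receiver."""

    def __init__(self):
        self.next_expected = 0
        self.loss_list = set()

    def on_data(self, seq):
        """Process one DATA packet; return sequence numbers to report in ERR."""
        if seq in self.loss_list:         # a retransmission filled a gap
            self.loss_list.discard(seq)
            return []
        if seq < self.next_expected:      # duplicate of data already received
            return []
        # In-order delivery on the circuit: any skipped numbers were lost.
        newly_lost = list(range(self.next_expected, seq))
        self.loss_list.update(newly_lost)
        self.next_expected = seq + 1
        return newly_lost

    def ack_value(self):
        """Cumulative ACK value: every packet below it has been received."""
        return min(self.loss_list) if self.loss_list else self.next_expected


rx = LossDetector()
for seq in (0, 1, 4):
    lost = rx.on_data(seq)
    if lost:
        print("send ERR for", lost)
print("ACK value:", rx.ack_value())
```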
6.1.4 An Implementation of FRTP protocol
Our implementation of FRTP protocol is being developed based on SABUL. Overall, it
is implemented as an application-level process using a combination of UDP and TCP.
According to the design rationale, the data channel is an end-to-end dedicated circuit
carrying Ethernet packets. In theory, the file being transferred can be segmented into
FRTP DATA packets and carried directly in Ethernet packets. In practice, to simplify FRTP
implementation, we use a UDP/IP socket. The UDP/IP protocol layers do not add any
functionality in the transfer of FRTP DATA packets on dedicated end-to-end Ethernet/
SONET circuits. However, the use of a UDP socket greatly simplifies programming. It
helps us avoid the kernel-level programming needed for a direct transfer of FRTP DATA
packets in Ethernet packets. The FRTP data channel identifier is thus a UDP port number.
Following the design rationale described in Part C of Section 6.1.1, a TCP connection is
required via the primary IP path between the sender and the receiver. Therefore, as shown
in Figure 23, an FRTP session starts with a TCP connection establishment between the
sender and the receiver. After the initiation, the FRTP sender opens a TCP listening port
and waits for any incoming connection attempt. A TCP connection is established upon
receipt of a request from the FRTP receiver. The next step is FRTP parameter exchange.
The sender and receiver exchange a set of FRTP parameters via the TCP socket, such as
the user-specified sending rate and data channel identifier (UDP data channel’s port num-
ber).
Figure 23. Data Sending/receiving Procedure in FRTP
After a successful TCP control channel establishment and FRTP parameter exchange,
the sender starts the actual data transfer via the UDP socket on the end-to-end circuit.
(UDP is a connectionless protocol, so there is no connection establishment procedure for
the UDP data channel.) During the whole data transfer, the sender is responsible for data
transmission and retransmissions based on feedback from the receiver. A select/poll
mechanism is used by the sender to handle two threads, a disk-I/O thread and a network-I/O
thread, simultaneously.
The disk-I/O thread copies the data from disk (in disk-to-disk transfers) or upper-layer
application buffer (in memory-to-memory transfers) and places the data into the FRTP
sending buffer. This buffer is also used as a retransmission buffer for error control
purposes. The disk-I/O thread copies the user data block by block into the retransmission
buffer in a loop until the buffer is full, which can happen when packet losses are excessive.
In this case, the disk-I/O thread waits for a fixed time interval before trying
again. Meanwhile, a separate network-I/O thread reads the user data in the retransmission
buffer and encapsulates them into FRTP DATA packets. It sends one DATA packet every
inter-packet generation time via the UDP socket on to the end-to-end circuit. The sender
maintains a list to record the sequence numbers of lost DATA packets. Every inter-packet
generation time, the sender checks the loss list first. If it is not empty, the DATA packet
with the minimal sequence number in the loss list is retrieved from the retransmission
buffer and retransmitted. Otherwise, the network-I/O thread encapsulates a new DATA
packet and sends it out to the network. Newly transmitted data is kept in the FRTP
retransmission buffer until the corresponding acknowledgement is received. Every inter-
packet generation time, the network-I/O thread also checks ACK and ERR packets from
the receiver, as illustrated in Figure 24. The DATA packets that are acknowledged by
ACK packets are released from the FRTP retransmission buffer, while the sequence num-
bers of lost packets reported by ERR packets are inserted into the loss list. The sender
itself also detects packet losses by keeping an EXP timer (retransmission timer). Instead of
maintaining timers for each DATA packet, the FRTP sender maintains only one timer for
all newly transmitted DATA packets for simplicity. If an ACK or ERR is not received before
the EXP timer expires (1s time-out value in our implementation), all outstanding DATA
packets will be inserted into the loss list and retransmitted.
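The sender-side bookkeeping described above can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the class and method names are hypothetical, time is passed in explicitly, and the sketch assumes that a sequence number leaves the sender's loss list once it has been retransmitted (it is re-inserted by a later ERR packet or by an EXP time-out), which the text does not state explicitly.

```python
class SenderScheduler:
    """Chooses the sequence number (SN) to send at each inter-packet tick."""

    def __init__(self, total_packets, exp_timeout=1.0):
        self.total = total_packets
        self.next_new = 0               # SN of the next never-sent packet
        self.outstanding = set()        # sent but not yet acknowledged
        self.loss_list = set()          # SNs waiting for retransmission
        self.exp_timeout = exp_timeout  # 1 s in the text
        self.last_feedback = 0.0        # time of the last ACK/ERR (or time-out)

    def on_ack(self, next_expected, now):
        # An ACK carries the next expected SN: everything below it is delivered.
        self.outstanding = {sn for sn in self.outstanding if sn >= next_expected}
        self.loss_list = {sn for sn in self.loss_list if sn >= next_expected}
        self.last_feedback = now

    def on_err(self, lost_sns, now):
        # An ERR reports lost SNs; only packets still outstanding are queued.
        self.loss_list.update(sn for sn in lost_sns if sn in self.outstanding)
        self.last_feedback = now

    def check_exp(self, now):
        # Single EXP timer: on expiry, all outstanding packets are treated as lost.
        if self.outstanding and now - self.last_feedback >= self.exp_timeout:
            self.loss_list.update(self.outstanding)
            self.last_feedback = now    # restart the timer

    def next_packet(self):
        # Retransmissions first (smallest SN), otherwise a new DATA packet.
        if self.loss_list:
            sn = min(self.loss_list)
            self.loss_list.remove(sn)   # assumption: removed once retransmitted
            return sn
        if self.next_new < self.total:
            sn = self.next_new
            self.next_new += 1
            self.outstanding.add(sn)
            return sn
        return None                     # nothing to send at this tick
```

In a real sender, `next_packet()` would be called once per inter-packet generation time, after feedback from the TCP control channel has been processed and `check_exp()` has been called.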
Figure 24. Feedback Checking and Processing at the FRTP Sender
[Flowchart (step * of Figure 23): the sender checks the TCP control channel for an incoming packet. On an ACK, it releases the ACKed data in the retransmission buffer and removes the SNs of ACKed packets from the loss list. On an ERR, it inserts the SNs of lost packets into the loss list. It also checks whether the EXP timer has expired and, if so, inserts the SNs of all outstanding packets into the loss list.]
Similarly, at the receiver side, a network-I/O thread receives and decapsulates DATA
packets. The data payloads are written into the FRTP resequencing buffer to be accumulated
into blocks. The data blocks in the resequencing buffer are then copied to
the disk (in disk-to-disk transfers) or the upper-layer application buffer (in memory-to-
memory transfers) by a disk-I/O thread. The receiver also maintains a packet loss list. The
sequence number of each lost packet is kept in the loss list until a correct retransmitted
copy is received. ACK packets are sent back to the sender periodically (every 100ms in
our implementation) or immediately when the FRTP resequencing buffer is full*. The
sequence number carried in ACK packets is the next expected sequence number, which
equals the largest received sequence number plus 1 or, if the loss list is not empty, the
minimum sequence number in the loss list. Packet losses are detected by comparing the
received sequence number to the next expected sequence number. If a packet loss is
detected, the sequence numbers of the lost DATA packets are inserted into the loss list, and
an ERR packet containing the sequence numbers of the lost DATA packets is sent back to
the sender immediately. Since the retransmitted DATA packets could also be lost due to
link errors and receive-buffer overflows, to ensure the reliable delivery of DATA packets,
the whole loss list is sent back to the sender by ERR packets periodically (every 20ms in
our implementation) if it is not empty.
*This could happen when the sender sends the data in a burst at a rate higher than the specified sending rate.
Figure 25. Feedback Sending at the FRTP Receiver
[Flowchart (step ** of Figure 23): if the ACK timer has expired, the receiver sends ACK(largest received SN+1) when the loss list is empty, or ACK(smallest SN in the loss list) otherwise. If the ERR timer has expired and the loss list is not empty, it sends ERR(all SNs in the loss list).]
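As an illustration of the receiver-side rules above (gap detection, loss-list maintenance, and the sequence number carried in an ACK), the following Python sketch may help. The class and attribute names are hypothetical, the ACK and ERR timers are left out, and duplicate packets are simply ignored, which is an assumption rather than something the text specifies.

```python
class ReceiverState:
    """Loss detection and ACK value selection at the receiver."""

    def __init__(self):
        self.highest_plus_one = 0   # largest received SN plus 1
        self.loss_list = set()      # SNs detected as lost and not yet recovered

    def on_data(self, sn):
        """Process a DATA packet; return newly detected lost SNs (for an ERR)."""
        if sn in self.loss_list:
            self.loss_list.discard(sn)   # a retransmitted copy has arrived
            return []
        if sn < self.highest_plus_one:
            return []                    # duplicate: ignored (assumption)
        # Every SN skipped between the expected one and this one is lost.
        newly_lost = list(range(self.highest_plus_one, sn))
        self.loss_list.update(newly_lost)
        self.highest_plus_one = sn + 1
        return newly_lost

    def ack_value(self):
        """Next expected SN: smallest lost SN, else largest received SN + 1."""
        return min(self.loss_list) if self.loss_list else self.highest_plus_one
```

A non-empty return value from `on_data()` corresponds to the immediate ERR packet; `ack_value()` is what a periodic ACK would carry.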
6.1.5 LAN Experiments
As its name indicates, one of the key features of FRTP is that it produces a constant
sending rate, which allows us to set the circuit rate to match. The constant sending rate
is realized by controlling the inter-packet transmission time with rate-based flow control.
To test the effectiveness of rate-based flow control in the FRTP implementation, we conducted
experiments to measure the performance of FRTP.
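To make the relation between the sending rate and the inter-packet transmission time concrete, here is a small Python sketch. The function names are hypothetical, and whether the packet size should include protocol headers is left as an assumption of the caller.

```python
def inter_packet_time(packet_bytes, rate_bps):
    """Seconds between the starts of consecutive packets at a fixed rate."""
    return packet_bytes * 8 / rate_bps


def send_deadlines(start, packet_bytes, rate_bps, count):
    """Absolute send times; deriving each from `start` avoids cumulative drift."""
    gap = inter_packet_time(packet_bytes, rate_bps)
    return [start + i * gap for i in range(count)]
```

For example, 1500-byte packets at a 50Mbps sending rate give an inter-packet time of 240 microseconds, while at 500Mbps the same packets must leave every 24 microseconds, which is one reason pacing at high rates is sensitive to operating-system scheduling.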
Another question about FRTP is how to select an appropriate sending rate (circuit rate).
We realize that a higher sending rate does not always produce a higher throughput due to
the limitation of the end hosts. To select an appropriate rate, we must take account of
many factors, such as the scheduling scheme of operating systems, the hard disk access
rate, UDP buffer size, MTU size, and other FRTP related parameters. Although some of
these factors, such as the scheduling scheme of operating systems and the hard disk access
rate, cannot be controlled, other parameters can be adjusted to achieve better performance.
Hence, we tested the impact of these parameters in our experiments.
In our experiments, we connected two Dell Precision 650 workstations via a Dell PowerConnect
Gigabit Ethernet switch. Each Dell workstation has a 2.4-GHz Intel Xeon(TM)
CPU connected to a 533-MHz front-side bus (34Gbps CPU bandwidth), an E7505 chipset
with 512MB of DDR 266MHz memory (17Gbps memory bandwidth), an 80GB ATA/100
7200 RPM EIDE disk drive with 2MB cache (400Mbps average writing rate measured by
Bonnie [72]), and a 64bit/100MHz PCIx bus for the GbE NIC (6.4Gbps network bandwidth).
Both workstations run RedHat Linux 9 with kernel version
2.4.20-30.9. Tcpdump is used to facilitate the analysis of data transfers [73].
Our experiments focus on the performance of bulk data transfers. We ran FRTP applications
on both workstations and transferred a 127MB file between them.
A. Results with Default Settings
We began the experiments with default FRTP settings: 256KB UDP buffer size,
1500-byte MTU size, 40MB FRTP buffer size, and 8MB block size for disk I/O operations.
The sending rate was increased from 50Mbps to 1Gbps. Figure 26 plots packet-loss
rate and transfer throughput (note that we use the term “throughput” to denote “goodput”
throughout Section 6.1.5) versus sending rate.
Figure 26. Packet-Loss Rates and Throughputs vs. the Sending Rate in FRTP Experiments (DATA Packet Size=1500B, UDP Buffer Size=256KB, FRTP Buffer Size=40MB, FRTP Data Block Size=8MB). Panels: (a) FRTP throughput; (b) FRTP packet-loss rate.
As the sending rate is increased from a low value, the plots show the throughput increas-
ing as expected with zero or a small packet-loss rate. When the sending rate reaches a cer-
tain level (~200Mbps in Figure 26), the packet-loss rate becomes significant. This happens
because of the limitations of the end-host hardware and software modules that are
involved in moving blocks of the file from disk into the FRTP buffer, UDP buffer, and
finally into the NIC at the sender, and similarly through the NIC, UDP buffer, FRTP
buffer, and disk at the receiver. The major bottleneck in our experiment configuration is
the hard disk access rate, especially the writing rate. Even though the measured average
writing rate of our disks is around 400Mbps, their worst-case writing rate might be much
lower than the average value. For example, the disk drive requires a certain amount of
time to switch between cylinders and heads. During this switching time, the disk read/
write operations have to be suspended. To better reflect the disk's real-time performance,
the term “disk sustained transfer rate” was introduced. This rate is dependent on the disk's
media transfer rate, but includes the overheads required for cylinder switching time and
head switching time. Based on manufacturer’s specifications of our disks and other similar
products, a rough approximation of the sustained transfer rate of our disks is 200Mbps.
This explains the significant packet-loss rate when the sending rate is larger than
200Mbps.
As expected, increasing the sending rate beyond 200Mbps leads to excessive packet
losses because of receiver-buffer overflows, causing the retransmissions to impact overall
throughput. As a result, the throughput slowly reaches an “optimal” value (~370Mbps in
Figure 26) at a 590Mbps sending rate and then decreases. This “optimal” value approxi-
mates the 400 Mbps average disk access rate, the expected bottleneck in the experiment.
The small difference is caused by the additional processing overhead incurred in handling
packet losses. The results for the memory-to-memory transfer experiments are also shown
in Figure 26. By removing the effects of disk access, which is the bottleneck in the disk-
to-disk transfer, we can achieve a higher throughput (up to 910Mbps) without incurring
significant packet losses.
To verify the effectiveness of the FRTP rate-based flow control algorithm, we captured
several trace files while running FRTP file transfers. The actual inter-packet transmission
times seen on the link were extracted and measured from the trace files. Figure 27 shows an
example of inter-packet transmission times within a FRTP file transfer at a 50Mbps sending
rate. The plot shows that the variance of actual inter-packet transmission times is very
small. The standard variance of inter-packet transmission times is only 0.00005, which is
quite acceptable considering the inevitable variability at the sender. Due to the limitations
of the machine running Tcpdump, we did not collect measurement data at very high
sending rates, but we expect similar behavior even at very high sending rates.
Figure 27. An Example of Inter-packet Transmission Times within a FRTP File Transfer (Sending Rate=50Mbps, DATA Packet Size=1500B, UDP Buffer Size=256KB, FRTP Buffer Size=40MB, FRTP Data Block Size=8MB)
Another important measurement is CPU utilization of FRTP. As an application-level
implementation, FRTP consumes a large amount of CPU resources; thus its performance
is easily compromised by other concurrent processes. In all experiments presented in this
section, we carefully disabled all other possible user processes while running FRTP appli-
cations. Figure 28 plots CPU utilization versus sending rate in FRTP file transfers.
Figure 28. CPU Utilization vs. the Sending Rate in FRTP Experiments (DATA Packet Size=1500B, UDP Buffer Size=256KB, FRTP Buffer Size=40MB, FRTP Data Block Size=8MB)
The plot shows that CPU utilization of the sender is always greater than 60%, sometimes
reaching close to 90%. On the receiver side, CPU utilization is relatively lower, but
also greater than 60% when the sending rate is 500Mbps or higher. Such high
CPU utilization is a major drawback of application-level implementations. They consume
many CPU cycles, leaving little time for the application to do any computation
(though this is a non-issue for bulk-data transfer) and leaving no room for other concurrent
CPU-intensive applications. For example, when we started a Matlab process while running FRTP
with a 500Mbps sending rate, the FRTP throughput immediately dropped from 380Mbps
to 80Mbps.
B. Impact of UDP Buffer Size
UDP buffer size has a large impact on FRTP’s performance. It is well known that TCP
throughput is improved by properly selecting TCP buffer size. Similarly, better perfor-
mance can be achieved by increasing UDP buffer size in FRTP. At the FRTP sender, a
small UDP sending buffer increases the number of memory copies from the FRTP buffer
to the UDP buffer. This causes a serious degradation of FRTP performance in environments
where the CPU resource and/or the bus speed is the bottleneck. At the receiver, a small UDP
receiving buffer also incurs unnecessary CPU overhead and data movement delays. Fur-
thermore, a small UDP receiving buffer increases the possibility of receive-buffer over-
flows due to variations in the data movement rate from the UDP buffer to the FRTP
buffers. When the operating system is temporarily unable to schedule system resources to
move the received data out of the UDP buffer, a small UDP receiving buffer can overflow,
especially in high-speed data transfers.
In this part of the experiments, we fixed the sending rate at 500Mbps and observed FRTP
performance under different UDP buffer sizes. The UDP buffer size was varied from 64KB
to 4MB by calling the system function setsockopt(). All other parameters in FRTP used
default values. We plot packet loss and transfer throughput versus UDP buffer size in Fig-
ure 29.
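For readers unfamiliar with the call, the following Python sketch shows how UDP socket buffer sizes can be requested through setsockopt(). The FRTP implementation's own code is not shown in this document, so the helper names here are hypothetical. Note that the operating system may cap or adjust the requested size (Linux, for instance, reports a doubled value and enforces system-wide limits), so it is worth reading the effective size back.

```python
import socket


def make_udp_socket(buffer_bytes):
    """Create a UDP socket and request the given send/receive buffer size."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buffer_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buffer_bytes)
    return sock


def effective_buffer_sizes(sock):
    """Return the (send, receive) buffer sizes the kernel actually reports."""
    snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return snd, rcv
```

If the reported size is smaller than requested, the system-wide socket buffer limits usually have to be raised by an administrator before larger values take effect.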
As the UDP buffer size is increased, Figure 29 shows that FRTP throughput increases
while the packet-loss rate decreases as expected. For example, with a 2MB UDP buffer,
the average throughput at a 500Mbps sending rate increases to 386Mbps (a 20.6%
improvement from 320Mbps with the 64KB UDP buffer) and the loss rate drops from
16.96% to 0.03%. By removing the UDP buffer size limitation, the throughput gradually
approaches the theoretical optimal value, i.e. the average disk writing rate. The highest
throughput value seen in the experiment is 400.03Mbps, which matches the 400Mbps
average disk writing rate measured by Bonnie very well. Compare these results with those
of our experiment with the default settings, in which we could only achieve a maximum of
370Mbps at a 590Mbps sending rate with a 19% packet-loss rate. We conclude that a
larger UDP buffer size does help improve FRTP performance.
Figure 29. Packet-Loss Rates and Throughputs vs. UDP Buffer Size in FRTP Experiments (DATA Packet Size=1500B, Sending Rate=500Mbps, FRTP Buffer Size=40MB, FRTP Data Block Size=8MB). Panels: (a) FRTP throughput; (b) FRTP packet-loss rate.
However, increasing the UDP buffer size does not always bring us benefits. There is no
obvious improvement seen when we further increase the UDP buffer size beyond 2MB.
This is understandable because with a large UDP buffer, the only bottleneck is the disk
writing rate, and thus, increasing the UDP buffer size will not help improve throughput.
UDP buffer size should be tailored for each transfer. In our particular case, a UDP buffer
size slightly higher than 2MB produced “optimal” results.
C. Impact of FRTP Buffer Size
In this set of experiments, the sending rate was fixed at 500Mbps. All parameters were
set to default values except the FRTP buffer size. We changed the FRTP buffer size from
9MB to 40MB and observed FRTP performance under different FRTP buffer sizes. Figure
30 plots packet-loss rate and transfer throughput versus FRTP buffer size.
Figure 30. Packet-Loss Rates and Throughputs vs. FRTP Buffer Size in FRTP Experiments (DATA Packet Size=1500B, UDP Buffer Size=256KB, Sending Rate=500Mbps, FRTP Data Block Size=8MB). Panels: (a) FRTP throughput; (b) FRTP packet-loss rate.
As FRTP buffer size increases from 9MB to 40MB, FRTP throughput increases from
305Mbps to 342Mbps, a 12.1% improvement. In our particular experiment, the “optimal”
FRTP buffer size is around 16MB, although a slightly higher throughput value can be seen
with larger FRTP buffer sizes. Increasing the FRTP buffer size beyond 16MB brings little
benefit because of the dramatic increase in packet-loss rate. This is reasonable because an
overly large FRTP buffer consumes a large amount of system resources for memory management.
Again, the “optimal” value of FRTP buffer size depends on the particular hardware
and software configurations, and should be tailored for each transfer.
D. Impact of FRTP DATA Packet Size
The last parameter that we tested in our experiments is the DATA packet size. The
throughput improves with larger packets. This is because the time needed for packet
encapsulation and decapsulation is smaller. To quantify the impact of packet size, we
repeated the previous experiments with a similar configuration. However, since the Dell
PowerConnect Ethernet switch that we used in previous experiments does not support
larger MTUs, we connected the two Dell Precision 650 (P650) workstations via a direct
1Gbps Ethernet link. We fixed the sending rate at 500Mbps and observed FRTP performance
under different packet sizes. We increased the DATA packet size from the default
value of 1472B to 14972B. All other parameters in FRTP were set to default values. To
avoid IP fragmentation, we changed the system MTU size correspondingly. We plot
packet loss and transfer throughput versus DATA packet size in Figure 31.
Figure 31. Packet Losses and Throughputs vs. DATA Packet Size in FRTP Experiments (MTU=1500B, UDP Buffer Size=256KB, FRTP Buffer Size=40MB, Sending Rate=500Mbps). Panels: (a) FRTP throughput; (b) FRTP packet-loss rate.
As the FRTP DATA packet size is increased from 1500B to 6500B, FRTP throughput
increases from 342Mbps to 381Mbps, while the packet-loss rate drops from 14.8% to
8.68%. But the improvement is not apparent when we increased the packet size further.
This is because the possibility of receive-buffer overflows increases with packet size,
which then offsets a part of the benefits brought in by the reduction of encapsulation/
decapsulation overhead.
To avoid IP fragmentation, we set the path MTU size to be the same as FRTP DATA
packet size. However, we expect that the benefits gained with larger FRTP DATA packet
sizes will be offset by IP fragmentation overhead if there are switches on the end-to-end
paths that do not support larger MTUs. For this reason, we suggest a cautious use of large
packet sizes in FRTP, especially on paths where the support for large MTUs is unknown.
6.1.6 Summary of FRTP implementation and experiments
In this section, we presented the design and the implementation of FRTP, a transport protocol
for dedicated end-to-end circuits. FRTP consists of rate-based flow control and
selective-ARQ error control. We implemented FRTP as an application-level process and
conducted a series of experiments.
The experimental results showed that FRTP successfully produces a constant sending
rate during the data transfer. The inter-packet transmission times seen on the wire are quite
accurate and constant at different sending rates. This indicates that the rate-based flow control
in FRTP is very effective and that the most important objective of our transport protocol
work, a fixed sending rate, is successfully achieved by FRTP.
The experimental results also show that FRTP is able to achieve very high throughput.
In our disk-to-disk experiments, FRTP successfully achieved the theoretical maximum
throughput, the 400Mbps disk access rate in our experimental configuration. In memory-
to-memory experiments, by removing the disk access bottleneck, FRTP could achieve a
throughput of up to 910Mbps without significant packet losses.
We also noticed from the experiments that several configurable parameters, such as UDP
buffer size, FRTP buffer size, and packet size, have a great impact on the performance of
FRTP. To obtain optimal results, these parameters have to be carefully tuned. In most
cases, better performance is seen with larger buffer and packet sizes.
However, due to the variability of the receiving capability, receive-buffer overflows
and the corresponding packet losses cannot be completely eliminated. This affects the performance
of FRTP and keeps FRTP throughput always lower than the sending rate. The
higher the sending rate, the higher the FRTP throughput but the lower the bandwidth utilization
(bandwidth utilization is defined as throughput divided by sending rate). Therefore,
the appropriate circuit rate should be chosen by weighing the trade-off between
throughput and circuit utilization.
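The trade-off can be made concrete with the definition of bandwidth utilization given above. The short Python sketch below uses a hypothetical helper name and plugs in two operating points reported in Section 6.1.5; note that these two points come from different configurations (default settings versus a 2MB UDP buffer), so they illustrate the definition rather than a single curve.

```python
def bandwidth_utilization(throughput_mbps, sending_rate_mbps):
    """Bandwidth utilization = throughput divided by sending rate."""
    return throughput_mbps / sending_rate_mbps


# Default settings: about 370Mbps throughput at a 590Mbps sending rate.
default_point = bandwidth_utilization(370, 590)

# 2MB UDP buffer: 386Mbps throughput at a 500Mbps sending rate.
tuned_point = bandwidth_utilization(386, 500)
```

With these numbers, utilization is roughly 63% in the first case and roughly 77% in the second, so a circuit sized to the sending rate sits partly idle in both cases.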
One disadvantage of FRTP is its high CPU utilization. One should avoid running FRTP
concurrently with other CPU-intensive processes. Resource contention between concurrent
processes makes the variability of the receiving capability so severe that the
throughput and circuit utilization would drop to an unacceptable level. This is the biggest
problem of running application-level implementations on general-purpose end hosts. In
contrast, transport protocols designed for OS-bypass implementations, such as ST, use
very little CPU resources, which makes them more attractive in environments where other
CPU-intensive processes are running. We will explore those OS-bypass transport protocols
in future work.
6.2 Routing Decision Module Design
Given that the RESCUE service is configured as an “add-on” service to primary Internet
access, for communication between two entities that can be connected by a direct Ether-
net/SONET circuit, there is a choice of two paths: the primary TCP/IP path and an Ether-
net/SONET circuit. The routing decision module determines whether or not to attempt a
circuit setup based on network parameters required by the routing decision algorithm.
The analysis in Sections 4.2 and 5.2 shows that ideally the routing decision software at
end hosts should use dynamically obtained values of RTT, call-blocking probability $P_b$ on
the circuit-switched path, packet-loss rate $P_{loss}$ on the TCP/IP path, bottleneck link rate $r$
on the TCP/IP path, $r_c$ on the circuit-switched path, and other such measures. These
parameters can be estimated by using measurement tools. For example, Pchar [74] is
a tool to characterize the bandwidth and packet-loss rate along an end-to-end path through
the Internet, Pathrate [75] is a tool to estimate the bottleneck link rate on an
end-to-end TCP path, and tomography experiments [76] have shown that $P_{loss}$ can be
estimated by end hosts. However, such a dynamic algorithm integrated with measurement
tools can be complex.
Since the benefit of using RESCUE is not significant under some circumstances (e.g.,
for small file transfers), a simpler alternative is to use values for these parameters under
nominal operating conditions of the two networks and program static values for the decision
points. As an example, say we determine using such tomography experiments that
$P_{loss}$ is 0.01, the service provider wants $P_b$ to be 0.3 (to achieve a given utilization), and
$\rho$ is determined to be 5 Erlangs (dependent upon the number of end hosts connected to
enterprise MSPPs at each enterprise and the file-transfer generation rate per host). For
these numerical values, the static crossover file size should be set to 2.7MB for long-dis-
tance transfers and 650KB for local calls in end-host application software. The former
comes from a utilization consideration and the latter from a delay consideration (see Table
7). Whether a call is local or long-distance can be determined from the RTT
measurement taken during TCP connection establishment (as stated in Section 5.1, all
transfers require a TCP connection for short message exchanges).
An implementation of the routing decision module will consist of the routing decision
algorithm described in Chapter 5, and a database maintaining all the network parameters
required by the routing decision algorithm. The database can be dynamic, in which case
network parameters are measured periodically to reflect the latest network state, or static,
in which case parameters under nominal operating conditions of the two networks are
stored in the database.
We present a design for the routing decision module in Figure 32. There are three com-
ponents: (i) database, (ii) pre-computation module, and (iii) run-time module. The
database has a structure similar to that of the forwarding database in an IP router. Each
entry in the database corresponds to a destination IP address or a group of destination IP
addresses. The columns correspond to network parameters along the path between the
local host and the destination host. The pre-computation module executes measurement
tools to obtain network parameter values (though some may be programmed in at the start
and left unchanged, such as $P_b$). At each update, the pre-computation module populates the
database entries and computes the crossover file size $f_c$ for each entry. Upon receiving a
query from the user application, the run-time module consults the database to retrieve the
crossover file size corresponding to the requested destination IP address. The crossover
file size is then compared with the requested transfer size $f$. The routing decision is made
as a result of this comparison as shown in Figure 32.
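The run-time lookup and comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and method names are hypothetical, the database holds just the pre-computed crossover file size per destination prefix, longest-prefix matching is assumed by analogy with an IP forwarding table, and a transfer exactly at the crossover size (or to an unknown destination) is sent over the TCP/IP path, a tie-breaking choice the text does not specify. The addresses and sizes in any usage are illustrative only.

```python
import ipaddress


class RoutingDecisionModule:
    """Static routing decision: compare transfer size f with crossover size fc."""

    def __init__(self):
        self._entries = []   # (network, crossover file size in bytes)

    def add_entry(self, cidr, crossover_bytes):
        """Store a pre-computed crossover size for a destination or prefix."""
        self._entries.append((ipaddress.ip_network(cidr), crossover_bytes))

    def crossover_for(self, dest):
        """Longest-prefix match on the destination; None if nothing matches."""
        addr = ipaddress.ip_address(dest)
        matches = [(net.prefixlen, fc) for net, fc in self._entries if addr in net]
        return max(matches)[1] if matches else None

    def query(self, file_bytes, dest):
        """Return 'circuit' if f > fc, otherwise 'tcp/ip'."""
        fc = self.crossover_for(dest)
        if fc is not None and file_bytes > fc:
            return "circuit"
        return "tcp/ip"
```

A dynamic variant would keep the same query path and only change how and when the crossover sizes are recomputed.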
Figure 32. Static Routing Decision Module
[Diagram: a QUERY(f, dest) from the application reaches the run-time module, which performs a table lookup in the database followed by a file size comparison: attempt circuit setup if f > fc; use the TCP/IP path if f < fc. The pre-computation module maintains the database. The crossover file size column of the figure lists the entries ..., ..., 27MB, and 600KB. The other database columns shown in the figure are:]
Dest IP      Ploss  Pb   Tprop  r        rc
192.168.0.2  0.01   10%  30ms   100Mbps  100Mbps
...          ...    ...  ...    ...      ...
192.168.0.8  0.001  10%  30ms   10Mbps   100Mbps
...          ...    ...  ...    ...      ...
6.3 Signaling Module Design
To support RESCUE service, the end host must be equipped with a signaling module,
which is able to send/receive signaling messages to/from signaling-capable network
switches and process signaling messages according to the signaling standards. The signaling
protocol specifications include IETF’s GMPLS, OIF’s UNI, and ITU-T’s ASON.
These specifications are designed as a common control plane (signaling and routing) for
many connection-oriented networks. GMPLS and ASON support both RSVP-TE and CR-LDP
as signaling protocols, while UNI only supports RSVP-TE signaling. Since RSVP-TE
is the protocol implemented by many network switch vendors, e.g., Sycamore and
Ciena, we choose RSVP-TE for our end-host signaling module.
6.4 Local-area Testbed Network
6.4.1 Local-area Testbed Network
The goal of our experiments is to demonstrate the end-host RESCUE software and the
file-transfer application on the RESCUE circuit. The experiments will be performed in a
local-area environment within UVA. The testbed network configuration is shown in Fig-
ure 33.
Figure 33. Local-area Testbed Network Configurations
[Diagram: Dell workstations 1 and 2, each running the RESCUE software (application, routing decision, signaling, FRTP, and TCP modules) with two NICs (NIC I and NIC II), and Dell workstation 3; an Ethernet switch; two Cisco MSPPs; and two Sycamore switches. TL1 messages and RSVP_TE messages are shown as control traffic.]
End hosts in the configuration are high-performance workstations (Dell Precision
650). Each workstation has a 2.4-GHz Intel Xeon(TM) CPU connected to a 533-MHz front-side
bus, an E7505 chipset with 512MB of DDR 266MHz memory, an 80GB ATA/100
7200 RPM EIDE disk drive with 2MB cache, and a 64bit/100MHz PCIx bus for peripheral
devices. The workstations run RedHat Linux 9 with kernel version
2.4.20-30.9 to allow flexible re-configuration. Each workstation is equipped
with two NICs. One is an Intel 82545EM Gigabit Ethernet card and the other is an Intel
82544EI Gigabit Ethernet card. Both Ethernet cards are 64-bit PCIx copper interface
cards, which are capable of transmitting and receiving data at Gbps rates. The purpose of
workstation 3 is to demonstrate sharing, i.e., after the communication session ends between
workstation 1 and workstation 2, workstation 3 can set up a call to workstation 2 reusing
the emulated wide-area RESCUE link.
The packet-switched Internet is emulated by a Dell PowerConnect 16-port Gigabit
Ethernet switch. It connects to the end hosts through their primary NICs, the control cards
on MSPPs, and the control cards on the circuit-switched crossconnects. All the control
messages, including signaling messages for RESCUE circuit setup and FRTP-related mes-
sages, are routed through this Ethernet switch.
Each Dell workstation has a NIC connecting to the Ethernet card on an MSPP, which is
a Cisco ONS 15454 as shown in Figure 33. The MSPPs in turn are connected to the optical
circuit-switched network, which is emulated by one or more Sycamore SN 16000 switches
equipped with a GMPLS/UNI/NNI signaling engine.
To test dynamic RESCUE circuit setup, we are implementing a signaling module, a sub-
set of RSVP-TE, at the end hosts. This module will generate signaling messages according
to the GMPLS RSVP-TE signaling standard. Signaling messages are carried within IP dat-
agrams and routed to the control cards of the signaling-capable switches. However, the
current version of Cisco ONS 15454 MSPP control software only implements UNI client-
side (UNI-C). The purpose of UNI-C is to generate circuit setup requests. It does not allow
for the provisioning of connections through the MSPP via UNI/GMPLS signaling. How-
ever, it does offer a Transaction Language 1 (TL1) [77] interface for circuit provisioning.
Therefore, an end-to-end RESCUE circuit is established as two segments: the crosscon-
nection between the Ethernet port connecting the secondary NIC on workstations and the
SONET port on the Cisco MSPP (dashed line in Figure 33), and the wide-area optical circuit
across the Sycamore circuit switches (solid line in Figure 33). The crossconnection
within the MSPP is set up/released by issuing TL1 messages to the control card on the
MSPP. The optical circuit across the circuit-switched network is then established using
RSVP-TE signaling.
When a user application at workstation 1 requires a communication path to workstation
2, it first sends a query to the routing decision module of the RESCUE software. The rout-
ing decision module sends back an acknowledgement to notify the user application
whether a RESCUE circuit should be set up or not. If the acknowledgement is positive, the
end-host signaling module will send TL1 messages to the MSPP’s control card to set up a
crossconnection between the Ethernet interface card connected to workstation 1’s secondary
NIC and the SONET interface card (an OC3 SONET card in our experiment). At
the remote side (workstation 2), a similar TL1 session should be triggered to set up the crossconnection
upon receiving the notification from workstation 1. Meanwhile, the end-host signaling
module on workstation 1 sends signaling messages (RSVP-TE messages) to the Sycamore
switch to trigger a GMPLS circuit setup. If both the MSPP crossconnection setup and
GMPLS circuit setup are successful, the end-host RESCUE software on workstation 1 and
2 will direct the user application to the secondary NICs and start the actual data transfer on
the end-to-end RESCUE circuit across the enterprise MSPP and optical circuit. The high-
speed transport protocol, FRTP, will be used for data transfers to achieve high end-to-end
transfer throughputs. On the other hand, if either the routing decision module replies with
a negative acknowledgement or the circuit setup fails, the primary NIC and packet-
switched path (the Ethernet switch in Figure 33) will be used.
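The control flow just described (routing decision, MSPP crossconnections, GMPLS circuit setup, and fall-back to the primary path) can be summarized in a short Python sketch. All names are hypothetical stand-ins: the three setup steps are passed in as callables that return True on success, since the actual TL1 and RSVP-TE exchanges are outside the scope of this illustration. For simplicity the steps run one after another, whereas the text lets the RSVP-TE signaling proceed while the TL1 sessions are in progress, and no teardown of partially established segments is modeled.

```python
def choose_path(use_circuit, setup_local_mspp, setup_remote_mspp, setup_gmpls):
    """Return 'circuit' if every setup step succeeds, otherwise 'primary'.

    use_circuit       -- answer from the routing decision module
    setup_local_mspp  -- callable: TL1 crossconnection at the local MSPP
    setup_remote_mspp -- callable: TL1 crossconnection at the remote MSPP
    setup_gmpls       -- callable: RSVP-TE circuit setup across the switches
    """
    if not use_circuit:
        return "primary"      # negative acknowledgement: use the TCP/IP path
    steps = (setup_local_mspp, setup_remote_mspp, setup_gmpls)
    # all() stops at the first failing step, so later steps are not attempted.
    if all(step() for step in steps):
        return "circuit"      # data transfer uses FRTP on the secondary NICs
    return "primary"          # any failure falls back to the primary NIC
```

Because the primary Internet path is always available, a failed or declined circuit setup never blocks the transfer; it only changes which NIC and transport are used.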
6.4.2 Extension with VLAN Technique
One extension of the local-area RESCUE testbed network is to not only allow end hosts
to connect their secondary NICs directly into MSPP ports, but also allow Ethernet
switches serving small subnets to be connected to the enterprise MSPP ports for RESCUE
service as shown in Figure 34. In Figure 34, the secondary NICs on end hosts are con-
nected to the enterprise MSPP Ethernet ports through an Ethernet switch with advanced
Virtual LAN (VLAN) function. VLAN is a technique allowing networks to be segmented
logically without having to be physically rewired [78]. By bundling two ports on the
switch into a logical (virtual) subnet, we can effectively establish a dedicated circuit
between two ports. The Extreme Summit4 switch [79] is an Ethernet switch that supports
this VLAN capability. The Extreme switch provides a Command-Line Interface (CLI) to
allow a network administrator to manage its VLAN configurations. For example, PC 1 in
Figure 34 could initiate a CLI command to set up a VLAN (VLAN 1 in Figure 34) associating
port 1 and port 3. By isolating the Ethernet ports of VLAN 1 from other ports on the
switch, we effectively establish a dedicated data path between port 1 and port 3, that is,
a direct circuit between PC 1 and the enterprise MSPP. The path to the MSPP
through port 3 can be shared among enterprise end hosts by updating the port assignment
of VLAN 1. For example, after PC 1’s communication session ends, we could replace port
1 in VLAN 1 with port 2, and allow PC 2 to set up a dedicated data path between port 2
and port 3.
Figure 34. RESCUE Circuit Extension with VLAN Technique
[Figure: PC 1, PC 2, ..., PC n attach to ports 1, 2, ..., n of an Ethernet switch with VLAN
support; port 3 of the switch leads to the MSPP; CLI commands configure the switch. VLAN
table: VLAN 1 is associated with ports 1 & 3, then with ports 2 & 3.]
By introducing VLAN switches into RESCUE network configurations, the end hosts are
no longer required to be connected to the enterprise MSPP Ethernet ports directly. This
allows for the sharing of limited Ethernet port resources on MSPPs among a large number
of end hosts. The VLAN setup procedure can be combined with the MSPP crossconnec-
tion setup procedure described in the previous section when establishing an end-to-end
RESCUE circuit.
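The port reassignment described in this section can be modeled with a short Python
sketch. It only tracks VLAN membership in memory and does not issue any switch CLI
commands; the class VlanSwitch and its methods are hypothetical names, and the rule that
a port belongs to at most one VLAN is a simplifying assumption used to represent port
isolation.

```python
from typing import Dict, Iterable, Set


class VlanSwitch:
    """In-memory model of port-based VLAN membership on one switch."""

    def __init__(self, num_ports: int) -> None:
        self.num_ports = num_ports
        self.vlans: Dict[int, Set[int]] = {}  # VLAN id -> member ports

    def set_vlan_ports(self, vlan_id: int, ports: Iterable[int]) -> None:
        """Create the VLAN or replace its port assignment."""
        members = set(ports)
        if any(p < 1 or p > self.num_ports for p in members):
            raise ValueError("port number out of range")
        # Assumption: a port may belong to only one VLAN at a time.
        for other_id, other_ports in self.vlans.items():
            if other_id != vlan_id and other_ports & members:
                raise ValueError("port already assigned to another VLAN")
        self.vlans[vlan_id] = members

    def can_reach(self, port_a: int, port_b: int) -> bool:
        """Two ports exchange frames only if some VLAN contains both."""
        return any(port_a in m and port_b in m for m in self.vlans.values())


switch = VlanSwitch(num_ports=8)
switch.set_vlan_ports(1, {1, 3})   # PC 1 (port 1) gets the path to the MSPP (port 3)
switch.set_vlan_ports(1, {2, 3})   # after PC 1's session ends, PC 2 takes over
```

After the second call, port 2 reaches port 3 and port 1 no longer does, which mirrors the
sharing of the single MSPP-facing port among end hosts.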
Chapter 7 Conclusions and Future Research
In the following sections, the contributions of this research are summarized and directions
for future research are introduced. This research resulted in eight publications, which are
listed at the end of this chapter.
7.1 Summary and Conclusions
In this dissertation, we proposed extending the services of optical networks to end hosts.
This is feasible today given the deployment of fiber to enterprises, MSPPs in enterprises,
and EoS technologies within these MSPPs. Our proposed service, called Reconfigurable
Ethernet/SONET Circuits for End Users (RESCUE), offers a means for setting up and
releasing on-demand circuits consisting of Ethernet LAN segments and Ethernet-over-
SONET metro- and/or wide-area segments. RESCUE is proposed as an add-on service to
the currently available Internet access. This allows end host applications to attempt Ether-
net/SONET circuit setup, and if the attempt fails due to a lack of resources, the applica-
tions can fall back to the basic Internet service.
The RESCUE service provides an effective way to overcome the three gaps identified in
Section 1.3. First, the dial-up Internet access service using RESCUE circuits enables end
hosts to bypass an enterprise’s heavily shared leased access links and therefore enjoy
much lower packet-loss rates on end-to-end communication paths. Second, end-to-end
QoS guarantees, which are hard to implement in the existing Internet, can be provided by
end-to-end RESCUE circuits. Different QoS requirements can be met by simply setting
different circuit rates. Third, by using a new transport protocol on end-to-end RESCUE
circuits, TCP limitations in HDBP environments can be overcome, and data-transfer
throughput can be improved significantly.
RESCUE circuits are shared on a call-by-call basis, which makes it easy to implement a
“pay more, get more” service. To use RESCUE, end hosts need an additional NIC and a
software upgrade. We carried out a detailed analysis of how the dial-up service and file
transfers can take advantage of RESCUE service in Chapter 4 and Chapter 5 respectively.
The analysis results showed that the end host will enjoy a much shorter file-transfer delay
on RESCUE circuits than on the TCP/IP path if the circuit setup is successful.
The RESCUE concept introduces the idea of leveraging the Internet in developing a
circuit-switched service. We realized that a pure circuit-switched network service is hard
to deploy in a standalone mode for the following reasons: (i) not all types of applications
are suited for RESCUE circuits, for example small-file transfers and Variable Bit-Rate (VBR)
applications; (ii) without the Internet as a fallback option, the circuit-switched service would
have to be operated at a low call-blocking probability, and to achieve a low call-blocking
probability, network utilization would have to be sacrificed, especially during the service
growth period when traffic load is low; and (iii) without the Internet path for reverse-direction
control message transport, the circuit-switched network would need to support both
low-rate and high-rate circuits, making the switches more expensive.
The RESCUE solution calls for a revolutionary combined usage of two types of networks, a
circuit-switched network and a packet-switched network. RESCUE proposes a “parallel-hybrid”
network architecture in contrast to today’s “sequential-hybrid” network architecture.
In this “parallel-hybrid” network, the primary connectionless packet-switched Internet
is not only used as a backup path for those applications that fail to obtain a RESCUE
circuit; it can also be used to carry control messages for the data transfers on end-to-end
RESCUE circuits. These are two key features that make our network architecture feasible
to introduce and grow while consistently achieving high utilization. To our knowledge, this
“parallel-hybrid” solution has not been proposed elsewhere.
The long-term objective of our research work is to create a large-scale circuit-switched
network providing commodity services. Scalability and network utilization are two
widely-used network design criteria. In RESCUE service, the network scalability problem
is addressed by using dynamic, distributed end-to-end circuit provisioning with signaling
protocols. In contrast to the centralized approach, signaling protocols enable distributed
provisioning, and therefore allow the network to grow to any size. The network utilization
problem is addressed by creating commodity applications, which will help increase traffic
loads, and by using the Internet as a back-up path. By allowing both small data transfers
and large data transfers, we envision the creation of high traffic load and correspondingly
higher network utilization, which translates to lower costs for users. Per-circuit utilization
is also addressed by using superfast provisioning and rate-based flow control in the
RESCUE service. Superfast provisioning is possible with the distributed signaling
approach, which does not entail human and/or central management intervention. Hardware-accelerated
signaling can be used to further speed up the signaling processing capability
of network switches. A transport protocol with rate-based flow control should be
used on end-to-end RESCUE circuits to achieve 100% bandwidth utilization.
In the implementation chapter, we discussed the key features needed in a transport pro-
tocol that works in conjunction with end-to-end file transfer applications using RESCUE
service. We called this protocol Fixed Rate Transport Protocol (FRTP), one that uses a
rate-based flow control scheme and a selective-ARQ based error control scheme. An
application-level implementation of FRTP based on UDP sockets was then presented
along with experimental results. Our work extended previous work on transport protocols
significantly. Unlike TCP and other transport protocols designed for IP-based networks,
FRTP is designed for data transfers on dedicated end-to-end circuits. The goal is to
generate a constant sending rate that matches the circuit rate, and therefore to achieve high
circuit utilization.
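The idea of a constant sending rate matched to the circuit rate can be illustrated with a
minimal Python sketch. This is not the FRTP implementation: the function names are
hypothetical, error control is omitted, and the clock and sleep functions are injected only
so that the pacing logic can be exercised without real delays. Each packet is scheduled
against an absolute deadline so that timing errors do not accumulate.

```python
import time
from typing import Callable, Iterable


def inter_packet_gap(packet_bytes: int, rate_bps: float) -> float:
    """Seconds between packet departures for a given target rate."""
    if packet_bytes <= 0 or rate_bps <= 0:
        raise ValueError("packet size and rate must be positive")
    return packet_bytes * 8 / rate_bps


def paced_send(packets: Iterable[bytes],
               rate_bps: float,
               send: Callable[[bytes], None],
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> None:
    """Hand packets to `send` at a constant rate of `rate_bps`."""
    next_departure = clock()
    for packet in packets:
        wait = next_departure - clock()
        if wait > 0:
            sleep(wait)          # idle until this packet's departure time
        send(packet)
        # Advance the deadline by this packet's transmission time.
        next_departure += inter_packet_gap(len(packet), rate_bps)
```

For example, at an assumed circuit rate of 100 Mb/s, a 1500-byte packet corresponds to a
gap of 120 microseconds. On a general-purpose operating system the sleep granularity may
be coarser than this, which is one reason the achievable rate depends on the end host.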
From the experimental results, we concluded that the rate-based flow control in the FRTP
implementation is effective when no other process is running on the end hosts. FRTP
successfully produces a constant and accurate sending rate during the data transfer. The
experimental results also showed that FRTP is able to achieve a very high throughput, limited
only by the end-host configurations. Better performance can be achieved by carefully
tuning several parameters, such as UDP buffer size, FRTP buffer size, and packet
size.
However, we also noticed that the performance of FRTP degrades when
other CPU-intensive processes run concurrently on the end hosts. This is because FRTP,
as an application-level implementation, needs a lot of CPU cycles when the sending rate is
high, and therefore leaves little CPU time for other processes. We realized that this situation
is hard to avoid given current general-purpose end hosts and non-real-time
operating systems. This problem not only adds a constraint on the usage of FRTP, but also
reminds us of a shortcoming of our circuit-switched solution: a circuit-switched
network is not adaptive to changes in data-transfer rates caused by the variability
of data processing at end hosts. In a packet-switched network, this is not a problem
because the bandwidth gap left by a reduction of the sending rate in one data flow can be
filled by other data flows. However, in a circuit-switched network, since the circuit band-
width allocation remains unchanged throughout the transfer, any reduction in FRTP
throughput would result in poor circuit utilization. We will explore solutions to this prob-
lem in our future work.
7.2 Future Research
7.2.1 Extension to Multi-protocol Interworking
Ideally, if all signaling-capable switches along the end-to-end path support the same sig-
naling specification, the circuit provisioning procedure will be quite simple and standard.
However, such a standard signaling-driven circuit-switched network does not exist end-to-
end today. First, Ethernet switches and IP routers dominate today’s local-area networks.
They are generally connectionless switches. Second, in the circuit switch equipment
industry, different vendors support different signaling specifications based on their own
considerations. Usually one vendor’s switch is not compatible with a switch from a different
vendor unless they have protocol conversion capabilities. Although vendors are making
efforts to allow for the interconnection of their networking products, a fully
signaling-interoperable network is still under development.
Differentiated by signaling capabilities, the whole network can be divided into multiple
autonomous sets, as illustrated in Figure 35. At first glance it appears that a dedicated end-
to-end circuit can only be established between two hosts connected by a single circuit-
switched network, e.g. host 1 and host 3 in Figure 35 connected by a GMPLS network. In
this case, a client-side GMPLS signaling implementation at end hosts is sufficient. How-
ever, for communications between two hosts connected by different types of networks, a
simple signaling solution will not work. For example, for communications between end
host 1 and end host 2 in Figure 35, GMPLS signaling alone is not sufficient because the
other networks on the end-to-end path do not support GMPLS.
To solve this problem, we propose an external “signaling agent”, which works like a
coordinator and a translator between different types of networks. The signaling agent is
able to talk to different networks because it will consist of multiple signaling modules.
When a circuit setup involves an end-to-end path across different networks, the signaling
agent sends signaling messages to each autonomous network to initiate the intra-area
circuit setup. It also coordinates the neighboring networks to set up the inter-area circuits
between edge nodes.
With the signaling agent, the remaining question is how to set up dedicated data paths
within each autonomous packet-switched network. In Section 6.4.2, we already noted that
a dedicated data path can be set up in Ethernet-based LANs by using the VLAN technol-
ogy. Multi-Protocol Label Switching (MPLS) has been implemented in some IP routers.
Figure 35. A Representation of Networks Differentiated by Signaling Capabilities
[Figure: a signaling agent with GMPLS, UNI, MPLS, and VLAN modules coordinates four sets:
Set 1, circuit-switched network 1 (GMPLS); Set 2, circuit-switched network 2 (UNI); Set 3,
packet-switched network 1 (MPLS); Set 4, packet-switched network 2 (VLAN). End hosts 1, 2,
and 3 each run a signaling module.]
For example, IP routers in the Abilene backbone network of Internet2 [80] implement
MPLS. MPLS is a connection-oriented packet-switched technology [81]. It supports
traffic-engineered Label-Switched Path (LSP) setup and provides service-level guarantees
for LSPs. By associating a strict bandwidth guarantee with the LSP, a dedicated
data path can effectively be set up across a network of IP routers/MPLS switches.
With the help of all the above technologies, it is possible to set up an end-to-end RES-
CUE circuit by sequentially concatenating multiple network segments along the end-to-
end path, such as SONET circuits, MPLS LSPs, and VLAN Ethernet paths.
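The coordinating role of the signaling agent can be sketched in Python as follows. This
is an illustration under stated assumptions rather than a design: the class
SignalingAgent, the Segment tuple, and the node names are hypothetical, each signaling
module is reduced to a pair of setup and release callables, and the release of already
established segments when a later segment fails is our assumption.

```python
from typing import Callable, Dict, List, Tuple

# (network type, ingress node, egress node); node names are placeholders
Segment = Tuple[str, str, str]
SetupFn = Callable[[Segment], bool]
ReleaseFn = Callable[[Segment], None]


class SignalingAgent:
    """Dispatches each path segment to the module for its network type."""

    def __init__(self) -> None:
        self._modules: Dict[str, Tuple[SetupFn, ReleaseFn]] = {}

    def register(self, network_type: str,
                 setup: SetupFn, release: ReleaseFn) -> None:
        self._modules[network_type] = (setup, release)

    def setup_end_to_end(self, segments: List[Segment]) -> bool:
        """Set up segments in order; undo earlier ones if any step fails."""
        done: List[Segment] = []
        for segment in segments:
            module = self._modules.get(segment[0])
            if module is not None and module[0](segment):
                done.append(segment)
                continue
            # Assumption: release what was set up, most recent first.
            for earlier in reversed(done):
                self._modules[earlier[0]][1](earlier)
            return False
        return True
```

A path from end host 1 to end host 2 would then be described as a list such as a VLAN
segment, a GMPLS segment, and an MPLS segment, with one registered module per network
type.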
7.2.2 Wide-area Testbed Network
After completing the local-area experiments, we plan to run the same experiments over a wide
area. Depending on costs, we plan to use either point-to-point links between UVA, CUNY, and
ORNL, or a star configuration, as illustrated in Figure 36. The reason we have two MSPPs
per campus is that external circuits typically arrive at one or two buildings on campus
(labeled “Telcom Building” in Figure 36), while researchers have laboratories in other
buildings (labeled “Research Building” in Figure 36). For example, at UVA, we found that
there is sufficient fiber installed, but connecting our laboratory to UVA’s main
telecommunications building will cost about $2,500 a year (on-campus fiber leasing service
[82]). Since GbE signals travel limited distances even with single-mode fiber (on the order
of 3 miles), we propose using an MSPP within the Research Building to carry the Ethernet
frames over long distances using SONET. In the star configuration, collocation service can be
obtained to place a SONET crossconnect at a central location. For example, Switch and
Data [83] provides a collocation facility in Reston, VA, to which we could lease
OC48 links from each of the three campuses shown in Figure 36 to test the network
configuration.
Figure 36. Configuration of Wide-area Testbed Network
[Figure: on each of the UVA, ORNL, and CUNY campuses, end hosts attach to an MSPP in a
Research Building, which is linked by OC48 across campus to an MSPP in a Telcom Building.
Point-to-point option: circuits between campus pairs (similar circuits can be set up between
UVA and ORNL or CUNY and ORNL). Star option: each campus connects to a collocation
facility.]
7.2.3 Call Scheduling in RESCUE
In RESCUE file-transfer applications, optical link resources are shared in a call-blocking
mode, in which the link capacity is typically subdivided among the application
streams that share the link. Once the bandwidth allocation is made at the start of the transfer,
it remains unchanged throughout the transfer. We refer to this mode of bandwidth usage
as “fixed-bandwidth circuit switching.” In contrast, packet-switched networks
share the capacity resources packet by packet: all files are divided into packets,
and packets are sent one after another using the full capacity of the link. If multiple
streams are sending packets, then effectively each stream receives the same share of the link
capacity as in circuit-switched networks. However, when one or more traffic streams complete
their file transfers, the remaining transfers can take advantage of the bandwidth made
available by the completed transfers. The consequence of fixed bandwidth partitioning is
that transfers experience larger average delays with fixed-bandwidth circuit switching
than with packet switching.
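A small worked example makes the delay difference concrete. The Python sketch below is an
idealization, not a model from this dissertation: all transfers are assumed to start at the
same time, the fixed-bandwidth case gives each of the n transfers a share of C/n for its
whole duration, and packet switching is approximated by equal sharing of the link among
the transfers that are still active. The function names are hypothetical.

```python
from typing import List


def fixed_partition_delays(sizes_bits: List[float],
                           capacity_bps: float) -> List[float]:
    """Each transfer keeps capacity/n from start to finish."""
    share = capacity_bps / len(sizes_bits)
    return [size / share for size in sizes_bits]


def equal_sharing_delays(sizes_bits: List[float],
                         capacity_bps: float) -> List[float]:
    """Active transfers split the full capacity equally at all times."""
    order = sorted(range(len(sizes_bits)), key=lambda i: sizes_bits[i])
    delays = [0.0] * len(sizes_bits)
    now = 0.0
    served = 0.0                 # bits already delivered to every active transfer
    active = len(sizes_bits)
    for i in order:              # transfers finish in order of size
        remaining = sizes_bits[i] - served
        now += remaining * active / capacity_bps
        served = sizes_bits[i]
        delays[i] = now
        active -= 1
    return delays


sizes = [1_000_000_000, 3_000_000_000]      # two files, in bits
capacity = 1_000_000_000                    # 1 Gb/s link
fixed = fixed_partition_delays(sizes, capacity)    # [2.0, 6.0] seconds
shared = equal_sharing_delays(sizes, capacity)     # [2.0, 4.0] seconds
```

In this example the smaller file finishes after 2 s in both cases, but the larger file
finishes after 4 s instead of 6 s once it can use the capacity released by the completed
transfer, so the average delay drops from 4 s to 3 s.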
Noticing that the fundamental source of the poor average delay performance of fixed-bandwidth
circuit switching for file transfers is its inability to take advantage of bandwidth
that becomes available subsequent to the start of a transfer, we propose a scheme in which
the capacity allocated for a file transfer varies from time range to time range. We call this
scheme Varying-Bandwidth List Scheduling (VBLS) [84]. This is unlike the fixed-
bandwidth allocation mode where a fixed assignment of bandwidth is made for the entire
duration of the transfer. With information on the size of the file that an end host wants to
transfer, the network can fit this file into time ranges when bandwidth is available based
on the capacity allocations of ongoing transfers. This allows the network to offer an
incoming file transfer an increased amount of bandwidth for future time ranges whenever
there are fewer competing transfers. We will explore more details of VBLS in future work.
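The notion of fitting a file into time ranges with available bandwidth can be illustrated
by the following Python sketch. It is a simplified greedy allocation written for
illustration only and should not be read as the VBLS heuristic of [84]: the function name,
the representation of time ranges as (start, end, available rate) tuples, and the
max_rate_bps cap (for example, an end-host limit) are all assumptions.

```python
from typing import List, Tuple

TimeRange = Tuple[float, float, float]   # (start s, end s, available b/s)


def schedule_transfer(file_bits: float,
                      ranges: List[TimeRange],
                      max_rate_bps: float) -> List[TimeRange]:
    """Allocate a file to the earliest time ranges with spare bandwidth.

    Returns (start, end, allocated rate) tuples, one per range used.
    """
    remaining = file_bits
    plan: List[TimeRange] = []
    for start, end, available in ranges:
        if remaining <= 0:
            break
        rate = min(available, max_rate_bps)
        if rate <= 0:
            continue                      # range is fully used by other transfers
        duration = min(end - start, remaining / rate)
        plan.append((start, start + duration, rate))
        remaining -= rate * duration
    if remaining > 1e-9:
        raise ValueError("not enough bandwidth in the given time ranges")
    return plan
```

For a 500-bit file, ranges of (0, 10, 20), (10, 20, 0), and (20, 40, 30), and a cap of
25 b/s, the sketch allocates 20 b/s over the first range, skips the fully occupied second
range, and finishes the file at 25 b/s within the third range.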
7.2.4 Router Disconnect
We realized that there is a practical problem in deploying RESCUE. While the RESCUE
solution, which gives end hosts the choice of a second path with a distinctly different service
quality from the packet-by-packet shared IP path, has obvious advantages,
its costs lie in the additional infrastructure needed to support such a deployment.
Access link costs are especially of concern. It is expensive for an enterprise to lease a sec-
ond access link for RESCUE service, one that terminates on a signaling-capable SONET/
SDH/WDM switch. We are currently exploring a solution to this problem in which such
an additional access link is not necessary. The solution is to use the enterprise-router-to-
ISP-router access link to carry Internet traffic in default mode and then dynamically reas-
sign its capacity (or a part of its capacity) as needed for RESCUE circuits.
We show an experimental setup of two Cisco 12008 routers [85] along with Cisco 15454
MSPPs in Figure 37. The MSPP OC3 interface cards have four ports, as do the 12008
router OC3 cards. We can connect these ports as shown in Figure 37. In default mode, two
leased circuits are set up between the two routers, and the forwarding tables at these
routers are set up to use both circuits (by setting some of the “east” host addresses to
be reached via one port at the west-side router, with others using the second port, and vice
versa at the east-side router). When a RESCUE circuit setup request is generated, we will
have the RESCUE software first send a message to these routers to disable one of these
OC3 circuits and then set up the end-to-end Ethernet-EoS-Ethernet circuit illustrated by
the dashed line in Figure 37. Normally, disabling an interface on a router triggers the OSPF
routing protocol to update the router’s routing table and leads to packet losses during the
update period. Cisco 12008 routers, however, support link bundling, a technique that groups
multiple links into one logical link to provide higher bandwidth, redundancy,
and load sharing between links [86]. For example, in Figure 37, the two OC3 enterprise leased
circuits can be bundled into one logical link. When one of these OC3 circuits is disabled
for RESCUE service, the router automatically routes the Internet traffic to the remaining
circuit. No packet would be lost. The removed OC3 circuit can be restored for default
Internet traffic after the RESCUE circuit is released. Questions of how quickly these
updates happen and whether such a dynamic bandwidth change of router-to-router circuits
causes problems in TCP flows will be answered by the experiment.
Figure 37. The Concept of Router Disconnect
[Figure: on each side (west and east), hosts connect over 10/100 Ethernet and Gb/s Ethernet
to an MSPP (MSPP I and MSPP II, each with a control card, XC, and OC3 card) and to a 12008
router (GRP, switch fabric, OC3 card); the two sides are joined by a SONET network emulated
by fibers.]
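The borrow-and-restore behavior of the bundled access link can be modeled with a short
Python sketch. It does not configure any router and does not use any Cisco command; the
class LinkBundle, the member names, and the rule that the last active member cannot be
borrowed (so that the Internet path is never cut) are assumptions made for illustration.

```python
from typing import Dict, Set


class LinkBundle:
    """Logical link whose capacity is the sum of its active members."""

    def __init__(self, member_rates_kbps: Dict[str, int]) -> None:
        self.members = dict(member_rates_kbps)
        self.active: Set[str] = set(self.members)

    def capacity_kbps(self) -> int:
        return sum(self.members[name] for name in self.active)

    def borrow(self, name: str) -> None:
        """Take one member out of the bundle for a RESCUE circuit."""
        if name not in self.active:
            raise ValueError("member is not active in the bundle")
        # Assumption: at least one member must keep carrying Internet traffic.
        if len(self.active) == 1:
            raise ValueError("cannot borrow the last active member")
        self.active.remove(name)

    def restore(self, name: str) -> None:
        """Return the member to the bundle after the circuit is released."""
        if name not in self.members:
            raise ValueError("unknown member")
        self.active.add(name)


OC3_KBPS = 155_520                       # OC3 line rate, 155.52 Mb/s
bundle = LinkBundle({"oc3-a": OC3_KBPS, "oc3-b": OC3_KBPS})
bundle.borrow("oc3-b")                   # Internet traffic continues on oc3-a
bundle.restore("oc3-b")                  # full bundle capacity is available again
```

The sketch captures only the capacity bookkeeping; how quickly a real router reacts to
such changes is exactly the question left to the experiment.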
7.3 Publications
As a result of our research, the following papers were presented at or submitted to
international conferences, journals, and magazines:
• M. Veeraraghavan and X. Zheng, “A Reconfigurable Ethernet/SONET Circuit Based
Metro Network Architecture,” IEEE JSAC on Advances in Metropolitan Optical
Networks (Architectures and Control), 2004.
• M. Veeraraghavan, X. Zheng, W. Feng, Hojun Lee, E. Chong, and H. Li, “Scheduling
and transport for file transfers on high-speed optical circuits,” Journal of Grid
Computing on High Performance Networking, 2004.
• X. Zheng, M. Veeraraghavan, and H. Lee, “Using Dial-Up Optical Circuits to Address
the Access Link Bottleneck Problem,” Under revision based on reviews from Infocom
2004.
• Best student paper award, M. Veeraraghavan, X. Zheng, H. Lee, M. Gardner, and W.
Feng, “CHEETAH: Circuit-switched High-speed End-to-End Transport
ArcHitecture,” Proceeding of Opticomm 2003, Dallas, TX, Oct. 13-16, 2003.
• M. Veeraraghavan, D. Logothetis, and X. Zheng, “Using dynamic optical networking
for high-speed access,” Optical Networks Magazine, special issue on “Dynamic
Optical Networking around the Corner or Light Years Away?”, vol. 4, no. 5, pp. 30-40,
Sep. 2003.
• M. Veeraraghavan, X. Zheng, W. Feng, H. Lee, E. Chong, and H. Li, “Scheduling and
Transport for File Transfers on High-speed Optical Circuits,” PFLDnet 2004, Chicago,
Feb. 16-17, 2004.
• M. Veeraraghavan, H. Lee, and X. Zheng, “File transfers across optical circuit-switched
networks,” PFLDnet 2003, Geneva, Switzerland, Feb. 3-4, 2003.
• T. Moors, M. Veeraraghavan, Z. Tao, X. Zheng, and R. Badri, “Experiences in
automating the testing of SS7 Signaling Transfer Points,” International Symposium on
Software Testing and Analysis (ISSTA), Via di Ripetta, Rome - Italy, July 22-24, 2002.
Bibliography
[1] M. Sakaguchi and K. Kaede, “Optical switching device technologies,” IEEE Commu-
nications Magazine, vol. 25, pp. 27-32, May 1987.
[2] M. Veeraraghavan, M. Karol, R. Karri, R. Grobler, and T. Moors, “Architectures and
protocols that enable new applications on optical networks,” IEEE Communications
Magazine, vol. 39, pp. 118-127, March 2001.
[3] S. Yao, B. Mukherjee, and S. Dixit, “Advances in photonic packet switching: an over-
view,” IEEE Communications Magazine, vol. 38, pp. 84-94, February 2000.
[4] Bellcore Publication GR-253-Core “Synchronous Optical Network (SONET) Trans-
port Systems: Common Generic Criteria,” January 1999.
[5] ITU-T, “Recommendation G.784: Synchronous Digital Hierarchy (SDH) manage-
ment,” June 1999.
[6] M. Kuznetsov, M. M. Froberg, S. R. Henion, H. G. Rao, J. Korn, K. A. Rauschenbach,
E. H. Modiano, and V. W. S. Chan, “A Next-Generation Optical Regional Access Net-
work,” IEEE Communications Magazine, vol. 38, pp. 66-72, January 2000.
[7] I. Habib, D. Awduche, and A. Fumagalli, “Advances in Metropolitan Optical Net-
works (Architectures and Control),” IEEE JSAC Call for Papers, http://www.argreen-
house.com/society/J-SAC/Calls/met_optical.html.
[8] OGSI, “Open Grid Services Infrastructure v1.0 (Draft 29),” http://www.gridforum.org/
ogsi-wg/, April 5, 2003.
[9] OGSI, “Grid Service Specification (Draft 8),” http://www.gridforum.org/ogsi-wg/,
February 2, 2003.
[10] M. Sampson, “World's First Working Prototypes of User Control of Lightpaths Dem-
onstrated,” http://www.canarie.ca/canet4/obgp/index.html, May 27, 2003.
[11] E. Mannie, “GMPLS Architecture,” IETF Internet Draft, http://www.ietf.org/internet-
drafts/draft-ietf-ccamp-gmpls-architecture-07.txt, May 2003.
[12] P. Ashwood-Smith, et al. “Generalized MPLS - Signaling Functional Description,”
IETF Internet Draft, http://www.ietf.org/proceedings/01dec/I-D/draft-ietf-mpls-gen-
eralized-signaling-07.txt, November 2001.
[13] P. Ashwood-Smith, et al. “Generalized MPLS - RSVP-TE Extensions,” IETF RFC
3473, January 2003.
[14] OIF Architecture, OAM&P, PLL, & Signaling Working Groups, “User Network Inter-
face (UNI) 1.0 Signaling Specification,” http://www.oiforum.com/public/documents/
OIF-UNI-01.0.pdf, October 2001.
[15] ITU-T, “Recommendation G.8080/Y.1304: Architecture for Automatic Switched Op-
tical Networks (ASON),” http://www.itu.int/itudoc/itu-t/aap/sg15aap/history/g8080/.
[16] A. Parikh, “Ethernet enlightens optical access,” Network World, Oct. 9, 2000, http://
www.nwfusion.com/news/tech/2000/1009tech.html.
[17] T. Brooks, “Optical networks: At your service,” Network World, Apr. 10, 2000, http:/
/www.nwfusion.com/columnists/2000/0410brooks.html.
[18] X. Zheng, “Internet traffic measurement experiments,” http://www.ece.virginia.edu/
~xz3y/research/measurements/measureindex.html.
[19] J. Michael and I. Graham, “The Auckland data set: an access link observed,” Proceed-
ing of the 14th ITC Specialists Seminar on Access Networks and Systems, April 2001.
[20] First International Workshop on Protocols for Fast Long-Distance Networks, PFLDnet
2003, http://datatag.web.cern.ch/datatag/pfldnet2003/, Geneva, Switzerland, February
3-4, 2003.
[21] W. Feng and P. Tinnakornsrisuphap, “The Failure of TCP in High-Performance Computational
Grids,” Proceeding of SC2000: High-Performance Network and Computing
Conference, Dallas, TX, November 2000.
[22] S. Floyd, “HighSpeed TCP for Large Congestion Windows,” IETF RFC 3649, De-
cember 2003.
[23] C. Jin, D. X. Wei, and S. H. Low, “FAST TCP: motivation, architecture, algorithms,
performance,” Proceeding of IEEE Infocom 2004, March 2004.
[24] T. Kelly, “Scalable TCP: Improving Performance in HighSpeed Wide Area Net-
works,” PFLDnet 2003, http://datatag.web.cern.ch/datatag/pfldnet2003/, February 3-
4, 2003, Geneva, Switzerland.
[25] J. Semke, J. Mahdavi, and M. Mathis, “Automatic TCP Buffer Tuning,” Proceeding
of ACM SIGCOMM 1998, pp. 315-323, October 1998.
[26] W. Feng, M. Gardner, M. Fisk, and E. Weigle, “Automatic Flow-Control Adaptation
for Enhancing Network Performance in Computational Grids,” Journal of Grid Com-
puting, vol. 1, pp. 63-74, 2003.
[27] M. Gardner, W. Feng, and M. Fisk, “Dynamic Right-Sizing in FTP (drsFTP): An Au-
tomatic Technique for Enhancing Grid Performance,” Proceeding of the 11th IEEE
Symposium on High-Performance Distributed Computing, Edinburgh, Scotland, July
2002.
[28] D. Katabi, M. Handley, and C. Rohrs. “Internet congestion control for high bandwidth-
delay product network,” Proceeding of ACM SIGCOMM, Pittsburgh, August 2003.
[29] M. Mathis, “Raising the Internet MTU,” http://www.psc.edu/~mathis/MTU/.
[30] N. S. V. Rao and W. C. Feng, “Performance trade-offs of TCP adaptation methods,”
Proceeding of Intl. Conf. Networking, 2002.
[31] T. Dunigan, M. Mathis, and B. Tierney, “A TCP Tuning Daemon,” Proceeding of the
2002 ACM/IEEE conference on Supercomputing, Baltimore, Maryland, July 2002.
[32] D. Comer, “Internetworking with TCP/IP, Volume I: Principles, Protocols, and Archi-
tecture,” Prentice Hall, 1991.
[33] DOE Office Of Science High Performance Network Planning Workshop, http://do-
ecollaboratory.pnl.gov/meetings/hpnpw/workshopdescription.pdf, August 13-15,
2002.
[34] “End-To-End Provisioned Optical Network Testbed for Large-Scale eScience Appli-
cations,” http://www.ece.virginia.edu/~mv/html-files/ein-home.html.
[35] “About NetworkVirginia,” http://www.networkvirginia.net, information posted in
March 2003.
[36] UCAID costs, http://ncne.nlanr.net/training/techs/2000/000515/Talks/love1-
jt05152000/tsld014.htm.
[37] IEEE 802.17, Resilient Packet Ring Working Group, http://grouper.ieee.org/groups/
802/17/documents.htm.
[38] D. Tsiang and G. Suwala, “The Cisco SRP MAC Layer Protocol,” IETF RFC 2892,
August 2000.
[39] J. Padhye, V. Firoiu, D. Towsley, and J. Kurose, “Modeling TCP Throughput: A Sim-
ple Model and its Empirical Validation,” IEEE/ACM Transaction on Networking, vol.
9, pp. 31-46, February 2001.
[40] N. Cardwell, S. Savage, and T. Anderson, “Modeling TCP Latency,” Proceeding of
IEEE Infocom, vol. 3, pp. 1742-1751, Tel-Aviv, Israel, March 2000.
[41] M. Allman, V. Paxson, and W. Stevens, “TCP Congestion Control”, IETF RFC 2581,
April 1999.
[42] R. Braden, D. Clark, and S. Shenker, “Integrated Services in the Internet Architecture:
an Overview,” IETF RFC 1633, June 1994.
[43] K. Nichols, S. Blake, F. Baker, and D. Black, “An Architecture for Differentiated Ser-
vices,” IETF RFC 2474, December 1998.
[44] P. Molinero-Fernandez and N. McKeown, “TCP switching: exposing circuits to IP,”
IEEE Micro, vol. 22, pp. 82-89, January-February 2002.
[45] “CANARIE's CA*net 4”, http://www.canarie.ca/canet4/.
[46] “Starlight”, http://www.startap.net/starlight/.
[47] “SURFnet”, http://www.surfnet.nl/en/.
[48] “UKlight”, http://www.ja.net/development/UKLight/.
[49] “DOE UltraScience Net”, http://www.csm.ornl.gov/ultranet/.
[50] Canarie network, “User Controlled Lightpaths (UCLP),” http://www.canarie.ca/
canet4/uclp/.
[51] ITU-T, “Recommendation G.7041: Generic Framing Procedure (GFP),” October
2001.
[52] ITU-T, “Recommendation G.707: Network Node Interface for the Synchronous Digital
Hierarchy,” October 2000.
[53] Special Issue of IEEE Communications Magazine on “Generic Framing Procedure
(GFP) and Data over SONET/SDH and OTN,” May 2002.
[54] Fujitsu, “FLM 150 ADM: Flexible OC-3 and OC-12 Add/Drop Multiplexer,” http://
us.fujitsu.com/services/Telecom/ByCateg/MetroEdgeNAccess/.
[55] Cisco, “Cisco ONS 15454 Optical Transport Platform,” http://www.cisco.com/en/US/
products/hw/optical/ps2006/ps2010/index.html.
[56] Ciena, “CIENA MultiWave MetroDirector K2™ Next-Generation Multi-Service Ac-
cess and Switching Platform,” http://www.ciena.com/products/k2/k2.htm.
[57] G. Beranano, et al., “Achieving UNI and NNI Interoperability,” OIF Forum, http://
www.oiforum.com/public/documents/OFC03_WP.pdf.
[58] M. Veeraraghavan, H. Lee, and R. Grobler, “A low-load comparison of TCP/IP and
end-to-end circuits for file transfers,” INET 2002, Arlington, VA, June 18-21 2002.
[59] V. Paxson and S. Floyd, “Wide-Area Traffic: The Failure of Poisson Modeling,” IEEE/
ACM Transaction on Networking, vol. 3, pp. 226-244, June 1995.
[60] H. Wang, M. Veeraraghavan, and R. Karri, “A hardware implementation of a signaling
protocol,” Proceeding of Opticomm 2002, Boston, MA, July 29-August 2, 2002.
[61] DOE Office Of Science High Performance Network Planning Workshop, http://do-
ecollaboratory.pnl.gov/meetings/hpnpw/workshopdescription.pdf, August 13-15,
2002.
[62] M. D. Brown, “Blueprint for the future of high-performance networking introduction,”
Communications of ACM, Vol. 46, No. 11, pp. 30-33, November 2003.
[63] C. de Laat, G. Gross, L. Gommans, J. Vollbrecht, and D. Spence, “Generic AAA Ar-
chitecture,” IETF RFC 2903, August 2000.
[64] W. Whitt, “Blocking when service is required from several facilities simultaneously,”
AT&T Technical Journal, vol. 64, pp. 1807-1856, October 1985.
[65] M. E. Crovella and A. Bestavros, “Self-similarity in World Wide Web Traffic Evi-
dence and Possible Causes,” IEEE/ACM Transaction on Networking, vol. 5, pp. 835-
846, December 1997.
[66] Y. Gu, X. Hong, M. Mazzucco, and R. L. Grossman, “SABUL: A High Performance
Data Transfer Protocol,” submitted to IEEE COMMUNICATIONS LETTERS.
[67] Y. Gu and R. L. Grossman, “End-to-End Congestion Control for High Performance
Data Transfer,” submitted to IEEE/ACM Transaction on Networking.
[68] “Tsunami,” http://www.indiana.edu/~anml/anmlresearch.html.
[69] E. He, J. Leigh, O. Yu, and T. A. DeFanti, “Reliable Blast UDP: Predictable High Per-
formance Bulk Data Transfer,” Proceeding of the IEEE Cluster Computing 2002, pp.
317-324, Chicago, Illinois, September 23-26, 2002.
[70] ANSI, “Information Technology - Scheduled Transfer Protocol (ST),” T11.1/Proj.
1245-M/Rev 4.0, October 2000.
[71] IETF, “Remote Direct Data Placement (RDDP),” http://www.ietf.org/html.charters/
rddp-charter.html.
[72] “Bonnie,” http://www.textuality.com/bonnie/.
[73] “TCPDUMP Public Repository,” http://www.tcpdump.org.
[74] B. Mah, “Pchar: A Tool for Measuring Internet Path Characteristics,” http://www.em-
ployees.org/~bmah/Software/pchar/.
[75] C. Dovrolis and R. Prasad, “Pathrate: A measurement tool for the capacity of network
paths,” http://www.cc.gatech.edu/fac/Constantinos.Dovrolis/pathrate.html.
[76] T. Bu, N.G. Duffield, F. Lo Presti, and D. Towsley, “Network Tomography on General
Topologies,” Proceedings of ACM SIGMETRICS 2002, pp. 21-30, Marina Del Rey,
California, 2002.
[77] Cisco ONS 15454 TL1 Command Guide, http://www.cisco.com/en/US/products/hw/
optical/ps2006/products_command_reference_book09186a00801a42d3.html.
[78] IEEE, “Standard 802.1Q: Virtual Bridged Local Area Networks,” May 2003.
[79] Extreme Networks, “Extreme Summit Ethernet switch,” http://www.extremenet-
works.com/products/summit/.
[80] “Abilene backbone network”, http://abilene.internet2.edu/.
[81] D. Awduche, L. Berger, T. Li, V. Srinivasan, and G. Swallow, “RSVP-TE: Extensions
to RSVP for LSP Tunnels,” IETF RFC 3209, December 2001.
[82] ITC UVA, “Fiber Leasing Service,” http://www.itc.virginia.edu/netops/fiber-leas-
ing.html.
[83] Switch and Data Colocation Service, http://www.switchanddata.com/Locations/
ListOfLocations/3101.
[84] M. Veeraraghavan, H. Lee, E. K. P. Chong, and H. Li, “A varying-bandwidth list
scheduling heuristic for file transfers,” Proceeding of IEEE ICC2004, Paris, France,
June 20-24, 2004.
[85] Cisco, “Cisco 12008 Gigabit Switch Router,” http://www.cisco.com/en/US/products/
hw/routers/ps167/ps191/index.html.
[86] Cisco, “Link Bundling on Cisco 12000 Series Internet Routers,” http://
www.cisco.com/en/US/products/sw/iosswrel/ps1829/
products_feature_guide09186a0080103708.html.