OAK RIDGE NATIONAL LABORATORY, U. S. DEPARTMENT OF ENERGY
Infrastructure and Protocols for Dedicated Bandwidth Channels
Nagi Rao
Computer Science and Mathematics Division
Oak Ridge National Laboratory
[email protected]@ornl.gov
March 14, 2005
1st Annual Workshop of
Cyber Security and Information Infrastructure Research Group (CSIIR) and
Information Operations Center (IOC), Oak Ridge, TN
Research sponsored by
Department of Energy
National Science Foundation
Defense Advanced Research Projects Agency

Collaborators
• Steven Carter, Oak Ridge National Laboratory
• Leon O. Chua, University of California at Berkeley
• Jianbo Gao, University of Florida
• Qishi Wu, Oak Ridge National Laboratory
• William Wing, Oak Ridge National Laboratory
Sponsors
• Department of Energy: High-Performance Networking Program
• National Science Foundation: Advanced Network Infrastructure Program
• Defense Advanced Research Projects Agency: Network Modeling and Simulation Program
• Oak Ridge National Laboratory: Laboratory Directed R&D Program

Outline of Presentation
• Network Infrastructure Projects
  - DOE UltraScience Net
  - NSF CHEETAH
• Dynamics and Control of Transport Protocols
  - TCP AIMD Dynamics: Analytical Results, Experimental Results
  - New Class of Protocols: Throughput Stabilization for Control, Transport Protocol
• Probabilistic Quickest Path Problem
  - Quickest path algorithm
  - Probabilistic algorithm

Motivation for Networking Projects: Terascale Supernova Initiative (TSI), a DOE large-scale science application
• Science objective: understand supernova evolution
• DOE SciDAC project: ORNL and 8 universities
• Teams of field experts across the country collaborate on computations
  - Experts in hydrodynamics, fusion energy, high energy physics
• Massive computational code
  - A terabyte is currently generated in a day
  - Archived at nearby HPSS
  - Visualized locally on clusters, using archival data only
• Desired network capabilities
  - Archive and supply massive amounts of data to supercomputers and visualization engines
  - Monitor, visualize, collaborate and steer computations
[Figure labels: visualization channel, visualization control channel, steering channel]

DOE UltraScience Net
The Need
• DOE large-scale science applications on supercomputers and experimental facilities require high-performance networking
  - Petabyte data sets, collaborative visualization and computational steering
• Application areas span the disciplinary spectrum: high energy physics, climate, astrophysics, fusion energy, genomics, and others
Promising Solution
• High-bandwidth, agile network capable of providing on-demand dedicated channels: multiple 10s of Gbps down to 150 Mbps
• Protocols are simpler for high-throughput and control channels
Challenges: several technologies need to be (fully) developed
• User-/application-driven agile control plane:
  - Dynamic scheduling and provisioning
  - Security: encryption, authentication, authorization
• Protocols, middleware, and applications optimized for dedicated channels
Contacts: Bill Wing ([email protected]), Nagi Rao ([email protected])

DOE UltraScience Net
• Connects ORNL, Chicago, Seattle and Sunnyvale:
  - Dynamically provisioned dedicated dual 10 Gbps SONET links
  - Proximity to several DOE locations: SNS, NLCF, FNAL, ANL, NERSC
  - Peering with ESnet, NSF CHEETAH and other networks
• Data plane user connections:
  - Direct connections to core switches (SONET channels) and MSPP (Ethernet channels)
  - Utilize UltraScience Net hosts
Funded by U. S. DOE High-Performance Networking Program at Oak Ridge National Laboratory: $4.5M for 3 years

Control Plane
• Phase I
  - Centralized VPN connectivity
  - TL1-based communication with core switches and MSPPs
  - User access via centralized web-based scheduler
• Phase II
  - GMPLS direct enhancements and wrappers for TL1
  - User access via GMPLS and web to bandwidth scheduler
  - Inter-domain GMPLS-based interface
Web-based User Interface and API
• Allows users to log on to the website
• Request dedicated circuits
• Based on CGI scripts
Bandwidth Scheduler
• Computes path with target bandwidth
  - Is bandwidth available now? Extension of Dijkstra's algorithm
  - Provide all available slots: extension of closed semiring structure to sequences of reals
  - Both are polynomial-time algorithms
• GMPLS does not have this capability
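Illustrative sketch (not from the original slides): one simple way to answer "is bandwidth available now?" is to drop every link whose free bandwidth is below the request and run Dijkstra's algorithm on what remains. The function name `path_with_bandwidth`, the hop-count metric, and the free-bandwidth figures in `links` are assumptions made for this example only; the actual scheduler is described above only as an extension of Dijkstra's algorithm.

```python
import heapq

def path_with_bandwidth(links, source, dest, target_bw):
    """Fewest-hop path using only links whose free bandwidth >= target_bw."""
    # Build adjacency restricted to links that can carry the request.
    adj = {}
    for (u, v), free_bw in links.items():
        if free_bw >= target_bw:
            adj.setdefault(u, []).append(v)
            adj.setdefault(v, []).append(u)
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dest:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v in adj.get(u, []):
            nd = d + 1
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dest not in dist:
        return None  # no path can carry target_bw right now
    path = [dest]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]

# Hypothetical free bandwidth (Gbps) per link; values are invented.
links = {
    ("ORNL", "Chicago"): 10.0,
    ("Chicago", "Seattle"): 2.0,
    ("Seattle", "Sunnyvale"): 10.0,
    ("Chicago", "Sunnyvale"): 5.0,
}
print(path_with_bandwidth(links, "ORNL", "Sunnyvale", 4.0))
```

A request that no remaining link set can satisfy returns `None`, which a scheduler could treat as "not available now" before searching future time slots.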

NSF CHEETAH: Circuit-switched High-speed End-to-End Transport ArcHitecture
• Objective: develop the infrastructure and networking technologies to support a broad class of eScience projects, and specifically the Terascale Supernova Initiative
• Main technical components:
  - Optical network testbed
  - Transport protocols
  - Middleware and applications
• Collaborative project: $3.5M for 3 years; U. Virginia, ORNL, NC State, CUNY
• Sponsor: National Science Foundation
Contacts: Malathi Veeraraghavan ([email protected]), Nagi Rao ([email protected])

CHEETAH Project Concept
• Network:
  - Create a network that offers on-demand, end-to-end dedicated bandwidth channels to applications
  - Operate a PARALLEL network to existing high-speed IP networks: NOT AN ALTERNATIVE!
• Transport protocols:
  - Design to take advantage of dedicated and dual end-to-end paths: IP path and dedicated channel
• eScience application requirements:
  - High-throughput file/data transfers
  - Interactive remote visualization
  - Remote computational steering
  - Multipoint collaborative computation

CHEETAH: Initial Configuration
[Figure: MSPPs with control, OC192 and GbE/10GbE cards at ORNL, NCSU/MCNC/NLR and Atlanta (NLR/SOX); GbE/10GbE Ethernet switches connect to hosts; a connection to DC (Dragon). Annotation: implements GMPLS protocols.]

Peering: UltraScience Net - CHEETAH
• Peering: coast-to-coast dedicated channels
• Access to ORNL supercomputers and storage
• Applications: TSI on larger scale
[Figure: map of DOE Science UltraNet + NSF CHEETAH with 10 Gbps links. Sites shown: CERN, Chicago, Sunnyvale, Atlanta, Seattle, ANL, FNAL, ORNL, CalTech, SLAC, LBL, NERSC, PNNL, BNL, JLab, UVa, NCSU, CUNY. Legend: University, DOE National Lab, Future Connections, UltraNet, CHEETAH.]

Transport Dynamics are Important
• Data transport: high bandwidth for large data transfers over dedicated channels
  - Maintain a suitable sending rate to achieve effective throughput
• Control of end devices: remote control of visualizations, computations and instruments
  - Jittery dynamics will destabilize the control loops
  - Interactive simulations cannot be executed effectively

Study of Transport Dynamics
• Understanding of transport dynamics:
  - Analytically showed that TCP-AIMD contains chaotic regimes: concept of w-update map
  - Internet traces are shown to be both chaotic and stochastic: underlying process is anomalous diffusion
• Development and tuning of protocols:
  - Protocols for stable flows of fixed rate: ONTCOU, based on the classical Robbins-Monro method
  - Transport protocols with statistical stability: RUNAT, a combination of AIAD and the Kiefer-Wolfowitz method

Complicated TCP AIMD Dynamics - History
• Simulation results: TCP-AIMD exhibits "complicated" trajectories
  - TCP streams competing with each other (Veres and Boda 2000)
  - TCP competing with UDP (Rao and Chua 2002)
• Analytical results (Rao and Chua 2002): TCP-AIMD has chaotic regimes
  - Developed state space analysis and Poincare maps
• Internet measurements (2004): TCP-AIMD traces are a complicated mixture of stochastic and chaotic components
Working definition of chaotic trajectories:
• Nearby starting points result in trajectories that move far apart at a rate determined by a positive Lyapunov exponent
• Trajectories are non-periodic for some starting points
• The attractor is geometrically complicated

Simplified View: Dynamics of TCP
Transmission Control Protocol outline
• Uses a window mechanism to send W bytes per RTT
• Dynamically adjusts W to network and receiver state
  - Keeps increasing if there are no losses
  - Keeps shrinking if losses are detected
• Slow start phase: W increases exponentially until a threshold $W_t$ is reached or a loss occurs
• Congestion control: additively increase W with delivered packets; multiplicatively decrease with loss
• Window increment: $a$ in slow start, $1/W$ in congestion control
[Figure: window size $W$ vs. time with threshold $W_t$; an early loss slows throughput]
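Illustrative sketch (not from the original slides): the window rules above can be traced with a few lines of code. This is an idealized per-RTT model with assumed names (`aimd_trace`, `loss_at`, `ssthresh`); it doubles the window in slow start, adds one segment per RTT in congestion control (the per-RTT equivalent of 1/W per acknowledgement), and halves the window on a loss. Real TCP implementations differ in many details.

```python
def aimd_trace(steps, loss_at, w0=1.0, ssthresh=16.0):
    """Window size per round trip under an idealized slow-start + AIMD model."""
    w, trace = w0, []
    for t in range(steps):
        trace.append(w)
        if t in loss_at:
            w = max(1.0, w / 2.0)        # multiplicative decrease on loss
            ssthresh = w                 # continue in congestion control
        elif w < ssthresh:
            w = min(2.0 * w, ssthresh)   # slow start: exponential growth
        else:
            w += 1.0                     # additive increase: +1 per RTT
    return trace

# A loss in round 6 cuts the window from 18 to 9.
print(aimd_trace(8, {6}))
```

Placing the loss earlier, for example `aimd_trace(8, {2})`, shows the "early loss slows throughput" effect: the window leaves slow start at a much smaller value.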

Chaotic Dynamics of TCP
• Competing TCP streams: window dynamics are chaotic
  - Hard to predict: resemblance to random noise
  - Hard to conclude from experiments: nearby orbits move far apart later
  - Hard to characterize: chaotic attractor
[Figure: Poincare maps of two window sizes; two-streams case and four-streams case]
• Veres and Boda (2000) did not rigorously establish chaos in a formal sense
  - The attractor could have been generated by a periodic orbit with a large period
  - We repeated the simulation and found only quasi-periodic trajectories

Noisy Nature of TCP (simulation)
• Simple random traffic generates complicated attractors
  - TCP reacts to network traffic randomness
  - Jittery end-to-end delays
• Chaos is not needed to generate complicated attractors
[Figure: Poincare map of message delay vs. window size. Setup: TCP source, router with uniform random drops, destination]

TCP Competing with UDP (ns-2 simulation)
• TCP competing with UDP/CBR at the router generates a variety of dynamics as the CBR rate is varied
[Figure: TCP/Reno and UDP/CBR sources share a router leading to a sink; link labels: 2Mb, 10ms, DT; 2Mb, 10ms, DT; 1.7Mb, 10ms, DT. Poincare phase plot of window size W(t) vs. packet end-to-end delay D(t), and W(t) vs. time, for UDP/CBR = 1 Mbps]

TCP Competing with UDP
[Figure panels, labeled by UDP/CBR rate: 0 Mbps; 0.5 Mbps; 1.7 Mbps; 1.0 Mbps; 1.7 Mbps; 1.75 Mbps]

Summary of Our Analytical Results
• State space of TCP:
  - $w(t)$: congestion window
  - $e(t)$: packet delay including retransmits
  - $a(t)$: acknowledgements since last MD
  - $r(t)$: losses inferred since last AI
• TCP-AIMD dynamics have two qualitatively different regimes
  - Regime one, highlighted in the usual TCP literature: $w(t)$ increases with $a(t)$ while $r(t) = 0$
  - Regime two, highlighted by $r(t) > 0$: $w(t)$ decreases with $r(t)$; its effect and duration are enhanced by network delay and high buffer occupancy
• Trajectories move back and forth between these two regimes
• We define a Poincare map that updates $w(t)$: the w-update map $M$
  - $M$ is 1-dimensional if Regime Two is short-lived
  - $M$ is 2-dimensional and complicated if Regime Two is significant
• $M$ is qualitatively similar to the tent map: it generates chaotic trajectories
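
Dynamics of Transitions Between Regimes
• Regime 1: $\frac{dw}{dt} = \frac{1}{w}\,\frac{da}{dt}$, with a companion equation for $\frac{de}{dt}$ in terms of $\frac{da}{dt}$
• Regime 2: $\frac{dw}{dt} = -\frac{w}{2}\,\frac{dr}{dt}$, with a companion equation for $\frac{de}{dt}$ in terms of $\frac{dr}{dt}$
• Both regimes are unstable (eigenvalue analysis)
[Figure: trajectories of $w(t)$ and $e(t)$ vs. $t$ in Regime 1 and Regime 2; plots of the map $M(w)$ vs. $w$ for long TCP transfers]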

M: w-update map
• Given a value of $w$, $M$ gives its next updated value after some time period (not fixed)
  - Regime 1: $M(w) = w + 1/w$
  - Regime 2: $M(w) = w / 2^{i_n}$
• $i_n$ depends on the number of dropped packets:
  - buffer occupancy at that time
  - delay between source and bottleneck buffer
• Result: $M$ is parametrized, and each piece resembles a twisted version of the classical tent map
[Figure: $M(w)$ vs. $w$]
Rao, Gao and Chua, chapter in Complex Dynamics in Communications Networks, 2004
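Illustrative sketch (not from the original slides): the classical tent map shows the sensitivity to starting points that the working definition of chaos refers to. The slope `mu = 1.9` and the starting values are arbitrary choices for this example, and this is the textbook tent map, not the w-update map $M$ itself. Since the slope has magnitude `mu` on both branches, the Lyapunov exponent of this map is $\ln \mu > 0$.

```python
def tent(x, mu=1.9):
    """Classical tent map on [0, 1] with slope magnitude mu."""
    return mu * min(x, 1.0 - x)

def max_separation(x0, delta, steps, mu=1.9):
    """Largest gap observed between two orbits started delta apart."""
    x, y, worst = x0, x0 + delta, 0.0
    for _ in range(steps):
        x, y = tent(x, mu), tent(y, mu)
        worst = max(worst, abs(x - y))
    return worst

# Two orbits that start 1e-10 apart separate by many orders of magnitude.
print(max_separation(0.3, 1e-10, 200))
```

The practical consequence matches the "hard to conclude from experiments" point: tiny measurement differences grow until the two trajectories are unrelated.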

Internet Measurements: joint work with Jianbo Gao
Question 1: How relevant are previous simulation and analytical results on chaotic trajectories?
Answer: Relevant from an analysis perspective, to a certain extent.
Question 2: Do actual Internet TCP measurements exhibit chaotic behavior?
Answer: Yes, and they are more complicated than purely chaotic (deterministic) behavior.

Internet Measurements
• Internet (net100) traces show that TCP-AIMD dynamics are a complicated mixture of chaotic and stochastic regimes:
  - Chaotic: TCP-AIMD dynamics
  - Stochastic: TCP response to network traffic
• Basic point: TCP traces collected on all Internet connections showed complicated dynamics
  - The classical "saw-tooth" profile is not seen even once
  - This is not a criticism of TCP; it was not intended for smooth dynamics

Cwnd time series for ORNL-LSU connection
• Connection: OC192 to Atlanta-Sox; Internet2 to Houston; LAnet to LSU
• Time series: cwnd = $x(t)$, collected at approximately 1 ms resolution using net100 instruments

Time-Dependent Exponent Plots
• Informally, a measure of how separated close-by states become in time: exponential separation is characteristic of a chaotic regime
• Form state vectors of size $m$ from the time series $x(t)$, with samples denoted by $x(1), x(2), \ldots$:
  $V_i = [x(i), x(i+1), \ldots, x(i+m-1)]$
• For two state vectors satisfying $r \le \|V_i - V_j\| \le r + \Delta r$, we define the time-dependent exponent as
  $\Lambda(k) = \left\langle \ln \frac{\|V_{i+k} - V_{j+k}\|}{\|V_i - V_j\|} \right\rangle$
[Figure panels: Lorenz (chaotic): common envelope; uniform random: spread out]
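Illustrative sketch (not from the original slides): a direct, unoptimized computation of the time-dependent exponent as defined above, averaging over all pairs of state vectors whose initial distance falls in the shell $[r, r + \Delta r]$. The function names and the O(N^2) pair loop are choices made for this example; practical analyses of long cwnd traces would need a faster neighbor search.

```python
import math

def embed(x, m):
    """State vectors V_i = [x(i), x(i+1), ..., x(i+m-1)]."""
    return [x[i:i + m] for i in range(len(x) - m + 1)]

def dist(u, v):
    """Euclidean distance between two state vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def time_dependent_exponent(x, m, k, r, dr):
    """Mean of ln(|V_{i+k}-V_{j+k}| / |V_i-V_j|) over pairs in the shell."""
    V = embed(x, m)
    logs = []
    for i in range(len(V) - k):
        for j in range(i + 1, len(V) - k):
            d0 = dist(V[i], V[j])
            if d0 > 0.0 and r <= d0 <= r + dr:
                dk = dist(V[i + k], V[j + k])
                if dk > 0.0:
                    logs.append(math.log(dk / d0))
    return sum(logs) / len(logs) if logs else float("nan")

# A series that doubles each step separates exponentially: Lambda(k) = k ln 2.
doubling = [2.0 ** t for t in range(12)]
print(time_dependent_exponent(doubling, m=2, k=3, r=0.5, dr=1e9))
```

Plotting the result against $k$ for several shells is what produces the "common envelope" (chaotic) or "spread out" (stochastic) pictures mentioned above.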

Internet cwnd Measurements: Both Stochastic and Chaotic Parts are Dominant
• TCP traces have, at certain scales:
  - Common envelope: chaotic
  - Spread out: stochastic
• Observations:
  - From analysis, chaotic dynamics are from AIMD
  - Stochastic component is in response to network traffic: losses and RTT variations
Gao and Rao, IEEE Comm Letters, 2005, in press

Design of Transport Protocols with Smooth Dynamics
• Observation 1: avoid AIMD-like behavior to avoid chaotic dynamics
• Challenge: randomness is inherent in Internet connections and will not go away even if the protocol is non-chaotic
• Our solution: explicitly account for randomness in the protocol design using stochastic approximation

Throughput Stabilization
• Niche application requirement: provide stable throughput at a target rate, typically much below peak bandwidth
  - High-priority channels
  - Commands for computational steering and visualization
  - Control loops for remote instrumentation
• TCP AIMD is not suited for stable throughput
  - Complicated dynamics
  - Underflows with sustained traffic
• Important consideration: stochasticity of Internet connections must be explicitly accounted for
Rao, Wu and Iyengar, IEEE Comm Letters, 2004

Stochastic Approximation: UDP window-based method
[Figure: transport control loop. Source node S sends data packets at transmission rate $r(t)$ to destination node D, which returns acknowledgements; destination goodput $g_D(t)$, source goodput $g_S(t)$]
• Objective: adjust source rate to achieve (almost) fixed goodput at the destination application
• Difficulty: data packets and acks are subject to random processes
• Approach: rely on statistical properties of data paths

UDP-Based Framework
[Figure: RUNAT sender (sender buffer, cwin $W(t)$, inter-window delay $T_{IWD}(t)$) sends UDP datagrams at rate $r(t)$ to the RUNAT receiver (receiver buffer, goodput $g(t)$, loss $l(t)$), which returns acknowledgements]
• Send $W(t)$ datagrams and wait for period $T_{IWD}(t)$
• Source sending rate: $r(t) = W(t) \cdot MDS / T_{IWD}(t)$
• Destination goodput: $g_D(t)$
• Loss rate: $l(t)$
• Goodput regression: $G(r) = E[\, g_D(t) \mid r(t) = r \,]$
• Loss regression: $L(r) = E[\, l(t) \mid r(t) = r \,]$

Channel Throughput Profile
• Plot of receiving rate as a function of sending rate
• Its precise interpretation depends on:
  - Sending and receiving mechanisms
  - Definition of rates
• For protocol optimizations, it is important to use the protocol's own sending mechanism to generate the profile
• Window-based sending process for UDP datagrams:
  - Send $W_c(t)$ datagrams in one step: window size
  - Wait for time $T_s(t)$, called idle-time or wait-time
  - Sending rate at time resolution $T_s(t)$: $r_s(t) = \frac{W_c(t)}{T_s(t) + T_c(t)}$
• This is an ad hoc mechanism facilitated by the 1GigE NIC

Throughput Profile: throughput and loss rates vs. sending rate (window size, cycle time)
• Objective: adjust source rate to yield the desired throughput at destination
[Figure panels: typical day; Christmas day. Marked regions: stabilization zone; peak zone]

Adaptation of Source Rate
• Sending process: send $W_{c,n}$ datagrams and wait for duration $T_{s,n}$
• Adjust the window size:
  $W_{c,n+1} = W_{c,n} - \frac{a\,T_s}{n^{\alpha}}\,(\hat g_n - g^*)$
• Adjust the cycle time:
  $T_{s,n+1} = \frac{1.0}{\frac{1.0}{T_{s,n}} - \frac{a}{W_c\,n^{\alpha}}\,(\hat g_n - g^*)}$
• Both are special cases of the classical Robbins-Monro method:
  $r_{n+1} = r_n - \rho_n\,(\hat g(r_n) - g^*)$, with $\rho_n > 0$, $\rho_n \to 0$, $\sum_n \rho_n = \infty$, $\sum_n \rho_n^2 < \infty$
  where $g^*$ is the target throughput and $\hat g$ is a noisy estimate
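Illustrative sketch (not from the original slides): the Robbins-Monro update can be tried on a toy channel. The channel model `toy_channel` (10% loss plus bounded noise), the starting rate, and the step count are invented for this example; only the update rule with step size $a/n^{\alpha}$ and the values a = 0.8, $\alpha$ = 0.8 follow the slides.

```python
import random

def stabilize(goodput, target, r0, a=0.8, alpha=0.8, steps=2000):
    """Robbins-Monro iteration r_{n+1} = r_n - (a / n**alpha) * (g_n - target)."""
    r = r0
    for n in range(1, steps + 1):
        g = goodput(r)  # noisy goodput measured at sending rate r
        r = max(0.0, r - (a / n ** alpha) * (g - target))
    return r

rng = random.Random(0)

def toy_channel(r):
    """Hypothetical channel: 10% loss plus bounded measurement noise (Mbps)."""
    return 0.9 * r + rng.uniform(-0.1, 0.1)

# Target goodput of 1.0 Mbps needs a sending rate near 1.0 / 0.9 Mbps.
rate = stabilize(toy_channel, target=1.0, r0=0.2)
print(rate)
```

The decreasing step size is what averages out the measurement noise: early steps move the rate quickly toward the target, later steps only make small corrections.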

Performance Guarantees
• Summary: stabilization is achieved with high probability with a very simple estimation of source rate
• Basic result: for the general update
  $r_{n+1} = r_n - \frac{a}{n^{\alpha}}\,(\hat g_n - g^*)$
  [Equation: conditions on $a$ and $\alpha$; a two-case bound on $E[(r_n - r^*)^2]$ in terms of powers of $1/n$; and a corresponding result relating $E(g_n)$ to $g^*$]

Internet Measurements
• ORNL-LSU connection (before recent upgrade)
  - Hosts with 10 M NIC
  - 2000 mile network distance
  - ORNL-NYC: ESnet; NYC-DC-Hou: Abilene; HOU-LSU: local n/s
• ORNL-GaTech connection
  - Hosts with GigE NICs
  - ORNL to Juniper router: 1Gig link
  - Juniper to ATL Sox: OC192 (1Gig link)
  - Sox to GaTech: 1Gig link

Goodput Stabilization: ORNL-LSU Experimental Results
• Case 1: target goodput = 1.0 Mbps, rate control through congestion window, a = 0.8
• Case 2: target goodput = 2.0 Mbps, rate control through congestion window, a = 0.8
[Figure, both cases: datagram acknowledging time vs. source rate (Mbps) and goodput (Mbps)]

Throughput Stabilization: ORNL-GaTech
• Target goodput = 20.0 Mbps, a = 0.8, $\alpha = 0.8$, adjust congestion window size
• Target goodput level = 2.0 Mbps, a = 0.8, $\alpha = 0.8$, adjust sleep time

RUNAT: Reliable UDP-based Network Adaptive Transport
• Transport protocol
  - Maximize connection utilization: track peak goodput
  - Uses Kiefer-Wolfowitz stochastic approximation to handle ACKs and losses
• Features: tailored to random loss rate and RTT
  - Segmented rate control: 3 control zones, in which the bottleneck link is underutilized, saturated, or overloaded
  - Explicit accounting for random components: use stochastic approximation methods based on goodput estimates
  - TCP-friendliness: rate-increasing and rate-decreasing coefficients are dynamically adjusted
  - Adaptable to diverse network environments: measurement and control periods are not constant, but link-specific (use RTT)
Wu and Rao, INFOCOM 2005

Three Zones of Goodput Profile
Three control zones
• Zone I: Adaptive Increase
  - Bottleneck link is underutilized
  - Low packet loss due to occasional congestion or transmission errors
  - Fixed with increasing source rate
• Zone II (transitional): dynamic KWSA method
  - Bottleneck link is saturated
  - Peak goodput falls within this zone
  - SA determines whether to increase or decrease source rate
• Zone III: Adaptive Decrease
  - Bottleneck link is overloaded
  - Large packet loss due to network congestion
  - Back off to recover from congestion collapse
• Stabilize sending rate at $r^*$, where $G(r^*) = \max_r G(r)$
[Figure: goodput regression $G(r)$ vs. sending rate $r$; Zone I (~zero loss), Zone II (low loss), Zone III (high loss); peak at $r^*$]
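
Segmented Rate Control Algorithm
• Basic idea: control sending rate based on the loss rate estimate $\hat l(t)$ to achieve peak goodput
• Zone I (adaptive increase), when $\hat l(t_{n+1}) < l_{low}$:
  $T_{IWD}(t_{n+1}) = T_{IWD}(t_n) + c_I\,T_{IWD}(t_n)\,(\hat l(t_{n+1}) - l_{low})$
• Zone II (KWSA), when $\hat l(t_{n+1}) \in [l_{low}, l_{high}]$:
  $\hat r(t_{n+1}) = \hat r(t_n) + \frac{a}{n}\cdot\frac{\hat g(t_n) - \hat g(t_{n-1})}{\hat r(t_n) - \hat r(t_{n-1})}$, $\quad T_{IWD}(t_{n+1}) = \frac{W \cdot MDS}{\hat r(t_{n+1})}$
• Zone III (adaptive decrease), when $\hat l(t_{n+1}) > l_{high}$:
  $T_{IWD}(t_{n+1}) = T_{IWD}(t_n) + c_D\,T_{IWD}(t_n)\,(\hat l(t_{n+1}) - l_{high})$
[Figure: estimated goodput $\hat g(t)$ and loss rate $\hat l(t)$ vs. estimated sending rate $\hat r(t)$, with loss thresholds $l_{low}$ and $l_{high}$ at rates $\hat r_{low}(t)$ and $\hat r_{high}(t)$, a 45-degree reference line, Phases I, II and III, and peak goodput $\hat g_{max}(t)$ at $\hat r^*(t)$]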
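Illustrative sketch (not from the original slides): the Zone II idea of climbing a noisy goodput profile can be shown with the textbook Kiefer-Wolfowitz iteration. This is the standard two-sided finite-difference form, not RUNAT's own update (which uses successive goodput and rate estimates); the profile `toy_profile`, its peak at r = 5, and the gain sequences are assumptions for this example.

```python
import random

def kiefer_wolfowitz(goodput, r0, a=0.5, c=1.0, steps=5000):
    """Climb a noisy unimodal profile using finite-difference slope estimates."""
    r = r0
    for n in range(1, steps + 1):
        a_n = a / n            # step size, decreasing to zero
        c_n = c / n ** 0.25    # probe half-width, decreasing more slowly
        slope = (goodput(r + c_n) - goodput(r - c_n)) / (2.0 * c_n)
        r += a_n * slope       # move uphill along the estimated slope
    return r

rng = random.Random(1)

def toy_profile(r):
    """Hypothetical goodput regression with its peak at r = 5, plus noise."""
    return 25.0 - (r - 5.0) ** 2 + rng.uniform(-0.5, 0.5)

peak_rate = kiefer_wolfowitz(toy_profile, r0=1.0)
print(peak_rate)
```

Unlike the Robbins-Monro case, no target value is given here: the iteration looks for the rate where the slope of the goodput regression is zero, which is the peak for a unimodal profile.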

Convergence Properties of RUNAT
• Informal statement:
  - If in zone I or III, it will exit to zone II
  - If in zone II, it will converge to maximum throughput
• Condition A1: loss statistics vary slowly: $E[\,\hat l(t) \mid r(t) = r\,] = L(r)$
• Condition A2: the loss regression is differentiable and its derivative is monotonically increasing with respect to $r$ in Phase II
• Result: RUNAT in zone I or III enters zone II in a finite number of steps almost surely; in zone II, RUNAT will almost surely converge to the peak goodput: $r(t) \to r^*$, where $G(r^*) = \max_r G(r)$

Experimental Results on link between ozy4 (ORNL) and robot (LSU): illustration of microscopic RUNAT behaviors during transfer of 20MB data
[Figure regions: slow start; Zone I (loss rate: 0%); Zone II (loss rate: 3.33%); Zone III (loss rate: 37.33%)]
• When far away from the saturation (peak) point, $c_I$ is adjusted to large values to quickly move towards the peak point
• When approaching the saturation (peak) point, $c_I$ is adjusted to small values to slowly converge to and remain at the peak point
• The decrement of source rate upon packet loss is determined by congestion levels (local loss rate measurements) and $c_D$: higher congestion levels result in larger rate drops
• The increment of source rate is determined by congestion levels (local loss rate measurements) and $c_I$

Experimental Results on link between ozy4 (ORNL) and robot (LSU): RUNAT transport performance during transfer of 2GB data with concurrent TCP transfer of 50MB data
• Case 1: run RUNAT and TCP concurrently
  - RUNAT throughput: 10.49 Mbps
  - TCP throughput: 0.376 Mbps
• Case 2: run a single TCP only
  - Single TCP throughput: 0.377 Mbps
Note:
• The low throughputs were due to the high traffic volume at the time of the experiments
• On a normal day with regular traffic volume, TCP is able to achieve 3 to 6 Mbps, and RUNAT may reach 15 to 30 Mbps at lower loss rates without significantly affecting concurrent TCP on this link

Experimental Results on link from ozy4 (ORNL) to orbitty (NC State)

Transport method   Data sent (MBytes)   Throughput (Mbps)   Is TCP friendly?
iperf (TCP)        100                  8.7                 YES
iperf (UDP)        1000                 95.6                NO (no congestion control, no reliability)
FTP (TCP)          100                  18.6                YES
SABUL              1000                 80.0                Seems NOT (concurrent TCP: 18.6 Mbps -> 10.1 Mbps)
RUNAT              1500                 80.0                Statistically YES (concurrent TCP: 18.6 Mbps -> 18.2 Mbps)

ORNL-Atlanta-ORNL 1Gbps Channel
[Figure: Dell dual Xeon 3.2GHz and dual Opteron 2.2 GHz hosts connect over GigE to GigE blades of the Juniper M160 router at ORNL; SONET blades carry the ORNL-ATL OC192 to the Juniper M160 router at Atlanta, which provides an IP loop]
• Host to router: dedicated 1GigE NIC
• ORNL router:
  - Filter-based forwarding to override both at input and middle queues and disable other traffic to GigE interfaces
  - IP packets on both GigE interfaces are forwarded to the outgoing SONET port
• Atlanta-SOX router: default IP loopback
• Only 1 Gbps of the OC192 link is used for production traffic: 9 Gbps spare capacity

1Gbps Dedicated IP Channel
• Non-uniform physical channel:
  - GigE - SONET - GigE
  - ~500 network miles
• End-to-end IP path
  - Both GigE links are dedicated to the channel
  - Other host traffic is handled through a second NIC
• Routers, OC192 and hosts are lightly loaded
• IP-based applications and protocols are readily executed
[Figure: Dell dual Xeon 3.2GHz and dual Opteron 2.2 GHz hosts, GigE links, Juniper M160 routers at ORNL and Atlanta with GigE and SONET blades, ORNL-ATL OC192, IP loopback]

Dedicated Hosts
• Hosts:
  - Linux 2.4 kernel (Redhat, Suse)
  - Two NICs: optical connection to Juniper M160 router; copper connection to Ethernet switch/router
• Disks: RAID 0 dual disks (140GB SCSI)
  - XFS file system
  - Peak disk data rate is ~1.2 Gbps (IOzone measurements)
  - Disk is not a bottleneck for 1 Gbps data rates

UDP goodput and loss profile
• Point in horizontal plane: $(W_c(t), T_s(t))$
• Goodput plateau: ~990 Mbps
• Non-zero and random loss rate
• High goodput is received at non-trivial loss

1GigE NICs Act as Rate Controllers
[Figure: host with application buffer and kernel buffer feeding the GigE NIC, connected to the Juniper M160; data rates into the NIC could exceed 1 Gbps; NIC and link are rate limited to 1 Gbps]
• Our window-based method:
  - Flow rate from application to NIC is ON/OFF and exceeds 1 Gbps at times
  - Flow is regulated to 1 Gbps: the NIC rate matches the link rate
• This method does not work well if the NIC rate is higher than the link rate or router port rate:
  - The NIC may send at a higher rate, causing losses at the router port

Best Performance of Existing Protocols
• Disk-to-disk transfers (unet2 to unet1):
  Protocol   Goodput
  tsunami    919 Mbps
  UDT        890 Mbps
  FOBS       708 Mbps
• Memory-to-memory transfers
  - UDT: 958 Mbps
  - Both iperf and throughput profiles indicated 990 Mbps levels
• Potentially such rates are achievable if disk access and protocol parameters are tuned

Hurricane Protocol
• Composed based on principles and experiences with UDT and SABUL
  - It was not easy for us to figure out all the tweaks for pushing peak performance
• UDP window-based flow control
  - Nothing fundamentally new, but needed for fine tuning
  - 990 Mbps on a dedicated 1 Gbps connection, disk-to-disk
  - No attempt at congestion control

Hurricane Control Structure
[Figure: the sender reads from disk, sends $W_c(t)$ datagrams and waits $T_s(t)$; the receiver buffers and reorders datagrams and writes to disk; groups of k NACKs return over TCP and the sender reloads lost datagrams]
• Different subtasks are handled by threads, which are woken up on demand
• Thread invocations are reduced by clustered NACKs instead of individual ACKs

Ad hoc Optimizations
• Manual tuning of parameters
  - Wait-time parameter $T_s(t)$: initial value chosen from the throughput profile; empirically, goodput is unimodal in $T_s(t)$, so pairwise measurements are used for binary search
  - Group size k for NACKs: empirically, goodput is unimodal in k, and k is tuned
• Disk-specific details
  - Reads done in batch: no input buffer
  - NACKs are handled using fseek: attached to the next batch
• This tuning is not likely to be transferable to other configurations and different host loads
• More work needed: automatic tuning and systematic analysis
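Illustrative sketch (not from the original slides): if goodput is unimodal in a tuning parameter, pairwise measurements at neighboring settings are enough to drive a binary search for the peak. The function name `tune_unimodal` and the goodput table are invented for this example, and the sketch assumes strictly unimodal, noise-free measurements; real measurements would need averaging before comparison.

```python
def tune_unimodal(measure, lo, hi):
    """Index in [lo, hi] that maximizes a strictly unimodal measure."""
    while lo < hi:
        mid = (lo + hi) // 2
        if measure(mid) < measure(mid + 1):
            lo = mid + 1   # still climbing: the peak lies to the right
        else:
            hi = mid       # descending: the peak is at mid or to the left
    return lo

# Hypothetical goodput (Mbps) for seven candidate wait-time settings.
goodput_by_setting = [310, 540, 760, 905, 880, 700, 450]
best = tune_unimodal(lambda i: goodput_by_setting[i], 0, len(goodput_by_setting) - 1)
print(best, goodput_by_setting[best])
```

Each iteration costs two measurements and halves the remaining range, so the number of trial transfers grows only logarithmically with the number of candidate settings.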

Shortest Path Problem
• Classical problem: given a graph $G = (V, E)$ along with a "distance" function $d: E \to \mathbb{R}_{\ge 0}$ on edges
• For path $P = \langle v_0, v_1, \ldots, v_p \rangle$ we define the path distance: $d(P) = \sum_{i=1}^{p} d(e_i)$, for $e_i = (v_{i-1}, v_i)$
• Compute a path with smallest path distance from source node to destination node
• Solved using Dijkstra's algorithm with complexity $O(n \log n + e)$

Quickest Path Problem
• Problem: given a graph $G = (V, E)$ along with
  1. "delay" function $d: E \to \mathbb{R}_{\ge 0}$ on edges
  2. "bandwidth" function $b: E \to \mathbb{R}_{\ge 0}$ on edges
• For path $P = \langle v_0, v_1, \ldots, v_p \rangle$ and message size $\sigma$ we define the total delay: $T(P, \sigma) = \sum_{i=1}^{p} d(e_i) + \frac{\sigma}{\min_{i=1}^{p} b(e_i)}$, for $e_i = (v_{i-1}, v_i)$
• Compute a path with smallest total delay from source node to destination node
• Solved using Chen and Chin's algorithm with complexity $O(ne \log n + e^2)$
• Important observation: a subpath of a quickest path is not necessarily quickest
[Figure: example network from s to d with edge labels 5,20; 5,20; 15,5; 15,20 and path delays T(60) = 32, 29, 52, 57]

Quickest Path Algorithm: Chen and Chin
• Let $b_1, b_2, \ldots, b_m$ denote the distinct bandwidths
• Let $G(b)$ be the subnetwork in which edges with bandwidth smaller than $b$ are removed
• $P_i$: path with least delay in $G(b_i)$, where $d(P_i) = \sum_{j=1}^{p} d(e_j)$ and $b(P_i) = \min_{j=1}^{p} b(e_j)$
• Quickest path is given by $\min_{i=1}^{m} \left[ d(P_i) + \frac{\sigma}{b(P_i)} \right]$
• Typically implemented using $m$ invocations of Dijkstra's algorithm
• $m$ could be quite large

Simple Probabilistic Quickest Path Algorithm
• Randomly choose a fraction of the $b_i$'s and compute $P_i$ only on the corresponding $G(b_i)$
• For larger networks we needed fewer than 10% of the shortest-delay computations
• Question: is there a fundamental reason for this?
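Illustrative sketch (not from the original slides): the quickest path computation and its sampled variant fit in one short routine. For each chosen bandwidth b the code runs Dijkstra's algorithm on G(b) and scores the result as delay + sigma / b; taking the minimum over all distinct bandwidths gives the quickest path delay, while `fraction < 1.0` evaluates only a random subset and so returns an upper bound. The edge list, the undirected-graph assumption, and all names are invented for this example.

```python
import heapq
import random

def least_delay(edges, s, t, min_bw):
    """Dijkstra on G(b): edges with bandwidth below min_bw are ignored."""
    adj = {}
    for u, v, delay, bw in edges:
        if bw >= min_bw:
            adj.setdefault(u, []).append((v, delay))
            adj.setdefault(v, []).append((u, delay))
    dist = {s: 0.0}
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == t:
            return d
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return float("inf")

def quickest_delay(edges, s, t, sigma, fraction=1.0, rng=None):
    """min over chosen b_i of d(P_i) + sigma / b_i; fraction < 1 samples b_i."""
    bws = sorted({bw for _, _, _, bw in edges})
    if fraction < 1.0:
        rng = rng or random.Random(0)
        count = max(1, int(fraction * len(bws)))
        bws = rng.sample(bws, count)
    return min(least_delay(edges, s, t, b) + sigma / b for b in bws)

# Hypothetical edges: (node, node, delay, bandwidth).
edges = [("s", "a", 1.0, 2.0), ("a", "d", 1.0, 2.0), ("s", "d", 10.0, 20.0)]
print(quickest_delay(edges, "s", "d", sigma=60.0))  # large message
print(quickest_delay(edges, "s", "d", sigma=4.0))   # small message
```

In this toy network the large message is quickest on the high-bandwidth direct edge, while the small message is quickest on the low-delay two-hop route, which is why the answer depends on the message size.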

Analysis
• Critical observation: for $b_1 < b_2 < \cdots < b_m$, the delay function $D(b_i) = d(P_i)$ is non-decreasing
  - Its Vapnik-Chervonenkis dimension is 1
  - This makes it efficient to approximate by random sampling
• [Equation: probabilistic bound relating $T(\hat P)$, the approximation based on $p$ shortest path computations, to the optimal delay $T(P^*)$, in terms of $p$, $m$ and $D$]
[Figure: $D(b_i)$ vs. $b_i$ over $b_1, b_2, \ldots, b_m$, with a linear approximation using $p$ points]
Rao 2004, Theoretical Computer Science

Conclusions
• TCP-AIMD dynamics:
  - Analytically established chaotic dynamics
  - Analyzed Internet traces: combination of chaotic and stochastic dynamics
• New classes of protocols
  - ONTCOU: achieve stable target flow level
  - RUNAT: statistical approach to congestion control
  - Based on stochastic approximation: convergence proof under general conditions
  - Experimental results are promising on both Internet and dedicated connections