FAST TCP in Linux
Cheng Jin David Wei Steven Low California Institute of Technology
netlab.caltech.edu
Outline
Overview of FAST TCP. Design details. SC2002 experiments. On-going experiments.
netlab.caltech.edu
Motivation
Clear and present need in the HENP community.
High capacity network infrastructure available today.
Existing TCP protocol does not perform well.
netlab.caltech.edu
FAST vs Linux TCP
Distance = 10,037 km; Delay = 180 ms; MTU = 1500 B; Duration: 3600 sLinux TCP Experiments: Jan 28-29, 2003
1333173.182Linux TCPtxqlen=100
3909319.352Linux TCPtxqlen=10000
78 1851.86 1Linux TCPtxqlen=100
753
387
111
Transfer(GB)
1,7972FAST
19.11.2002
9259.281FAST
19.11.2002
2662.671Linux TCP
txqlen=10000
Throughput(Mbps)
BmpsPeta
Flows
18.03
netlab.caltech.edu
Aggregate Throughput
Linux TCP Linux TCP FAST
Average utilization
19%
27%
92%FAST Standard MTU of 1500 bytes Utilization averaged over one hour
txq=100 txq=10000
95%
16%
48%
Linux TCP Linux TCP FAST
2G
1G
netlab.caltech.edu
Summary of Changes
RTT estimation: high-resolution timer. Fast convergence to equilibrium. Delay monitoring in equilibrium. Pacing cwnd to smooth out traffic.
netlab.caltech.edu
FAST TCP Flow Chart
Slow Start
Fast Convergence
Equilibrium
Loss Recovery
NormalRecovery
Time-out
netlab.caltech.edu
RTT Estimation
Measure queueing delay. Kernel timestamp with s resolution. Exponential averaging of RTT samples.
netlab.caltech.edu
Fast Convergence
Monitor the per-ack queueing delay to avoid overshoot.
Window increment scales with the “distance” from equilibrium.
netlab.caltech.edu
Equilibrium
Define an equilibrium neighborhood around the target queueing delay.
Small cwnd adjustment in large time-scale -- one per RTT.
Per-ack delay monitoring to enable timely detection of changes in equilibrium.
netlab.caltech.edu
Pacing
What do we pace? Increment to window.
Time-Driven vs. event-driven. Trade-off between complexity and
performance. Timer resolution is important.
netlab.caltech.edu
Time-Based Pacing
cwnd increments are scheduled at fixed intervals.
data ack data
netlab.caltech.edu
Event-Based Pacing
Detect sufficiently large gap between consecutive bursts and delay cwnd increment until the end of each such burst.
SCinet Caltech-SLAC experiments
http://netlab.caltech.edu/FAST
SC2002 Baltimore, Nov 2002
Experiment
Sunnyvale Baltimore
Chicago
Geneva
3000km 1000km
70
00
km
C. Jin, D. Wei, S. LowFAST Team and Partners
Internet: distributed feedbacksystem Rf (s)
Rb’(s)
x
p
TCP AQM
Theory
FAST TCP Standard MTU Peak window = 14,255 pkts Throughput averaged over > 1hr 925 Mbps single flow/GE card
9.28 petabit-meter/sec 1.89 times LSR
8.6 Gbps with 10 flows 34.0 petabit-meter/sec 6.32 times LSR
21TB in 6 hours with 10 flows
Implementation Sender-side modification Delay based
Highlights
1 2
1
2
7
9
10G
enev
a-Sunnyv
ale
Baltim
ore-
Sunn
yval
eFA
ST
I2 L
SR
#flows
netlab.caltech.edu
Network
(Sylvain Ravot, Caltech/CERN)
netlab.caltech.edu
FAST BMPS
Internet2Land Speed
Record
FAST
1 2
1
2
7
9
10
Geneva-S
unnyvale
Baltim
ore-S
unnyvale
#flows
FAST Standard MTU Throughput averaged over > 1hr
netlab.caltech.edu
Aggregate Throughput
1 flow 2 flows 7 flows 9 flows 10 flows
Average utilization
95%
92%
90%
90%
88%FAST Standard MTU of 1500 bytes Utilization averaged over one hour
1hr 1hr 6hr 1.1hr 6hr
netlab.caltech.edu
Caltech-SLAC Entry
Rapid recoveryafter possiblehardware glitch
Power glitch -- Reboot
100-200Mbps ACK traffic
SCinet Caltech-SLAC experiments
http://netlab.caltech.edu/FAST
SC2002 Baltimore, Nov 2002
PrototypeC. Jin, D. Wei
TheoryD. Choe (Postech/Caltech), J. Doyle, S. Low, F. Paganini (UCLA), J. Wang, Z. Wang (UCLA)
Experiment/facilities Caltech: J. Bunn, C. Chapman, C. Hu (Williams/Caltech), H. Newman, J. Pool, S.
Ravot (Caltech/CERN), S. Singh CERN: O. Martin, P. Moroni Cisco: B. Aiken, V. Doraiswami, R. Sepulveda, M. Turzanski, D. Walsten, S. Yip DataTAG: E. Martelli, J. P. Martin-Flatin Internet2: G. Almes, S. Corbato Level(3): P. Fernes, R. Struble SCinet: G. Goddard, J. Patton SLAC: G. Buhrmaster, R. Les Cottrell, C. Logg, I. Mei, W. Matthews, R. Mount, J.
Navratil, J. Williams StarLight: T. deFanti, L. Winkler TeraGrid: L. Winkler
Major sponsorsARO, CACR, Cisco, DataTAG, DoE, Lee Center, NSF
Acknowledgments
netlab.caltech.edu
Fairness Experiment I
netlab.caltech.edu
Fairness Experiment II
netlab.caltech.edu
10GbE Mystery
Frequent packet loss with 1500-byte MTU. None with larger MTUs.
Packet loss even when cwnd is capped at 300 - 500 packets.
Routers have large queue size of 4000 packets.
Packets captured at both sender and receiver using tcpdump.
netlab.caltech.edu
How Did the Loss Happen?
loss detected
netlab.caltech.edu
Future Work
Redesigning a beta version. Stability and fairness experiments. More 10 GbE WAN tests. Extensive testing on shared links.