1
Scheduling file transfers on a circuit-switched network
Student: Hojun Lee
Advisor: Professor M. Veeraraghavan
Committee: Professor E. K. P. Chong
Professor S. Torsten
Professor S. Panwar
Professor M. Veeraraghavan
Date: 5/10/04
2
Problem statement
Increasing file sizes (e.g., multimedia, eScience: particle physics)
Increasing link rates (e.g., optical fiber)
Current protocols (e.g., TCP) do not exploit high bandwidth to decrease file-transfer delay
Example: a current TCP connection with:
1) 1500 B maximum transmission unit (MTU), 2) 100 ms round-trip time (RTT), and
3) a steady throughput of 10 Gbps, would require
at most one packet drop every 5,000,000,000 packets (not realistic)

Throughput = (MSS / RTT) * (C / sqrt(p)),  where p = packet loss rate and C = sqrt(3/2)
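As a sanity check on these numbers, the throughput relation can be inverted for the loss rate a flow could tolerate. A minimal sketch in Python (the function name is ours; C = sqrt(3/2) is the usual constant in this model):

```python
import math

def required_loss_rate(throughput_bps, mss_bytes, rtt_s, c=math.sqrt(1.5)):
    """Invert Throughput = (MSS/RTT) * C / sqrt(p) for the loss rate p."""
    mss_bits = mss_bytes * 8
    return (mss_bits * c / (rtt_s * throughput_bps)) ** 2

p = required_loss_rate(10e9, 1500, 0.1)   # 10 Gbps, 1500 B MTU, 100 ms RTT
print(p)        # ~2e-10
print(1 / p)    # ~ one drop per 5 billion packets
```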
3
Solutions to this problem
Limit upgrades to end hosts: Scalable TCP (Kelly), High-Speed TCP (Floyd), FAST TCP (Low et al.)
Upgrade routers within the Internet: larger Maximum Transmission Unit (MTU), as proposed by Mathis
4
Our proposed solution: Circuit-switched High-speed End-to-End Transport ArcHitecture (CHEETAH)
End-to-end circuits set up and released dynamically
CHEETAH: an add-on to the current Internet
5
File transfers using CHEETAH: set up circuit, transfer file, release circuit
Do not keep the circuit open during user think time
Only a unidirectional circuit is used (utilization reasons)
Modes of operation of the circuit-switched network:
Call-blocking mode: "all-or-nothing" full-bandwidth allocation approach; attempt a circuit setup:
If it succeeds, the end host will enjoy a much shorter file-transfer delay than on the TCP/IP path
If it fails, fall back to the TCP/IP path
Call-queueing mode
6
Analytical model for blocking mode (mean delay if the circuit setup is attempted):

E[T_cheetah] = (1 - P_b)(E[T_setup] + T_transfer) + P_b(E[T_fail] + E[T_tcp])   (1)

P_b = call blocking probability, E[T_setup] = the mean call-setup delay,
T_transfer = time to transfer the file, E[T_fail] ~= E[T_setup]
E[T_tcp]: per Padhye et al. and Cardwell et al. (modeling TCP latency);
a function of RTT, bottleneck link rate r, packet loss P_loss, and round-trip propagation delay T_prop
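Equation (1) is straightforward to evaluate numerically. A small sketch, with all the delay inputs as hypothetical example values:

```python
def mean_cheetah_delay(p_b, e_setup, t_transfer, e_fail, e_tcp):
    """Equation (1): with probability (1 - p_b) the circuit is set up and the
    file goes over it; otherwise the call blocks, pays the failed-setup
    delay, and falls back to the TCP/IP path."""
    return (1 - p_b) * (e_setup + t_transfer) + p_b * (e_fail + e_tcp)

# Hypothetical numbers: 1 GB file on a 1 Gbps circuit (8 s transfer),
# 50 ms setup delay, 80 s expected TCP/IP delay, 10% blocking.
print(mean_cheetah_delay(0.1, 0.05, 8.0, 0.05, 80.0))   # 15.25
```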
7
Routing decision

Compare E[T_cheetah] with E[T_tcp]:

If E[T_cheetah] >= E[T_tcp], i.e., E[T_setup] >~ (1 - P_b)(E[T_tcp] - T_transfer):
resort directly to the TCP/IP path

If E[T_cheetah] < E[T_tcp], i.e., E[T_setup] <~ (1 - P_b)(E[T_tcp] - T_transfer):
attempt circuit setup
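The decision rule can be written as a one-line predicate. Assuming E[T_fail] ~= E[T_setup] as above, equation (1) reduces the comparison to a threshold on the mean setup delay (function name and example numbers are ours):

```python
def attempt_circuit(e_setup, p_b, e_tcp, t_transfer):
    """Routing rule: attempt circuit setup iff E[T_cheetah] < E[T_tcp].
    With E[T_fail] ~ E[T_setup], equation (1) reduces this comparison
    to a threshold on the mean call-setup delay."""
    return e_setup < (1 - p_b) * (e_tcp - t_transfer)

# Same hypothetical numbers as before: a 50 ms setup delay is far below
# the (1 - 0.1) * (80 - 8) = 64.8 s threshold, so attempt the circuit.
print(attempt_circuit(0.05, 0.1, 80.0, 8.0))   # True
```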
8
File transfer delays for large files (1 GB and 1 TB) over the TCP/IP path
9
Numerical results for transfer delays of file size [5 MB - 1 GB]
Link rate = 100 Mbps, k = 20
Should always attempt a circuit setup for these parameters
10
Numerical results for transfer delays of file size [5 MB - 1 GB], cont'd
Link rate = 1 Gbps, k = 20
A crossover file size exists in small propagation-delay environments
11
Crossover file sizes

r = 1 Gbps, Tprop = 0.1 ms, k = 20:

Loading on TCP/IP path | ckt.-sw. network: Pb = 0.01 | Pb = 0.1 | Pb = 0.3
Ploss = 0.0001         | 22 MB                       | 24 MB    | 30 MB
Ploss = 0.001          | 9 MB                        | 10 MB    | 12 MB
Ploss = 0.01           | < 5 MB                      | < 5 MB   | < 5 MB

r = 100 Mbps, Tprop = 0.1 ms, k = 20:

Loading on TCP/IP path | ckt.-sw. network: Pb = 0.01 | Pb = 0.1 | Pb = 0.3
Ploss = 0.0001         | 2.4 MB                      | 2.65 MB  | 3.4 MB
Ploss = 0.001          | 2 MB                        | 2.2 MB   | 2.8 MB
Ploss = 0.01           | 500 KB                      | 550 KB   | 650 KB

For high propagation-delay environments, always attempt a circuit (utilization implications).
This work was presented at the PFLDNET 2003 workshop [1] and at Opticomm 2003 [2].
12
Motivation for call queueing
Example: large file transfer (1 TB), with Ploss = 0.0001, Tprop = 50 ms, r = 1 Gbps
[Figure: host H and destination D connected by both a TCP/IP path and a circuit path]
TCP/IP path: delay = 4 days 14.9 hours
Circuit path (after a call setup attempt): 1 TB / 1 Gbps = 2.2 hours
13
Problem with call queueing: low bandwidth utilization
Reason: upstream switches hold resources while waiting for downstream switches to admit a call, instead of using the wait period to admit short calls that only traverse upstream segments
[Figure: Host A - Switch 1 - (link 1) - Switch 2 - (link 2) - Host B, with setup messages at each hop]
The call waits (queues) until resources become available on link 1, then reserves and holds that bandwidth until the call is set up all the way through
While the call is queued for link 2 resources, link 1 resources sit idle
14
Idea! Use knowledge of file sizes to "schedule" calls
The network knows:
File sizes of admitted calls
Bandwidth of admitted calls
When a new call arrives:
The network can figure out when resources will become available for the new call
The network can schedule the new call for a delayed start and provide this information to the requesting end host
The end host can then compare this delay with the expected delay on the TCP/IP path
15
Call scheduling on a single link
Main question: since files can be transferred at any rate, what rate should the network assign to a given file transfer?
16
One simple answer
In circuit-switched networks, use a fixed bandwidth allocation for the duration of a file transfer
TDM/FDM scheme: transmission capacity C (bits/sec) divided among n streams
Transmission of a file of L bits will take Ln/C sec
Even if other transfers complete before this transfer, its bandwidth cannot be increased
Packet-switched system: statistical multiplexing
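The difference between the two answers can be seen on a toy two-file example. This sketch (hypothetical file sizes and link rate) contrasts the fixed C/n split with a work-conserving reallocation that hands freed bandwidth to the surviving transfer:

```python
def fixed_split_times(sizes_bits, capacity_bps):
    """TDM/FDM: each of n concurrent transfers keeps C/n for its whole
    lifetime, so a file of L bits takes L*n/C seconds."""
    n = len(sizes_bits)
    return [size * n / capacity_bps for size in sizes_bits]

def greedy_times(sizes_bits, capacity_bps):
    """Work-conserving alternative (two-file case only, for illustration):
    when the smaller file finishes, its share is handed to the survivor."""
    a, b = sorted(sizes_bits)
    t1 = a * 2 / capacity_bps              # both share C/2 until the small file ends
    sent_b = capacity_bps / 2 * t1         # bits of the big file sent so far
    t2 = t1 + (b - sent_b) / capacity_bps  # survivor then gets the full link
    return t1, t2

C = 1e9                             # hypothetical 1 Gbps link
sizes = [4e9, 8e9]                  # 0.5 GB and 1 GB files, in bits
print(fixed_split_times(sizes, C))  # [8.0, 16.0] seconds
print(greedy_times(sizes, C))       # (8.0, 12.0) seconds: the big file gains 4 s
```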
17
Our answer
Greedy scheme: allocate the maximum bandwidth available that is less than or equal to R^i_max, the maximum rate requested for call i
Varying-Bandwidth List Scheduling (VBLS): the end host specifies the file size, maximum bandwidth limit, and a desired start time; the network returns a time-range-capacity allocation vector assigning varying bandwidth levels in different time ranges for the transfer
VBLS with Channel Allocation (VBLS/CA): a special case of practical interest; tracks actual channel allocations in different time ranges
18
Notation
Specified in call request i: file size F_i, requested start time T^i_req, and maximum rate R^i_max
Switch's response: a time-range-capacity (TRC) allocation vector
19
VBLS algorithm
Initialization step: set time v = T^i_req and remaining file size x(v) = F_i;
check the available bandwidth A(v) (if A(v) = 0, advance v to the next change point in the A(t) curve); set t' = next change point in A(t)
Case 1 (A(v) < R^i_max, and x(v) can be transmitted before the next change point in A(t)):
set B^i_k = v, E^i_k = v + x(v)/A(v), C^i_k = A(v); terminate loop
Case 2 (A(v) < R^i_max, and x(v) cannot be transmitted before the next change point in A(t)):
set B^i_k = v, E^i_k = t' (the next change point), C^i_k = A(v),
then set x <- x - A(v)(t' - v), v <- t', k <- k + 1,
and repeat the loop (go to the initialization step)
20
VBLS algorithm con’t
Case 3 ( and can be transmitted before the next change pint in curve)
set , , , ; Terminate loop
Case 4 ( and cannot be transmitted before the nextchange in curve)
set , “next change point” in curve,
set , “next change point” ,
continue repeat loop (go to Initialization step)
)(t
vB ik )/( max ii
k RvE
iik RC max
ki
iRv max)(
ikE
iik RC max
iik xRvE max)( v
1 kk
iRv max)(
)(t
vB ik )(t
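The four cases collapse into one loop if the allocated rate is written as min(A(v), R^i_max). A Python sketch of this reading of VBLS, with the A(t) curve represented as a list of (start, end, bandwidth) ranges (this representation and the function name are ours; the list is assumed to extend far enough to fit the whole file):

```python
def vbls(file_size, t_req, r_max, avail):
    """Varying-Bandwidth List Scheduling sketch.
    avail: (t_start, t_end, bandwidth) ranges of the piecewise-constant
    available-bandwidth curve A(t), in increasing time order.
    Returns a TRC vector of (begin, end, capacity) tuples for this call."""
    trc, remaining = [], file_size
    for t_start, t_end, bw in avail:
        if remaining <= 0:
            break
        start = max(t_start, t_req)
        if start >= t_end or bw <= 0:
            continue                    # A(t) = 0: wait for the next change point
        rate = min(bw, r_max)           # cases 1/2 (A < R_max) vs 3/4 (A >= R_max)
        if remaining <= rate * (t_end - start):
            trc.append((start, start + remaining / rate, rate))
            remaining = 0               # cases 1/3: finishes inside this range
        else:
            trc.append((start, t_end, rate))
            remaining -= rate * (t_end - start)  # cases 2/4: consume whole range
    return trc

# Example loosely modeled on the single-link figure: a file of 2 units,
# requested at t = 1 with R_max = 2 channels.
avail = [(0, 1, 1), (1, 2, 2), (2, 4, 1), (4, 5, 3)]
print(vbls(2, 1, 2, avail))   # [(1, 2.0, 2)]
```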
21
Example of VBLS (figure)
[Figure: sources S1-S3 attached to a circuit switch share a single 4-channel link to destination D; the available-bandwidth curve A(t) is shown over t = 1..5, with the available time ranges shaded]
(F_1, T^1_req, R^1_max) = (2, 1, 2) -> TRC_1
(F_2, T^2_req, R^2_max) = (2, 1, 2) -> TRC_2
(F_3, T^3_req, R^3_max) = (5, 3, 3) -> TRC_3
22
VBLS/CA algorithm: four additions
1) Track channel availability over time for each channel, in addition to tracking the total available-bandwidth curve A(t); furthermore, track the channel availability at each change point of A(t)
2) Track the set of open channels (to save switch programming time)
3) If multiple channels are allocated in the same time range, count each allocation as a separate entry in the Time-Range-channeL (TRL) vector
4) When there are many candidate channels, apply two rules:
First rule: if the file transfer completes within a time range, choose the channel with the smallest leftover time
Second rule: if the file transfer does not complete within a time range, choose the channel with the largest leftover time
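The two channel-selection rules of addition 4) amount to a best-fit/worst-fit choice on leftover time. A minimal sketch (the dictionary representation of per-channel leftover time is ours):

```python
def choose_channel(channels, completes_in_range):
    """VBLS/CA channel selection. `channels` maps channel id -> leftover idle
    time in the current range. If the transfer finishes inside the range,
    take the smallest leftover (best fit, first rule); otherwise take the
    largest leftover (second rule, to reduce later channel switches)."""
    key = min if completes_in_range else max
    return key(channels, key=channels.get)

free = {1: 10, 4: 25}                 # hypothetical leftover times per open channel
print(choose_channel(free, True))     # 1: smallest leftover, avoids fragmenting ch. 4
print(choose_channel(free, False))    # 4: largest leftover
```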
23
Example of VBLS/CA

Parameter          Value
T^i_req            10
R^i_max            2
F_i                75 MB
Per-channel rate   1 Gbps

          Leftover (MB)   C_open   TRL_i entries (B^i_k, E^i_k, L^i_k)
Round 0   75              { }      { }
Round 1   50              {1,4}    (10,20,1), (10,20,4)
Round 2   25              {4}      (20,30,1), (20,30,4)
Round 3   12.5            { }      (30,40,4)   <- only 12.5 MB can be sent
Round 4   0               {1}      (40,50,1)
24
Traffic model
File arrival requests ~ Poisson process
File-size (F_i) distribution ~ bounded Pareto:

f_X(x) = alpha * k^alpha * x^(-alpha-1) / (1 - (k/p)^alpha),   k <= x <= p

where alpha is the shape parameter (1.1 for the entire simulation), and k and p are the lower and upper bounds, respectively, of the allowed file-size range
R^i_max ~ varies depending on the simulation settings
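File sizes with this density can be drawn by inverting the bounded Pareto CDF, F(x) = (1 - (k/x)^alpha) / (1 - (k/p)^alpha). A sketch of inverse-CDF sampling, shown here with the k = 500 MB and p = 100 GB bounds used later in the validation:

```python
import random

def bounded_pareto_inverse_cdf(u, alpha, k, p):
    """Invert F(x) = (1 - (k/x)**alpha) / (1 - (k/p)**alpha), k <= x <= p,
    so a uniform u in [0, 1) maps to a bounded Pareto file-size sample."""
    norm = 1 - (k / p) ** alpha
    return k / (1 - u * norm) ** (1 / alpha)

alpha, k, p = 1.1, 500e6, 100e9
sizes = [bounded_pareto_inverse_cdf(random.random(), alpha, k, p) for _ in range(1000)]
print(k <= min(sizes) <= max(sizes) <= p)   # True
```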
25
Validation of simulation against analytical results
Assumptions:
1. All file requests set their R^i_max to match the link capacity, C
2. Arrivals form a Poisson process
3. Service times follow the bounded Pareto distribution
4. k = 500 MB and p = 100 GB

Analytical model (M/G/1 mean wait): E[W] = lambda * E[X^2] / (2(1 - rho)),
where X = L/C is the service time of a bounded Pareto file size L, so that
E[X^2] = alpha * k^alpha * (p^(2-alpha) - k^(2-alpha)) / ((2 - alpha)(1 - (k/p)^alpha) * C^2)

[Figure: file latency (sec) vs. system load (0 to 1); the analytical-model and simulation curves coincide]
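The analytical curve on this slide is the M/G/1 (Pollaczek-Khinchine) mean wait. A sketch of its evaluation, using the closed-form raw moments of the bounded Pareto file size (valid here since alpha = 1.1 != 1, 2; function names are ours):

```python
def bp_moment(n, alpha, k, p):
    """n-th raw moment of a bounded Pareto(alpha, k, p) file size:
    integrate x**n against the pdf on [k, p] (closed form, alpha != n)."""
    norm = 1 - (k / p) ** alpha
    return alpha * k ** alpha * (p ** (n - alpha) - k ** (n - alpha)) / ((n - alpha) * norm)

def mg1_mean_wait(lam, alpha, k, p, capacity):
    """Pollaczek-Khinchine: E[W] = lam * E[X^2] / (2 * (1 - rho)), where the
    service time is X = L / C since each call takes the full link."""
    ex = bp_moment(1, alpha, k, p) / capacity
    ex2 = bp_moment(2, alpha, k, p) / capacity ** 2
    rho = lam * ex
    assert rho < 1, "offered load must keep the queue stable"
    return lam * ex2 / (2 * (1 - rho))
```

Sweeping `lam` from low to high load reproduces the hockey-stick latency curve of the figure.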
26
Sensitivity analysis
We carry out four experiments:
(1) To understand the impact of R^i_max when all calls request the same constant R^i_max
(2) To understand the impact of the allowed file-size range (i.e., the parameters k and p)
(3) To understand the impact of R^i_max when calls request three different values of R^i_max (i.e., (1,2,4) and (1,5,10))
(4) To understand the impact of the size of T_discrete
27
Sensitivity analysis Con’t
First Experiment: k = 500MB, p = 100GB, and (1, 5, 10, and 100 channels)
0 0.2 0.4 0.6 0.8 10
0.1
0.2
0.3
0.4
0.5
0.6
0.7
System load
File
late
ncy (
sec)
Rimax
= 100 channels
Rimax
= 1 channel
Rimax
= 5 channels
Rimax
= 10 channels
0 0.2 0.4 0.6 0.8 10
0.5
1
1.5
2
2.5
System load
Mean f
ile t
ransfe
r dela
y (
sec)
Rimax
= 100 channels
Rimax
= 1 channel
Rimax
= 5 channels
Rimax
= 10 channels
File latency: the mean waiting time across all files transferred Mean file transfer delay: file latency + mean service time (transmission delay)
File latency comparison Mean file transfer delay comparison
iRmax
28
Sensitivity analysis Con’t
Second Experiment:
Case 1: k = 500MB, p = 100GB, = 1.1
Case 2: k = 10GB, p = 100GB, = 1.1
Question: In which case is the variance is larger at first glance?
22
2
))(()(
)1(2
)()(
XEXEXE
XEWE
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
System load
File
late
ncy (
sec)
Rimax
= 10 channels (k = 10GB)
Rimax
= 5 channels (k = 10GB)
Rimax
= 1 channel (k = 10GB)
Rimax
= 10 channels (k = 500MB)
Rimax
= 5 channels (k = 500MB)
Rimax
= 1 channel (k = 500MB)
0 10 20 30 40 5010
0
101
102
103
104
105
106
|
bin-number
Fre
quen
cy
|
Bin 9 (Mean file size = 24.6 GB)
Bin 1 (Mean file size = 2.27 GB)
k = 10GB; p = 100GB; alpha = 1.1
k = 500MB; p = 100GB; alpha = 1.1
Case 1
Case 2
Case 1
Case 2
29
Sensitivity analysis Con’t
Third ExperimentCase 1: (per-channel rate) = 10Gbps, C = 1Tbps (100 channels),
Case 2: (per-channel rate) = 1Gbps, C =100Gbps (100 channels),
File throughput: long-run average of the file size divided by the file transfer delay
channelsorR i 4,2,1max
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 15
10
15
20
25
30
35
40
System load
File
thr
ough
put,
Gbp
s
Rimax
= 1 channel
Rimax
= 2 channels
Rimax
= 4 channels
Case 1 Case 2
channelsorR i 10,5,1max
0 10 20 30 40 50 60 70 80 90 1000
1
2
3
4
5
6
7
8
9
10
System load
File
thro
ughp
ut, G
bps
Rimax
= 1 channel
Rimax
= 5 channels
Rimax
= 10 channels
30
Sensitivity analysis Con’t
1max iR
Fourth ExperimentAssumptions: 1. all calls request the same and the link capacity C = 100 channels
2. vary the value of the discrete time unit ( Tdiscrete) as 0.05, 0.5, 1, and 2 sec.
0 0.2 0.4 0.6 0.8 10
10
20
30
40
50
60
70
80
90
100
System load
File
thro
ughput,
Gbps
Tdiscrete
= 50msec
Tdiscrete
= 0.5sec
Tdiscrete
= 1secTdiscrete
= 1sec
Tdiscrete
= 2sec
0 0.2 0.4 0.6 0.8 10
10
20
30
40
50
60
70
80
90
100
System load
Utiliz
ation (
%)
Tdiscrete
= 50msec
Tdiscrete
= 0.5sec
Tdiscrete
= 1sec
Tdiscrete
= 2sec
1max iR
31
Comparison of VBLS with FBLS and PS
Basic simulation setup:
File arrival requests ~ Poisson process
Per-channel rate = 10 Gbps; R^i_max = 1, 5, or 10 channels (30%, 30%, and 40% of calls, respectively)
Bounded Pareto input parameters: alpha = 1.1, k = 5 MB, and p = 1 GB
Packet-switched system: files are divided into 1500 B packets that arrive at an infinite packet buffer at a constant packet rate equal to R^i_max divided by the packet length
32
Comparison of VBLS with FBLS and PS, cont'd
[Three figures: file throughput (Gbps) vs. system load for R^i_max = 1, R^i_max = 5, and R^i_max = 10 channels, each comparing FBLS, VBLS, and PS]
• The performance of the VBLS scheme proved to be much better than that of the FBLS scheme
• The throughput performance of VBLS is indistinguishable from packet switching. This illustrates our main point: by taking file sizes into account and varying the bandwidth allocation for each transfer over its duration, we mitigate the performance degradation usually associated with circuit-based methods
• This work was presented at GAN'04 [3] and PFLDNET'04 [4] and will be published at ICC 2004 [5]
33
Call scheduling in the multiple-link case
Centralized online greedy scheme:
Create a new curve A_new(t) reflecting the available bandwidth across all links
Distributed online greedy scheme:
Needs some kind of mechanism to merge the TRC and TRL vectors of multiple switches
Practical issues:
Clock synchronization
Propagation delay
34
Some additional notation for the multiple-link case

Symbol: fTRC^i_n = {(B^if_kn, E^if_kn, C^i_k, n): k = 1, 2, ...; n = 1, 2, ..., N}
Meaning: time-range-capacity allocation: capacity C^i_k is assigned to call i in time range k, starting at B^if_kn and ending at E^if_kn, by switch n. Since the number of time ranges can change from link to link, we add the subscript n

Symbol: rTRC^i_n = {(B^ir_kn, E^ir_kn, C^i_k, n): k = 1, 2, ...; n = 1, 2, ..., N}
Meaning: time-range-capacity release: capacity C^i_k is to be released for call i, starting at B^ir_kn and ending at E^ir_kn, at switch (n - 1)

Symbol: M
Meaning: multiplicative factor used in reserving TRCs; if M = 5, then the TRC vector reserved is 5 times the TRC allocation needed to transfer the file
35
VBLS example for M = 1 (figure)
[Figure: source S1 reaches destination D1 through switches SW1, SW2, and SW3; each link has 4 channels; the available-bandwidth curves A_1(t) and A_2(t) for the two links are shown over t = 1..6, with available time ranges shaded]
(F_1, T^1_req, R^1_max, M) = (3, 1, 2, 1) -> fTRC_1 -> X (blocked)
36
VBLS example for M = 2 (figure)
[Figure: same topology and availability curves A_1(t), A_2(t) as the M = 1 example]
(F_1, T^1_req, R^1_max, M) = (3, 1, 2, 2) -> fTRC_1, fTRC_2, rTRC_1
37
Traffic model
[Figure: study traffic flows from Source to Dest through SW 1, SW 2, and SW 3; interference traffic from Src1/Src2 to Dest1/Dest2 enters at the intermediate links]
Bounded Pareto input parameters: alpha = 1.1, k = 500 MB, and p = 100 GB
Study traffic: the mean call arrival rate used by the Source is 10 files/sec (constant)
Interference traffic: the mean call arrival rates used for the interference traffic are varied (5, 10, 15, 20, 25, 30, 35, and 40 files/sec)
38
Sensitivity analysis
We carry out two experiments:
(1) To understand the impact of M (Multiplicative factor)
M = 2, 3, and 4
(2) To understand the impact of the discrete time unit (Tdiscrete)
Tdiscrete = 0.01, 0.1, and 1 sec
39
Sensitivity analysis Con’t
First experiment (Impact of M): varies the size of M as 2, 3, and 4, but fixes propagation delay and Tdiscrete as 5ms and 10ms respectively.
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.80
10
20
30
40
50
60
70
80
90
100
Interference traffic load
Blo
cked
cal
ls(%
)
M = 2 M = 3
M = 4
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.80
1
2
3
4
5
6
7
8
9
10
Interference traffic load
File
thro
ughp
ut, G
bps
M = 2
M = 3
M = 4
Percentages of blocked calls comparison File throughput comparison
40
Sensitivity analysis Con’t
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.80
10
20
30
40
50
60
70
80
90
100
Interference traffic load
Blo
cked
cal
ls(%
)
Tdiscrete
= 0.01sec
Tdiscrete
= 0.1sec
Tdiscrete
= 1sec
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.80
1
2
3
4
5
6
7
8
9
10
Interference traffic load
File
thro
ughp
ut, G
bps
Tdiscrete
= 0.1sec
Tdiscrete
= 1sec
Tdiscrete
= 0.01sec
Second experiment (Impact of Tdiscrete): varies the size of Tdiscrete as 0.01, 0.1, and 1 sec, but fixes propagation delay and M as 5ms and 3 respectively.
Percentages of blocked calls comparison File throughput comparison
41
Future work
We can include a second class of user requests specifically targeted at interactive, long-holding-time applications, e.g., remote visualization and simulation steering. Such requests will be specified as (H_i, R^i_min, R^i_max, T^i_req)
The simulation results for the multiple-link case are only preliminary
More possible sets of comparisons via simulation: varying the propagation delays of the links while fixing other parameters such as M and T_discrete
Comparison between TCP/IP (FAST TCP) and the VBLS scheme:
Assume a finite buffer instead of an infinite buffer
Take into account the effects of congestion control and retransmission when packets are lost to buffer overflow
This might degrade the performance of the packet-switched system
42
References
1. M. Veeraraghavan, H. Lee and X. Zheng, “File transfers across optical circuit-switched networks,” PFLDnet 2003, Feb. 3-4, 2003, Geneva, Switzerland.
2. M. Veeraraghavan, X. Zheng, H. Lee, M. Gardner, W. Feng, "CHEETAH: Circuit-switched High-speed End-to-End Transport ArcHitecture,” accepted for publication in the Proc. of Opticomm 2003, Oct. 13-17, Dallas, TX.
3. H. Lee, M. Veeraraghavan, E. K. P. Chong, H. Li, “Lambda scheduling algorithm for file transfers on high-speed optical circuits,” Workshop on Grids and Advanced Networks (GAN'04), April 19-22, 2004, Chicago, Illinois.
4. M. Veeraraghavan, X. Zheng, W. Feng, H. Lee, E. K. P. Chong, and H. Li, “Scheduling and Transport for File Transfers on High-speed Optical Circuits,” PFLDNET 2004, Feb. 16-17, 2004, Argonne, Illinois, http://www.-didc.lbl.gov/PFLDnet2004/.
5. M. Veeraraghavan, H. Lee, E.K.P. Chong, H. Li, “A varying-bandwidth list scheduling heuristic for file transfers,” in Proc of ICC2004, June 20-24, Paris, France.
6. M. Veeraraghavan, X. Zheng, W. Feng, H. Lee, E. K. P. Chong, H. Li, “Scheduling and Transport for File Transfers on High-speed Optical Circuits,” Journal of Grid Computing (JOGC), 2004.
43
Thank you!