1
Scheduling file transfers on a circuit-switched network
Student: Hojun Lee
Advisor: Professor M. Veeraraghavan
Committee: Professor E. K. P. Chong
Professor S. Torsten
Professor S. Panwar
Professor M. Veeraraghavan
Date: 5/10/04
2
Problem statement
Increasing file sizes (e.g., multimedia, eScience: particle physics)
Increasing link rates (e.g., optical fiber)
Current protocols (e.g., TCP) do not exploit high bandwidth to decrease file-transfer delay
Example: a current TCP connection with:
1) 1500 B maximum transmission unit (MTU), 2) 100 ms round-trip time (RTT), and
3) a steady throughput of 10 Gbps, would require
at most one packet drop every 5,000,000,000 packets (not realistic)

Throughput = (MSS / RTT) * (C / sqrt(p)),  where p = packet loss rate and C = sqrt(3/2)
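As a sanity check on these numbers, the throughput relation can be inverted for the loss rate a flow could tolerate. A minimal sketch in Python (the function name is ours; C = sqrt(3/2) is the usual constant in this model):

```python
import math

def required_loss_rate(throughput_bps, mss_bytes, rtt_s, c=math.sqrt(1.5)):
    """Invert Throughput = (MSS/RTT) * C / sqrt(p) for the loss rate p."""
    mss_bits = mss_bytes * 8
    return (mss_bits * c / (rtt_s * throughput_bps)) ** 2

p = required_loss_rate(10e9, 1500, 0.1)   # 10 Gbps, 1500 B MTU, 100 ms RTT
print(p)        # ~2e-10
print(1 / p)    # ~ one drop per 5 billion packets
```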
3
Solutions to this problem
Limit upgrades to end hosts: Scalable TCP (Kelly), High-Speed TCP (Floyd), FAST TCP (Low et al.)
Upgrade routers within the Internet: larger Maximum Transmission Unit (MTU), as proposed by Mathis
4
Our proposed solution: Circuit-switched High-speed End-to-End Transport ArcHitecture (CHEETAH)
End-to-end circuits set up and released dynamically
CHEETAH: an add-on to the current Internet
5
File transfers using CHEETAH: set up circuit, transfer file, release circuit
Do not keep the circuit open during user think time
Only a unidirectional circuit is used (utilization reasons)
Modes of operation of the circuit-switched network:
Call-blocking mode: "all-or-nothing" full-bandwidth allocation approach; attempt a circuit setup:
If it succeeds, the end host will enjoy a much shorter file-transfer delay than on the TCP/IP path
If it fails, fall back to the TCP/IP path
Call-queueing mode
6
Analytical model for blocking mode (mean delay if the circuit setup is attempted):

E[T_cheetah] = (1 - P_b)(E[T_setup] + T_transfer) + P_b(E[T_fail] + E[T_tcp])   (1)

P_b = call blocking probability, E[T_setup] = the mean call-setup delay,
T_transfer = time to transfer the file, E[T_fail] ~= E[T_setup]
E[T_tcp]: per Padhye et al. and Cardwell et al. (modeling TCP latency);
a function of RTT, bottleneck link rate r, packet loss P_loss, and round-trip propagation delay T_prop
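Equation (1) is straightforward to evaluate numerically. A small sketch, with all the delay inputs as hypothetical example values:

```python
def mean_cheetah_delay(p_b, e_setup, t_transfer, e_fail, e_tcp):
    """Equation (1): with probability (1 - p_b) the circuit is set up and the
    file goes over it; otherwise the call blocks, pays the failed-setup
    delay, and falls back to the TCP/IP path."""
    return (1 - p_b) * (e_setup + t_transfer) + p_b * (e_fail + e_tcp)

# Hypothetical numbers: 1 GB file on a 1 Gbps circuit (8 s transfer),
# 50 ms setup delay, 80 s expected TCP/IP delay, 10% blocking.
print(mean_cheetah_delay(0.1, 0.05, 8.0, 0.05, 80.0))   # 15.25
```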
7
Routing decision

Compare E[T_cheetah] with E[T_tcp]:

If E[T_cheetah] >= E[T_tcp], i.e., E[T_setup] >~ (1 - P_b)(E[T_tcp] - T_transfer):
resort directly to the TCP/IP path

If E[T_cheetah] < E[T_tcp], i.e., E[T_setup] <~ (1 - P_b)(E[T_tcp] - T_transfer):
attempt circuit setup
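The decision rule can be written as a one-line predicate. Assuming E[T_fail] ~= E[T_setup] as above, equation (1) reduces the comparison to a threshold on the mean setup delay (function name and example numbers are ours):

```python
def attempt_circuit(e_setup, p_b, e_tcp, t_transfer):
    """Routing rule: attempt circuit setup iff E[T_cheetah] < E[T_tcp].
    With E[T_fail] ~ E[T_setup], equation (1) reduces this comparison
    to a threshold on the mean call-setup delay."""
    return e_setup < (1 - p_b) * (e_tcp - t_transfer)

# Same hypothetical numbers as before: a 50 ms setup delay is far below
# the (1 - 0.1) * (80 - 8) = 64.8 s threshold, so attempt the circuit.
print(attempt_circuit(0.05, 0.1, 80.0, 8.0))   # True
```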
8
File transfer delays for large files (1 GB and 1 TB) over the TCP/IP path
9
Numerical results for transfer delays of file size [5 MB - 1 GB]
Link rate = 100 Mbps, k = 20
Should always attempt a circuit setup for these parameters
10
Numerical results for transfer delays of file size [5 MB - 1 GB], cont'd
Link rate = 1 Gbps, k = 20
A crossover file size exists in small propagation-delay environments
11
Crossover file sizes

r = 1 Gbps, Tprop = 0.1 ms, k = 20:

Loading on TCP/IP path | ckt.-sw. network: Pb = 0.01 | Pb = 0.1 | Pb = 0.3
Ploss = 0.0001         | 22 MB                       | 24 MB    | 30 MB
Ploss = 0.001          | 9 MB                        | 10 MB    | 12 MB
Ploss = 0.01           | < 5 MB                      | < 5 MB   | < 5 MB

r = 100 Mbps, Tprop = 0.1 ms, k = 20:

Loading on TCP/IP path | ckt.-sw. network: Pb = 0.01 | Pb = 0.1 | Pb = 0.3
Ploss = 0.0001         | 2.4 MB                      | 2.65 MB  | 3.4 MB
Ploss = 0.001          | 2 MB                        | 2.2 MB   | 2.8 MB
Ploss = 0.01           | 500 KB                      | 550 KB   | 650 KB

For high propagation-delay environments, always attempt a circuit (utilization implications).
This work was presented at the PFLDNET 2003 workshop [1] and at Opticomm 2003 [2].
12
Motivation for call queueing
Example: large file transfer (1 TB), with Ploss = 0.0001, Tprop = 50 ms, r = 1 Gbps
[Figure: host H and destination D connected by both a TCP/IP path and a circuit path]
TCP/IP path: delay = 4 days 14.9 hours
Circuit path (after a call setup attempt): 1 TB / 1 Gbps = 2.2 hours
13
Problem with call queueing: low bandwidth utilization
Reason: upstream switches hold resources while waiting for downstream switches to admit a call, instead of using the wait period to admit short calls that only traverse upstream segments
[Figure: Host A - Switch 1 - (link 1) - Switch 2 - (link 2) - Host B, with setup messages at each hop]
The call waits (queues) until resources become available on link 1, then reserves and holds that bandwidth until the call is set up all the way through
While the call is queued for link 2 resources, link 1 resources sit idle
14
Idea! Use knowledge of file sizes to "schedule" calls
The network knows:
File sizes of admitted calls
Bandwidth of admitted calls
When a new call arrives:
The network can figure out when resources will become available for the new call
The network can schedule the new call for a delayed start and provide this information to the requesting end host
The end host can then compare this delay with the expected delay on the TCP/IP path
15
Call scheduling on a single link
Main question: since files can be transferred at any rate, what rate should the network assign to a given file transfer?
16
One simple answer
In circuit-switched networks, use a fixed bandwidth allocation for the duration of a file transfer
TDM/FDM scheme: transmission capacity C (bits/sec) divided among n streams
Transmission of a file of L bits will take Ln/C sec
Even if other transfers complete before this transfer, its bandwidth cannot be increased
Packet-switched system: statistical multiplexing
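The difference between the two answers can be seen on a toy two-file example. This sketch (hypothetical file sizes and link rate) contrasts the fixed C/n split with a work-conserving reallocation that hands freed bandwidth to the surviving transfer:

```python
def fixed_split_times(sizes_bits, capacity_bps):
    """TDM/FDM: each of n concurrent transfers keeps C/n for its whole
    lifetime, so a file of L bits takes L*n/C seconds."""
    n = len(sizes_bits)
    return [size * n / capacity_bps for size in sizes_bits]

def greedy_times(sizes_bits, capacity_bps):
    """Work-conserving alternative (two-file case only, for illustration):
    when the smaller file finishes, its share is handed to the survivor."""
    a, b = sorted(sizes_bits)
    t1 = a * 2 / capacity_bps              # both share C/2 until the small file ends
    sent_b = capacity_bps / 2 * t1         # bits of the big file sent so far
    t2 = t1 + (b - sent_b) / capacity_bps  # survivor then gets the full link
    return t1, t2

C = 1e9                             # hypothetical 1 Gbps link
sizes = [4e9, 8e9]                  # 0.5 GB and 1 GB files, in bits
print(fixed_split_times(sizes, C))  # [8.0, 16.0] seconds
print(greedy_times(sizes, C))       # (8.0, 12.0) seconds: the big file gains 4 s
```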
17
Our answer
Greedy scheme: allocate the maximum bandwidth available that is less than or equal to R^i_max, the maximum rate requested for call i
Varying-Bandwidth List Scheduling (VBLS): the end host specifies the file size, maximum bandwidth limit, and a desired start time; the network returns a time-range-capacity allocation vector assigning varying bandwidth levels in different time ranges for the transfer
VBLS with Channel Allocation (VBLS/CA): a special case of practical interest; tracks actual channel allocations in different time ranges
18
Notation
Specified in call request i: file size F_i, requested start time T^i_req, and maximum rate R^i_max
Switch's response: a time-range-capacity (TRC) allocation vector
19
VBLS algorithm
Initialization step: set time v = T^i_req and remaining file size x(v) = F_i;
check the available bandwidth A(v) (if A(v) = 0, advance v to the next change point in the A(t) curve); set t' = next change point in A(t)
Case 1 (A(v) < R^i_max, and x(v) can be transmitted before the next change point in A(t)):
set B^i_k = v, E^i_k = v + x(v)/A(v), C^i_k = A(v); terminate loop
Case 2 (A(v) < R^i_max, and x(v) cannot be transmitted before the next change point in A(t)):
set B^i_k = v, E^i_k = t' (the next change point), C^i_k = A(v),
then set x <- x - A(v)(t' - v), v <- t', k <- k + 1,
and repeat the loop (go to the initialization step)
20
VBLS algorithm con’t
Case 3 ( and can be transmitted before the next change pint in curve)
set , , , ; Terminate loop
Case 4 ( and cannot be transmitted before the nextchange in curve)
set , “next change point” in curve,
set , “next change point” ,
continue repeat loop (go to Initialization step)
)(t
vB ik )/( max ii
k RvE
iik RC max
ki
iRv max)(
ikE
iik RC max
iik xRvE max)( v
1 kk
iRv max)(
)(t
vB ik )(t
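The four cases collapse into one loop if the allocated rate is written as min(A(v), R^i_max). A Python sketch of this reading of VBLS, with the A(t) curve represented as a list of (start, end, bandwidth) ranges (this representation and the function name are ours; the list is assumed to extend far enough to fit the whole file):

```python
def vbls(file_size, t_req, r_max, avail):
    """Varying-Bandwidth List Scheduling sketch.
    avail: (t_start, t_end, bandwidth) ranges of the piecewise-constant
    available-bandwidth curve A(t), in increasing time order.
    Returns a TRC vector of (begin, end, capacity) tuples for this call."""
    trc, remaining = [], file_size
    for t_start, t_end, bw in avail:
        if remaining <= 0:
            break
        start = max(t_start, t_req)
        if start >= t_end or bw <= 0:
            continue                    # A(t) = 0: wait for the next change point
        rate = min(bw, r_max)           # cases 1/2 (A < R_max) vs 3/4 (A >= R_max)
        if remaining <= rate * (t_end - start):
            trc.append((start, start + remaining / rate, rate))
            remaining = 0               # cases 1/3: finishes inside this range
        else:
            trc.append((start, t_end, rate))
            remaining -= rate * (t_end - start)  # cases 2/4: consume whole range
    return trc

# Example loosely modeled on the single-link figure: a file of 2 units,
# requested at t = 1 with R_max = 2 channels.
avail = [(0, 1, 1), (1, 2, 2), (2, 4, 1), (4, 5, 3)]
print(vbls(2, 1, 2, avail))   # [(1, 2.0, 2)]
```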
21
Example of VBLS (figure)
[Figure: sources S1-S3 attached to a circuit switch share a single 4-channel link to destination D; the available-bandwidth curve A(t) is shown over t = 1..5, with the available time ranges shaded]
(F_1, T^1_req, R^1_max) = (2, 1, 2) -> TRC_1
(F_2, T^2_req, R^2_max) = (2, 1, 2) -> TRC_2
(F_3, T^3_req, R^3_max) = (5, 3, 3) -> TRC_3
22
VBLS/CA algorithm: four additions
1) Track channel availability over time for each channel, in addition to tracking the total available-bandwidth curve A(t); furthermore, track the channel availability at each change point of A(t)
2) Track the set of open channels (to save switch programming time)
3) If multiple channels are allocated in the same time range, count each allocation as a separate entry in the Time-Range-channeL (TRL) vector
4) When there are many candidate channels, apply two rules:
First rule: if the file transfer completes within a time range, choose the channel with the smallest leftover time
Second rule: if the file transfer does not complete within a time range, choose the channel with the largest leftover time
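The two channel-selection rules of addition 4) amount to a best-fit/worst-fit choice on leftover time. A minimal sketch (the dictionary representation of per-channel leftover time is ours):

```python
def choose_channel(channels, completes_in_range):
    """VBLS/CA channel selection. `channels` maps channel id -> leftover idle
    time in the current range. If the transfer finishes inside the range,
    take the smallest leftover (best fit, first rule); otherwise take the
    largest leftover (second rule, to reduce later channel switches)."""
    key = min if completes_in_range else max
    return key(channels, key=channels.get)

free = {1: 10, 4: 25}                 # hypothetical leftover times per open channel
print(choose_channel(free, True))     # 1: smallest leftover, avoids fragmenting ch. 4
print(choose_channel(free, False))    # 4: largest leftover
```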
23
Example of VBLS/CA

Parameter          Value
T^i_req            10
R^i_max            2
F_i                75 MB
Per-channel rate   1 Gbps

          Leftover (MB)   C_open   TRL_i entries (B^i_k, E^i_k, L^i_k)
Round 0   75              { }      { }
Round 1   50              {1,4}    (10,20,1), (10,20,4)
Round 2   25              {4}      (20,30,1), (20,30,4)
Round 3   12.5            { }      (30,40,4)   <- only 12.5 MB can be sent
Round 4   0               {1}      (40,50,1)
24
Traffic model
File arrival requests ~ Poisson process
File-size (F_i) distribution ~ bounded Pareto:

f_X(x) = alpha * k^alpha * x^(-alpha-1) / (1 - (k/p)^alpha),   k <= x <= p

where alpha is the shape parameter (1.1 for the entire simulation), and k and p are the lower and upper bounds, respectively, of the allowed file-size range
R^i_max ~ varies depending on the simulation settings
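File sizes with this density can be drawn by inverting the bounded Pareto CDF, F(x) = (1 - (k/x)^alpha) / (1 - (k/p)^alpha). A sketch of inverse-CDF sampling, shown here with the k = 500 MB and p = 100 GB bounds used later in the validation:

```python
import random

def bounded_pareto_inverse_cdf(u, alpha, k, p):
    """Invert F(x) = (1 - (k/x)**alpha) / (1 - (k/p)**alpha), k <= x <= p,
    so a uniform u in [0, 1) maps to a bounded Pareto file-size sample."""
    norm = 1 - (k / p) ** alpha
    return k / (1 - u * norm) ** (1 / alpha)

alpha, k, p = 1.1, 500e6, 100e9
sizes = [bounded_pareto_inverse_cdf(random.random(), alpha, k, p) for _ in range(1000)]
print(k <= min(sizes) <= max(sizes) <= p)   # True
```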
25
Validation of simulation against analytical results
Assumptions:
1. All file requests set their R^i_max to match the link capacity, C
2. Arrivals form a Poisson process
3. Service times follow the bounded Pareto distribution
4. k = 500 MB and p = 100 GB

Analytical model (M/G/1 mean wait): E[W] = lambda * E[X^2] / (2(1 - rho)),
where X = L/C is the service time of a bounded Pareto file size L, so that
E[X^2] = alpha * k^alpha * (p^(2-alpha) - k^(2-alpha)) / ((2 - alpha)(1 - (k/p)^alpha) * C^2)

[Figure: file latency (sec) vs. system load (0 to 1); the analytical-model and simulation curves coincide]
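The analytical curve on this slide is the M/G/1 (Pollaczek-Khinchine) mean wait. A sketch of its evaluation, using the closed-form raw moments of the bounded Pareto file size (valid here since alpha = 1.1 != 1, 2; function names are ours):

```python
def bp_moment(n, alpha, k, p):
    """n-th raw moment of a bounded Pareto(alpha, k, p) file size:
    integrate x**n against the pdf on [k, p] (closed form, alpha != n)."""
    norm = 1 - (k / p) ** alpha
    return alpha * k ** alpha * (p ** (n - alpha) - k ** (n - alpha)) / ((n - alpha) * norm)

def mg1_mean_wait(lam, alpha, k, p, capacity):
    """Pollaczek-Khinchine: E[W] = lam * E[X^2] / (2 * (1 - rho)), where the
    service time is X = L / C since each call takes the full link."""
    ex = bp_moment(1, alpha, k, p) / capacity
    ex2 = bp_moment(2, alpha, k, p) / capacity ** 2
    rho = lam * ex
    assert rho < 1, "offered load must keep the queue stable"
    return lam * ex2 / (2 * (1 - rho))
```

Sweeping `lam` from low to high load reproduces the hockey-stick latency curve of the figure.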
26
Sensitivity analysis
We carry out four experiments:
(1) To understand the impact of R^i_max when all calls request the same constant R^i_max
(2) To understand the impact of the allowed file-size range (i.e., the parameters k and p)
(3) To understand the impact of R^i_max when calls request three different values of R^i_max (i.e., (1,2,4) and (1,5,10))
(4) To understand the impact of the size of T_discrete
27
Sensitivity analysis Con’t
First Experiment: k = 500MB, p = 100GB, and (1, 5, 10, and 100 channels)
0 0.2 0.4 0.6 0.8 10
0.1
0.2
0.3
0.4
0.5
0.6
0.7
System load
File
late
ncy (
sec)
Rimax
= 100 channels
Rimax
= 1 channel
Rimax
= 5 channels
Rimax
= 10 channels
0 0.2 0.4 0.6 0.8 10
0.5
1
1.5
2
2.5
System load
Mean f
ile t
ransfe
r dela
y (
sec)
Rimax
= 100 channels
Rimax
= 1 channel
Rimax
= 5 channels
Rimax
= 10 channels
File latency: the mean waiting time across all files transferred Mean file transfer delay: file latency + mean service time (transmission delay)
File latency comparison Mean file transfer delay comparison
iRmax
28
Sensitivity analysis Con’t
Second Experiment:
Case 1: k = 500MB, p = 100GB, = 1.1
Case 2: k = 10GB, p = 100GB, = 1.1
Question: In which case is the variance is larger at first glance?
22
2
))(()(
)1(2
)()(
XEXEXE
XEWE
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
System load
File
late
ncy (
sec)
Rimax
= 10 channels (k = 10GB)
Rimax
= 5 channels (k = 10GB)
Rimax
= 1 channel (k = 10GB)
Rimax
= 10 channels (k = 500MB)
Rimax
= 5 channels (k = 500MB)
Rimax
= 1 channel (k = 500MB)
0 10 20 30 40 5010
0
101
102
103
104
105
106
|
bin-number
Fre
quen
cy
|
Bin 9 (Mean file size = 24.6 GB)
Bin 1 (Mean file size = 2.27 GB)
k = 10GB; p = 100GB; alpha = 1.1
k = 500MB; p = 100GB; alpha = 1.1
Case 1
Case 2
Case 1
Case 2
29
Sensitivity analysis Con’t
Third ExperimentCase 1: (per-channel rate) = 10Gbps, C = 1Tbps (100 channels),
Case 2: (per-channel rate) = 1Gbps, C =100Gbps (100 channels),
File throughput: long-run average of the file size divided by the file transfer delay
channelsorR i 4,2,1max
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 15
10
15
20
25
30
35
40
System load
File
thr
ough
put,
Gbp
s
Rimax
= 1 channel
Rimax
= 2 channels
Rimax
= 4 channels
Case 1 Case 2
channelsorR i 10,5,1max
0 10 20 30 40 50 60 70 80 90 1000
1
2
3
4
5
6
7
8
9
10
System load
File
thro
ughp
ut, G
bps
Rimax
= 1 channel
Rimax
= 5 channels
Rimax
= 10 channels
30
Sensitivity analysis Con’t
1max iR
Fourth ExperimentAssumptions: 1. all calls request the same and the link capacity C = 100 channels
2. vary the value of the discrete time unit ( Tdiscrete) as 0.05, 0.5, 1, and 2 sec.
0 0.2 0.4 0.6 0.8 10
10
20
30
40
50
60
70
80
90
100
System load
File
thro
ughput,
Gbps
Tdiscrete
= 50msec
Tdiscrete
= 0.5sec
Tdiscrete
= 1secTdiscrete
= 1sec
Tdiscrete
= 2sec
0 0.2 0.4 0.6 0.8 10
10
20
30
40
50
60
70
80
90
100
System load
Utiliz
ation (
%)
Tdiscrete
= 50msec
Tdiscrete
= 0.5sec
Tdiscrete
= 1sec
Tdiscrete
= 2sec
1max iR
31
Comparison of VBLS with FBLS and PS
Basic simulation setup:
File arrival requests ~ Poisson process
Per-channel rate = 10 Gbps; R^i_max = 1, 5, or 10 channels (30%, 30%, and 40% of calls, respectively)
Bounded Pareto input parameters: alpha = 1.1, k = 5 MB, and p = 1 GB
Packet-switched system: files are divided into 1500 B packets that arrive at an infinite packet buffer at a constant packet rate equal to R^i_max divided by the packet length
32
Comparison of VBLS with FBLS and PS, cont'd
[Three figures: file throughput (Gbps) vs. system load for R^i_max = 1, R^i_max = 5, and R^i_max = 10 channels, each comparing FBLS, VBLS, and PS]
• The performance of the VBLS scheme proved to be much better than that of the FBLS scheme
• The throughput performance of VBLS is indistinguishable from packet switching. This illustrates our main point: by taking file sizes into account and varying the bandwidth allocation for each transfer over its duration, we mitigate the performance degradation usually associated with circuit-based methods
• This work was presented at GAN'04 [3] and PFLDNET'04 [4] and will be published at ICC 2004 [5]
33
Call scheduling in the multiple-link case
Centralized online greedy scheme:
Create a new curve A_new(t) reflecting the available bandwidth across all links
Distributed online greedy scheme:
Needs some kind of mechanism to merge the TRC and TRL vectors of multiple switches
Practical issues:
Clock synchronization
Propagation delay
34
Some additional notation for the multiple-link case

Symbol: fTRC^i_n = {(B^if_kn, E^if_kn, C^i_k, n): k = 1, 2, ...; n = 1, 2, ..., N}
Meaning: time-range-capacity allocation: capacity C^i_k is assigned to call i in time range k, starting at B^if_kn and ending at E^if_kn, by switch n. Since the number of time ranges can change from link to link, we add the subscript n

Symbol: rTRC^i_n = {(B^ir_kn, E^ir_kn, C^i_k, n): k = 1, 2, ...; n = 1, 2, ..., N}
Meaning: time-range-capacity release: capacity C^i_k is to be released for call i, starting at B^ir_kn and ending at E^ir_kn, at switch (n - 1)

Symbol: M
Meaning: multiplicative factor used in reserving TRCs; if M = 5, then the TRC vector reserved is 5 times the TRC allocation needed to transfer the file
35
VBLS example for M = 1 (figure)
[Figure: source S1 reaches destination D1 through switches SW1, SW2, and SW3; each link has 4 channels; the available-bandwidth curves A_1(t) and A_2(t) for the two links are shown over t = 1..6, with available time ranges shaded]
(F_1, T^1_req, R^1_max, M) = (3, 1, 2, 1) -> fTRC_1 -> X (blocked)
36
VBLS example for M = 2 (figure)
[Figure: same topology and availability curves A_1(t), A_2(t) as the M = 1 example]
(F_1, T^1_req, R^1_max, M) = (3, 1, 2, 2) -> fTRC_1, fTRC_2, rTRC_1
37
Traffic model
[Figure: study traffic flows from Source to Dest through SW 1, SW 2, and SW 3; interference traffic from Src1/Src2 to Dest1/Dest2 enters at the intermediate links]
Bounded Pareto input parameters: alpha = 1.1, k = 500 MB, and p = 100 GB
Study traffic: the mean call arrival rate used by the Source is 10 files/sec (constant)
Interference traffic: the mean call arrival rates used for the interference traffic are varied (5, 10, 15, 20, 25, 30, 35, and 40 files/sec)
38
Sensitivity analysis
We carry out two experiments:
(1) To understand the impact of M (Multiplicative factor)
M = 2, 3, and 4
(2) To understand the impact of the discrete time unit (Tdiscrete)
Tdiscrete = 0.01, 0.1, and 1 sec
39
Sensitivity analysis Con’t
First experiment (Impact of M): varies the size of M as 2, 3, and 4, but fixes propagation delay and Tdiscrete as 5ms and 10ms respectively.
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.80
10
20
30
40
50
60
70
80
90
100
Interference traffic load
Blo
cked
cal
ls(%
)
M = 2 M = 3
M = 4
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.80
1
2
3
4
5
6
7
8
9
10
Interference traffic load
File
thro
ughp
ut, G
bps
M = 2
M = 3
M = 4
Percentages of blocked calls comparison File throughput comparison
40
Sensitivity analysis Con’t
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.80
10
20
30
40
50
60
70
80
90
100
Interference traffic load
Blo
cked
cal
ls(%
)
Tdiscrete
= 0.01sec
Tdiscrete
= 0.1sec
Tdiscrete
= 1sec
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.80
1
2
3
4
5
6
7
8
9
10
Interference traffic load
File
thro
ughp
ut, G
bps
Tdiscrete
= 0.1sec
Tdiscrete
= 1sec
Tdiscrete
= 0.01sec
Second experiment (Impact of Tdiscrete): varies the size of Tdiscrete as 0.01, 0.1, and 1 sec, but fixes propagation delay and M as 5ms and 3 respectively.
Percentages of blocked calls comparison File throughput comparison
41
Future work
We can include a second class of user requests specifically targeted at interactive, long-holding-time applications, e.g., remote visualization and simulation steering. Such requests will be specified as (H_i, R^i_min, R^i_max, T^i_req)
The simulation results for the multiple-link case are only preliminary
More possible sets of comparisons via simulation: varying the propagation delays of the links while fixing other parameters such as M and T_discrete
Comparison between TCP/IP (FAST TCP) and the VBLS scheme:
Assume a finite buffer instead of an infinite buffer
Take into account the effects of congestion control and retransmission when packets are lost to buffer overflow
This might degrade the performance of the packet-switched system
42
References
1. M. Veeraraghavan, H. Lee and X. Zheng, “File transfers across optical circuit-switched networks,” PFLDnet 2003, Feb. 3-4, 2003, Geneva, Switzerland.
2. M. Veeraraghavan, X. Zheng, H. Lee, M. Gardner, W. Feng, "CHEETAH: Circuit-switched High-speed End-to-End Transport ArcHitecture,” accepted for publication in the Proc. of Opticomm 2003, Oct. 13-17, Dallas, TX.
3. H. Lee, M. Veeraraghavan, E. K. P. Chong, H. Li, “Lambda scheduling algorithm for file transfers on high-speed optical circuits,” Workshop on Grids and Advanced Networks (GAN'04), April 19-22, 2004, Chicago, Illinois.
4. M. Veeraraghavan, X. Zheng, W. Feng, H. Lee, E. K. P. Chong, and H. Li, “Scheduling and Transport for File Transfers on High-speed Optical Circuits,” PFLDNET 2004, Feb. 16-17, 2004, Argonne, Illinois, http://www.-didc.lbl.gov/PFLDnet2004/.
5. M. Veeraraghavan, H. Lee, E.K.P. Chong, H. Li, “A varying-bandwidth list scheduling heuristic for file transfers,” in Proc of ICC2004, June 20-24, Paris, France.
6. M. Veeraraghavan, X. Zheng, W. Feng, H. Lee, E. K. P. Chong, H. Li, “Scheduling and Transport for File Transfers on High-speed Optical Circuits,” Journal of Grid Computing (JOGC), 2004.
43
Thank you!