28
Scheduling Mix-flows in Commodity Datacenters with Karuna Li Chen, Kai Chen, Wei Bai, Mohammad Alizadeh (MIT) SING Group, CSE Department Hong Kong University of Science and Technology

Scheduling Mix-flows in Commodity Datacenters with Karunaconferences.sigcomm.org/sigcomm/2016/files/program/sigcomm/Session... · Scheduling Mix-flows in Commodity Datacenters with

Embed Size (px)

Citation preview

Page 1: Scheduling Mix-flows in Commodity Datacenters with Karunaconferences.sigcomm.org/sigcomm/2016/files/program/sigcomm/Session... · Scheduling Mix-flows in Commodity Datacenters with

Scheduling Mix-flows in Commodity Datacenterswith Karuna

Li Chen, Kai Chen, Wei Bai, Mohammad Alizadeh (MIT)SING Group, CSE Department

Hong Kong University of Science and Technology

Page 2: Scheduling Mix-flows in Commodity Datacenters with Karunaconferences.sigcomm.org/sigcomm/2016/files/program/sigcomm/Session... · Scheduling Mix-flows in Commodity Datacenters with

Datacenter Transport

• Deadline flows• Meeting deadlines• D3, D2TCP, …

• General (non-deadline) flows• Reduce flow completion time (FCT).• pFabric, PDQ, PASE, PIAS, …

We investigate a practical, yet neglected, problem: Coexistence of deadline and non-deadline flowsMix-flow Scheduling

2

Page 3: Scheduling Mix-flows in Commodity Datacenters with Karunaconferences.sigcomm.org/sigcomm/2016/files/program/sigcomm/Session... · Scheduling Mix-flows in Commodity Datacenters with

Prior solutions do not work for mix-flowsShortest Job First (SJF) Scheduling – pFabric, PASE, PIAS, PDQ

Deadline flows

Non-deadline flows

End-host End-host

t0

0.1

0.2

0.3

0.4

0.5

0 5 10 15 20

Frac

tion

Percentage of non-deadline flows smaller than deadline flows.

Deadline Miss Rate

Scheduling only with sizes hurts deadline flowsProblem: unawareness of deadlines.

3

Flow Priority = Remaining size

Page 4: Scheduling Mix-flows in Commodity Datacenters with Karunaconferences.sigcomm.org/sigcomm/2016/files/program/sigcomm/Session... · Scheduling Mix-flows in Commodity Datacenters with

Prior solutions do not work for mix-flowsEarliest Deadline First Scheduling – pFabric, PASE, PIAS, PDQ

Deadline flows

Non-deadline flows

End-host End-host

t

Prioritizing deadline flows hurts non-deadline flows, especially short ones.Problem: Existing transports for deadline flows unnecessarily takes all bandwidth.

0

5

10

15

20

0 1 2 3 4 5 6 7 8

ms

Percentage of deadline flows in overall traffic

99 Percentile FCT

Non-deadline: Overall Non-deadline: Size<10KB

4

Flow Priority = Time till Deadline

DeadlineDeadline

Page 5: Scheduling Mix-flows in Commodity Datacenters with Karunaconferences.sigcomm.org/sigcomm/2016/files/program/sigcomm/Session... · Scheduling Mix-flows in Commodity Datacenters with

How to schedule mix-flows?

5

Deadline Flows•Meet deadlines•Flow deadline à Priority

Non-deadline Flows•Reduce FCT•Flow Size à Priority

Page 6: Scheduling Mix-flows in Commodity Datacenters with Karunaconferences.sigcomm.org/sigcomm/2016/files/program/sigcomm/Session... · Scheduling Mix-flows in Commodity Datacenters with

Karuna

• Deadline flows • High priority with minimal bandwidth to complete just before

deadlines.• Non-deadline flows

• Low priority but take all available bandwidth to reduce FCT.

Deadline Flows

Non-deadline Flows

MCP

Work Conserv.Transport

Highest Priority

Priority 2

Priority 3

Priority K

End-host Network Fabric 6

SJF

Key Insight: Deadline flows should minimally impact non-deadline flows.

Page 7: Scheduling Mix-flows in Commodity Datacenters with Karunaconferences.sigcomm.org/sigcomm/2016/files/program/sigcomm/Session... · Scheduling Mix-flows in Commodity Datacenters with

MCP for deadline flows: Completing deadlines with minimal bandwidthMinimal-impact Congestion control Protocol

7

Deadline flows Non-deadline flows Implementation Evaluation

Page 8: Scheduling Mix-flows in Commodity Datacenters with Karunaconferences.sigcomm.org/sigcomm/2016/files/program/sigcomm/Session... · Scheduling Mix-flows in Commodity Datacenters with

MCP: Formulation and solution

• Objective à Minimal impact• Per-packet latency

• Constraints:• Meet deadlines• Network capacity

Stochastic Optimization

LyapunovOptimization

Framework [1]Convex

Optimization Primal SolutionPer-flow

congestion window update

function

[1] M. J. Neely. Stochastic Network Optimization with Application to Communication and Queueing Systems, Morgan & Claypool, 2010.8

Page 9: Scheduling Mix-flows in Commodity Datacenters with Karunaconferences.sigcomm.org/sigcomm/2016/files/program/sigcomm/Session... · Scheduling Mix-flows in Commodity Datacenters with

MCP: Formulation and solution

• Objective à Minimal impact• Per-packet latency

• Constraints • Meet deadlines• Network capacity

• Solutionà Near-deadline completion

9

Rate

t

Link CapRate

tDeadline

Stochastic Optimization

LyapunovOptimization

Framework [1]Convex

Optimization Primal SolutionPer-flow

congestion window update

function

Page 10: Scheduling Mix-flows in Commodity Datacenters with Karunaconferences.sigcomm.org/sigcomm/2016/files/program/sigcomm/Session... · Scheduling Mix-flows in Commodity Datacenters with

Reducing FCT for non-deadline flowsMimicking SJFNon-deadline flows with/out known sizes

10

Deadline flows Non-deadline flows Implementation Evaluation

Page 11: Scheduling Mix-flows in Commodity Datacenters with Karunaconferences.sigcomm.org/sigcomm/2016/files/program/sigcomm/Session... · Scheduling Mix-flows in Commodity Datacenters with

Non-deadline flows with unknown size

• PIAS [2] is best known scheme.

[2] Wei Bai, et. al., Information-Agnostic Flow Scheduling for Commodity Data Centers, USENIX NSDI 2015

Highest Priority

2nd

Highest Priority

Lowest Priority

Send packets tagged with the highest priority untilα#bytes sent.

Send packets tagged with 2nd highest priority untilα$bytes sent.

Send packets tagged with the lowest priority.

Flows

11

Page 12: Scheduling Mix-flows in Commodity Datacenters with Karunaconferences.sigcomm.org/sigcomm/2016/files/program/sigcomm/Session... · Scheduling Mix-flows in Commodity Datacenters with

Karuna for non-deadline flows

• Non-deadline flows with unknown size ç PIAS• Non-deadline flows with known size

• Karuna extends PIAS to schedule flows with/out known sizes.

Sum of Linear Ratios Problem

(PIAS)

Reformulation to include flows with known

sizes

Quadratic Sum of Ratios Problem(Karuna)

Demotion Thresholds: {𝛼'} Demotion Thresholds: {𝛼'}Splitting Thresholds: {𝛽'}

12

Page 13: Scheduling Mix-flows in Commodity Datacenters with Karunaconferences.sigcomm.org/sigcomm/2016/files/program/sigcomm/Session... · Scheduling Mix-flows in Commodity Datacenters with

Karuna for non-deadline flows: mimicking SJF

PIASPriority 2

Priority 3

Priority KSize ≤ 𝛽#

𝛽# < Size ≤ 𝛽$

𝛽,-# < Size

Flow with known sizes

Flow without known sizes

13

…End-host Network Fabric

Page 14: Scheduling Mix-flows in Commodity Datacenters with Karunaconferences.sigcomm.org/sigcomm/2016/files/program/sigcomm/Session... · Scheduling Mix-flows in Commodity Datacenters with

Implementation

14

Deadline flows Non-deadline flows Implementation Evaluation

Page 15: Scheduling Mix-flows in Commodity Datacenters with Karunaconferences.sigcomm.org/sigcomm/2016/files/program/sigcomm/Session... · Scheduling Mix-flows in Commodity Datacenters with

Implementation

Flow size

Socket

Deadline

SO_MARK

setsockopt()

Information passing

Pass flow information (deadline, size) to the kernel using SO_MARK

End-host Network Fabric15

Page 16: Scheduling Mix-flows in Commodity Datacenters with Karunaconferences.sigcomm.org/sigcomm/2016/files/program/sigcomm/Session... · Scheduling Mix-flows in Commodity Datacenters with

Implementation

Flow size

Socket

Deadline

SO_MARK

setsockopt()

Tc module

pkt

Tagged pkt

Information PassingPacket tagging

TC module at the sender-side.Tag DSCP fields in packet headers based on thresholds.

End-host Network Fabric16

Page 17: Scheduling Mix-flows in Commodity Datacenters with Karunaconferences.sigcomm.org/sigcomm/2016/files/program/sigcomm/Session... · Scheduling Mix-flows in Commodity Datacenters with

Implementation

Flow size

Socket

Deadline

SO_MARK

setsockopt()

Tc module

pkt

Tagged pkt

Modulate Congestion

Window with MCP

Information PassingPacket taggingRate control

TC module.Non deadline flows use DCTCPModifies window size using MCP.

End-host Network Fabric17

DCTCP

Page 18: Scheduling Mix-flows in Commodity Datacenters with Karunaconferences.sigcomm.org/sigcomm/2016/files/program/sigcomm/Session... · Scheduling Mix-flows in Commodity Datacenters with

Implementation

Flow size

Socket

Deadline

SO_MARK

setsockopt()

Tc module

pkt

Tagged pkt

Strict Priority QueueingModulate Congestion

Window with MCP

Information PassingPacket taggingRate controlSwitch configuration

ECN marking.Priority Queueing(priorities mapped to DSCP fields).

End-host Network Fabric18

Strict Priority Queueing

Strict Priority Queueing

DCTCP

Page 19: Scheduling Mix-flows in Commodity Datacenters with Karunaconferences.sigcomm.org/sigcomm/2016/files/program/sigcomm/Session... · Scheduling Mix-flows in Commodity Datacenters with

EvaluationTestbed ExperimentsSimulations

19

Deadline flows Non-deadline flows Implementation Evaluation

Page 20: Scheduling Mix-flows in Commodity Datacenters with Karunaconferences.sigcomm.org/sigcomm/2016/files/program/sigcomm/Session... · Scheduling Mix-flows in Commodity Datacenters with

Evaluation: Testbed Experiments

• Setup• 16 servers• A Gigabit Pronto-3295 switch• 8 Priority queues mapped to DSCP• RTT ~100us• Karuna kernel module

• Traffic trace• Web search (DCTCP [3])• Data mining (VL2 [4])

20

[3] Alizadeh, Mohammad, et al. "Data center tcp (dctcp)." ACM SIGCOMM computer communication review. Vol. 40. No. 4. ACM, 2010.[4] Greenberg, Albert, et al. "VL2: a scalable and flexible data center network."ACM SIGCOMM computer communication review. Vol. 39. No. 4. ACM, 2009.

Page 21: Scheduling Mix-flows in Commodity Datacenters with Karunaconferences.sigcomm.org/sigcomm/2016/files/program/sigcomm/Session... · Scheduling Mix-flows in Commodity Datacenters with

Testbed Experiments: Deadline FlowsFlow Size Deadline Start Time1 14.4MB 20ms 0ms2 48MB 120ms 0ms3 3MB 5ms 50ms4 0.5MB 10ms 80ms

0

200

400

600

800

1000

1200

1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91 93 95 97 99 101

103

105

107

109

111

113

115

117

119

121

Mbp

s

Time (ms)

DCTCP

Flow 1 Flow 2 Flow 3 Flow 4Deadline Missed

for Flow 1

21

Page 22: Scheduling Mix-flows in Commodity Datacenters with Karunaconferences.sigcomm.org/sigcomm/2016/files/program/sigcomm/Session... · Scheduling Mix-flows in Commodity Datacenters with

Testbed Experiments: Deadline FlowsFlow Size Deadline Start Time1 14.4MB 20ms 0ms2 48MB 120ms 0ms3 3MB 5ms 50ms4 0.5MB 10ms 80ms

0

200

400

600

800

1000

1200

1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91 93 95 97 99 101

103

105

107

109

111

113

115

117

119

121

Mbp

s

Time (ms)

pFabric – Earliest Deadline First

Flow 1 Flow 2 Flow 3 Flow 4

22

Flow 1 deadline Flow 2 deadlineFlow 4 deadlineFlow 3 deadline

Page 23: Scheduling Mix-flows in Commodity Datacenters with Karunaconferences.sigcomm.org/sigcomm/2016/files/program/sigcomm/Session... · Scheduling Mix-flows in Commodity Datacenters with

Testbed Experiments: Deadline FlowsFlow Size Deadline Start Time1 14.4MB 20ms 0ms2 48MB 120ms 0ms3 3MB 5ms 50ms4 0.5MB 10ms 80ms

0

200

400

600

800

1000

1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91 93 95 97 99 101

103

105

107

109

111

113

115

117

119

121

Mbp

s

Time (ms)

Karuna

Flow 1 Flow 2 Flow 3 Flow 4

23

Page 24: Scheduling Mix-flows in Commodity Datacenters with Karunaconferences.sigcomm.org/sigcomm/2016/files/program/sigcomm/Session... · Scheduling Mix-flows in Commodity Datacenters with

Testbed Experiments: Deadline FlowsFlow Size Deadline Start Time1 14.4MB 20ms 0ms2 48MB 120ms 0ms3 3MB 5ms 50ms4 0.5MB 10ms 80ms

Karuna completes deadline flow just before deadline, leaving bandwidth for non-deadline flows.

0100200300400500600700800900

1000

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

113

120

Karuna

0

200

400

600

800

1000

1200

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

113

120

pFabric – Earliest Deadline First

0

200

400

600

800

1000

1200

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

113

120

DCTCP

24

Page 25: Scheduling Mix-flows in Commodity Datacenters with Karunaconferences.sigcomm.org/sigcomm/2016/files/program/sigcomm/Session... · Scheduling Mix-flows in Commodity Datacenters with

Testbed Experiments: Non-deadline Flows

Mimics shortest job first scheduling for non-deadline flows.

0.9162.7211.716

5.5548.04

28.725

0

5

10

15

20

25

30

35

0-100KB (Avg) 0-100KB (99th)

ms

Flow Size

FCT

Karuna DCTCP TCP

81.729

104.629

120.286

0

20

40

60

80

100

120

140

100KB-10MB (Avg)

ms

Flow Size

FCT

Karuna DCTCP TCP

851.72 718.3

69 608.04

0100200300400500600700800900

1000

>10MB (Avg)

ms

Flow Size

FCT

Karuna DCTCP TCP

61.133

67.019

73.634

0

10

20

30

40

50

60

70

80

Overall

ms

FCT

Karuna DCTCP TCP

25

Page 26: Scheduling Mix-flows in Commodity Datacenters with Karunaconferences.sigcomm.org/sigcomm/2016/files/program/sigcomm/Session... · Scheduling Mix-flows in Commodity Datacenters with

Evaluation: Simulations

• Simulation Setup• Spine-leaf with 144 servers• 10G Server-ToR links• 40G ToR-Spine links

• Compare with:• D3• D2TCP• pFabric - EDF

26

Page 27: Scheduling Mix-flows in Commodity Datacenters with Karunaconferences.sigcomm.org/sigcomm/2016/files/program/sigcomm/Session... · Scheduling Mix-flows in Commodity Datacenters with

Large-scale Simulations: Key Benefit of Karuna

0

2

4

6

8

10

0.8 0.85 0.9 0.95

%

Deadline traffic load

Deadline Miss Rate

D3 D2TCPpFabric (EDF) Karuna

0

5

10

15

20

0.8 0.85 0.9 0.95m

s

Deadline traffic load

95th Percentile FCT

D3 D2TCPpFabric Karuna

0100002000030000400005000060000

0.8 0.85 0.9 0.95

#

Deadline traffic load

# Completed non-deadline flows

D3 D2TCP pFabric Karuna

Reducing completion times of non-deadline flows while completing deadline flows.27

>100x

Page 28: Scheduling Mix-flows in Commodity Datacenters with Karunaconferences.sigcomm.org/sigcomm/2016/files/program/sigcomm/Session... · Scheduling Mix-flows in Commodity Datacenters with

Concluding remarks

• Filling a gap in datacenter flow scheduling• Karuna

• Prioritizes deadline flows but control their rates.• Uses the remaining bandwidth to schedule non-deadline flows

based on size.

• Thank you! Q & A!

28