28
Iroko A Data Center Emulator for Reinforcement Learning Fabian Ruffy, Michael Przystupa, Ivan Beschastnikh University of British Columbia https://github.com/dcgym/iroko

A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017

IrokoA Data Center Emulator

for Reinforcement Learning

Fabian Ruffy, Michael Przystupa, Ivan Beschastnikh

University of British Columbia

https://github.com/dcgym/iroko

Page 2: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017

2

Reinforcement Learning and Networking

Page 3: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017

3

Reinforcement Learning and Networking

Page 4: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017

4

Reinforcement Learning and Networking

Page 5: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017

5

Reinforcement Learning and Networking

Page 6: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017

6

Reinforcement Learning and Networking

Page 7: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017

7

Reinforcement Learning and Networking

Page 8: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017

8

• DC challenges are optimization problems

• Traffic control

• Resource management

• Routing

• Operators have complete control

• Automation possible

• Lots of data can be collected

The Data Center: A perfect use case

Cho, Inho, Keon Jang, and Dongsu Han.

"Credit-scheduled delay-bounded congestion control for datacenters.“

SIGCOMM 2017

Page 9: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017

9

• Typical reinforcement learning is not viable for data center operators!

• Fragile stability

• Questionable reproducibility

• Unknown generalizability

• Prototyping RL is complicated

• Cannot interfere with live production traffic

• Offline traces are limited in expressivity

• Deployment is tedious and slow

Two problems…

Page 10: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017

10

• Iroko: open reinforcement learning gym for data center scenarios

• Inspired by the Pantheon* for WAN congestion control

• Deployable on a local Linux machine

• Can scale to topologies with many hosts

• Approximates real data center conditions

• Allows arbitrary definition of

• Reward

• State

• Actions

Our work: A platform for RL in Data Centers

*Yan, Francis Y., et al. "Pantheon: the training ground for Internet congestion-control research.“ ATC 2018

Page 11: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017

11

Iroko in one slide

Page 12: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017

12

Iroko in one slide

Topology

Fat-TreeRack Dumbbell

Page 13: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017

13

Iroko in one slide

Topology

Fat-TreeRack Dumbbell

Traffic Pattern Action Model

Page 14: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017

14

Iroko in one slide

Topology

Fat-TreeRack Dumbbell

Data Collectors

Traffic Pattern Action Model

Page 15: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017

15

Iroko in one slide

Topology

Fat-TreeRack Dumbbell

Data Collectors

Reward Model State Model

Traffic Pattern Action Model

Page 16: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017

16

Iroko in one slide

OpenAI Gym

Topology

Fat-TreeRack Dumbbell

Data Collectors

Reward Model State Model

Traffic Pattern Action Model

Page 17: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017

17

Iroko in one slidePolicy

OpenAI Gym

Topology

Fat-TreeRack Dumbbell

Data Collectors

Reward Model State Model

Traffic Pattern Action Model

Page 18: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017

18

• Ideal data center should have:

• Low latency, high utilization

• No packet loss or queuing delay

• Fairness

• CC variations draw from the reactive TCP

• Queueing latency dominates

• Frequent retransmits reduce goodput

• Data center performance may be unstable

Use Case: Congestion Control

Page 19: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017

19

1010 10

10 10 10

Bandwidth

AllocationData CollectionFlow Pattern

Switch

Policy10

Predicting Networking Traffic

Page 20: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017

20

1010 10

10 10 10

Bandwidth

AllocationData CollectionFlow Pattern

Switch

Policy10

Predicting Networking Traffic

Page 21: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017

21

1010 10

10 10 10

Bandwidth

AllocationData CollectionFlow Pattern

Switch

Policy10

Predicting Networking Traffic

Page 22: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017

22

3.3

3.3 3.4

1010 10

10 10 10

Bandwidth

AllocationData CollectionFlow Pattern

Switch

Policy10

Predicting Networking Traffic

Page 23: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017

23

103.3 10

3.3 3.4 10

Bandwidth

AllocationData CollectionFlow Pattern

Switch

Policy10

Predicting Networking Traffic

Page 24: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017

24

• Two environments:

• env_iroko: centralized rate limiting arbiter

• Agent can set the sending rate of hosts

• PPO, DDPG, REINFORCE

• env_tcp: raw TCP

• Contains implementations of TCP algorithms

• TCP Cubic, TCP New Vegas, DCTCP

• Goal: Avoid congestion

Can we learn to allocate traffic fairly?

Page 25: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017

25

• 50000 timesteps

• Linux default UDP as base transport

• 5 runs (~7 hours per run)

• Bottleneck at central link

Experiment Setup

Page 26: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017

Results – Dumbbell UDP

Page 27: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017

27

• Challenging real-time environment

• Noisy observation

• Exhibits strong credit assignment problem

• RL algorithms show expected behavior for our gym

• Achieve better performance than TCP New Vegas

• More robust algorithms required to learn good policy

• DDPG and PPO achieve near optimum

• REINFORCE fails to learn good policy

Results - Takeaways

Page 28: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017

28

• Data center reinforcement learning is gaining traction

…but it is difficult to prototype and evaluate

• Iroko is

• a platform to experiment with RL for data centers

• intended to train on live traffic

• early stage work

• but experiments are promising

• available on Github:

https://github.com/dcgym/iroko

Contributions