Upload
sangjin-han
View
533
Download
0
Embed Size (px)
DESCRIPTION
Presented at the 12th ACM Workshop on Hot Topics in Networks (HotNets XII) Datacenters have traditionally been a collection of individual servers, each of which aggregates a fixed amount of compute, memory, storage, and network resources as an independent physical entity. Extrapolating from recent trends, we envisage that future datacenters will be architected in a drastically different manner: all computational resources within a server will be disaggregated into standalone blades, and the datacenter network will directly interconnect them. This is what we call "Disaggregated Datacenters". This presentation briefly sketches why and how this transition will happen. In particular, we focus on the role of network fabric in such disaggregated datacenters, as it will face a set of challenges brought by the new datacenter architecture. The paper can be found here: http://www.eecs.berkeley.edu/~sangjin/static/pub/hotnets2013_ddc.pdf
Citation preview
Network Support���for Resource Disaggregation ���
in Next-Generation Datacenters
Sangjin Han Norbert Egi Aurojit Panda���Sylvia Ratnasamy Guangyu Shi Scott Shenker ���
���UC Berkeley Futurewei Technologies ICSI
1
Future datacenters will look fundamentally different.
There will be no “servers”.
Like it or not. 2
Outline:
1. What will happen? 2. Why will it happen? 3. How will it happen?
3
Today: Server-Centric Architecture
4
Datacenter���Network
Intra-server���Network
Datacenter���Network
Tomorrow: Resource-Centric Architecture
5
All resources are ���individually addressable
The Trends: Disaggregation
6
1. HP MoonShot – Shared cooling/casing/power/mgmt for server blades
The Trends: Disaggregation
7
1. HP MoonShot 2. AMD SeaMicro
– Virtualized I/O
Datacenter���Network
The Trends: Disaggregation
8
1. HP MoonShot 2. AMD SeaMicro 3. Intel Rack Scale Architecture
The Trends: Disaggregation
9
1. HP MoonShot 2. AMD SeaMicro 3. Intel Rack Scale Architecture 4. Open Compute Project
Disaggregated Datacenter
10
Datacenter���network
Resource as a���standalone blade
Why will it happen?
: Extreme resource modularity
11
Benefits of Resource Modularity
1. Easier to build & evolve – Resources have different cycles/trends/constraints.
• Tight integration in a server is a huge pain • E.g., “Memory capacity per core drops 30% ���
for every 2 years” [Lim et al., ISCA ’09]
– Disaggregation enables independent evolution – The biggest driving force from vendor’s viewpoint
12
Benefits of Resource Modularity
1. Easier to build & evolve 2. Fine-grained resource provisioning
– Current practice: replace/buy an entire ���server, rack, or even datacenter.
– e.g., “I want just more processors, not servers!” • Go buy some CPU blades at Best Buy® and plug them in.
– e.g., “I want to try the new NVRAM technology!” • Again, go for it.
13
Benefits of Resource Modularity
1. Easier to build & evolve 2. Fine-grained resource provisioning 3. Operational efficiency
– Datacenter as a single giant computer – Higher utilization with statistical multiplexing – (I will get back to this)
14
How will it happen?
: Incrementally and radically.
15
THIS IS NOT A CRAZY IDEA
16
• Do we need to change everything? NO. – HW change is minimal. – SW change is minimal, too.
Unified���Interconnect
HW Requires Minimal Modification
• The internals don’t need to change. • All we need is embedded network controller.
– They already have: QPI, HT, PCIe, SATA, … – Can be very cheap
• E.g., a whole graphics card w/ 128Gbps for only $50
17
Resource as a standalone blade
How About SW?
• Existing SW infrastructure heavily relies on ���the concept of “server” – We don’t want to rewrite it from scratch. – How to utilize the “giant computer”?
18
VM
Disaggregated VM
No modification for App/OS Minor changes in VMM. Much higher utilization!
Elastic VMs Achieve High Utilization! Task 1 Task 2 Task 3 Task 4
Servers 19
Elastic VMs Achieve High Utilization!
Servers
40% of resources are wasted
20
Elastic VMs Achieve High Utilization!
Servers
40% of resources are wasted
Disaggregated datacenter
vs.
21
1. No “server boundary” 2. Statistical multiplexing at a larger scale 3. Higher utilization!
(tasks vary greatly)
22
THIS IS NOT A CRAZY IDEA, part 2
23
• We don’t need to change everything. – HW change is minimal. – SW change is minimal, too.
• A unified network is plausible – The intra-/inter-server networks can be unified. – Bandwidth/latency requirements are within reach.
Two Different(?) Types of Network
24
Inter-server���Network
Intra-server���Network
Intra- vs. Inter-Server Networking
Aren’t they two different things?
Not really.
E.g., PCIe and 10GbE
• Serial • Point-to-point • Full duplex • Packet-switched • Variable packet size • Supports both message and ���
read/write semantics
No fundamental™ difference! 25
Making Memory Traffic Manageable
Registers
CPU Cache
Memory
SSD / HDD
10,000 Gbps
500 Gbps
1-10 Gbps
26
1 ns
50 ns
50,000 ns
Making Memory Traffic Manageable
Registers
CPU Cache
Local memory
SSD / HDD
10,000 Gbps
500 Gbps
1-10 Gbps
Remote memory
?? Gbps
A small amount of ���local memory as a “cache”
27
1 ns
50 ns
50,000 ns
?? ns
Desirable Network Speed?
• A quick-and-dirty experiment
Core Core
Core Core
Artificial delay for���bandwidth/latency
Local memory Emulated���
“remote” memory Server
28
Desirable Network Speed?
• 4 CPU cores, 8GB working set size • GraphLab, memcached, Pig
• Findings (read the paper) 1. 10-40 Gbps is enough.
1. Feasible even today! 2. Average link utilization: < 1-5Gbps
2. Latency matters.
29
WANTED: Low Latency memcached with varying latency
< 10µs���latency, < 20% ���overhead
30
Research Questions
31
Questions
• Answered: – How fast should it be? >10-40Gbps, <1-10µs.
• Unanswered: – Scalability? – Reliable transfer? – QoS? – Packet? Circuit? – …
32
1. “Right” Scale of Disaggregation
• Disaggregation scale: where is the sweet spot?
datacenter-scale���(flat)
chassis/rack/pod-scale���(two-tier)
TradiIonal inter-‐server network
33
2. Realizing Low Latency
• It’s time for low latency [Rumble et al., HotOS ’11] – “5-10µs latency is possible in the short term” – “1µs round-trip times cannot be achieved off-processor”
• Congestion avoidance/control should be ���“close to the metal”. – A lot of research efforts are ongoing. – Will they be still valid in disaggregated datacenters?
34
3. Unified Scheduler
• We will need a unified scheduler : – Job scheduler + network controller
35
VM1
VM2 VM1
VM2
VM1
3. Unified Scheduler
• We will need a unified scheduler : – Job scheduler + network controller
36
VM1
VM2
VM1
VM2
VM1
3. Unified Scheduler
• We will need a unified scheduler : – Job scheduler + network controller
37
VM1
VM2
VM1
VM2
VM1
Closing Remarks
• Disaggregated datacenter will be “the next big thing” – Already happening. We need to catch up!
• We are working on a small-scale prototype. – Disaggregated resource blades on existing SW/HW
• 40 CPUs • 6 remote memory blades • 8 GPU/NIC/storage blades
– PCIe as the “unified interconnect” 38