1
VirtualKnotter: Online Virtual Machine Shuffling for Congestion Resolving in Virtualized Datacenter
Xitao Wen, Kai Chen, Yan Chen, Yongqiang Liu, Yong Xia, Chengchen Hu
2
Datacenter as Infrastructure
3
Congestion in Datacenter
[Figure: three-tier datacenter topology (Core, Aggregation, Edge) over Pods 0–3, with 10:1~100:1 oversubscription at the core and 2:1~10:1 at the aggregation. Congestion causes packet loss, queuing delay, and degraded throughput.]
4
• Congestion in the Wild
• General Approaches
• Problem Formulation
• Main Design
• Evaluation
5
Spatial Pattern
• Unbalanced utilization
  – Hotspot: hot links account for <10% of core links [IMC10]
  – Spatially unbalanced utilization
[Figure: sender–receiver traffic matrix heat map]
6
Temporal Pattern
• Long congestion events
  – Last for 10s of minutes
  – Individual events have a clear spatial pattern
[Figure: congestion over time by core link index, showing long-lived events]
7
Traffic Stability
• Bursty at fine granularity
  – Not predictable at 10s or 100s of milliseconds [IMC10][SIGCOMM09]
• Predictable at a timescale of 10s of minutes
  – 40% to 70% of pairwise traffic can be expected to be stable
  – 90%+ of traffic aggregated at core links is predictable
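The pairwise-stability observation can be made concrete with a small sketch: compare two consecutive traffic matrices and count the fraction of VM pairs whose rate stayed within a tolerance. The function name, the relative-change criterion, and the `tol` threshold are illustrative assumptions, not the paper's exact stability metric.

```python
def stable_fraction(prev, cur, tol=0.2):
    """Fraction of active VM pairs whose traffic changed by at most
    `tol` (relative) between two measurement intervals.

    Illustrative proxy for pairwise traffic stability; the paper's
    precise metric is not reproduced here.
    """
    pairs = stable = 0
    n = len(prev)
    for u in range(n):
        for v in range(n):
            # skip self-pairs and pairs that were idle in both intervals
            if u == v or (prev[u][v] == 0 and cur[u][v] == 0):
                continue
            pairs += 1
            base = max(prev[u][v], cur[u][v])
            if abs(prev[u][v] - cur[u][v]) <= tol * base:
                stable += 1
    return stable / pairs if pairs else 1.0
```

A pair whose rate moves from 10 to 11 counts as stable under the default 20% tolerance, while 5 to 1 does not.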
8
• Congestion in the Wild
• General Approaches
• Problem Formulation
• Main Design
• Evaluation
9
General Approaches
• Network Layer
  – Increase network bandwidth (Fat-tree, BCube, OSA…)
    • Expensive: requires upgrading the entire DC network
  – Optimize flow routing (Hedera, MicroTE)
    • Not scalable: requires hardware support and depends on rich path diversity
• Application Layer
  – Optimize VM placement
    • Scalable, lightweight to deploy, and suitable for existing over-subscribed networks
10
Background on Virtualized DC
• Virtualization Layer: VM live migration
  – Keeps service continuous while migrating
  – Transfers 1.1x – 1.4x the VM memory: the major cost!
[Figure: VMs migrating between servers across the DC network]
11
Optimize VM Placement
• Offload traffic from the congested link
[Figure: swapping active and idle VMs to move traffic off a congested link]
12
• Congestion in the Wild
• General Approaches
• Problem Formulation
• Main Design
• Evaluation
13
Design Goal
• Objective: mitigate congestion
  – Minimize maximum link utilization (MLU)
• Constraints
  – Controllable migration traffic (i.e., VM moves): less than the traffic it reduces
  – Reasonable runtime overhead: far less than the target timescale (10s of minutes)
Problem Statement
• Input
  – Topology and routing of physical servers
  – Traffic matrix among VMs
  – Current placement
• Variable & Output
  – Optimized placement
• NP-hardness
  – Proof: reduction from the Quadratic Bottleneck Assignment Problem
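Given these inputs, the objective (MLU) is straightforward to evaluate: route every VM pair's traffic over the physical path between their host servers, accumulate per-link load, and take the maximum utilization. This is a minimal sketch; the data-structure shapes (`routes` keyed by server pairs, a dict of link capacities) are assumptions for illustration, not the paper's representation.

```python
def max_link_utilization(traffic, placement, routes, capacity):
    """Compute the MLU of a VM placement.

    traffic:   V x V matrix of VM-to-VM traffic rates
    placement: placement[v] -> server hosting VM v
    routes:    routes[(s, t)] -> list of link ids on the path from server s to t
    capacity:  capacity[link] -> link bandwidth (same units as traffic)
    """
    load = {link: 0.0 for link in capacity}
    n = len(traffic)
    for u in range(n):
        for v in range(n):
            if u == v:
                continue
            s, t = placement[u], placement[v]
            if s == t:
                continue  # co-located VMs never touch the network
            for link in routes[(s, t)]:
                load[link] += traffic[u][v]
    return max(load[link] / capacity[link] for link in capacity)
```

Co-locating a heavily communicating pair drops their contribution to every link load, which is exactly the lever VM shuffling exploits.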
14
15
Related Work
• Optimize VM placement– Server consolidation [SOSP’07]– Fault tolerance [ICS’07]– Network scalability [INFOCOM’10]
16
• Congestion in the Wild
• General Approaches
• Problem Formulation
• Main Design
• Evaluation
17
Inspiration
• Stretch the knot forcefully, making it loose and less tangled.
• Untangle the knot gently, by carefully reeving the end out of it.
Two-step Algorithm
• Step 1: Multiway Θ-Kernighan-Lin
  – Fast and greedy
  – Searches to localize overall traffic
  – May get stuck in a local minimum
• Step 2: Simulated Annealing
  – Fine-grained and randomized
  – Searches to mitigate traffic on the most congested links
  – Helps avoid local minima
Pipeline: topology & routing, traffic matrix, and current VM placement → optimized VM placement
18
19
Multiway Θ-Kernighan-Lin (KL)
• Top-down graph-cut improvement
• Introduces Θ to limit the number of moves
• O(n² log n) time complexity
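The Θ-limited idea can be sketched on a two-way cut: greedily move the VM with the best gain (traffic pulled off the cut minus traffic pushed onto it), and stop after Θ moves so migration traffic stays bounded. This is a simplified single-level sketch under stated assumptions: the paper's algorithm is multiway and works top-down over the routing hierarchy, and server capacity constraints are omitted here.

```python
def theta_kl_pass(traffic, part, theta):
    """One greedy Kernighan-Lin-style pass over a two-way partition.

    part[v] in {0, 1} gives VM v's side of the cut; at most `theta`
    VMs are moved, capping migration cost. Returns the new partition
    and the remaining cross-cut traffic. (Illustrative sketch only:
    no multiway recursion, no server-capacity check.)
    """
    part = list(part)
    n = len(traffic)
    cross = sum(traffic[u][v] for u in range(n) for v in range(n)
                if part[u] != part[v])
    for _ in range(theta):
        # gain of moving v = traffic it stops sending across the cut
        #                    minus traffic it starts sending across it
        gains = []
        for v in range(n):
            external = sum(traffic[v][u] + traffic[u][v]
                           for u in range(n) if part[u] != part[v])
            internal = sum(traffic[v][u] + traffic[u][v]
                           for u in range(n) if u != v and part[u] == part[v])
            gains.append((external - internal, v))
        gain, v = max(gains)
        if gain <= 0:
            break  # no single move helps; greedy pass is done
        part[v] = 1 - part[v]
        cross -= gain
    return part, cross
```

With two chatty pairs split across the cut, Θ = 2 is enough to zero the cross-cut traffic, while Θ = 1 leaves one pair split — the knob trades optimality for migration volume.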
22
Simulated Annealing Searching (SA)
• Randomized global search
• Terminates when a satisfactory solution is found, or when the predefined max depth is reached
[Figure: example search step reducing MLU from 0.60 to 0.53]
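The SA step can be sketched generically: perturb the current placement, always accept improvements, occasionally accept worse placements with probability exp(-Δ/T) to escape local minima, and stop at the target quality or max depth. The cooling schedule, temperature values, and parameter names here are illustrative assumptions, not the paper's tuned settings.

```python
import math
import random

def anneal(mlu, placement, neighbor, t0=1.0, cooling=0.95,
           max_depth=500, target=None):
    """Simulated-annealing search over placements (illustrative sketch).

    mlu(p)      -> objective value of placement p (e.g., max link utilization)
    neighbor(p) -> a random placement one VM move/swap away from p
    Stops once the best objective reaches `target` or after `max_depth`
    iterations, mirroring the termination rule on the slide.
    """
    random.seed(0)  # deterministic run for the example below
    cur, cur_cost = placement, mlu(placement)
    best, best_cost = cur, cur_cost
    t = t0
    for _ in range(max_depth):
        if target is not None and best_cost <= target:
            break
        cand = neighbor(cur)
        cost = mlu(cand)
        # accept improvements outright; accept worse moves with
        # probability exp(-delta / t) so early search can escape
        # local minima, turning greedy as t cools toward zero
        if cost <= cur_cost or random.random() < math.exp((cur_cost - cost) / t):
            cur, cur_cost = cand, cost
        if cur_cost < best_cost:
            best, best_cost = cur, cur_cost
        t *= cooling
    return best, best_cost
```

On a toy one-dimensional objective the search walks to the optimum and stops early once the target is met.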
23
• Congestion in the Wild
• General Approaches
• Problem Formulation
• Main Design
• Evaluation
24
Methodology
• Baseline: clustering-based algorithm
  – Pro: best-known static optimality
  – Con: high runtime and migration overhead
• Metrics
  – MLU reduction, ignoring migration overhead
  – Overhead: migration traffic and runtime
  – Simulation results
25
MLU Reduction without Overhead
VirtualKnotter demonstrates static performance similar to that of Clustering.
26
Migration Traffic
VirtualKnotter incurs significantly less migration traffic than Clustering.
27
Runtime Overhead
VirtualKnotter demonstrates reasonable runtime overhead.
28
Simulation Results
53% less congestion
Altogether, VirtualKnotter achieves a significant gain in congestion resolution.
29
Conclusions
• Collaborative VM migration can substantially resolve long-term congestion in datacenters
• Trading off optimality against migration traffic is essential to harvesting the benefit
DC networking projects of Northwestern LIST: http://list.cs.northwestern.edu/dcn
30
Thank you!
31
Backup
32
General Approaches

Approach              | Cost | Hardware Support | Scalability | Other Dependency
Increase Bandwidth    | High | Yes              | Varies      | —
Optimize Routing      | Low  | Yes              | Low         | Rich path diversity
Optimize VM Placement | Low  | No               | High        | VM deployment
33
Problem Statement
• Objective
  – Minimize maximum link utilization (MLU): “cool down the hottest spot”
• Constraints
  – Migration traffic
  – Server hardware capacity
  – Inseparable VMs
• NP-hardness
  – Proof: reduction from the Quadratic Bottleneck Assignment Problem
34
Observation Summary
• Unbalanced jam (spatial)
• Long-term congestion (temporal)
• Predictable at the 10s-of-minutes scale (stability)
35
Two-step Algorithm
• Multiway Θ-Kernighan-Lin Algorithm (KL): fast search for an approximation
• Simulated Annealing Searching (SA): fine search for a better solution