Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
Scalable Rule Management for
Data Centers
Masoud Moshref, Minlan Yu,
Abhishek Sharma, Ramesh Govindan
4/3/2013
Motivation Motivation Design EvaluationIntroduction
Introduction: Definitions
2
Datacenters use rules to implement management policiesDatacenters use rules to implement management policies
• Access control• Rate limiting• Traffic measurement • Traffic engineering
Motivation Motivation Design EvaluationIntroduction
Introduction: Definitions
3
Datacenters use rules to implement management policiesDatacenters use rules to implement management policiesDatacenters use rules to implement management policies
An action on a set of ranges on flow fieldsAn action on a set of ranges on flow fields
Examples:
• Deny
• Accept
• Enqueue
Flow fields examples:
• Src IP / Dst IP
• Protocol
• Src Port / Dst Port
Motivation Motivation Design EvaluationIntroduction
Introduction: Definitions
4
Datacenters use rules to implement management policies
R2
R1
Src IP R1: Accept
� SrcIP: 12.0.0.0/7
� DstIP: 10.0.0.0/8
An action on a set of ranges on flow fields
Dst
IP
R2: Deny
� SrcIP: 12.0.0.0/8
� DstIP: 8.0.0.0/6
Motivation Motivation Design EvaluationIntroduction
Current practice
5
Rules are saved on predefined fixed machines
On hypervisors On switches
Motivation Motivation Design EvaluationIntroduction
Machines have limited resources
6
Top-of-Rack switch
Network Interface Card
Software switches on servers
Motivation Motivation Design EvaluationIntroduction
Future datacenters will have many fine-grained rules
7
Regulating VM pair communication
• Access control (CloudPolice)
• Bandwidth allocation (Seawall)
Per flow decision
• Flow measurement for
traffic engineering (MicroTE, Hedera)
VLAN per server
• Traffic management (NetLord, Spain)
1B – 20B rules
10M – 100M rules
1M rules
Introduction ArchitectureMotivation Design Evaluation
Rule location trade-off (resource vs. bandwidth usage)
8
Storing rules at hypervisor incurs CPU overhead
R0
Introduction ArchitectureMotivation Design Evaluation
Rule location trade-off (resource vs. bandwidth usage)
9
Storing rules at hypervisor incurs CPU overheadMove the rule to ToR switch and forward traffic
R0
Introduction ArchitectureMotivation Design Evaluation
Rule location trade-off: Offload to servers
10
R1
Introduction ArchitectureMotivation Design Evaluation
Challenges: Concrete example
11
Src IP
Dst
IP
Agg1 Agg2
ToR1 ToR2
S1 S2 S3 S4 S5 S6
ToR3
R0
R1 R5
R6
R2
R3
R4
0 1 2 3 4 5 6 7
1
2
3
4
5
6
7
0
VM0
VM1
VM2
VM3
VM4
VM5
VM6
VM7
Introduction ArchitectureMotivation Design Evaluation
Challenges: Overlapping rules
12
R0
R1 R5
R6
R2
R3
R4
0 1 2 3 4 5 6 7
1
2
3
4
5
6
7
0
Agg1 Agg2
ToR1 ToR2
S1 S2 S3 S4 S5 S6
ToR3
VM2
VM6
R0
R1
R2
R3
R4
0 1 2 3 4 5 6 7
1
2
3
4
5
6
7
0
Source Placement: Saving rules on the source machine means
minimum overhead
Src IP
Dst
IP
R0
R1
R2
R3
R4
0 1 2 3 4 5 6 7
1
2
3
4
5
6
7
0
Introduction ArchitectureMotivation Design Evaluation
Agg1 Agg2
ToR1 ToR2
S1 S2 S3 S4 S5 S6
ToR3
Challenges: Overlapping rules
13
R0
R1
R2
R3
R4
0 1 2 3 4 5 6 7
1
2
3
4
5
6
7
0
VM2
VM6
R4
R0
R1
R2
R3
0 1 2 3 4 5 6 7
1
2
3
4
5
6
7
0
If Source Placement is not feasible
Src IP
Dst
IP
Introduction ArchitectureMotivation Design Evaluation
Heterogeneous devices
Challenges
14
Respect resource constraints
Minimize traffic overhead
Traffic changesRule changesVM Migration
Preserve the semantics of overlapping rules
Handle Dynamics
Introduction DesignMotivation Evaluation
Contribution: vCRIB, a Virtual Cloud Rule Information Base
15
vCRIB
Proactive rule placement abstraction layer
Optimize traffic given resource constraints & changes
Rules
R1 R2
R3R4 R3
Introduction DesignMotivation Evaluation
vCRIB design
16
Source Partitioning with Replication
Topology &Routing
Rules
Partitions
Overlapping
Rules
Minimum Traffic Feasible Placement
Introduction DesignMotivation Evaluation
R0
R 1
R5
R7
R8
R2
R3 R6R
4
Partitioning with cutting
17
P1
Smaller partitions have more flexibility
Cutting causes rule inflation
0 1 2 3 4 5 6 7
1
2
3
4
5
6
7
0
R0
R 1
R8
R3
R4 R
0
R5
R 1
R7
R2
R3R6
Introduction DesignMotivation Evaluation
Partitioning with replication
18
R0
R3
R1
A7
R6R
0
R 1
R8
R3
R4 R0
R1
R5R
2
R3
R0
R1R
5R
2
R3
R7
R6
Introduce the concept of
similarity to mitigate inflation
��� ��, �� � �� ∩ ��� R0, R1, R3 � 3
P1 (5 rules) P3 (5 rules)P2(5 rules)
0 1 2 3 4 5 6 7
1
2
3
4
5
6
7
00 1 2 3 4 5 6 7
1
2
3
4
5
6
7
0
0 1 2 3 4 5 6 7
1
2
3
4
5
6
7
0
P1 ∪ P2(7 rules)
Introduction DesignMotivation Evaluation
Per-source partitions
19
R0
R1 R5
R6
R2
R3
R4
0 1 2 3 4 5 6 7
1
2
3
4
5
6
7
0
• Limited resource for forwarding
• No need for replication to
approximate source-placement
• Closer partitions are more similar
Src IP
Dst
IP
Introduction DesignMotivation Evaluation
vCRIB design: Placement
20
Source Partitioning with Replication
Topology &Routing
Rules
Partitions
PlacementT11
T21 T22
T23
T32
T33
Traffic Overhead
Resource
Constraints
Minimum Traffic Feasible Placement
Introduction DesignMotivation Evaluation
vCRIB design: Placement
21
Source Partitioning with Replication
Topology &Routing
Rules
Partitions
Minimum Traffic Feasible Placement
Feasible Placement
Traffic Overhead
Resource
Constraints
Traffic Overhead
Resource-Aware Placement
Traffic-Aware Refinement
Traffic-Aware Refinement
Introduction DesignMotivation Evaluation
FFDS (First Fit Decreasing Similarity)
22
1. Put a random partition on an empty device
2. Add the most similar partitions to the initial partition
until the device is full
Find the lower bound for optimal solution for rules
Prove the algorithm is a 2-approximation of the lower
bound
Introduction DesignMotivation Evaluation
vCRIB design: Heterogeneous resources
23
Source Partitioning with Replication
Topology &Routing
Rules
Partitions
Minimum Traffic Feasible Placement
Resource-Aware Placement
Feasible Placement
Resource
Heterogeneity
Resource Usage
FunctionTraffic-Aware Refinement
Introduction DesignMotivation Evaluation
vCRIB design: Traffic-Aware Refinement
24
Source Partitioning with Replication
Resource-Aware Placement
Partitions
Feasible Placement
Minimum Traffic Feasible Placement
Resource Usage
FunctionTraffic Overhead
Topology &Routing
Rules
Traffic-Aware Refinement
Introduction DesignMotivation Evaluation
Traffic-aware refinement
� Overhead greedy approach
1. Pick maximum overhead partition
2. Put it where minimizes the overhead and maintains feasibility
25
Agg1 Agg2
ToR1 ToR2
S1 S2 S3 S4 S5 S6
ToR3
VM2 VM4
P2
P4
Agg1 Agg2
ToR1 ToR2
S1 S2 S3 S4 S5 S6
ToR3
VM2 VM4
P2P4
Introduction DesignMotivation Evaluation
Traffic-aware refinement
� Overhead greedy approach
1. Pick maximum overhead partition
2. Put it where minimizes the overhead and maintains feasibility
�Problem: Local minima
� Our approach: Benefit greedy
26
Agg1 Agg2
ToR1 ToR2
S1 S2 S3 S4 S5 S6
ToR3
VM2
VM4
P2
P4
Agg1 Agg2
ToR1 ToR2
S1 S2 S3 S4 S5 S6
ToR3
VM2
VM4
P2
P4
Introduction DesignMotivation Evaluation
vCRIB design: Dynamics
27
Source Partitioning with Replication
Resource-Aware Placement
Rule/VM Dynamics
Partitions
Feasible Placement
Minimum Traffic Feasible Placement
Resource Usage
Function
Major Traffic Changes
Dynamics
Topology &Routing
Rules
Traffic-Aware Refinement
Introduction DesignMotivation Evaluation
vCRIB design
28
Source Partitioning with Replication
Resource-Aware Placement
Rule/VM Dynamics
Partitions
Feasible Placement
Minimum Traffic Feasible Placement
Resource Usage
FunctionTraffic Overhead
Major Traffic Changes
Dynamics
Topology &Routing
Rules
Resource
Constraints
Resource
Heterogeneity
Overlapping
Rules
Traffic-Aware Refinement
EvaluationIntroduction Motivation Design
Evaluation
� Comparing vCRIB vs. Source-Placement
� Parameter sensitivity analysis
� Rules in partitions
� Traffic locality
� VMs per server
� Different memory sizes
� Where is the traffic overhead added?
� Traffic-aware refinement for online scenarios
� Heterogeneous resource constraints
� Switch-only scenarios
29
EvaluationIntroduction Motivation Design
Simulation setup
� 1k servers with 20 VMs per server in a Fat-tree network
� 200k rules generated by ClassBench and random action
� IPs are assigned in two ways:
� Random
� Range
� Flows
� Size follows long-tail distribution
� Local traffic matrix (0.5 same rack, 0.3 same pod, 0.2 interpod)
30
0 1 2 3 4 5 6 70 1 2 3 4 5 6 7
EvaluationIntroduction Motivation Design
Comparing vCRIB vs. Source-Placement
31
4k_0 4k_4k 4k_6k0
0.05
0.1
0.15
0.2
0.25
0.3
Server memory_Switch memory
Tra
ffic
over
he
ad ra
tio
RangeRandom
Maximum Load is 5K Capacity is 4K
Range is better as similar partitions are from the same source
Random: Average load is 4.2K
Adding more resources helps vCRIB reduce traffic overhead
vCRIB finds low traffic feasible solution
EvaluationIntroduction Motivation Design
Parameter sensitivity analysis: Rules in partitions
32
Total space
Defined by maximum load on a server
Source placement
EvaluationIntroduction Motivation Design
0 1000 20000
1000
2000
Average Partition Size
Ave
rag
e S
imila
rity
Parameter sensitivity analysis: Rules in partitions
33
Total space
vCRIB
Source placement
EvaluationIntroduction Motivation Design
0 1000 20000
1000
2000
Average Partition Size
Ave
rag
e S
imila
rity
Total space
vCRIB
Parameter sensitivity analysis: Rules in partitions
34
Lower traffic overhead for smaller partitions and
more similar ones
vCRIB<10% Traffic
Source placement
A
A
EvaluationIntroduction Motivation Design
Future work
Conclusion
Conclusion and future work
35
vCRIB allows operators and users to specify rules, and
manages their placement in a way that respects
resource constraints and minimizes traffic overhead.
• Support reactive placement by adding the
controller in the loop
• Break a partition for large number of rules per VM
• Test for other rulesets
Scalable Rule Management for
Data Centers
Masoud Moshref, Minlan Yu,
Abhishek Sharma, Ramesh Govindan
4/3/2013