Realizing a Self-Adaptive Network Architecture for HPC Clouds
Feroz Zahid ([email protected]) Advisors: Ernst Gunnar Gran, Tor Skeie
For publication list, please visit: https://www.simula.no/people/feroz
Abstract
Project Overview
Efficient Load Balancing
Tenant Performance Isolation
Fast and Compact Network Reconfiguration
Conclusion
The Big Picture - A Self-Adaptive IB Network
Clouds offer significant advantages over traditional cluster computing architectures, including ease of deployment, rapid elasticity, and an economically attractive pay-as-you-go business model. However, the effective use of cloud computing for HPC systems remains questionable. When clouds are deployed on lossless interconnection networks, such as InfiniBand (IB), challenges such as load balancing, low-overhead virtualization, and performance isolation prevent full utilization of the underlying interconnect. In this work, we attack these challenges and propose a novel holistic framework of a self-adaptive IB subnet for HPC clouds. Our solution consists of a feedback control loop that incorporates optimizations based on a multidimensional objective function, using the current resource configuration and provider-defined policies. We build our system using a bottom-up approach, starting by prototyping solutions to the individual research challenges, and later combining them into a working self-adaptive cloud prototype.
This work is part of the ERAC project, which investigates methods for reducing the overheads associated with cloud management on high-performance interconnection networks. We address the specific challenges associated with optimizing the underlying network fabric for cloud applications, enabling a high level of predictability and performance guarantees. We design a holistic self-adaptive framework for IB subnets realizing HPC clouds based on fat-trees. We build our system using a bottom-up approach, starting by prototyping solutions to the individual research challenges associated with HPC clouds, with a plan to combine them later in an integrated cloud prototype. The implementation of the integrated cloud prototype in the ERAC project involves designing methods for each part of the monitor-analyze-optimize loop. After running the current configuration, monitors collect statistics about the state of the components. The collected data is analyzed, and a new optimized configuration is proposed that can potentially make the system more efficient. This new configuration is applied, and the loop continues.
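The monitor-analyze-optimize loop described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the ERAC implementation: the names `ControlLoop`, `toy_monitor`, `toy_analyze`, and the `routes_on_hot_link` setting are all hypothetical, and the "load" model is a made-up linear function chosen to keep the example small.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Stats = Dict[str, float]
Config = Dict[str, int]


@dataclass
class ControlLoop:
    """Hypothetical feedback loop: monitor, analyze, then apply a new configuration."""
    monitor: Callable[[Config], Stats]          # collects statistics for a configuration
    analyze: Callable[[Config, Stats], Config]  # proposes a (possibly unchanged) configuration
    apply: Callable[[Config], None]             # pushes a configuration to the system

    def run(self, config: Config, iterations: int) -> Config:
        for _ in range(iterations):
            stats = self.monitor(config)
            proposed = self.analyze(config, stats)
            if proposed != config:  # only reconfigure when something changed
                self.apply(proposed)
                config = proposed
        return config


def toy_monitor(config: Config) -> Stats:
    # Assumed toy model: each route placed on the hot link adds 25% load.
    return {"hot_link_load": 25.0 * config["routes_on_hot_link"]}


def toy_analyze(config: Config, stats: Stats) -> Config:
    # Assumed toy policy: move one route away while the load exceeds 60%.
    if stats["hot_link_load"] > 60.0:
        return {**config, "routes_on_hot_link": config["routes_on_hot_link"] - 1}
    return config


applied = []  # records every configuration that was pushed
loop = ControlLoop(toy_monitor, toy_analyze, applied.append)
final = loop.run({"routes_on_hot_link": 4}, iterations=5)
print(final, len(applied))
```

In this toy run the loop reconfigures twice (four routes, then three, then two) and then stays stable, because the analysis step stops proposing changes once the monitored load falls below the assumed threshold.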
[Figure: the monitor-analyze-optimize loop on the test-bed. Architecture layers: Applications, Orchestration (OpenStack), Virtualization (KVM), OS, Storage/Servers (HCAs, CPUs, ...), Networking (InfiniBand). Labels around the loop: throughput, latency, faults; hardware, power consumption, failures; utilization, resource distribution; SLA compliance; link loads, hot spots; server consolidation, resource overcommitment; power metrics, hardware stats; client profile, SLA violations; service performance; virtual machine placement, live migration; server selection; load balancing, route reconfiguration.]
More details about the ERAC project can be found at: https://www.simula.no/research/communication-systems/cloud
In this work, we address several challenges impeding the realization of efficient HPC clouds based on IB interconnect technology. We present prototype solutions to achieve efficient load balancing, tenant performance isolation, fast and compact network reconfiguration, and improved routing for virtualized environments. Furthermore, based on our prototype solutions, we present the design of a self-adaptive network architecture for IB subnets.
[Figure: bandwidth improvement (%) versus hot spot percentage (0.25, 0.50, 0.75, 1.0) for 1 to 5 receivers per switch (rcv/sw), with example fat-tree diagrams annotated with node weights (wts). Panel: De facto fat-tree routing.]
Weighted fat-tree routing considers node traffic characteristics to balance load more evenly across the network links, yielding predictable network performance.
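To illustrate why node weights matter, the following Python sketch compares a weight-oblivious round-robin assignment of destination nodes to ports with a greedy weight-aware assignment. It is a simplified illustration under stated assumptions, not the published routing algorithm: the functions `round_robin`, `weighted_assign`, and `port_loads`, the port names `up1` and `up2`, and the example weights are all hypothetical.

```python
from typing import Dict, List


def round_robin(weights: Dict[str, int], ports: List[str]) -> Dict[str, str]:
    """Weight-oblivious baseline: spread nodes over ports in name order."""
    return {n: ports[i % len(ports)] for i, n in enumerate(sorted(weights))}


def weighted_assign(weights: Dict[str, int], ports: List[str]) -> Dict[str, str]:
    """Greedy sketch: heaviest nodes first, each to the least-loaded port."""
    load = {p: 0 for p in ports}
    assignment = {}
    for node in sorted(weights, key=lambda n: (-weights[n], n)):
        port = min(ports, key=lambda p: (load[p], p))  # ties broken by port name
        assignment[node] = port
        load[port] += weights[node]
    return assignment


def port_loads(weights: Dict[str, int], assignment: Dict[str, str],
               ports: List[str]) -> Dict[str, int]:
    """Sum the weights of the nodes routed through each port."""
    load = {p: 0 for p in ports}
    for node, port in assignment.items():
        load[port] += weights[node]
    return load


# Assumed example: two heavy receivers (weight 100) and two light ones (weight 1).
weights = {"a": 100, "b": 1, "c": 100, "d": 1}
ports = ["up1", "up2"]

rr_loads = port_loads(weights, round_robin(weights, ports), ports)
wt_loads = port_loads(weights, weighted_assign(weights, ports), ports)
print("round robin:", rr_loads)
print("weighted:   ", wt_loads)
```

With these assumed weights, the round-robin baseline places both heavy receivers behind the same port, while the weight-aware assignment splits them, which is the kind of imbalance a weighted routing scheme is meant to avoid.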
Weighted fat-tree Routing
[Figure: Port selection in pFTree routing. Example two-level fat-tree with root switches R1 and R2, leaf switches L1 and L2, and end nodes a to h in partitions P1 and P2. Each panel shows per-port max and dwn counters, with max = 2 and dwn going from 2 to 1 to 0 across the panels.]
The pFTree algorithm utilizes several mechanisms to provide network-wide isolation of partitions belonging to different tenant groups.
[Figure: average bandwidth (MB/sec) over time (0 to 100 seconds) for flows 3 → 5 and 6 → 4 under fat-tree and pFTree routing, with markers where flows 1 → 7, 7 → 2, 2 → 7, and 8 → 2 are started. Panels: non-oversubscribed topology; oversubscribed topology.]
[Figure: the self-adaptive IB network architecture, split into a System Layer and an Adaptation Layer joined by a Subnet Interaction API (Effectors, Monitors). Component labels recovered from the diagram: InfiniBand Fabric, Subnet Manager, Virtualization Manager, Tenant Database, Traffic Profiler, Subnet Model (Topology Information, Routing Information), Adaptation Engine (Constraints Evaluator, Rules, Optimization Engine, Policies), Adaptation Planner (Routing Evaluator, Patterns, Simulation Engine, Overhead Estimation, Strategy, New Routing Tables), Adaptation Executor (Block-by-Block Difference Engine, Downtime Requester, Workload, Current Routing, Current Routing Tables).]
We now combine the contributions of this work, each attacking a different challenge, into a holistic self-adaptive network architecture for HPC clouds. The architecture is supported by a domain-specific language that users use to write policies and actions in an Event-Condition-Action rule format. The user-defined policies, together with the configurable system variables and monitored traffic profiles, are given as input to a consolidated fat-tree routing algorithm to configure and adapt routing automatically.
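The Event-Condition-Action format can be illustrated with a small Python sketch. This is not the project's domain-specific language, whose syntax is not shown here; the `Rule` class, the `fire` function, and the event, state, and action names are all hypothetical and chosen only to show how an event, a condition over monitored state, and an action fit together.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]


@dataclass
class Rule:
    """Hypothetical Event-Condition-Action rule."""
    event: str                          # event name that triggers evaluation
    condition: Callable[[State], bool]  # predicate over the monitored state
    action: str                         # name of the action to request


def fire(rules: List[Rule], event: str, state: State) -> List[str]:
    """Return the actions of all rules matching the event whose condition holds."""
    return [r.action for r in rules if r.event == event and r.condition(state)]


# Assumed example policies; thresholds and names are illustrative only.
rules = [
    Rule("link_load_report", lambda s: s["max_link_load"] > 0.9, "rebalance_routes"),
    Rule("tenant_added", lambda s: s["tenants"] > 1, "isolate_partitions"),
]

busy = fire(rules, "link_load_report", {"max_link_load": 0.95, "tenants": 2})
idle = fire(rules, "link_load_report", {"max_link_load": 0.50, "tenants": 2})
print(busy, idle)
```

In a design like the one described, the returned action names would be handed to the routing algorithm together with the system variables and traffic profiles; here they are only collected in a list.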