
Realizing a Self-Adaptive Network Architecture for HPC Clouds

sc16.supercomputing.org/sc-archive/doctoral...


Feroz Zahid ([email protected])
Advisors: Ernst Gunnar Gran, Tor Skeie

For publication list, please visit: https://www.simula.no/people/feroz

Abstract

Project Overview

Efficient Load Balancing

Tenant Performance Isolation

Fast and Compact Network Reconfiguration

Conclusion

The Big Picture - A Self-Adaptive IB Network

Clouds offer significant advantages over traditional cluster computing architectures, including ease of deployment, rapid elasticity, and an economically attractive pay-as-you-go business model. However, the effective use of cloud computing for HPC systems remains questionable. When clouds are deployed on lossless interconnection networks such as InfiniBand (IB), challenges including load balancing, low-overhead virtualization, and performance isolation prevent full utilization of the underlying interconnect. In this work, we address these challenges and propose a novel holistic framework for a self-adaptive IB subnet for HPC clouds. Our solution consists of a feedback control loop that incorporates optimizations based on a multidimensional objective function, using the current resource configuration and provider-defined policies. We build our system bottom-up: we first prototype solutions to the individual research challenges involved, and then combine them into a working self-adaptive cloud prototype.
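One way to picture a multidimensional objective function is as a score over candidate configurations. The following is a minimal Python sketch, assuming a weighted sum of "lower is better" metrics, with provider policies expressed as metric weights plus hard upper bounds. The `Candidate` type, the metric names, and all numbers are hypothetical illustrations, not part of the framework.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate configuration and its predicted metrics (hypothetical)."""
    name: str
    metrics: dict[str, float]  # all metrics are "lower is better" in this sketch


def score(candidate: Candidate, weights: dict[str, float]) -> float:
    """Weighted sum of metrics; the weights stand in for provider-defined policies."""
    return sum(weights[m] * candidate.metrics[m] for m in weights)


def best(
    candidates: list[Candidate],
    weights: dict[str, float],
    max_values: dict[str, float],
) -> Candidate:
    """Discard candidates that exceed a hard policy bound, then pick the lowest score."""
    feasible = [
        c for c in candidates
        if all(c.metrics[m] <= bound for m, bound in max_values.items())
    ]
    if not feasible:
        raise ValueError("no candidate satisfies the policy constraints")
    return min(feasible, key=lambda c: score(c, weights))


if __name__ == "__main__":
    # Illustrative numbers only.
    candidates = [
        Candidate("keep-current", {"max_link_load": 0.9, "reconfig_cost": 0.0}),
        Candidate("reroute", {"max_link_load": 0.5, "reconfig_cost": 0.3}),
        Candidate("migrate-vm", {"max_link_load": 0.4, "reconfig_cost": 0.8}),
    ]
    weights = {"max_link_load": 1.0, "reconfig_cost": 0.5}
    print(best(candidates, weights, {"max_link_load": 0.95}).name)  # reroute
```

A weighted sum is only the simplest choice; tightening a hard bound (for example `max_link_load <= 0.45`) changes the winner without touching the weights.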

This work is part of the ERAC project, which investigates methods for reducing the overheads associated with cloud management on high-performance interconnection networks. We address the specific challenges of optimizing the underlying network fabric for cloud applications, enabling a high level of predictability and performance guarantees. We design a holistic self-adaptive framework for IB subnets realizing HPC clouds based on fat-tree topologies. Following a bottom-up approach, we first prototype solutions to the individual research challenges associated with HPC clouds, with a proactive plan to combine them later in an integrated cloud prototype. Implementing the integrated cloud prototype in the ERAC project involves designing methods for each part of the monitor-analyze-optimize loop. While the current configuration is running, monitors collect statistics about the states of the various components. The collected data is analyzed, and a new optimized configuration that can potentially make the system more efficient is proposed. This new configuration is then applied, and the loop continues.
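The loop can be sketched in a few lines of Python, assuming each stage is modeled as a plain function. The stage signatures, the `max_link_load` metric, the 0.8 threshold, and the two routing names are hypothetical placeholders, not ERAC interfaces.

```python
from typing import Callable

Config = dict[str, str]
Stats = dict[str, float]


def control_loop(
    config: Config,
    monitor: Callable[[Config], Stats],
    analyze: Callable[[Stats], list[str]],
    optimize: Callable[[Config, list[str]], Config],
    iterations: int,
) -> Config:
    """Run a fixed number of monitor-analyze-optimize rounds."""
    for _ in range(iterations):
        stats = monitor(config)                   # collect statistics under the current config
        problems = analyze(stats)                 # e.g. hot spots or SLA violations
        if problems:
            config = optimize(config, problems)   # propose and apply a new config
    return config


if __name__ == "__main__":
    # Toy model: maximum link load observed under each routing (made-up values).
    loads = {"default": 0.9, "weighted": 0.5}
    final = control_loop(
        {"routing": "default"},
        monitor=lambda cfg: {"max_link_load": loads[cfg["routing"]]},
        analyze=lambda s: ["hot spot"] if s["max_link_load"] > 0.8 else [],
        optimize=lambda cfg, problems: {**cfg, "routing": "weighted"},
        iterations=3,
    )
    print(final)  # {'routing': 'weighted'}
```

Keeping the stages as separate functions mirrors the idea that monitors, analyzers, and optimizers can be developed as individual prototypes and plugged together later.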

[Figure: The monitor-analyze-optimize loop spanning the Architecture, Storage/Servers, Networking, OS, Virtualization, Orchestration, and Applications layers (example components: InfiniBand, HCAs, CPUs, KVM, OpenStack) on a test-bed. Loop annotations include throughput, latency, and faults; hardware, power consumption, and failures; utilization and resource distribution; SLA compliance; link loads and hot spots; server consolidation and resource overcommitment; power metrics and hardware stats; client profile and SLA violations; service performance; virtual machine placement and live migration; server selection; load balancing and route reconfiguration.]

More details about the ERAC project are available at: https://www.simula.no/research/communication-systems/cloud

In this work, we address several challenges impeding the realization of efficient HPC clouds based on IB interconnect technology. We present prototype solutions for efficient load balancing, tenant performance isolation, fast and compact network reconfiguration, and improved routing for virtualized environments. Furthermore, building on these prototypes, we present the design of a self-adaptive network architecture for IB subnets.

[Figure: Bandwidth improvement (%) versus hot spot percentage (0.25, 0.50, 0.75, 1.0) for 1 to 5 receivers per switch (rcv/sw), with example routings on a small fat-tree using node weights (wts: 100, 1, 1), comparing de facto fat-tree routing with weighted fat-tree routing.]

The weighted fat-tree routing takes node traffic characteristics into account, balancing load more evenly across the network links and providing predictable network performance.
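To illustrate why traffic-aware weights matter, here is a simplified Python sketch of port selection at a single switch. It models de facto fat-tree routing as balancing the number of routes per port, and the weighted variant as routing heavier receivers first while balancing the accumulated weight per port. The node names, weights, and single-switch model are hypothetical, and this is not the published algorithm.

```python
def assign_by_count(dests: list[str], ports: list[int]) -> dict[str, int]:
    """Give each destination the port that currently carries the fewest routes."""
    routes = {p: 0 for p in ports}
    choice = {}
    for d in dests:
        port = min(ports, key=lambda p: routes[p])  # ties go to the first-listed port
        choice[d] = port
        routes[port] += 1
    return choice


def assign_by_weight(weights: dict[str, int], ports: list[int]) -> dict[str, int]:
    """Route heavy receivers first, each on the port with the least accumulated weight."""
    load = {p: 0 for p in ports}
    choice = {}
    for d in sorted(weights, key=lambda n: (-weights[n], n)):
        port = min(ports, key=lambda p: load[p])
        choice[d] = port
        load[port] += weights[d]
    return choice


def port_loads(choice: dict[str, int], weights: dict[str, int]) -> dict[int, int]:
    """Total weight (expected receive traffic) carried by each port."""
    totals: dict[int, int] = {}
    for d, p in choice.items():
        totals[p] = totals.get(p, 0) + weights[d]
    return totals


if __name__ == "__main__":
    weights = {"a": 1, "b": 1, "c": 100, "d": 1}  # c is a hot receiver (illustrative)
    ports = [1, 2]
    print(port_loads(assign_by_count(list(weights), ports), weights))  # {1: 101, 2: 2}
    print(port_loads(assign_by_weight(weights, ports), weights))       # {1: 100, 2: 3}
```

In this toy case, count-based selection places the weight-100 receiver `c` on the same port as `a`, whereas weight-based selection gives `c` a port of its own.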

[Figure: Port selection in pFTree routing. A small fat-tree with root switches R1 and R2, leaf switches L1 and L2, and nodes a to h in two partitions, P1 and P2, shown over successive routing steps with max and dwn values (max = 2; dwn going from 2 to 0).]

The pFTree algorithm uses several mechanisms to provide network-wide isolation between partitions belonging to different tenant groups.
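The pFTree mechanisms are not detailed here, so the following is only a hypothetical Python sketch of the isolation idea: a greedy selector that gives each partition a quota of ports roughly proportional to its size, keeps a partition's routes on its own ports, and shares a port across partitions only when nothing else is left. The quota rule, function names, and example are assumptions, not the pFTree algorithm.

```python
import math
from collections import Counter

Dest = tuple[str, str]  # (node, partition)


def select_oblivious(dests: list[Dest], ports: list[int]) -> dict[str, int]:
    """Baseline: least-loaded port, ignoring partitions."""
    routes = {p: 0 for p in ports}
    choice = {}
    for node, _part in dests:
        port = min(ports, key=lambda p: routes[p])
        choice[node] = port
        routes[port] += 1
    return choice


def select_partition_aware(dests: list[Dest], ports: list[int]) -> dict[str, int]:
    """Greedy selection that keeps each partition on its own ports when possible."""
    sizes = Counter(part for _node, part in dests)
    # Port quota per partition, roughly proportional to its number of nodes.
    quota = {part: max(1, math.ceil(len(ports) * n / len(dests))) for part, n in sizes.items()}
    owner: dict[int, str] = {}  # partition that claimed each port
    routes = {p: 0 for p in ports}
    choice = {}
    for node, part in dests:
        own = [p for p in ports if owner.get(p) == part]
        free = [p for p in ports if p not in owner]
        pool = own + free if len(own) < quota[part] else own
        pool = pool or ports  # last resort: share a port with another partition
        port = min(pool, key=lambda p: routes[p])
        owner.setdefault(port, part)
        choice[node] = port
        routes[port] += 1
    return choice


def shared_ports(choice: dict[str, int], dests: list[Dest]) -> list[int]:
    """Ports that carry routes for more than one partition."""
    parts_on: dict[int, set[str]] = {}
    for node, part in dests:
        parts_on.setdefault(choice[node], set()).add(part)
    return sorted(p for p, parts in parts_on.items() if len(parts) > 1)


if __name__ == "__main__":
    dests = [("a", "P1"), ("c", "P1"), ("b", "P2"), ("d", "P2")]  # illustrative
    ports = [1, 2]
    print(shared_ports(select_oblivious(dests, ports), dests))        # [1, 2]
    print(shared_ports(select_partition_aware(dests, ports), dests))  # []
```

With this destination order, the partition-oblivious selector spreads both partitions over both ports, while the partition-aware one keeps P1 on port 1 and P2 on port 2. Because the selection is greedy, the result depends on the destination order.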

[Figure: Average bandwidth (MB/sec) over time (0 to 100 seconds) for flows 3 → 5 and 6 → 4 under fat-tree and pFTree routing, as flows 1 → 7, 7 → 2, 2 → 7, and 8 → 2 are started; panels for a non-oversubscribed and an oversubscribed topology.]

[Figure: Self-adaptive IB network architecture. Components shown include an Adaptation Layer and a System Layer connected through a Subnet Interaction API (monitors and effectors); an Adaptation Engine (Constraints Evaluator with rules, Optimization Engine with policies), Adaptation Planner, and Adaptation Executor; a Routing Evaluator (patterns, simulation engine, overhead estimation, strategy); a block-by-block difference engine comparing current and new routing tables; a Traffic Profiler; a Virtualization Manager with a tenant database; a Downtime Requester and workload; a Subnet Model (topology and routing information); and the InfiniBand fabric with its Subnet Manager.]
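The architecture includes a block-by-block difference engine that compares current and new routing tables. A minimal Python sketch of that comparison follows. It assumes each switch's linear forwarding table (LFT) is a list of output ports indexed by destination LID and is written in InfiniBand's 64-entry LFT blocks, so only changed blocks need to be sent to a switch. `changed_blocks` and `plan_updates` are hypothetical names.

```python
BLOCK_SIZE = 64  # entries per InfiniBand LinearForwardingTable block


def changed_blocks(current: list[int], new: list[int]) -> list[int]:
    """Indices of LFT blocks whose entries differ between two tables."""
    if len(current) != len(new):
        raise ValueError("tables must cover the same LID range")
    return [
        start // BLOCK_SIZE
        for start in range(0, len(new), BLOCK_SIZE)
        if current[start:start + BLOCK_SIZE] != new[start:start + BLOCK_SIZE]
    ]


def plan_updates(
    current_tables: dict[str, list[int]],
    new_tables: dict[str, list[int]],
) -> dict[str, list[int]]:
    """Per switch, the LFT blocks that must be rewritten; unchanged switches are skipped."""
    plan = {}
    for switch, new in new_tables.items():
        blocks = changed_blocks(current_tables[switch], new)
        if blocks:
            plan[switch] = blocks
    return plan


if __name__ == "__main__":
    current = {"sw1": [1] * 192, "sw2": [2] * 192}  # 3 blocks per switch (toy tables)
    new = {"sw1": list(current["sw1"]), "sw2": list(current["sw2"])}
    new["sw1"][70] = 3                               # reroute one LID on sw1
    print(plan_updates(current, new))                # {'sw1': [1]}
```

Sending only the differing blocks keeps a reconfiguration compact: here one block on one switch is rewritten instead of six blocks across two switches.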

We now combine the contributions of this work, each addressing a different challenge, into a holistic self-adaptive network architecture for HPC clouds. The architecture is supported by a domain-specific language in which users write policies and actions as Event-Condition-Action (ECA) rules. The user-defined policies, together with configurable system variables and monitored traffic profiles, are given as input to a consolidated fat-tree routing algorithm that configures and adapts routing automatically.
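A minimal Python sketch of how ECA-style policy rules might be evaluated once parsed. The event types, fields, thresholds, and actions are hypothetical and do not reflect the actual syntax of the domain-specific language.

```python
from dataclasses import dataclass
from typing import Callable

Event = dict[str, object]


@dataclass
class Rule:
    on: str                             # event type that triggers the rule
    condition: Callable[[Event], bool]  # predicate evaluated on the event
    action: Callable[[Event], str]      # returns a description of the adaptation


def dispatch(event: Event, rules: list[Rule]) -> list[str]:
    """Fire every rule whose event type matches and whose condition holds."""
    return [r.action(event) for r in rules if r.on == event["type"] and r.condition(event)]


RULES = [
    Rule("link_load",
         lambda e: e["value"] > 0.8,
         lambda e: f"rebalance routes away from {e['link']}"),
    Rule("tenant_added",
         lambda e: e["isolation"] == "strict",
         lambda e: f"recompute partition-aware routes for {e['tenant']}"),
]

if __name__ == "__main__":
    print(dispatch({"type": "link_load", "link": "L1-R1", "value": 0.93}, RULES))
    # ['rebalance routes away from L1-R1']
    print(dispatch({"type": "link_load", "link": "L2-R2", "value": 0.40}, RULES))
    # []
```

In this sketch, actions only return descriptions; in a full system they would instead update the inputs handed to the routing algorithm.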