DEGREE PROJECT, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2018

Cloud Auto-Scaling Control Engine Based on Machine Learning

YANTIAN YOU

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ARCHITECTURE AND THE BUILT ENVIRONMENT






Cloud Auto-Scaling Control Engine Based on Machine Learning

Yantian You

2018-10-29

Master Thesis

Examiner: Gerald Q. Maguire Jr.

Academic adviser: Anders Västberg

Industrial advisers: Toni Satola, Roberto Muggianu

KTH Royal Institute of Technology
School of Electrical Engineering and Computer Science (EECS)

Department of Communication Systems
SE-100 44 Stockholm, Sweden


Abstract

With the development of modern data centers and networks, many service providers have moved most of their computing functions to the cloud. Considering the limitations of network bandwidth and hardware or virtual resources, how to manage different virtual resources in a cloud environment so as to achieve better resource allocation is a major problem. Although some cloud infrastructures provide simple default auto-scaling and orchestration mechanisms, such as the OpenStack Heat service, they usually depend on only a single parameter, such as CPU utilization, and cannot respond to network changes in a timely manner.

This thesis investigates different auto-scaling mechanisms and designs an online control engine that cooperates with different OpenStack service APIs based on various network resource data. Two auto-scaling engines, a Heat-orchestration-based engine and a machine-learning-based online control engine, have been developed and compared for different client request patterns. Two machine learning methods, neural networks and linear regression, have been considered to generate a control signal based on real-time network data. This thesis also shows the network's non-linear behavior under heavy traffic and proposes a scaling policy based on deep network analysis.

The results show that, for offline training, the neural network and linear regression provide 81.5% and 84.8% accuracy, respectively. However, for online testing with different client request patterns, the neural network results differed from what we expected, while linear regression provided much better results. The model comparison showed that the two auto-scaling mechanisms behave similarly for the SMOOTH-load pattern. However, for the SPIKEY-load pattern, the linear-regression-based online control engine responded faster to network changes, while the Heat orchestration service showed some delay. Compared with the proposed scaling policy, which uses fewer web servers while keeping response latency acceptable, both auto-scaling models waste network resources.

Keywords: Cloud Computing, Virtualization, Orchestration, OpenStack, Auto-scaling, Machine learning


Sammanfattning

Med utvecklingen av moderna datacenter och nätverk har många tjänsteleverantörer flyttat de flesta av sina datafunktioner till molnet. Med tanke på begränsningen av nätverksbandbredd och hårdvara eller virtuella resurser är det ett stort problem att hantera olika virtuella resurser i en molnmiljö för att uppnå bättre resursallokering. Även om vissa molninfrastrukturer tillhandahåller enkla standardskalnings- och orkestreringsmekanismer, till exempel OpenStack Heat service, beror de vanligtvis bara på en enda parameter, som CPU-utnyttjande, och kan inte svara på nätverksändringarna i tid.

Denna avhandling undersöker olika auto-skalningsmekanismer och designar en online-kontrollmotor som samarbetar med olika OpenStack-service-API:er baserat på olika nätverksresursdata. Två auto-skalningsmotorer, en Heat-orkestreringsbaserad motor och en maskininlärningsbaserad online-kontrollmotor, har utvecklats och jämförts för olika klientförfrågningsmönster. Två maskininlärningsmetoder, neuralt nätverk och linjär regression, har använts för att generera en styrsignal baserad på nätverksdata i realtid. Denna avhandling visar också nätverkets olinjära beteenden för tung trafik och föreslår en skalningspolicy baserad på djup nätverksanalys.

Resultaten visar att för offline-träning ger neuralt nätverk och linjär regression 81,5 % respektive 84,8 % noggrannhet. För online-test med olika klientförfrågningsmönster är de neurala nätverksresultaten dock annorlunda än vad vi förväntade oss, medan linjär regression gav oss mycket bättre resultat. Modelljämförelsen visade att dessa två auto-skalningsmekanismer har liknande beteende för ett SMOOTH-load-mönster. För SPIKEY-load-mönstret svarade den linjärregressionsbaserade online-kontrollmotorn snabbare på nätverksförändringar, medan Heat-orkestreringstjänsten uppvisar viss fördröjning. Jämfört med den föreslagna skalningspolicyn, med färre webbservrar i bruk och acceptabel svarsfördröjning, slösar båda de två auto-skalande modellerna nätverksresurser.

Nyckelord: Molndatoranvändning, Virtualisering, Orkestrering, OpenStack, Auto-skalning, Maskininlärning


Acknowledgment

I would like to thank my main academic supervisor, Prof. Gerald Q. Maguire Jr. from KTH, for his guidance during this thesis project. He always provided thorough feedback when I found myself in trouble. I consider myself very lucky to have had the chance to work under his guidance.

I want to thank my supervisors Toni Satola and Roberto Muggianu, who provided me with a good working environment at Telia Company. They gave me many great ideas and useful suggestions and made me feel welcome at Telia.

Finally, thanks to my parents for their endless support and encouragement.

Stockholm, October 2018
Yantian You


Contents

1 Introduction
  1.1 Motivation
  1.2 Problem
  1.3 Purpose
  1.4 Goals
  1.5 Delimitations
  1.6 Methodology
  1.7 Outline

2 Background
  2.1 Data Center and Network
    2.1.1 Data Center
    2.1.2 Data Center Network
  2.2 Telia Company and Strategy
  2.3 Cloud Computing
  2.4 Virtualization
  2.5 Network Function Virtualization Framework
  2.6 Machine Learning in Networking
  2.7 Related Work
    2.7.1 Auto Network Management and Orchestration
    2.7.2 Auto-scaling Technique in Cloud Orchestration
    2.7.3 Machine Learning Based Network Analysis and Management

3 Testing Environment Establishment
  3.1 Setting Up a Cloud Platform
    3.1.1 OpenStack Platform
    3.1.2 Introduction of Different OpenStack Services
    3.1.3 Relationship Between OpenStack Services
  3.2 Testing Environment Structure
    3.2.1 Network Topology
    3.2.2 Clients
    3.2.3 Load Balancer
    3.2.4 HTTP Web Server
  3.3 Default Auto-scaling Mechanism
    3.3.1 Heat Template
    3.3.2 AutoScaling File
    3.3.3 How Default Auto-scaling Works
  3.4 New Auto-scaling Mechanism

4 Data Collection and Offline Model Training
  4.1 Data Collection
    4.1.1 HTTP Request Pattern
    4.1.2 Features Within Data Set
    4.1.3 Data Pre-processing
    4.1.4 Database Operations
    4.1.5 How Resource Data Changes Over Time
  4.2 Offline Model Training
    4.2.1 Neural Network Structure
    4.2.2 Hidden Layer Structure
    4.2.3 Parameter Tuning
    4.2.4 Linear Regression

5 Online Testing and Result Analysis
  5.1 Single Web Server
  5.2 Heavy Network Traffic with Unlimited Web Servers
  5.3 Client Request Pattern with Light Traffic
    5.3.1 SMOOTH-load Pattern
    5.3.2 SPIKEY-load Pattern
  5.4 Online Control Engine
    5.4.1 Auto-scaling Service
    5.4.2 Signal Generating Service
  5.5 Model Chosen
  5.6 Scaling for Heavy Traffic
    5.6.1 Heat Orchestration Based Policy
    5.6.2 Response Time Sensitive Policy
    5.6.3 Machine Learning Based Policy
  5.7 Scaling for Light Traffic
    5.7.1 Result on SMOOTH-load Pattern
    5.7.2 Scaling Policy Based on Deep Network Analysis
    5.7.3 Result on SPIKEY-load Pattern

6 Conclusion and Future Work
  6.1 Conclusion
  6.2 Problems and Limitations
  6.3 Future Work
  6.4 Ethics and Sustainability


List of Figures

2.1 Three Tier Network Architecture [1]
2.2 Cloud Computing Service Models
2.3 ETSI MANO Framework [2]
2.4 AFI GANA Reference Model [3]

3.1 Relationship Between OpenStack Services
3.2 Structure of Testing Environment
3.3 Network Topology of Load Balance Tenant
3.4 Network Topology of autoManage Tenant
3.5 Load Balancer Configuration
3.6 The Data Flow Among Different Modules for Default Auto-scaling Mechanism
3.7 The Data Flow Among Different Models for New Auto-scaling Mechanism

4.1 The Pattern of Client Request Sending Rate (16 hours)
4.2 Meters for Ceilometer
4.3 Features in Training Data Set
4.4 Data Set Before Pre-processing
4.5 Data Set After Pre-processing
4.6 Change Pattern for CPU Utilization
4.7 Change Pattern for Network Incoming Rate
4.8 Change Pattern for Network Outgoing Rate
4.9 Change Pattern for Memory Usage
4.10 Change Pattern of Device Write Rate
4.11 Change Pattern of Number of Web Servers
4.12 Neural Network Structure

5.1 Request Rate for a Single Web Server
5.2 Response Time for a Single Web Server
5.3 Accumulated Error Responses for a Single Web Server
5.4 CPU Utilization for a Single Web Server
5.5 Request Rate for Unlimited Web Servers
5.6 Response Time for Unlimited Web Servers
5.7 Accumulated Error Responses for Unlimited Web Servers
5.8 Number of Sessions for Unlimited Web Servers
5.9 Average CPU Utilization vs Request Rate for Unlimited Web Servers
5.10 Average CPU Utilization of Load Balancer
5.11 SMOOTH-load Pattern for Testing
5.12 SPIKEY-load Pattern for Testing
5.13 Neural Network Based Auto-scaling
5.14 Response Time for Neural Network Based Auto-scaling
5.15 Heat Orchestration Based Auto-scaling with Heavy Traffic
5.16 Average CPU Utilization for Heat Orchestration Based Auto-scaling
5.17 Total Response Time for Heat Orchestration Based Auto-scaling
5.18 Single Thread Response Time for Heat Orchestration Based Auto-scaling
5.19 Statistical Data of Response Time for Each Server
5.20 Response Time Sensitive Auto-scaling with Heavy Traffic
5.21 Average CPU Utilization for Response Time Sensitive Auto-scaling
5.22 Total Response Time for Response Time Sensitive Auto-scaling
5.23 Single Thread Response Time for Response Time Sensitive Auto-scaling
5.24 Statistical Data of Response Time for Each Server
5.25 Machine Learning Based Auto-scaling with Heavy Traffic
5.26 Response Time for Machine Learning Based Auto-scaling
5.27 The Results for SMOOTH-load Pattern
5.28 Response Latency for Heat Orchestration Based Auto-scaling
5.29 Response Time for Machine Learning Based Auto-scaling
5.30 Scaling Policy Based on Deep Network Analysis
5.31 The Results for SMOOTH-load Pattern
5.32 The Results for SPIKEY-load Pattern
5.33 Zoomed-in Figure for Online Control Engine and Heat Orchestration (at SPIKEY Point)


List of Tables

3.1 OpenStack Lab Hardware Configuration

4.1 Parameter Tuning (3 Neurons in the Hidden Layer)
4.2 Parameter Tuning (4 Neurons in the Hidden Layer)
4.3 Parameter Tuning (5 Neurons in the Hidden Layer)
4.4 Parameter Tuning (6 Neurons in the Hidden Layer)
4.5 Data Set for Linear Regression Model
4.6 Coefficients for Linear Regression Model


List of acronyms and abbreviations

API   Application Programming Interface
AWS   Amazon Web Services
BSS   Business Support System
DCN   Data Center Network
DEs   Decision Elements
EM    Element Management
ETSI  European Telecommunications Standards Institute
GANA  Generic Autonomic Networking Architecture
HTTP  HyperText Transfer Protocol
IaaS  Infrastructure as a Service
LB    Load Balancer
MANO  Management and Orchestration
MBTS  Model-Based Translation Service
NF    Network Function
NFV   Network Function Virtualization
NFVO  Network Function Virtualization Orchestrator
ONIX  Overlay Network for Information eXchange
OSS   Operations Support System
PaaS  Platform as a Service
PUE   Power Usage Effectiveness
SaaS  Software as a Service
SDN   Software Defined Networking
SLA   Service Level Agreement
SLO   Service Level Objectives
VIM   Virtualized Infrastructure Manager
VM    Virtual Machine
VNF   Virtual Network Function
VNFD  Virtual Network Function Descriptor
VNFM  VNF Manager


Chapter 1

Introduction

Today, people communicate with each other through a huge network and can easily get in touch with one another. Different people own different resources and information, hence resource transactions and sharing between people may bring large benefits and efficiency to today's industry. To transfer this data, a network has been developed which interconnects the world. However, as more and more people utilize this network, the aggregate data flow has experienced explosive growth. The development of the traditional network structure has fallen behind the increasing network data flow. In response, modern networks that perform data flow control and network resource management have been developed.

This chapter addresses a specific problem in today's cloud networks, namely data flow control and resource management, and gives a description of Network Function Virtualization (NFV) based Virtual Network Function (VNF) orchestration. This chapter also describes the goals of this thesis project and outlines the structure of the thesis.

1.1 Motivation

With the development of data centers and cloud computing, today's network infrastructure has become more complex and flexible. Techniques such as Network Function Virtualization (NFV) and Software Defined Networking (SDN) have changed the way network services are provided, resulting in more flexible and dynamic service delivery [4]. Although these virtualization techniques help to create network functions in a more convenient way, due to the decoupling of functions from particular hardware, they also increase the problem of resource management. To manage the network, the concept of closed-loop control has been introduced by Telia Company to realize network auto-management. Closed-loop control creates a feedback loop in which the network data generated by the system is input to the system management model. In this way, real-time network data modifies the network control model and thus enables resource management to be both timely and automatic.


Cloud orchestration is used to manage the interconnections and resources among servers in a cloud infrastructure environment, thus enabling a system to respond to changes in workload. This orchestration helps to coordinate network resource management and network function control [4, 5]. By exploiting NFV, cloud orchestration can be implemented in a more convenient way and offers more powerful functionality. This convenience and functionality can be used in next-generation network infrastructures [5].

Auto-scaling is a very important orchestration service which automatically adjusts the number of servers according to real-time network traffic. This technique helps service providers switch off unnecessary servers and enables them to release occupied network resources in a timely manner when the network traffic is light. It can also switch on servers when the network traffic increases. With auto-scaling, users not only save network resources but are also provided with a high quality of service.
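The core of such a service can be illustrated with a minimal sketch, not taken from any particular implementation: a rule that adds a server when a metric stays above an upper threshold and removes one when it falls below a lower threshold. The threshold values and server limits below are illustrative assumptions.

```python
# Minimal sketch of a threshold-based auto-scaling rule of the kind a
# default orchestration service applies. All numbers are illustrative.

def scaling_decision(cpu_util, n_servers, upper=0.80, lower=0.20,
                     min_servers=1, max_servers=10):
    """Return the new server count for one control interval."""
    if cpu_util > upper and n_servers < max_servers:
        return n_servers + 1   # scale out: add one web server
    if cpu_util < lower and n_servers > min_servers:
        return n_servers - 1   # scale in: release one web server
    return n_servers           # within the band: no change
```

For example, `scaling_decision(0.9, 2)` scales out to 3 servers, while `scaling_decision(0.1, 2)` scales in to 1; the clamping arguments prevent the pool from shrinking to zero or growing without bound.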

1.2 Problem

Network infrastructure has become more complex and flexible, and resources are potentially shared among Virtual Network Functions. Many cloud platforms, such as OpenStack, have been used in conjunction with NFV technology to deploy various cloud services. Network virtualization brings many benefits; for example, developers can develop services with more flexible functions in a convenient way. However, NFV has introduced difficulty in managing these virtualized resources. That is to say, efficient auto-management of network resources becomes more and more important.

On the other hand, machine learning has been widely used in data analysis, especially for pattern recognition and data prediction. However, it remains difficult to apply in networking, as network traffic usually varies with time and is difficult to predict.

Many cloud infrastructures provide simple default auto-scaling and orchestration mechanisms to better realize data flow control and resource management. For example, OpenStack's Heat orchestration service can provide auto-scaling by setting a threshold on a single resource metric such as CPU utilization. As servers usually take some time to react to network traffic changes, a Heat orchestration service based on a single resource (especially CPU utilization) may have some delay in responding to changes. Since delay is always a serious problem and we want our system to respond quickly, especially when there is a sudden growth in network traffic, we need to consider the following questions:

• Can we consider various types of network data to realize better network auto-management and prediction?

• Can we combine different auto-scaling techniques and create a real-time control engine for different network traffic patterns?


1.3 Purpose

The purpose of this thesis is to develop an online control engine for (web) server auto-scaling in the OpenStack platform to realize better data flow management for NFV-based cloud orchestration. This control engine will leverage statistical analysis or a machine learning technique. By investigating several naïve cloud orchestration techniques (such as Heat orchestration), we can gain a basic understanding of how current Network Function Virtualization Orchestration (NFVO) has developed with respect to its use in a cloud environment. This degree project intends to compare these orchestration techniques (naïve and a new online control engine) to find the best solution concerning data flow control. By using raw data from a network, this project intends to achieve automatic network management. Hence, functions such as resource allocation can be adjusted automatically based on real-time data. This auto-adjustment should make network management more convenient and flexible, since the system's behavior is learned from observations rather than requiring a priori network knowledge.

1.4 Goals

The goal of this degree project is to develop an OpenStack-based online orchestration engine to solve the auto-scaling problem and to compare it with an existing cloud orchestration service.

Firstly, this thesis project investigated several NFVO techniques used in the market to better understand their data flow control mechanisms.

Secondly, based on the previous investigation and an OpenStack-based cloud environment, a simple network structure with clients, a load balancer, and several web servers (VMs) was built. A machine learning based control model was developed to cooperate with the Nova API, Ceilometer API, and Heat orchestration API to realize a web server auto-scaling mechanism by leveraging network traffic data collected by Ceilometer.

Finally, a comparison was made between the new machine learning based online control engine and the naïve Heat orchestration function in order to determine which method provides a better solution to the server auto-scaling problem.
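The control loop described in the second step can be sketched as follows. This is a hedged outline, not the thesis implementation: `collect_metrics` and `apply_scaling` stand in for calls to the Ceilometer and Heat/Nova APIs, and the linear-model weights are illustrative placeholders rather than trained coefficients.

```python
# Sketch of one iteration of an online control engine: read metrics,
# compute a scaling signal with a linear model, and actuate. The metric
# names and weights are hypothetical examples.
import math

def control_signal(metrics, weights, bias):
    """Map real-time resource metrics to a target number of web servers
    using a linear model, rounded up and clamped to at least one server."""
    score = bias + sum(w * metrics[name] for name, w in weights.items())
    return max(1, math.ceil(score))

def control_step(collect_metrics, apply_scaling, weights, bias):
    """One pass of the feedback loop: measure, decide, actuate."""
    metrics = collect_metrics()          # e.g. samples gathered by Ceilometer
    target = control_signal(metrics, weights, bias)
    apply_scaling(target)                # e.g. a Heat/Nova scaling request
    return target
```

Passing the measurement and actuation steps in as functions keeps the decision logic testable in isolation, which is convenient when comparing different models behind the same loop.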

1.5 Delimitations

This thesis only considers applying the proposed new online orchestration service to the auto-scaling problem. There are various other orchestration problems for NFV-based cloud resource management, such as the data flow redirection problem for a load balancer. However, auto-scaling is a widely used technique in cloud computing, hence it is worth investigating. This application can be considered a typical example of network auto-management.


The testing environment is an OpenStack-based cloud system with only one load balancer and a few clients and web servers. Therefore, competition for resources only exists on the server side, hence we do not need to consider complex routing or forwarding protocols. The available VMs may become overloaded when the request rate grows, which triggers the auto-scaling process.

For the auto-scaling mechanism, all the web servers are built using the same image, which means the memory size, CPU processing rate, and allocated disk space are the same for all the web servers. During the scaling-down process, we assume that the choice of which web server is removed will not affect the results or performance of the system.

Additionally, security and HTTP request checking are not considered in thiswork in order to simplify the testing set up.

1.6 Methodology

Quantitative and experimental research methods are used in this project to develop the model and to propose a solution that achieves better results. The quantitative research method is useful when performing experiments or testing systems with a large database. The conclusions and all the derivations in this project are drawn based on an analysis of the experiments and well-established theories.

The development of the new online control engine is based on existing orchestration services and well-developed machine learning models. A lot of investigation was done before starting to develop a service on the OpenStack cloud platform.

This master's degree project follows the development process of an ML-based solution as described in [6]: starting from setting up the testing environment, then data collection and processing, then model training, followed by model testing using new data generated by the real system, and finally analysis of the testing outcomes, which can be used as feedback to improve the model.

As this project was carried out at Telia Company, it is important that everything that is supposed to be confidential remains confidential.

1.7 Outline

This thesis studies basic network data, specifically incoming and outgoing network request rates and CPU utilization of VMs, by leveraging the OpenStack-based cloud platform. The study was performed in the following steps:

1. A literature study was made of existing orchestration frameworks and data flow control methods, including the advantages and disadvantages of these methods. Chapter 2 describes the analysis of these models.

2. The next step was to configure a lab environment based on the OpenStack cloud platform, including understanding how the different OpenStack modules work with each other and how to interact with their APIs. Chapter 3 introduces the OpenStack lab.

3. The next step was to establish a basic offline training model by investigating two different machine learning models: neural networks and linear regression. Chapter 4 describes the process of offline training.

4. Based on the offline training, real-time network data is collected by Ceilometer and used as feedback to create an online control engine. The testing results are analyzed and a comparison is made between Heat orchestration and the proposed new online control engine based on two client request patterns. This is described in Chapter 5.

5. Chapter 6 states a conclusion with a summary of the results and limitations of this project. This chapter also suggests some future work.
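The offline-training step (item 3 above) for the linear-regression model can be sketched in miniature. The data below is synthetic, standing in for the collected Ceilometer training set; a single feature is fit by ordinary least squares in closed form, and all numbers are illustrative.

```python
# Minimal sketch of offline least-squares training for a one-feature
# linear model, e.g. "required server count as a function of load".

def fit_line(xs, ys):
    """Least-squares fit of y ≈ w*x + b; returns (w, b)."""
    n = len(xs)
    mx = sum(xs) / n                     # mean of the feature
    my = sum(ys) / n                     # mean of the target
    var = sum((x - mx) ** 2 for x in xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w = cov / var                        # slope from cov(x, y) / var(x)
    return w, my - w * mx                # intercept from the means

# Synthetic relation: the target grows linearly with the feature.
xs = [0.1, 0.2, 0.4, 0.6, 0.8]
ys = [1.5, 2.0, 3.0, 4.0, 5.0]           # generated as y = 5*x + 1
w, b = fit_line(xs, ys)                  # recovers w ≈ 5.0, b ≈ 1.0
```

The real model in Chapter 4 uses several features; the closed-form single-feature fit above is only meant to show the shape of the offline step before the coefficients are handed to the online engine.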


Chapter 2

Background

This chapter provides basic background information about cloud computing and NFVO-based auto network management. Section 2.1 gives an introduction to today's data center networks. Section 2.2 introduces Telia Company and Telia's strategy of closed-loop control. Section 2.3 gives an overview of cloud computing. Then, Section 2.4 discusses virtualization and what advantages and disadvantages virtualization can bring to a software-based network. Next, Section 2.5 describes the basic NFV framework according to the ETSI standard, while Section 2.6 shows some existing auto network management techniques. Finally, Section 2.7 discusses why we consider machine learning a powerful tool for network management, along with summarizing some previous work in this area.

2.1 Data Center and Network

Today, few workloads execute on a single computer. Clients, servers, applications, and middleware may be distributed over many different nodes. Cooperation between different data centers provides even more powerful network functions and services. However, distributed computing frameworks require the network to transfer information. This section describes the concept of a data center and modern data networks, as well as the relationship between them.

2.1.1 Data Center

The concept of a data center was introduced to house a diversity of computer systems and associated components (such as storage devices, power supplies, cables, and data connections). We consider a data center to be a pool of resources. As data centers house large and complex server clusters, they are commonly run by large companies or government agencies. With the development of cloud technology, data centers are increasingly used to provide cloud services and virtual computing resources for small businesses or individual users. However, it should be noted that there are also data centers run by small companies and even individual users.


Because a data center is typically very large and usually has many backup systems and much storage, it consumes a large amount of power. A very important metric is Power Usage Effectiveness (PUE), which describes the power efficiency of a data center. A lot of effort has been made to achieve PUE values close to the ideal of 1.0, thus creating energy-efficient data centers.

2.1.2 Data Center Network

The components within a data center frequently share data or functions with each other. This sharing means information transfer is very important within data centers in order to realize different applications or functions.

A data center network (DCN) plays an important role as it interconnects the components within a data center. As data centers are usually large-scale clusters containing thousands of nodes or even more, DCNs are usually very complex and are not easy to build or manage. A successful DCN architecture provides high scalability, high fault tolerance, and high-speed connectivity. There are many types of DCNs (such as three-tier DCNs, fat tree DCNs, and DCell). These different types of DCNs aim to realize a more stable or powerful network structure.

Figure 2.1 shows the structure of a three-tier DCN as described in [1]. A three-tier DCN consists of three layers, each with its own type of network: the access network, the aggregation network, and the core network. These three networks are connected via switches. The lowest layer, the access network layer, consists of servers and layer 2/layer 3 (L2/L3) top-of-rack switches. Each server in the access layer is connected directly to one of these L2/L3 top-of-rack switches. The aggregation layer contains higher-layer L3 switches which connect the L2/L3 top-of-rack switches. The core switches in the core layer are responsible for connecting the aggregation layer switches as well as connecting the data center to the Internet. Although the three-tier DCN architecture is the most common network architecture used in data centers today, it has poor scalability. Hence it cannot deal with the growing demands of cloud computing.

The fat tree DCN is an improved version of the classic three-tier architecture which also realizes a hierarchical network structure with an access layer, an aggregation layer, and a core layer. However, it contains more network switches and is divided into k pods. Each pod contains (k/2)^2 servers, k/2 access layer switches, and k/2 aggregation layer switches, and the network contains (k/2)^2 core switches in total. Each core switch connects to one aggregation layer switch in each of the pods. By using these core switches, the fat tree DCN overcomes the oversubscription problem, realizing a 1:1 subscription ratio. However, scalability remains a big problem for the fat tree DCN.
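
The fat-tree sizing rules can be captured in a few lines. The function below computes the standard component counts of a k-ary fat tree (an illustration we added; it is not code from the thesis):

```python
def fat_tree_sizes(k):
    """Return component counts for a k-ary fat tree DCN (k must be even).

    Each of the k pods holds k/2 access (top-of-rack) switches and k/2
    aggregation switches; every access switch serves k/2 servers, and
    (k/2)^2 core switches connect the pods together.
    """
    if k % 2:
        raise ValueError("k must be even")
    half = k // 2
    return {
        "pods": k,
        "access_switches_per_pod": half,
        "aggregation_switches_per_pod": half,
        "servers_per_pod": half * half,
        "core_switches": half * half,
        "total_servers": k * half * half,  # equals k^3 / 4
    }
```

For example, `fat_tree_sizes(4)` gives 4 pods of 4 servers each and 4 core switches, while `fat_tree_sizes(48)` already supports 27,648 servers, which illustrates why port count k, rather than oversubscription, limits the fat tree's scale.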

C. Guo et al. [7] introduce DCell, a distributed DCN architecture in which servers connect directly to other servers. The result is a highly scalable architecture that can potentially provide greater network capacity. In a DCell


Figure 2.1: Three-tier network architecture ([1])

structure, low-level DCells form a fully connected network of servers. Higher-level DCells are formed from several lower-level DCells. Instead of using high-end core switches, DCell uses mini-switches to scale out, thus providing greater scalability. The major issues of DCell are network latency and cross-section bandwidth. Additionally, how to direct network traffic within and between layers is another problem.

2.2 Telia Company and Strategy

Telia Company is an international company with employees and customers all over the world. The company intends to develop a next-generation network which can provide more powerful communication services. It also works as a hub which connects digital ecosystems, people, companies, and societies together. The headquarters, located in Stockholm, acts as the heart of innovation and technology.

This master thesis has a close link to Telia's strategies within the GSO Network department, where technology innovation is matched with efficient resource utilization and accelerated time to market. For example, based on the cloud-native requirements, sophisticated load balancing algorithms should be investigated to achieve better data flow control. Several cloud orchestration services, such as auto-scaling, have been considered for better cloud resource management. A concept called closed-loop operation has been introduced which intends to realize network auto-management based on the system's feedback.

2.3 Cloud Computing

Cloud computing is a technique which enables clients to access a pool of system resources or services over a network. These shared resources could be a well-developed application or simply a virtual resource, such as storage. Clients use and only pay for the services or resources they use; hence they do not need to be


concerned with how the underlying system manages the software and hardware resources. Data centers make cloud-based services and applications possible, as a large data center is logically a big resource pool which can provide a wide diversity of services.

Additionally, by sharing resources, it is easier for a given service to have just the amount of resources that it needs at a given time. Cloud computing suppliers usually provide their services via one of three models: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS); these models offer increasing levels of abstraction to users [8, 9]. Figure 2.2 shows these service models.

Figure 2.2: Cloud Computing Service Models

Many cloud platforms are built by leveraging an open-source software platform (such as OpenStack) to provide IaaS, thus making virtual resources (such as virtual servers) available to users. As most of the services can be treated as a workflow through a chain of network functions, the management of a cloud platform can be split into two parts: resource allocation to different functions, and data flow control through the allocated resources and deployed functions.

2.4 Virtualization

A very important concept in cloud computing is virtualization. Virtualization has been used in many areas in addition to cloud computing to provide convenience and flexibility. As PUE is always a very important factor when evaluating a data center, virtualization helps to save energy, as the resources of a VM can be provisioned dynamically as the workload changes; hence the workload can have just enough resources for its needs, but not more.

Server-level virtualization can save a lot of resources as these resources are shared among VMs and these VMs are dynamically allocated, hence avoiding


unused hardware. Taking NFV as an example, virtualization helps to implement a network function in software (i.e., network functions are no longer realized by connecting particular hardware devices). In this way, the required number of VMs with the appropriate software deployed on them can be used to realize an equivalent chain of network functions, while being able to be dynamically scaled up or down. However, as shown by Georgios P. Katsikas [10], an NFV service chain can result in performance degradation and high latency. As service chains rely heavily on CPU performance, and existing NFV systems usually use multiple CPUs in parallel to realize these service chains, NFV service chains still face performance problems even though new techniques such as fast network drivers have been used.

Another type of virtualization which has been widely used is network-level virtualization. Take SDN as an example; network-level virtualization provides flexible and logically centralized management through a central controller [11].

2.5 Network Function Virtualization Framework

Network function virtualization has been widely used in cloud computing as it decouples network functions (NFs) from particular physical infrastructure. Breaking the binding between NFs and hardware can provide a lot of flexibility in network management and improved resource utilization. That is to say, network functions and services can be realized in a more efficient way [12].

To develop standards for NFV, the European Telecommunications Standards Institute (ETSI) has proposed a management and orchestration (MANO) framework [2]. This model is also the basic framework used by Telia Company to realize network auto-management. MANO consists of three main components: the NFV orchestrator (NFVO), the VNF manager (VNFM), and the virtualized infrastructure manager (VIM). These components are connected to network elements through reference points. The VNFs and the NFV infrastructure (NFVI) constitute the basic NFV architecture layer within the network. Element management (EM) and the Operations Support System (OSS)/Business Support System (BSS) constitute the network management system. Figure 2.3 shows this architecture.

Each component of the MANO framework is introduced in [13] and [2]. The VIM is connected to the NFVI, which contains both software and hardware resources. The VIM manages and controls the resources in the NFVI, usually within one operator's infrastructure domain. MANO may contain several VIMs, each allocated to a particular service to manage the resources for that service. A VIM can also be used to support the management of VNF forwarding graphs by creating and maintaining virtual networks.

A VNFM is connected to VNF and EM elements within a network architecture to manage the lifecycle of VNF elements. The matching model between VNF instances and VNFMs is many-to-one, which means one VNF instance is associated with one VNFM, while one VNFM can manage multiple


Figure 2.3: ETSI MANO Framework([2])

VNF instances. A Virtualized Network Function Descriptor (VNFD) is a kind of template which describes the deployment and behavior of each VNF instance and can be used to create and manage VNF instances. The matching model between VNFDs and VNF packages is one-to-one, which means one VNFD describes the attributes and requirements of only one VNF instance. A VNFM is maintained and controlled by the NFV orchestrator.

The NFVO is responsible for orchestrating multiple NFVI hardware or software resources through VIMs and managing the lifecycle of Network Services (NSs) through VNFMs. Four data repositories are connected to the NFVO: the NS catalog, the VNF catalog, the NFV instances repository, and the NFVI resources repository. These are used to store information about NSs, VNF packages, VNF instances, and NFVI resources, respectively. The NFVO can make use of the information from these four data repositories to provide end-to-end services.

2.6 Machine Learning in Networking

As network auto-management usually depends on the current network state, real-time network traffic data should be collected and analyzed for future decision making. Machine learning has been widely used in data analysis and prediction. Machine learning usually has two steps: training and testing. The training step builds a model from a large number of samples of observed data. The testing step evaluates the model, which can then provide predictions based upon future input data.
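
As a minimal illustration of these two steps, the sketch below trains a least-squares linear model on synthetic request-rate/latency samples and then tests it on held-out data (all data, names, and coefficients here are invented for illustration):

```python
import random

# Synthetic "observed" samples: response time grows roughly linearly
# with the request rate (invented data, not real measurements).
random.seed(0)
rate = [random.uniform(100, 1000) for _ in range(200)]            # requests/s
latency = [5.0 + 0.02 * r + random.gauss(0, 0.5) for r in rate]   # ms

# Training step: closed-form least squares fit y = w*x + b on 150 samples.
xs, ys = rate[:150], latency[:150]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
w = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
     / sum((x - mx) ** 2 for x in xs))
b = my - w * mx

# Testing step: predict on the 50 held-out samples and evaluate the error.
errors = [abs(w * x + b - y) for x, y in zip(rate[150:], latency[150:])]
mae = sum(errors) / len(errors)
```

The fitted slope and intercept recover the generating coefficients closely, and the mean absolute error on the held-out set stays near the noise level, which is exactly the train-then-test evaluation the text describes.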


Machine learning within the networking area remains a fresh concept worth investigating. Many machine learning based applications in various key areas of networking have been introduced, compared, and evaluated in [6]. Machine learning has been extensively applied to several networking problems, such as pattern recognition, understanding network traffic, predicting service metrics, and outlier detection. Machine learning techniques have mainly been used for network operations and management.

Although there is a dire need for machine learning based network management and operation, it remains a big challenge in this area [6]. The reasons lie in two aspects. Firstly, networks differ from each other, so it is difficult to find a standard to attain uniformity across networks; therefore, a trained model which is proven to work in one network may be unsuitable for another network. Secondly, networks continue to evolve, which means an application developed using a fixed set of patterns may not be useful for network operation and management in the near future.

Several techniques such as SDN and NFV have been developed to promote the applicability of machine learning in networking. These techniques provide a new way to program networks by leveraging well-developed software structures. They also leverage the concept of virtualization to realize network operation and management in a more efficient way.

2.7 Related Work

Much research related to auto network management and orchestration has been done in previous work. This section presents previous work in the areas of modern NFV-based auto-management structures, cloud auto-scaling techniques, and machine learning based network analysis.

2.7.1 Auto Network Management and Orchestration

A very important feature of MANO is auto-management, with all the network functions and resources managed automatically, depending upon the current network state, at the cloud provider's side without clients' interference.

A concept called an autonomic network was introduced together with a new architecture called the Generic Autonomic Networking Architecture (GANA). As described in [3] and [14], GANA is a reference model for autonomic networking and self-management of networks and services. GANA is a hierarchical control-loop framework consisting of four levels: the Protocol Level, Function Level, Node Level, and Network Level. Each level has Decision Elements (DEs) and control loops to detect the current network state. DEs on a higher level can control and manage the DEs on a lower level [15]. If a certain decision cannot be made by a lower-level DE, it is forwarded to a higher level which has more information about the state of the whole network. Figure 2.4 shows the basic structure of


GANA.

Figure 2.4: AFI GANA Reference Model ([3])

A DE can auto-discover network instances and policies, or other DEs it may collaborate with. Each DE is assigned one or more Managed Entities (MEs). DEs can automatically discover the required network resources for these MEs. Based on these discoveries, a DE can perform auto-performance tuning of its assigned MEs (such as self-configuration, self-optimization, self-repair, and so on). An ME is a managed resource which can vary depending on what kind of management the GANA requires; an ME could be an individual network element or a complex application.
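
The escalation rule described above, in which a DE answers requests within its own scope and forwards anything else to a higher-level DE with a wider network view, can be sketched as a toy class (the names and structure are our own illustration, not GANA code):

```python
class DecisionElement:
    """Toy GANA-style decision element (all names are illustrative).

    Each DE knows a limited decision scope; a request outside that
    scope is escalated to the parent DE one level up, which has more
    information about the state of the whole network.
    """
    def __init__(self, name, scope, parent=None):
        self.name = name
        self.scope = set(scope)   # decisions this DE can make itself
        self.parent = parent      # higher-level DE, if any

    def decide(self, request):
        if request in self.scope:
            return f"{self.name} handled {request}"
        if self.parent is not None:
            return self.parent.decide(request)  # escalate upward
        raise RuntimeError(f"no DE could handle {request}")

# A two-level hierarchy: the node-level DE escalates to the network level.
network_de = DecisionElement("network-level DE", {"reroute-traffic"})
node_de = DecisionElement("node-level DE", {"restart-local-service"},
                          parent=network_de)
```

Here `node_de.decide("reroute-traffic")` is answered by the network-level DE, mirroring how lower-level decisions are forwarded upward in GANA.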

The Overlay Network for Information eXchange (ONIX) is a distributed, scalable overlay system that enables the auto-discovery of information or entities for DEs. A Model-Based Translation Service (MBTS) is the middle layer between the GANA Knowledge Plane DEs and the knowledge plane. The MBTS translates the raw data provided by vendors' equipment into a common data format which can be used by network-level DEs.


2.7.2 Auto-scaling Technique in Cloud Orchestration

The auto-scaling technique is a simple NFV based orchestration service which provides resource management. Many rule-based auto-scaling approaches have been developed. For example, the Auto Scaling service provided by AWS [16], the Autoscale service provided by Windows Azure [17], and Scalr [18] are widely known.

However, whether auto-scaling based resource management is useful and how much benefit it can bring have been widely debated and studied. Ming Mao and Marty Humphrey have discussed how auto-scaling can be used to reduce cost [19]. In [20], an auto-scaling framework called SmartScale was developed which brings a lot of benefit by minimizing resource usage cost. SmartScale combines vertical scaling (adding more resources to existing VM instances) and horizontal scaling (adding more VM instances) to ensure applications' scalability. An application's deadline is another constraint that needs to be carefully considered for auto-scaling. Y. Ahn and Y. Kim [21] investigated various workflow patterns and extended a task-based auto-scaling algorithm [22] to support workflows. This auto-scaling method can detect delays and deadline violations by comparing the actual finish time and estimated finish time of running tasks, and adjusts the number of VMs appropriately.
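
The deadline-aware idea from [21] (compare the estimated and actual finish times of running tasks, then adjust the number of VMs) can be reduced to a few lines; this is our own simplification for illustration, not the published algorithm:

```python
def adjust_vm_count(current_vms, estimated_finish, actual_finish,
                    tolerance=0.0):
    """Toy deadline-aware scaling: if running tasks finish later than
    estimated, add a VM; if they finish early, remove one.
    (An illustrative simplification of the approach in [21].)"""
    delay = actual_finish - estimated_finish
    if delay > tolerance:
        return current_vms + 1          # falling behind: scale out
    if delay < -tolerance and current_vms > 1:
        return current_vms - 1          # ahead of schedule: scale in
    return current_vms
```

For instance, a task estimated to finish at t=100 that actually finishes at t=120 triggers a scale-out, while an early finish lets the pool shrink (never below one VM).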

Many open source cloud platforms, such as OpenStack, support various auto-scaling approaches. For an OpenStack based cloud platform, Heat orchestration [23] can provide an auto-scaling service in cooperation with the Ceilometer data collection service. A Heat orchestration template is used to manage different OpenStack resources, such as the AutoScalingGroup and ScalingPolicy resources for the Heat service and an Alarm resource for the Ceilometer service. Another open-source platform called Docker [24] can also be used to deploy applications on hybrid hosts and realize some auto-scaling functions. Y. Li and Y. Xia [25] designed a platform which can auto-scale web applications in a Docker-based cloud environment. A scheduling controller was built in this project (introduced in [25]) to realize application management by combining prediction and reaction algorithms.
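
The alarm-plus-policy pattern used by Heat and Ceilometer boils down to a threshold rule: an alarm fires when a metric crosses a threshold, and a scaling policy reacts by changing the group size. The sketch below is our own minimal illustration of such a rule; the thresholds and limits are invented defaults, not OpenStack's implementation:

```python
def scaling_decision(cpu_util, n_instances,
                     high=0.8, low=0.2, n_min=1, n_max=10):
    """Mimic an AutoScalingGroup governed by two ScalingPolicy
    resources: scale out by one instance when average CPU utilization
    exceeds `high`, scale in when it drops below `low`.
    (Thresholds are illustrative, not OpenStack defaults.)"""
    if cpu_util > high and n_instances < n_max:
        return n_instances + 1
    if cpu_util < low and n_instances > n_min:
        return n_instances - 1
    return n_instances
```

Note that this rule reacts only to the single CPU metric, which is exactly the limitation of the default mechanism that the thesis aims to improve upon.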

2.7.3 Machine Learning Based Network Analysis andManagement

Many people consider machine learning a powerful tool to realize data flow control or auto-management of a network. However, the traffic within a network changes over time, which makes machine learning in this area difficult to implement.

A self-adaptive controller combined with two reinforcement learning approaches, Q-Learning and SARSA, was developed in [26] for fuzzy cloud auto-scaling. The OpenStack platform was used in this paper, and the controller can be considered an extension of the fuzzy controller described in [27], which exploits fuzzy logic for cloud-based software to realize cloud elasticity.


Online learning can be considered for streaming data and real-time analysis. In [28], a real-time analytics engine was introduced to process real-time network traffic rather than requiring a priori detailed knowledge of the system's components. The project focuses on a critical part of service assurance, namely the capability of a provider to estimate the service quality based on measurements of the provider's infrastructure. Several machine learning based statistical models such as lasso regression, regression trees, and random forests were compared. The results show that the random forest has the best performance. This project also designed a real-time analytics engine which performs online training and prediction and which can be considered a building block for future real-time network auto-management.

Sabidur Rahman et al. [29] provide a better way to auto-scale VNFs based upon dynamic network traffic changes. The traffic load and traffic load changes from the recent past are taken as input data, and the required number of VNF instances is generated as output. As this is a supervised classification problem, seven machine learning methods were considered. The results show that the random forest has the highest precision and lowest false positive rate; thus it can help operators to improve their QoS and achieve lower leasing costs.


Chapter 3

Testing Environment Establishment

This chapter describes how we set up the testing environment. Section 3.1 gives an introduction to the OpenStack cloud platform we used in this project, including a detailed description of each OpenStack service and how they cooperate with each other. Section 3.2 introduces the overall testing environment structure we designed for online resource management, along with a detailed introduction of each element. Section 3.3 gives an overview of how the default auto-scaling mechanism provided by the OpenStack Heat orchestration service works. Then, Section 3.4 gives an introduction to our new auto-scaling mechanism.

3.1 Setting Up a Cloud Platform

This section starts with a brief introduction to the OpenStack platform, then gives more detailed information about each OpenStack service and how they cooperate with each other.

3.1.1 OpenStack Platform

The OpenStack platform used in this project is based on the Mirantis Cloud Platform (MCP) maintained by Huawei. MCP includes individual VM artifacts for core services and provides a suite of open source Operations Support Systems which can help users to log, control, and monitor OpenStack services in a better way [30]. MCP uses the DriveTrain toolchain [31] to continuously deliver these services to a cloud environment.

3.1.2 Introduction of Different OpenStack Services

The core services supported by MCP are Keystone, Glance, Nova, Cinder, Neutron, Horizon, Heat, Ironic, Designate, and Ceilometer. These are described


in more detail below:

Keystone Keystone is the OpenStack Identity service, which provides API client authentication, service discovery, and distributed multi-tenant authorization.

Glance Glance is the OpenStack Image service, via which users create or discover VM image metadata.

This metadata can be used by other services through a RESTful API. For example, Nova can launch new instances based on an image provided by Glance.

Nova Nova is one of the most important services supported by OpenStack, as it provides a way to create and manage compute instances. Nova can help to create virtual machines and bare metal servers, and it even provides limited support for system containers. Nova runs as a set of daemons on top of existing Linux servers.

Cinder Cinder is the OpenStack Block Storage service, which provides block backups and makes the OpenStack platform fault-tolerant, recoverable, and highly available.

Neutron Neutron provides network connectivity between interface devices, such as virtual network interfaces (vNICs). It helps to maintain the network between instances and manages the routers and interfaces for each network.

Horizon Horizon provides a web-based user interface to OpenStack services and is implemented as OpenStack's Dashboard. Horizon provides a graphical user interface (GUI) enabling users to manage and maintain OpenStack services in a more convenient way.

Heat Heat provides OpenStack's orchestration service, creating a human-accessible and machine-accessible service for managing the entire lifecycle of the infrastructure and applications within OpenStack clouds.

Ceilometer Ceilometer is the OpenStack data collection service, which transforms and normalizes data across all current OpenStack core components; work is underway to support future OpenStack components. Ceilometer has meters to collect data from different resources and stores these data in a MongoDB database. Ceilometer can provide data for the Heat service to realize the auto-management of stack resources.
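
Once Ceilometer samples land in the database, a control engine mostly needs simple aggregation over them. The sketch below mimics that kind of query on in-memory samples; the field names (`counter_name`, `counter_volume`, `resource_id`) follow Ceilometer's sample format, but the helper itself and the data are our own illustration:

```python
from statistics import mean

def average_meter(samples, meter, resource_id):
    """Average the values of one meter (e.g. 'cpu_util') for one
    resource, mimicking the aggregation a control engine would run
    over collected Ceilometer samples."""
    values = [s["counter_volume"] for s in samples
              if s["counter_name"] == meter
              and s["resource_id"] == resource_id]
    return mean(values) if values else None

# Hypothetical samples for two web-server instances.
samples = [
    {"counter_name": "cpu_util", "resource_id": "vm-1", "counter_volume": 40.0},
    {"counter_name": "cpu_util", "resource_id": "vm-1", "counter_volume": 60.0},
    {"counter_name": "cpu_util", "resource_id": "vm-2", "counter_volume": 10.0},
]
```

Against a real deployment the same filter-and-average would be expressed as a query to the Ceilometer/MongoDB store rather than a list comprehension.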

3.1.3 Relationship Between OpenStack Services

Figure 3.1 shows the relationships between the OpenStack services. This project is mainly based on four OpenStack service APIs: Nova, Neutron, Ceilometer, and Heat.


Figure 3.1: Relationships Between OpenStack Services

3.2 Testing Environment Structure

The structure of the testing environment is introduced in this section, along with a detailed description of each model. We built our own testing environment based on the MCP provided by Telia Company. Figure 3.2 shows the structure of this testing environment.

This project was divided into two tenants: an autoManage tenant and a Load balance tenant. Within the autoManage tenant, we created a server pool with limited network resources (such as IP addresses, virtual CPUs, and so on). These resources are subsequently allocated to new web server instances. The number of web server instances within the server pool is flexible and depends on the requirements needed to service the current load. Within the Load balance tenant, a load balancer and several HTTP traffic generators have been created. These HTTP traffic generators act as if they were real-world clients, while the load balancer directs incoming traffic to different web servers.

We also have a controller for overall control of all the instances via OpenStack services, which provides a higher layer for this project. The Ceilometer data collection model, the offline machine learning training model, and the online auto-scaling control engine are located in the controller. The data collected by Ceilometer is stored in a MongoDB database in the controller and fetched by the offline training model via a database interface. After model training, the trained model is integrated into the online control engine, which generates a control signal to


Figure 3.2: Structure of Testing Environment

realize the auto-scaling mechanism. More details are given for each model in the following sections.

The hardware configuration of the OpenStack lab is shown in Table 3.1.

3.2.1 Network Topology

Using the testing environment structure, a network topology was created via the OpenStack Neutron service for use in this project. The resulting topology is shown in Figure 3.3 and Figure 3.4.

The number of clients in the client network shown in Figure 3.3 is flexible and depends on the testing requirements. The webServer shown in Figure 3.4 in the server network represents one instance from the server pool. These instances can be automatically added or removed by applying an auto-scaling mechanism.

These two tenants are connected via a shared load balancer network. The load balancer receives incoming HTTP requests from the client network and distributes these requests to the web servers in the server network. This means that the actual IP addresses of the web servers are private and invisible to the clients; only the IP address of the load balancer is visible. Therefore, clients communicate with the load balancer via this single IP address.

The admin floating network represents the public network. The controller and other OpenStack services communicate with each instance and perform network resource management through this network by using an SSH tunnel.


Figure 3.3: Network Topology of Load Balance Tenant

Figure 3.4: Network Topology of autoManage Tenant


Table 3.1: OpenStack Lab Hardware Configuration

Type: Server | Function: Management Node | Model: CH222 v3
  CPU: Intel E5-2658A v3 x 2
  Memory: 256GB (16GB x 16)
  Disk: 900GB 10K 2.5" SAS hard disk x 6 + 1.6TB 2.5" SSD hard disk
  RAID card: LSI3108
  Mezz cards: MZ310 + MZ310 (dual 10Gb Ethernet ports per mezz card)

Type: Server | Function: FusionStorage Node | Model: CH222 v3
  CPU: Intel E5-2658A v3 x 2
  Memory: 256GB (16GB x 16)
  Disk: 900GB 10K 2.5" SAS hard disk x 12 + 1.6TB 2.5" SSD hard disk
  RAID card: LSI3108
  Mezz cards: MZ310 + MZ310 (dual 10Gb Ethernet ports per mezz card)

Type: Server | Function: Compute Nodes | Model: CH121 v3
  CPU: Intel E5-2658A v3 x 2
  Memory: 256GB (16GB x 16)
  Disk: 900GB 10K 2.5" SAS hard disk x 2
  RAID card: LSI3108
  Mezz cards: MZ312 (four 10Gb Ethernet ports) + MZ710 (dual 40Gb Ethernet ports)

Type: Server | Function: Blade Chassis | Model: E9000
  Form factor: 12U
  Embedded switches: CX310 (16 x 10GE uplink, 32 x 10GE downlink) x 2 + CX710 (8 x 40GE uplink, 16 x 40GE downlink)
  Fan modules: 14 hot-swappable fan modules in N+1 redundancy mode
  Power supply: up to six 3000W/2000W AC or six 2500W DC hot-swappable PSUs, N+N or N+M redundant

Type: Storage | Function: IP-SAN Storage | Model: OceanStor 5500 v3
  Dual controllers; 128G high-speed cache; 10G iSCSI ports x 8; 600GB 15K 2.5" SAS hard disk x 25

3.2.2 Clients

Each client instance in this project simulates a user's behavior in the real world. Each client instance is a CentOS-based virtual machine created and managed by the Nova service. Httperf [32, 33] has been used to generate various HTTP workloads for measuring web server performance. The operation of httperf can be controlled through options such as rate, max-connections, timeout, and so on.

When a client is created, it sends HTTP requests to the load balancer via httperf commands using a shell script. Within this shell script, loops are created for clients to generate HTTP requests and to change the request rate. The HTTP request rate is determined by a rate option and changes over time following a given load pattern [28]. These load patterns are:

Constant-load pattern A fixed number of clients is created. These clients send requests at a constant rate.

SMOOTH-load pattern The HTTP request rate starts at an initial rate and increases by a certain increment after each loop over a given range. After a certain number of loops, the increment becomes negative; hence the HTTP request rate decreases in the following loops and returns to the initial rate.

For example, if the initial rate is 500 requests/s and the increment is 20 requests/s, then after the first loop the HTTP request rate becomes 520 requests/s.

SPIKEY-load pattern This pattern follows increasing and decreasing rules similar to the SMOOTH-load pattern, but adds a flash event. During this flash event, the HTTP request rate suddenly surges to a very high value and then drops back to the original value after a very short time.

These load patterns allow us to test web server performance in most situations. The timestamps and the HTTP request rate at each timestamp are output to a file for further analysis.
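The rate schedules above can be sketched in a few lines. The following Python sketch is illustrative only (the project drove httperf from a shell script); the function names and parameters are our assumptions, with the 500 requests/s initial rate and 20 requests/s increment taken from the example in the text:

```python
def smooth_schedule(initial_rate, increment, steps_up):
    """Rates for one SMOOTH loop: climb by `increment` per loop, then descend."""
    up = [initial_rate + i * increment for i in range(steps_up + 1)]
    return up + up[-2::-1]  # mirror the climb to return to the initial rate

def spikey_schedule(initial_rate, increment, steps_up, spike_at, spike_rate):
    """SMOOTH schedule plus a short flash event at index `spike_at`."""
    rates = smooth_schedule(initial_rate, increment, steps_up)
    rates[spike_at] = spike_rate  # sudden surge, back to normal next step
    return rates

# Example from the text: start at 500 req/s, increment 20 req/s.
print(smooth_schedule(500, 20, 3))   # [500, 520, 540, 560, 540, 520, 500]
```

Each value in the returned list would become the `--rate` option of one httperf invocation inside the client's shell loop.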

3.2.3 Load Balancer

Load balancers are needed to distribute incoming HTTP requests from the clients to the web servers within the server pool in the autoManage tenant, and to hide the private IP addresses of the web servers from the public network for security purposes. As described earlier, the load balancers used in this project are located in a shared network between two OpenStack tenants. The load balancers can communicate with the web servers and clients through their private IP addresses. The service we used to realize load balancing is HAProxy [34]. HAProxy is open-source software which can be used to build a proxy server that provides a highly available load balancing service for TCP- and HTTP-based applications.

The load balancer in this project is a virtualized network function which can be treated as an instance in the OpenStack platform and is created through the Nova service. This instance is a CentOS-based virtual machine with the HAProxy service [34] configured in it. A round-robin policy was used for traffic distribution. An IP address pool of web servers was created for the load balancer to choose from. A private IP address within this pool is allocated to each web server during instance initialization. Figure 3.5 shows the backend and frontend setup for the load balancer service used in this project. The size of the web server pool is ten, hence there are ten available IP addresses for web servers to choose from.

3.2.4 HTTP Web Server

Each web server in the server pool is an individual instance running CentOS. An instance can be launched in two different ways: through the Heat service for the traditional auto-scaling mechanism, or via the Nova service for our new online control engine. More details about these two auto-scaling mechanisms are given in Sections 3.3 and 3.4, respectively.

After the instance has been built, we configure some web services in it and make it a running web server. The web server program we used in this project is the Apache HyperText Transfer Protocol (HTTP) server program called httpd [35]. This server handles HTTP requests through a pool of child processes or threads. It is configured in each instance during initialization via the user data option by using the commands below:

sudo systemctl enable httpd.service
sudo systemctl start httpd.service
sudo setsebool -P httpd_can_network_connect_db=1

Figure 3.5: Load Balancer Configuration

We assume that the network resources in a web server pool are limited; hence the maximum number of web servers is also limited. For example, if we have an IP address pool with only ten private IP addresses in it, then we can create at most ten web servers. How many network resources should be allocated to each server pool is a question we need to consider carefully, given the required service demand and the capacity of each server.

3.3 Default Auto-scaling Mechanism

To realize automatic management of network resources, OpenStack has developed several services for orchestration. Heat is one of these services. As a well-developed project, Heat works with Ceilometer to realize a basic auto-scaling function, which is considered the traditional auto-scaling mechanism in this thesis.


3.3.1 Heat Template

To realize cloud orchestration, Heat creates a component called a stack by using either the AWS CloudFormation template format or the native OpenStack Heat Orchestration Template (HOT) format. Both of these formats enable the developer to describe applications, services, and network infrastructure within a stack in an easy way. The template format used in this thesis is HOT, and several resource types are included in it, such as floating IP address, image type, volume, private key, security group, and user data.

Three HOT files have been used in this project to create a stack with the traditional auto-scaling mechanism. These files are centos, environment, and autoScaling.

centos Describes the resource type for one instance (of a web server) in the server pool.

This file should include resource types such as OS::Nova::Server (defines the instance's name, flavor, key name, image type, and user data), OS::Neutron::Port (defines which subnet this instance belongs to), and OS::Neutron::FloatingIP (allocates one public floating IP address to this instance).

environment Describes the environment variables for stacks.

For example, we can define centos as a user-defined resource in the environment file, based on the centos file we created earlier.

autoScaling Describes the auto-scaling mechanism for the stack.

Some Heat and Ceilometer resources are defined in this file for the auto-scaling function. This file is the most important one for the traditional auto-scaling mechanism. More details about this file are given in the next section.

3.3.2 AutoScaling File

Three resource types are used in this file to realize the auto-scaling mechanism: OS::Heat::AutoScalingGroup, OS::Heat::ScalingPolicy, and OS::Ceilometer::Alarm.

• OS::Heat::AutoScalingGroup: this resource defines a web server pool with the properties cooldown period (ensures that the stack does not launch or terminate additional instances before a timeout), max size (maximum number of servers in the pool), and min size (minimum number of servers in the pool). The resource type for an instance within the server pool is OS::Nova::Server::Cirros, which has already been defined in the environment file.

• OS::Heat::ScalingPolicy: based on the auto-scaling group created before, this resource defines the adjustment type (ChangeInCapacity, ExactCapacity, or PercentChangeInCapacity), cooldown period, and scaling adjustment (a positive value means scale up and a negative value means scale down). For example, if the adjustment type is ChangeInCapacity and the scaling adjustment is 1, one new instance is created once the threshold of the scale-up policy has been reached.


• OS::Ceilometer::Alarm: this resource uses Ceilometer to create an alarm for the Heat scaling policy. Ceilometer has different types of meters, such as cpu_util and network incoming rate, for different network resources. This resource defines the meter name (which meter is used for auto-scaling), statistic (average, maximum, or minimum), period, alarm threshold, alarm actions (what the system does once the alarm is triggered), and comparison operator (gt means greater than the threshold and lt means less than the threshold).
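The three resources described above can be combined in a HOT fragment along the following lines. This is an illustrative sketch, not the exact autoScaling file used in the project; note that HOT spells the adjustment types in lowercase (change_in_capacity), and the cooldown and threshold values here are assumptions:

```yaml
resources:
  web_server_group:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 1
      max_size: 5
      cooldown: 60
      resource:
        type: OS::Nova::Server::Cirros   # defined in the environment file

  scaleup_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      auto_scaling_group_id: {get_resource: web_server_group}
      adjustment_type: change_in_capacity
      scaling_adjustment: 1              # add one server per alarm
      cooldown: 60

  cpu_alarm_high:
    type: OS::Ceilometer::Alarm
    properties:
      meter_name: cpu_util
      statistic: avg
      period: 180
      threshold: 80                      # assumed scale-up threshold (%)
      comparison_operator: gt
      alarm_actions:
        - {get_attr: [scaleup_policy, alarm_url]}
```

A symmetric scale-down policy and a low-CPU alarm with comparison_operator lt would complete the template.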

3.3.3 How Default Auto-scaling Works

Ceilometer periodically and automatically collects data from different network resources through meters. The sampling period for data collection is defined in the Ceilometer configuration file (ceilometer.conf) and the pipeline configuration file (pipeline.yaml) and can be changed according to requirements. The data processing within the Ceilometer service can be considered as a pipeline. Pipelines describe, at a configuration level, a coupling between sources of data and the corresponding sinks for data transformation and publication [36]. Within the pipeline.yaml file, there is a parameter called interval which sets the sampling period.
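For example, the 3-minute sampling period used in this project corresponds to setting interval to 180 seconds in pipeline.yaml, roughly as in the fragment below (an illustrative excerpt, not the complete file used in the lab):

```yaml
sources:
    - name: meter_source
      interval: 180        # sampling period in seconds (3 minutes)
      meters:
          - "*"
      sinks:
          - meter_sink
```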

Once data has been collected and saved into the database in the controller, a Ceilometer alarm can provide Monitoring-as-a-Service for a resource running on OpenStack by using these data. There are three states for an alarm: ok, alarm, and insufficient data. The ok state is set once the rule governing the alarm has been evaluated as false. The alarm state is set once the rule governing the alarm has been evaluated as true. Finally, the insufficient data state is set when there are not enough data points available during the evaluation periods to meaningfully determine the alarm state [37].
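The three-state evaluation can be summarized in a few lines of Python. This is a simplified sketch of the rule just described, not Ceilometer's actual implementation; it covers only the gt/lt comparison operators mentioned earlier:

```python
def evaluate_alarm(samples, threshold, comparison_operator="gt", min_points=1):
    """Return 'alarm', 'ok', or 'insufficient data' for one evaluation window."""
    if len(samples) < min_points:
        return "insufficient data"      # not enough data points to decide
    avg = sum(samples) / len(samples)   # assumes the 'avg' statistic
    if comparison_operator == "gt":
        rule_true = avg > threshold
    else:  # "lt"
        rule_true = avg < threshold
    return "alarm" if rule_true else "ok"

print(evaluate_alarm([85.0, 92.0], threshold=80, comparison_operator="gt"))  # alarm
```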

When the alarm state changes from ok to alarm, the Heat orchestration service invokes a scale-up or scale-down action. Depending on the thresholds, this automatically creates or deletes one web server in the stack. Figure 3.6 shows the data flow.

3.4 New Auto-scaling Mechanism

In this project, we have created a new online control engine based on a machine learning model. This control engine realizes a new auto-scaling mechanism. Instead of using the Heat-based stack to manage resources, this online control engine cooperates with the Nova service to realize the server auto-scaling function. As before, the Ceilometer service is used for data collection.

The machine learning model generates a control signal that depends on real-time network data. In contrast to the default auto-scaling mechanism, where the scaling action is to add or remove one server, the scaling action of the new auto-scaling mechanism adds or deletes servers to reach the new target number of servers.

Figure 3.6: The Data Flow Among Different Modules for the Default Auto-scaling Mechanism

Figure 3.7 shows the data flow for this new auto-scaling mechanism.

Figure 3.7: The Data Flow Among Different Modules for the New Auto-scaling Mechanism


Chapter 4

Data Collection and Offline Model Training

This chapter describes how we collect data from the system and how we realize offline training. Section 4.1 introduces the services we used to collect network data, along with a detailed analysis of these data. Section 4.2 introduces the offline training models used in this project, along with the training processes and results.

4.1 Data Collection

A labeled dataset is very important for the machine learning training step. This section introduces how we generate network data and how we store and fetch this data depending on the desired real-time client request pattern.

All the network resource data on the server side is collected through the Ceilometer service. Some parameters, such as the sampling period, must be decided before data collection. If the sampling period is too long, the testing period will be too long (i.e., the system will not be very responsive to changes) and the data we collect will not reflect real-time changes. On the other hand, if the sampling period is too short, there will be some data overlap between two sampling periods. After some initial testing, we set the sampling period for Ceilometer to 3 minutes in this project.

HAProxy [34] also provides a variety of statistics metrics to monitor the network and show current states. The statistics metrics used in this project are listed below.

scur Number of current sessions.

slim Session limit. In this project, it has been configured to 5,000.

eresp Accumulated number of response errors.

status Server status (UP/DOWN/NOLB/MAINT/MAINT(via)...).


rate Number of sessions per second during the last elapsed second.

rtime The average response time in ms over the last 1024 requests.

ttime The average total session time in ms over the last 1024 requests.

These data are obtained through HAProxy's interface by using socket communication. A service called netcat [38] is used to realize this socket communication with the following command:

echo "show stat" | nc -U /var/run/haproxy.sock
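The `show stat` command returns CSV text whose first line lists the field names (prefixed with `# `), so the metrics above can be picked out with a small parser. The Python sketch below is our own illustration; the sample CSV is fabricated and far shorter than real HAProxy output:

```python
import csv
import io

def parse_haproxy_stats(csv_text,
                        fields=("scur", "slim", "rate", "rtime", "ttime", "status")):
    """Parse `show stat` CSV output into one dict of selected fields per line."""
    text = csv_text.lstrip("# ")  # the header line starts with '# '
    return [{f: row.get(f) for f in fields}
            for row in csv.DictReader(io.StringIO(text))]

# Fabricated two-line sample in the `show stat` CSV shape:
sample = ("# pxname,svname,scur,slim,rate,rtime,ttime,status\n"
          "http_back,server1,12,5000,3,41,52,UP\n")
print(parse_haproxy_stats(sample))
```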

4.1.1 HTTP Request Pattern

Using the SMOOTH-load pattern introduced earlier, three clients are created to generate HTTP requests. Figure 4.1 shows the real-time pattern of their aggregated HTTP request sending rate.

Figure 4.1: The Pattern of Client Request Sending Rate (16 hours)

This figure only shows the changes in the client sending rate over a short period (16 hours). The rate starts at a very small value, smoothly increases to a maximum value, and then decreases smoothly. Finally, the sending rate returns to the initial value. The whole process of increase and decrease can be considered as one loop and is repeated over and over again. We let the system run for over five days to generate the training data. As a result, there are nearly 2400 timestamps in the data set.

4.1.2 Features Within Data Set

The network data of different resources are collected by the Ceilometer service through meter instances. One meter provides one feature for our dataset. Although there are different types of meters (features), shown in Figure 4.2, some of them do not contribute to the final results. For example, disk.device.read.bytes.rate always remains 0 and disk.allocation remains constant. As both of these types of data do not change over time, they do not reflect real-time network conditions. We also removed all the cumulative data types from the data set, as it is difficult to detect real-time network changes when using cumulative data. Although we could derive the differences between cumulative values and act on these differences, the results serve a similar function as the gauge values. For example, disk.device.write.bytes could be used, and the difference between cumulative values (assuming the sampling period is 1 second) would give the byte rate, which is the same as the disk.device.write.bytes.rate gauge.
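The relationship between a cumulative counter and its gauge counterpart can be shown in a short sketch. This is our own illustration with made-up sample values, not project code; the 180 s period matches the project's 3-minute sampling:

```python
def rates_from_cumulative(samples, period_s):
    """Convert a cumulative byte counter into per-second rates between samples."""
    return [(b - a) / period_s for a, b in zip(samples, samples[1:])]

# Cumulative disk.device.write.bytes samples taken every 180 s (3 minutes):
cumulative = [0, 36_000, 90_000, 90_000]
print(rates_from_cumulative(cumulative, 180))  # [200.0, 300.0, 0.0]
```

The derived per-period rates carry the same information as the corresponding `.rate` gauge, which is why the cumulative features were dropped.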

Figure 4.2: Meters for Ceilometer

After feature selection, we decided to keep six meters (features) in our data set. We divided the training dataset into two parts: the X input data set and the Y output data set. The structure of the training data set is shown in Figure 4.3. X is the data set we input to our machine learning model, and Y is the number of web servers required by the system, i.e., the output of our machine learning model. For the training step, Y is generated by the default auto-scaling policy provided by OpenStack.


Figure 4.3: Features in Training Data Set

4.1.3 Data Pre-processing

After we have collected all the data we need, we pre-process it to make it fit our training model. This pre-processing can be divided into two steps.

In the first step, we compute a statistic (such as the average or sum) over all instances within a sampling period (3 minutes) as the sampling value for that period. The reason for this operation is two-fold: (1) The Heat-based default auto-scaling mechanism only considers one network feature (average CPU utilization); however, we intend to create a simulation model based on more of the network features, specifically one aggregated value per feature within the same sampling period. The statistic chosen depends on the type of network feature being processed. We consider the average the most important statistic because we want to mimic what the Heat orchestration service does with these data: the average is the most frequently used statistic for the CPU utilization of a single instance in the Heat service, while for the network bytes incoming and outgoing rates we use the sum instead of the average. (2) At each timestamp, we should have only one data value for each feature as input to our model.

Figures 4.4 and 4.5 show an example of pre-processing the CPU utilization data by using the average. We can see from the first figure that there are four instances during the sampling period (i.e., there are four web servers in the system during this period). We compute the average cpu_util value of these four web servers and use this average to represent the cpu_util value of this sampling period. The dataset after pre-processing is shown in Figure 4.5. We perform this operation for all six features within the X input data set.
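This first step can be sketched as follows, assuming each raw sample is a (timestamp, instance, value) record; the helper name and sample values are ours, not from the project:

```python
from collections import defaultdict

def aggregate_per_period(records, statistic="avg"):
    """Collapse per-instance samples into one value per sampling timestamp."""
    grouped = defaultdict(list)
    for timestamp, _instance, value in records:
        grouped[timestamp].append(value)
    agg = (lambda v: sum(v) / len(v)) if statistic == "avg" else sum
    return {t: agg(v) for t, v in sorted(grouped.items())}

# Four web servers reporting cpu_util in the same 3-minute period:
samples = [("t0", "vm1", 20.0), ("t0", "vm2", 40.0),
           ("t0", "vm3", 30.0), ("t0", "vm4", 10.0)]
print(aggregate_per_period(samples))  # {'t0': 25.0}
```

With `statistic="sum"` the same helper covers the network incoming/outgoing byte rates, which the text aggregates by sum rather than average.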

In the second step, we normalize the data to minimize the variance among the data. There are several widely used normalization functions for data processing:


Figure 4.4: Data Set Before Pre-processing

Figure 4.5: Data Set After Pre-processing

• Rescaling (min-max normalization):

  x′ = (x − min(x)) / (max(x) − min(x))

• Mean normalization:

  x′ = (x − average(x)) / (max(x) − min(x))

• Standardization:

  x′ = (x − x̄) / σ
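The three functions can be written directly from the formulas above. A straightforward sketch with invented sample data:

```python
def min_max(xs):
    """Rescaling: map values into [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def mean_norm(xs):
    """Mean normalization: center on the mean, divide by the range."""
    lo, hi, mean = min(xs), max(xs), sum(xs) / len(xs)
    return [(x - mean) / (hi - lo) for x in xs]

def standardize(xs):
    """Standardization: zero mean, unit (population) standard deviation."""
    mean = sum(xs) / len(xs)
    sigma = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - mean) / sigma for x in xs]

print(min_max([0.0, 5.0, 10.0]))  # [0.0, 0.5, 1.0]
```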

We also used KB/s instead of B/s in this project. The reason is that the values collected from the network vary between features from 1 up to thousands or even hundreds of thousands. For example, the value of disk.device.write.requests.rate is less than 5, while the value of network.incoming.bytes.rate usually varies between 10^2 and 10^5. This makes some of the coefficients quite small (the smallest is around 10^-6), which means more bits are needed to represent them in a binary format (around 20 bits). If we only have a fixed number of bits to represent a value, we may lose accuracy for these near-zero values. For example, if we only had 16 bits to represent a value, most of the coefficients would be set to 0. Since this thesis only deals with software-based network control and all the data is in a decimal format, this problem does not occur in this project. However, we still use KB/s in case of future hardware development and data transmission.

4.1.4 Database Operations

A MongoDB-based database called ceilometer, located in the controller, is used together with the Ceilometer service to store information. All the network resource data collected by the Ceilometer meters is saved in a collection called meter within this database. The IP address (192.168.0.103) and password of the database are defined in the Ceilometer configuration file. We can connect to this database through an SSH tunnel by using this information.

After the tunnel has been established, we can select the data we need from this database with a MongoDB shell script. For example, we can retrieve cpu_util by using the commands shown below:

db.meter.find({counter_name: "cpu_util",
               project_id: "05825ebfbc60439b836886199af593ea"})
       .sort({timestamp: -1})

In the above, project_id is used to select data for a particular tenant. "05825ebfbc60439b836886199af593ea" is the ID of the autoManage tenant; thus the CPU utilization data only comes from the web servers.

A MongoDB shell script can also be used to process the data. We retrieve the average value for different features during each sampling period using the commands shown below. The value of counter_name should be changed according to the feature type whose average value we want.

var cpu_util = db.meter.find({counter_name: "cpu_util",
                              project_id: "05825ebfbc60439b836886199af593ea"})
                       .sort({timestamp: -1})
cpu_util.forEach(function(doc) { db.temp_data.insert(doc) })
db.getCollection("temp_data").aggregate([
    {
        $project: {new_time_stamp: {$substr: ["$timestamp", 0, 15]},
                   counter_name: 1, user_id: 1, resource_id: 1,
                   timestamp: 1, counter_volume: 1}
    },
    {
        $match: {new_time_stamp: {$gt: "2018-xx-xx"}}
    },
    {
        $group: {_id: "$new_time_stamp", count: {$sum: 1},
                 avg: {$avg: "$counter_volume"}}
    },
    {
        $out: "autoScaling"
    }
])

4.1.5 How Resource Data Changes Over Time

After collecting and pre-processing the data, we analyze how network resource utilization changes over time for different HTTP sending rates. Based on this, we decide which machine learning model should be used for the new auto-scaling mechanism. Figures 4.6 to 4.11 show how the patterns of different network resources change for the SMOOTH-load pattern.

Figure 4.6: Change Pattern for cpu_util

Figure 4.7: Change Pattern for Network Incoming Rate

Figure 4.8: Change Pattern for Network Outgoing Rate

Figure 4.9: Change Pattern for Memory Usage

Figure 4.10: Change Pattern for Device Write Rate

Figure 4.11: Change Pattern for Number of Web Servers

Disk.device.write.bytes.rate and disk.device.write.requests.rate are two similar features, so we only plot one of them. As we can see from the figures, cpu_util, network.incoming.bytes.rate, network.outgoing.bytes.rate, and disk.device.write.bytes.rate all follow a pattern similar to the client sending rate. However, memory usage shows a different pattern of change, as it remains at a high level even when clients have stopped sending requests. This occurs because the operating system usually takes some time to release unoccupied memory. This property (of delayed memory release) is useful for linear regression and is described in detail in Section 4.2.2.

4.2 Offline Model Training

Two machine learning models are considered for offline training: a neural network and linear regression. The reasons for choosing these two models lie in the following two aspects:

1. These two machine learning models are well developed and offer powerful functionality. Neural networks are widely used in classification problems, while linear regression deals with problems where there is a linear relationship between parameters. We want to compare how these two types of machine learning models (a classification model versus a regression model) work for the auto-scaling problem.

Although the number of web servers is an integer between the minimum and maximum number of servers in the system, the change patterns of the network resource data shown in Section 4.1.5 suggest a somewhat linear relationship between the network resource data and the number of servers. However, auto-scaling cannot be considered a simple classification problem, as the number of web servers in the pool may scale to an unknown value as long as enough resources remain in the server pool. For example, if we set the maximum number of servers in the pool to 5 during the training step, a classification method has five possible output classes, while linear regression learns a linear relationship between the input network data and the output. What if we change the maximum number of servers to 10 for online testing? The classification model can only provide five possible outputs unless we re-train it, while linear regression can easily scale to 10 depending on network traffic. Methods such as rounding up and rounding down can be used to make the output an integer according to the system's requirements. For example, if the system is resource sensitive, the round-down method would be chosen. Although the round-down method leads to more failures to meet SLOs, for systems with moderate SLOs this is not a big problem. Several tests should be conducted to choose the approximation method that best balances the different requirements.

2. Both machine learning models are lightweight and have a simple structure, which means they will not consume too many computing resources. Although some other machine learning or deep learning models offer better performance, they are too complex and may introduce a lot of delay.


4.2.1 Neural Network Structure

Figure 4.12 shows a very simple neural network structure with one input layer, one hidden layer, and one output layer. The input data set (x1, x2, ..., xn) of this model is the X input data set introduced in Section 4.1.2, and Y is the output of the model, representing the number of web servers required in this system. Each neuron in the output layer represents one possible output of the model, and the value of this neuron represents the probability of this output. For example, if the value for output 1 is 0.8, the system has an 80% chance of needing only one running web server. We take the maximum value among all the output neurons as the output (Y) of the system. The output Y can be used to evaluate the system or to generate a control signal for auto-scaling of the web servers.

Figure 4.12: Neural Network Structure

The number of neurons within the input layer and the output layer is determined by the number of features within the X input dataset and the maximum number of web servers in the system, respectively. In the data collection step, we set the maximum number of web servers in the system to 5. After training, the weights between layers determine the output of our online test engine and generate the control signal.
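The shape of this network (six inputs for the six features, one hidden layer, five output neurons for at most five servers) can be sketched as a NumPy forward pass. This illustrates the structure only: the sigmoid/softmax activations are our assumptions, and the random weights are placeholders rather than the parameters the project actually trained:

```python
import numpy as np

def forward(x, w1, b1, w2, b2):
    """One forward pass: sigmoid hidden layer, softmax output layer."""
    h = 1.0 / (1.0 + np.exp(-(x @ w1 + b1)))   # hidden activations
    logits = h @ w2 + b2
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                        # probabilities over 5 outputs
    return probs

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 6, 4, 5                 # sizes chosen in Section 4.2.3
w1, b1 = rng.normal(size=(n_in, n_hidden)), np.zeros(n_hidden)
w2, b2 = rng.normal(size=(n_hidden, n_out)), np.zeros(n_out)

probs = forward(rng.normal(size=n_in), w1, b1, w2, b2)
servers = int(np.argmax(probs)) + 1             # output neuron k means "k servers"
```

Taking the argmax of the output probabilities implements the "maximum value among all the output neurons" rule described above.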


4.2.2 Hidden Layer Structure

Since the number of neurons in the input layer and output layer is determined as per the previous subsection, we need to carefully consider how many neurons we need in the hidden layer for better performance. Testing of different hidden layer structures was conducted along with parameter tuning, and the results are shown in the next subsection.

4.2.3 Parameter Tuning

Before the training process, several parameters need to be determined, such as the batch size and the number of training epochs. The batch size is the number of samples that are propagated through the neural network at a time. One training epoch is one forward pass and one backward pass of all the training examples. These parameters can affect the model's accuracy, hence they should be considered carefully. We used Grid Search [39] for parameter tuning. We ran a grid search once for each possible structure (3, 4, 5, or 6 neurons in the hidden layer), combining it with the testing of different hidden layer structures. The results are shown in Tables 4.1 to 4.4.

Table 4.1: Parameter tuning error rates (3 neurons in the hidden layer); rows are training epochs, columns are batch sizes

Epochs | Batch 1    | Batch 2    | Batch 5    | Batch 10  | Batch 22  | Batch 25
10     | 0.22260274 | 0.17465753 | 0.25684932 | 0.6952055 | 0.9589041 | 0.2979452
20     | 0.91780822 | 0.25684932 | 0.21917808 | 0.9589041 | 0.2602740 | 0.2602740
30     | 0.25000000 | 0.25684932 | 0.25684932 | 0.7226027 | 0.2294521 | 0.9965753
40     | 0.18150685 | 0.17465753 | 0.19863014 | 0.9178082 | 0.9143836 | 0.2568493
50     | 0.17465753 | 0.86301370 | 0.19863014 | 0.9965753 | 0.1986301 | 0.9965753
60     | 0.25684932 | 0.85273973 | 0.25684932 | 0.1815068 | 0.2568493 | 0.3835616
70     | 0.21917808 | 0.67808219 | 0.18150685 | 0.1986301 | 0.5171233 | 0.3527397
80     | 0.21917808 | 0.22945205 | 0.92808219 | 0.1986301 | 0.2979452 | 0.9143836
90     | 0.17465753 | 0.25684932 | 0.25684932 | 0.2568493 | 0.6541096 | 0.2089041
100    | 0.42808219 | 0.17465753 | 0.17808219 | 0.2568493 | 0.1746575 | 0.1986301
150    | 0.19863014 | 0.29109589 | 0.27397260 | 0.1986301 | 0.3801370 | 0.2910959
200    | 0.17465753 | 0.25684932 | 0.25342466 | 0.9178082 | 1.0000000 | 0.9965753

As we can see from these tables, we achieve the smallest error rate with many different configurations. The higher the number of epochs, the more training time and computing resources are required. Therefore, we want to choose a parameter combination with the smallest error and the smallest number of epochs. For this reason, we use a 2-layer neural network with six inputs, five outputs, and four neurons in the hidden layer. For the epochs and batch size, we set the number of epochs to 20 and the batch size to 1 for model training, thus


Table 4.2: Parameter tuning error rates (4 neurons in the hidden layer); rows are training epochs, columns are batch sizes

Epochs | Batch 1    | Batch 2    | Batch 5    | Batch 10  | Batch 22  | Batch 25
10     | 0.23972603 | 0.18150685 | 0.19863014 | 1.0000000 | 0.5958904 | 0.8972603
20     | 0.99315068 | 0.18150685 | 0.26027397 | 0.9212329 | 0.2568493 | 0.2945205
30     | 0.21917808 | 0.17465753 | 0.21917808 | 0.2945205 | 0.2568493 | 0.2020548
40     | 0.18150685 | 0.18493151 | 0.35273973 | 0.2568493 | 0.4315068 | 0.2568493
50     | 0.18150685 | 0.18150685 | 0.98630137 | 0.1746575 | 0.1986301 | 0.2568493
60     | 0.25684932 | 0.21917808 | 0.32191781 | 0.1986301 | 1.0000000 | 0.2568493
70     | 0.24657534 | 0.17465753 | 0.17465753 | 0.8938356 | 0.2568493 | 0.9143836
80     | 0.92123288 | 0.25684932 | 0.91780822 | 0.9143836 | 0.1746575 | 0.1746575
90     | 0.26027397 | 0.29109589 | 0.25684932 | 0.1746575 | 0.3561644 | 0.2568493
100    | 0.19863014 | 0.65068493 | 0.17465753 | 0.3356164 | 0.9965753 | 0.2568493
150    | 0.18493151 | 0.26369863 | 0.18493151 | 0.2397260 | 0.2397260 | 0.2397260
200    | 0.26369863 | 0.28424658 | 0.26712329 | 0.3219178 | 0.3219178 | 0.2397260

Table 4.3: Parameter tuning error rates (5 neurons in the hidden layer); rows are training epochs, columns are batch sizes

Epochs | Batch 1  | Batch 2  | Batch 5  | Batch 10 | Batch 22 | Batch 25
10     | 0.958904 | 0.256849 | 0.256849 | 0.914384 | 0.174658 | 0.657534
20     | 0.178082 | 0.325342 | 0.198630 | 0.174658 | 0.825342 | 0.910959
30     | 0.198630 | 0.219178 | 0.958904 | 0.941781 | 0.294521 | 0.914384
40     | 0.219178 | 0.260274 | 0.256849 | 0.202055 | 0.917808 | 0.914384
50     | 0.955479 | 0.198630 | 0.284247 | 0.301370 | 0.256849 | 0.914384
60     | 0.311644 | 0.229452 | 0.256849 | 0.198630 | 0.958904 | 0.260274
70     | 0.256849 | 0.256849 | 0.311644 | 0.342466 | 0.267123 | 0.260274
80     | 0.921233 | 0.219178 | 0.174658 | 0.174658 | 0.297945 | 0.325342
90     | 0.174658 | 0.256849 | 0.767123 | 0.667808 | 0.256849 | 0.845890
100    | 0.174658 | 0.236301 | 0.198630 | 0.243151 | 0.332192 | 0.195205
150    | 0.205479 | 0.198630 | 0.198630 | 0.174658 | 0.181507 | 0.284247
200    | 0.198630 | 0.294521 | 0.633562 | 0.914384 | 0.178082 | 0.260274


Table 4.4: Parameter Tuning (6 neurons in the hidden layer)

Error rate per number of epochs (rows) and batch size (columns):

Epochs  Batch=1   Batch=2   Batch=5   Batch=10  Batch=22  Batch=25
10      0.256849  0.256849  0.246575  0.914384  0.893836  0.256849
20      0.174658  0.256849  0.250000  0.174658  0.345890  0.256849
30      0.181507  0.198630  0.476027  0.256849  0.993151  0.750000
40      0.198630  0.178082  0.198630  0.506849  0.393836  0.938356
50      0.198630  0.260274  0.219178  0.260274  0.743151  0.198630
60      0.212329  0.222603  0.198630  0.352740  0.256849  0.345890
70      0.219178  0.181507  0.883562  0.202055  0.914384  0.691781
80      0.174658  0.256849  0.198630  0.363014  0.260274  0.212329
90      0.256849  0.178082  0.198630  1.000000  0.198630  1.000000
100     0.253425  0.229452  1.000000  0.256849  0.198630  0.178082
150     0.174658  0.256849  0.250000  0.174658  0.674658  0.321918
200     0.219178  0.174658  0.924658  0.328767  0.250000  0.256849

the smallest error rate is 0.174658.
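For illustration, the chosen 6-4-5 architecture can be sketched as a plain forward pass in Python. This is only a sketch: the weights below are random placeholders rather than the trained values, and the thesis model was trained with a machine learning framework, not with this hand-rolled code.

```python
import math
import random

def dense(x, weights, biases):
    # Fully connected layer: out_j = sum_i x_i * weights[i][j] + biases[j]
    return [sum(xi * wij for xi, wij in zip(x, col)) + bj
            for col, bj in zip(zip(*weights), biases)]

def relu(v):
    return [max(0.0, a) for a in v]

def softmax(v):
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

random.seed(0)
w1 = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(6)]  # 6 inputs -> 4 hidden
w2 = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(4)]  # 4 hidden -> 5 outputs
b1, b2 = [0.0] * 4, [0.0] * 5

sample = [0.5, 0.1, 0.2, 0.3, 0.0, 0.4]       # one 6-feature input sample
hidden = relu(dense(sample, w1, b1))          # hidden layer (4 neurons)
output = softmax(dense(hidden, w2, b2))       # 5-way class probabilities
```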

However, as we can see from the results, the Neural Network based machine learning model is quite unstable and generates many poor results, which suggests this model might be unsuitable for online server scaling. This conclusion is further supported by the online testing described in Section 5.5. During the research, we also tried other methods, such as feature selection, to achieve better results. However, the results did not improve, as each of these alternatives also generated many unstable states. Thus, we will not go deeply into these tests in this thesis.

4.2.4 Linear Regression

The linear regression model that we have used is based on the equation:

Y = b0 + b1x1 + b2x2 + ... + bnxn

The data set X (x1, x2, ..., xn) is the same as the X input data set introduced in Section 4.1.2. Y is the number of web servers at the next timestamp, i.e., how the system will react to the current network data. We separate the data set into a training set (70%) and a testing set (30%). After model training, we obtain a set of coefficients (b0, b1, ..., bn) for this model (b0 is the bias).
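The training step can be sketched with ordinary least squares. The synthetic data below is a placeholder (the thesis's actual feature values come from the Ceilometer meters), and only the 70% training split is shown:

```python
import numpy as np

# Placeholder data: 100 samples with 5 features and an exactly linear target,
# so least squares recovers the generating coefficients.
rng = np.random.default_rng(0)
X = rng.random((100, 5))
true_b0, true_b = 1.0, np.array([0.5, -0.2, 0.1, 0.3, -0.1])
y = true_b0 + X @ true_b

split = int(0.7 * len(X))                  # 70% training set
X_train, y_train = X[:split], y[:split]

# Prepend a column of ones so the bias b0 is fitted jointly with b1..b5.
A = np.hstack([np.ones((split, 1)), X_train])
coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
b0, b = coef[0], coef[1:]
```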

The model's estimation error is computed over the test set to evaluate the model. We define the estimation error as the Mean Absolute Error (MAE):

MAE = (1/n) * sum_{i=1..n} |yi - ŷi|

where yi is the real output Y in the test data set and ŷi is the estimated output generated by the model, which we convert to an integer using some method (such as rounding). As we stated in Section 4.1.5, memory usage has a different pattern than the other five features, hence using this feature results in a higher


error rate. Therefore, we only consider five features in our dataset (Table 4.5).

Table 4.5: Data Set (X) for Linear Regression Model

cpu_util                         CPU utilization
Network.incoming.bytes.rate      Rate of incoming bytes (KB/s)
Network.outgoing.bytes.rate      Rate of outgoing bytes (KB/s)
Disk.device.write.bytes.rate     Rate of writes to disk (KB/s)
Disk.device.write.requests.rate  Rate of write requests (requests/s)

After training, the coefficients of the linear regression model are shown in Table 4.6. The computed MAE is 0.1519.

Table 4.6: Coefficients for Linear Regression Model

Coefficient  Feature                           Value
b0           Bias                               0.5125438937945870
b1           cpu_util                          -0.0089242680215350
b2           Network.incoming.bytes.rate        0.0099030963409060
b3           Network.outgoing.bytes.rate       -0.0004226255823199
b4           Disk.device.write.bytes.rate       0.0005697258887978
b5           Disk.device.write.requests.rate   -0.0317249167219835

As we can see from the table, some of the feature coefficients are very small. Since Disk.device.write.requests.rate is itself a very small value, multiplying it by an even smaller coefficient yields a near-zero term that makes little contribution to the result. As illustrated before, this project only considers software-based network control, and data is represented in a decimal format transmitted through APIs, which means no accuracy is lost during the whole process. Thus, all the digits are used during computation. However, using all digits increases the computational complexity, especially for hardware design. Considering this fact, data approximation is worth investigating, and we leave this work (such as testing how many bits should be kept) for the future.
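To make the decision rule concrete, here is a small sketch that applies the Table 4.6 coefficients to one feature sample. The `predict_servers` helper and the floor of one server are our own illustration choices, not code from the thesis:

```python
B0 = 0.5125438937945870
COEFFICIENTS = {
    "cpu_util":                        -0.0089242680215350,
    "Network.incoming.bytes.rate":      0.0099030963409060,
    "Network.outgoing.bytes.rate":     -0.0004226255823199,
    "Disk.device.write.bytes.rate":     0.0005697258887978,
    "Disk.device.write.requests.rate": -0.0317249167219835,
}

def predict_servers(features):
    # Y = b0 + b1*x1 + ... + b5*x5, rounded to an integer server count;
    # keeping at least one server is our assumption, not stated in the text.
    y = B0 + sum(COEFFICIENTS[name] * x for name, x in features.items())
    return max(1, round(y))

idle = {name: 0.0 for name in COEFFICIENTS}
```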


Chapter 5

Online Testing and Result Analysis

This chapter provides the results of online testing along with detailed result analysis. Section 5.1 gives a preview of the network performance if we only have one web server available in the network. Section 5.2 investigates how the system will behave if the network traffic is extremely high. Section 5.3 introduces two client request patterns with light traffic, which will be used in the subsequent testing. Section 5.4 gives an introduction to the Online Control Engine designed for online testing. Then, Section 5.5 discusses which machine learning model we choose for online testing. Next, Section 5.6 describes testing results for the scaling policy in a heavy traffic scenario, while Section 5.7 shows the results in a light traffic scenario. Results are analyzed at each step of testing, and model comparisons are made in Sections 5.6 and 5.7.

5.1 Single Web Server

Before online testing, we give a preview of what the network traffic looks like and how the system behaves if we only have one web server working in the network, especially when the network traffic is heavy. To test the performance of a single web server, we generate a load pattern whose request rate changes from 2,000 req/s to 5,000 req/s. Figures 5.1 to 5.4 show how different types of network data change for this pattern of load.

As we can see from the figures, at timestamp 15:32, when the HTTP request rate is 3,600 requests per second, the system enters an overloaded state. Even if we keep increasing the request rate on the client side, the request rate at the Load Balancer Backend, which connects to the web server, does not increase. The system is in a state where the response time from the web server becomes extremely large and a lot of requests are dropped, which results in a large number of error responses. This situation is mainly due to the web server being overloaded and should be avoided in the real world by applying auto-scaling.


Figure 5.1: Request Rate for a Single Webserver

Figure 5.2: Response Time for a Single Webserver

Figure 5.3: Accumulated Error Responses for a Single Webserver

Figure 5.4: CPU Utilization for a Single Webserver

Another finding from these results concerns CPU utilization. From timestamp 15:25 to 15:32 in Figures 5.2 and 5.4, even though the CPU utilization is near 100%, the response time remains at a very low level. That is to say, a high CPU utilization does not necessarily mean the system is overloaded. As Heat orchestration usually realizes auto-scaling based on a single resource value, such as average CPU utilization, resources may be wasted if we want to achieve a given Service Level Agreement (SLA) based on response time. The machine learning based auto-scaling model is likely to have the same problem, as it only analyzes the network resource data on the server side. Considering this fact, a response time based auto-scaling mechanism will be investigated and compared with the other two auto-scaling methods for heavy traffic.

To summarize, the performance testing for a single web server gives us an overall view of the capacity of one such web server to handle incoming requests. In this system, the testing confirms that each such web server has the capacity to handle approximately 3,600 requests per second.


5.2 Heavy Network Traffic with Unlimited Web Servers

In this section, we investigate how the system behaves if the network traffic is extremely high. Unlimited web servers means we can add as many web servers as we want. One question will be answered: can we achieve better performance if we keep adding web servers to realize parallel computation for an extremely high request rate?

The answer has been given in several papers. In [40], the relationship between throughput and the rate of Service Level Objective (SLO) violations is given. As the offered load keeps increasing, a cluster's saturation point will be reached. Beyond the saturation point, the relationship between the SLO violation rate and throughput becomes non-linear, which means that with a small increase in request rate the SLO violation rate increases a lot, and queues start to appear and grow. Christina Delimitrou and Christos Kozyrakis [41] reach a similar conclusion: when the load reaches a saturation point, the system latency increases exponentially beyond that point. A system's bottleneck may lie in many aspects, and this is worth investigating, since it is very important for the system developer to understand a system's limitations and avoid crashes. Considering this non-linear behavior, Heat orchestration will be used to realize auto-scaling for testing rather than linear regression. Figures 5.5 to 5.10 show these testing results.

As we can see from the figures, at timestamp 22:52, when the real HTTP request rate is 8,100 requests per second, the system enters a crash state. The result looks similar to the crash state for a single web server, where the response time is extremely high and the number of error responses becomes very large when the client request rate continues to increase. Unlike the single web server scenario, in Figure 5.9 the average CPU utilization remains at a lower level after the crash, which means the web servers are not overloaded in this situation. Given that we only have one load balancer in this system and its average CPU utilization remains at a very high level, nearly 100% (as shown in Figure 5.10), the bottleneck in this scenario lies in the load balancer. That is to say, one load balancer in our system has the capacity to handle approximately 8,000 requests per second. In Figure 5.8, we find that after the crash point, the load balancer keeps creating new sessions and trying to reconnect to clients and web servers, which further increases the burden on the load balancer. The figure captioned "Number of Web Servers" shows the auto-scaling results using Heat orchestration (the scale-up threshold for cpu_util is 60% and the scale-down threshold is 30%). The maximum number of web servers is 9 and the average CPU utilization of all the instances is 60.773%. The average and maximum response times of the web servers before the crash point are 9.8919 ms and 21 ms, respectively.

It is very important to understand the system's crash point, as it makes no sense to add more servers after that point. This means that other methods, such as adding more load balancers or improving each load balancer's performance by using larger and more powerful virtual machines, should be considered. As described in [41], services with moderate latency requirements, such as a web


Figure 5.5: Request Rate for Unlimited Webservers

Figure 5.6: Response Time for Unlimited Webservers

Figure 5.7: Accumulated Error Responses for Unlimited Webservers

Figure 5.8: Number of Sessions for Unlimited Webservers

Figure 5.9: Average CPU Utilization vs Request Rate for Unlimited Webservers

Figure 5.10: Average CPU Utilization of Load Balancer


Figure: Number of Web Servers

server, should balance single-thread performance (with more powerful VMs) against multi-process performance (request-level parallelism).

5.3 Client Request Pattern with Light Traffic

Two client request patterns (a SMOOTH-load pattern and a SPIKEY-load pattern) with light network traffic are used for online testing to mimic users' behavior in the real world. These two patterns were chosen because users' behavior usually follows some simple rules most of the time, but with some unexpected actions from time to time. These unexpected actions may result in a sudden increase or decrease in network traffic during a short period. We believe that testing with these two patterns covers most of the scenarios in our daily lives.

5.3.1 SMOOTH-load Pattern

In this pattern, the HTTP request rate starts at a small initial rate and then increases smoothly over time until it reaches a maximum value. After that, it decreases smoothly until the request rate has returned to the initial rate. The request rate increases or decreases by a rate (within the range of 0 to 40) in each loop. We can consider this SMOOTH-load pattern as one loop of the client request pattern introduced in Section 4.1.1, but lasting for only 8 hours. Figure 5.11 shows this pattern.
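A sketch of how such a load pattern could be generated follows; only the 0-40 step range comes from the text, while the initial and peak rates are our own placeholders:

```python
import random

def smooth_load(initial=1000, peak=5000, max_step=40, seed=0):
    # Ramp the request rate smoothly from `initial` up to `peak` and back,
    # changing by a random amount in [0, max_step] on every loop iteration.
    rng = random.Random(seed)
    rate, rates, rising = initial, [], True
    while True:
        rates.append(rate)
        step = rng.randint(0, max_step)
        if rising:
            rate = min(peak, rate + step)
            rising = rate < peak
        else:
            rate = max(initial, rate - step)
            if rate == initial:
                rates.append(rate)
                break
    return rates
```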

5.3.2 SPIKEY-load Pattern

This pattern models a flash event. During the flash event, the HTTP request rate suddenly surges to a very high value and then drops back to the original value in a short time. The request rate increases or decreases by a rate (within the range of 0 to 150) in each loop. As we can see in Figure 5.12, there was


Figure 5.11: SMOOTH-load Pattern for testing

a flash event around timestamp 50. During this period, the client request ratesuddenly increased to 4,500 req/s and then dropped back to 2,000 req/s. Thewhole test lasted for 3 hours.
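The flash event can be sketched in the same style; the 2,000 req/s baseline, the 4,500 req/s spike, and the 0-150 step range follow the description above, while everything else is a placeholder:

```python
import random

def spikey_load(base=2000, spike_peak=4500, max_step=150, steady=50, seed=0):
    # Steady traffic at `base` req/s, then a flash event: the rate surges
    # to `spike_peak` and drops straight back, moving by a random amount
    # in [0, max_step] per loop iteration.
    rng = random.Random(seed)
    rates = [base] * steady
    rate = base
    while rate < spike_peak:                      # surge phase
        rate = min(spike_peak, rate + rng.randint(0, max_step))
        rates.append(rate)
    while rate > base:                            # recovery phase
        rate = max(base, rate - rng.randint(0, max_step))
        rates.append(rate)
    return rates
```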

Figure 5.12: SPIKEY-load Pattern for testing


5.4 Online Control Engine

In this section, we describe how we created the online control engine. This online control engine utilizes the Nova service to create or remove server instances. The engine consists of two services:

• The Auto-scaling service scales the number of web servers up or down based on the control signal generated by the Signal Generating service.

• The Signal Generating service analyzes real-time data about the network resources and generates a control signal.

These two services are written in Python and cooperate with OpenStack services (Nova and Ceilometer) through the os.popen function, using OpenStack command lines.
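This CLI interaction can be sketched as a thin wrapper; the helper name is ours, and `nova list` is one of the commands mentioned in this chapter:

```python
import os

def run_cli(command):
    # Both services shell out to the OpenStack command line via os.popen,
    # as described above, and parse the text printed by the command.
    pipe = os.popen(command)
    output = pipe.read()
    pipe.close()
    return output

# e.g. run_cli("nova list") would return the table of current instances.
```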

5.4.1 Auto-scaling service

This service has a user-defined interface which periodically receives a control signal from the Signal Generating service. Depending on the scaling policy, a control signal may contain different information (how many web servers are needed at the moment, or whether to scale up or do nothing).

For the first type of signal, once the Auto-scaling service receives the signal, it compares the current number of servers in the system with the expected number and makes them consistent. If the current number is smaller than the expected number, then the Nova instance creation service is called to create additional web servers using the command shown below. The mydata.file is used to start the HTTPd web service once the web server has been created. The content of mydata.file is the same as the commands shown in Section 3.2.4.

    openstack server create --flavor m1.small \
        --image CentOS \
        --nic net-id=053c1a03-c4f6-4995-907e-49f435693c3a \
        --security-group default \
        --key-name private \
        --user-data mydata.file webServer

If the current number of servers is larger than the expected number, then the Nova instance delete service is called to remove web servers using the command shown below:

    nova delete <webServer_id>

The webServer_id is obtained with the 'nova list' command.
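The reconciliation step described above can be sketched as a pure function that maps the current and expected state to the commands to run. The create command is abbreviated here, and the function name and list handling are our own illustration:

```python
def scaling_commands(current_ids, expected_count):
    # Compare the running web servers with the expected number from the
    # control signal and return the OpenStack commands that reconcile them.
    commands = []
    if len(current_ids) < expected_count:
        # Abbreviated; the full flag list is in the 'openstack server create'
        # command shown above.
        create = "openstack server create ... --user-data mydata.file webServer"
        commands += [create] * (expected_count - len(current_ids))
    else:
        # Remove the surplus instances via 'nova delete <webServer_id>'.
        for server_id in current_ids[expected_count:]:
            commands.append("nova delete " + server_id)
    return commands
```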

For the second type of signal, only scale-up actions are considered, and this function is only used for testing the heavy traffic pattern, where the client request rate continually increases until the crash point is reached. On receiving a scale-up signal from the Signal Generating service, the system


will launch one more web server. The steps for server creation are the same as for the first web server instance.

5.4.2 Signal Generating Service

This service generates a control signal for the Auto-scaling service depending on the scaling policy. Two policies are considered in this project: a machine learning based auto-scaling policy and a response time sensitive policy.

For the machine learning based auto-scaling policy, the Signal Generating service receives real-time network data from the Ceilometer service using the 'ceilometer statistics' command. This command periodically provides statistical values, such as sum, average, maximum, and minimum, for each meter. The meters used are the same as those used for model training. The period is again 3 minutes (the same as the Ceilometer sampling period). We input this real-time network data into the already trained machine learning model (from the offline training step). Finally, a control signal is generated by this machine learning model and sent to the Auto-scaling service.

For the response time sensitive policy, network data is provided by HAProxy and the frontend clients. We set the acceptable latency for the system to 30 ms [40] on the client side. Once this threshold has been reached, a scale-up signal is generated and sent to the Auto-scaling service.
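A sketch of this policy's signal generation, with the 30 ms threshold from the text (the function and signal names are ours):

```python
RESPONSE_TIME_SLO_MS = 30  # acceptable client-side latency, per [40]

def generate_signal(latency_ms):
    # Response time sensitive policy: once the measured latency reaches
    # the SLO threshold, emit a scale-up signal; otherwise do nothing.
    return "scale_up" if latency_ms >= RESPONSE_TIME_SLO_MS else None
```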

5.5 Model Selection

Initially, we ran some small tests (results shown in Figure 5.13 and Figure 5.14) using a client pattern with a large initial rate and a short testing time (less than one hour). We found that although the neural network could provide an acceptable error rate for offline training with a large data set (where the system had run for five days), it showed unexpected results (the number of servers always remained 1) when tested with a different pattern and short testing times. This means that this neural network is unsuitable for online auto-scaling. In contrast, the training and testing results using linear regression were more consistent and reasonable.

Secondly, we believe that the control engine could be improved by applying online training in the future. There are many online machine learning algorithms available, such as online lasso regression (a linear regression based model), and we believe that these algorithms have great potential to improve the results.

To summarize, because linear regression has more reasonable performance and we anticipate that it will be easy to improve in the future (especially with online training), we choose a linear regression based control model for further online testing.


Figure 5.13: Neural Network Based Auto-scaling

Figure 5.14: Response Time for Neural Network Based Auto-scaling

5.6 Scaling for Heavy Traffic

As we described in Sections 5.1 and 5.2, the system behaves in a non-linear way when approaching the crash point. Two methods, a Heat Orchestration based policy and a Response Time Sensitive policy, were considered to scale the servers before the crash point in a heavy load pattern. The machine learning based auto-scaling was also investigated, and it had bad performance. As we want to investigate how the system scales when network traffic keeps increasing until the crash point, we only consider scaling up. The SLO we used in this project as the acceptable service latency was 30 ms.

5.6.1 Heat Orchestration Based Policy

From Section 5.1 we know that low latency (a small response time) can be achieved even when the CPU utilization of a web server is near 100%, which means cpu_util based auto-scaling is not an effective way to satisfy the stated SLO. Considering this, we set the scale-up threshold for cpu_util to 90% in the Heat orchestration based scaling policy. Figures 5.15 to 5.18 show the auto-scaling results.

As we can see from Figures 5.15 and 5.16, the system scales when the client request rate increases. The maximum and average numbers of web servers in use are 5 and 3, respectively. The average CPU utilization of all the instances is 73.89%.

Figures 5.17 and 5.18 show the response time in total and for a single thread (the data comes from the web server which works from the beginning to the end), respectively. Looking into the detailed data, we find that when the system is lightly loaded with fewer servers (before timestamp 08:47), the total response time drops at the scale-up point. This phenomenon is more apparent for the single thread response time. For heavy traffic with more web servers, adding one server makes very little improvement in the system's performance. Although parallel processing can help to speed up the system, more servers do not necessarily mean lower response time. This occurs because the threads are not


Figure 5.15: Heat Orchestration Based Auto-scaling with Heavy Traffic

Figure 5.16: Average CPU Utilization for Heat Orchestration Based Auto-scaling

Figure 5.17: Total Response Time for Heat Orchestration Based Auto-scaling

Figure 5.18: Single Thread Response Time for Heat Orchestration Based Auto-scaling

totally independent, and they compete for network resources, which may add overhead to the system. The average overall response time is 10.75 ms, and further statistics of the response time are shown in Figure 5.19.

5.6.2 Response Time Sensitive Policy

As discussed earlier, the system performs with low latency until it hits the crash point. After the crash point, the system exhibits extremely high latency. From Section 5.1, we also know that the crash point reflects the maximum processing rate that one server can handle. In order to meet the SLO and to make full use of the web servers, we propose a Response Time Sensitive policy to scale up the servers and to detect the crash point.

The scaling mechanism is based on the online control engine described in Section 5.4. Figures 5.20 to 5.23 show the auto-scaling results. The threshold of response time for scaling is 30 ms, which maintains the SLO. We can see from the results that at timestamp 02:07, the request rate at the backend starts to decrease and the response time starts to increase sharply, which means the crash point is hit. We ignore the data collected after the crash point.

Figure 5.20 compares Heat orchestration based auto-scaling with Response Time Sensitive auto-scaling, and the latter is shown to react more slowly. The


Figure 5.19: Statistics of Response Time for Each Server

maximum and average numbers of web servers in use (before the crash point) are 4 and 2, respectively. In Figure 5.21, we can see that the average CPU utilization of the web servers remains at a very high level. The overall average CPU utilization is 83.27%, which means that most of the time the web servers work at full capacity.

Similar to the previous testing, Figures 5.22 and 5.23 show the response time in total and for a single thread, respectively. For Response Time Sensitive auto-scaling with fewer web servers, the effect of adding one server on response time is much larger. We consider the sudden increase and decrease points in these two figures (before the system's crash point) as local crash points. By adding one more server at such a point, the system returned to the normal state, resulting in a sharp decrease in response time. The average overall response time is 13.26 ms, and more statistics of the response time are shown in Figure 5.24.

To summarize: (1) the average response time for Heat Orchestration auto-scaling is very close to that for Response Time Sensitive auto-scaling. (2) However, Response Time Sensitive auto-scaling uses fewer web servers than Heat did, which means resources can be saved. (3) 4% of response times violated the SLO for Heat based auto-scaling, while for Response Time Sensitive auto-scaling the violation rate is 10.7%. If the system is sensitive to latency, Heat Orchestration auto-scaling would be a better choice. Considering the fact that web services (like web searching) have moderate latency requirements, Response Time Sensitive auto-scaling would bring a lot of benefits. For these web services, the SLO can even be set to 1 s; then the violation rate becomes 0 for this simple system when using Response Time


Figure 5.20: Response Time Sensitive Auto-scaling with Heavy Traffic

Figure 5.21: Average CPU Utilization for Response Time Sensitive Auto-scaling

Figure 5.22: Total Response Time for Response Time Sensitive Auto-scaling

Figure 5.23: Single Thread Response Time for Response Time Sensitive Auto-scaling

Sensitive auto-scaling.

5.6.3 Machine Learning Based Policy

Since the system behaves in a non-linear way when the network traffic is very heavy, it is predictable that machine learning based auto-scaling will perform badly. Figures 5.25 and 5.26 show the testing results. Although machine learning based auto-scaling could scale the backend servers when the request rate kept increasing, it wasted too many resources. The maximum and average numbers of web servers in use are 10 and 5, respectively. However, the average response time is 7.310 ms, which is slightly shorter than for the previous two methods.

The latency decreases slightly at the cost of more idle web servers, which makes little sense in a cloud computing based service. Cloud computing follows a pay-as-you-go model, where customers save money by using fewer servers. Moreover, a web service is not a latency-sensitive service, which means the latency is acceptable as long as it does not violate the SLO.


Figure 5.24: Statistics of Response Time for Each Server

Figure 5.25: Machine Learning Based Auto-scaling with Heavy Traffic

Figure 5.26: Response Time for Machine Learning Based Auto-scaling

5.7 Scaling for Light Traffic

In this scenario, both scaling up and scaling down are considered. We implemented two auto-scaling mechanisms, Heat orchestration based and online control engine based, and tested them with the two client request patterns we introduced earlier. We compare the system reactions these two mechanisms produce. In all tests, the sampling period is 3 minutes. Based on the testing described in Section 5.6, five web servers are sufficient for this system. Thus we set the maximum number of web servers to 5 in order to achieve better performance.


5.7.1 Result on SMOOTH-load Pattern

As we can see in Figure 5.27, when the clients' request rate changes smoothly, the online control engine and Heat orchestration have similar performance.

Figure 5.27: The results for SMOOTH-load Pattern

Figures 5.28 and 5.29 show the request rate at the load balancer and the response latency for both of these two auto-scaling mechanisms. The two mechanisms have similar server behavior, with an average response time of 2.07 ms for the Heat orchestration based engine and 2.13 ms for the machine learning based engine. With similar latency, we can also see from Figure 5.27 that the machine learning based online control engine reacts more slowly when it should increase and faster when it should decrease the number of web servers, thus potentially saving resources. Although the average numbers of servers in use for the online control engine and for Heat orchestration are the same after rounding (3.13 and 3.25, respectively), we consider that the machine learning based online control engine outperformed Heat orchestration in resource savings.

From these results, we also see that during the scaling down process, the request rate at the Load Balancer Backend (directly connected to the web servers) has a sudden drop at each scaling point. Through a deep analysis of the network traffic, we found that at each scale-down point the Load Balancer dropped a lot of incoming packets, which causes a large amount of error requests until the scaling process has finished. The reason for this situation is that at the scale-down point, the instance which received the shutdown signal loses its connection to the load balancer immediately, which leads to the load balancer mishandling the requests. This is not good for a system, as clients would be unsatisfied if their requests were dropped. Therefore, frequent scaling up or down should be avoided.


Figure 5.28: Response Latency for Heat Orchestration Based Auto-scaling

Figure 5.29: Response Time for Machine Learning Based Auto-scaling

5.7.2 Scaling Policy Based on Deep Network Analysis

The previous sections gave a deep analysis of the system and network used in this project. In Section 5.1, we found that one web server can support at most 3,600 requests per second. In Section 5.2, we found that one Load Balancer can support at most 8,100 requests per second. In Section 5.6, we found that even for heavy traffic, 4 web servers were sufficient to achieve acceptable performance with efficient resource utilization and low latency.

Considering the fact that the maximum request rate is around 5,000 requests per second for the SMOOTH-load pattern, we could use fewer web servers than expected. Figure 5.20 in Section 5.6.2 provides a better solution. As we can see from that figure, 2 web servers are sufficient to support a client request rate of less than 6,000 requests per second.

Based on these findings, we implemented a scaling policy with a maximum of 2 web servers in the system and set 3,500 requests/s as the threshold to scale. Thus, the system scales up to 2 web servers when the client request rate is greater than 3,500 requests/s and scales down to 1 web server when it is less. The testing results are shown in Figures 5.30 and 5.31.
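This threshold policy is simple enough to state as a one-line rule; the 3,500 requests/s threshold and the 1-2 server range come from the text, while the names are ours:

```python
SCALE_THRESHOLD_REQ_S = 3500  # requests/s, from the network analysis above

def servers_needed(client_request_rate):
    # Deep-network-analysis policy: 2 web servers above the threshold,
    # 1 web server at or below it (the system never exceeds 2 servers).
    return 2 if client_request_rate > SCALE_THRESHOLD_REQ_S else 1
```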

As we can see from these results, the maximum and average number of web servers in use are 2 and 1, respectively. The average response time is 4.2506 ms. The performance is quite good with only two servers, so this policy could save a lot of resources.

However, this scaling policy is not a general method: it is based on a deep network analysis, which means developers need a full view of the network in advance. It is not easy for network developers to fully understand the performance of a telecom cloud, since real-world network structures have become more and more complex. Although this scaling policy provides great performance, it is not a general way to solve auto-scaling problems for all kinds of networks. On the other hand, since it is difficult to collect data on the client side (for example, client-side latency), most auto-scaling


Figure 5.30: Scaling Policy Based on Deep Network Analysis

Figure 5.31: The results for SMOOTH-load Pattern

algorithms are designed to be triggered using backend data.

5.7.3 Result on SPIKEY-load Pattern

Unlike the SMOOTH-load Pattern, where the system has a high requirement for efficient resource utilization, the SPIKEY-load Pattern requires the system to react quickly. Since the system would easily become overloaded at the flash point, with a sudden increase in incoming traffic, a fast reaction ensures that the system remains in a stable state. The flash point here is different from the crash point of the system. At the flash point, the system


should remain stable as long as we can scale up servers quickly.

The test of the SPIKEY-load Pattern shows a completely different result compared with the SMOOTH-load Pattern. In Figure 5.32, starting at around 51 minutes the client request rate suddenly increases to 4,500 requests/s. Although the detailed data show that the machine learning based online control engine reacted (at 48 minutes) a little ahead of the change in the request rate (the exact sampling point is at 50.7833 minutes), this is not due to the implementation of the online control engine; rather, the data were collected in different ways. We collected the client request rate at the client side in a loop and recorded a timestamp for each loop; the sampling period differs between loops, depending on the request rate and the total number of requests the client intends to send. The number of servers is collected at the server side via the Nova service, with a sampling period of 3 minutes. Although the clocks are synchronized, we can only say that the flash event happened between 47.7500 minutes (the previous sampling point) and 50.5333 minutes.

Based on this fact, we can consider that the machine learning based online control engine reacted at the time of the change, while the heat orchestration based engine usually has some delay. The reason for this is twofold.

Firstly, during the training step, we use the number of servers at the next timestamp as the predicted number of servers at the current moment (the Y dataset) to train the model; thus the machine learning based engine knows how many web servers are needed at the moment and can change directly to that state. However, because Heat orchestration depends only on CPU utilization, there is always some delay before the system reacts to changes. For example, assume that the client request rate increases sharply at timestamp 1. After this happens, the web servers must deal with more incoming requests, which causes CPU utilization to increase. However, the system needs to wait for another alarm period, as changes in CPU utilization are only seen at the next timestamp (timestamp 2). If at timestamp 2 the CPU utilization exceeds a certain threshold, this raises the alarm and initiates the scale-up function. For this reason, the whole scaling process introduces a delay before the system can react to the change in the client request rate.
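
The label construction described here (using the server count observed at the next timestamp as the target for the current features) can be sketched as follows; this is an illustration of the idea, not the project's actual training code:

```python
def make_training_set(features, server_counts):
    """Pair each feature sample at timestamp t with the number of servers
    observed at timestamp t+1, so the model learns the capacity that will
    be needed rather than the capacity currently deployed."""
    X = features[:-1]       # drop the last sample: it has no future label
    Y = server_counts[1:]   # shift the server counts one timestamp ahead
    return X, Y
```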

Secondly, unlike the heat based auto-scaling engine, which adds or removes only one web server at a time, the online control engine can add or remove several web servers at once; thus it can change the number of web servers in the system directly to the expected number. To see this, we can zoom in on the change pattern between timestamp 45 and timestamp 90 in Figure 5.32 for a more detailed look at how the number of servers changes during the spike. As we can see in Figure 5.33, at timestamp 48 the online control engine recognized the change in the client request rate and directly changed the number of web servers from 3 to 5. However, only at timestamp 57 did the heat orchestration engine realize that there had been an increase in the client request rate, and it changed the number of web servers in two steps, from 3 to 4 and then to 5.
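
The difference between the two engines' stepping behavior can be illustrated with a small hypothetical helper (names and structure are our own, not project code):

```python
def scaling_actions(current: int, target: int):
    """Compare two strategies for moving from `current` to `target` servers:
    Heat-style one-at-a-time steps (one alarm period per step) versus the
    online control engine's single jump to the predicted number."""
    step = 1 if target > current else -1
    heat_style = list(range(current + step, target + step, step))
    engine_style = [target]
    return heat_style, engine_style
```

For the spike above, `scaling_actions(3, 5)` yields `([4, 5], [5])`: Heat needs two alarm periods to reach 5 servers, while the control engine jumps there directly.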


Figure 5.32: The results for SPIKEY-load Pattern

Figure 5.33: Zoomed-in view of the Online Control Engine and Heat Orchestration (at the SPIKEY point)

However, as we can see from the results, the machine learning based online control engine is more likely to enter an unstable state, with frequent scaling-up and scaling-down events. Heat orchestration based auto-scaling also has this problem. At the beginning of the scaling process, the system scales up


and down and then up again within a short period. This phenomenon is called the ping-pong effect and was introduced in [42]. When CPU utilization increases, it triggers the scaling-up alarm, which leads to a scaling-up event. Once a new server has been added to the system, the average CPU utilization decreases; when it falls below the threshold of the scaling-down alarm, a scale-down event is triggered. After removing one server, the average CPU utilization increases again, and the scaling up and down cycle repeats.

The testing experience reported in [43] shows the bad effects that the ping-pong phenomenon can have on client-side latency. Also, as we described in the previous testing, this phenomenon results in bad system performance due to the number of packets dropped. This phenomenon should be avoided in the future by using more powerful and sophisticated scaling mechanisms.
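
One standard mitigation is hysteresis: separate scale-up and scale-down thresholds combined with a cooldown period, so that one scaling action cannot immediately trigger the opposite one. A minimal sketch, with illustrative thresholds and cooldown length (these are assumptions for the example, not values used in this project):

```python
class HysteresisScaler:
    """Threshold-with-hysteresis scaling decision: distinct up/down CPU
    thresholds plus a cooldown counted in decision periods."""

    def __init__(self, up: float = 0.8, down: float = 0.3, cooldown: int = 3):
        self.up, self.down, self.cooldown = up, down, cooldown
        self.since_last = cooldown  # periods elapsed since the last action

    def decide(self, cpu_util: float) -> int:
        """Return +1 (scale up), -1 (scale down), or 0 (hold)."""
        self.since_last += 1
        if self.since_last < self.cooldown:
            return 0  # still cooling down: suppress ping-pong reactions
        if cpu_util > self.up:
            self.since_last = 0
            return 1
        if cpu_util < self.down:
            self.since_last = 0
            return -1
        return 0
```

Because the band between `down` and `up` produces no action, adding a server that drops average CPU from just above 0.8 to, say, 0.5 no longer triggers an immediate scale-down.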

Based on the previous testing, we know that 2 web servers are sufficient for this lightly loaded system (as the maximum request rate is around 5,000 requests per second), which means a fast reaction (from 3 to 5 servers) is unnecessary here. However, this test was still valuable, as it showed the possibility of implementing this method in a more complex system where a fast reaction would bring a significant benefit.


Chapter 6

Conclusion and Future Work

This chapter states conclusions based on the analysis of the results of testing the proposed new online control engine and suggests future work. First, conclusions are drawn, summarizing the results and the benefits this project provides. Then we discuss the problems encountered during the process. After that, we describe the limitations of the online control engine. The future work section suggests how these problems could be solved. The chapter ends with some reflections on ethics and sustainability.

6.1 Conclusion

Cloud computing technology is developing very rapidly. This growth raises the question of how to realize efficient resource management. In this context, auto-scaling is a typical auto-management problem worth investigating. OpenStack provides a resource auto-management function based on the heat orchestration service. However, the heat orchestration service only considers a single resource when deciding whether to generate a scale-up or scale-down signal. Furthermore, this choice of feature may cause some delay in system management.

This project suggested a new way to realize a cloud-based auto-scaling technique by investigating how to use several types of network resource data together with machine learning. A testing environment consisting of two tenants and one control model was built, together with a data collection service and an auto-scaling service. Two different machine learning techniques were examined for offline training. An online control engine based on the trained model was established, realizing a real-time auto-scaling function. Then a deep network analysis was conducted to find the system's capacity and bottleneck. After that, several tests were conducted for scaling under a heavy traffic load, where the server exhibits non-linear behavior. These tests used both heat orchestration based and response time based auto-scaling. In


order to find out how many servers are actually needed, we created a scaling policy based on these tests and the network performance analysis. This scaling policy was used for comparison with the two models. Finally, a comparison was made and the performance measured for four different scenarios: a SMOOTH-load pattern (with heat orchestration and with the machine learning based control engine) and a SPIKEY-load pattern (with heat orchestration and with the machine learning based control engine).

The network analysis showed us the system's performance: one web server could support at most 3,600 requests per second and one Load Balancer could support at most 8,100 requests per second. If the client request rate exceeded 8,100 requests per second, the load balancer was overloaded. Above that point, the system experienced extremely large latency, a high response error rate, and a high session reconnection rate.

For heavy traffic, the system behaves in a non-linear way; hence linear regression based auto-scaling would provide terrible performance. In order to scale the system, heat orchestration based and response time based auto-scaling were used and compared. The comparison showed that response time based auto-scaling outperformed heat based auto-scaling, with fewer servers in use and acceptable latency. This testing also provided us with a clear idea of how many web servers are actually needed in this system: we found that 4 web servers were sufficient even in the case of heavy traffic load. This conclusion led to the new scaling policy in the next step.

The comparison between the four scenarios showed that when the client request rate changes smoothly, both heat orchestration and the online control engine have similar performance. However, for the SPIKEY request pattern, the machine learning based online control engine outperformed the heat orchestration engine, with a faster reaction at the spike. The scaling policy based on deep network analysis had the best performance, with the fewest web servers in use and acceptable latency. Based on these results, we can draw the following conclusions:

1. Firstly, the machine learning based auto-scaling engine can tell us how many web servers we need at the moment by analyzing various real-time network data. In contrast to Heat, which generates a scale-up or scale-down signal to adjust the number of servers one by one, the new control engine can directly scale to the required number of servers; hence it can provide clients with a faster response to changes in the request rate.

2. Secondly, this testing system provides us with a means of applying closed-loop control in a simple environment. Based on these tests, we see great potential for implementing closed-loop control in a more complex system in the future.

3. Thirdly, we consider this online control engine a lightweight control model that cooperates with the Nova service by invoking several simple Python based scripts. This control engine can save users a lot of effort in comparison with Heat's stack configuration and management.


4. Fourthly, although a deep network analysis based auto-scaling policy can provide the best performance, it requires a thorough understanding of the network load patterns and performance, which makes this method hard to realize.

5. Finally, more web servers working in parallel does not mean better performance. For small requests at a high request rate, using fewer servers with higher performance may outperform using more servers each with lower performance.

6.2 Problems and limitations

As we have seen throughout the project, although this new auto-scaling mechanism can provide some benefits in server auto-scaling (especially for the SPIKEY-load pattern), a lot of effort is required for data collection, pre-processing, and model training. Many parameters (such as the sampling period and thresholds) need to be carefully considered. Therefore, if the system needs to achieve stable performance and minimize response time when the request rate changes a lot, then this machine learning based auto-scaling mechanism may offer some benefits; otherwise, the default heat orchestration is a better choice. For example, during the Global Shopping Festival, the Alibaba platform has high scaling requirements in order to serve the online traffic and ensure that the predicted peak traffic load can be handled [44]. In this situation, a fast response is very important for auto-scaling.

The online testing, combined with offline training, is based on historical network data regarding client request patterns; hence the online control engine may not be suitable for all situations. If the client request pattern changes a lot, we need to re-train the model to retune the online control engine.

Both machine learning based and heat orchestration based auto-scaling exhibit ping-pong effects, which may cause dropped requests and high latency.

6.3 Future Work

Considering the problems mentioned earlier, several things could be done in the future to solve them.

Firstly, instead of offline training, an online training algorithm should be investigated and developed. In this way, the results of online testing could change the (machine learning based) control model in a timely manner and thus realize closed-loop control. However, online training of a machine learning model is a time and resource consuming process and may introduce extra overhead to the system. For this reason, lightweight online machine learning algorithms such as online lasso regression and stochastic gradient descent need to be carefully evaluated.
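
As a sketch of how lightweight such an online learner can be, the following implements plain stochastic gradient descent for a linear model, updating on one sample at a time (an illustration of the direction suggested above; the models in this thesis were trained offline, and the learning rate is an assumption):

```python
class OnlineLinearModel:
    """Linear regressor updated by one SGD step per incoming sample."""

    def __init__(self, n_features: int, lr: float = 0.01):
        self.w = [0.0] * n_features  # weights, one per input feature
        self.b = 0.0                 # bias term
        self.lr = lr                 # learning rate

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def update(self, x, y):
        """One gradient step on the squared error of a single (x, y) pair."""
        err = self.predict(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err
```

Each `update` call is a few multiplications, so the model can be refreshed on every new metric sample without the batch re-training cost of the offline approach.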


Secondly, several auto-scaling mechanisms could be combined for web server management based on real-time network traffic data. Because the auto-scaling service in the online control engine depends on the control signal generated by the machine learning service, if we replace the machine learning service with another control service, such as a threshold, to generate the control signal, we can make this online control engine more general. As a result, based upon the specific situation, we could choose the control service with the best performance. How and when to switch between different control services is worth investigating in the future. Moreover, all the testing in this thesis took place in a very simple cloud environment with a simple network structure and very simple traffic patterns. In the future, there is a need to collect real-world data and further investigate whether this online control engine can be applied in the real world.
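
The generalization proposed here can be sketched as a pluggable interface, where any control service produces the desired server count consumed by the auto-scaling service (all names and signatures below are illustrative assumptions, not the thesis code):

```python
from typing import Callable, Dict

# A control service maps current metrics to a desired web server count.
ControlService = Callable[[Dict[str, float]], int]

def threshold_service(metrics: Dict[str, float]) -> int:
    """One possible plug-in: a simple request-rate threshold policy."""
    return 2 if metrics["request_rate"] > 3500 else 1

def autoscale(metrics: Dict[str, float], control: ControlService,
              current: int) -> int:
    """Return the scaling delta chosen by the plugged-in control service
    (+n means add n servers, -n means remove n, 0 means hold)."""
    return control(metrics) - current
```

Swapping `threshold_service` for a machine learning predictor (or any other policy with the same signature) would leave the auto-scaling service itself unchanged.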

Thirdly, in order to avoid a single point of failure, a more complex system with several powerful load balancers and servers should be developed for testing and data collection. More services should also be investigated; for example, we could use a spikey-event sensitive service to test how much benefit the online control engine can offer.

Finally, auto-scaling is only one auto-management technique in cloud orchestration. Many other auto-management services are worth investigating in the future. For example, when using heterogeneous web servers (i.e., those with different memory sizes and processing rates), how should a load balancer service efficiently distribute incoming requests to these web servers? If machine learning or deep learning based orchestration techniques can be implemented to solve this problem, we could potentially use network resources more efficiently while providing faster service.

6.4 Ethics and Sustainability

Cloud platforms and resource auto-management techniques have become a popular area that may bring many benefits, such as network flexibility and resource savings. Since the ultimate goal of network improvement and management is to meet clients' requirements, a system should work and be improved based on clients' feedback. Although we can realize a better scaling policy based on data collected from the client side, it is not easy to set the threshold, as this requires a thorough understanding of the network's performance and the clients' requirements. The difficulty of data collection on the client side makes such a scaling policy harder to realize. In contrast, although the Heat Orchestration Service provided by OpenStack can realize auto-scaling based on network data from the server side, it lacks an overall view of the whole network's performance. This may cause resource waste and slow reaction. The machine learning based online control model designed in this project demonstrates the concept of detecting network performance by learning from historical data. Though the results are not as good as we expected, the approach is still worth investigating, as more sophisticated learning models could be developed based on this concept.


As clients are seeking a better way to manage network resources without a full understanding of the network structure, this thesis provides a concept of understanding network performance by analyzing raw network data. It also provides a new control model that lets clients dynamically add or remove instances in an OpenStack based cloud environment instead of using the Heat Orchestration Service. Compared with the Heat Orchestration Service, this control model can interact with different auto-scaling policies in a more convenient way.


Bibliography

[1] J. Shuja, S. A. Madani, K. Bilal, K. Hayat, S. U. Khan, and S. Sarwar, "Energy-efficient data centers," Computing, vol. 94, no. 12, pp. 973–994, Dec 2012. doi: 10.1007/s00607-012-0211-2. ISSN 0010-485X, 1436-5057. [Online]. Available: http://link.springer.com/10.1007/s00607-012-0211-2

[2] "ETSI GS NFV-MAN 001," published by the Network Functions Virtualisation (NFV) ETSI Industry Specification Group (ISG), Dec. 2014. [Online]. Available: http://www.etsi.org/deliver/etsi_gs/NFV-MAN/001_099/001/01.01.01_60/gs_nfv-man001v010101p.pdf

[3] Tayeb Ben Meriem, Ranganai Chaparadza, Benoît Radier, Said Soulhi, Jose-Antonio Lozano Lopez, and Arun Prakash, "GANA - Generic Autonomic Networking Architecture," ETSI White Paper, Oct 2016. [Online]. Available: https://www.etsi.org/images/files/ETSIWhitePapers/etsi_wp16_gana_Ed1_20161011.pdf

[4] F. Paganelli, F. Paradiso, M. Gherardelli, and G. Galletti, "Network service description model for VNF orchestration leveraging intent-based SDN interfaces," in 2017 IEEE Conference on Network Softwarization (NetSoft), July 2017. doi: 10.1109/NETSOFT.2017.8004210. ISBN 978-1-5090-6008-5. pp. 1–5. [Online]. Available: http://ieeexplore.ieee.org/document/8004210/

[5] G. Carella, L. Foschini, A. Pernafini, P. Bellavista, A. Corradi, M. Corici, F. Schreiner, and T. Magedanz, "Quality Audit and Resource Brokering for Network Functions Virtualization (NFV) Orchestration in Hybrid Clouds," in 2015 IEEE Global Communications Conference (GLOBECOM), Dec. 2015. doi: 10.1109/GLOCOM.2015.7417385. pp. 1–6. [Online]. Available: https://ieeexplore-ieee-org.focus.lib.kth.se/document/7417385

[6] R. Boutaba, M. A. Salahuddin, N. Limam, S. Ayoubi, N. Shahriar, F. Estrada-Solano, and O. M. Caicedo, "A comprehensive survey on machine learning for networking: evolution, applications and research opportunities," Journal of Internet Services and Applications, vol. 9, no. 1, Dec. 2018. doi: 10.1186/s13174-018-0087-2. [Online]. Available: https://jisajournal.springeropen.com/articles/10.1186/s13174-018-0087-2

[7] C. Guo, H. Wu, K. Tan, L. Shi, Y. Zhang, and S. Lu, "DCell: A Scalable and Fault-Tolerant Network Structure for Data Centers," ACM SIGCOMM Computer Communication Review, vol. 38, pp. 75–86, Aug 2008. doi: 10.1145/1402946.1402968.


[8] L. Z. Ron Bower, Jeff McNeely, "Foundations of IBM Cloud Computing Architecture," presentation at IBM Innovation Center, 2010. [Online]. Available: https://www.ibm.com/developerworks/community/files/form/anonymous/api/library/37271a25-f537-4acb-9924-417371aa7af3/document/030737b3-80c4-4115-8ee7-830e39fff738/media/Reading%20Material%20032-Cloud.pdf

[9] G. Walker, "Cloud computing fundamentals," IBM Website, 2012. [Online]. Available: https://www.ibm.com/developerworks/cloud/library/cl-cloudintro/index.html

[10] G. P. Katsikas, "NFV Service Chains at the Speed of the Underlying Commodity Hardware," Ph.D. dissertation, KTH, Network Systems Laboratory (NS Lab), 2018. ISBN 978-91-7729-863-2. urn:nbn:se:kth:diva-233629.

[11] R. Sherwood, G. Gibb, K.-K. Yap, G. Appenzeller, M. Casado, N. McKeown, and G. Parulkar, "FlowVisor: A network virtualization layer," OPENFLOW-TR-2009-1, Oct 2009, p. 15. [Online]. Available: https://pdfs.semanticscholar.org/64f3/a81fff495ac336dccdd63136d451852eb1c9.pdf

[12] R. Mijumbi, J. Serrat, J. L. Gorricho, S. Latre, M. Charalambides, and D. Lopez, "Management and orchestration challenges in network functions virtualization," IEEE Communications Magazine, vol. 54, no. 1, pp. 98–105, Jan 2016. doi: 10.1109/MCOM.2016.7378433. [Online]. Available: https://ieeexplore.ieee.org/document/7378433

[13] F. Z. Yousaf, P. Loureiro, F. Zdarsky, T. Taleb, and M. Liebsch, "Cost analysis of initial deployment strategies for virtualized mobile core network functions," IEEE Communications Magazine, vol. 53, no. 12, pp. 60–66, Dec 2015. doi: 10.1109/MCOM.2015.7355586. [Online]. Available: https://ieeexplore-ieee-org.focus.lib.kth.se/document/7355586

[14] R. Chaparadza, T. Ben Meriem, B. Radier, S. Szott, M. Wodczak, A. Prakash, Jianguo Ding, S. Soulhi, and A. Mihailovic, "Implementation Guide for the ETSI AFI GANA model: A Standardized Reference Model for Autonomic Networking, Cognitive Networking and Self-Management," in 2013 IEEE Globecom Workshops (GC Wkshps), Atlanta, GA: IEEE, Dec. 2013. doi: 10.1109/GLOCOMW.2013.6825110. ISBN 978-1-4799-2851-4. pp. 935–940. [Online]. Available: http://ieeexplore.ieee.org/document/6825110/

[15] N. Tcholtchev, R. Chaparadza, and A. Prakash, "Addressing stability of control-loops in the context of the GANA architecture: Synchronization of actions and policies," in Self-Organizing Systems, T. Spyropoulos and K. A. Hummel, Eds. Springer Berlin Heidelberg, 2009, vol. 5918, pp. 262–268. ISBN 978-3-642-10864-8, 978-3-642-10865-5. [Online]. Available: http://link.springer.com/10.1007/978-3-642-10865-5_28

[16] "Amazon Web Service," https://aws.amazon.com, September 27, 2018.

[17] "Windows Azure," https://azure.microsoft.com, September 18, 2018.


[18] "Scalr," https://www.scalr.com, September 18, 2018.

[19] M. Mao and M. Humphrey, "Auto-scaling to minimize cost and meet application deadlines in cloud workflows," in SC '11: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, Nov 2011. ISSN 2167-4329. pp. 1–12.

[20] S. Dutta, S. Gera, A. Verma, and B. Viswanathan, "SmartScale: Automatic Application Scaling in Enterprise Clouds," in 2012 IEEE Fifth International Conference on Cloud Computing, June 2012. doi: 10.1109/CLOUD.2012.12. ISSN 2159-6190. pp. 221–228. [Online]. Available: https://ieeexplore-ieee-org.focus.lib.kth.se/document/6253509

[21] Y. Ahn and Y. Kim, "VM Auto-Scaling for Workflows in Hybrid Cloud Computing," in 2014 International Conference on Cloud and Autonomic Computing, Sept 2014. doi: 10.1109/ICCAC.2014.34. pp. 237–240.

[22] H. Kang, J. in Koh, Y. Kim, and J. Hahm, "A SLA driven VM auto-scaling method in hybrid cloud environment," in 2013 15th Asia-Pacific Network Operations and Management Symposium (APNOMS), Sept 2013. ISBN 978-4-8855-2279-6. pp. 1–6. [Online]. Available: https://ieeexplore.ieee.org/document/6665285

[23] "Autoscaling with openstack heat service," https://docs.openstack.org/senlin/latest/scenarios/autoscaling_heat.html, September 27, 2018.

[24] "Docker: The Dev to Ops Choice for Container Platforms," https://www.docker.com/, September 18, 2018.

[25] Y. Li and Y. Xia, "Auto-scaling web applications in hybrid cloud based on docker," in 2016 5th International Conference on Computer Science and Network Technology (ICCSNT), Dec 2016. doi: 10.1109/ICCSNT.2016.8070122. pp. 75–79.

[26] H. Arabnejad, C. Pahl, P. Jamshidi, and G. Estrada, "A comparison of reinforcement learning techniques for fuzzy cloud auto-scaling," in 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), May 2017. doi: 10.1109/CCGRID.2017.15. pp. 64–73. [Online]. Available: https://ieeexplore-ieee-org.focus.lib.kth.se/document/7973689

[27] P. Jamshidi, A. Ahmad, and C. Pahl, "Autonomic resource provisioning for cloud-based software," in Proceedings of the 9th International Symposium on Software Engineering for Adaptive and Self-Managing Systems - SEAMS 2014. ACM Press. doi: 10.1145/2593929.2593940. ISBN 978-1-4503-2864-7. pp. 95–104. [Online]. Available: http://dl.acm.org/citation.cfm?doid=2593929.2593940

[28] R. Yanggratoke, J. Ahmed, J. Ardelius, C. Flinta, A. Johnsson, D. Gillblad, and R. Stadler, "Predicting service metrics for cluster-based services using real-time analytics," IEEE, Nov 2015. doi: 10.1109/CNSM.2015.7367349. ISBN 978-3-901882-77-7. pp. 135–143. [Online]. Available: http://ieeexplore.ieee.org/document/7367349/


[29] S. Rahman, T. Ahmed, M. Huynh, M. Tornatore, and B. Mukherjee, "Auto-scaling VNFs Using Machine Learning to Improve QoS and Reduce Cost," in 2018 IEEE International Conference on Communications (ICC), May 2018. doi: 10.1109/ICC.2018.8422788. ISSN 1938-1883. pp. 1–6. [Online]. Available: https://ieeexplore-ieee-org.focus.lib.kth.se/document/8422788

[30] "Mirantis: Cloud platform," https://docs.openstack.org/ceilometer/pike/admin/telemetry-data-pipelines.html, September 18, 2018.

[31] "Mirantis Drivetrain Lifecycle Management," https://www.mirantis.com/software/mcp/drivetrain/, September 18, 2018.

[32] "httperf - linux man page," https://linux.die.net/man/1/httperf, September 18, 2018.

[33] D. Mosberger and T. Jin, "httperf–a tool for measuring web server performance," ACM SIGMETRICS Performance Evaluation Review, vol. 26, no. 3, pp. 31–37, Dec 1998. doi: 10.1145/306225.306235. [Online]. Available: http://portal.acm.org/citation.cfm?doid=306225.306235

[34] "HAProxy, The Reliable, High Performance TCP/HTTP Load Balancer," http://www.haproxy.org/, September 18, 2018.

[35] "httpd - apache hypertext transfer protocol server," https://httpd.apache.org/docs/2.4/programs/httpd.html, September 18, 2018.

[36] "Data processing and pipelines," https://docs.openstack.org/ceilometer/pike/admin/telemetry-data-pipelines.html, September 27, 2018.

[37] "Alarms," https://docs.openstack.org/ocata/admin-guide/telemetry-alarms.html, September 27, 2018.

[38] "The GNU Netcat Project," http://netcat.sourceforge.net/, September 27, 2018.

[39] "Hyperparameter optimization," https://en.wikipedia.org/wiki/Hyperparameter_optimization, October 20, 2018.

[40] K. L. Bogdanov, W. Reda, G. Q. Maguire, Jr., D. Kostic, and M. Canini, "Fast and Accurate Load Balancing for Geo-Distributed Storage Systems," Proceedings of the ACM Symposium on Cloud Computing, pp. 386–400, 2018. doi: 10.1145/3267809.3267820. [Online]. Available: http://doi.acm.org.focus.lib.kth.se/10.1145/3267809.3267820

[41] C. Delimitrou and C. Kozyrakis, "Amdahl's law for tail latency," Communications of the ACM, vol. 61, no. 8, pp. 65–72, Jul. 2018. doi: 10.1145/3232559. [Online]. Available: http://dl.acm.org/citation.cfm?doid=3241891.3232559

[42] M. I. Hossain and M. I. Hossain, "Dynamic scaling of a web-based application in a Cloud Architecture," Master's thesis, KTH, Radio Systems Laboratory (RS Lab), 2014. urn:nbn:se:kth:diva-142361. [Online]. Available: http://www.diva-portal.org/smash/get/diva2:699823/FULLTEXT02.pdf


[43] A. Kejariwal, "Techniques for Optimizing Cloud Footprint," in 2013 IEEE International Conference on Cloud Engineering (IC2E), Redwood City, CA: IEEE, Mar. 2013. doi: 10.1109/IC2E.2013.14. ISBN 978-0-7695-4945-3, 978-1-4673-6473-7. pp. 258–268. [Online]. Available: http://ieeexplore.ieee.org/document/6529292/

[44] "Global Shopping Festival 2017 - TECHNOLOGY FACT SHEET," Alibaba, September 27, 2017. [Online]. Available: https://www.alizila.com/wp-content/uploads/2017/11/Factsheet 11.11-Global-Shopping-Festival-Technology final-1.pdf


TRITA-EECS-EX-2018:712

www.kth.se