
Emerging Technology Reliability

2017

AI Techniques for Reliability and Performance in Softwarized Networks

Prosper Chemouil, Imen Grida Ben Yahia, Jaafar Bendriss

Orange Labs Networks July 3, 2017


Outline

1. Context and Challenges
2. SLA Enforcement Methodology
3. Use Cases
4. Conclusions & Perspectives

AI Techniques for Reliability and Performance in Softwarized Networks ETR 2017 (July 3, 2017 – Bologna, Italy)


Context & Challenges

Context: Softwarized Networks towards 5G Networks
– Virtualization
– Cloud Architectures
– NFV & SDN

Goal: How to make Future Networks more reliable?

Assessing the impact on SLA/SLO compliance in case of:
– Network & Service changes
– Performance Degradation


H2020 CogNet: Main Drivers & Objectives

Management software is semi-automated and still requires human interaction to detect and correct problems and to make decisions.

Management software is already being stretched by 4G networks and will be completely incapable of managing softwarized 5G/IoT networks.

CogNet aims at:
– collecting massive data from networks,
– applying Machine Learning (AI) algorithms to detect and correct issues,
– allowing the network to be self-managed.

[Figure: CogNet partners]


Problem Statement: How to predict SLO violations?

Prediction is based on the monitoring of low-level network metrics.

SLO examples:
– SLO_1: Service availability ratio
– SLO_2: Response time ratio
– SLO_3: Downlink throughput ratio and downlink latency ratio

SLA example: the SLA is formalized in YAML as (key, value) pairs and loaded into a Python dictionary variable.
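The SLA descriptor itself is not reproduced here. As a minimal sketch of the idea, assuming a flat descriptor whose key names and target values are hypothetical, the following turns `key: value` lines into a Python dictionary using only the standard library. A real deployment would more likely rely on a full YAML library.

```python
# Hypothetical SLA descriptor: key names and target values are illustrative only.
SLA_YAML = """
service: video_streaming
slo_1_availability_ratio: 0.999
slo_2_response_time_ratio: 0.95
slo_3_downlink_throughput_ratio: 0.90
"""


def parse_flat_yaml(text):
    """Parse flat 'key: value' lines into a dict (a tiny subset of YAML)."""
    result = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line:
            continue  # skip blank lines
        key, _, value = line.partition(":")
        value = value.strip()
        try:
            result[key.strip()] = float(value)  # numeric targets become floats
        except ValueError:
            result[key.strip()] = value  # everything else stays a string
    return result


sla = parse_flat_yaml(SLA_YAML)
```

Each SLO target can then be looked up by key, for example `sla["slo_1_availability_ratio"]`.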


Related Work on SLA Management

– Resource monitoring: T. Wood et al., “Sandpiper: black-box and gray-box resource management for virtual machines”, Computer Networks (2009)
– SLA violation prediction: J. Ahmed et al., “Predicting SLA conformance for cluster-based services using distributed analytics”, NOMS 2016
– SLA translation: G. Kousiouris et al., “Translation of application-level terms to resource-level attributes across the Cloud stack layers”, IEEE ISCC 2011

New context: NFV/SDN raise new challenges for SLA management. The diversity and sheer amount of data require more than “prediction techniques”, which justifies the use of AI techniques.


2. SLA Enforcement Methodology


SLO Breach

Detect SLOs that are expected to breach
– Comparison with a threshold
– Threshold determined by engineering practice

Address the SLOs with higher impact

Determine the right action
– Reduce load
– Scale up/down
– Instantiate a new VM
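The three steps of this slide (detect, prioritize, act) can be sketched as follows. This is a minimal illustration under stated assumptions, not the method used in the talk: the SLO names, limits, impact weights, the threshold factor `alpha` and the mapping from SLO to action are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Slo:
    name: str
    limit: float  # hypothetical value the metric must stay below
    impact: int   # hypothetical impact weight, higher means more critical
    action: str   # hypothetical recovery action for this SLO


def expected_breaches(forecasts, slos, alpha=0.8):
    """Return the SLOs whose forecast reaches alpha * limit, highest impact first."""
    flagged = [s for s in slos if forecasts[s.name] >= alpha * s.limit]
    return sorted(flagged, key=lambda s: s.impact, reverse=True)


slos = [
    Slo("response_time_ms", limit=200.0, impact=2, action="scale up"),
    Slo("cpu_percent", limit=90.0, impact=3, action="instantiate a new VM"),
    Slo("link_load_percent", limit=80.0, impact=1, action="reduce load"),
]
# Hypothetical forecasted values for the next time step.
forecasts = {"response_time_ms": 170.0, "cpu_percent": 75.0, "link_load_percent": 40.0}

plan = [(s.name, s.action) for s in expected_breaches(forecasts, slos)]
```

Sorting by impact keeps the most critical SLO at the head of the plan, which mirrors the "address the SLOs with higher impact" step.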


Overall Architecture


Cognitive Smart Engine

The CSE relies on three modules:
– Forecasting
– Violation prediction
– SLA enforcement

Forecasting
1. calculates forecasted values

Violation Prediction
2. reads SLA descriptors
3. computes violation probabilities
4. identifies the potentially impacted SLO

SLA enforcer
5. recommends recovery actions


[Figure: Data Processing, from Raw Data through Transformation to Pre-Processed Data]


Data Analysis

Auto-Correlation Analysis

Correlation Analysis (PCA)
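To make the auto-correlation step concrete, here is a minimal sketch of the sample autocorrelation of a single metric at a given lag. The series is synthetic and purely illustrative; the function name and the choice of a biased estimator (normalizing by the lag-0 sum of squares) are assumptions, not details from the talk.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at a given lag (0 <= lag < len(series))."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    covariance = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


# Synthetic metric with a period of 4 samples (illustrative, not measured data).
metric = [1.0, 2.0, 3.0, 2.0] * 10
acf = [autocorrelation(metric, lag) for lag in range(5)]
```

A high value at a non-zero lag (here at lag 4, the period of the series) indicates that past samples carry information about future ones, which is what makes forecasting feasible.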


Principal Component Analysis

[Figure: Auto-Correlation Analysis]

– From 26 metrics to 4 relevant KPIs
– More than 90% of the information preserved
– End of pre-processing
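The "information preserved" figure can be read as a cumulative explained-variance ratio. The sketch below, which assumes NumPy is available, finds the smallest number of principal components reaching a 90% target. The data is a synthetic stand-in (26 columns driven by 4 hidden factors), not the monitoring data from the talk, and PCA strictly yields combined components rather than a selection of original metrics.

```python
import numpy as np


def components_for_variance(data, target=0.90):
    """Return the smallest number of principal components whose cumulative
    explained-variance ratio reaches `target`, plus the cumulative ratios."""
    centered = data - data.mean(axis=0)
    covariance = np.cov(centered, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(covariance)[::-1]  # descending order
    cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()
    k = int(np.searchsorted(cumulative, target) + 1)
    return k, cumulative


# Synthetic stand-in: 26 metrics driven by 4 latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 4))
mixing = rng.normal(size=(4, 26))
metrics = latent @ mixing + 0.05 * rng.normal(size=(500, 26))

k, cumulative = components_for_variance(metrics)
```

With real monitoring data the metrics would normally be standardized first, since PCA is sensitive to the scale of each column.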


Neural Approaches

Two methods:

Back Propagation Neural Network
– FeedForward Neural Network (FFNN)
– Black-box approach with many degrees of freedom

Recurrent Neural Network
– Long Short Term Memory (LSTM)
– Captures sequential features of data


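FFNN Formulation

$$Y = \mathrm{metric}_1 w_1 + b_1 + \mathrm{metric}_2 w_2 + b_2 + \dots + \mathrm{metric}_4 w_4 + b_4$$

Matrix formulation:

$$Y = X W + B$$

Training: backpropagation algorithm, with the cross-entropy error

$$Err_w = -\sum_{n=1}^{N} \left[ t_n \log H_w(x_n) + (1 - t_n) \log\left(1 - H_w(x_n)\right) \right]$$

The objective is to minimize $Err_w$, using its gradient:

$$\frac{d}{dw} Err_w$$

For the output node, the error is the total error:

$$\delta = H_w(x) - t$$

For the hidden layers, the node errors are:

$$\delta^{(i)} = (W^{(i)})^{T} \delta^{(i+1)} \odot g'(\mathrm{Output}^{(i)})$$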
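The output-node rule $\delta = H_w(x) - t$ can be checked numerically. The sketch below assumes a single output neuron with a sigmoid activation for $H_w$ (the slide does not name the activation) and a cross-entropy error written with a leading minus sign so that it is minimized. The input values, weights and helper names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, w, b):
    """H_w(x): sigmoid of the weighted sum of the input metrics."""
    return sigmoid(sum(xi * wi for xi, wi in zip(x, w)) + b)


def cross_entropy(samples, w, b):
    """Cross-entropy error summed over (x, t) pairs, to be minimized."""
    total = 0.0
    for x, t in samples:
        h = predict(x, w, b)
        total -= t * math.log(h) + (1 - t) * math.log(1 - h)
    return total


def gradient(samples, w, b):
    """Gradient of the error w.r.t. each weight, using delta = H_w(x) - t."""
    grad = [0.0] * len(w)
    for x, t in samples:
        delta = predict(x, w, b) - t
        for j, xj in enumerate(x):
            grad[j] += delta * xj
    return grad


def numeric_gradient(samples, w, b, eps=1e-6):
    """Central finite differences, used only to verify the analytic gradient."""
    grad = []
    for j in range(len(w)):
        w_plus, w_minus = list(w), list(w)
        w_plus[j] += eps
        w_minus[j] -= eps
        diff = cross_entropy(samples, w_plus, b) - cross_entropy(samples, w_minus, b)
        grad.append(diff / (2 * eps))
    return grad


# Hypothetical, already-normalized values for the 4 retained metrics.
samples = [([0.2, 0.1, 0.4, 0.3], 0), ([0.9, 0.8, 0.7, 0.6], 1)]
w, b = [0.1, -0.2, 0.3, 0.05], 0.0

analytic = gradient(samples, w, b)
numeric = numeric_gradient(samples, w, b)
```

The two gradients agree to within finite-difference error, which is the property backpropagation relies on when it propagates $\delta$ back through the hidden layers.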


Recurrent Neural Network


Long Short Term Memory

source: colab.github.io

Keras Library based on Google’s TensorFlow
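An LSTM forecaster consumes fixed-length sequences, so a metric series is usually cut into sliding windows before training. The sketch below shows only that preparation step in plain Python; the window length, the function name and the sample values are hypothetical, and the Keras model itself is not reproduced.

```python
def make_windows(series, window):
    """Split a 1-D series into (input window, next value) training pairs."""
    if window < 1 or window >= len(series):
        raise ValueError("window must be in [1, len(series) - 1]")
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])
        targets.append(series[start + window])
    return inputs, targets


# Illustrative CPU usage samples (not measured data).
cpu = [10.0, 12.0, 15.0, 14.0, 18.0, 21.0]
inputs, targets = make_windows(cpu, window=3)
```

Each input window would then be reshaped to the (samples, time steps, features) layout that recurrent layers expect, with one feature in the single-metric case.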


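Performance Metrics

Precision

$$\mathrm{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}}$$

Recall

$$\mathrm{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}}$$

F-Score

$$F\text{-}\mathrm{score} = \frac{2 \cdot \mathrm{Recall} \cdot \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}}$$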
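The three metrics translate directly into code. The sketch below computes them from confusion counts; the counts used in the example are hypothetical and are not taken from the evaluation.

```python
def precision_recall_fscore(true_pos, false_pos, false_neg):
    """Compute precision, recall and F-score from confusion counts."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f_score = 2 * recall * precision / (recall + precision)
    return precision, recall, f_score


# Hypothetical counts: 45 breaches caught, 5 false alarms, 15 breaches missed.
precision, recall, f_score = precision_recall_fscore(
    true_pos=45, false_pos=5, false_neg=15
)
```

Because the F-score is a harmonic mean, it stays closer to the lower of the two values, so a predictor cannot score well by optimizing precision or recall alone.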


3. Use Cases


Video Streaming

– Three streaming VMs can be accessed through an OVS network by three different clients.
– The OVS instances are connected through an OpenDaylight controller.
– The capacity of the OVS links can be controlled via local scripts.


Dataset

– Monitoring system: Monasca (MONitoring At SCAle)
– Observation window: 1 month
– Sampling period: 30 seconds
– Number of entry lines per metric: 100,000
– Number of features/metrics per VM: 26, of which only 4 were used

SLA data:
– YAML file translated into a dictionary structure of (key, value) pairs
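As a quick consistency check on the dataset figures, the arithmetic below relates the number of entries to the observation window. The constant names are introduced here for illustration; only the 30-second period and the 100,000 entries come from the slide.

```python
SECONDS_PER_DAY = 86_400
SAMPLING_PERIOD_S = 30        # one sample every 30 seconds
SAMPLES_PER_METRIC = 100_000  # entry lines per metric

# Time span covered by 100,000 samples taken every 30 seconds.
span_days = SAMPLES_PER_METRIC * SAMPLING_PERIOD_S / SECONDS_PER_DAY

# Number of samples collected in exactly 30 days at the same period.
samples_in_30_days = 30 * SECONDS_PER_DAY // SAMPLING_PERIOD_S
```

The entries span a little under 35 days, while a strict 30-day window would give 86,400 samples, so the stated figures are roughly consistent with a one-month observation window.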


Evaluation Result (1/3)

Results of the offline evaluation of the FFNN with three different SLO breach thresholds:
– “Don’t want to miss anything”
– “Sure of everything”
– Standard case
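One way to read the three cases is as a precision/recall trade-off: a low threshold misses few breaches at the cost of false alarms, while a high threshold raises an alarm only when nearly certain. The sketch below illustrates that reading on hypothetical scores and labels; the threshold values and the interpretation itself are assumptions, not results from the talk.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when a breach is predicted for score >= threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and l for p, l in zip(predicted, labels))
    fp = sum(p and not l for p, l in zip(predicted, labels))
    fn = sum((not p) and l for p, l in zip(predicted, labels))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Hypothetical predicted breach scores and ground-truth labels (True = breach).
scores = [0.95, 0.85, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [True, True, False, True, False, True, False, False]

# Low, intermediate and high thresholds, one per case.
results = {t: precision_recall(scores, labels, t) for t in (0.25, 0.5, 0.9)}
```

On this toy data the lowest threshold catches every breach, the highest one raises no false alarm, and the intermediate one balances the two.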


Evaluation Result (2/3)

[Figure: real values, predictions and breaches, with Alpha = 0.8]


Evaluation Result (3/3)

Comparison between two Artificial Intelligence methods:
– FeedForward Neural Network (Back Propagation NN): FFNN with 20 layers
– Long Short Term Memory (Recurrent NN): LSTM with one input (cpu.percentage)


VoIP on vIMS


SLO Violation Identification

Detection of the three types of SLO violation using a decision tree algorithm


Results

Anomaly classes (SLOs)   Occurrence of SLO violation   Precision   Recall   F-score
1                        60                            0.822       0.879    0.849
2                        60+120                        0.890       0.917    0.903
3                        60+120+80                     0.909       0.913    0.910

The method identifies SLO violations accurately according to the F-score (0.849 to 0.910), but it still produces some false positives (which lower precision) and some false negatives (which lower recall).


4. Conclusions & Perspectives


Summary (1)

– High accuracy with respect to both precision and recall metrics.
– Threshold α is a key tuning parameter in the framework.
– Use of a Multi-Class Decision Tree (IR3).
– LSTM is very robust, at the expense of longer training and with a risk of over-fitting.

Next
– Focus on recovery actions
– Policy engine for actuation


Summary (2)

Machine Learning appears to be very efficient for SLA Management.

Other use cases addressed in the CogNet project using ML techniques:
– Anomaly Detection
– Dynamic Adaptation to Anomalies
– Flexible Topology and Routing
– Service Function Chaining


References

– H2020 CogNet, http://www.cognet.5g-ppp.eu/
– J. Bendriss, I. Grida Ben Yahia, D. Zeghlache, “Forecasting and Anticipating SLO Breaches in Programmable Networks”, Proc. ICIN 2017, Paris, France, March 2017.
– J. Bendriss, I. Grida Ben Yahia, P. Chemouil, D. Zeghlache, “AI for SLA Management in Programmable Networks”, Proc. DRCN 2017, Munich, Germany, March 2017.
– T. S. Buda, H. Assem, L. Xu, “ADE: An ensemble approach for early anomaly detection”, Proc. IM 2017, Lisbon, Portugal, May 2017.


Thank you!