confidentiel Groupe France Télécom
Emerging Technology Reliability
2017
AI Techniques for Reliability and Performance in Softwarized Networks
Prosper Chemouil, Imen Grida Ben Yahia, Jaafar Bendriss
Orange Labs Networks July 3, 2017
Orange confidential
Outline
1. Context and Challenges
2. SLA Enforcement Methodology
3. Use Cases
4. Conclusions & Perspectives
AI Techniques for Reliability and Performance in Softwarized Networks ETR 2017 (July 3, 2017 – Bologna, Italy)
Context & Challenges
Context
Softwarized Networks towards 5G Networks
Virtualization
Cloud Architectures
NFV & SDN
Goal
How to make Future Networks more reliable?
Assessing the impact on SLA/SLO compliance in case of:
Network & Service changes
Performance Degradation
H2020 CogNet
Management software is semi-automated and still requires human intervention to detect and correct problems and to make decisions.
Management software is already being stretched by 4G networks and will be completely incapable of managing softwarized 5G/IoT networks.
CogNet aims at:
– collecting massive data from networks
– applying Machine Learning (AI) algorithms to detect and correct issues
– allowing the network to be self-managed
Main Drivers & Objectives Partners
Problem Statement: How to predict SLO violations based on low-level network metrics monitoring?
SLO_1: Service Availability ratio
SLO_2: Response time ratio
SLO_3: Downlink throughput ratio and downlink latency ratio.
SLA example: the SLA is formalized in YAML as (key, value) pairs and loaded into a Python dictionary variable.
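The YAML and dictionary listings shown on the slide did not survive extraction. As a minimal sketch of what such a (key, value) structure could look like, assuming hypothetical SLO keys, comparison operators, and threshold values that are not taken from the original:

```python
# Hypothetical SLA descriptor as a Python dictionary; every key and value
# below is an illustrative assumption, not the authors' actual SLA.
sla = {
    "sla_id": "video-streaming-gold",
    "slos": {
        "service_availability_ratio": {"operator": ">=", "threshold": 0.999},
        "response_time_ratio": {"operator": "<=", "threshold": 0.05},
        "downlink_throughput_ratio": {"operator": ">=", "threshold": 0.90},
    },
}


def violated_slos(sla, measurements):
    """Return the names of SLOs whose measured value breaks its threshold."""
    violated = []
    for name, rule in sla["slos"].items():
        value = measurements[name]
        if rule["operator"] == ">=":
            ok = value >= rule["threshold"]
        else:  # "<="
            ok = value <= rule["threshold"]
        if not ok:
            violated.append(name)
    return violated


# Hypothetical measured values for one observation period.
measurements = {
    "service_availability_ratio": 0.9995,
    "response_time_ratio": 0.08,
    "downlink_throughput_ratio": 0.93,
}
print(violated_slos(sla, measurements))
```

Keeping the operator next to each threshold lets one descriptor hold both "at least" objectives (availability, throughput) and "at most" objectives (response time).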
Related Work on SLA Management
SLA violation prediction
SLA translation
Resource monitoring
New context: NFV/SDN raise new challenges for SLA management. The diversity and sheer amount of data require more than “prediction techniques”, which justifies the use of AI techniques.
T. Wood et al., “Sandpiper: black-box and gray-box resource management for virtual machines”, in Computer Networks (2009)
J. Ahmed et al., “Predicting SLA conformance for cluster-based services using distributed analytics”, in NOMS 2016 - 2016
G. Kousiouris et al., “Translation of application-level terms to resource-level attributes across the Cloud stack layers”, in IEEE ISCC 2011
SLO Breach
Detect SLOs that are expected to breach
Comparison with a threshold
Threshold determined by engineering practice
Address the SLOs with higher impact
Determine the right action
Reduce load
Scale up/down
Instantiate a new VM
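The breach-handling steps on this slide (compare with a threshold, address the highest-impact SLO first, determine an action) can be sketched as follows. The slides list the candidate actions but not the policy that selects among them, so the impact weights, the overshoot bands, and all function names here are assumptions for illustration only:

```python
# Candidate actions named on the slide; the selection rule below is hypothetical.
ACTIONS = ("reduce_load", "scale_up", "instantiate_new_vm")


def expected_breaches(forecasts, thresholds):
    """Return SLOs whose forecasted value exceeds its engineering threshold."""
    return [slo for slo, value in forecasts.items() if value > thresholds[slo]]


def prioritize(breaches, impact):
    """Order breaching SLOs so that the highest-impact one comes first."""
    return sorted(breaches, key=lambda slo: impact[slo], reverse=True)


def choose_action(overshoot):
    """Pick an action from the relative overshoot above the threshold."""
    if overshoot < 0.10:
        return ACTIONS[0]
    if overshoot < 0.30:
        return ACTIONS[1]
    return ACTIONS[2]


# Hypothetical forecasted values, thresholds and impact weights.
forecasts = {"cpu": 0.95, "latency": 0.52, "memory": 0.60}
thresholds = {"cpu": 0.80, "latency": 0.50, "memory": 0.75}
impact = {"cpu": 2, "latency": 3, "memory": 1}

plan = []
for slo in prioritize(expected_breaches(forecasts, thresholds), impact):
    overshoot = (forecasts[slo] - thresholds[slo]) / thresholds[slo]
    plan.append((slo, choose_action(overshoot)))
print(plan)
```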
Overall Architecture
Cognitive Smart Engine
The CSE relies on 3 modules:
Forecasting
Violation prediction
SLA enforcement
Forecasting: 1° calculates forecasted values
Violation prediction: 2° reads SLA descriptors, 3° computes violation probabilities, 4° identifies the potentially impacted SLO
SLA enforcer: 5° recommends recovery actions
Raw Data
Data Processing
Transformation
Pre-Processed Data
Data Analysis
Auto-Correlation Analysis
Correlation Analysis (PCA)
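As an illustration of the auto-correlation step, the sketch below computes the sample autocorrelation of one metric at a given lag. The slides do not say which estimator or lags were used, so the function and the synthetic periodic series are assumptions:

```python
import numpy as np


def autocorrelation(series, lag):
    """Sample autocorrelation of a 1-D series at a positive lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))


# Synthetic metric with a period of 20 samples (an assumption for illustration).
t = np.arange(400)
metric = np.sin(2 * np.pi * t / 20)

print(autocorrelation(metric, 20))  # lag equal to the period
print(autocorrelation(metric, 10))  # lag equal to half the period
```

A value close to 1 at some lag suggests that past samples of the metric carry information about its future values, which is what a forecasting model needs.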
Auto-Correlation Analysis
Principal Component Analysis
From 26 metrics to 4 relevant KPIs, with more than 90% of the information preserved.
End of pre-processing
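The "more than 90% of the information preserved" criterion can be read as a cumulative explained-variance target. A minimal sketch under that assumption, on synthetic data standing in for the 26 monitored metrics (how the authors map components back to 4 KPIs is not stated in the slides):

```python
import numpy as np


def components_for_variance(data, target=0.90):
    """Return (k, ratios): the smallest number of principal components whose
    cumulative explained variance reaches `target`, and all variance ratios."""
    centered = data - data.mean(axis=0)
    # Standardize so that metrics with large units do not dominate.
    standardized = centered / centered.std(axis=0)
    covariance = np.cov(standardized, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(covariance)[::-1]  # descending order
    ratios = eigenvalues / eigenvalues.sum()
    k = int(np.searchsorted(np.cumsum(ratios), target) + 1)
    return k, ratios


# Synthetic stand-in for the 26 metrics: 4 hidden factors plus small noise.
rng = np.random.default_rng(0)
factors = rng.normal(size=(1000, 4))
mixing = rng.normal(size=(4, 26))
metrics = factors @ mixing + 0.1 * rng.normal(size=(1000, 26))

k, ratios = components_for_variance(metrics)
print(k, float(ratios[:k].sum()))
```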
Neural Approaches
Two methods:
Back Propagation Neural Network
FeedForward Neural Network (FFNN)
Black Box Approach. Many degrees of freedom
Recurrent Neural Network
Long Short Term Memory (LSTM)
Captures sequential features of data
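FFNN Formulation
Each of the 4 metrics contributes a weighted term and a bias:
$$Y = \sum_{i=1}^{4} \left( \mathrm{metric}_i \, w_i + b_i \right)$$
Matrix formulation:
$$Y = XW + B$$
Training: backpropagation algorithm. With targets $t_n$ and network output $H_w(x_n)$, the error is the cross-entropy:
$$Err_w = -\sum_{n=1}^{N} \left[ t_n \log H_w(x_n) + (1 - t_n) \log\left(1 - H_w(x_n)\right) \right]$$
The objective is to minimize $Err_w$, using its gradient $\frac{d}{dw} Err_w$.
For the output node, the error is the total error:
$$\delta = H_w(x) - t$$
For hidden layer $i$, the node errors are:
$$\delta^{(i)} = (W^{(i)})^{T} \, \delta^{(i+1)} \odot g'(Output^{(i)})$$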
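The formulation above can be exercised on a toy problem. The sketch below is not the authors' network: it assumes one hidden layer of 8 sigmoid units, a synthetic breach label, and a fixed learning rate. It uses the row-vector convention of Y = XW + B, so the hidden-layer error is written as `delta_out @ W2.T` rather than with the transposed weight matrix on the left:

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(X, W1, b1, W2, b2):
    """Forward pass: 4 input metrics -> hidden layer -> breach probability."""
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)
    return hidden, output


def cross_entropy(output, t):
    """Summed cross-entropy error between outputs and binary targets."""
    return float(-np.sum(t * np.log(output) + (1 - t) * np.log(1 - output)))


def gradients(X, t, W1, b1, W2, b2):
    """Backpropagation: gradients of the summed cross-entropy error."""
    hidden, output = forward(X, W1, b1, W2, b2)
    delta_out = output - t  # output-node error
    # Hidden-layer error; hidden * (1 - hidden) is g'(.) for the sigmoid.
    delta_hidden = (delta_out @ W2.T) * hidden * (1 - hidden)
    return (
        X.T @ delta_hidden,
        delta_hidden.sum(axis=0),
        hidden.T @ delta_out,
        delta_out.sum(axis=0),
    )


# Toy data (assumption): label is 1 when the mean of the 4 metrics exceeds 0.5.
rng = np.random.default_rng(0)
X = rng.random((200, 4))
t = (X.mean(axis=1, keepdims=True) > 0.5).astype(float)

W1, b1 = rng.normal(scale=0.5, size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(scale=0.5, size=(8, 1)), np.zeros(1)

learning_rate = 0.002
initial_error = cross_entropy(forward(X, W1, b1, W2, b2)[1], t)
for _ in range(1000):
    gW1, gb1, gW2, gb2 = gradients(X, t, W1, b1, W2, b2)
    W1 -= learning_rate * gW1
    b1 -= learning_rate * gb1
    W2 -= learning_rate * gW2
    b2 -= learning_rate * gb2
final_error = cross_entropy(forward(X, W1, b1, W2, b2)[1], t)
print(initial_error, final_error)
```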
Recurrent Neural Network
Long Short Term Memory
source: colab.github.io
Keras Library based on Google’s TensorFlow
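The slide's LSTM diagram was lost in extraction, and the Keras code used by the authors is not shown. To illustrate what one LSTM time step computes, here is a NumPy sketch of the standard cell equations (without peephole connections). The weight layout, sizes, and input sequence are assumptions, and this is not the Keras implementation:

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4*H, D) input weights, U: (4*H, H) recurrent weights, b: (4*H,) biases,
    stacked in the order input gate, forget gate, output gate, candidate.
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])          # input gate
    f = sigmoid(z[H:2 * H])      # forget gate
    o = sigmoid(z[2 * H:3 * H])  # output gate
    g = np.tanh(z[3 * H:4 * H])  # candidate cell state
    c = f * c_prev + i * g       # new cell state keeps part of the old memory
    h = o * np.tanh(c)           # new hidden state
    return h, c


# Run the cell over a short synthetic single-feature sequence.
rng = np.random.default_rng(0)
D, H = 1, 3  # one input feature, three hidden units (illustrative sizes)
W = rng.normal(scale=0.5, size=(4 * H, D))
U = rng.normal(scale=0.5, size=(4 * H, H))
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
for value in [0.2, 0.4, 0.9, 0.7]:
    h, c = lstm_step(np.array([value]), h, c, W, U, b)
print(h)
```

The cell state `c` is carried from step to step, which is how the model can capture sequential features of the data.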
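Performance Metrics
Precision
$$\mathrm{Precision} = \frac{\text{True positive}}{\text{True positive} + \text{False positive}}$$
Recall
$$\mathrm{Recall} = \frac{\text{True positive}}{\text{True positive} + \text{False negative}}$$
F-Score
$$F\text{-}\mathrm{score} = \frac{2 \cdot \mathrm{Recall} \cdot \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}}$$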
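A small sketch of these three metrics, computed from predicted and actual breach flags. The label vectors are made up for illustration:

```python
def precision_recall_fscore(predicted, actual):
    """Compute precision, recall and F-score for binary breach flags."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if not p and a)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    total = precision + recall
    f_score = 2 * recall * precision / total if total else 0.0
    return precision, recall, f_score


# Hypothetical ground truth and predictions (1 = SLO breach).
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

print(precision_recall_fscore(predicted, actual))
```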
Video Streaming
3 streaming VMs that can be accessed through an OVS network by 3 different clients.
The OVS switches are connected through an OpenDaylight controller.
The capacity of the OVS links can be controlled via local scripts.
Dataset
Monitoring system: Monasca (MONitoring At SCAle)
Observation window: 1 month
Sampling period: 30 seconds
Number of entry lines per metric: 100,000
Number of features/metrics per VM: 26, of which only 4 were used
SLA data: YAML file translated into a dictionary structure of (key, value) pairs
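As a quick consistency check on the dataset figures, the arithmetic below relates the 30-second sampling period to the number of samples per metric. The 30-day month is an assumption; the result suggests the reported 100,000 lines and 1-month window are rounded figures:

```python
SAMPLING_PERIOD_S = 30
SECONDS_PER_DAY = 24 * 60 * 60

# Samples collected per metric in one day and in an assumed 30-day month.
samples_per_day = SECONDS_PER_DAY // SAMPLING_PERIOD_S
samples_30_days = 30 * samples_per_day

# Number of days needed to reach the 100,000 entry lines reported on the slide.
days_for_100k = 100_000 / samples_per_day

print(samples_per_day, samples_30_days, round(days_for_100k, 1))
```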
Evaluation Result (1/3)
Results of the offline evaluation of the FFNN with three different SLO breach thresholds:
“Don’t want to miss anything”
“Sure of everything”
Standard case
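The three cases can be read as three settings of a decision threshold. The sketch below assumes the threshold is applied to a predicted violation probability, which the slides do not state explicitly, and uses made-up probabilities and threshold values. It shows how a low threshold misses nothing but raises false alarms, while a high threshold does the opposite:

```python
def classify(probabilities, alpha):
    """Flag a breach when the predicted violation probability reaches alpha."""
    return [p >= alpha for p in probabilities]


def missed_and_false_alarms(flags, actual):
    """Count real breaches that were not flagged and flags that were not breaches."""
    missed = sum(1 for f, a in zip(flags, actual) if a and not f)
    false_alarms = sum(1 for f, a in zip(flags, actual) if f and not a)
    return missed, false_alarms


# Hypothetical predicted probabilities and ground truth (1 = real breach).
probabilities = [0.95, 0.85, 0.70, 0.60, 0.45, 0.35, 0.20, 0.10]
actual = [1, 1, 1, 0, 1, 0, 0, 0]

# Hypothetical thresholds: permissive, intermediate, strict.
for alpha in (0.3, 0.5, 0.8):
    print(alpha, missed_and_false_alarms(classify(probabilities, alpha), actual))
```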
Evaluation Result (2/3)
[Figure: real values, predictions and breaches, with Alpha = 0.8]
Evaluation Result (3/3)
Comparison between 2 Artificial Intelligence methods:
FeedForward Neural Network (Back Propagation NN): FFNN with 20 layers
Long Short Term Memory (Recurrent NN): LSTM with one input (cpu.percentage)
VoIP on vIMS
SLO Violation Identification
The 3 types of SLO violation are detected with a decision tree algorithm.
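The slides do not say which decision tree implementation or which features were used. As an illustration of multi-class classification with a decision tree, here is a small pure-Python tree grown with Gini impurity. The two features (call setup delay in ms, packet loss in %) and the three class labels are hypothetical:

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Find the (feature, threshold) pair with the lowest weighted impurity."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [l for r, l in zip(rows, labels) if r[feature] <= threshold]
            right = [l for r, l in zip(rows, labels) if r[feature] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively grow a tree; leaves hold the majority class."""
    can_split = depth < max_depth and gini(labels) > 0
    split = best_split(rows, labels) if can_split else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    _, feature, threshold = split
    left = [(r, l) for r, l in zip(rows, labels) if r[feature] <= threshold]
    right = [(r, l) for r, l in zip(rows, labels) if r[feature] > threshold]
    return (
        feature,
        threshold,
        build_tree([r for r, _ in left], [l for _, l in left], depth + 1, max_depth),
        build_tree([r for r, _ in right], [l for _, l in right], depth + 1, max_depth),
    )


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's class label."""
    while isinstance(tree, tuple):
        feature, threshold, left, right = tree
        tree = left if row[feature] <= threshold else right
    return tree


# Hypothetical training samples: [call_setup_delay_ms, packet_loss_pct].
rows = [
    [900, 0.1], [950, 0.2], [880, 0.3],  # high delay only
    [120, 5.0], [150, 6.5], [100, 4.8],  # high loss only
    [870, 5.5], [920, 6.0], [940, 4.9],  # both high
]
labels = ["slo_1"] * 3 + ["slo_2"] * 3 + ["slo_3"] * 3

tree = build_tree(rows, labels)
print(predict(tree, [910, 0.2]), predict(tree, [130, 5.5]), predict(tree, [900, 5.2]))
```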
Results
Anomaly classes (SLOs)   Occurrence of SLO violation   Precision   Recall   F-score
1                        60                            0.822       0.879    0.849
2                        60+120                        0.890       0.917    0.903
3                        60+120+80                     0.909       0.913    0.910
According to the F-score (0.849 to 0.910), the method identifies SLO violations accurately.
It still suffers from some false positives (lower precision) and some false negatives (lower recall).
Summary (1)
High accuracy with respect to both precision and recall metrics.
Threshold α is a key tuning parameter in the framework.
Use of Multi-Class Decision Tree (IR3)
LSTM is very robust, at the expense of longer training and with a risk of over-fitting.
Next
Focus on Recovery Actions
Policy Engine for Actuation
Summary (2)
Machine Learning appears to be very efficient for SLA Management.
Other use cases addressed in the CogNet project using ML techniques:
Anomaly Detection
Dynamic Adaptation to Anomalies
Flexible Topology and Routing
Service Function Chaining
References
H2020 CogNet, http://www.cognet.5g-ppp.eu/
J. Bendriss, I. Grida Ben Yahia, D. Zeghlache, “Forecasting and Anticipating SLO Breaches in Programmable Networks”, in Proc. ICIN 2017, Paris, France, March 2017
J. Bendriss, I. Grida Ben Yahia, P. Chemouil, D. Zeghlache, “AI for SLA Management in Programmable Networks”, in Proc. DRCN 2017, Munich, Germany, March 2017
T. S. Buda, H. Assem, L. Xu, “ADE: An ensemble approach for early anomaly detection”, in Proc. IM 2017, Lisbon, Portugal, May 2017