Diagnosis and situation assessment in self-adaptive networked systems Louise Travé-Massuyès


Page 1: Diagnosis and situation assessment in self-adaptive ... · The process of identifying or determining the nature and root cause of a failure, problem, or disease from the symptoms

Diagnosis and situation assessment in self-adaptive networked systems

Louise Travé-Massuyès

ADREAM

Modelling: mathematical models

Optimization: optimisation tools for sizing, routing, scheduling

Control: advanced control, diagnosis and supervision laws

Which topics for MOCOSY? What does diagnosis bring?

Massive networked objects and artefacts

Network

How the local interactions and the network structure influence the properties of the global emerging system is still not completely understood.

  Survey of scientific topics arising from networked systems

  Illustration with some on-going research activities

  Focus on how diagnosis applies to the field


Diagnosis: an observation task

DIAGNOSIS, from Ancient Greek διαγνωστικός (diágnostikos): δια- (dia-), "apart, split" + γνῶσις (gnosis), "knowledge"

  Monitoring the system
  Isolating and identifying faults / characteristic situations
  Goal: maintaining the system's vital functions

The process of identifying or determining the nature and root cause of a failure, problem, or disease from the symptoms arising from selected observations, checks or tests.


Principles of Model based Diagnosis

[Figure: the physical system produces the observed behaviour, the system model produces the predicted behaviour, and the two are compared.]

  Fault detection: is there a fault?
  Fault isolation: where is the fault?
  Fault identification: what is the fault?

Diagnosis: more formally

  Given a model of the system SD
  Given a set / sequence of observations OBS

  Fault detection: trigger an alarm when SDnorm ∪ OBS is inconsistent

  Diagnosis: find ∆, a representation of the faulty situation, consistent with the observations, i.e. such that SD ∪ OBS ∪ ∆ is satisfiable
    Fault isolation: ∆ = {AB(c) | c ∈ Δ} ∪ {¬AB(c) | c ∈ COMPS \ Δ}
    Fault identification: ∆ = {mi(c) | c ∈ COMPS}, i.e. a behavioural mode mi(c) is assigned to every component
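The definitions above can be illustrated with a brute-force sketch. It assumes a hypothetical system of two inverters in series (`inv1`, `inv2`) with an unobserved wire `y` between them, and a weak fault model in which an abnormal component may output anything; none of these names come from the slides. Fault detection tests the all-normal assignment, and diagnosis enumerates every AB assignment that is consistent with the observations.

```python
from itertools import product

# Hypothetical system: x -> inv1 -> y -> inv2 -> z, where y is not observed.
COMPS = ["inv1", "inv2"]


def consistent(ab, obs):
    """True if SD ∪ OBS ∪ ∆ is satisfiable, i.e. some value of y fits."""
    for y in (0, 1):
        ok1 = ab["inv1"] or y == 1 - obs["x"]   # ¬AB(inv1) implies y = not x
        ok2 = ab["inv2"] or obs["z"] == 1 - y   # ¬AB(inv2) implies z = not y
        if ok1 and ok2:
            return True
    return False


def fault_detected(obs):
    """Alarm when the all-normal model is inconsistent with OBS."""
    return not consistent({c: False for c in COMPS}, obs)


def diagnoses(obs):
    """All sets of abnormal components consistent with OBS."""
    result = []
    for flags in product([False, True], repeat=len(COMPS)):
        ab = dict(zip(COMPS, flags))
        if consistent(ab, obs):
            result.append(frozenset(c for c in COMPS if ab[c]))
    return result


if __name__ == "__main__":
    obs = {"x": 0, "z": 1}   # a healthy chain would give z = 0
    print(fault_detected(obs))
    print(diagnoses(obs))
```

Because `y` is not observed, the faulty observation cannot tell `inv1` from `inv2`, which is the ambiguity discussed on the next slide. Exhaustive enumeration is exponential in the number of components and is only meant to make the definitions concrete.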


Diagnoses

In practice, given the observations OBS, several diagnoses ∆i are possible:

Diag(SD, OBS) = ∪i ∆i

  One diagnosis only: certainty
  Several diagnoses: ambiguity

  Additional observations are necessary to resolve the ambiguity


Diagnosability: about ambiguity

[Figure: faults f1 and f2 with their signatures Sig(f1) and Sig(f2), and observations OBS1 and OBS2.]

Diagnosability is the capability of a system and its monitors to exhibit different observables for different anticipated faulty situations.

  Diagnosability is a property to be checked
  It provides the formal guarantee that an anticipated fault can always be diagnosed
  There are several formal definitions according to the modeling formalism for SD
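One simple reading of this property, for a finite set of anticipated faults, is that their signatures must be pairwise disjoint. The sketch below assumes that each signature is given as a plain set of observable labels (a hypothetical encoding; the formal definition depends on the formalism chosen for SD) and reports which pairs of faults remain ambiguous.

```python
from itertools import combinations


def ambiguous_pairs(signatures):
    """Pairs of faults whose signatures share at least one observable."""
    return [
        (f, g)
        for f, g in combinations(signatures, 2)
        if signatures[f] & signatures[g]
    ]


def is_diagnosable(signatures):
    """True if every pair of anticipated faults has disjoint signatures."""
    return not ambiguous_pairs(signatures)


if __name__ == "__main__":
    # Hypothetical signatures: sets of observable labels per fault.
    separated = {"f1": {"OBS1"}, "f2": {"OBS2"}}
    overlapping = {"f1": {"OBS1", "OBS3"}, "f2": {"OBS2", "OBS3"}}
    print(is_diagnosable(separated))
    print(ambiguous_pairs(overlapping))
```

The pairwise comparison is quadratic in the number of faults, so the check stays cheap as long as the signatures themselves can be enumerated.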

[Figure, built up over two further slides with the same title and text: f1 and f2 are shown with Sig(f1), OBS1 and Sig(f2), OBS2, and the diagnosable case is marked Sig(f1) ∩ Sig(f2) = ∅; the last build shows faults f1 to f8 against the observation space OBS.]

Network and artefacts

  Control / Observation of networks
  Control / Observation over networks
  Multi-agent systems


Control / Observation of networks

  Resource allocation problems
    Call admission
    Scheduling
    Routing

  Theoretical understanding of network congestion control
    Mathematical models for flow control under various protocols
    Fluid flow models for analysis and design, possibly including effects of time delays and nonlinearities
    Scalable and distributed optimization control algorithms

Provide QoS while achieving efficient and fair utilization of network resources


Multiple time-delay system modeling and control for router management

  TCP (Transmission Control Protocol): an end-to-end congestion control mechanism

  Upon receipt of an ack (or not), the source increases (or decreases) its sending rate (AIMD algorithm: Additive Increase, Multiplicative Decrease)

  When the buffer overflows, packets are dropped
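The AIMD rule can be sketched in a few lines. This is a simplified illustration that acts directly on a sending rate once per feedback event; real TCP implementations adjust a congestion window, and the increment, decrease factor and floor used here are illustrative assumptions rather than values from the slides.

```python
def aimd_step(rate, ack_received, increase=1.0, decrease=0.5, min_rate=1.0):
    """One AIMD update: add on success, multiply down on loss."""
    if ack_received:
        return rate + increase                 # additive increase
    return max(min_rate, rate * decrease)      # multiplicative decrease


def run_aimd(acks, initial_rate=10.0):
    """Apply AIMD to a sequence of ack flags and return the rate history."""
    rate = initial_rate
    history = [rate]
    for ack in acks:
        rate = aimd_step(rate, ack)
        history.append(rate)
    return history


if __name__ == "__main__":
    # Two acks, one loss (for instance a buffer overflow), then one ack.
    print(run_aimd([True, True, False, True]))
```

The resulting sawtooth (slow climb, sharp drop at each loss) is the behaviour that the fluid-flow models on the following slides approximate.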

Y. ARIBA, F. GOUAISBAUT, Y. LABIT, Feedback control for router management and TCP/IP network stability, IEEE Transactions on Network and Service Management, Vol. 6/4, December 2009.


Starting from a fluid-flow model, non-linear multiple time-delay systems theory is used to design an AQM (Active Queue Management) mechanism.

Control law for the dropping probability: [equation not recoverable from the slide text]

  xi = rate of source i (pkts/s)
  b = queue length (pkts)
  τi = round trip time (RTT) (s)
  τi^f = forward delay (s)
  τi^b = backward delay (s)
  C = link capacity (pkts/s)
  N = number of TCP connections
  pi = dropping probability

Traffic monitoring and CBR anomaly detection

The fluid-flow model used for the AQM design is extended with an additive term + d(t).

The CBR (constant bit rate) anomaly represents a flooding attack. It is modeled as a piecewise-constant function d(t).


An approach based on an unknown input observer allows us to detect and identify the malicious flow.

S. RAHME, Y. LABIT, F. GOUAISBAUT, An unknown input sliding observer for anomaly detection in TCP/IP network, International Conference on Ultra Modern Telecommunications, ICUMT 2009, Saint Petersburg (Russia), 12-14 October 2009.
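To give an intuition of how an unknown input d(t) can be recovered from measurements, here is a deliberately naive sketch. It is not the sliding observer of the cited paper. It assumes a discretised queue equation b[k+1] = b[k] + dt · (Σ xi[k] − C + d[k]), which is an assumption since the model equations do not survive in the slide text, and it assumes that the aggregate source rate and the queue length are both measured without noise. The disturbance is then obtained by direct inversion and compared with a threshold.

```python
def simulate_queue(total_rate, capacity, disturbance, dt, b0):
    """Queue length under the assumed model b' = sum(x_i) - C + d(t)."""
    queue = [b0]
    for x, d in zip(total_rate, disturbance):
        queue.append(queue[-1] + dt * (x - capacity + d))
    return queue


def estimate_disturbance(queue, total_rate, capacity, dt):
    """Invert the queue equation to estimate d at every step."""
    return [
        (queue[k + 1] - queue[k]) / dt - (total_rate[k] - capacity)
        for k in range(len(queue) - 1)
    ]


def detect(d_hat, threshold):
    """Flag the steps where the estimated extra traffic exceeds the threshold."""
    return [abs(d) > threshold for d in d_hat]


if __name__ == "__main__":
    rates = [100.0] * 6                           # aggregate TCP rate (pkts/s)
    flood = [0.0, 0.0, 0.0, 20.0, 20.0, 20.0]     # piecewise-constant d(t)
    queue = simulate_queue(rates, capacity=100.0, disturbance=flood, dt=0.1, b0=50.0)
    d_hat = estimate_disturbance(queue, rates, capacity=100.0, dt=0.1)
    print(detect(d_hat, threshold=5.0))
```

Direct differentiation amplifies measurement noise and ignores delays and buffer saturation, which is one reason an observer-based design is preferable in practice.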

Transport service self-adaptation through micro-protocol composition

In standard transport services, mechanisms offering different functionalities are merged within the same monolithic implementation.

Component-based composable transport services

Congestion control & Partial Reliability TFRC & PR

Van Wambeke N, Armando F, Chassot C, Exposito E. A model-based approach for self-adaptive Transport protocols. Elsevier Computer Communications, Special issue on end-to-end support over heterogeneous wired and wireless network, vol. 31, n°11, July 2008, pp. 2699-2705.

Chronicle based situation assessment for self-adapting strategies

Relevant situations are generally expressed as temporally constrained event patterns, called chronicles.

Chronicle for packet_loss

Self-adapting strategy from TFRC to TD-TFRC

Chronicle for agreement
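A chronicle can be pictured as a list of expected events with bounds on the delay between them. The sketch below handles only the simplest case, a chain of events with a minimum and maximum gap between consecutive ones, and searches a time-sorted log with backtracking. The event names are hypothetical; the actual packet_loss and agreement chronicles are figures that are not recoverable from the slide text, and general chronicle recognition supports richer constraint graphs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    event: str       # expected event type
    min_gap: float   # minimum delay since the previous matched event
    max_gap: float   # maximum delay since the previous matched event


def recognize(chronicle, log, start=0, prev_time=None):
    """True if the chain of steps occurs in the time-sorted log.

    log is a list of (timestamp, event) pairs sorted by timestamp.
    The gap bounds of the first step are ignored.
    """
    if not chronicle:
        return True
    step, rest = chronicle[0], chronicle[1:]
    for i in range(start, len(log)):
        time, event = log[i]
        if event != step.event:
            continue
        if prev_time is not None:
            gap = time - prev_time
            if gap < step.min_gap:
                continue
            if gap > step.max_gap:
                break   # the log is sorted, so later events are even further away
        if recognize(rest, log, i + 1, time):
            return True
    return False


if __name__ == "__main__":
    # Hypothetical chronicle: a sequence gap, then a timeout within 2 s,
    # then a retransmission request within 1 s.
    chronicle = [
        Step("seq_gap", 0.0, float("inf")),
        Step("timeout", 0.0, 2.0),
        Step("retransmit_request", 0.0, 1.0),
    ]
    log = [(0.0, "seq_gap"), (1.5, "timeout"), (2.0, "retransmit_request")]
    print(recognize(chronicle, log))
```

Once a chronicle is recognized, the corresponding situation can trigger an adaptation, such as the switch between transport mechanisms mentioned on this slide.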

A. SUBIAS, E. EXPOSITO, C. CHASSOT, L. TRAVE-MASSUYES, K. DRIRA, Self-adapting Strategies guided by Diagnosis and Situation Assessment in Collaborative Communicating Systems, submitted to the 21st International Workshop on Principles of Diagnosis (DX-10), Portland (USA), October 2010.


Distributed Load Balancing and Game Theory

  Dispatchers take independent decisions to minimize:

  Processor sharing servers

•  Objective : compare distributed decision making with globally optimal solution

•  Result : distributed solution is at most √K worse than the global optimum

U. AYESTA, O. BRUN, B. PRABHU, Price of Anarchy in Non-Cooperative Load Balancing, 29th Annual International Conference on Computer Communications (IEEE INFOCOM 2010), San Diego (USA), 15-19 March 2010, 6p.

Long research experience in telecommunications

  End-to-End Network Simulation
    Differential Traffic Theory & Hybrid Simulation (International Patent)
    Stochastic Models of Population behaviour and multimedia traffic sources …
    Queueing models for Data Center Simulation

  Optimisation
    Topology Design for resilient networks (access and backbone)
    Capacity Planning
    Optimisation of Internet Routing Protocols
    Traffic Engineering and Quality of Service (MPLS)

  The spin-off company QoS Design (www.qosdesign.com)
    Founders: 3 researchers from LAAS-CNRS
    National awards: the company was awarded 4 times
    Products: NEST, a software suite for the Simulation/Planning/Supervision of Next Generation Networks
    Worldwide market: telecom operators, enterprises with large scale WANs, datacenters
    Partners/Customers: SFR, Vodafone, Alcatel, Maroc Telecom, British Telecom, French Defense, NextiraOne, EADS-DS …

Control / Observation over networks


General architecture

[Figure: several processes, each with its actuators (A) and sensors (S), exchange data with the controller through the network.]

Finite capacity links which may suffer disturbances

  Time delays
  Packet loss
  Message errors

  Congestion
  Collision
  Medium disturbances

Influence of the network QoS on the controller: towards co-design

QoC (Quality of Control) depends on the QoS (Quality of Service) provided by the network

  Control application properties (stability, response time, …) depend on the network QoS

  Network resource allocation may be dynamically tuned to the application needs

  The idea is to adapt the flow or message priorities as a (nonlinear) function of the control application performance over time

Co-design strategies: the Hybrid Priority scheme for CAN networks

  MAC (Medium Access Control) layer: message scheduling
  Hybrid Priority Scheme
    static priority for a flow
    dynamic priority for the messages of a flow
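The two priority levels can be pictured as two fields of the CAN identifier. The sketch below rests on several assumptions that are not stated on the slide: the dynamic (urgency) field occupies the most significant bits, the static flow priority occupies the least significant bits and is unique per flow, each field is 4 bits wide, and the urgency is a saturating function of the control error. The only CAN fact relied on is that the lowest identifier wins bus arbitration.

```python
STATIC_BITS = 4    # assumed width of the static flow-priority field
DYNAMIC_BITS = 4   # assumed width of the dynamic message-priority field


def dynamic_priority(control_error, error_scale=1.0):
    """Map |error| to a level where 0 is the most urgent (lowest value wins)."""
    max_level = (1 << DYNAMIC_BITS) - 1
    urgency = min(1.0, abs(control_error) / error_scale)   # saturating, hence nonlinear
    return max_level - int(urgency * max_level)


def can_identifier(static_priority, control_error, error_scale=1.0):
    """Concatenate the dynamic field (MSBs) and the static field (LSBs)."""
    return (dynamic_priority(control_error, error_scale) << STATIC_BITS) | static_priority


def arbitrate(pending):
    """pending: list of (flow name, identifier); the lowest identifier wins."""
    return min(pending, key=lambda message: message[1])


if __name__ == "__main__":
    # Flow B has the lower static priority but a large control error.
    pending = [
        ("A", can_identifier(static_priority=1, control_error=0.2)),
        ("B", can_identifier(static_priority=2, control_error=1.0)),
    ]
    print(arbitrate(pending))
```

With equal urgency the static field alone decides, so identifiers stay unique and arbitration remains deterministic.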

Now extending to wireless networks and the Network layer of Mesh networks

X. NGUYEN, G. JUANOLE, G. MOUNEY, C. CALMETTES, Networked Control System (NCS) on a network CAN: on the Quality of Service (QoS) and Quality of Control (QoC) provided by different message scheduling schemes based on hybrid priorities, International Workshop on Factory Communication Systems (WFCS 2010), Nancy (France), 18-21 May 2010, pp. 261-270.

The hybrid priority scheme applied to fault detection

Output error: ε_k = y_k − ŷ_k

Residual: r_k = T ε_k

ε_{k+1} = (Φ − LC) ε_k + Φ f_k + Φ B Δu_k τ_k

The Hybrid Priority Scheme reduces the false alarm rate
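Residual generation of this kind can be sketched on a scalar example. The code below is a simplified illustration under stated assumptions: a first-order plant x_{k+1} = Φ x_k + B u_k + f_k with an additive fault, no network-induced delay term, and numerical values chosen only for the example. It runs a Luenberger-type observer alongside the plant and returns r_k = T ε_k.

```python
def run_residuals(phi, b, c, gain, t, inputs, faults, x0=0.0):
    """Simulate plant and observer, returning the residual sequence r_k."""
    x, x_hat = x0, x0
    residuals = []
    for u, f in zip(inputs, faults):
        eps = c * x - c * x_hat                        # output error y_k - y_hat_k
        residuals.append(t * eps)                      # residual r_k = T * eps_k
        x_next = phi * x + b * u + f                   # plant with additive fault f_k
        x_hat = phi * x_hat + b * u + gain * eps       # observer update with gain L
        x = x_next
    return residuals


def alarms(residuals, threshold):
    """Raise an alarm whenever the residual magnitude exceeds the threshold."""
    return [abs(r) > threshold for r in residuals]


if __name__ == "__main__":
    residuals = run_residuals(
        phi=0.9, b=1.0, c=1.0, gain=0.5, t=1.0,
        inputs=[1.0] * 6,
        faults=[0.0, 0.0, 0.0, 1.0, 1.0, 1.0],        # fault appears at k = 3
    )
    print(alarms(residuals, threshold=0.5))
```

In a networked setting the delay term also excites the residual, so a fixed threshold can fire without any fault; this is the false alarm problem that the slide says the message scheduling scheme reduces.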

G. JUANOLE, G. MOUNEY, D. SAUTER, C. AUBRUN, C. CALMETTES, Decision Making Improvement for Diagnosis in Networked Control Systems based on Dynamic Message Scheduling, 18th Mediterranean Conf. on Control and Automation MED 2010, Marrakech, June 2010.

Diagnosis for Multi-Agent Systems

Diagnostic architectures


Number of diagnoses

Remember that, given the observations OBS, several diagnoses ∆i are possible:

Diag(SD, OBS) = ∪i ∆i

  The model constrains the possible diagnoses
    MCi ⊆ Msys
    |Diag(MCi, OBS)| ≥ |Diag(Msys, OBS)|

  The observations constrain the possible diagnoses
    Oi ⊆ O
    |Diag(Msys, Oi)| ≥ |Diag(Msys, O)|
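Both inequalities express the same monotonicity: removing constraints, whether they come from the model or from the observations, can only enlarge the set of consistent diagnoses. A minimal sketch, assuming that the model and the observations have already been compiled into predicates over mode assignments (the two predicates below are hypothetical):

```python
from itertools import product

MODES = ("ok", "faulty")


def diag(constraints, n_components):
    """All mode assignments that satisfy every constraint."""
    return [
        modes
        for modes in product(MODES, repeat=n_components)
        if all(constraint(modes) for constraint in constraints)
    ]


def conflict_on_first_two(modes):
    """Hypothetical constraint: component 0 or component 1 must be faulty."""
    return "faulty" in modes[:2]


def third_is_ok(modes):
    """Hypothetical constraint: an extra observation exonerates component 2."""
    return modes[2] == "ok"


if __name__ == "__main__":
    partial = diag([conflict_on_first_two], 3)
    full = diag([conflict_on_first_two, third_is_ok], 3)
    print(len(partial), len(full))
```

Every assignment that satisfies the full constraint set also satisfies any subset of it, which is why a local diagnoser with a partial model or partial observations is never less ambiguous than the global one.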

Coordinated diagnosis

  Each agent i knows the global model Msys but has only partial observations Oi

  The coordinator
    Knows the global model Msys and the observability of each agent
    Recombines the diagnosis candidates based on the global model

Decentralized / Distributed diagnosis

  Compute local diagnoses with the local model MCi and the local observations Oi
    Get Diag(MCi, Oi), i = 1, …, N
    Local diagnoses are locally consistent

  Compute global diagnoses from local diagnoses
    Account for the constraints of the adjacent local models and for other observations
    Get Diag(Msys, O)
    Global diagnoses are globally consistent

Decentralized case: the computation of global diagnoses is orchestrated by a supervisor
Distributed case: the computation of global diagnoses is achieved through communication between the local diagnosers
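The second step, going from locally consistent to globally consistent diagnoses, can be sketched as a join over shared variables. The encoding is an assumption made for illustration: each local diagnosis is a dictionary that assigns values to the local component modes and to the interface variables shared with neighbouring subsystems, and a global diagnosis picks one local diagnosis per subsystem such that all shared variables agree. This corresponds to the supervisor-driven (decentralized) case and is not the algorithm of any cited project.

```python
from itertools import product


def compatible(a, b):
    """True if two partial assignments agree on every shared variable."""
    return all(a[key] == b[key] for key in a.keys() & b.keys())


def merge_local_diagnoses(local_sets):
    """Combine one local diagnosis per subsystem into global diagnoses.

    local_sets: list with one entry per subsystem, each entry being a
    list of partial assignments (dict: variable name -> value).
    """
    global_diagnoses = []
    for combo in product(*local_sets):
        merged = {}
        for local in combo:
            if not compatible(merged, local):
                break
            merged.update(local)
        else:
            global_diagnoses.append(merged)
    return global_diagnoses


if __name__ == "__main__":
    # Hypothetical two-subsystem example sharing the interface variable "link".
    sub1 = [{"c1": "ok", "link": "good"}, {"c1": "faulty", "link": "bad"}]
    sub2 = [{"c2": "ok", "link": "bad"}, {"c2": "faulty", "link": "good"}]
    for diagnosis in merge_local_diagnoses([sub1, sub2]):
        print(diagnosis)
```

The cartesian product is exponential in the number of subsystems; practical schemes prune it by only consulting the subsystems that are actually blamed or affected.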

WS-DIAMOND: Web Services DIAgnosability, MONitoring and Diagnosis (IST 2005-2008)

Modeling WSs (diagnosis purpose)

  WS description: workflow + data dependencies
  Component oriented model: components + structure
  Constraint model: set of constraints

[Figure: structural model and Activity 4 model, with M4 the mode variable, O1 and O2 the input variables, and Y and Z the output variables.]

WS-DIAMOND TEAM. WS-DIAMOND: Web Services – DIAgnosability, MONitoring and Diagnosis, in "At your service: An overview of results of projects in the field of service engineering of the IST programme", MIT Press Series on Information Systems, Chapter 9, J. Mylopoulos and M. Papazoglou (Eds.), 2009.


Starting Diagnosis Upon Alarms

An alarm is raised. The corresponding local diagnoser wakes up.

The awakened local diagnoser computes local candidate diagnoses.

Console L., Picardi C. and Theseider Dupré D. A Framework for Decentralized Qualitative Model-Based Diagnosis. Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI-07), January 6-12 2007, Hyderabad, India.

(From WS-Diamond Project)


Local Candidate Diagnosis

A local candidate diagnosis contains three elements:

hypotheses on local behaviour

blames on other (input) services

consequences of hypotheses on other (output) services


The Role of the Supervisor

  COLLECT: local candidate diagnoses

  QUESTION: ask for blame explanation
    The blamed local diagnosers extend candidate diagnoses

  VALIDATE: ask for consequence validation

Thanks to the admissibility property, unneeded local diagnosers are not involved and diagnosis is restricted to the needed parts of the system.

Distributed diagnosability analysis

The same algorithm is used

[Figure: a local diagnoser (LD) maps local observations to partial diagnoses; in diagnosability analysis it maps a partial fault mode to a partial signature.]

Diagnosability analysis compares (partial) signatures for discriminability

Algorithm efficiency relies on avoiding as many comparisons as possible

X. Pucel, S. Bocconi, C. Picardi, D. Theseider Dupré, L. Travé-Massuyès. Diagnosability analysis for web services with constraint-based models. 18th International Workshop on Principles of Diagnosis (DX'07), Nashville (USA), May 29-31, 2007, pp. 360-367.

Local diagnosability and accuracy

  Local diagnosability: a fault mode F is locally diagnosable in a subsystem MCi if it always results in a set / sequence of local observations such that we can diagnose F with certainty
    F diagnosable in MCi ⇒ F diagnosable in Msys

  Accuracy: the diagnosis of a subsystem MCi w.r.t. a fault mode F is accurate if it is as ambiguous as the global diagnosis
    The local diagnosability degree of F is equal to the global diagnosability degree

P. RIBOT, Y. PENCOLE, M. COMBACAU, Design requirements for the diagnosability of distributed discrete event systems, 19th International Workshop on Principles of Diagnosis (DX-08), Blue Mountains (Australia), 22-24 September 2008, pp. 347-354.

A design-oriented algorithm

  Cost related to sensor placement
    CD: to make the subsystem diagnosable
    CA: to make the diagnosis accurate

  Cost related to communication protocols
    CM: induced by the diagnosis architecture

For every fault Fi, find the smallest subsystem which can be made diagnosable and accurate at minimum cost.

Model of a 3-component system: Γ1, Γ2, Γ3
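One possible reading of this search, sketched under explicit assumptions: candidate subsystems are all sets of components containing the one where the fault occurs, they are explored by increasing size, and among candidates of the same size the one with the lowest total cost CD + CA + CM is kept. The cost table is entirely hypothetical, `None` marks a subsystem that cannot be made diagnosable and accurate, and structural constraints such as connectivity are ignored. The cited paper may proceed differently.

```python
from itertools import combinations


def smallest_subsystem(components, faulty_component, cost):
    """Return (subsystem, total cost) for the smallest feasible subsystem.

    cost(subsystem) returns a tuple (cd, ca, cm), or None if the subsystem
    cannot be made diagnosable and accurate.
    """
    others = [c for c in components if c != faulty_component]
    for size in range(len(others) + 1):
        best = None
        for extra in combinations(others, size):
            subsystem = frozenset((faulty_component,) + extra)
            costs = cost(subsystem)
            if costs is None:
                continue
            total = sum(costs)
            if best is None or total < best[1]:
                best = (subsystem, total)
        if best is not None:
            return best   # stop at the first size that admits a solution
    return None


if __name__ == "__main__":
    # Hypothetical (CD, CA, CM) costs for a fault located in G1.
    table = {
        frozenset({"G1"}): None,
        frozenset({"G1", "G2"}): (2, 1, 1),
        frozenset({"G1", "G3"}): (1, 1, 1),
        frozenset({"G1", "G2", "G3"}): (0, 0, 3),
    }
    print(smallest_subsystem(["G1", "G2", "G3"], "G1", table.get))
```

Giving priority to size over cost keeps diagnosis as local as possible; a design that weighs communication cost differently could compare all feasible subsystems on total cost instead.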


Conclusions

  The ADREAM initiative is part of a growing scientific field

  There is space for numerous topics which cross over our fields of expertise

  The design, analysis and operation of networked systems as a whole call for transversal skills and should result in cross-fertilization

  Reliability and safety are at the core
