Diagnosis and situation assessment in self-adaptive networked systems
Louise Travé-Massuyès
ADREAM
Modelling: mathematical models
Optimization: optimisation tools for sizing, routing, scheduling
Control: advanced control, diagnosis and supervision laws
Which topics for MOCOSY? What does diagnosis bring in?
Massive networked objects and artefacts
Network
How the local interactions and the network structure influence the properties of the global emerging system is still not completely understood.
Survey of scientific topics arising from networked systems
Illustration with some on-going research activities
Focus on how diagnosis applies to the field
Diagnosis: an observation task
DIAGNOSIS, from ancient Greek διαγνωστικός (diágnostikos):
δια- (dia-): apart, split + γνῶσις (gnosis): knowledge
Monitoring the system
Isolating and identifying faults / characteristic situations
Goal: maintaining the system’s vital functions
The process of identifying or determining the nature and root cause of a failure, problem, or disease from the symptoms arising from selected observations, checks or tests.
Principles of Model based Diagnosis
Physical system
System model
Observed behaviour
Predicted behaviour
Comparison
Fault detection: is there a fault?
Fault isolation: where is the fault?
Fault identification: what is the fault?
Diagnosis: more formally
Given a model of the system SD
Given a set / sequence of observations OBS
Fault detection: trigger an alarm when SDnorm ∪ OBS is inconsistent
Diagnosis: find ∆, a representation of the faulty situation, consistent with the observations, i.e. such that SD ∪ OBS ∪ ∆ is satisfiable
Fault isolation: ∆ = {AB(c) | c ∈ Δ} ∪ {¬AB(c) | c ∈ COMPS \ Δ}
Fault identification: {mi(c) | c ∈ COMPS}
Diagnoses
In practice, given the observations OBS, several diagnoses ∆i are possible:
Diag(SD, OBS) = ⋃ ∆i
One diagnosis only: certainty
Several diagnoses: ambiguity
Additional observations are necessary to resolve the ambiguity
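The definitions above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: a chain of three components c1, c2, c3 that each add 1 to their input when normal and are unconstrained when abnormal, and the variable names u, x1, x2, y. It enumerates the minimal sets of abnormal components that keep SD ∪ OBS ∪ ∆ satisfiable.

```python
from itertools import combinations

# Hypothetical system: u -> c1 -> x1 -> c2 -> x2 -> c3 -> y
COMPS = ["c1", "c2", "c3"]
OUTPUT = {"c1": "x1", "c2": "x2", "c3": "y"}

def consistent(abnormal, obs):
    """True if SD ∪ OBS is satisfiable when exactly `abnormal` are AB."""
    value = obs.get("u")  # value known on the current wire, or None
    for comp in COMPS:
        predicted = None
        if comp not in abnormal and value is not None:
            predicted = value + 1  # normal behaviour: output = input + 1
        observed = obs.get(OUTPUT[comp])
        if predicted is not None and observed is not None and predicted != observed:
            return False  # prediction contradicts an observation
        value = observed if observed is not None else predicted
    return True

def diagnoses(obs):
    """Minimal sets of abnormal components consistent with obs."""
    found = []
    for size in range(len(COMPS) + 1):
        for cand in combinations(COMPS, size):
            if any(set(d) <= set(cand) for d in found):
                continue  # a smaller diagnosis already explains obs
            if consistent(set(cand), obs):
                found.append(cand)
    return found

# Fault detection: the all-normal hypothesis is inconsistent.
alarm = not consistent(set(), {"u": 0, "y": 5})
# Three single-fault diagnoses: ambiguity.
ambiguous = diagnoses({"u": 0, "y": 5})
# One extra observation on x2 leaves a single diagnosis.
resolved = diagnoses({"u": 0, "x2": 2, "y": 5})
```

With only u and y observed, any single component can explain the discrepancy; observing x2 as well isolates c3.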
Diagnosability: about ambiguity
[Figure: faults f1 and f2 with their signatures Sig(f1) and Sig(f2) and observations OBS1 and OBS2; Sig(f1) ∩ Sig(f2) = ∅. Last panel: faults f1 to f8 and the observation space OBS.]
Diagnosability is the capability of a system and its monitors to exhibit different observables for different anticipated faulty situations.
Diagnosability is a property to be checked
It provides the formal guarantee that an anticipated fault can always be diagnosed
There are several formal definitions, depending on the modeling formalism used for SD
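As a rough illustration of the disjoint-signature condition, the sketch below treats each signature as a plain set of observables and reports fault pairs whose signatures overlap. This set-based reading is only one of the several possible formal definitions, and the fault and observable names are placeholders.

```python
from itertools import combinations

def non_discriminable_pairs(signatures):
    """Fault pairs whose signatures share at least one observable."""
    return [
        (f, g)
        for f, g in combinations(sorted(signatures), 2)
        if signatures[f] & signatures[g]
    ]

def is_diagnosable(signatures):
    """Diagnosable here means: all signatures are pairwise disjoint."""
    return not non_discriminable_pairs(signatures)

# Disjoint signatures: each fault yields its own observables.
disjoint = {"f1": {"OBS1"}, "f2": {"OBS2"}}
# Overlapping signatures: OBS3 can be produced by either fault.
overlapping = {"f1": {"OBS1", "OBS3"}, "f2": {"OBS2", "OBS3"}}
```

In the overlapping case, observing OBS3 would leave both f1 and f2 as candidates, which is exactly the ambiguity that a diagnosability check is meant to rule out at design time.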
Network and artefacts
Control / Observation of networks
Control / Observation over networks
Multi-agent systems
Control / Observation of networks
• Resource allocation problems: call admission, scheduling, routing
• Theoretical understanding of network congestion control
• Mathematical models for flow control under various protocols
• Fluid flow models for analysis and design, possibly including effects of time delays and nonlinearities
• Scalable and distributed optimization control algorithms
Provide QoS while achieving efficient and fair utilization of network resources
Multiple time-delay system modeling and control for router management
TCP (Transmission Control Protocol): an end-to-end congestion control mechanism
Upon receipt of an ack, the source increases its sending rate; when no ack arrives, it decreases it (AIMD algorithm: Additive Increase, Multiplicative Decrease)
When the buffer overflows, packets are dropped
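The AIMD rule can be sketched in a few lines. The step sizes used here (add 1 on an ack, halve on a missing ack) and the starting rate are illustrative assumptions, not values taken from the talk.

```python
def aimd(acks, rate=1.0, increase=1.0, decrease=0.5):
    """Return the sending rate after each round.

    acks[k] is True when an ack arrived in round k, False otherwise.
    """
    history = []
    for ack in acks:
        if ack:
            rate += increase  # additive increase
        else:
            rate *= decrease  # multiplicative decrease
        history.append(rate)
    return history

# Three acks, one loss, then one more ack.
trace = aimd([True, True, True, False, True])
```

The resulting trace shows the familiar sawtooth: slow linear growth followed by a sharp cut whenever a loss is inferred.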
Y. ARIBA, F. GOUAISBAUT, Y. LABIT, Feedback control for router management and TCP/IP network stability, IEEE Transactions on Network and Service Management, Vol. 6/4, December 2009.
Formulating a fluid-flow model, the theory of nonlinear systems with multiple time delays is used to design an AQM (Active Queue Management) mechanism
Control law for the dropping probability: [equation image]
x_i = rate of source i (pkts/s)
b = queue length (pkts)
τ_i = round trip time (RTT) (s)
τ_i^f = forward delay (s)
τ_i^b = backward delay (s)
C = link capacity (pkts/s)
N = number of TCP connections
p_i = dropping probability
Traffic monitoring and CBR anomaly detection
[Figure: the fluid-flow model with an added term + d(t)]
The CBR (constant bit rate) anomaly represents a flooding attack. It is modeled as a piecewise-constant function d(t)
An approach based on an unknown input observer makes it possible to detect and identify the malicious flow.
S. RAHME, Y. LABIT, F. GOUAISBAUT, An unknown input sliding observer for anomaly detection in TCP/IP network, International Conference on Ultra Modern Telecommunications, ICUMT 2009, Saint Petersburg (Russia), 12-14 October 2009.
Transport service self-adaptation through micro-protocol composition
In standard transport services, mechanisms offering different functionalities are merged within the same monolithic implementation.
Component-based composable transport services
Congestion control & Partial Reliability TFRC & PR
Van Wambeke N, Armando F, Chassot C, Exposito E. A model-based approach for self-adaptive Transport protocols. Elsevier Computer Communications, Special issue on end-to-end support over heterogeneous wired and wireless network, vol. 31, n°11, July 2008, pp. 2699-2705.
Chronicle based situation assessment for self-adapting strategies
Relevant situations are generally expressed as temporally constrained event patterns, called chronicles
Chronicle for packet_loss
Self-adapting strategy from TFRC to TD-TFRC
Chronicle for agreement
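A chronicle, read as an ordered event pattern with bounds on the delay between consecutive events, can be recognised with a small backtracking search. The sketch below is only an illustration of that idea: the chronicle format, the event names (timeout, dup_ack, ack) and the time bounds are hypothetical and are not the chronicles shown on the slide.

```python
def recognize(chronicle, events):
    """Match a chronicle against a time-sorted event log.

    chronicle: list of (event_type, max_gap); max_gap bounds the delay
               since the previously matched event (None for the first).
    events:    list of (time, event_type), sorted by time.
    Returns the matched times, or None if the chronicle is not recognised.
    """
    def search(step, start, last_time):
        if step == len(chronicle):
            return []
        etype, max_gap = chronicle[step]
        for i in range(start, len(events)):
            t, e = events[i]
            if last_time is not None and max_gap is not None and t - last_time > max_gap:
                break  # events are sorted: later ones are too late as well
            if e == etype:
                rest = search(step + 1, i + 1, t)
                if rest is not None:
                    return [t] + rest
        return None

    return search(0, 0, None)

# Hypothetical chronicle: timeout, then dup_ack within 2 s, then timeout within 1 s.
pattern = [("timeout", None), ("dup_ack", 2.0), ("timeout", 1.0)]
in_time = [(0.0, "timeout"), (0.5, "ack"), (1.5, "dup_ack"), (2.0, "timeout")]
too_late = [(0.0, "timeout"), (1.5, "dup_ack"), (3.0, "timeout")]

match = recognize(pattern, in_time)
no_match = recognize(pattern, too_late)
```

Unrelated events such as the ack at 0.5 s are simply skipped; only the ordering and the delays between matched events matter.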
A. SUBIAS, E. EXPOSITO, C. CHASSOT, L. TRAVE-MASSUYES, K. DRIRA, Self-adapting Strategies guided by Diagnosis and Situation Assessment in Collaborative Communicating Systems, Submission to 21st International Workshop on Principles of Diagnosis (DX-10), Portland (USA), October 2010.
Distributed Load Balancing and Game Theory
Dispatchers take independent decisions to minimize:
Processor sharing servers
• Objective : compare distributed decision making with globally optimal solution
• Result : distributed solution is at most √K worse than the global optimum
U. AYESTA, O. BRUN, B. PRABHU, Price of Anarchy in Non-Cooperative Load Balancing, 29th Annual International Conference on Computer Communications (IEEE INFOCOM 2010), San Diego (USA), 15-19 March 2010, 6p.
Long research experience in telecommunications
End-to-end network simulation
• Differential Traffic Theory & Hybrid Simulation (International Patent)
• Stochastic models of population behaviour and multimedia traffic sources
• …
• Queueing models for data center simulation
Optimisation
• Topology design for resilient networks (access and backbone)
• Capacity planning
• Optimisation of Internet routing protocols
• Traffic engineering and Quality of Service (MPLS)
The spin-off company QoS Design (www.qosdesign.com)
• Founders: 3 researchers from LAAS-CNRS
• National awards: the company was awarded 4 times
• Products: NEST, a software suite for the simulation/planning/supervision of Next Generation Networks
• Worldwide market: telecom operators, enterprises with large-scale WANs, datacenters
• Partners/Customers: SFR, Vodafone, Alcatel, Maroc Telecom, British Telecom, French Defense, NextiraOne, EADS-DS …
Control / Observation over networks
General architecture
[Figure: several processes, each with its actuators (A) and sensors (S), connected to a controller through a network]
Finite-capacity links which may suffer disturbances:
• Time delays
• Packet loss
• Message errors
• Congestion
• Collision
• Medium disturbances
Influence of the network QoS on the controller : towards co-design
QoC (Quality of Control) depends on the QoS (Quality of Service) provided by the network
Control application properties (stability, response time, …) depend on the network QoS
Network resource allocation may be dynamically tuned to the application needs
The idea is to adapt the flow or message priorities as a (nonlinear) function of the control application performance over time
Co-design strategies: the Hybrid Priority scheme for CAN networks
MAC (Medium Access Control) layer: message scheduling
Hybrid Priority Scheme:
• static priority for a flow
• dynamic priority for the messages of a flow
Now being extended to wireless networks and to the network layer of mesh networks
X. NGUYEN, G. JUANOLE, G. MOUNEY, C. CALMETTES, Networked Control System (NCS) on a network CAN: on the Quality of Service (QoS) and quality of Control (QoC) provided by different message scheduling schemes based on hybrid priorities, International Workshop on Factory Communication Systems (WFCS 2010), Nancy (France), 18-21 May 2010, pp. 261-270.
The hybrid priority scheme applied to fault detection
Output error: ε_k = y_k − ŷ_k
Residual: r_k = T ε_k
Error dynamics: ε_{k+1} = (Φ − LC) ε_k + Φ f_k + Φ B Δu_k τ_k
The Hybrid Priority Scheme reduces the false alarm rate
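To see why a delay-induced term can cause false alarms, the error recursion can be iterated in a scalar setting. This is a simplified sketch under stated assumptions: all quantities are scalars, the numeric values (Φ = 0.9, L = 0.5, C = 1, T = 1, threshold 0.5) are arbitrary, and the whole delay-induced term is lumped into a single input w_k.

```python
def residuals(faults, delay_terms, phi=0.9, gain=0.5, c=1.0, t=1.0):
    """Iterate eps_{k+1} = (phi - gain*c)*eps_k + phi*f_k + w_k; return r_k = t*eps_k."""
    eps = 0.0
    out = []
    for f_k, w_k in zip(faults, delay_terms):
        out.append(t * eps)
        eps = (phi - gain * c) * eps + phi * f_k + w_k
    return out

def first_alarm(res, threshold=0.5):
    """Index of the first residual exceeding the threshold, or None."""
    for k, r in enumerate(res):
        if abs(r) > threshold:
            return k
    return None

n = 20
zeros = [0.0] * n

# A genuine fault appearing at k = 10, with no delay-induced term.
fault = [1.0 if k >= 10 else 0.0 for k in range(n)]
alarm_on_fault = first_alarm(residuals(fault, zeros))

# No fault, but a network-induced disturbance at k = 3: a false alarm.
delay_spike = [0.7 if k == 3 else 0.0 for k in range(n)]
false_alarm = first_alarm(residuals(zeros, delay_spike))

# No fault and no disturbance: the residual stays at zero.
quiet = first_alarm(residuals(zeros, zeros))
```

In this toy setting, anything that shrinks the delay-induced input, such as a scheduling scheme that shortens network delays, keeps the fault-free residual below the threshold.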
G. JUANOLE, G. MOUNEY, D. SAUTER, C. AUBRUN, C. CALMETTES, Decision Making Improvement for Diagnosis in Networked Control Systems based on Dynamic Message Scheduling, 18th Mediterranean Conf. on Control and Automation MED 2010, Marrakech, June 2010.
Multi-Agent Systems (Diagnosis for)
Diagnostic architectures
Number of diagnoses
Remember that, given the observations OBS, several diagnoses ∆i are possible:
Diag(SD, OBS) = ⋃ ∆i
The model constrains the possible diagnoses: MCi ⊆ Msys implies |Diag(MCi, OBS)| ≥ |Diag(Msys, OBS)|
The observations constrain the possible diagnoses: Oi ⊆ O implies |Diag(Msys, Oi)| ≥ |Diag(Msys, O)|
Coordinated diagnosis
Each agent i knows the global model Msys but has only partial observations Oi
The coordinator:
• knows the global model Msys and the observability of each agent
• recombines the diagnosis candidates based on the global model
Decentralized / Distributed diagnosis
Compute local diagnoses with the local model MCi and the local observations Oi
• Get Diag(MCi, Oi), i = 1, …, N
• Local diagnoses are locally consistent
Compute global diagnoses from local diagnoses
• Account for the constraints of the adjacent local models and for other observations
• Get Diag(Msys, O)
• Global diagnoses are globally consistent
Decentralized case: the computation of global diagnoses is orchestrated by a supervisor
Distributed case: the computation of global diagnoses is achieved by means of communication
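The step from local to global diagnoses can be sketched as a merge that keeps only combinations of local diagnoses agreeing on shared variables. The representation is an assumption made for illustration: each local diagnosis is a dictionary over component modes and interface variables, and the names AB(c1), AB(c2) and link are hypothetical.

```python
from itertools import product

def merge(local_sets):
    """Combine one local diagnosis per subsystem.

    A combination is kept only if all local diagnoses assign the same
    value to every variable they share.
    """
    merged = []
    for combo in product(*local_sets):
        candidate = {}
        agree = True
        for local in combo:
            for var, val in local.items():
                if candidate.setdefault(var, val) != val:
                    agree = False  # conflict on a shared variable
        if agree:
            merged.append(candidate)
    return merged

# Subsystem 1: either c1 is faulty and the link value is bad, or both are fine.
diag_1 = [{"AB(c1)": True, "link": "bad"}, {"AB(c1)": False, "link": "ok"}]
# Subsystem 2 sees a wrong output: the input link is bad, or c2 itself is faulty.
diag_2 = [{"AB(c2)": False, "link": "bad"}, {"AB(c2)": True, "link": "ok"}]

global_diagnoses = merge([diag_1, diag_2])
```

Of the four combinations, two survive; the others are locally consistent but disagree on the shared link variable, so they are not globally consistent. Whether this merge runs in a supervisor or through exchanges among diagnosers is the decentralized versus distributed choice.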
WS-DIAMOND: Web Services DIAgnosability, MONitoring and Diagnosis (IST 2005-2008)
Modeling WSs
WS description: workflow + data dependencies
Component-oriented model: components + structure
Constraint model: set of constraints
Structural model
Activity 4 model: M4: mode variable; O1, O2: input variables; Y, Z: output variables
Diagnosis purpose
WS-DIAMOND TEAM. WS-DIAMOND: Web Services – DIAgnosability, MONitoring and Diagnosis, in "At your service: An overview of results of projects in the field of service engineering of the IST programme", MIT Press Series on Information Systems, Chapter 9, J. Mylopoulos and M. Papazoglou (Eds.), 2009.
Starting Diagnosis Upon Alarms
An alarm is raised
The corresponding local diagnoser wakes up
The awakened local diagnoser computes local candidate diagnoses
Console L., Picardi C. and Theseider Dupré D. A Framework for Decentralized Qualitative Model-Based Diagnosis. Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI-07), January 6-12, 2007, Hyderabad, India.
(From WS-Diamond Project)
Local Candidate Diagnosis
A local candidate diagnosis contains three elements:
hypotheses on local behaviour
blames on other (input) services
consequences of hypotheses on other (output) services
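One possible way to hold these three elements is a small record type. The field layout, the method names and the example service names below are assumptions for illustration, not the WS-DIAMOND data model.

```python
from dataclasses import dataclass, field

@dataclass
class LocalCandidateDiagnosis:
    """Hypothetical container for one local candidate diagnosis."""
    service: str
    # Hypotheses on local behaviour, e.g. {"activity4": "faulty"}
    hypotheses: dict = field(default_factory=dict)
    # Blames on input services: provider -> suspected wrong input
    blames: dict = field(default_factory=dict)
    # Consequences on output services: consumer -> affected output
    consequences: dict = field(default_factory=dict)

    def services_to_question(self):
        """Services a supervisor could ask to explain a blame."""
        return sorted(self.blames)

    def services_to_validate(self):
        """Services a supervisor could ask to validate a consequence."""
        return sorted(self.consequences)

candidate = LocalCandidateDiagnosis(
    service="shop",
    hypotheses={"activity4": "faulty"},
    blames={"supplier": "wrong item"},
    consequences={"warehouse": "wrong order"},
)
```

Keeping blames and consequences keyed by the other service makes it easy to see which local diagnosers need to be contacted at all.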
(From WS-Diamond Project)
The Role of the Supervisor
COLLECT: collect local candidate diagnoses
QUESTION: ask for blame explanation; the blamed local diagnosers extend candidate diagnoses
VALIDATE: ask for consequence validation
Thanks to the admissibility property, unneeded local diagnosers are not involved and diagnosis is restricted to the needed parts of the system
(From WS-Diamond Project)
Distributed diagnosability analysis
The same algorithm is used
[Figure: local diagnoser (LD), local observations, partial diagnoses, partial fault mode, partial signature]
Diagnosability analysis compares (partial) signatures for discriminability
Algorithm efficiency relies on avoiding as many comparisons as possible
X. Pucel, S. Bocconi, C. Picardi, D. Theseider Dupré, L. Travé-Massuyès. Diagnosability analysis for web services with constraint-based models. 18th International Workshop on Principles of Diagnosis (DX'07), Nashville (USA), May 29-31, 2007, pp. 360-367.
(From WS-Diamond Project)
Local diagnosability and accuracy
Local diagnosability: a fault mode F is locally diagnosable in a subsystem MCi if it always results in a set / sequence of local observations such that we can diagnose F with certainty
F diagnosable in MCi ⇒ F diagnosable in Msys
Accuracy: the diagnosis of a subsystem MCi w.r.t. a fault mode F is accurate if it is as ambiguous as the global diagnosis
The local diagnosability degree of F is equal to the global diagnosability degree
P. RIBOT, Y. PENCOLE, M. COMBACAU, Design requirements for the diagnosability of distributed discrete event systems, 19th International Workshop on Principles of Diagnosis (DX-08), Blue Mountains (Australia), 22-24 September 2008, pp. 347-354.
A design-oriented algorithm
Cost related to sensor placement
• CD: to make the subsystem diagnosable
• CA: to make the diagnosis accurate
Cost related to communication protocols
• CM: induced by the diagnosis architecture
For every fault Fi, find the smallest subsystem which can be made diagnosable and accurate at minimum cost.
Model of a 3-component system: Γ1, Γ2, Γ3
Conclusions
The ADREAM initiative is part of a growing scientific field
There is space for numerous topics which cross over our fields of expertise
The design, analysis and operation of networked systems as a whole call for transversal skills and should result in cross-fertilization
Reliability and safety are at the core