Towards Self-Adaptive MAS
Fault-Tolerant MAS

Zahia Guessoum
OASIS (Objects and Agents for Simulation and Information Systems)
LIP6 (Laboratoire d'Informatique de Paris 6)
[email protected]
http://www-poleia.lip6/~guessoum

Outline

1. Towards Self-Adaptive MAS
2. Fault-Tolerant MAS
   • Motivations
   • Multi-agent Architecture
   • Agent Criticality
   • Resource Management
   • Experiments
   • Conclusions and Future Work

Self-Adaptive MAS

Design and control of dynamic and complex systems

Characteristics
• Open, distributed, large scale
• Dynamic environment
• Limited resources
• Several users
• …

Existing solutions: adaptive multi-agent systems
• Adaptation of the internal agent structure or behavior

Self-Adaptive MAS

Limits of adaptive multi-agent systems
• Emergence of global undesirable behaviors

Our solution: self-adaptive multi-agent systems [JFSMA’00]
• An intelligent system must be able to observe its own behavior [Pitrat 90]
• Monitor the system to detect, and when possible to anticipate, undesirable behaviors
  – Reification of the aspects of the system needed to detect or anticipate undesirable conditions
  – Examples: interdependence graph, roles

Self-Adaptive MAS

Adaptation
• Micro level (agent)
  – How to adapt the structure and behavior of an agent according to the evolution of its environment?
• Macro level (organization)
  – How to detect, and when possible to anticipate, global undesirable behaviors?

Models and architectures: adaptive agents and adaptive multi-agent systems

Self-Adaptive MAS

Adaptive Multi-Agent Models and Architectures

Software Engineering

Artificial Intelligence

Distributed Systems

• Reflective Architectures

• Learning techniques

• Ontologies

• MDA

• Replication

Main projects

Research projects
• Adaptive Agent architecture [Objet’98] [IEEE’99] [AAMAS-AISB’04] [ALAMAS’05]
• Self-Adaptive MAS [JFIADSMA’00] [IEEE DS’04] [AAMAS’04]
• Fault-Tolerant MAS [SELMAS’03] [AAMAS’03] [AAMAS’04] [SELMAS’05]
• Meta-DIMA: MDA-based multi-agent engineering methodology [JFIADSMA’03]
• …

Applications
• Simulation of Economics Models (Firms and Organizational Forms) [AAMAS’04bis] [ALAMAS’05] [CEEMAS’05] [EA’05]
• COGents: Agent-Based Architecture for Numerical Simulation [E-work’02] [ICAP’03]
• …

PhD students
• Lilia Rejeb, Ana B. Gonzalez, Othmane Nadjemi, Nora Faci, Tarek Jarraya, Beiting Zhu, David Julien (PhD, Dec. 2004)

Outline

1. Fault-Tolerant MAS
   • Motivations
   • Multi-agent Architecture
   • Agent Criticality
   • Resource Management
   • Implementation and Experiments
   • Conclusions and Future Work

Team

MAS
• Z. Guessoum (LIP6)
• S. Aknine (LIP6)
• J-P Briot (CNRS, LIP6)
• N. Faci (CReSTIC, Reims)
• A. Suna-Elmeida (LIP6)
• J. Malenfant (LIP6)

Distributed Systems
• P. Sens (LIP6 – INRIA)
• M. Bertier (LIP6)
• O. Marin (LIP6)

Fault-Tolerant MAS

Large-scale multi-agent systems
• Physically distributed
• Limited resources
• Dynamic environment

Types of failures
• Software (bugs, deadlocks, ...)
• Hardware (network links, machines, ...)

» How to avoid failures?

Fault, Error, Failure

A failure occurs when an actual running system deviates from its specified behavior. The cause of a failure is called an error. An error represents an invalid system state, one that is not allowed by the system behavior specification. The error itself is the result of a defect in the system, or fault.

A fault is the root cause of a failure. That means that an error is merely the symptom of a fault. A fault may not necessarily result in an error.

Fault Classifications

Based on how a failed component behaves once it has failed, faults can be classified into four categories: crash, omission, timing, or Byzantine.
• Crash faults: the component either completely stops operating or never returns to a valid state;
• Omission faults: the component completely fails to perform its service;
• Timing faults: the component does not complete its service on time;
• Byzantine faults: these are faults of an arbitrary nature.
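The four categories can be written down as a small enumeration. The sketch below is illustrative only: the names `FaultClass` and `classify`, the four boolean observations, and the order in which they are checked are assumptions made for this example and do not come from the slides.

```python
from enum import Enum
from typing import Optional


class FaultClass(Enum):
    CRASH = "crash"          # stops operating or never returns to a valid state
    OMISSION = "omission"    # fails to perform its service
    TIMING = "timing"        # does not complete its service on time
    BYZANTINE = "byzantine"  # arbitrary behavior


def classify(halted: bool, responded: bool, on_time: bool,
             valid: bool) -> Optional[FaultClass]:
    """Map hypothetical observations of a component to a fault category.

    Returns None when no fault is observed. The decision order is an
    assumption chosen for this example.
    """
    if halted:
        return FaultClass.CRASH
    if not responded:
        return FaultClass.OMISSION
    if not valid:
        return FaultClass.BYZANTINE
    if not on_time:
        return FaultClass.TIMING
    return None
```

Treating an invalid answer as Byzantine is a simplification: Byzantine faults cover any arbitrary behavior, of which an invalid result is only one visible form.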

Replication

Existing solution: replication strategies
• Replication of data and/or computation is an effective way to achieve fault tolerance in distributed systems.
• A replicated software component is defined as a software component that possesses a representation on two or more hosts.

Distributed applications:
• Small number of components
• Component criticality is static
• …

The number of replicas and the replication strategy are explicitly and statically defined by the designer before run time.

Agent Replication

Simple MAS:
• Small number of agents
• Static organizational structures
• …

The agent criticality may be statically defined by the designer before run time.

Complex MAS:
• Adaptive agents
• Large scale
• Dynamic and adaptive organizational structures
• …

The agent criticality (the number of replicas and the replication strategy) cannot be explicitly and statically defined by the designer before run time.

Dynamically and automatically apply replication mechanisms where (to which agents) and when it is most needed.

Dynamic Replication

DarX: a new replication framework (http://www-src.lip6.fr/darx/)
• Large-scale distributed systems
• Replication mechanisms
  – Several replication strategies (active, passive, hybrid…)
  – Dynamic replication: dynamically change the number of replicas and the replication strategy
• Observation mechanisms
• Fault detection/recovery mechanisms
• Encapsulation of the system tasks into the replication group
  – Transparency of the replication with respect to the other agents
  – Replication mechanisms are not attached to the DarX servers; they are attached to the replication groups
  – …

Dynamic Replication

• Replication Group (RG)

[Figure: three DARXLocation nodes hosting replication group A (A.l, A.r1, A.r2), with the consistency information and replication policy for group A, and strategies s1 and s2.]
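A replication group whose number of replicas and strategy can change at run time could be modeled roughly as follows. This is a sketch, not the DarX API: `ReplicationGroup`, `Strategy`, the method names, and the rule that keeps at least one replica are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Strategy(Enum):
    ACTIVE = "active"
    PASSIVE = "passive"


@dataclass
class ReplicationGroup:
    """Hypothetical model of a replication group; not the DarX API."""
    agent: str
    strategy: Strategy = Strategy.PASSIVE
    hosts: List[str] = field(default_factory=list)  # one replica per listed host

    def add_replica(self, host: str) -> None:
        if host not in self.hosts:
            self.hosts.append(host)

    def remove_replica(self, host: str) -> None:
        # Assumption: always keep at least one representation of the agent.
        if host in self.hosts and len(self.hosts) > 1:
            self.hosts.remove(host)

    def set_strategy(self, strategy: Strategy) -> None:
        self.strategy = strategy


# Usage: grow the group, switch strategy, then drop the first host.
group = ReplicationGroup("A", hosts=["host1"])
group.add_replica("host2")
group.add_replica("host3")
group.set_strategy(Strategy.ACTIVE)
group.remove_replica("host1")
```

Keeping the policy on the group object, rather than on a server, mirrors the point that replication mechanisms are attached to the replication groups.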

Automatic Replication

Adaptive replication mechanism
• Which agents need to be replicated, and when?
• What is the number of replicas?
• Where?

Adaptive Control of Replication

Hypotheses and principles
• Automatic mechanisms
• Some prior inputs from the designer of the application
• Agents can be either reactive or deliberative
• Agents can be heterogeneous
• Agents communicate with some ACL (FIPA, …)

Agent criticality relies on semantic-level information
• Roles [Selmas’03] [AAMAS’02]
• Interdependence graph [AAMAS’04] [Selmas’05]
• …

Multi-Agent Architecture [AAMAS’04]

Micro component (agents) + Macro component (Interdependence graph)

Interdependence Graph

Analyzing an agent's dependences makes it possible to determine its importance and the influence of its failure on the behavior and reliability of the multi-agent system.

The arcs are labeled with any information that can help detect or anticipate undesirable behaviors (failure of agents).

[Figure: interdependence graph over nodes 1, 2, i, j, k (Agent_i, Agent_j, Agent_k), with an arc labeled w12 and an annotation "More critical".]
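One simple way to hold such a graph is a dictionary of weighted arcs. The sketch below assumes the arc labels are plain numbers; the class name `InterdependenceGraph`, its methods, and the sample weights are hypothetical.

```python
from collections import defaultdict
from typing import Dict


class InterdependenceGraph:
    """Directed graph; weights[i][j] is the label w_ij of the arc i -> j."""

    def __init__(self) -> None:
        self.weights: Dict[str, Dict[str, float]] = defaultdict(dict)

    def set_weight(self, src: str, dst: str, w: float) -> None:
        self.weights[src][dst] = w

    def incoming(self, node: str) -> Dict[str, float]:
        """Return {source: weight} for every arc that ends at `node`."""
        return {src: arcs[node]
                for src, arcs in self.weights.items() if node in arcs}


# Usage with made-up weights.
graph = InterdependenceGraph()
graph.set_weight("agent_1", "agent_2", 0.8)
graph.set_weight("agent_3", "agent_2", 0.5)
graph.set_weight("agent_2", "agent_3", 0.1)
```

Here `graph.incoming("agent_2")` collects the arcs that a monitor would look at when it evaluates how much the other agents depend on `agent_2`.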

Multi-Agent Architecture

Micro component (agents) + Macro component (Interdependence graph) + Distributed Monitors

Multi-Agent Architecture

[Figure: two-level architecture. Agents level: Agent 1 to Agent 4. Observation level: Monitor 1 to Monitor 4 and one Host-Monitor per host (Host_i, Host_j). Other labels: node-2, adaptation algorithm.]

Multi-Agent Architecture

Domain agents
• Represent the knowledge of the application domain
• May have a perception of the interdependence graph
• …

Monitor_i
• Observes the domain agent Agent_i
• Reads the messages received from the host-monitor
• Builds/updates the interdependences of Agent_i (Node_i)
• Computes the criticality of Agent_i: $w_i = \mathrm{aggregation}(w_{ji},\ j = 1, \dots, m)$
  – $w_{ji}$: its interdependence with Agent_j
• Informs the host-monitor of important local changes
• …
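The criticality computation can be sketched as a function that aggregates the interdependence weights of one agent. The slides do not say which aggregation is used, so the function below takes it as a parameter; the name `criticality`, the default of `sum`, and the sample weights are assumptions of this example.

```python
from statistics import mean
from typing import Callable, Dict, Iterable


def criticality(incoming: Dict[str, float],
                aggregate: Callable[[Iterable[float]], float] = sum) -> float:
    """Aggregate the weights w_ji of one agent into its criticality w_i.

    The aggregation function is not fixed by the slides; `sum` is only a
    default chosen for this example.
    """
    values = list(incoming.values())
    return aggregate(values) if values else 0.0


# Usage with made-up weights w_ji for one agent.
w_ji = {"agent_1": 0.8, "agent_3": 0.5, "agent_4": 0.2}
by_sum = criticality(w_ji)
by_mean = criticality(w_ji, mean)
by_max = criticality(w_ji, max)
```

A sum rewards agents that many others depend on, while a maximum highlights a single strong dependence; which one fits depends on the application.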

Multi-Agent Architecture

Interdependences adaptation
Algorithm 1: number of messages (or communication load)
• Let $\mathit{NbM}_{ij}(\Delta t)$ be the number of messages sent by Agent_i to Agent_j during some interval of time $\Delta t$.
• Let $\mathit{NbM}(\Delta t)$ be the average number of messages between pairs of agents (i, j).

$$w_{ij}(t + \Delta t) = w_{ij}(t) + \frac{\mathit{NbM}_{ij}(\Delta t) - \mathit{NbM}(\Delta t)}{\mathit{NbM}(\Delta t)}$$
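The update rule of Algorithm 1 can be sketched in a few lines. The function name, the dictionary layout, and the choice to average only over the pairs that appear in `counts` are assumptions of this example.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]


def update_weights_by_message_count(weights: Dict[Pair, float],
                                    counts: Dict[Pair, int]) -> Dict[Pair, float]:
    """Apply w_ij(t + dt) = w_ij(t) + (NbM_ij - NbM) / NbM to every observed pair.

    `counts[(i, j)]` is NbM_ij(dt). NbM(dt) is taken here as the mean over
    the pairs present in `counts`, which is an assumption of this example.
    """
    if not counts:
        return dict(weights)
    avg = sum(counts.values()) / len(counts)
    if avg == 0:
        return dict(weights)  # no traffic at all: leave the weights unchanged
    updated = dict(weights)
    for pair, nb in counts.items():
        updated[pair] = weights.get(pair, 0.0) + (nb - avg) / avg
    return updated


# Usage: a1 -> a2 is busier than average, a2 -> a3 is quieter.
weights = {("a1", "a2"): 1.0, ("a2", "a3"): 1.0}
counts = {("a1", "a2"): 30, ("a2", "a3"): 10}
new_weights = update_weights_by_message_count(weights, counts)
```

With an average of 20 messages, the first weight rises by 0.5 and the second falls by 0.5, so pairs that communicate more than average gain weight.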

Interdependences adaptation
Algorithm 2: performatives of messages [M. Colombetti and M. Verdicchio]

M. Colombetti and M. Verdicchio proposed six classes:
• class 1 = request, query-if, query-ref, subscribe
• class 2 = inform, inform-done, inform-ref
• class 3 = cfp, propose
• class 4 = reject-proposal, refuse, cancel
• class 5 = accept-proposal, agree
• class 6 = not-understood, failure

• Let $S_{ij}(\Delta t)$ be the set of messages sent by Agent_i to Agent_j during some interval of time $\Delta t$.
• Let $\mathit{WM}_{ij}(\Delta t) = \sum_{m \in S_{ij}(\Delta t)} \mathrm{weight}(m)$.
• Let $\mathit{WM}(\Delta t)$ be the average sum of message weights between pairs of agents (i, j).

$$w_{ij}(t + \Delta t) = w_{ij}(t) + \frac{\mathit{WM}_{ij}(\Delta t) - \mathit{WM}(\Delta t)}{\mathit{WM}(\Delta t)}$$
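Algorithm 2 has the same shape as Algorithm 1, with message counts replaced by summed performative weights. In the sketch below the class membership follows the six classes listed above, but the numeric value in `CLASS_WEIGHT` for each class is invented for the example, since the slides do not give these weights; the function names are hypothetical as well.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]

# The six performative classes listed above.
PERFORMATIVE_CLASS = {
    "request": 1, "query-if": 1, "query-ref": 1, "subscribe": 1,
    "inform": 2, "inform-done": 2, "inform-ref": 2,
    "cfp": 3, "propose": 3,
    "reject-proposal": 4, "refuse": 4, "cancel": 4,
    "accept-proposal": 5, "agree": 5,
    "not-understood": 6, "failure": 6,
}

# Hypothetical weight per class; the slides do not give these values.
CLASS_WEIGHT = {1: 2.0, 2: 1.0, 3: 2.0, 4: 0.5, 5: 3.0, 6: 0.5}


def weight(performative: str) -> float:
    return CLASS_WEIGHT[PERFORMATIVE_CLASS[performative]]


def update_weights_by_performatives(weights: Dict[Pair, float],
                                    messages: Dict[Pair, List[str]]) -> Dict[Pair, float]:
    """Apply w_ij(t + dt) = w_ij(t) + (WM_ij - WM) / WM to every observed pair."""
    wm = {pair: sum(weight(p) for p in perfs) for pair, perfs in messages.items()}
    if not wm:
        return dict(weights)
    avg = sum(wm.values()) / len(wm)  # WM(dt), averaged over observed pairs
    if avg == 0:
        return dict(weights)
    updated = dict(weights)
    for pair, total in wm.items():
        updated[pair] = weights.get(pair, 0.0) + (total - avg) / avg
    return updated


# Usage: performatives exchanged during one interval.
weights = {("a1", "a2"): 1.0, ("a2", "a3"): 1.0}
messages = {("a1", "a2"): ["request", "inform"], ("a2", "a3"): ["inform"]}
new_weights = update_weights_by_performatives(weights, messages)
```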

Multi-Agent Architecture

[Figure: monitor architecture. Labels: Monitors, Domain Agents, Agent's Criticality, Activity Analysis, Interdependence Analysis & Role, System Statistics, Interaction Events, Darx Server, Observation, Replication.]

Multi-Agent Architecture

Host-monitors
• Build global information
  – Read messages received from the monitors
  – Update local statistics, which define an aggregation of the host-monitors' parameters
  – Send the new parameters to the agent monitors of the local host
  – Send to the other host-monitors the observed parameters that have significantly changed
• Resource management
  – Allocate replicas to agents
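The step in which a host-monitor forwards only the parameters that have significantly changed could look like the sketch below. The slides do not define "significantly", so the relative-change test, the 20% default threshold, and the parameter names are assumptions of this example.

```python
from typing import Dict


def significant_changes(previous: Dict[str, float],
                        current: Dict[str, float],
                        threshold: float = 0.2) -> Dict[str, float]:
    """Return the parameters whose relative change exceeds `threshold`.

    The 20% default and the relative-change test are assumptions of this
    example; parameters seen for the first time are always reported.
    """
    changed = {}
    for name, value in current.items():
        old = previous.get(name)
        if old is None:
            changed[name] = value
        elif old == 0:
            if value != 0:
                changed[name] = value
        elif abs(value - old) / abs(old) > threshold:
            changed[name] = value
    return changed


# Usage with made-up statistics of one host-monitor.
previous = {"avg_messages": 20.0, "total_criticality": 10.0}
current = {"avg_messages": 30.0, "total_criticality": 10.5, "agents": 12.0}
to_broadcast = significant_changes(previous, current)
```

Filtering in this way limits the traffic between host-monitors, which matters because monitoring itself has a cost.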

Resource Management

Number of replicas
An agent is replicated according to:
• $w_i$: its criticality
• $W$: the sum of the domain agents' criticality
• $r_m$: the minimum number of replicas, which is set by the designer
• $R_m$: the maximum number of possible simultaneous replicas

$$nb_i = \mathrm{rounded}\left(r_m + \frac{w_i \cdot R_m}{W}\right)$$

Problem:
• All the resources are considered as similar. For instance, the failure rate of a host is not considered.
• The hosts are not easy to choose.
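The replica formula can be checked with a short sketch. The slides only say "rounded", so rounding half up is an assumption here, as are the function name, the fallback when the total criticality is zero, and the sample values.

```python
import math
from typing import Dict


def number_of_replicas(w_i: float, total_w: float, r_min: int, r_max: int) -> int:
    """Compute nb_i = rounded(r_m + w_i * R_m / W).

    Rounding half up is an assumption; the slides only say "rounded".
    """
    if total_w <= 0:
        return r_min  # no criticality information: fall back to the minimum
    return math.floor(r_min + w_i * r_max / total_w + 0.5)


# Usage with made-up criticalities, r_m = 1 and R_m = 5.
criticalities: Dict[str, float] = {"a1": 6.0, "a2": 3.0, "a3": 1.0}
W = sum(criticalities.values())
replicas = {name: number_of_replicas(w, W, r_min=1, r_max=5)
            for name, w in criticalities.items()}
```

Every agent gets at least `r_min` replicas, and the share of `r_max` it receives on top grows with its criticality relative to the total.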

Resource Management

Our solution: economic model
Resource cost, budget, and negotiation behaviors of the host-monitors to allocate resources.

Cost of a resource
$$CM_i(t) = CM_i(t_0) \cdot (1 - pp_i(t))$$
• $pp_i(t)$ is the failure probability of host_i at time t.
• $CM_i(t_0)$ is the initial cost of host_i.
• $pp_i(t_0) = 0$.

The budget is based on the criticality
$$B_j(t) = \frac{W_j(t) \cdot CM(t)}{W(t)}$$
• $W(t) = \sum_{i=1}^{n} W_i(t)$
• $CM(t) = \sum_{i=1}^{m} CM_i(t) \cdot Nb_i$
• where n is the number of agents, m is the number of hosts, and $Nb_i$ is the number of resources of host_i.

What is the number of replicas and where (which hosts)?
• If $B_j(t + \Delta t) > B_j(t)$, then allocate new resources
  – Simple negotiation between host-monitors
• If $B_j(t + \Delta t) < B_j(t)$, then cancel some allocated resources
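The cost and budget formulas of the economic model can be worked through with a small example. The function names, host names, and numeric values below are made up for illustration.

```python
from typing import Dict


def resource_cost(initial_cost: float, failure_probability: float) -> float:
    """CM_i(t) = CM_i(t0) * (1 - pp_i(t))."""
    return initial_cost * (1.0 - failure_probability)


def total_cost(costs: Dict[str, float], resources: Dict[str, int]) -> float:
    """CM(t) = sum over hosts of CM_i(t) * Nb_i."""
    return sum(costs[h] * resources[h] for h in costs)


def budget(w_j: float, total_w: float, cm: float) -> float:
    """B_j(t) = W_j(t) * CM(t) / W(t)."""
    return w_j * cm / total_w if total_w > 0 else 0.0


# Usage: two hosts, one of them with a failure probability of 0.5.
initial = {"host1": 10.0, "host2": 10.0}
pp = {"host1": 0.0, "host2": 0.5}
nb = {"host1": 4, "host2": 2}
costs = {h: resource_cost(initial[h], pp[h]) for h in initial}
cm = total_cost(costs, nb)

agent_criticality = {"a1": 6.0, "a2": 4.0}
total_w = sum(agent_criticality.values())
budgets = {a: budget(w, total_w, cm) for a, w in agent_criticality.items()}
```

In this model a host that is more likely to fail becomes cheaper, and a more critical agent receives a larger share of the total cost as its budget.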

Resource Management

Contract Net Protocol
• Initiator: host-monitor
• Participants: other host-monitors

Evaluation criteria
• Communication time between the two hosts
• Resource cost

[Figure: sequence diagram between an Agent-Monitor and two Host-Monitors. Messages: request, call for proposal, propose, accept proposal, reject proposal.]
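The initiator's choice among the proposals it receives can be sketched as follows. The slides name the two evaluation criteria but not how they are combined, so the weighted-sum score, its default weights, and the `Proposal` type are assumptions of this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Proposal:
    host: str
    communication_time: float  # time between the initiator and the proposing host
    resource_cost: float       # cost asked for the resource


def select_proposal(proposals: List[Proposal],
                    time_weight: float = 1.0,
                    cost_weight: float = 1.0
                    ) -> Tuple[Optional[Proposal], List[Proposal]]:
    """Return (accepted, rejected) using a weighted sum of the two criteria.

    The weighted-sum score and its weights are assumptions of this example.
    """
    if not proposals:
        return None, []

    def score(p: Proposal) -> float:
        return time_weight * p.communication_time + cost_weight * p.resource_cost

    best = min(proposals, key=score)
    rejected = [p for p in proposals if p is not best]
    return best, rejected


# Usage: answers received by the initiating host-monitor after its call for proposal.
proposals = [
    Proposal("host2", communication_time=5.0, resource_cost=8.0),
    Proposal("host3", communication_time=2.0, resource_cost=9.0),
    Proposal("host4", communication_time=6.0, resource_cost=10.0),
]
accepted, rejected = select_proposal(proposals)
```

The accepted proposal would be answered with an accept-proposal message and the others with reject-proposal messages.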

Implementation

DimaX: a fault-tolerant multi-agent platform
• Various services (naming service, fault detection, replication, …)
• Agent monitors and host-monitors
• …

[Figure: DimaX architecture. Labels: Agents, Adaptor, Replication, Failure Detection (FD), Adaptive Replication Control, Observation, DIMA, Naming/Localization, DarX.]

Experiments

Example: personal assistant agents
• Interact with the user to receive their meeting requests and associated information (a title, a description, possible dates, participants, priority, etc.),
• Interact with the other agents of the system to schedule meetings.

Experiments

Monitoring cost
• N (100, ..., 250) agents
• N/20 hosts
• Two kinds of experiments
  – without monitoring
  – with monitoring
    – With Algorithm 1
    – With Algorithm 2

Experiments

Monitoring cost

Experiments

• Previous protocol
• Periods of monitoring (500, 1500, 2500)

Experiments

Robustness
• 100 agents on 10 machines
• Failure simulator: randomly stops the thread of an agent
• Scenario
  – 50 meetings
  – Goal of the MAS: schedule the 50 meetings
• Rate of successful simulations
  – Number of simulations which did not fail / total number of simulations
• 4 replication approaches
  – Random
  – Roles
  – Algorithm 1: number of messages
  – Algorithm 2: performatives

Experiments

Robustness

Conclusions and Future Work

A new fault-tolerant multi-agent platform (DimaX)
• Based on DIMA and DarX
• A new approach to dynamically evaluate the criticality of agents
  – Small applications have been developed (meeting scheduling, …)

Algorithms to define interdependence
• Messages
• ACL messages
• Domain task dependences

Other categories of faults
• Timing, Byzantine (Master of Parjineh)

More experiments
• To validate the proposed approach
• To better identify:
  – the potential target application domains (load balancing, …)
  – the domains for which the approach is not suited

Related publications (see www-poleia.lip6.fr/~guessoum/Papers.html)

- Z. Guessoum, N. Faci and J.-P. Briot. Adaptive Replication of Large-Scale Multi-Agent Systems - Towards a Fault-Tolerant Multi-Agent Platform. In Proc. ICSE'05, 4th International Workshop on Software Engineering for Large-Scale Multi-Agent Systems (SELMAS'05), to appear in ACM, Saint-Louis (US), May 2005.

- Z. Guessoum, M. Ziane, N. Faci. Monitoring and Organizational-Level Adaptation of Multi-Agent Systems. Third International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS’04), ACM, pp. 514-522, New York City, July 2004.

- Z. Guessoum, J.-P. Briot, O. Marin, A. Hamel and P. Sens. Dynamic and Adaptive Replication for Large-Scale Reliable Multi-Agent Systems. In Software Engineering for Large-Scale Multi-Agent Systems, Alessandro Fabricio Garcia (ed.), LNCS 2603, May 2003.

- Z. Guessoum, J.-P. Briot, S. Charpentier, O. Marin and P. Sens. A Fault-Tolerant MultiAgent Framework. AAMAS 2002, July 15-19, 2002, Proceedings pp. 672-673, ACM, 2002.

- O. Marin, P. Sens, J.-P. Briot and Z. Guessoum. Towards Adaptive Fault-Tolerance for Distributed Multi-Agent Systems. ERSADS'2001, Bertinoro, Italy, May 2001.