
An Adaptive and Reliable System Based on Interdependence Between Agents

AKIFUMI TANIMOTO, KEINOSUKE MATSUMOTO, and NAOKI MORI
Osaka Prefecture University, Japan

SUMMARY

Multiagent systems (MAS) have recently gained public attention as a technique for handling competition and cooperation in distributed systems. However, the vulnerability of MAS to the propagation of failures prevents their application to large-scale systems. This paper proposes a general composition technique that improves reliability and can easily be applied to existing MAS. The proposed system monitors messages between agents to detect undesirable behaviors (failures). By collecting related information, the system generates global information on the interdependence between agents and expresses it as a graph. This interdependence graph makes it possible to detect or predict undesirable behaviors. This paper also shows that the system can optimize the performance of MAS and adaptively improve its reliability in complicated and dynamic environments by applying the global information acquired from analysis of the interdependence graph to a replication system. © 2008 Wiley Periodicals, Inc. Electr Eng Jpn, 164(1): 62–68, 2008; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/eej.20517

Key words: reliability; robust; interdependence graph; monitoring; replication; intelligent system.

1. Introduction

In a ubiquitous network society, intelligent home appliances and other devices and systems in our everyday environment would provide services via cooperation with each other. Such cooperation is supported by a technology called multiagent systems (MAS) [1]. For the moment, however, operational multiagent systems are restricted to small scale because of vulnerability to fault propagation. The reliability of large-scale multiagent systems must be improved in order to build an infrastructure for a ubiquitous network society. In this context, we propose here a general configuration that can easily improve the reliability of existing multiagent systems.

The proposed approach monitors messages exchanged among agents to detect faults or other unexpected events. This allows the production of a global picture represented by agent interdependence graphs using related data, the detection and prediction of undesired states, and the acquisition of other information. Using global results obtained by the analysis of such interdependence graphs in a replication system [2] could provide multiagent systems with sufficient reliability and adaptability to complex dynamic environments.

In a replication system, multiple copies of the same content (replicas) are distributed over a network, which is an efficient and easy way to improve the fault tolerance of distributed systems without major changes to existing code. In such systems, fault-tolerant applications are implemented by providing replicas to replace faulty agents. However, the replication cost increases with the number of replicas, which makes this approach unsuitable for a large-scale environment.

This study considers a multiagent system that can adapt dynamically to a changing environment. In particular, it proposes a dynamic application system to improve performance while reducing replication cost, and presents a valid replication policy.

2. Problems of Previous Methods

Previous monitoring methods

Monitoring is a technique used to design multiagent systems with better reliability in case of faults or other unexpected events. Several monitoring methods have been proposed [3–5], but they have the following problems:

Immune networks [3] are provided with self-diagnosis and thus rely on agents’ local knowledge, but fault tolerance and performance are determined by network design; hence such systems are too complicated.

A monitoring method proposed by Kaminka and colleagues [4] aims at identification of state inconsistencies using a procedural plan recognition model. However, this approach deals with static plans while assuming a closed system, and thus does not support adaptation to environmental changes.

Electrical Engineering in Japan, Vol. 164, No. 1, 2008. Translated from Denki Gakkai Ronbunshi, Vol. 126-C, No. 4, April 2006, pp. 451–456

In addition, Horling and colleagues proposed a method of diagnosis of distributed systems by using a fault model [5]. In this method, performance is optimized locally but not globally, and therefore this method is not applicable to large-scale real-time multiagent systems operated in an open environment where agents are added and deleted. In addition, just as in the case of immune networks, the results depend on the agents’ local knowledge, and monitoring is possible only on the agent level.

3. Monitoring Based on Interdependence Graphs

In this section, we introduce the interdependence graph, a data model to express communications between agents. We then propose an agent monitoring architecture based on this model, and present an algorithm to calculate interdependence graphs.

A method of replication control using the interdependence graphs calculated by the algorithm will be presented in Section 4.

3.1 Interdependence graph

As seen in Fig. 1, an interdependence graph represents the agents in a domain as the nodes of a labeled graph (N, L, W). Here N is the node set, L is the link set, and W is the label set:

$$N = \{N_i\}_{i=1,\ldots,n} \tag{1}$$

$$L = \{L_{i,j}\}_{i,j=1,\ldots,n} \tag{2}$$

$$W = \{W_{i,j}\}_{i,j=1,\ldots,n} \tag{3}$$

Here n is the number of nodes, Li,j denotes a link from node Ni to node Nj, and Wi,j is the labeled weight of Li,j (a real number). The weight Wi,j expresses the importance of interdependence between the corresponding agents i and j.
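As an illustration, the triple (N, L, W) can be held in a small data structure. The following Python sketch is not taken from the paper: the class name InterdependenceGraph, its methods, the treatment of a missing link as weight 0.0, and the rejection of self-links are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class InterdependenceGraph:
    """Directed, weighted graph (N, L, W) over agent identifiers."""

    nodes: set = field(default_factory=set)       # N
    weights: dict = field(default_factory=dict)   # (i, j) -> W_ij; the keys form L

    def add_agent(self, agent):
        self.nodes.add(agent)

    def remove_agent(self, agent):
        # Dropping a node also drops every link that touches it.
        self.nodes.discard(agent)
        self.weights = {
            (i, j): w for (i, j), w in self.weights.items() if agent not in (i, j)
        }

    def set_weight(self, i, j, w):
        # Assumption of this sketch: links connect two distinct agents.
        if i == j:
            raise ValueError("self-links are not part of the graph")
        self.nodes.update((i, j))
        self.weights[(i, j)] = float(w)

    def weight(self, i, j):
        # Assumption of this sketch: a missing link has weight 0.0.
        return self.weights.get((i, j), 0.0)


g = InterdependenceGraph()
g.set_weight("buyer", "seller", 0.8)   # link buyer -> seller
g.set_weight("seller", "buyer", 0.3)   # the reverse link carries its own weight
g.add_agent("manager")                 # a node without links yet
```

Keeping the weights in a dictionary keyed by ordered pairs makes the direction of each link explicit and lets nodes be added or removed at run time.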

3.2 Monitoring architecture

Monitoring involves two processes: acquisition of information to update the interdependence graph, and graph analysis to control the domain agents. Such information may include, for example, the communications load, the agent’s roles, etc.

Most existing multiagent architectures have a centralized monitoring mechanism, and the information acquired via this mechanism is used off-line for analysis or correction of system behavior. However, in a centralized monitoring architecture, acquired information cannot provide real-time adaptation to environmental changes as required in large-scale complex systems. Thus, we propose a monitoring architecture with a distributed monitoring mechanism for real-time adaptation to the changing environment of large-scale complex systems, as shown in Fig. 2.

This distributed monitoring mechanism supports a multiagent architecture that can respond adaptively to environmental changes. The agents at the monitoring level play the following two roles:

• Supervision and control of domain agents
• Generation of global information

These roles are assigned to agents of two types: monitoring agents, which supervise domain agents, and a host monitoring agent, which manages the monitoring agents. A monitoring agent is assigned to every domain agent, and one host monitoring agent is assigned to all monitoring agents. The monitoring system thus has three agent layers. Every monitoring agent communicates only with the one host monitoring agent, sending it the information obtained by monitoring. The host monitoring agent collects the information acquired by the individual monitoring agents and arranges it in the form of global information (total number of messages, amount of data exchanged, etc.).

Fig. 1. Interdependence graph.

Fig. 2. Monitoring architecture.

Monitoring agents revise the interdependence graph according to various changes of the domain agents. Since the multiagent environment changes in real time, the interdependence graph is not static. Instead, it is updated dynamically when domain agents are added or removed. For instance, when a new domain agent is added and communicates with existing domain agents, the monitoring agents responsible for the existing domain agents recognize the new domain agent and report it to the host monitoring agent. As a result, the host monitoring agent assigns a monitoring agent to the new domain agent.
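The three-layer arrangement described above could be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class and method names (HostMonitoringAgent, MonitoringAgent, assign_monitor, observe, report) are hypothetical, a message is reduced to a (sender, receiver, size) triple, and direct method calls stand in for agent messaging.

```python
from collections import defaultdict


class HostMonitoringAgent:
    """Collects reports from monitoring agents and keeps global totals."""

    def __init__(self):
        self.monitors = {}          # domain agent id -> MonitoringAgent
        self.total_messages = 0     # global information: number of messages
        self.total_bytes = 0        # global information: amount of data exchanged

    def assign_monitor(self, agent_id):
        # One monitoring agent per domain agent; repeated calls are harmless.
        if agent_id not in self.monitors:
            self.monitors[agent_id] = MonitoringAgent(agent_id, self)
        return self.monitors[agent_id]

    def report(self, sender, receiver, size):
        self.total_messages += 1
        self.total_bytes += size
        # A domain agent seen for the first time gets its own monitoring agent.
        for agent_id in (sender, receiver):
            self.assign_monitor(agent_id)


class MonitoringAgent:
    """Observes the messages sent by one domain agent and reports to the host."""

    def __init__(self, agent_id, host):
        self.agent_id = agent_id
        self.host = host
        self.sent = defaultdict(lambda: [0, 0])   # receiver -> [count, bytes]

    def observe(self, receiver, size):
        # Local bookkeeping first, then a report to the single host.
        self.sent[receiver][0] += 1
        self.sent[receiver][1] += size
        self.host.report(self.agent_id, receiver, size)


host = HostMonitoringAgent()
monitor_a = host.assign_monitor("A")
monitor_a.observe("B", 120)   # "B" is new, so the host assigns it a monitor
monitor_a.observe("B", 80)
```

In this sketch the monitoring agents never talk to each other, which mirrors the rule that each of them communicates only with the host monitoring agent.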

3.3 Weight updating algorithm

The following indicators are used for weight updating:

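• Monitoring time interval ∆t

• Communications load Q(∆t):

$$Q(\Delta t) = \operatorname{operator}\bigl(Q_{i,j}(\Delta t)\bigr) \tag{4}$$

• Number of sent messages NM(∆t):

$$NM(\Delta t) = \operatorname{operator}\bigl(NM_{i,j}(\Delta t)\bigr) \tag{5}$$

Qi,j(∆t) and NMi,j(∆t) are, respectively, the communications load and the number of messages from agent i to agent j in the interval ∆t; operator denotes a set operator (the averaging operation is usually used).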

Below we show an outline of an adaptive algorithm to update the weights Wi,j of the interdependence graph. This algorithm is executed by every monitoring agent in order to manage the corresponding nodes, and the results are accumulated by the host monitoring agent for incorporation into the interdependence graph.

Execution steps:

(1) The following procedure is repeated for agent j (j ≠ i).

(2) Evaluation of Eqs. (6), (7):

(3) Updating of weights Wi,j:

(4) Termination

Here the parameter α is a discount rate that dictates the degree to which the existing weights are updated. This parameter is set high to make the weights more sensitive to environmental changes; on the other hand, the parameter is set low when priority is given to empirical data. Furthermore, operator3 is a set operator.

The aforesaid algorithm uses only two kinds of data, the number of sent messages NM(∆t) and the communications load Q(∆t). That is, importance is measured here in terms of information throughput. Other criteria could be considered as well, such as connectivity of the interdependence graph to assure message routes. Such new aspects can be easily added by modifying the set operators. In other words, this algorithm can be extended to involve other information indicators.
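One possible reading of the update loop is sketched below in Python. It is an illustration built on assumptions rather than the algorithm of the paper: the set operators are taken to be averages, the per-link quantities evaluated in step (2) are assumed to be relative deviations from the global averages, and the update in step (3) is assumed to be additive and scaled by α. The function name update_weights and its arguments are hypothetical.

```python
from statistics import mean


def update_weights(weights, load, count, alpha=0.5):
    """Return the weights after one monitoring interval.

    weights: {(i, j): W_ij}
    load:    {(i, j): Q_ij(dt)}   communications load per link
    count:   {(i, j): NM_ij(dt)}  number of sent messages per link
    """
    if not load:
        return dict(weights)

    # Global indicators Q(dt) and NM(dt); the set operator is assumed to be the average.
    q_avg = mean(list(load.values()))
    nm_avg = mean([count.get(link, 0) for link in load])
    if q_avg == 0 or nm_avg == 0:
        return dict(weights)   # no traffic in this interval: keep the old weights

    new_weights = dict(weights)
    for link in load:
        # Assumed per-link quantities: deviation of the link from the global average.
        q_rel = (load[link] - q_avg) / q_avg
        nm_rel = (count.get(link, 0) - nm_avg) / nm_avg
        # Assumed combining operator: the mean of the two indicators.
        delta = mean([q_rel, nm_rel])
        # alpha scales how strongly the new observation moves the old weight.
        new_weights[link] = weights.get(link, 0.0) + alpha * delta
    return new_weights


old = {("A", "B"): 1.0, ("B", "A"): 1.0}
load = {("A", "B"): 300.0, ("B", "A"): 100.0}
count = {("A", "B"): 6, ("B", "A"): 2}
new = update_weights(old, load, count, alpha=0.5)
```

With these inputs the link from A to B carries 50% more than the average on both indicators, so its weight rises from 1.0 to 1.25, while the reverse link falls to 0.75. A larger α would move both weights further in the same interval.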

4. Adaptive Multiagent Architecture

4.1 Architecture

The proposed adaptive multiagent architecture is shown in Fig. 3.

This architecture includes a replication server that manages the domain multiagents and their replicas, and the monitoring system described above.

4.2 Replication policy

The replication policy determines how to store (back up) the agent’s states, sent/received messages, and so on. Usually, three types of replication policy are used:

• Active
• Passive
• Semiactive

Fig. 3. Adaptive multiagent architecture.

In an active replication policy, messages received from other agents are continuously forwarded to all replicas. In the passive type, the agent’s states are sent to replicas periodically, at certain intervals. In the semiactive type, one replica is nominated as the representative (leader), and the messages received by this leader are continuously sent to the other replicas, as in the active type.

In an active replication policy, all replicas are synchronized on the message level, which allows smooth recovery switching but has a high replication cost. In contrast, in passive replication, backups can be executed at arbitrary intervals, which is helpful in reducing replication costs. At the same time, in case of a fault, the messages generated since the last backup must be resent, which degrades recovery switching. The semiactive replication policy takes an intermediate position between the active and passive types.
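The trade-off between the active and passive policies can be made concrete with a toy cost model. The sketch below is illustrative only: the function names, the unit costs (one forwarded message or one state transfer per replica), and the fixed checkpoint interval are assumptions, not measurements or definitions from the paper.

```python
def active_cost(num_messages, num_replicas):
    # Active policy: every received message is forwarded to every replica.
    return num_messages * num_replicas


def passive_cost(duration, interval, num_replicas):
    # Passive policy: one state transfer per replica at each completed interval.
    return (duration // interval) * num_replicas


def passive_replay(message_times, fail_time, interval):
    # Messages received after the last backup and up to the fault must be resent.
    last_backup = (fail_time // interval) * interval
    return [t for t in message_times if last_backup < t <= fail_time]


message_times = [1, 4, 7, 12, 13, 18]
forwarded = active_cost(len(message_times), num_replicas=3)
transfers = passive_cost(duration=20, interval=5, num_replicas=3)
to_resend = passive_replay(message_times, fail_time=14, interval=5)
```

A longer backup interval lowers the number of state transfers but lengthens the list of messages that must be resent after a fault, which is the degradation of recovery switching noted above.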

4.3 Adaptive replication system

This study proposes an architecture to switch between replication policies in order to improve efficiency on the system level. Specifically, two types of replicas, passive and semiactive, are used to create replication groups of domain agents and multiple replicas. When the leader replica in a replication group fails, another replica becomes the leader to recover the system. The following processes are always run on the replication server that manages the replication groups.

First, the replica creation time and backup data consistency are calculated to evaluate the replicas. When a domain agent or a leader replica fails, a replica with high consistency is chosen as the new leader. Replication groups are managed as lists; replicas are generated and removed by instructions from the monitoring system, and these changes are reflected in the list.
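Leader recovery within a replication group might look like the following Python sketch. The names Replica and ReplicationGroup, the numeric consistency score (higher is better), and the tie-breaking rule in favor of the most recently created replica are assumptions introduced for illustration; the paper does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    created_at: float    # replica creation time
    consistency: float   # backup data consistency (assumed scale: higher is better)


class ReplicationGroup:
    def __init__(self, leader, replicas):
        self.leader = leader
        self.replicas = list(replicas)   # the group is managed as a list

    def add_replica(self, replica):
        # Issued on instruction from the monitoring system.
        self.replicas.append(replica)

    def on_leader_failure(self):
        if not self.replicas:
            raise RuntimeError("no replica left to recover from")
        # Highest consistency wins; the newer replica breaks ties (assumed rule).
        best = max(self.replicas, key=lambda r: (r.consistency, r.created_at))
        self.replicas.remove(best)
        self.leader = best
        return best


group = ReplicationGroup(
    leader=Replica("r0", created_at=0.0, consistency=1.0),
    replicas=[
        Replica("r1", created_at=1.0, consistency=0.7),
        Replica("r2", created_at=2.0, consistency=0.9),
        Replica("r3", created_at=3.0, consistency=0.9),
    ],
)
new_leader = group.on_leader_failure()
```

Here r2 and r3 share the highest consistency, so the more recently created r3 is promoted and the list shrinks to the two remaining replicas.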

4.4 Algorithm using interdependence to determine number of replicas

Analysis of the interdependence graph is helpful in estimating an agent’s importance and condition as well as the fault tolerance of the multiagent system. In the proposed system, labeled weights Wi,j for the input/output links of every agent are calculated using operator4 to find the agent’s importance, as in Eq. (9), where m denotes the degree of node i:

$$w_i = \operatorname{operator}_4\bigl(W_{i,j}\bigr), \quad j = 1, \ldots, m \tag{9}$$

In addition, the importance wi of agent i is used to calculate the number repi of its replicas in the adaptive replication system:

The parameter W in the above equation is the total of wi for all agents, r0 is the initial number of replicas, and rmax is the overall maximum number of replicas set by the system designer.

Thus, replicas are allocated among agents according to their importance. This is done in order to optimize system resources while making the system robust to faults of the most important agents.
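The allocation can be sketched as follows. The exact forms used here are assumptions: operator4 is taken to be the average over the links that start or end at an agent, and the replica count is taken to be r0 plus the agent's share wi / W of rmax, rounded to an integer. The function names are hypothetical.

```python
def agent_importance(weights, agent):
    # Average weight over the input/output links of the agent (assumed operator4).
    incident = [w for (i, j), w in weights.items() if agent in (i, j)]
    return sum(incident) / len(incident) if incident else 0.0


def replica_counts(importance, r0, r_max):
    total = sum(importance.values())   # W: total importance of all agents
    if total == 0:
        return {agent: r0 for agent in importance}
    # Assumed form: r0 plus a share of r_max proportional to importance.
    # Python's round() resolves exact .5 ties to the nearest even integer.
    return {
        agent: round(r0 + w * r_max / total)
        for agent, w in importance.items()
    }


weights = {("A", "B"): 2.0, ("B", "A"): 4.0, ("B", "C"): 0.0}
importance = {a: agent_importance(weights, a) for a in ("A", "B", "C")}
counts = replica_counts(importance, r0=1, r_max=10)
```

With these weights the importances are 3.0, 2.0, and 0.0, so agent A receives 7 replicas, B receives 5, and C keeps only the initial one. Under this assumed form the shares of rmax sum to rmax before rounding, and r0 is added on top for every agent.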

5. Simulation Experiments

This section presents simulation experiments that were carried out to verify the effectiveness of the proposed method.

5.1 System under simulation

Experiments were performed on eMarket MAS to simulate a virtual market by the following scenario:

(1) Parts purchase agents acting on behalf of product makers buy parts from parts sales agents acting on behalf of parts makers.

(2) Product makers process the purchased parts to manufacture products.

(3) Product sales agents acting on behalf of product makers sell products to retail distributors.

The two-stage market model used in the experiments is illustrated in Fig. 4. The following two markets exist:

• Market A: transactions between parts makers and product makers

• Market B: transactions between product makers and retail distributors

Fig. 4. Two-stage market model.

The agents involved in this market can be assigned to four types by their roles:

• Sales Agent
• Purchase Agent
• Purchase & Sales Agent
• Market Management Agent

Purchase agents and sales agents are engaged only in the purchase and sale, respectively, of parts or products. Purchase & sales agents play different roles depending on the market (for example, agents of product makers). Finally, there is one market management agent in each market. This agent manages the other agents in the market (registration, removal, etc.) and arranges auctions.

Table 1 lists the specifications of the computers used in the experiments and the applications run on them. The simulated system (eMarket MAS), replication server, and fault generator were run on three machines connected by Ethernet. The monitoring system was implemented on machine A only. In addition, market auctions were performed in the following way:

(1) Sales agents submit reserve prices to the market management agent.

(2) The market management agent accepts bids from the purchase agents during a certain time.

(3) Once the bidding is closed, the market management agent makes deals with the bidders that have offered the highest bid prices exceeding the reserve prices.

(4) If there are no such bids, the auction is canceled, and the sales agents submit lower selling prices.

The selling prices as well as the bid prices and bid timing were determined by using uniform random numbers.
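The auction steps above can be sketched in Python as follows. This is a simplified illustration, not the eMarket MAS code: the function names, the single-seller setting, the 10% price cut after a canceled auction, the price ranges, and the cap on the number of rounds are all assumptions.

```python
import random


def run_auction(reserve_price, bids):
    """bids: {bidder: price}. Return (winner, price), or None if canceled."""
    # Only bids strictly above the reserve price are eligible.
    valid = {bidder: price for bidder, price in bids.items() if price > reserve_price}
    if not valid:
        return None
    winner = max(valid, key=valid.get)
    return winner, valid[winner]


def sell(reserve_price, draw_bids, discount=0.9, max_rounds=10):
    # After a canceled auction the seller resubmits a lower price
    # (assumed here to be a 10% cut per round).
    for _ in range(max_rounds):
        result = run_auction(reserve_price, draw_bids())
        if result is not None:
            return result
        reserve_price *= discount
    return None


rng = random.Random(0)


def draw_bids():
    # Bid prices drawn from a uniform distribution (assumed range 50 to 100).
    return {f"buyer{k}": rng.uniform(50, 100) for k in range(3)}


outcome = sell(reserve_price=120.0, draw_bids=draw_bids)
```

Because the initial reserve price of 120 lies above every possible bid, the first round is always canceled and the seller lowers the price until some bid exceeds it.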

Faults were generated by disabling agents selected by using uniform random numbers.

5.2 Experiment 1

The objective of Experiment 1 was to elucidate the relationship between the total number of replicas rmax (a parameter set by the system designer) and the system reliability.

5.2.1 Experimental conditions

In this experiment, a total of 100 faults were generated during 10-minute simulations, and the number of successful simulations was measured while varying the number of replicas rmax from 0 to 30 in steps of 2.

The success rate was calculated for 40 simulations. The experimental parameters are listed in Table 2.
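A fault generator and the success-rate calculation for such an experiment could be sketched as follows. The function names are hypothetical, and drawing the fault times uniformly over the 10-minute run is an assumption; the paper states only that the disabled agents were selected by uniform random numbers. The outcome list in the example is made up to show the arithmetic and is not experimental data.

```python
import random


def fault_schedule(agents, num_faults=100, duration=600.0, seed=None):
    """Return time-ordered (time_in_seconds, agent) pairs for one simulation."""
    rng = random.Random(seed)
    events = [
        (rng.uniform(0.0, duration), rng.choice(agents))   # uniform time, uniform agent
        for _ in range(num_faults)
    ]
    return sorted(events)


def success_rate(outcomes):
    # outcomes: one boolean per simulation run (True means the run completed).
    return sum(outcomes) / len(outcomes)


schedule = fault_schedule(["seller1", "buyer1", "maker1"], seed=1)
example_rate = success_rate([True] * 32 + [False] * 8)   # 32 of 40 runs succeed
```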

5.2.2 Experimental results and discussion

The experimental results are presented in Fig. 5. With rmax set at 10, the success rate is about 80%, but it reaches 100% when rmax exceeds 20. In this experimental environment, the system reliability can be maintained by setting rmax at 20 or more, while efficiently reducing the replication costs. Similar trends were observed in other cases as well.

Therefore, rmax is an important system parameter that must be set appropriately.

5.3 Experiment 2

The objective of Experiment 2 is to examine monitoring cost, a factor affecting system performance.

Table 1. Specifications of machines and systems run on them

Table 2. Parameters of Experiment 1

Fig. 5. Relationship between rmax and success rate.


5.3.1 Experimental conditions

In this experiment, performance was measured by the execution time with and without monitoring.

The total number of agents participating in the market was varied from 100 to 350 in steps of 50. As in Experiment 1, two market management agents were used, and the rest of the agents were allocated as shown in Table 3.

As in Experiment 1, a total of 100 faults were generated during 10-minute simulations. The experimental parameters are listed in Table 3.

5.3.2 Experimental results and discussion

The measured performance of the proposed system with and without monitoring is shown in Fig. 6.

The difference in performance gives the monitoring cost. As is evident from the diagram, even when the number of agents is increased, performance does not drop sharply due to monitoring.

In addition, we compared the performance with that of a typical conventional system (immune network) combined with a replication system. As is evident from Fig. 6, the monitoring cost of the proposed system is sufficiently low.

We may conclude that in the conventional system, performance drops significantly as the number of agents is increased, because more agents are required for mutual diagnosis. Performance might be somewhat improved by reducing the diagnosis area of the immune network; however, more time would be required for reliability calculation, so that reliability would eventually become impossible to maintain. This suggests that the proposed system is also appropriate for monitoring large-scale multiagent systems in terms of monitoring cost.

6. Conclusions

This paper has proposed a global monitoring method that is capable of analysis on the system level. For this purpose, we introduced the interdependence graph, which represents the interrelations between agents in a multiagent system. We also proposed an adaptive replication system to improve the fault tolerance of a multiagent system using global data acquired by monitoring.

In addition, we verified the proposed method by simulations using eMarket MAS. Because it monitors throughput during execution, the proposed method works effectively only on multiagent systems composed of agents with a nearly uniform fault occurrence rate per unit throughput. The parameter rmax is related to the fault rate of domain agents, and hence this parameter must be set with reference to similar systems and past experience, which requires further examination.

Future research will address the following topics:

• Extension of the set operators used in the weight updating algorithm; for example, setting of the learning rate of current importance versus past importance.

• Extension of local information used for the interdependence graph; for example, extraction, evaluation, and weighting of message types, or evaluation of message sequences.

• Dynamic switching between replication policies in replication groups.

REFERENCES

1. Weiss G. Multiagent systems: A modern approach to distributed artificial intelligence. MIT Press; 1999. p 79–120.

2. Hagihara K. Algorithms for fault-tolerant distributed systems. Inf Process Soc Japan Mag 1993;34:1336–1340. (in Japanese)

3. Ishida Y. An immune network approach to sensor-based diagnosis by self-organization. Complex Syst 1996;10:73–90.

4. Kaminka GA, Pynadath DV, Tambe M. Monitoring teams by overhearing: A multi-agent plan-recognition approach. J Artif Intell Res 2002;17:83–135.

5. Horling B, Benyo B, Lesser V. Using self-diagnosis to adapt organizational structures. Proc 5th International Conference on Autonomous Agents, p 529–536, 2001.

Table 3. Parameters of Experiment 2

Fig. 6. Monitoring cost.


AUTHORS

Akifumi Tanimoto (nonmember) completed the first stage of the doctoral program at Osaka Prefecture University in 2005. He is now affiliated with Fujikasai Business Solutions Corporation. His student research concerned reliability of multiagent systems.

Keinosuke Matsumoto (member) completed the M.E. program at Kyoto University in 1978 and joined Mitsubishi Electric Corporation. He has been a professor at Osaka Prefecture University since 1996. His research interests are intelligent information processing and software. He received an IEEJ Paper Award in 1984 and IEEJ Progress Award in 2005. He holds a D.Eng. degree, and is a member of IPSJ, IEEE, and other societies.

Naoki Mori (nonmember) completed the M.E. and doctoral programs at Kyoto University in 1994 and 1997, and joined the faculty of Osaka Prefecture University as a research associate. He is now an associate professor. His research interests are genetic algorithms. He received an ISCIE Paper Award in 1999. He holds a D.Eng. degree.
