
Abstract—In this paper, we develop and test multi-agent coordination strategies for dynamic production scheduling. First, we develop a multi-agent coordination strategy and show the existence of conjectural equilibrium. Next, we test the proposed coordination strategy and show that multi-agent coordination performs better than heuristic dispatching rules for production scheduling.

I. INTRODUCTION

THIS research deals with the learning of multi-agent coordination strategies for dynamic production scheduling. Our problem is that of scheduling a set of jobs over a given shop floor environment. We use the Bhattacharyya and Koehler [1] genetic algorithm (GA) based learning simulation model and extend it to incorporate multi-agent coordination strategies. Specifically, (1) we propose different types of coordination groups, (2) we develop adaptive multi-agent coordination strategies and show the existence of conjectural equilibrium, and (3) we test and evaluate the performance of the multi-agent coordination based system against different dispatching rules.

We assume that the reader is familiar with the shop floor scheduling problem, genetic algorithms and distributed artificial intelligence, and we do not provide details on these topics here.

The rest of the paper is organized as follows. First, we provide an introduction to the evolutionary computation framework for learning multi-agent coordination. After that, we provide an approach to learn multi-agent coordination and show the existence of conjectural equilibrium. We then test and benchmark the performance of the proposed multi-agent coordination strategies against popular dispatching rules. In the end we summarize our findings and provide directions for future research.

II. AN EVOLUTIONARY COMPUTATION FRAMEWORK FOR MULTI-AGENT COORDINATION

We use the Sycara et al. [4] framework for an inherently decentralized organization with factory floors divided into work areas [1]. The work areas are controlled by agents with independent decision making capabilities. A typical job has to undergo several operations across the various work areas, and the schedules for job completion are built incrementally. The incremental schedules have stated

Manuscript received November 21, 2004. P. C. Pendharkar is with the Pennsylvania State University at Harrisburg, 777 W. Harrisburg Pike, Middletown, PA 17057 USA (phone: 717-948-6028; fax: 717-948-6456; e-mail: [email protected]).

objectives and dispatching decisions need to consider locally relevant and globally (system-wide) consistent criteria [1]. Thus, the framework used should not only consider multiple and often conflicting criteria but also coordinate decision making by coordinating (balancing) the various objectives.

The system used in this research employs an object oriented approach for modeling the environment. Factory floor entities (queues, servers, job-areas, dispatchers), knowledge bases, coordination groups and performance criteria are modeled as objects. Figure 1, adapted and modified from Bhattacharyya & Koehler [1], illustrates the framework used in the current research. There are four main components in the simulation test bed: a simulation subsystem which models the environment, intelligent dispatchers (intelligent agents) which have decision making capabilities, a coordination group which uses tax based coordination mechanisms among the dispatchers in the coordination group, and managerial objectives with their evaluation based on the performance objectives. A dispatcher has a knowledge base and uses genetic algorithm based learning to update the rules in the knowledge base at periodic intervals of time.
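The object oriented decomposition described above can be sketched as follows. This is an illustrative reconstruction, not the author's code; all class and attribute names (Dispatcher, CoordinationGroup, rules, payoff, tax_rates), and the choice of which dispatchers depend on which, are our own assumptions.

```python
# Illustrative sketch of the object model: dispatchers hold GA-evolved
# rule bases and payoffs; a coordination group ties a coordinating
# dispatcher to its dependent dispatchers and their tax rates.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Dispatcher:
    name: str
    rules: List[str] = field(default_factory=list)  # GA-evolved dispatching rules
    payoff: float = 0.0                             # local + global payoff

@dataclass
class CoordinationGroup:
    coordinator: Dispatcher
    dependents: List[Dispatcher]
    tax_rates: Dict[str, float] = field(default_factory=dict)  # name -> B_i

# Five dispatchers as in Figure 1; which of them form the coordination
# group is an assumption made for illustration only.
dispatchers = [Dispatcher(f"D{i}") for i in range(1, 6)]
group = CoordinationGroup(dispatchers[0], dispatchers[1:3])
```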

Figure 1: A DAI Framework for GA Based Learning (the simulation model, managerial objectives, five dispatchers with EVAL links, tax payments, and the coordination group)

Depending on the layout of the shop floor and the interdependencies of the operations, there are three possible types of coordination groups: an inward fork, an outward fork and a straight line coordination group. Figure 2 illustrates these three types of coordination groups. In each coordination group there is one coordinating agent and one or more dependent agents. A coordinating agent, labeled as C in Figure 2, may consider the request to process a job of a dependent agent's interest. If the coordinating agent processes the dependent agent's request, then the dependent agent pays a "tax" to the coordinating agent. The direction of the payment of taxes is shown by the dotted lines in Figure 2. The knowledge representation used for the current research is described in Bhattacharyya and Koehler [1]. Information about the tax and its calculation is described in the next section.

An Evolutionary Multi-Agent System for Production Scheduling
Parag C. Pendharkar


Proceedings of the 2005 IEEE Conference on Control Applications, Toronto, Canada, August 28-31, 2005

TB6.6

0-7803-9354-6/05/$20.00 ©2005 IEEE 946



Figure 2: The Three Basic Types of Coordination Groups (outward fork, inward fork, and straight line coordination; C marks the coordinating agent in each group)

III. LEARNING MULTI-AGENT COORDINATION

The central element of a multi-agent learning framework is its equilibrium concept. Unlike a single-agent system, where the agent's problem is to maximize its own utility, agents in multi-agent systems optimize simultaneously [5]. Thus, it is important to show that a given multi-agent coordination strategy will converge to a conjectural equilibrium [5]. We use the adaptive competitive agent strategy proposed by Wellman & Hu (1998) to learn coordination in a multi-agent system [5].

Each agent is allowed to have its local and/or global payoff criteria based on the managerial objectives. The payoff criteria could be, for example, minimize flowtime, minimize tardiness of jobs or minimize tardiness of the priority jobs. Coordination is achieved by adjusting the payoff estimates of the individual agents. For each dependent agent $A_i$ that can receive or send a job from/to a coordinating agent $C$, $A_i$ pays a tax $B_i$ to $C$ to consider processing jobs of its interest first. $B_i \in [0,1]$ is a tax rate that determines the fraction of total payoff that a dependent agent pays to the coordinating agent. The payoff to the coordinating agent and the dependent agent is adjusted as follows:

$$\pi_C = \pi_C + B_i\,\pi_{A_i}, \qquad \pi'_{A_i} = \pi_{A_i} - B_i\,\pi_{A_i}$$

where $\pi_{A_i}$ is the total payoff given to the dependent agent (local and global), $\pi'_{A_i}$ is the payoff received by the dependent agent after taxation, $B_i$ is the tax rate that determines the fraction of its payoff that a dependent agent pays to the coordinating agent, and $\pi_C$ is the payoff received by the coordinating agent. All $\pi_{A_i}$, $\pi_C$, $B_i \in [0,1]$.
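As a concrete sketch of this tax transfer (the function name, variable names, and example values are ours, not from the paper's implementation):

```python
def settle_tax(pi_c, pi_a, b_i):
    """Transfer the tax B_i * pi_A from a dependent agent's payoff to the
    coordinating agent; returns the updated (pi_C, pi_A') pair."""
    tax = b_i * pi_a
    return pi_c + tax, pi_a - tax

# A dependent agent with payoff 0.60 and a 5% tax rate keeps 0.57,
# while the coordinating agent's payoff grows from 0.30 by 0.03.
pi_c, pi_a_after = settle_tax(0.30, 0.60, 0.05)
```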

The coordination within a group takes place through the adjustment of the taxes. Taxes are adjusted periodically after a specified number of jobs pass through the system. The taxes are allowed to vary between 5-10% of the total payoff value to avoid big variations leading to chaotic randomized behavior arising from the jobs coming from a tail of a Poisson distribution. Initially, at the beginning of the simulation, taxes are set equal to zero. The new tax $B_{new}$ is calculated as follows:

$$B_{new} = B_{old} - \lambda(\pi_{old} - \pi_{now}), \quad \text{if } 0.05 \le |\lambda(\pi_{old} - \pi_{now})| \le 0.1,$$

with the adjustment term $\lambda(\pi_{old} - \pi_{now})$ replaced by $0.05\sigma$ if $|\lambda(\pi_{old} - \pi_{now})| < 0.05$ and by $0.1\sigma$ if $|\lambda(\pi_{old} - \pi_{now})| > 0.1$, where $\sigma = -1$ if $\pi_{old} > \pi_{now}$ and $\sigma = 1$ otherwise. In the beginning of the simulation run, $\pi_{old} = B_{old} = 0$, and $\pi_{now}$ is the average payoff across a specified number of initial jobs through the system. The learning rate $\lambda$ is a number that is experimentally determined and is set so that, when multiplied with an approximate average of $(\pi_{old} - \pi_{now})$, it gives a number between 0.05 and 0.1.
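One periodic tax adjustment under this rule can be sketched as follows. The symbols $\lambda$, $\pi$ and $\sigma$ follow the reconstruction above; the function name and the example values are our own assumptions.

```python
def update_tax(b_old, pi_old, pi_now, lr):
    """One periodic tax update: the raw adjustment lr*(pi_old - pi_now) is
    applied when its magnitude lies in [0.05, 0.10]; otherwise it is
    replaced by 0.05*sigma or 0.10*sigma, with sigma = -1 if the payoff
    fell (pi_old > pi_now) and sigma = +1 otherwise."""
    delta = lr * (pi_old - pi_now)
    sigma = -1.0 if pi_old > pi_now else 1.0
    if abs(delta) < 0.05:
        delta = 0.05 * sigma
    elif abs(delta) > 0.10:
        delta = 0.10 * sigma
    return b_old - delta

# In-range case: payoff rose from 0.40 to 0.48 with lr = 1.0, so the
# raw adjustment -0.08 is applied directly and the tax rises by 0.08.
b1 = update_tax(0.0, 0.40, 0.48, 1.0)
# Clamped case: a small raw adjustment (payoff 0.50 -> 0.49) is
# replaced by 0.05*sigma.
b2 = update_tax(0.10, 0.50, 0.49, 1.0)
```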

After describing the adaptive multi-agent coordination, we now illustrate the existence of conjectural equilibrium. The coordinating agent will always be in one of two states: state 1, the coordinating agent is coordinating (tax > 0); state 2, the coordinating agent is not coordinating (tax = 0). Since jobs keep coming into the system, the entire process is a continuous time stochastic process. Let $p_{ij}(t)$ represent the probability that the coordinating agent is in state $j$ at time $t$ given that it was in state $i$ at time 0. Since a coordinating agent can take only one of two states (state 1 = 1 and state 2 = 0), we have four probabilities: $p_{00}(t)$, $p_{01}(t)$, $p_{10}(t)$, $p_{11}(t)$. In order to derive the $p_{ij}(t)$ functions, we make the following assumptions:
1. The process satisfies the Markov property.
2. The process is stationary.
3. The probability of transition from a given state to the other state in a short interval $\Delta t$ is proportional to $\Delta t$.
In the event of no breakdown of machines and for very small values of $\Delta t$, all the above assumptions strictly hold. In regards to assumption 3, let

$$p_{01}(\Delta t) = x\,\Delta t, \qquad p_{10}(\Delta t) = y\,\Delta t$$

where $x$ and $y$ are constants of proportionality (described later) called the cooperation rate and defection rate. Using a special case of the Chapman-Kolmogorov equations, we can calculate the function value of $p_{01}(t+\Delta t)$, or the probability that the agent is coordinating at time $t+\Delta t$ given that it was not coordinating at time 0, as follows:

$$p_{01}(t+\Delta t) = p_{00}(t)\,p_{01}(\Delta t) + p_{01}(t)\,p_{11}(\Delta t).$$

Substituting the linear approximations for $p_{01}(\Delta t)$ and $p_{11}(\Delta t) = 1 - p_{10}(\Delta t)$, we get

$$p_{01}(t+\Delta t) = p_{00}(t)\,x\Delta t + p_{01}(t)(1 - y\Delta t)$$
$$p_{01}(t+\Delta t) - p_{01}(t) = p_{00}(t)\,x\Delta t - p_{01}(t)\,y\Delta t.$$

Taking the limit of both sides as $\Delta t \to 0$,

$$\lim_{\Delta t \to 0} \frac{p_{01}(t+\Delta t) - p_{01}(t)}{\Delta t} = x\,p_{00}(t) - y\,p_{01}(t).$$

The above equality can be represented as

$$\frac{dp_{01}(t)}{dt} = x\,p_{00}(t) - y\,p_{01}(t).$$

Rearranging the terms and putting $p_{00}(t) = 1 - p_{01}(t)$, we get

$$\frac{dp_{01}(t)}{dt} = x - (x+y)\,p_{01}(t).$$

Solving the differential equation, we get

$$p_{01}(t) = \frac{x}{x+y} - \frac{x}{x+y}\,e^{-(x+y)t}.$$

Similarly, we can solve for the other three functions, which can be expressed in matrix form as follows:

$$P(t) = \begin{pmatrix} p_{00}(t) & p_{01}(t) \\ p_{10}(t) & p_{11}(t) \end{pmatrix} = \begin{pmatrix} \dfrac{y}{x+y} + \dfrac{x}{x+y}\,e^{-(x+y)t} & \dfrac{x}{x+y} - \dfrac{x}{x+y}\,e^{-(x+y)t} \\[1ex] \dfrac{y}{x+y} - \dfrac{y}{x+y}\,e^{-(x+y)t} & \dfrac{x}{x+y} + \dfrac{y}{x+y}\,e^{-(x+y)t} \end{pmatrix}$$

There are several desirable properties of these functions: (1) all functions converge smoothly to a fixed value between 0 and 1, and (2) convergence is rapid.

In a real world situation, the rates $x$ and $y$ could be estimated statistically. For example, if the coordination and no-coordination times follow a negative exponential distribution having a cumulative distribution of $1 - e^{-\lambda t}$, then the expected value of the distribution is $1/\lambda$. In an event where $x = 4$ and $y = 3$, the expected coordination time will be 1/4 and the expected defection time will be 1/3. Thus, the cooperation and defection rates can be interpreted as the reciprocals of the mean coordination and defection times, respectively.
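The closed form above can be checked numerically. The sketch below, our own illustration using the $x = 4$, $y = 3$ example, verifies that $p_{01}(t)$ satisfies the differential equation $dp_{01}/dt = x - (x+y)p_{01}$ and converges to the fixed point $x/(x+y)$:

```python
import math

X, Y = 4.0, 3.0  # cooperation and defection rates from the example

def p01(t):
    """Closed-form probability of coordinating at time t, starting from
    the not-coordinating state."""
    return (X / (X + Y)) * (1.0 - math.exp(-(X + Y) * t))

# A central-difference derivative should match x - (x+y)*p01(t).
t, h = 0.2, 1e-6
numeric = (p01(t + h) - p01(t - h)) / (2 * h)
analytic = X - (X + Y) * p01(t)

# Convergence to the fixed point x/(x+y) = 4/7 is rapid.
limit_gap = abs(p01(5.0) - X / (X + Y))
```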

IV. SIMULATION EXPERIMENTS

In dynamic manufacturing environments, it is hard to predict the overall system behavior. However, the desired system behavior is often known. For example, if tardiness of priority jobs were extremely high, then the desired system behavior would be one that minimizes tardiness. In many situations, the desired system behavior may be a combination of multiple conflicting objectives that can be prioritized. For example, in certain manufacturing situations, a criterion such as, first, minimize earliness of all jobs, second, minimize tardiness of priority jobs, and then flowtime of non-priority jobs might be desired. Heuristic rules such as first come first served (FCFS), shortest processing time (SPT), earliest due date (EDD), etc., that are known to minimize individual criteria (e.g., flowtime, tardiness) do not fare well in dynamic situations where multiple conflicting criteria are to be satisfied. Distributed multi-agent coordination based learning is advantageous in situations where conflicting criteria are used [3]. For our research, we use a configuration similar to that of Bhattacharyya & Koehler [1], which was adapted from semiconductor manufacturing line data provided by IBM. The simulated shop-floor setup used for testing the coordination scheme is shown in Figure 3. An outward fork coordination group was defined with the dispatcher attached to queue #6 being the coordinating agent and the dispatchers attached to queues #7 and #8 as dependent agents. Processing times, shown in Table 1, were set so that queues accumulate over the coordinating machine, the dependent agents, and other machines that were not part of the coordination group, such as the machine serving queue #11. The queue accumulating machines #6, #7, #8 and #11 were the learning agents.

Figure 3: Multi-Stage Shop Floor Setup with an Outward Fork Coordination Group (queue and tool numbers 1-11 with the number of servers per tool; the coordination group spans queues #6, #7 and #8)

Machine breakdowns and repair times were calculated based on the exponential rates shown in Table 2. Processing time for an operation was allowed to vary uniformly up to 50% about the values given in Table 1. Setup times of 0.15 were considered for a change of operations at a machine, and an exponential arrival rate of 0.40 was considered. The percentage of priority jobs was kept at 10%. The due dates for jobs were chosen uniformly between 18 and 30 time units.

The objective of our research is to test the performance of the multi-agent coordination based system against scheduling heuristic rules such as FCFS, EDD and SPT. We are modeling a dynamic environment where policies to attain the desired behavior are not known a priori. The desired behavior, however, can be expressed through a set of multi-criteria objectives. For example, if $F_i$ represents the flowtime of the $i$th job in the system and $T_i$ represents the tardiness of the $i$th job, then an objective defined as $(1/F_i + 1/T_i)$ represents a desired system behavior that minimizes the flowtime and tardiness of the $i$th job. Since we are using genetic learning, the desired system behavior can be easily modeled as a fitness function for the scheduling rules. To determine the effectiveness of the multi-agent coordination based learning, multiple simulation experiments were conducted for a conflicting criterion.



Table 1: Operation Processing Times

Op. #   Proc. Time   Op. #   Proc. Time
1       0.06         10      0.06
2       0.06         11      1.65
3       0.46         12      1.65
4       0.46         13      0.06
5       0.46         14      0.06
6       0.06         15      0.15
7       0.06         16      0.075
8       0.69         17      0.06
9       0.06         18      0.06

Table 2: Machine Breakdown and Repair Times

Tool Number   Mean Time to Failure   Mean Time to Repair
1             19.00                  1.00
2             19.00                  1.00
3             17.28                  4.32
4             17.28                  4.32
5             17.28                  4.32
6             15.00                  5.00
7             13.00                  7.00
8             13.00                  7.00
9             19.00                  1.00
10            19.00                  1.00
11            13.00                  7.00

The objectives considered in our experiment can be stated as: first, minimize earliness of all jobs; next, minimize tardiness of priority jobs; and then minimize flowtime of non-priority jobs. The objectives aim to achieve on-time delivery of all jobs and minimize tardiness of priority jobs and flowtime of non-priority jobs. The objectives are complex and conflicting for any one of the heuristic rules EDD, SPT and FCFS to perform well. The above mentioned objectives need to be transformed into a fitness function so that payoffs can be distributed to the scheduling rules reflecting their performance on the desired objectives. Let $F_k$ = total flowtime of the $k$th job through the system, $C_k$ = completion time for job $k$, $D_k$ = due date for job $k$, $P_k$ = priority for job $k$ (0 or 1), and $\delta(x) = 1$ for $x > 0$, 0 otherwise. The fitness function for our desired objective can be written as follows:

$$f = w_1 \sum_k \delta(D_k - C_k)\,\frac{1}{D_k - C_k} + w_2 \sum_k P_k\,\delta(C_k - D_k)\,\frac{1}{C_k - D_k} + w_3 \sum_k (1 - P_k)\,\frac{1}{F_k}$$

The weights $w_l$, where $l \in \{1,2,3\}$, are set to reflect the tradeoffs involved amongst the three criteria. Since the payoff function that we are using is not continuous and differentiable, the values of the weights, in our research, are obtained through initial experimentation. The weights reflect the importance of an objective when compared to another. We do, however, understand that more sophisticated schemes such as Pareto optimality can be considered to obtain better results [2].
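One plausible reading of the fitness function, with reciprocal rewards consistent with the $(1/F_i + 1/T_i)$ example above, can be sketched as follows; the function name and the sample jobs are our own illustrations, not the paper's data.

```python
def delta(x):
    """Indicator: 1 if x > 0, else 0, as defined in the text."""
    return 1.0 if x > 0 else 0.0

def fitness(jobs, w1, w2, w3):
    """Reciprocal-reward fitness over jobs (F, C, D, P); higher is better.
    Small earliness of early jobs, small tardiness of tardy priority jobs,
    and small flowtime of non-priority jobs all raise the fitness."""
    total = 0.0
    for F, C, D, P in jobs:
        if delta(D - C):          # early job: reward small earliness
            total += w1 / (D - C)
        if P and delta(C - D):    # tardy priority job: reward small tardiness
            total += w2 / (C - D)
        if not P:                 # non-priority job: reward small flowtime
            total += w3 / F
    return total

# Two illustrative jobs: an early non-priority job and a tardy priority job.
jobs = [(10.0, 8.0, 12.0, 0), (6.0, 9.0, 7.0, 1)]
score = fitness(jobs, 1.0, 1.0, 1.0)  # 1/4 + 1/10 + 1/2
```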

A total of 40 simulations were run. Ten different random scenarios were considered, and 4 simulations were run for each of: multi-agent coordination based learning, called the coordination group (CG); the first come first served (FCFS) heuristic; the earliest due date (EDD) heuristic; and the shortest processing time (SPT) heuristic. In order to compare the performance on a common ground, we first normalize all the objectives: the result of a simulation run for each of the four scheduling strategies is divided by the maximum value among them. For example, the total earliness results for the first simulation run for CG, FCFS, SPT, and EDD were 307588.8, 526230.1, 641708.4, and 498590.1, respectively. After normalization, the normalized scores for CG, FCFS, SPT and EDD are 0.48, 0.82, 1.00 and 0.78, respectively. If $\Phi_{Earliness}$, $\Phi_{TardinessPriorityJobs}$ and $\Phi_{Flowtime}$ are the normalized scores for the three different objectives for a given scheduling strategy, then we compute the overall total system performance using the following formula:

$$\Phi = w_1\,\Phi_{Earliness} + w_2\,\rho_{PriorityJobs}\,\Phi_{TardinessPriorityJobs} + w_3\,(1 - \rho_{PriorityJobs})\,\Phi_{Flowtime}$$

where $\Phi$ is the standardized overall system performance over the combined objective and $\rho_{PriorityJobs}$ is the percentage of total jobs that are priority jobs. In our case, $\rho_{PriorityJobs} = 0.1$. Figure 4 illustrates the overall system performance. Since our overall objective is that of minimization, lower values of $\Phi$ are considered better. The pairwise comparisons of the standardized overall system performance for the different scheduling approaches are illustrated in Table 3.
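The normalization and the combined score can be reproduced with the total-earliness figures quoted above. This is a sketch; the function names are ours, and the tardiness and flowtime inputs in the example call are made-up placeholders, not reported results.

```python
def normalize(raw):
    """Divide each strategy's raw objective value by the maximum value
    across the four scheduling strategies."""
    m = max(raw.values())
    return {name: v / m for name, v in raw.items()}

# Total earliness from the first simulation run, as reported in the text.
earliness = {"CG": 307588.8, "FCFS": 526230.1, "SPT": 641708.4, "EDD": 498590.1}
norm = normalize(earliness)  # CG ~0.48, FCFS ~0.82, SPT 1.00, EDD ~0.78

def overall(phi_earliness, phi_tardiness, phi_flowtime, w, rho=0.1):
    """Combined standardized performance (lower is better); rho is the
    fraction of priority jobs."""
    w1, w2, w3 = w
    return (w1 * phi_earliness
            + w2 * rho * phi_tardiness
            + w3 * (1.0 - rho) * phi_flowtime)
```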

Table 3: Pairwise Comparisons for Overall Standardized System Performance

        CG-FCFS   CG-SPT   CG-EDD   FCFS-SPT   FCFS-EDD   SPT-EDD
Diff.   -0.12     -0.38    -0.11    -0.278     0.01       0.27
t       -9.37     -29.69   -8.27    -48.38     1.91       47.36
Prob    0.00**    0.00**   0.00**   0.00**     0.09       0.00**

** Significant at α = 0.01, * Significant at α = 0.05



Figure 4: The Overall Standardized Performance of the Simulation Runs (overall system performance, 0 to 1.6, plotted against experiment number 1-10 for CG, FCFS, SPT and EDD)

The results in Table 3 indicate that multi-agent coordination based learning consistently minimized the conflicting multi-criteria objective. The pairwise comparisons of CG vs. all other heuristic scheduling approaches showed a significant difference at the 0.01 level of significance. All the other approaches except for FCFS and EDD showed a significant difference in their scheduling performance over a combined conflicting multi-criteria objective. The two tail t-test for the difference in means between FCFS and EDD showed that there is no difference in the means of the performance of the two scheduling approaches, based on the standardized overall mean, at a 0.05 level of significance.

V. CONCLUSION

We have shown that a multi-agent evolutionary computation framework can be used for dynamic production scheduling. We first showed the existence of conjectural equilibrium, which is a necessary condition for learning in multi-agent systems. Next, we tested the proposed multi-agent framework using simulation. The results of our simulations indicate that dynamic multi-agent learning is a promising alternative to heuristic dispatching rules when multiple conflicting objectives are concerned.

REFERENCES

[1] S. Bhattacharyya and G. J. Koehler, "Learning by objectives for adaptive shop-floor scheduling," Decision Sciences, vol. 29, no. 2, 1998, pp. 347-376.

[2] S. J. Louis and G. J. Rawlins, "Pareto optimality, GA-easiness and deception," in S. Forrest (Ed.), Proceedings of the Fifth International Conference on Genetic Algorithms, 1993, pp. 118-123.

[3] P. C. Pendharkar, "Theory of designing cooperative information systems," Decision Support Systems and Electronic Commerce, to be published.

[4] K. Sycara, S. Roth, N. Sadeh, and M. Fox, "An investigation into distributed constraint-directed factory scheduling," in Proceedings of the Sixth IEEE Conference on AI Applications, Santa Barbara, CA, March 1990.

[5] M. P. Wellman and J. Hu, "Conjectural equilibrium in multiagent learning," Machine Learning, vol. 33, 1998, pp. 179-200.
