
Journal of Parallel and Distributed Computing 00 (2015) 1–16


Decentralised Dispatch of Distributed Energy Resources in Smart Grids via Multi-Agent Coalition Formation

Dayong Ye (a,∗), Minjie Zhang (b), Danny Sutanto (c)

(a) School of Software and Electrical Engineering, Swinburne University of Technology, VIC 3122 Australia

(b) School of Computer Science and Software Engineering, University of Wollongong, NSW 2522 Australia

(c) School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, NSW 2522 Australia

Abstract

The energy dispatch problem is a fundamental research issue in power distribution networks. With the growing complexity and dimensions of current distribution networks, there is an increasing need for intelligent and scalable mechanisms to facilitate energy dispatch in these networks. To this end, in this paper, we propose a multi-agent coalition formation-based energy dispatch mechanism. This mechanism is decentralised without requiring a central controller or any global information. As this mechanism does not need a central controller, the single point of failure can be avoided and since this mechanism does not require any global information, good scalability can be expected. In addition, this mechanism enables each node in a distribution network to make decisions autonomously about energy dispatch through a negotiation protocol. Simulation results demonstrate the effectiveness of this mechanism in comparison with three recently developed representative mechanisms.

© 2014 Published by Elsevier Ltd.

Keywords: Distributed Energy Dispatch, Smart Grids, Multi-Agent Systems, Coalition Formation

1. Introduction

Due to recent improvements in power grid technology, electrical energy systems are undergoing radical transformations in functionality to increase efficiency and reliability. These transformations are not only in the bulk power transmission systems but also in the distribution networks [1]. A distribution network is the final stage in the delivery of electricity to end users [2]. Typically, bulk generation is the only energy resource to a distribution network and the direction of power flow is strictly from the central generation to downstream electric components [3]. Recently, there has been an increasing number of renewable generators embedded in distribution networks [4]. This poses two challenges for distribution network operators [5]. First, electricity networks are already highly capacity constrained, so adding additional generation, which is not managed effectively, may overload the networks. Second, it is very difficult to balance electricity demand with generation from intermittent renewable resources. If the operator of the distribution network fails to balance supply and demand, the network can potentially become unstable and this may result in brownouts and, in the worst case, cascading blackouts. Therefore, efficient energy dispatch mechanisms are necessary.

∗ Corresponding author. Tel: +61 3 92144899. Email addresses: [email protected], [email protected], [email protected]

Recently, many energy dispatch mechanisms have been proposed. These mechanisms can be roughly classified into two categories: centralised mechanisms such as [6] and decentralised mechanisms such as [5]. A centralised mechanism relies on a centralised control architecture, where each generator is directly coordinated by and communicates with a central decision-maker. A decentralised mechanism eliminates the need for a central decision-maker by distributing the decision making to the generators themselves. Centralised mechanisms are easy to implement and control but have a potential single point of failure, because failure of the central decision-maker can cause the whole network to suffer brownouts or even cascading blackouts. Decentralised mechanisms can avoid such risks because there is no central decision-maker. Some of the current decentralised mechanisms, e.g., [5], however, rely on a leader that knows the number of resources which must be collectively provided in the network. In other words, these decentralised mechanisms require global information to achieve decentralised decision making. Requiring global information is a drawback, because such information acquisition is time-consuming and this situation is even worse in large distribution networks. Other decentralised mechanisms, e.g., [7], are based on decomposition techniques. They solve the energy dispatch problem by decomposing the main problem into multiple sub-problems which can be solved efficiently and in parallel by different local controllers. These decentralised mechanisms, however, have two drawbacks. First, in these decentralised mechanisms, it is not clear who divides the main problem into sub-problems and who partitions the power transmission network into areas, where each area has a local controller. If there is a manager or a control centre which executes the division and partition tasks, these mechanisms become centralised in nature, such as the mechanism proposed in [8]. The second drawback is that because each area is controlled by a local controller, each sub-problem for an area is still solved in a centralised manner by the local controller, which implies that the size of each area cannot be too large, as, otherwise, these large areas will suffer the common drawbacks of centralised mechanisms: the single point of failure and the high computational complexity of local controllers to solve local optimisation sub-problems. Therefore, if the size of the power transmission network is large, there will be a large number of areas. Because the local controller of each area can exchange information only with its neighbouring local controllers, the overhead of communication and coordination among these local controllers to achieve a globally optimal solution is very heavy in methods like the one proposed in [7]. In other words, the scalability of these decentralised mechanisms may not be very good.

In this paper, a multi-agent coalition formation mechanism is proposed to address the energy dispatch problem. Unlike centralised mechanisms, the proposed mechanism is decentralised and has no central controller, so the single point of failure, inherently associated with centralised mechanisms, can be avoided. Unlike some decentralised mechanisms which need global information before a solution can be obtained, the proposed mechanism does not require global information but only needs local information. Unlike other decentralised mechanisms which need to partition the network into several areas and assign a local controller to each area to solve a sub-problem, the proposed mechanism needs neither such a partition process nor any local controllers for management and supervision. In the proposed mechanism, each component, e.g., a load, a generator and a distribution line, in the network makes decisions independently based only on its local information without management and supervision of local or central controllers. Each component is modelled as an agent and the network is modelled as a multi-agent system which consists of several components. Here, an intelligent agent is an entity which is able to make rational decisions autonomously in a dynamic environment, namely blending pro-activeness and reactiveness, showing rational commitments to decision making and exhibiting flexibility when facing an uncertain and changing environment [9]. A multi-agent system is composed of several intelligent agents and individual agents may perform different roles. The agents in a multi-agent system can work autonomously, make decisions independently and interact with each other to achieve global goals. Multi-agent systems, as a new paradigm which can facilitate distributed control [10], have been adopted in power systems for various purposes, such as voltage support [11], power restoration [12] and system management [13].

The rest of the paper is organised as follows. Section 2 reviews current related studies. Section 3 introduces our multi-agent coalition formation-based energy dispatch mechanism. Section 4 presents the properties of the proposed mechanism. Section 5 investigates the performance of the proposed mechanism in comparison with other energy dispatch mechanisms via simulation. Finally, Section 6 concludes the paper and outlines future work.

2. Related Work

Many energy dispatch mechanisms have been presented over the past decades. As described above, these mechanisms can be roughly classified into two categories: centralised mechanisms and decentralised mechanisms. Most of the current centralised energy dispatch mechanisms are based on computational intelligence techniques, such as genetic algorithms, particle swarm optimisation and differential evolution [14, 15, 16]. These mechanisms can obtain optimal results of energy dispatch but the calculation process is centralised. Such centralisation has a potential single point of failure.

Zhao et al. [17] developed a multi-agent based particle swarm optimisation mechanism for optimal energy dispatch. Their mechanism integrates a multi-agent system and a particle swarm optimisation algorithm, where each agent represents a particle in the particle swarm optimisation algorithm. In their mechanism, the environment is organised as a lattice-like structure and each agent is fixed on a lattice point. Such a lattice-like structure is distributed and this means that each agent can compete and cooperate only with its neighbours. However, in order to accelerate the diffusion of information and then obtain the optimal solution, each agent has to use not only its own experience but also the experience of the ‘best agent’ among all the agents in the environment. Although the best agent is not a central controller, it is a special entity in the environment. If the best agent is out of order or is difficult to find, their mechanism may not work properly.

Abido [18] investigated and evaluated the effectiveness of Pareto-based multi-objective evolutionary algorithms for solving the power dispatch problem. He first designed a procedure for quality measurement of different techniques. Then, he developed a feasibility check procedure to restrict the search of multi-objective evolutionary algorithms to a feasible region of the problem space and used a hierarchical clustering algorithm to provide the power system operator with a representative and manageable Pareto-optimal set. Finally, he presented a fuzzy set theory-based approach to extract one of the Pareto-optimal solutions as the best compromise one. His methods, however, are centralised in nature, as these methods are based on the assumption that all the necessary information is already known.

Dai et al. [19] proposed a seeker optimisation algorithm-based method for power dispatch, where the search direction is based on the empirical gradient by evaluating the response to the position changes and the step length is based on uncertainty reasoning using a simple fuzzy rule. The search process of their method is decentralised. However, their method operates on a set of solutions and these solutions are obtained in a centralised manner. Shaw et al. [20, 6] proposed improved seeker optimisation algorithm-based methods for power dispatch but the methods still have the same drawback as Dai et al.’s method.

Dominguez-Garcia and Hadjicostis [21, 1] developed a set of distributed algorithms for the control and coordination of loads and distributed energy resources in distribution networks. Their algorithms are relevant for load curtailment control in demand response programs and also for coordination of distributed energy resources for the provision of ancillary services. Their algorithms assume that the total number of resources that need to be collectively provided by distributed generators is known to a leading node which acts as a central controller. In a fully distributed environment, this information is not known in advance.

Khazali and Kalantar [22] presented a harmony search algorithm for the optimal power dispatch problem. Their algorithm is used to find the settings of control variables such as generator voltages, tap positions of tap changing transformers and the number of reactive compensation devices to optimise a certain objective. Objectives include power transmission loss, voltage stability and voltage profile, which are optimised separately. The harmony search algorithm outperforms other basic heuristic algorithms, like the genetic algorithm, because in the harmony search algorithm, in order to generate the elements of the new vector solution, all of the existing vector solutions are considered while in basic heuristic algorithms, only some of the existing vector solutions are taken into consideration. In the harmony search algorithm, however, the control variables are selected randomly and centrally by the system operator.

Because centralised mechanisms have a potential single point of failure, decentralised mechanisms have also been developed. Some of these decentralised mechanisms, however, have to use global information to operate.

Vytelingum et al. [23] modelled the optimal dispatch problem as the trading of electricity between nodes in a network. In their approach, each node needs to know the topology of the entire network. In large and distributed networks, however, it is difficult for each node to maintain such system-wide knowledge and in dynamic networks, where new nodes may enter and existing nodes may leave the network at any time, maintaining such system-wide knowledge becomes infeasible. In addition, their approach was tested only in a 16-node network, so it is unclear whether their approach can be applied to larger networks.

Kranning et al. [24] developed a decentralised method for solving the optimal power scheduling problem which is to minimise the total network objective subject to the device and line constraints, over a given time horizon. Specifically, their method is iterative and distributes computation across every device in the network. At each iteration, every device passes simple messages to its network neighbours and then solves an optimisation problem that minimises the sum of its own objective function and a simple regularisation term that only depends on the messages it has received from its network neighbours in the previous iteration. Their method is time efficient and can converge to an optimal solution if the device objective functions are convex. However, in their method, the stopping criterion and the algorithm to update the scaling parameter require global device coordination.

Dominguez-Garcia et al. [25] later proposed a low-complexity iterative algorithm for optimal dispatch of distributed energy resources. Their algorithm uses locally obtained information through exchange of information between neighbouring distributed generators. Their algorithm, however, requires an iterative broadcasting process to make local information globally known, where a generator broadcasts the information about the number of resources provided by it to its neighbours and its neighbours in turn broadcast such information to their neighbours. Continuing this process, each generator in the network will finally obtain the information. This iterative broadcasting process allows each generator to know the number of resources provided by the other generators.

Miller et al. [5] modelled the optimal energy dispatch problem as a decentralised multi-agent coordination problem and formalised it as a distributed constraint optimisation problem. They then modelled the power system as a factor graph and used a dynamic programming algorithm to solve the problem. Their algorithm consists of two steps. First, a node waits until it has received power cost messages from all of its child nodes before computing its own power cost message which it sends to its parent node. Then, when the root node receives power cost messages from all of its child nodes, it starts to calculate its optimum power output. Obviously, the root node needs to have the global information before it can yield an optimal result.

Other decentralised mechanisms solve the energy dispatch problem by decomposing the main problem into multiple sub-problems which can be solved efficiently and in parallel by different local controllers. However, the decomposition of the main problem is either not clearly described or based on a centralised method and each sub-problem is still solved in a centralised manner by a local controller.

To solve the optimal power flow problem, Baldick et al. [8] divided the overall power system and the overall optimal power flow problem into geographical regions. Then, a separate processor is assigned to each region to perform computations for each region in parallel. Their approach itself is decentralised, so it is very efficient and robust. However, the division of the overall power system is based on a centralised method, i.e., the auxiliary problem principle [26, 27], and computations for each region are still performed in a centralised manner by a separate processor.

Hug-Glanzmann and Andersson [28] extended the multi-area control method, which is based on approximate Newton directions [29], to allow for the application and evaluation of the method to arbitrarily defined areas. Specifically, the approach proposed in [29] can only work in non-overlapping areas. After Hug-Glanzmann and Andersson’s extension, the new approach can work not only in non-overlapping areas but also in overlapping and limited areas. However, as Hug-Glanzmann and Andersson’s approach is based on Nogales et al.’s approach [29], even though the central agent does not need to update any information, a central agent is necessary to collect and distribute information for the whole system.



Dall’Anese et al. [7] used semidefinite programming relaxation techniques to solve the distributed optimal power flow problem for microgrids operating in an unbalanced setup. They decomposed the main problem into multiple sub-problems and partitioned a microgrid into several areas. Each area has a local controller. The sub-problems, then, can be solved efficiently and in parallel by local controllers. Each local controller solves an optimisation sub-problem and then exchanges simple messages with its neighbouring local controllers. Their method can find the globally optimal solution and its worst-case computational complexity is quantifiable. However, in each area of a microgrid, the local controller still solves an optimisation sub-problem in a centralised manner, since the local controller has all the area information. Thus, when the size of a microgrid is large, the number of areas and local controllers will be large as well¹. Then, the overhead of communication and coordination among local controllers to achieve a globally optimal solution is high, because local controllers can communicate only with their neighbours and thus, communication between two local controllers, which are not neighbours, has to be relayed by one or more controllers, which increases the overhead of those relaying controllers.

¹ This is because if the number of areas is small, then there must be some large areas. Therefore, these large areas will suffer the common drawbacks of centralised methods: the single point of failure and the high computational complexity of local controllers to solve local optimisation sub-problems.

In addition to the above centralised and decentralised mechanisms, there are some other studies which are somewhat related to our study but focus on different aims. For example, Mohsenian-Rad et al. [30] presented a demand-side energy management system which allows users to autonomously schedule the daily usage of their household appliances and loads based on the pricing tariffs of the energy source. The study of demand-side management focuses on the design of users’ scheduling strategies on the basis of the pricing tariffs of the energy source, whereas this paper studies how users and energy sources adjust their offers to achieve agreements for the sale of energy. In other words, the study of demand-side management focuses only on users’ strategies, while this paper focuses on the strategies of both sides: users and energy sources. Nedic and Ozdaglar [31] proposed and analysed dual sub-gradient methods using averaging to generate approximate primal optimal solutions. Sub-gradient methods can be used to develop decentralised cross-layer resource allocation mechanisms [32, 33], which means that potentially, they can also be used in developing energy dispatch mechanisms. However, in existing networking applications, global information is still required. For example, Low and Lapsley [32] proposed an optimisation approach to flow control where the objective is to maximise the aggregate source utility over their transmission rates. Low and Lapsley considered the dual problem, the structure of which suggests treating the network links and the sources as processors of a distributed computation system to solve the dual problem using a gradient projection method. Each processor executes a local algorithm but has to report its computation result to other processors, which implies that their approach requires global information.

As can be seen from the above description, both existing centralised and decentralised mechanisms have drawbacks. In this paper, a decentralised multi-agent coalition formation mechanism for energy dispatch is devised, which can avoid the drawbacks of existing centralised and decentralised mechanisms.

3. The Multi-Agent Coalition Formation Mechanism

3.1. Model Description

An electricity distribution network, which is composed of generators, loads and distribution lines, is modelled as a distributed multi-agent system. The energy dispatch problem is then studied as a multi-agent resource allocation problem, where generators, loads and distribution lines in a distribution network are represented by agents in a distributed multi-agent system. In many applications of multi-agent systems [34], groups of agents need to dynamically join together in a coalition to complete a complex task, which none of them can complete independently. A coalition is defined as a group of agents that have decided to cooperate in order to perform a common task [35]. An agent may be a member of more than one coalition. As stated in [36], the use of a coalition formation mechanism can increase the efficiency of group task execution and can lead to outstanding task performance, which is demonstrated in Section 5. In a distribution network, a load needs energy and the energy may be supplied by more than one generator. Thus, the load agent has to form a coalition with the generator agents which provide energy to the load.

Figure 1 demonstrates a sample electricity distribution network. Each component in the network can be considered as an agent. It should be noted that the proposed mechanism is not designed on the basis of a specific network topology. Instead, the proposed mechanism is topology independent, so it has the potential to be suitable in many network topologies.

Figure 1. A sample electricity distribution network

To be more specific, we consider that there is an electricity distribution network which consists of n generators $G = \{G_1, \ldots, G_n\}$, m loads $L = \{L_1, \ldots, L_m\}$ and k distribution lines $C = \{C_1, \ldots, C_k\}$. Each generator $G_i$ has a maximum power output $O_i \in \mathbb{R}^+$ and a unit price for its power $p_i \in \mathbb{R}^+$. Each load $L_i$ has a set of discrete energy requirements $ER(L_i) = \{er^i_{t_1}, \ldots, er^i_{t_n}\}$, where $er^i_{t_x} \in \mathbb{R}^+$ means how much energy is required and the subscript of $er^i_{t_x}$, i.e., $t_x$, indicates how long the energy is required. Each $t_x$ has a start time point, $\mathit{Start}(t_x) \in \mathbb{Z}^+$, and an end time point, $\mathit{End}(t_x) \in \mathbb{Z}^+$, where $\mathit{Start}(t_x) < \mathit{End}(t_x)$ and $t_x = \mathit{End}(t_x) - \mathit{Start}(t_x)$. In this paper, when an energy requirement of a load is created, the value of the energy requirement is fixed. Each energy requirement $er^i_{t_x}$ also has a degree of importance $deg(er^i_{t_x})$ which indicates how important an energy requirement is for the load, where $0 < deg(er^i_{t_x}) < 1$. Each distribution line $C_i$ has a maximum distribution capacity $DC_i \in \mathbb{R}^+$, where the power which goes through the line cannot exceed the maximum capacity of the line. In the multi-agent system, which is used to model the distribution network, each node (a generator, a load or a line) is represented as an agent. Two agents are neighbours if and only if they are connected by a distribution line. The neighbour set of agent i is represented as $N_i$ and the number of neighbours of agent i is represented as $|N_i|$. In the network, we assume that (i) each agent only has a local view, (ii) no central controller exists and (iii) global information is not available for any agent. Now, the problem is that based on the three assumptions, how can agents achieve efficient energy dispatch?
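The notation above can be mirrored in a small data model. The following is a minimal sketch only: the class names, field names and the helper `deliverable` are illustrative assumptions and do not come from the paper, and units such as kW are assumed for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EnergyRequirement:
    """One requirement er^i_{t_x} of a load (field names are illustrative)."""
    amount: float      # er^i_{t_x}, a positive real (assumed kW here)
    start: int         # Start(t_x)
    end: int           # End(t_x)
    importance: float  # deg(er^i_{t_x}), strictly between 0 and 1

    def __post_init__(self):
        # Enforce the conditions stated in the model description.
        if self.amount <= 0:
            raise ValueError("amount must be positive")
        if not self.start < self.end:
            raise ValueError("Start(t_x) must be earlier than End(t_x)")
        if not 0 < self.importance < 1:
            raise ValueError("importance must lie strictly between 0 and 1")

    @property
    def duration(self) -> int:
        """t_x = End(t_x) - Start(t_x)."""
        return self.end - self.start


@dataclass
class Generator:
    name: str
    max_output: float  # O_i
    unit_price: float  # p_i


@dataclass
class Load:
    name: str
    requirements: list = field(default_factory=list)  # ER(L_i)


@dataclass
class Line:
    name: str
    capacity: float    # DC_i


def deliverable(generator: Generator, line: Line) -> float:
    """Upper bound on power sent over one line: limited by O_i and by DC_i."""
    return min(generator.max_output, line.capacity)
```

For instance, a generator with a 30 kW maximum output connected through a 15 kW line can deliver at most 15 kW over that line.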

As described above, the energy dispatch problem in power distribution networks is modelled as a distributed resource allocation problem in multi-agent systems, which will be solved via negotiation (bargaining)-based coalition formation among agents. Each agent, either a load (buyer) agent or a generator (seller) agent, is an autonomous entity which has its own decision-variables and constraints. For a buyer, decision-variables include (i) the price that it would like to pay for the required energy, and (ii) when more than one seller can provide the energy, which seller is the best choice to buy the energy from. The constraints of a buyer include (i) the maximum amount of payment that the buyer can spend for the required energy, and (ii) the latest start time from which the energy is required. Similarly, for a seller, decision-variables include (i) the price that the seller asks for its energy, and (ii) when more than one buyer wants to buy the energy, which buyer is the best choice to sell the energy to. The constraints of a seller include (i) the lowest price at which the seller can sell its energy, and (ii) the total amount of energy that it can sell. Since the proposed mechanism is independent of specific network topologies and each agent is an autonomous entity, the decision-variables and constraints are also independent of specific network topologies. In the proposed mechanism, the objective of a buyer is to buy the required energy as cheaply as possible whereas the objective of a seller is to sell the energy as expensively as possible. Therefore, there is not a specific objective function for either a buyer or a seller. A buyer and a seller achieve an agreement for the required energy via bargaining with each other under their constraints. For the negotiation process, certain agent interaction protocols, such as the contract net protocol [37], can be employed. For the message exchange during the negotiation process, current communication technologies, such as internet communication, can be used. The overall flow chart of the proposed mechanism is displayed in Figure 2 and the detailed mechanism design will be illustrated in the following sub-sections.

Figure 2. The overall flow chart of the proposed mechanism
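The buyer and seller constraints listed in this sub-section can be read as a feasibility check on any candidate offer. The sketch below illustrates that reading under stated assumptions: the names `Buyer`, `Seller` and `acceptable` are hypothetical, and the total payment is assumed to be the amount of energy multiplied by the unit price, which the paper does not specify here.

```python
from dataclasses import dataclass


@dataclass
class Buyer:
    max_payment: float   # most the load can spend on this requirement
    latest_start: int    # latest time from which the energy is required


@dataclass
class Seller:
    min_unit_price: float    # lowest unit price the generator accepts
    available_energy: float  # total energy the generator can still sell


def acceptable(buyer: Buyer, seller: Seller,
               amount: float, unit_price: float, start_time: int) -> bool:
    """Return True if an offer violates none of the four constraints.

    Assumption: payment = amount * unit_price.
    """
    return (
        unit_price >= seller.min_unit_price
        and amount <= seller.available_energy
        and amount * unit_price <= buyer.max_payment
        and start_time <= buyer.latest_start
    )
```

Any offer exchanged during bargaining would have to pass such a check on both sides; how the price itself moves between rounds is left to the negotiation protocol.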

3.2. Mechanism Design

The coalition formation mechanism used by a load agent to find appropriate generator agents for energy supply is described in Algorithm 1². Algorithm 1 takes the underlying network structure into account, because it only allows an agent to request energy from its neighbours. If the neighbours cannot supply enough energy to a load, the load agent will select one of its neighbours as a mediator and will request the mediator’s neighbours for energy supply. This process continues until the start time of the energy supply is due. Thus, the network structure influences the number of agents that can potentially participate in a negotiation, because an agent with more neighbours can have more negotiation partners.

² The idea of the proposed coalition formation mechanism is based on our previous work [38] which has been significantly revised in this paper to suit the energy dispatch problem.

Algorithm 1: Coalition Formation Mechanism

 1  /* We take load agent i for example. */
 2  for each energy requirement $er^i_{t_x}$ in load agent i’s energy requirement set $ER(L_i)$ do
 3      Initialise($N_i$);
 4      while time < $\mathit{Start}(t_x)$ do
 5          /* time is the real time. */
 6          for each agent j in $N_i$ do
 7              if agent j is a generator agent and the generator’s available energy can meet full or part of energy requirement $er^i_{t_x}$ then
 8                  Negotiate(i, j);
 9          i selects agent k from i’s neighbours to forward energy requirement $er^i_{t_x}$ to k’s neighbours;
10          $N_i \leftarrow N_i \cup N_k$;
11      i selects the ‘best’ temporary agreements from existing temporary agreements and the agents involved in the best temporary agreements form a coalition, $C_{er^i_{t_x}}$, with i;

In Algorithm 1, for each energy requirement $er^i_{t_x}$, load agent i first initialises its neighbour set to find the directly connected agents (Line 3). The neighbour set of each load or generator agent can be built through communication. When a load or generator agent wants to build a neighbour set, it sends a request message to each line agent which connects to it. It is assumed that each load or generator agent knows which lines are connected to it. When a line agent receives a request message, it sends the request message to the other agent connected to it. Then, the two agents that are connected by the line become neighbours of each other. For example, a line, k, connects two loads, i and j. The two loads do not know the existence of each other but both of them know that line k is connected to them. When load agent i wants to build a neighbour set, i sends a request message to line agent k and then, k relays the message to load agent j. Once j receives the message, j sends back a response message to line agent k. k then forwards the response message to i. Now, i and j become neighbours of each other. In this paper, the role of a line agent is to relay messages among load and generator agents.
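The request and response relay described above can be sketched as follows. This is an illustrative simulation rather than the authors' implementation: the function name `discover_neighbours`, the dictionary layout of `lines` and the message tuples are all assumptions made for the example.

```python
def discover_neighbours(agent, lines):
    """Build the neighbour set of `agent` by relaying messages through line agents.

    `lines` maps each line agent to the pair of load/generator agents it
    connects. The agent is assumed to know only the lines attached to itself.
    Returns the neighbour set and the ordered list of messages exchanged,
    each as a (sender, receiver, kind) tuple.
    """
    neighbours = set()
    messages = []
    for line, (a, b) in lines.items():
        if agent not in (a, b):
            continue  # this line is not attached to the agent
        other = b if a == agent else a
        messages.append((agent, line, "request"))    # i sends a request to line k
        messages.append((line, other, "request"))    # k relays the request to j
        messages.append((other, line, "response"))   # j responds to k
        messages.append((line, agent, "response"))   # k forwards the response to i
        neighbours.add(other)
    return neighbours, messages


# Hypothetical three-line network: i-j via line k, i-g via line m, j-h via line n.
example_lines = {"k": ("i", "j"), "m": ("i", "g"), "n": ("j", "h")}
example_neighbours, example_messages = discover_neighbours("i", example_lines)
```

In this example agent i learns about j and g with four messages per attached line, and never sees line n or agent h, which matches the local-view assumption.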

Then, before the start time of $er^i_{t_x}$, agent i inquires of its neighbouring generator agents whether the generators have available energy. If so, i will negotiate with the generator agents (Lines 6-8)³. As loads need to buy energy from generators, a negotiation protocol is required for load agents and generator agents to achieve agreements for the sale of energy. The details of negotiation will be described in the next sub-section. After agent i inquires and negotiates with its neighbours, i selects a neighbouring agent, k, to forward energy requirement $er^i_{t_x}$ to k’s neighbours (Lines 9 and 10). This selection is based on the number of neighbours that each neighbour of agent i has. The more neighbours an agent has, the higher the probability that the agent will be selected. A sample selection method is that the probability of selecting agent k is calculated as $|N_k| / \sum_{y \in N_i} |N_y|$. When the start time of $er^i_{t_x}$ approaches, i stops the coalition formation process and the agents involved in the ‘best’ temporary agreements form a coalition, $C_{er^i_{t_x}}$, with i (Line 11). How agent i selects the ‘best’ temporary agreements will be described in Section 3.6.

A situation can arise where two load agents send requests to the same generator agent with different energy demands at the same time, but the generator agent cannot meet both of their demands. In this case, how should the generator agent respond? If the generator agent can meet one of the two load agents’ demands but cannot meet the sum of their demands, the generator agent still negotiates with both of the load agents to achieve two agreements with them if possible, where each agreement is made with one of the two load agents. This can increase the possibility that the generator agent can finally sell its energy successfully, because either of the two load agents might cancel the agreement. A load agent may cancel an agreement with a generator agent when the load agent finds another generator agent which can provide the same amount of energy at a lower price. In the case that neither of the two load agents cancels the agreement, the generator agent will select one load agent to sell the energy to and will cancel the agreement with the other load agent. This selection is based on the reward that the generator agent can have, i.e., the difference between the payment and the penalty. The payment is obtained from the buyer load agent, whereas the penalty is paid to the other load agent with which the generator agent cancels the agreement. The generator agent will make a selection which maximises its reward.

³ It should be noted that a generator, which has available energy, may not have enough energy for i’s energy requirement $er^i_{t_x}$. For example, energy requirement $er^i_{t_x}$ needs 20 kW energy but the generator has only 10 kW energy available. In this case, agent i negotiates with this generator for part of energy requirement $er^i_{t_x}$. It should also be noted that the actual energy provided by a generator to load i is restricted by the maximum capacity of the line connecting them. For example, energy requirement $er^i_{t_x}$ needs 20 kW energy and the generator has 30 kW available energy. However, the maximum capacity of the line connecting them is only 15 kW. Therefore, load agent i has to find other generators to provide the rest of energy requirement $er^i_{t_x}$.
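The sample neighbour-selection rule described earlier in this sub-section, where the probability of forwarding to neighbour k is proportional to the size of k’s own neighbour set, can be illustrated with a minimal Python sketch. The function names, the dictionary representation and the agent identifiers are assumptions made for illustration only and are not part of the proposed mechanism.

```python
import random


def forwarding_probabilities(neighbour_counts):
    """Return |N_k| / sum_y |N_y| for every neighbour k of agent i.

    neighbour_counts maps each neighbour's identifier to the size of
    that neighbour's own neighbour set (a hypothetical representation).
    """
    total = sum(neighbour_counts.values())
    if total == 0:
        raise ValueError("no neighbour has any neighbours to forward to")
    return {k: n / total for k, n in neighbour_counts.items()}


def select_forwarder(neighbour_counts, rng=random):
    """Draw one neighbour at random, weighted by its neighbour count."""
    probs = forwarding_probabilities(neighbour_counts)
    agents = list(probs)
    return rng.choices(agents, weights=[probs[a] for a in agents], k=1)[0]


# Hypothetical example: agent i has three neighbours that themselves
# have 2, 3 and 5 neighbours, respectively.
counts = {"L2": 2, "L4": 3, "G1": 5}
probs = forwarding_probabilities(counts)
chosen = select_forwarder(counts, random.Random(0))
```

In this example the best-connected neighbour is selected half of the time, so a requirement tends to spread towards well-connected parts of the network while less-connected neighbours still retain some chance of being chosen.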

3.3. The Negotiation Protocol

In Line 8 of Algorithm 1, if a generator has available energy, the load agent will negotiate with the generator agent about the energy price. In this sub-section, we propose a negotiation protocol for load agents and generator agents to achieve agreements for energy sale. The proposed negotiation protocol extends the alternating offers protocol [39] by allowing an agent to make multiple agreements with other agents and to cancel temporary agreements by paying a penalty to the trading partners. The alternating offers protocol is a well known negotiation protocol and is very useful in the application of bilateral bargaining [40, 41]. In this paper, the negotiation between a buyer (load) agent and a seller (generator) agent is bilateral bargaining. Thus, we employ the alternating offers protocol as the basis of our protocol. The proposed negotiation protocol is formally described in Algorithm 2.

In Algorithm 2, in a negotiation thread, a pair of buyer and seller agents bargain by making offers to each other. In each round, one agent makes an offer first (Line 3), then the other agent has three choices in the bargaining stage: (i) accept the offer; (ii) reject the offer; or (iii) make a counter-offer (Lines 4, 8 and 10, respectively). It is assumed that sellers always provide offers first to buyers during negotiation and many buyers and sellers can bargain simultaneously where the bargain of each buyer-seller pair is in a negotiation thread. If the buyer accepts the seller’s offer, the negotiation terminates with a temporary agreement (Lines 5-7), where $\mathcal{A}^T(i)$ is the set of temporary agreements made by buyer i for energy requirement $er^i_{t_x}$ with other sellers and $\mathcal{A}^T(j)$ is the set of temporary agreements made by seller j for j’s energy with other buyers. Here, the union sign in Algorithm 2 (Lines 5, 6, 13, 14) means adding a new offer, i.e., a new temporary agreement, to a temporary agreement set, $\mathcal{A}^T$. A temporary agreement set contains several temporary agreements made by a pair of buyer and seller agents. When a buyer (or a seller) proposes an offer and the offer is accepted by the other party, this offer will become a temporary agreement and will be added to both the buyer’s and the seller’s temporary agreement sets. If the buyer rejects the seller’s offer, the negotiation terminates with no agreement (Line 9). If the buyer makes a counter-offer, the bargaining proceeds to another round and likewise, the seller can accept the offer, reject the offer or make a counter-offer (Lines 11-19). After a temporary agreement is made, either a seller or a buyer has the opportunity to cancel the temporary agreement and the agent which cancels the agreement has to pay a penalty to the other party.

Algorithm 2: Negotiate(i, j)
 1  \* Suppose that agent i is a buyer and agent j is a seller and they are bargaining the price for energy $er^i_{t_x}$. *\
 2  while $time < Start(t_x)$ do
 3      j creates an offer, o, to i;
 4      if i accepts o then
 5          $\mathcal{A}^T(i) \leftarrow \mathcal{A}^T(i) \cup \{o\}$;
 6          $\mathcal{A}^T(j) \leftarrow \mathcal{A}^T(j) \cup \{o\}$;
 7          break;
 8      else if i rejects o then
 9          break;
10      else
11          i creates a counter-offer, o′, to j;
12          if j accepts o′ then
13              $\mathcal{A}^T(i) \leftarrow \mathcal{A}^T(i) \cup \{o'\}$;
14              $\mathcal{A}^T(j) \leftarrow \mathcal{A}^T(j) \cup \{o'\}$;
15              break;
16          else if j rejects o′ then
17              break;
18          else
19              continue;
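The control flow of Algorithm 2 can be made concrete with a small executable sketch. This is only an illustration under stated assumptions: the `ScriptedAgent` class, its `make_offer` and `respond` methods, the dictionary used for an offer and the `clock` callback are all hypothetical stand-ins, and the decision logic of Sections 3.4 and 3.5 is replaced by fixed scripts.

```python
from dataclasses import dataclass, field
from itertools import count

ACCEPT, REJECT, COUNTER = "accept", "reject", "counter"


@dataclass
class ScriptedAgent:
    """Toy agent that proposes fixed prices and replies from a fixed script."""
    name: str
    prices: list     # payments it will propose, in order
    replies: list    # its responses to incoming offers, in order
    agreements: list = field(default_factory=list)  # temporary agreement set

    def make_offer(self):
        return {"from": self.name, "payment": self.prices.pop(0)}

    def respond(self, offer):
        return self.replies.pop(0)


def negotiate(buyer, seller, start_time, clock):
    """Run one negotiation thread; return the agreed offer or None."""
    while clock() < start_time:                  # Line 2
        offer = seller.make_offer()              # Line 3: seller moves first
        reply = buyer.respond(offer)
        if reply == ACCEPT:                      # Lines 4-7
            buyer.agreements.append(offer)
            seller.agreements.append(offer)
            return offer
        if reply == REJECT:                      # Lines 8-9
            return None
        counter_offer = buyer.make_offer()       # Line 11
        reply = seller.respond(counter_offer)
        if reply == ACCEPT:                      # Lines 12-15
            buyer.agreements.append(counter_offer)
            seller.agreements.append(counter_offer)
            return counter_offer
        if reply == REJECT:                      # Lines 16-17
            return None
        # Lines 18-19: the seller counters, so another round starts
    return None                                  # deadline reached, no agreement


ticks = count()  # simple logical clock: 0, 1, 2, ...
buyer = ScriptedAgent("L1", prices=[70], replies=[COUNTER, ACCEPT])
seller = ScriptedAgent("G1", prices=[100, 90], replies=[COUNTER])
deal = negotiate(buyer, seller, start_time=10, clock=lambda: next(ticks))
```

In this scripted run the buyer counters the first offer, the seller answers with a new offer in the second round, and the buyer accepts it, so the same offer is recorded in both agents’ temporary agreement sets.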

This paper focuses on modelling the bilateral interaction between two agents, because an agent does not reveal to its negotiation partner the number of other agents with which the agent is negotiating. In this negotiation protocol, an agent can simultaneously negotiate with several agents and the negotiation result with one agent is affected by the negotiation results with other agents. However, an agent will not reveal such information to any other agents and each interaction, i.e., each negotiation thread, remains private between only two agents, so the bilateral interaction model is more suitable for our problem than the multi-lateral interaction model. The multi-lateral interaction model is used only in situations where two agents cannot achieve an agreement without the joining of other agents [42]. Here is an example. A seller has goods A and B, but it does not sell the two goods separately. Now, a buyer only wants to buy goods A. Thus, another buyer, which wants to buy goods B, needs to join the negotiation to form a multi-lateral negotiation to achieve an agreement among the three agents.

In Algorithm 1 and Algorithm 2, there are three issues which have to be addressed.

1. How does an agent create an offer (Lines 3 and 11 of Algorithm 2)?

2. When an agent receives an offer, how does it choose from among the three choices: (i) accept the offer; (ii) reject the offer; or (iii) make a counter-offer (Lines 4, 8 and 10 of Algorithm 2)?

3. For a single energy requirement, several temporary agreements are made. Then, how does an agent select the best temporary agreements (Line 11 of Algorithm 1)?

The solutions to the three issues will be given in the next three sub-sections.

3.4. How does an agent create an offer?

An offer, o, from seller j to buyer i consists of four elements: energy, payment, penalty and refundRate. energy means the amount of energy that seller j can provide to buyer i. payment is the payment that buyer i must pay to seller j. penalty is the penalty that the agent, which cancels the temporary agreement, must pay to the other party. refundRate is the refund rate. During the power supply period, i.e., from $Start(t_x)$ to $End(t_x)$, if for some reason, seller j cannot provide the agreed amount of energy to buyer i, seller j must refund some payment to buyer i. refundRate means how much a seller should refund to a buyer for reducing a unit of energy supply (see Equation 3 for detail). The calculation of the four elements is described as follows.

payment can be calculated using Equation 1.⁴

$$\mathit{payment} = p_j \cdot \mathit{energy} \cdot t_x + \mathit{expB}_j \cdot |\mathcal{A}^T(j)|, \tag{1}$$

where $p_j$ is the unit price of the energy generated by generator j, energy means the amount of energy that seller j can provide to buyer i, $t_x$ is the length of the time period during which seller j provides the energy to buyer i, $\mathit{expB}_j$ is seller j’s standard expected benefit and $|\mathcal{A}^T(j)|$ is the number of temporary agreements that seller j has made with other buyers. Here, energy, $t_x$ and $\mathit{expB}_j$ are constant during the negotiation process. In Equation 1, the only changing variable is $|\mathcal{A}^T(j)|$, the number of temporary agreements that seller j has made with other buyers for the energy. The rationale behind Equation 1 is that the price that seller j will ask from buyer i is based on how many temporary agreements have been made by seller j with other buyers for the energy. The more temporary agreements seller j has made, the higher the price it will ask from buyer i.

penalty is calculated using Equation 2.

$$\mathit{penalty} = \alpha \cdot \mathit{payment}, \tag{2}$$

where $0 < \alpha < 1$ is a coefficient used to decide a penalty from a payment.

⁴ Typically, the cost function associated with a generator is either quadratic or piece-wise linear. As this paper mainly focuses on the interaction process between energy buyers and energy sellers, the cost function for the generators is simplified as a linear function. We leave the study on quadratic and piece-wise linear cost functions in our mechanism as one of our future studies.

The refund rate is calculated using Equation 3.

$$\mathit{refundRate} = \beta \cdot \frac{\mathit{payment}}{\mathit{energy}}, \tag{3}$$

where $1 < \beta$ is a coefficient. The actual refund that seller j pays to buyer i is based on the reduced amount of energy which seller j provides to buyer i. For example, suppose payment = 100, energy = 20 kW and β = 2, so the refund rate is $2 \cdot \frac{100}{20} = 10$. Now, if for some reason, seller j has to reduce the power supply to buyer i by 5 kW, then seller j has to refund buyer i the amount 50, where $50 = 2 \cdot \frac{100}{20} \cdot 5$.

If buyer i is not happy with seller j’s offer, buyer i can create a counter-offer o′. Likewise, a counter-offer o′ consists of four elements: energy, payment′, penalty′ and refundRate′. The meanings of the four elements in o′ are the same as the ones in o and the calculation of the four elements in o′ is almost the same as that in o. However, the calculation of payment′ is slightly different from that of payment. payment′ can be calculated using Equation 4.

$$\mathit{payment}' = \mathit{resP}_i - \mathit{expS}_i \cdot \left( |\mathcal{A}^T(i)| + \frac{Start(t_x) - \mathit{time}}{Start(t_x)} \right), \tag{4}$$

where $\mathit{resP}_i$ is the reserved payment of buyer i, namely the maximum payment that buyer i can spend, $\mathit{expS}_i$ is buyer i’s standard expected saving, $|\mathcal{A}^T(i)|$ is the number of temporary agreements that buyer i has made with other sellers, time means the current time and $Start(t_x)$ means the latest start time, i.e., deadline, when buyer i’s required energy should be supplied. In Equation 4, $\mathit{resP}_i$, $\mathit{expS}_i$ and $Start(t_x)$ are constant during the negotiation process. The changing variables are $|\mathcal{A}^T(i)|$ and time. The rationale behind Equation 4 is that the price that buyer i offers seller j depends on the current time and how many temporary agreements have been made by buyer i with other sellers for the energy. The more temporary agreements buyer i has made, the lower the price it will offer seller j, and the closer the time is to buyer i’s deadline, $Start(t_x)$, the higher the price it will offer seller j.
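The offer and counter-offer calculations in Equations 1 to 4 can be checked numerically with a short sketch. The `Offer` class, the function names and all parameter values below are assumptions chosen for illustration. In particular, the sketch assumes that penalty′ and refundRate′ of a counter-offer are derived from payment′ in the same way as in Equations 2 and 3, which the text only describes as ‘almost the same’.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    energy: float       # amount of energy offered, in kW
    payment: float      # what the buyer pays the seller
    penalty: float      # paid by whichever party cancels the agreement
    refund_rate: float  # refund per unit of energy not delivered


def seller_offer(unit_price, energy, duration, exp_benefit,
                 n_agreements, alpha, beta):
    """Seller j's offer: Equation 1 (payment), 2 (penalty), 3 (refund rate)."""
    payment = unit_price * energy * duration + exp_benefit * n_agreements
    return Offer(energy, payment, alpha * payment, beta * payment / energy)


def buyer_counter_offer(energy, reserved_payment, exp_saving,
                        n_agreements, now, start, alpha, beta):
    """Buyer i's counter-offer: Equation 4 for payment'.

    Assumption: penalty' and refundRate' reuse Equations 2 and 3.
    """
    time_share = (start - now) / start
    payment = reserved_payment - exp_saving * (n_agreements + time_share)
    return Offer(energy, payment, alpha * payment, beta * payment / energy)


# Reproduces the refund example in the text: payment = 100, energy = 20 kW,
# beta = 2 gives a refund rate of 10, so a 5 kW shortfall costs the seller 50.
offer = seller_offer(unit_price=0.5, energy=20, duration=10, exp_benefit=0,
                     n_agreements=0, alpha=0.1, beta=2)
refund_for_5kw = offer.refund_rate * 5

# Hypothetical buyer: the counter-offer rises as the deadline approaches
# and falls when the buyer already holds temporary agreements.
early = buyer_counter_offer(20, 150, 20, n_agreements=0, now=5, start=20,
                            alpha=0.1, beta=2)
late = buyer_counter_offer(20, 150, 20, n_agreements=0, now=15, start=20,
                           alpha=0.1, beta=2)
hedged = buyer_counter_offer(20, 150, 20, n_agreements=1, now=5, start=20,
                             alpha=0.1, beta=2)
```

The three counter-offers show the two effects described after Equation 4: `late` asks the buyer to pay more than `early` because less time remains, while `hedged` offers less than `early` because the buyer already has an alternative agreement.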

3.5. How does an agent respond to an offer?

As described above, there are three choices faced by an agent when it receives an offer: (i) accept the offer; (ii) reject the offer; or (iii) make a counter-offer. We develop a Q-learning algorithm to handle this issue (Algorithm 3). Q-learning is one of the simplest algorithms for implementing reinforcement learning. Reinforcement learning is a kind of learning method which can be employed by an agent to learn its optimal action through trial-and-error interactions with a dynamic environment [43]. The benefit of reinforcement learning is that an agent does not need a teacher to learn how to solve a problem. The only signal used by the agent to learn from its actions in dynamic environments is reward, a number which tells the agent whether its last action was good or not [44].

Specifically, the three choices, ‘accept the offer’, ‘reject the offer’ and ‘make a counter-offer’, are represented as three actions $a_1$, $a_2$ and $a_3$, respectively. An unspecified action is represented as $a_x$. For each action, $a_x$, there is a Q-value corresponding to it, which expresses the reward for doing the action. In addition, there is a probability distribution over the three actions $\pi = \langle \pi_1, \pi_2, \pi_3 \rangle$, where each $\pi_x$ means the probability of selecting action $a_x$ and $\pi_1 + \pi_2 + \pi_3 = 1$.

Algorithm 3: An agent makes choices regarding how to respond to offers
 1  \* Suppose that buyer i and seller j are in a negotiation thread and make offers and counter-offers alternately. This algorithm is from buyer i’s perspective and demonstrates how buyer i makes choices to respond to offers created by seller j. The algorithm that is from seller j’s perspective is similar. *\
 2  Let ζ, ε and τ be the learning coefficients;
 3  For each action, $a_x$, initialise its Q-value, $Q_x$, to 0 and the probability of selecting it, $\pi_x$, to $\frac{1}{3}$, where 3 is the number of actions;
 4  repeat
 5      When buyer i receives an offer, it estimates the reward, $r_x$, of selecting each action, $a_x$, in the current situation.
 6      i updates Q-value for each action, $a_x$:
            $Q_x \leftarrow Q_x + \pi_x \cdot \zeta \cdot \left( r_x - \sum_{1 \le y \le 3} \pi_y Q_y \right)$;
 7      i then updates probability of selecting each action, $a_x$:
            $\pi_x = \begin{cases} 1 - \varepsilon, & \text{if } Q_x \text{ is the largest} \\ \varepsilon \cdot \dfrac{e^{Q_x/\tau}}{\sum_{1 \le y \le 3 \,\wedge\, y \ne \arg\max_x Q_x} e^{Q_y/\tau}}, & \text{otherwise} \end{cases}$
 8      buyer i selects an action based on the probability distribution over the three actions, $\pi = \langle \pi_1, \pi_2, \pi_3 \rangle$.
 9  until the negotiation thread is terminated;

In Line 5 of Algorithm 3, when buyer i receives an offer, it estimates the reward, $r_x$, for selecting each action, $a_x$, in the current situation. This estimation is based on the price of seller j’s offer, the number of agreements which buyer i has made, the start time of buyer i’s energy requirement and the degree of importance of buyer i’s energy requirement. The estimated reward, $r_1$, for selecting action $a_1$, ‘accept the offer’, can be calculated using Equation 5.

$$r_1 = R_1 \cdot deg(er^i_{t_x}) \cdot \left( \mathit{resP}_i - \mathit{payment} - \sum_{1 \le k \le |\mathcal{A}^T(i)|} \mathit{penalty}_{i \to k} \right) \tag{5}$$

If buyer i accepts the offer, it pays seller j for the resource and gets reward $\mathit{resP}_i - \mathit{payment}$. Buyer i, however, may need to pay penalties to other parties that are involved in temporary agreements for the same energy requirement, which are negative rewards. $deg(er^i_{t_x})$ is the degree of importance of buyer i’s energy requirement $er^i_{t_x}$, where $0 < deg(er^i_{t_x}) < 1$. The coefficients, $R_1$ (and $R_2$, $R_3$ which will appear later), are used for bringing the values of different items to the same magnitude, because different items have different units and they cannot be directly summed up.

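The estimated reward, $r_2$, for selecting action $a_2$, ‘reject the offer’, can be calculated using Equation 6.

$$r_2 = \begin{cases} R_1 \cdot \dfrac{1}{deg(er^i_{t_x})} \cdot (\mathit{resP}_i + \mathit{penalty}_{i \to k}) - R_2 \cdot \dfrac{deg(er^i_{t_x})}{Start(t_x) - \mathit{time}}, & \text{if } \mathcal{A}^T(a_j) = \emptyset \\ R_1 \cdot \mathit{penalty}_{i \to k}, & \text{otherwise} \end{cases} \tag{6}$$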

If buyer i rejects seller j’s offer and it does not have any temporary agreements (case 1 in Equation 6), buyer i can save the money for buying the energy and save the money for paying the penalty (the first item in case 1). However, buyer i’s energy requirement may not be met and there is time pressure for the energy requirement (the second item in case 1). If buyer i rejects seller j’s offer and it already has some temporary agreements (case 2 in Equation 6), buyer i can save the money for paying the corresponding penalty.

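The estimated reward, $r_3$, for selecting action $a_3$, ‘make a counter-offer’, can be calculated using Equation 7.

$$r_3 = \begin{cases} -R_2 \cdot \dfrac{deg(er^i_{t_x})}{Start(t_x) - \mathit{time}}, & \text{if } \mathcal{A}^T(a_j) = \emptyset \\ 0, & \text{otherwise} \end{cases} \tag{7}$$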

If buyer i makes a counter-offer and it does not have any temporary agreements (case 1 in Equation 7), buyer i’s energy requirement may not be met because seller j may reject the counter-offer and there is time pressure for the energy requirement. If buyer i makes a counter-offer and it already has some temporary agreements (case 2 in Equation 7), buyer i’s reward is 0.
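The three reward estimates in Equations 5 to 7 can be written as small functions, which makes the role of each term easier to see. This is a sketch under assumptions: the function and parameter names are illustrative, the empty-set condition is read, following the accompanying text, as ‘buyer i has no temporary agreements’, and `time_left` stands for $Start(t_x) - time$ and is assumed to be positive.

```python
def reward_accept(r1_coeff, deg, reserved_payment, payment, penalties):
    """Equation 5: saving from accepting, minus penalties for cancelling others."""
    return r1_coeff * deg * (reserved_payment - payment - sum(penalties))


def reward_reject(r1_coeff, r2_coeff, deg, reserved_payment, penalty,
                  time_left, has_agreements):
    """Equation 6: money kept by rejecting, minus time pressure if unprotected."""
    if not has_agreements:
        saved = r1_coeff * (1.0 / deg) * (reserved_payment + penalty)
        pressure = r2_coeff * deg / time_left
        return saved - pressure
    return r1_coeff * penalty


def reward_counter(r2_coeff, deg, time_left, has_agreements):
    """Equation 7: only time pressure matters, and only without a fallback."""
    if not has_agreements:
        return -r2_coeff * deg / time_left
    return 0.0


# Hypothetical situation: R1 = 5 and R2 = 10 as in the simulation setup,
# importance 0.5, reserved payment 150, offered payment 100, and one
# existing agreement whose cancellation would cost a penalty of 10.
r_accept = reward_accept(5, 0.5, 150, 100, [10])
r_reject = reward_reject(5, 10, 0.5, 150, 10, time_left=5, has_agreements=True)
r_counter = reward_counter(10, 0.5, time_left=5, has_agreements=True)
```

With a fallback agreement in place, countering carries no estimated cost, whereas without one the time-pressure term grows as `time_left` shrinks.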

After estimation of rewards for each action, in Line 6 of Algorithm 3, buyer i updates the Q-value for each action based on the estimated rewards. Then, in Line 7, buyer i calculates the probability distribution over the three actions based on the updated Q-values. Here, we use an example to explain the probability distribution calculation. Suppose that after the Q-value update (Line 6), we have $Q_1 = 30$, $Q_2 = 20$ and $Q_3 = 10$. Because the value of $Q_1$ is the largest among the three Q-values, the probability of selecting action $a_1$ is $1 - \varepsilon$. Then, the probabilities of selecting actions $a_2$ and $a_3$ are $\varepsilon \cdot \frac{e^{Q_2/\tau}}{e^{Q_2/\tau} + e^{Q_3/\tau}}$ and $\varepsilon \cdot \frac{e^{Q_3/\tau}}{e^{Q_2/\tau} + e^{Q_3/\tau}}$, respectively. Such probability distribution calculation for action selection can efficiently balance exploitation and exploration [45]. Finally, buyer i selects an action based on the probability distribution over the three actions.
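One iteration of Lines 6 to 8 of Algorithm 3 can be sketched in Python as follows. Several points are assumptions rather than details given in the paper: the Boltzmann exponent is taken as $Q/\tau$ as in the standard rule, all Q-values are updated synchronously from the old values, ties for the largest Q-value are broken by the lowest index, and the largest exponent is subtracted before exponentiation purely for numerical stability (this does not change the resulting probabilities).

```python
import math
import random


def update_q(q, pi, rewards, zeta):
    """Line 6: Q_x <- Q_x + pi_x * zeta * (r_x - sum_y pi_y * Q_y)."""
    baseline = sum(p * v for p, v in zip(pi, q))
    return [qx + px * zeta * (rx - baseline)
            for qx, px, rx in zip(q, pi, rewards)]


def action_probabilities(q, epsilon, tau):
    """Line 7: 1 - epsilon for the best action, Boltzmann share of epsilon otherwise."""
    best = max(range(len(q)), key=q.__getitem__)
    others = [x for x in range(len(q)) if x != best]
    shift = max(q[x] for x in others)  # stabilises exp() for small tau
    weights = {x: math.exp((q[x] - shift) / tau) for x in others}
    total = sum(weights.values())
    pi = [0.0] * len(q)
    pi[best] = 1.0 - epsilon
    for x in others:
        pi[x] = epsilon * weights[x] / total
    return pi


def select_action(pi, rng=random):
    """Line 8: sample an action index according to pi."""
    return rng.choices(range(len(pi)), weights=pi, k=1)[0]


# The example from the text: Q1 = 30, Q2 = 20, Q3 = 10 with epsilon = 0.2.
# tau = 10 is a hypothetical value chosen so that all three probabilities
# are visibly non-zero.
example_pi = action_probabilities([30, 20, 10], epsilon=0.2, tau=10.0)

# One full step from the initial state of Algorithm 3 with made-up rewards.
q = update_q([0.0, 0.0, 0.0], [1 / 3] * 3, rewards=[100, 50, -1], zeta=0.1)
pi = action_probabilities(q, epsilon=0.2, tau=0.04)
action = select_action(pi, random.Random(0))
```

With the small temperature used in the simulation setup (τ = 0.04), almost all of the exploration mass ε goes to the second-best action, whereas a larger τ spreads it more evenly over the non-greedy actions.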

3.6. How does an agent select the best temporary agreements?

To meet a single energy requirement, several temporary agreements have to be made. In these temporary agreements, there are redundant temporary agreements. For example, suppose that a single energy requirement of buyer i is 20 kW. Buyer i has made five temporary agreements with five different sellers: 5 kW with seller $j_1$, 10 kW with seller $j_2$, 10 kW with seller $j_3$, 15 kW with seller $j_4$ and 20 kW with seller $j_5$. Now, buyer i has to select a set of sellers whose total energy supply is larger than or equal to buyer i’s energy requirement, 20 kW, and the total reward for buyer i for selecting the set of sellers should be as large as possible. Here, the reward for buyer i for selecting seller j is based on the total payment that buyer i has to make, the reliability of seller j and the degree of importance of i’s energy requirement. The total payment includes two parts: (i) buyer i pays seller j for the energy supply and (ii) buyer i pays a penalty to other sellers for the agreement cancellation. The reliability of a seller is evaluated by a buyer via previous energy dispatches. Initially, for buyer i, seller j’s reliability $Reli^i_j$ is 0 because i lacks evidence about whether j is a good seller. Then, during the energy consumption period, if j does not leave i’s coalition, $Reli^i_j$ is increased by 1; otherwise, $Reli^i_j$ is decreased by 1. The reward for buyer i for selecting seller j can be computed using Equation 8,

$$r^i_j = R_3 \cdot deg(er^i_{t_x}) \cdot Reli^i_j \cdot \mathit{energy}_j - R_1 \cdot \left( \mathit{payment}_j + \sum \mathit{penalty} \right), \tag{8}$$

where $r^i_j$ is the reward for buyer i for selecting seller j, $deg(er^i_{t_x})$ means the degree of importance of the energy, $\mathit{energy}_j$ is the amount of energy that seller j provides to buyer i, $\mathit{payment}_j$ is the payment that buyer i pays to seller j and $\sum \mathit{penalty}$ is the total penalty that buyer i pays to those sellers for agreement cancellation. Intuitively, buyer i wants to select a seller whose reliability is as high as possible and the total payment for selecting it is as low as possible.

A simple greedy algorithm is employed for solving the selection problem in this paper (Algorithm 4). First, buyer i sorts all the temporary agreements in $\mathcal{A}^T(i)$ by decreasing reward for buyer i for selecting each seller, $r^i_j$ (Line 2). Then, buyer i greedily picks sellers in this decreasing order until the sum of energy provided by selected sellers is larger than or equal to the energy requirement of buyer i (Lines 4-6).

Algorithm 4: An agent selects a set of sellers for energy supply
 1  \* Suppose that buyer i has got a temporary agreement set, $\mathcal{A}^T(i)$. *\
 2  Sort all the temporary agreements in $\mathcal{A}^T(i)$ by decreasing reward, $r^i_j$, of buyer i for selecting each seller;
 3  set j = 0 and coalition $C_{er^i_{t_x}} = \emptyset$;
 4  while $\sum \mathit{energy}_j < er^i_{t_x}$ do
 5      $C_{er^i_{t_x}} \leftarrow C_{er^i_{t_x}} \cup \{j\}$;
 6      j++;
 7  return $C_{er^i_{t_x}}$;
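Algorithm 4 can be sketched in a few lines of Python. The tuple layout `(seller_id, energy, reward)` and the reward values in the example are hypothetical; in the mechanism the rewards would come from Equation 8. The sketch also stops when the agreement list is exhausted, which Algorithm 4 does not state explicitly.

```python
def select_sellers(agreements, required_energy):
    """Greedy seller selection in the spirit of Algorithm 4.

    agreements: list of (seller_id, energy_kw, reward) tuples.
    Returns the chosen seller ids and the total energy they supply.
    """
    # Line 2: sort temporary agreements by decreasing reward.
    ranked = sorted(agreements, key=lambda a: a[2], reverse=True)
    coalition, supplied = [], 0.0
    # Lines 4-6: keep adding sellers until the requirement is covered.
    for seller_id, energy, _reward in ranked:
        if supplied >= required_energy:
            break
        coalition.append(seller_id)
        supplied += energy
    return coalition, supplied


# The five temporary agreements from the example in this sub-section,
# paired with made-up rewards for illustration.
agreements = [
    ("j1", 5, 4.0),
    ("j2", 10, 9.0),
    ("j3", 10, 7.0),
    ("j4", 15, 6.0),
    ("j5", 20, 8.0),
]
coalition, supplied = select_sellers(agreements, required_energy=20)
```

With these rewards the buyer picks $j_2$ and then $j_5$, which supplies 30 kW for a 20 kW requirement. The greedy rule is fast but can over-provision, which is the price paid for avoiding an exact search.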

4. Properties of the Proposed Mechanism

Based on the analysis of the proposed mechanism, the following three properties can be obtained.

Property 1. The negotiation protocol in the proposed mechanism is not monotonic.

During a negotiation process between a buyer and a seller, they take turns making offers to each other. Typically, during a negotiation process, the agents’ concessions are monotonic by insisting on the prices of their previous offers or increasing/decreasing the prices of their offers monotonically until an agreement is reached [41]. However, in the proposed mechanism, dynamic market competition is taken into account, where the price of an offer that an agent makes to a trading partner depends on the negotiation outcomes of this agent with other trading partners. Generally, when an agent has fewer competitors and more trading partners, this agent does not need to make big concessions, but when the agent has more competitors and fewer trading partners, it has to make large concessions to secure at least one final agreement. In a dynamic environment, the market competition changes dynamically. Thus, the negotiation protocol in the proposed mechanism is not monotonic. For example, according to Equation 1, when more temporary agreements are made by seller j, it will ask a higher price from buyer i. However, these temporary agreements may be cancelled by the trading partners and in that case, seller j will ask a lower price from buyer i. Therefore, the price which is asked by seller j from buyer i is not monotonic but depends on the current number of temporary agreements.

Property 2. Algorithm 3 can efficiently balance exploration and exploitation.

During the learning process, if an agent always selects the action with the largest value, it may achieve a sub-optimal policy which has never visited a large part of the state space. Thus, it is also important that the agent selects exploration actions, which do not have the largest values, and tries to increase its knowledge of the environment. Exploration actions, however, may incur some loss of immediate reward. Therefore, the agent faces the exploration/exploitation dilemma, which is to find a strategy for choosing between exploration and greedy actions in order to achieve the greatest reward in the long run [46]. In Algorithm 3, Line 7, the probability distribution computation method combines the advantages of the ε-greedy and Boltzmann exploration rules. The action with the largest value is selected with probability 1 − ε and this part is exploitation, while other actions are selected with the Boltzmann distribution and this part is exploration. One advantage of ε-greedy is that exploration-specific data need not be memorised [45], so it is suitable for large environments. The Boltzmann exploration rule [47] computes probabilities for actions based on their Q-values. It uses a temperature variable τ for annealing the amount of exploration. The Boltzmann rule explores heavily in states where the Q-values for different actions are almost equal, and explores little in states where the Q-values are very different. This feature of the Boltzmann rule is helpful for risk minimisation purposes [48], where actions that are significantly worse than others are unlikely to be explored. In this paper, the energy dispatch problem is studied in large power grid systems, where there are many buyers (loads) and sellers (generators), so ε-greedy is suitable for our problem. Moreover, Algorithm 3 is used for agents to select an action from among three actions: ‘accept the offer’, ‘reject the offer’ and ‘make a counter-offer’. Then, if any action’s Q-value is significantly smaller than the other two, this action is considered to be a bad choice and should be assigned a very low probability for selection. Thus, the Boltzmann rule is suitable for the exploration in our problem. By combining ε-greedy and the Boltzmann rule, the advantages of both methods can be included in our mechanism.

Property 3.

Theorem 1. The selection problem in Section 3.6 that a buyer selects a set of sellers whose total energy supply is larger than or equal to the buyer’s energy requirement and the total reward for the buyer for selecting the set of sellers should be as large as possible is NP-complete (Non-deterministic Polynomial time-complete).

Proof. To prove that the problem is NP-complete, we show that it is equivalent to a known NP-complete problem. Here, we use the 0-1 knapsack problem [49], which is well known to be NP-complete. The 0-1 knapsack problem is described as follows.

There is a knapsack with capacity c > 0 and there are n items. Each item has a value, $v_j$, and a weight, $w_j$. A set of items is selected which satisfies $\sum_{1 \le j \le n} \gamma_j w_j \le c$ (which is the constraint part) and $\sum_{1 \le j \le n} \gamma_j v_j$ is maximised (which is the optimisation part), where $\gamma_j$ is 1 if item j is selected and 0 if item j is not selected.

Now, in our problem, sellers that have temporary agreements with the buyer can be treated as items. The energy requirement of the buyer can be considered as the capacity of the knapsack, c. The reward for the buyer for selecting a seller can be treated as a value, $v_j$, and the amount of energy supplied by a seller can be treated as a weight, $w_j$. The constraint part of our problem can be rewritten as $\sum -\gamma_j \mathit{energy}_j \le -er^i_{t_x}$ and the optimisation part of our problem can be written as $\sum \gamma_j r^i_j$, which are equivalent to the constraint and optimisation parts of the 0-1 knapsack problem, respectively. Thus, the selection problem is NP-complete.

5. Simulation and Discussion

In order to empirically evaluate the performance of the proposed Multi-Agent Coalition Formation based mechanism (MACF), a set of simulations was conducted in a simulated distribution network. The topology of the simulated distribution network is similar to the one displayed in Figure 1 but the size is different. The simulations were run in Java on a 3.4 GHz Intel i5 CPU with 8 GB of RAM. The proposed MACF mechanism is compared with three recently developed energy dispatch mechanisms. One is a centralised mechanism (named SOA [6]) and the other two are decentralised mechanisms (named DYDOP [5] and DeOPF [7]).

• In [6], Shaw et al. developed a Seeker Optimisation Algorithm (SOA) for energy dispatch. The seeker optimisation algorithm is a population-based heuristic search algorithm. It regards the optimisation process as an optimal solution obtained by a seeker population. Each individual in this population is called a seeker. In their algorithm, a search direction and a step length are computed separately for each seeker on each variable at each time step. The algorithm first calculates the search direction and the step length of each seeker. The algorithm then updates the seekers’ position and finally lets seekers learn from each other. The algorithm is centralised and it is quite efficient when the scale of the distribution network is small. Such a centralised approach, however, suffers a potential single point of failure and its efficiency may decrease when the scale of the distribution network is large.

• In [5], Miller et al. presented a DYnamic programming Decentralised OPtimal energy dispatch (DYDOP) algorithm. In their algorithm, the distribution network is modelled as a factor graph. Their algorithm consists of two steps. First, a node waits until it has received power cost messages from all of its child nodes before computing its own power cost message which it sends to its parent node. Then, when the root node receives power cost messages from all of its child nodes, it starts to calculate its optimum power output. Their algorithm is decentralised and can overcome the potential single point of failure. However, in their algorithm, the root node needs global information before it can yield an optimal solution.

• In [7], Dall’Anese et al. decomposed the main problem into multiple sub-problems which could be solved efficiently and in parallel by local controllers in a microgrid. Each local controller solves an optimisation sub-problem and then exchanges simple messages with its neighbouring local controllers. Their method is based on decomposition, so it is called Decomposition-based Optimal Power Flow mechanism (DeOPF). Their method is decentralised and does not need global information. However, in each area of a microgrid, the local controller still solves an optimisation sub-problem in a centralised manner.

5.1. Simulation Setup

The maximum power output of each generator is a uniformly random number between 180 kW and 210 kW. The maximum distribution capacity of each line which directly connects to a generator is 250 kW (e.g., line C1 in Figure 1) and that of each line which connects two loads is 120 kW (e.g., line C4 in Figure 1). For each load, at a time unit, an energy requirement is created with probability P, where the amount of energy required in the energy requirement is a uniformly random number between 50 kW and 80 kW. We introduce such randomness in the simulation to evaluate the adaptability of the proposed mechanism. In order to avoid the waste of energy, in this simulation, it is assumed that the generators generate electricity only when they are requested and they generate as much as the loads need.

The shape of the simulated distribution network is similar to the one in Figure 1, which means that the number of loads between two generators remains at 3. For example, in Figure 1, between generators G1 and G4, there are three loads, L1, L4 and L7. Other parameters, however, vary. The simulation was conducted in the following four settings.

• Setting 1: The number of generators varies from 6 to 30. The average start time of an energy requirement is fixed at 15 time units after the energy requirement is generated, where the exact start time is a uniformly random number in [15 − 3, 15 + 3]. The probability P with which an energy requirement is created is fixed at 0.4. This setting is used to evaluate how the three mechanisms work in distribution networks with different scales.

• Setting 2: This setting is similar to Setting 1, but the network structure is modified, where the middle distribution lines, i.e., C7 and C9 (see Figure 1), are removed. In this situation, the number of neighbours of some agents, e.g., L4, L5 and L6, is reduced and thus, the number of agents that potentially will participate in a negotiation is also reduced. This setting is used to evaluate the influence of network structure on the performance of these mechanisms.

• Setting 3: The average start time of an energy requirement varies from 5 to 25 time units after the energy requirement is generated. The number of generators is fixed at 18 and the probability P with which an energy requirement is created is fixed at 0.4. This setting is used to evaluate how the three mechanisms work under different time constraints.

• Setting 4: The probability P with which an energy requirement is created on a load fluctuates from 0.2 to 0.6. The number of generators is fixed at 18. The average start time of an energy requirement is fixed at 15 time units after the energy requirement is generated. This setting is used to evaluate how the three mechanisms work in distribution networks with different levels of energy competition. Since the number of generators is fixed, with the increasing probability of energy requirement generation, the level of energy competition increases as well.
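The workload described by these settings can be summarised as a small configuration-plus-generator sketch. It is not the authors’ Java simulator: the `Setting` class, the function names and the dictionary layout are assumptions, and the ±3 spread around the average start time is stated in the paper only for Setting 1 and is assumed here for the other settings.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Setting:
    generators: int          # number of generators in the network
    mean_start_delay: float  # average start time, in time units after creation
    p_requirement: float     # probability P of a new requirement per load per time unit


def generator_capacity(rng):
    """Maximum power output of a generator: uniform in [180, 210] kW."""
    return rng.uniform(180, 210)


def maybe_create_requirement(setting, now, rng):
    """With probability P, create one energy requirement for a load at time `now`."""
    if rng.random() >= setting.p_requirement:
        return None
    delay = rng.uniform(setting.mean_start_delay - 3,
                        setting.mean_start_delay + 3)
    return {"energy_kw": rng.uniform(50, 80), "start": now + delay}


# A parameter point shared by Settings 3 and 4: 18 generators,
# average start time 15 and P = 0.4.
base = Setting(generators=18, mean_start_delay=15, p_requirement=0.4)
rng = random.Random(42)
capacities = [generator_capacity(rng) for _ in range(base.generators)]
requirements = [maybe_create_requirement(base, now=t, rng=rng) for t in range(100)]
created = [r for r in requirements if r is not None]
```

Sweeping one field of `Setting` while holding the others fixed corresponds to Settings 1, 3 and 4; Setting 2 changes the network topology rather than these parameters.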

In addition, the values of the aforementioned coefficients are α = 0.1, β = 2, ζ = 0.1, ε = 0.2, τ = 0.04, $R_1 = 5$, $R_2 = 10$, $R_3 = 3$ and a time unit is set to 30 ms. The values of these coefficients are hand-tuned by attempting different settings of values of coefficients to achieve good results. As stated in [50], it is difficult to design a systematic method to autonomously and optimally choose values of coefficients, because no set of values of coefficients is best across various situations. We have attempted different settings of the values of these coefficients and have found that different settings of the values yield different simulation results. However, the difference is small. Thus, for simplicity, we use only one set of values to run the whole simulation and present the results. For example, when the penalty coefficient, α, is large, it means that if an agent wants to cancel an agreement with its partner, it will pay a large penalty to its partner. Therefore, the number of temporary agreements that an agent makes with other parties for the required energy will be low, because otherwise, the agent has to pay a heavy penalty to other parties for cancellation of temporary agreements (as the agent needs only one final agreement and cancels the other temporary agreements). The low number of temporary agreements means low communication overhead, as each temporary agreement is achieved by several rounds of negotiation. However, the low number of temporary agreements increases the possibility that a buyer’s energy requirement may not be met due to sellers’ cancellation of agreements. For example, in order to avoid a heavy penalty, a buyer makes a temporary agreement with only one seller. However, the seller finally cancels the temporary agreement and leaves the buyer very little time to find an alternative. Similarly, different values of other coefficients can also affect the final simulation results. Thus, according to this example, it can be seen that the value of each coefficient has to be carefully selected to make a good trade-off.

The three criteria used to evaluate the performance of the mechanisms are the average utility of agents, the average time consumption and the average number of messages for energy requirements. SOA assumes that the distribution network has a central controller which automatically holds the information belonging to each node. In this simulation, it is stipulated that in SOA, for each energy requirement, the central controller collects information from each node in the network before it computes a solution: the central controller broadcasts a message to each node to request information, and each node replies with a message that contains the node's information. This stipulation is reasonable because, in order to have up-to-date information about the network and yield an optimal solution, the central controller has to collect the information from each node once a new energy requirement arrives. Similarly, in DeOPF, it is stipulated that, for an energy requirement in an area, the local controller of that area collects information from each node in the area before a solution is computed: the local controller broadcasts a message to each node in the area to request information, and each node replies with a message that contains the node's information.
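The information-collection stipulation above can be illustrated with a small message count. This is only a sketch: it assumes that a broadcast is counted as one request message per node and that each node sends exactly one reply, and the function name and node counts are hypothetical rather than taken from the simulation.

```python
def collection_messages(num_nodes: int) -> int:
    """Messages exchanged when one controller polls `num_nodes` nodes once.

    Assumption: the broadcast counts as one request per node, and each
    node answers with exactly one reply message.
    """
    if num_nodes < 0:
        raise ValueError("num_nodes must be non-negative")
    requests = num_nodes
    replies = num_nodes
    return requests + replies


# Hypothetical sizes: a central controller polling a 30-node network (SOA style)
# versus a local controller polling the 6 nodes of its own area (DeOPF style).
whole_network = collection_messages(30)
single_area = collection_messages(6)
```

Under these assumptions the collection cost grows linearly with the number of nodes polled, which is the effect discussed for SOA and DeOPF in the results.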

The utility of a load agent, if an energy requirement is met, can be calculated using Equation 9.

$$u = \Big(resP - \sum payment - loss\Big) + \sum refund + (penalty_{in} - penalty_{out}), \tag{9}$$

where $resP$ is the reserved price of the load agent, $\sum payment$ is the sum of payments that the load agent pays to generator agents, $loss$ is the loss of the load agent if generators stop providing energy during the energy consumption period of the load (see Equation 10), $\sum refund$ is the total refund that the agent receives from generator agents if generators stop providing energy during the energy consumption period of the load, $penalty_{in}$ is the penalty that the load agent receives from the generator agents and $penalty_{out}$ is the penalty that the load agent pays to the generator agents for agreement cancellation.

In Equation 9, the loss of the load, $loss$, can be calculated using Equation 10.

$$loss = \sum_i \frac{End(tx) - time_i}{End(tx) - Start(tx)} \cdot \frac{payment_i}{energy_i} \cdot energy'_i, \tag{10}$$

where $Start(tx)$ is the start time of using the energy, $End(tx)$ is the end time of using the energy, $time_i$ is the time point at which generator $i$ stops providing energy to the load, $payment_i$ is the payment that the load agent pays to generator agent $i$ for energy supply, $energy_i$ is the amount of energy that generator $i$ should provide to the load and $energy'_i$ is the amount of energy that generator $i$ stops providing to the load. For example, suppose $payment_i = 100$, $energy_i = 20$ kW, $Start(tx) = 15$ and $End(tx) = 25$, which means that generator $i$ should provide 20 kW of energy to the load from time point 15 to time point 25. Now, for some reason, generator $i$ has to reduce its energy supply to the load by 5 kW, i.e., $energy'_i = 5$ kW, from time point 21, i.e., $time_i = 21$. Then, the loss of the load is $\frac{25-21}{25-15} \cdot \frac{100}{20} \cdot 5 = 10$.
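The loss calculation in Equation 10 can be sketched in a few lines of Python. The class and field names below are illustrative choices and do not come from the paper. The call at the end reproduces the worked example above.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Interruption:
    """One generator's cut in supply (field names are illustrative)."""
    time: float            # time_i: when generator i stops providing energy
    payment: float         # payment_i: payment made to generator i
    energy: float          # energy_i: energy generator i agreed to provide
    energy_stopped: float  # energy'_i: energy generator i stops providing


def load_loss(start: float, end: float, interruptions: Iterable[Interruption]) -> float:
    """Equation 10: remaining-time fraction x unit price x lost energy, summed over generators."""
    duration = end - start
    if duration <= 0:
        raise ValueError("end must be later than start")
    total = 0.0
    for cut in interruptions:
        remaining_fraction = (end - cut.time) / duration
        unit_price = cut.payment / cut.energy
        total += remaining_fraction * unit_price * cut.energy_stopped
    return total


# Worked example from the text: 5 kW of a 20 kW supply is cut at time point 21.
example_loss = load_loss(15, 25, [Interruption(time=21, payment=100, energy=20, energy_stopped=5)])
```

With the values of the worked example, `example_loss` evaluates to 10 (up to floating-point rounding), matching the text.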

If an energy requirement is not met, the utility of a load agent can be calculated using Equation 11.

$$u = penalty_{in} - penalty_{out} \tag{11}$$
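As an illustration, Equations 9 and 11 can be combined into a single function that switches on whether the energy requirement is met. This is a minimal sketch: the function name, argument names and the numbers in the example call are hypothetical and are not results from the simulation.

```python
from typing import Sequence


def load_utility(
    requirement_met: bool,
    res_price: float = 0.0,
    payments: Sequence[float] = (),
    loss: float = 0.0,
    refunds: Sequence[float] = (),
    penalty_in: float = 0.0,
    penalty_out: float = 0.0,
) -> float:
    """Utility of a load agent: Equation 9 if the requirement is met, Equation 11 otherwise."""
    net_penalty = penalty_in - penalty_out
    if not requirement_met:
        return net_penalty  # Equation 11
    return (res_price - sum(payments) - loss) + sum(refunds) + net_penalty  # Equation 9


# Hypothetical values: reserved price 150, one payment of 100, a loss of 10,
# a refund of 10, and a penalty of 4 paid for cancelling a temporary agreement.
u_met = load_utility(True, res_price=150, payments=[100], loss=10, refunds=[10], penalty_out=4)
u_unmet = load_utility(False, penalty_in=3, penalty_out=4)
```

Here `u_met` is (150 − 100 − 10) + 10 + (0 − 4) = 46, and `u_unmet` is 3 − 4 = −1.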

The utility of a generator agent can be calculated using Equation 12.

$$u = (payment - p \cdot energy \cdot tx) - \sum refund + (penalty_{in} - penalty_{out}), \tag{12}$$

where $payment$ is the payment that the generator agent receives from the load agent, $p \cdot energy \cdot tx$ is the generator's cost to generate the power, $\sum refund$ is the refund that the generator agent pays back to the load agents if the generator stops providing energy to the loads, $penalty_{in}$ is the penalty that the generator agent receives from the load agents and $penalty_{out}$ is the penalty that the generator agent pays to the load agents for agreement cancellation.
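Equation 12 can be sketched in the same way. The sketch reads $p$ as a cost per unit of energy per time unit and $tx$ as the length of the supply period; that reading is an assumption made here for illustration, as are the function name and the example numbers.

```python
from typing import Sequence


def generator_utility(
    payment: float,
    unit_cost: float,      # p (assumed: cost per unit of energy per time unit)
    energy: float,         # energy supplied
    duration: float,       # tx (assumed: length of the supply period)
    refunds: Sequence[float] = (),
    penalty_in: float = 0.0,
    penalty_out: float = 0.0,
) -> float:
    """Equation 12: payment minus generation cost and refunds, plus net penalties."""
    generation_cost = unit_cost * energy * duration
    return (payment - generation_cost) - sum(refunds) + (penalty_in - penalty_out)


# Hypothetical values: payment 100, generation cost 0.25 * 20 * 10 = 50,
# one refund of 10 and a penalty of 2 received from a load agent.
u_generator = generator_utility(100, unit_cost=0.25, energy=20, duration=10,
                                refunds=[10], penalty_in=2)
```

With these values, `u_generator` is (100 − 50) − 10 + 2 = 42.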

For SOA, DYDOP and DeOPF, the utility of a load agent is resP if its energy requirement is met and 0 if not. As payment is not taken into account in SOA, DYDOP and DeOPF, the utility of generator agents is not considered in this simulation. Each simulation result is obtained by averaging 1000 simulation runs. Error intervals were also examined; as they are stable and small throughout the simulation, they are omitted from the figures for clarity.
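The baseline utility rule and the averaging over runs can be sketched as follows. The `toy_run` function is a hypothetical stand-in for one simulation run (a requirement met with probability 0.8 and a reserved price of 100); it is not the simulator used in the paper, and its numbers are illustrative only.

```python
import random
from statistics import mean
from typing import Callable


def baseline_load_utility(requirement_met: bool, res_price: float) -> float:
    """Load utility under SOA, DYDOP and DeOPF: resP if the requirement is met, 0 otherwise."""
    return res_price if requirement_met else 0.0


def average_over_runs(run_once: Callable[[random.Random], float],
                      runs: int = 1000, seed: int = 0) -> float:
    """Average the result of `run_once` over `runs` independent runs."""
    rng = random.Random(seed)
    return mean(run_once(rng) for _ in range(runs))


def toy_run(rng: random.Random) -> float:
    # Hypothetical stand-in for one simulation run, for illustration only.
    return baseline_load_utility(rng.random() < 0.8, res_price=100.0)


average_utility = average_over_runs(toy_run)
```

Fixing the seed makes the averaged value reproducible, which is convenient when comparing mechanisms on identical sequences of energy requirements.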

5.2. Simulation Results and Discussion

5.2.1. For Setting 1: evaluation of scalability of the mechanisms

Figure 3 demonstrates the performance of the four mechanisms in different scales of distribution networks. In Figure 3(a), as the network scale increases, all four mechanisms achieve a good average utility. MACF achieves a slightly higher average utility than the other three mechanisms. This is because in MACF, agents can negotiate among themselves to find the best partners autonomously, while in the other three mechanisms there is no negotiation among agents and the optimal solution is computed on the basis of sellers' announced prices, which might be lowered if negotiation were enabled. In Figures 3(b) and 3(c), as the scale of the network increases, the average time consumption and the average number of messages remain steady in MACF but rise very sharply in the other three mechanisms. In MACF, for a single energy requirement, time is consumed and messages are generated only during the negotiation period, and negotiation must stop before the start time of the energy requirement. In this setting, the start time of an energy requirement is fixed at 15 time units after the energy requirement is generated. Therefore, no matter how many generators or loads exist in a distribution network, the average time consumption and the average number of messages for energy requirements are bounded by the start times of these energy requirements.

In contrast, because both SOA and DYDOP need global information to yield optimal solutions, time is consumed and messages are generated while the algorithms collect global information. Although DeOPF does not need global information, it requires the collection of information in an area of the network and communication among local controllers to exchange information. Thus, as the number of nodes in the distribution network increases, SOA, DYDOP and DeOPF have to spend more time and send more messages to acquire information before a solution can be derived. In DeOPF, the local controller manages its individual area in a centralised manner, while the communication and cooperation among local controllers are decentralised. The performance of DeOPF is quite good in a small-scale network, where the number of generators is less than 18. This is because when the scale of a network is small, the number of areas is small, so the overhead of communication and cooperation among local controllers is light and a solution can be obtained quickly and efficiently. However, when the scale of a network is large, where the number of generators is more than 18, the number of areas is also large. The overhead of communication and cooperation among local controllers is then heavy and a solution cannot be easily obtained, which explains the performance decrease of DeOPF.

(a) Average Utility (b) Average Time Consumption (×100ms) (c) Average Number of Messages

Figure 3. Performance of the four mechanisms in different scales of distribution networks

5.2.2. For Setting 2: evaluation of influence of network structure on the mechanisms

Figure 4 shows the performance of the four mechanisms in different scales of modified distribution networks. Comparing Figure 3 and Figure 4, it can be seen that the four mechanisms perform similarly in the two settings. However, in the modified distribution networks, the average utility achieved by the four mechanisms is lower than that in the original distribution networks (see Figure 3(a) and Figure 4(a)). This is because in the modified distribution networks, some loads, e.g., L4, L5 and L6, may not obtain enough energy supply and thus, their utility decreases. For example, in the original distribution networks (see Figure 1), power supplied to L4 can go through distribution line C7. In the modified distribution networks, middle distribution lines such as C7 have been removed, so power supplied to L4 which originally went through C7 has to go through C4 or C12. However, power supplied to L1 and L7 may also have to go through C4 and C12. As described in Section 3.1, each distribution line has a maximum distribution capacity. Thus, after removing C7, power may overflow C4 and C12. In order to protect the distribution lines, some power has to be abandoned and thus, some loads will not obtain enough energy supply and their utility will correspondingly decrease.

The average time and number of messages used by agents in the modified distribution networks are less than those in the original distribution networks (see Figures 3(b) and 4(b), and Figures 3(c) and 4(c)). This is because after middle lines are removed, the number of neighbours of some agents is reduced and thus, the number of agents that may participate in a negotiation is also reduced. As fewer agents participate in negotiation, the time and communication overhead correspondingly decrease.
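The capacity argument for lines C4 and C12 can be illustrated with a small calculation. The sketch below only computes how much power exceeds a line's maximum distribution capacity; it does not model which loads are curtailed, and the capacity and flow values are hypothetical.

```python
from typing import Dict


def abandoned_power(flows_kw: Dict[str, float], capacity_kw: float) -> float:
    """Power that must be abandoned when the summed flows exceed a line's capacity."""
    return max(0.0, sum(flows_kw.values()) - capacity_kw)


# Hypothetical numbers: line C4 has a capacity of 30 kW.
# Original network: only the supply to L1 crosses C4.
before_removal = abandoned_power({"L1": 20.0}, capacity_kw=30.0)
# Modified network: the supply to L4 is rerouted from C7 onto C4 as well.
after_removal = abandoned_power({"L1": 20.0, "L4": 15.0}, capacity_kw=30.0)
```

In this example nothing is abandoned before C7 is removed, while 5 kW must be abandoned afterwards, so at least one load receives less energy than it requires.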

5.2.3. For Setting 3: evaluation of adaptability of the mechanisms under time constraints

Figure 5 displays the performance of the four mechanisms under different time constraints. When the time constraint is loose, the proposed MACF achieves a higher average utility than the other three mechanisms but spends more time and sends more messages. This is because in MACF agents are allowed to negotiate with each other. When the time constraint is loose and the start time is extended, each load agent can negotiate with more generator agents and each negotiation thread can run for more rounds. Therefore, each load agent has more opportunity to find better sellers than it has when the time constraint is tight. Naturally, more negotiation means more time spent and more messages sent. It should also be noted that in Figure 5, when the start time is set larger than 18 time units, the performance of MACF remains almost steady. This may be because most agreements can be reached within 18 time units. This phenomenon implies that a loose time constraint cannot always increase the performance of MACF.

As none of SOA, DYDOP and DeOPF allows negotiation, their performance is not greatly affected by changes in the time constraint. However, there is an interesting point in Figure 5(a). When the start time is set quite small (less than 10 time units), the average utility achieved by SOA, DYDOP and DeOPF is also very low. This is because when time is very short, SOA, DYDOP and DeOPF may not have enough time to yield a solution for a load agent. In this case, the load agent gets 0 utility, which lowers the average utility of all the load agents.

Figure 4. Performance of the four mechanisms in different scales of modified distribution networks

Figure 5. Performance of the four mechanisms under different time constraints

5.2.4. For Setting 4: evaluation of adaptability of the mechanisms under energy competition

Figure 6 shows the performance of the four mechanisms with different levels of energy competition. As the number of generators in the distribution network is fixed, the level of energy competition rises as the energy requirement creation probability increases. In Figure 6(a), the average utility achieved by all four mechanisms decreases as the energy requirement creation probability increases. This is because when more energy requirements are created, the energy in the network is not enough to supply all these requirements concurrently. Hence, some energy requirements cannot be met, which decreases the average utility. In Figures 6(b) and 6(c), it can be seen that in MACF, the amount of time and the number of messages decrease as the energy requirement creation probability increases, while in the other three mechanisms, the amount of time spent and the number of messages sent remain steady. This is because in MACF, as the level of energy competition grows, each load agent is under more pressure to make agreements with generator agents; otherwise, generator agents may sell their energy to other load agents. Such pressure accelerates the achievement of an agreement. Thus, the total amount of time and number of messages decrease in MACF. Because SOA, DYDOP and DeOPF do not have a negotiation process, their time and message costs are not greatly affected.

Overall, the simulation shows that the proposed MACF works better under various circumstances than SOA, DYDOP and DeOPF, considering that MACF is a decentralised mechanism which does not require a central or local controller, or any global information.

Figure 6. Performance of the four mechanisms with different levels of energy competition

6. Conclusion and Future Work

This paper has introduced a decentralised multi-agent coalition formation-based energy dispatch mechanism which enables agents to autonomously find partners via negotiation. While existing mechanisms need either central control or global information, the proposed mechanism needs neither. Thus, it can overcome the limitations of existing mechanisms. Simulation results have shown that the proposed mechanism can achieve good results.

The main part of this mechanism is a negotiation protocol, and the performance of the mechanism relies heavily on this protocol. Therefore, we believe that a more efficient negotiation protocol can increase the performance of this mechanism, and this is one of our future studies. As stated in Section 3.1, the value of an energy requirement is fixed once an energy requirement of a load is created. In the future, we would like to relax this assumption by allowing the values of energy requirements to vary dynamically. Also, as described in Section 3.4, the cost function for the generators is simplified as a linear function in this paper. The study of quadratic and piecewise linear cost functions in our mechanism is also one of our future studies. The determination of whether an agent is a good seller is simplified in this paper. In the future, trust and reputation metrics will be introduced to assist agents in making such a determination. As described in Section 5.1, in different scenarios, different parameter values can yield different simulation results. In the future, we will systematically study the impact of parameter values on simulation results in different scenarios. In addition, the distribution network used in this paper is static: existing nodes cannot leave, new nodes cannot join and lines never fail. In the future, we will study how the mechanism works in a dynamic distribution network, where nodes can enter and leave the network and lines may fail unexpectedly. Finally, this paper focuses on theoretical and simulation study, where we revise our previous work [38] and attempt to apply it to power grid systems for energy dispatch. The simulation studied in this paper is based on a simulated network. Thus, in the future, we intend to implement and evaluate the mechanism in IEEE test systems.

7. Acknowledgments

This work was originally created at the University of Wollongong, NSW 2522 Australia. It was supported by two ARC Discovery Projects (DP150101775 and DP140100974) from the Australian Research Council, Australia.

References

[1] A. D. Dominguez-Garcia, C. N. Hadjicostis, Distributed algorithms for control of demand response and distributed energy resources, in: Proc. of the 50th IEEE Conf. on Decis. and Cont., 2011, pp. 27–32.

[2] I. Baxevanos, D. Labridis, Implementing multiagent systems technology for power distribution network control and protection management, IEEE Trans. on Pow. Deliv. 22 (1) (2007) 433–443.

[3] M. Nordman, M. Lehtonen, Distributed agent-based state estimation for electrical distribution networks, IEEE Transactions on Power Systems 20 (2) (2005) 652–658.

[4] J. Kok, M. Scheepers, I. Kamphuis, Intelligence in Electricity Networks for Embedding Renewables and Distributed Generation, Springer, 2010, Ch. 8, pp. 179–209.

[5] S. Miller, S. D. Ramchurn, A. Rogers, Optimal decentralised dispatch of embedded generation in the smart grid, in: Proc. of AAMAS 2012, 2012, pp. 281–288.

[6] B. Shaw, V. Mukherjee, S. P. Ghoshal, Solution of economic dispatch problems by seeker optimization algorithm, Expert Systems with Applications 39 (2012) 508–519.

[7] E. Dall'Anese, H. Zhu, G. B. Giannakis, Distributed optimal power flow for smart microgrids, IEEE Transactions on Smart Grid 4 (3) (2013) 1464–1475.

[8] R. Baldick, B. H. Kim, C. Chase, Y. Luo, A fast distributed implementation of optimal power flow, IEEE Transactions on Power Systems 14 (3) (1999) 858–864.

[9] M. Wooldridge, N. R. Jennings, Intelligent agents: Theory and practice, Knowledge Engineering Review 10 (1995) 115–152.

[10] S. D. Ramchurn, P. Vytelingum, A. Rogers, N. R. Jennings, Putting the 'smarts' into the smart grid: A grand challenge for artificial intelligence, Communications of the ACM 55 (2011) 86–97.

[11] M. E. Baran, I. M. El-Markabi, A multiagent-based dispatching scheme for distributed generators for voltage support on distribution feeders, IEEE Transactions on Power Systems 22 (2007) 52–59.

[12] D. Ye, M. Zhang, D. Sutanto, A hybrid multiagent framework with q-learning for power grid systems restoration, IEEE Transactions on Power Systems 26 (2011) 2434–2441.

[13] F. Ren, M. Zhang, D. Sutanto, A multi-agent solution to distribution system management by considering distributed generators, IEEE Transactions on Power Systems 28 (2013) 1442–1451.

[14] Q. Wu, Y. Cao, J. Wen, Optimal reactive power dispatch using an adaptive genetic algorithm, International Journal of Electrical Power & Energy Systems 20 (1998) 563–569.

[15] V. Miranda, D. Srinivasan, L. Proenca, Evolutionary computation in power systems, International Journal of Electrical Power & Energy Systems 20 (1998) 89–98.

[16] Y. del Valle, G. K. Venayagamoorthy, S. Mohagheghi, J. C. Hernandez, R. G. Harley, Particle swarm optimization: Basic concepts, variants and applications in power systems, IEEE Transactions on Evolutionary Computation 12 (2008) 171–195.

[17] B. Zhao, C. X. Guo, Y. J. Cao, A multiagent-based particle swarm optimization approach for optimal reactive power dispatch, IEEE Transactions on Power Systems 20 (2005) 1070–1078.

[18] M. A. Abido, Multiobjective evolutionary algorithms for electric power dispatch problem, IEEE Transactions on Evolutionary Computation 10 (2006) 315–329.

[19] C. Dai, W. Chen, Y. Zhu, X. Zhang, Seeker optimization algorithm for optimal reactive power dispatch, IEEE Transactions on Power Systems 24 (2009) 1218–1231.

[20] B. Shaw, V. Mukherjee, S. P. Ghoshal, Seeker optimisation algorithm: Application to the solution of economic load dispatch problems, IET Gener. Transm. Distrib. 5 (2011) 81–91.

[21] A. D. Dominguez-Garcia, C. N. Hadjicostis, Coordination and control of distributed energy resources for provision of ancillary services, in: The First IEEE International Conference on Smart Grid Communications, 2010, pp. 537–542.

[22] A. H. Khazali, M. Kalantar, Optimal reactive power dispatch based on harmony search algorithm, Electrical Power and Energy Systems 33 (2011) 684–692.

[23] P. Vytelingum, S. D. Ramchurn, T. D. Voice, A. Rogers, N. R. Jennings, Trading agents for the smart electricity grid, in: Proc. of AAMAS 2010, 2010, pp. 897–904.

[24] M. Kraning, E. Chu, J. Lavaei, S. Boyd, Message passing for dynamic network energy management, Tech. rep., Information Systems Laboratory, Stanford University (2012).

[25] A. D. Dominguez-Garcia, S. T. Cady, C. N. Hadjicostis, Decentralized optimal dispatch of distributed energy resources, in: Proc. of the 51st IEEE Conf. on Decision and Control, 2012, pp. 3688–3693.

[26] G. Cohen, Auxiliary problem principle and decomposition of optimization problems, Journal of Optimization Theory and Applications 32 (3) (1980) 277–305.

[27] B. Kim, R. Baldick, Coarse-grained distributed optimal power flow, IEEE Transactions on Power Systems 12 (2) (1997) 932–939.

[28] G. Hug-Glanzmann, G. Andersson, Decentralized optimal power flow control for overlapping areas in power systems, IEEE Transactions on Power Systems 24 (1) (2009) 327–336.

[29] F. J. Nogales, F. J. Prieto, A. J. Conejo, A decomposition methodology applied to the multi-area optimal power flow problem, Annals of Operations Research 120 (2003) 99–116.

[30] A. Mohsenian-Rad, V. W. S. Wong, J. Jatskevich, R. Schober, A. Leon-Garcia, Autonomous demand-side management based on game-theoretic energy consumption scheduling for the future smart grid, IEEE Trans. on Smart Grid 1 (3) (2010) 320–331.

[31] A. Nedic, A. Ozdaglar, Approximate primal solutions and rate analysis for dual subgradient methods, SIAM Journal on Optimization 19 (4) (2009) 1757–1780.

[32] S. H. Low, D. E. Lapsley, Optimization flow control - i: Basic algorithm and convergence, IEEE/ACM Transactions on Networking 7 (6) (1999) 861–874.

[33] R. Srikant, The Mathematics of Internet Congestion Control, Springer, 2004.

[34] M. Sims, C. Goldman, V. Lesser, Self-organization through bottom-up coalition formation, in: Proc. Intl Joint Conf. Autonomous Agents and Multiagent Systems (AAMAS 03), 2003, pp. 867–874.

[35] O. M. Shehory, K. Sycara, S. Jha, Multi-agent coordination through coalition formation, Intelligent Agents IV Agent Theories, Architectures, and Languages 1365 (1998) 143–154.

[36] O. Shehory, S. Kraus, Formation of overlapping coalitions for precedence-ordered task-execution among autonomous agents, in: Proc. of International Conference on Multi-Agent Systems, 1996, pp. 330–337.

[37] R. G. Smith, The contract net protocol: High-level communication and control in a distributed problem solver, IEEE Trans. on Computers C-29 (1980) 1104–1113.

[38] D. Ye, M. Zhang, D. Sutanto, Self-adaptation-based dynamic coalition formation in a distributed agent network: A mechanism and a brief survey, IEEE Trans. on Paral. and Distr. Syst. 24 (2013) 1042–1051.

[39] A. Rubinstein, Perfect equilibrium in a bargaining model, Econometrica 50 (1982) 97–109.

[40] B. An, V. Lesser, D. Irwin, M. Zink, Automated negotiation with decommitment for dynamic resource allocation in cloud computing, in: Proc. of AAMAS 2010, 2010, pp. 981–988.

[41] B. An, V. Lesser, K. M. Sim, Strategic agents for multi-resource negotiation, AAMAS 23 (2011) 114–153.

[42] U. Endriss, Monotonic concession protocols for multilateral negotiation, in: Proc. of the 5th AAMAS, 2006, pp. 393–399.

[43] C. Watkins, P. Dayan, Q-learning, Machine Learning 8 (1992) 279–292.

[44] L. P. Kaelbling, M. L. Littman, A. W. Moore, Reinforcement learning: A survey, Journal of AI Research 4 (1996) 237–285.

[45] M. Tokic, G. Palm, Value-Difference Based Exploration: Adaptive Control between Epsilon-Greedy and Softmax, Springer Berlin, 2011, Ch. KI 2011: Advances in Artificial Intelligence, pp. 335–346.

[46] S. Thrun, Efficient exploration in reinforcement learning, Tech. rep., Carnegie Mellon University (1992).

[47] M. Wiering, Explorations in efficient reinforcement learning, Ph.D. thesis, University of Amsterdam (1999).

[48] M. Heger, Consideration of risk in reinforcement learning, in: Proc. of 11th ICML, San Francisco, CA, 1994, pp. 105–111.

[49] G. Gallo, B. Simeone, On the supermodular knapsack problem, Mathematical Programming Studies 45 (1988) 295–309.

[50] T. Hester, M. Lopes, P. Stone, Learning exploration strategies in model-based reinforcement learning, in: Proc. of AAMAS 2013, 2013, pp. 1069–1076.
