


IJCAI 2013 Workshop

Distributed Constraint Reasoning

Beijing, August 4, 2013

This workshop is dedicated to Pragnesh Jay Modi.

In memoriam.

William Yeoh and Roie Zivan,

Workshop Organizers.


Preface

The Distributed Constraint Reasoning (DCR) workshop, now in its fourteenth edition, continues a long sequence of meetings dedicated to distributed constraint satisfaction and optimization. These problems arise when variables or constraints are distributed among several autonomous agents and cannot be solved in a centralized form. Historically, the workshop has rotated among the conferences CP (2000, 2004, 2007), IJCAI (2001, 2003, 2005, 2007, 2009, 2011), ECAI (2006), and AAMAS (2002, 2006, 2008). This edition is associated with the IJCAI 2013 conference.

In this edition, we received 7 submissions. Each was reviewed by three members of the PC, and finally 6 papers were accepted for presentation. All 6 papers are full length (15 pages). The papers consider the following topics: distributed constraint satisfaction and optimization (both systematic and local search), robustness, uncertainty, new approaches, and applications. Papers are allocated 20 minutes for presentation plus 5 minutes for questions.

Because of the small number of submissions and accepted papers, we decided to structure the workshop differently than in previous editions, in order to have more discussion of topics that concern the DCR community. Thus, we plan to have three discussions on subjects such as cooperation, innovative algorithmic approaches, and new approaches motivated by new applications.

Following the previous workshop, this workshop is dedicated to Pragnesh Jay Modi, who suddenly passed away on April 9, 2007. Jay was an important member of this small community. He was one of the creators of the well-known ADOPT algorithm, which was the main topic of his PhD thesis, written under the advisement of Wei-Min Shen and Milind Tambe in 2003 at USC. After his degree, Jay moved to CMU as a post-doctoral researcher, and in 2005 he joined Drexel University, where he was an assistant professor. Jay contributed to the sequence of DCR workshops, as an author and as a PC member, with a bright mind and many research ideas. There is no better occasion than a DCR workshop to acknowledge his contribution.

William Yeoh & Roie Zivan,

Workshop organizers


Workshop Organization

Programme Chairs

William Yeoh

Roie Zivan

Programme Committee

Juan Antonio Rodriguez Aguilar

Christian Bessiere

Jesus Cerquides

Boi Faltings

Alessandro Farinelli

Rachel Greenstadt

Katsutoshi Hirayama

Manish Jain

Christopher Kiekintveld

Amnon Meisels

Pedro Meseguer

Marius Silaghi

Matthew E. Taylor

Meritxell Vinyals

Makoto Yokoo


Table of Contents

Pseudo-Tree Based Hybrid Algorithm for Distributed Constraint Optimization ........ 1
Tenda Okimoto, Makoto Yokoo, Yuko Sakurai and Katsumi Inoue

Combining Fairness and Efficiency in Dynamic Task Allocation with Spatial and Temporal Constraints ........ 16
Sofia Amador, Steven Okamoto and Roie Zivan

Distributed Gibbs: A Memory-Bounded Sampling-Based DCOP Algorithm ........ 30
Nguyen Duc Thien, William Yeoh and Hoong Lau

Solving Customer-Driven Microgrid Optimization Problems as DCOPs ........ 45
Saurabh Gupta, Palak Jain, William Yeoh, Satish Ranade and Enrico Pontelli

Large Scale Multi-Agent-Based Simulation using NetLogo for Implementation and Evaluation of the Distributed Constraints ........ 60
Ionel Muscalagiu, Horia Popa and Jose Vidal

Applying Max-Sum to DCOP_MST ........ 75
Harel Yedidsion and Roie Zivan


Pseudo-Tree Based Hybrid Algorithm for Distributed Constraint Optimization

Tenda Okimoto*†, Makoto Yokoo††, Yuko Sakurai††, and Katsumi Inoue†

*Transdisciplinary Research Integration Center, Tokyo 101-8430, Japan
†National Institute of Informatics, Tokyo 101-8430, Japan
††Kyushu University, Fukuoka 819-0395, Japan
{tenda,inoue}@nii.ac.jp, {ysakurai,yokoo}@inf.kyushu-u.ac.jp

Abstract. A Distributed Constraint Optimization Problem (DCOP) is a fundamental problem that can formalize various applications related to multi-agent cooperation. Considering pseudo-tree based search algorithms is important in DCOPs, since their memory requirements are polynomial in the number of agents. However, these algorithms require a large run time. Thus, how to speed up pseudo-tree based search algorithms is one of the major issues in DCOPs. In this paper, we propose a novel hybrid algorithm which combines a complete algorithm with an incomplete algorithm. Specifically, we use a state-of-the-art complete search algorithm (BnB-ADOPT) and utilize the bounds obtained by an approximate algorithm (p-optimal algorithm) in preprocessing. In the evaluations, we show that this hybrid algorithm outperforms the state-of-the-art DCOP search algorithm. Furthermore, we verify experimentally that a pseudo-tree based approximate algorithm is well-suited with a pseudo-tree based search algorithm.

1 Introduction

A Distributed Constraint Optimization Problem (DCOP) [11] is a fundamental problem that can formalize various applications related to multi-agent cooperation. A DCOP consists of a set of agents, each of which needs to decide the value assignment of its variables so that the sum of the resulting rewards/costs is optimized. Many application problems in multi-agent systems can be formalized as DCOPs, e.g., distributed resource allocation problems including sensor networks [6], meeting scheduling [7], and the synchronization of traffic lights [5].

It is important to develop a complete algorithm for DCOPs. Various complete algorithms have been developed, e.g., ADOPT [11], BnB-ADOPT [17], DPOP [13], and OptAPO [8]. ADOPT is one of the pioneering DCOP search algorithms and BnB-ADOPT is the state-of-the-art DCOP search algorithm. These two algorithms use a graph structure called a pseudo-tree [15] and find an optimal solution. ADOPT and BnB-ADOPT have identical memory requirements and communication frameworks. The main difference is their search strategies: ADOPT employs a best-first search strategy while BnB-ADOPT utilizes a depth-first branch-and-bound search strategy. DPOP is a representative pseudo-tree based inference algorithm that adapts the bucket elimination principle [2] to a distributed setting. DPOP works on a DFS traversal of the constraint graph.

Considering a pseudo-tree based search algorithm is important in DCOPs. Since each agent has only a fixed amount of available memory, solving a DCOP with memory-bounded algorithms is desirable. The memory requirements of pseudo-tree based search algorithms such as ADOPT and BnB-ADOPT are polynomial in the number of agents.

However, a pseudo-tree based search algorithm requires a large run time. The number of messages sent between agents is exponential in the number of agents in the worst case. Thus, how to speed up pseudo-tree based search algorithms is one of the major issues in DCOPs.

In this paper, we propose a novel hybrid algorithm which combines a complete algorithm with an incomplete algorithm. Specifically, we use the state-of-the-art complete search algorithm BnB-ADOPT and utilize the bounds obtained by the pseudo-tree based p-optimal algorithm [12] in preprocessing. In the evaluations, we show that this hybrid algorithm outperforms a state-of-the-art DCOP search algorithm. Furthermore, we verify experimentally that a pseudo-tree based approximate algorithm is well-suited with a pseudo-tree based search algorithm.

Several preprocessing techniques have been introduced to speed up DCOP search algorithms [1, 3, 9]. In BnB-ADOPT, once an approximate solution is given, it is used for pruning, i.e., in principle BnB-ADOPT can be combined with any approximate algorithm. In this paper, we focus on the p-optimal algorithm, which is a pseudo-tree based approximate algorithm. Our working hypothesis is that a pseudo-tree based approximate algorithm should have good chemistry with a pseudo-tree based search algorithm. More specifically, the p-optimal algorithm provides information on lower and upper bounds for each agent of a pseudo-tree. Compared to using information on a global solution, we can expect a synergy effect when we utilize the information obtained by the p-optimal algorithm. We verify this hypothesis experimentally.

Related Work

ADOPT-DP2 [1] uses a dynamic programming based preprocessing technique and a search algorithm¹. This technique evaluates the global lower bound for all constraints. The bound is passed up to the root node of the pseudo-tree and is then used to guide the search thresholds in ADOPT. This technique gives the optimal solution in the case that a pseudo-tree has no back edges. When a pseudo-tree has back edges, it only estimates the lower bound, since ADOPT-DP2 is a memory-bounded algorithm. Compared to ADOPT-DP2, our hybrid algorithm utilizes the p-optimal algorithm to generate lower and upper bounds for each agent of a pseudo-tree. Furthermore, that algorithm uses ADOPT while our hybrid algorithm utilizes BnB-ADOPT. In our experiments, we implement BnB-ADOPT-DP2 and use it instead of ADOPT-DP2 for the comparison.

¹ In [1], three preprocessing techniques have been introduced, DP0, DP1, and DP2, that trade off between the computation time and the quality of the lower bound.


Our contributions are twofold:

– We develop a novel pseudo-tree based hybrid algorithm which is faster than the state-of-the-art DCOP search algorithm.

– We show empirically that a pseudo-tree based p-optimal algorithm is well-suited with the pseudo-tree based search algorithm BnB-ADOPT. In our algorithm, the p-optimal algorithm can provide detailed information on lower and upper bounds for each agent of a pseudo-tree, which can be used for pruning the search space in BnB-ADOPT.

The rest of this paper is organized as follows. Section 2 formalizes DCOPs and describes existing approximate and search algorithms for DCOPs. Section 3 introduces a novel hybrid algorithm for DCOPs, and Section 4 evaluates the performance of our hybrid algorithm. Finally, we conclude this paper in Section 5 and provide some perspectives for future work.

2 Preliminaries

In this section, we briefly describe the formalization of Distributed Constraint Optimization Problems (DCOPs) and introduce the p-optimal algorithm, ADOPT, and BnB-ADOPT, which are pseudo-tree based DCOP algorithms.

2.1 DCOP

A Distributed Constraint Optimization Problem (DCOP) is defined by a set of agents S, a set of variables X, a set of constraint relations C, and a set of reward functions F. An agent i has its own variable x_i. A variable x_i takes its value from a finite, discrete domain D_i. A constraint relation (i, j) means there exists a constraint relation between x_i and x_j. For x_i and x_j, which have a constraint relation, the reward for an assignment {(x_i, d_i), (x_j, d_j)} is defined by a reward function r(d_i, d_j) : D_i × D_j → R. For a value assignment A to all variables, let us denote

R(A) = Σ_{(i,j) ∈ C, {(x_i, d_i), (x_j, d_j)} ⊆ A} r(d_i, d_j).

Then, an optimal assignment A* is given as argmax_A R(A), i.e., A* is an assignment that maximizes the sum of the values of all reward functions. A DCOP can be represented using a constraint graph, in which a node represents an agent/variable and an edge represents a constraint.

Without loss of generality, we make the following assumptions for simplicity. Relaxing these assumptions to general cases is relatively straightforward:

– Each agent has exactly one variable.
– All constraints are binary.
– Each agent knows all constraints related to its variable.
– The maximal value of each reward function is bounded, i.e., we assume that for all i, j with (i, j) ∈ C and all d_i ∈ D_i, d_j ∈ D_j, 0 ≤ r_{i,j}(d_i, d_j) ≤ r_max holds.
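The model above can be sketched in a few lines: a centralized brute-force computation of R(A) and argmax_A R(A) over a toy instance. The reward values below are hypothetical (the paper's Figure 1 is not reproduced here); they are chosen so that the optimum agrees with Example 1's optimal value of twelve.

```python
from itertools import product

# Toy DCOP (hypothetical values): agents 1..3, one variable each,
# domain {'a', 'b'}, binary reward tables r[(i, j)] per constraint (i, j).
domains = {1: ['a', 'b'], 2: ['a', 'b'], 3: ['a', 'b']}
rewards = {
    (1, 2): {('a', 'a'): 1, ('a', 'b'): 5, ('b', 'a'): 6, ('b', 'b'): 2},
    (2, 3): {('a', 'a'): 4, ('a', 'b'): 0, ('b', 'a'): 3, ('b', 'b'): 1},
    (1, 3): {('a', 'a'): 2, ('a', 'b'): 1, ('b', 'a'): 2, ('b', 'b'): 0},
}

def total_reward(assignment):
    """R(A): sum of r(d_i, d_j) over all constraints (i, j) in C."""
    return sum(table[(assignment[i], assignment[j])]
               for (i, j), table in rewards.items())

def optimal_assignment():
    """argmax_A R(A) by exhaustive (centralized) enumeration."""
    variables = sorted(domains)
    best = max((dict(zip(variables, values))
                for values in product(*(domains[v] for v in variables))),
               key=total_reward)
    return best, total_reward(best)

A_star, R_star = optimal_assignment()
```

This exhaustive enumeration is exponential in the number of variables, which is precisely why the paper studies distributed search algorithms with bounded memory instead.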


Fig. 1. Figure (left) shows a DCOP with three variables x_1, x_2 and x_3. Figure (right) represents a pseudo-tree based on the total ordering < x_1, x_2, x_3 >, where x_1 is the root node.

Example 1 (DCOP). Figure 1 (left) shows a DCOP with three variables x_1, x_2 and x_3. r(x_i, x_j) is a reward function where i < j. Each variable takes its value assignment from a discrete domain {a, b}. The optimal solution that maximizes the sum of the values of all reward functions is {(x_1, b), (x_2, a), (x_3, a)}, and the optimal value is twelve.

A pseudo-tree [15] is a graph structure which is widely used in DCOP algorithms, e.g., ADOPT and BnB-ADOPT. In a pseudo-tree, there exists a unique root node, and each non-root node has a parent node. The pseudo-tree contains all nodes and edges of the original constraint graph, and the edges are categorized into tree and back edges.

Example 2 (Pseudo-tree). Figure 1 (right) shows a pseudo-tree based on the total ordering < x_1, x_2, x_3 >. x_1 is the root node of this pseudo-tree. The edge between x_1 and x_3 represents a back edge, and the others are tree edges.
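A pseudo-tree of this kind can be built by a DFS traversal of the constraint graph; the sketch below (function names are our own, not from the paper) labels DFS-tree edges as tree edges and every remaining constraint edge as a back edge, which by the DFS property always connects a node to one of its ancestors.

```python
def build_pseudo_tree(edges, root):
    """Build a pseudo-tree from an undirected constraint graph by DFS:
    DFS-tree edges become tree edges; each leftover constraint edge is a
    back edge linking a node to a (non-parent) ancestor."""
    neighbors = {}
    for i, j in edges:
        neighbors.setdefault(i, set()).add(j)
        neighbors.setdefault(j, set()).add(i)
    parent = {root: None}

    def dfs(node):
        for nxt in sorted(neighbors[node]):   # fixed order for determinism
            if nxt not in parent:
                parent[nxt] = node
                dfs(nxt)

    dfs(root)
    tree_edges = {tuple(sorted((parent[n], n)))
                  for n in parent if parent[n] is not None}
    back_edges = {tuple(sorted(e)) for e in edges} - tree_edges
    return parent, tree_edges, back_edges
```

On Example 2's graph, `build_pseudo_tree([(1, 2), (2, 3), (1, 3)], root=1)` yields tree edges {(1, 2), (2, 3)} and the back edge (1, 3), matching Figure 1 (right).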

2.2 p-optimal algorithm

The p-optimal algorithm [12] is an approximate DCOP algorithm that can provide guarantees on the quality of the solutions. This algorithm is based on a pseudo-tree. It is a one-shot type algorithm, which runs in polynomial time in the number of agents n, assuming p is fixed. In the p-optimal algorithm, agents can adjust the parameter p so that they can trade off better solution quality against computational overhead.

The basic idea of this algorithm is that we remove several edges from a constraint graph so that the induced width [2] of the remaining graph is bounded. Then, we compute the optimal solution (p-optimal solution) of the remaining graph, which is used as the approximate solution of the original graph. Induced width can be used as a measure for checking how close a given graph is to a tree. For example, if the induced width of a graph is one, it is a tree. Also, the induced width of a complete graph with n variables is n − 1.

Fig. 2. Example for the p=1-optimal algorithm.

Definition 1 (Width of pseudo-tree). For a pseudo-tree and a node x_i, we call the number of x_i's ancestors that x_i is connected with (via its parent edge or back edges) the width of x_i.

Definition 2 (Induced width of pseudo-tree). For a pseudo-tree, we call the maximal width over all nodes the induced width of the pseudo-tree.

The p-optimal algorithm has the following two phases:

Phase 1: Generate a subgraph by removing several edges, so that the induced width of the remaining graph is bounded by the parameter p.

Phase 2: Find an optimal solution to the graph obtained in Phase 1 using a pseudo-tree based complete DCOP algorithm.

In Phase 1, this algorithm simplifies a problem/pseudo-tree instance by removing some edges so that (i) it can solve the simplified problem efficiently, and (ii) it can bound the difference between the solution of the simplified problem and an optimal solution.

Let us describe how we remove the edges from the original pseudo-tree. For nodes i and j, we say an edge (i, j) is a back-edge if j is i's ancestor (but not i's parent). Also, when (i, j_1), (i, j_2), ..., (i, j_k) are all back-edges of i, and j_1 ≺ j_2 ≺ ... ≺ j_k holds, where j_i ≺ j_{i+1} means that j_i appears before j_{i+1} in the ordering, we call (i, j_1), (i, j_2), ..., (i, j_k) the first back-edge, second back-edge, ..., k-th back-edge, respectively. Clearly, a node has at most w* − 1 back-edges, where w* is the induced width of the pseudo-tree. For obtaining a pseudo-tree whose induced width is p, each agent i simply removes its first back-edge, second back-edge, ..., (w* − p)-th back-edge. Intuitively, we remove back-edges starting from the outermost ones of the original pseudo-tree.

In Phase 2, any complete DCOP algorithm can be utilized to find a p-optimal solution. The p-optimal algorithm uses the obtained p-optimal solution as an approximate solution of the original graph. In particular, since we have already obtained a pseudo-tree whose induced width is bounded, utilizing pseudo-tree based DCOP algorithms is convenient. In this paper, we use the representative pseudo-tree based inference algorithm DPOP [13] for computing a p-optimal solution in Phase 2.

Example 3 (p=1-optimal algorithm). Figure 2 (i) shows a DCOP with three variables. The induced width of (i) is two. Assume that we want to have a p=1-optimal solution. (ii) shows the remaining graph, which is obtained by removing the back edge between x_1 and x_3 from (i). The p=1-optimal solutions of (ii) are {(x_1, a), (x_2, b), (x_3, a)} and {(x_1, b), (x_2, a), (x_3, b)}. The approximate values of (i), i.e., the sums of the rewards obtained by the p=1-optimal solutions, are ten and nine.
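Phase 1 of the edge removal can be sketched as follows, under our reading of the text: each node orders its back edges by the ancestor's position in the total ordering (the first back edge reaches the ancestor closest to the root) and drops its first (w* − p) of them, leaving at most p − 1 back edges per node. The function and parameter names are ours, not the paper's.

```python
def phase1_remove_back_edges(back_edges_by_node, position, w_star, p):
    """Sketch of Phase 1 of the p-optimal algorithm: per node, drop the
    first (w* - p) back edges (those reaching furthest toward the root),
    so the induced width of the remaining pseudo-tree is at most p.

    back_edges_by_node: {node: [ancestor, ...]} back-edge endpoints
    position: {node: index in the total ordering} (root has index 0)
    """
    kept, removed = {}, {}
    k = max(0, w_star - p)
    for node, ancestors in back_edges_by_node.items():
        # first back edge = ancestor appearing earliest in the ordering
        ordered = sorted(ancestors, key=lambda a: position[a])
        removed[node] = ordered[:k]
        kept[node] = ordered[k:]
    return kept, removed
```

For Example 3's graph (w* = 2, p = 1), node x_3's single back edge to x_1 is removed and none are kept, which reproduces the remaining chain of Figure 2 (ii).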

2.3 ADOPT

The Asynchronous Distributed OPTimization algorithm (ADOPT) [11] is one of the representative DCOP search algorithms. ADOPT utilizes a pseudo-tree and finds an optimal solution employing a best-first search strategy.

We briefly describe the execution of ADOPT.

1. Each node evaluates the cost of the current solution and the cost allocation. The node selects the value of its variable according to the evaluation. The value is notified to the descendants which are related to it by constraints (VALUE message).

2. Each node notifies the cost of the current solution to its parent (COST message).

3. Each node decides the cost allocation between itself and its children. The cost allocation is notified to the children (THRESHOLD message).

After repeating the above process, the lower and upper bounds of the evaluated cost become equal in the root node. The root node then selects the optimal value of its variable and notifies termination to its children. The children search for the optimal values of their variables and terminate. Finally, all nodes terminate and their allocated values are the optimal solution. The number of messages sent between agents can be exponential in the number of agents and is given by O(|D_max|^n), where |D_max| is the maximal domain size and n is the number of agents.

2.4 BnB-ADOPT

The Branch-and-Bound ADOPT (BnB-ADOPT) [17] is the state-of-the-art DCOP search algorithm. This algorithm utilizes a pseudo-tree and finds an optimal solution using a depth-first branch-and-bound search strategy. BnB-ADOPT was introduced as a search algorithm for minimization problems². In this paper, we modify BnB-ADOPT for solving a maximization problem.

BnB-ADOPT is quite similar to ADOPT. This algorithm shares most of the data structures and messages of ADOPT. The main difference is their search strategies: ADOPT employs a best-first search strategy while BnB-ADOPT utilizes a depth-first branch-and-bound search strategy. Also, it has been shown [17] that BnB-ADOPT outperforms ADOPT. The worst-case complexity of BnB-ADOPT is the same as that of ADOPT, which is given by O(|D_max|^n).
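To make the depth-first branch-and-bound strategy concrete, here is a centralized sketch of it for the maximization setting (BnB-ADOPT distributes this search across agents; the code and its toy instance are illustrative assumptions, not the paper's implementation). A branch is pruned when its optimistic bound, the partial reward plus r_max for every undecided constraint, cannot beat the incumbent.

```python
def branch_and_bound(domains, rewards):
    """Centralized depth-first branch-and-bound (maximization sketch):
    extend a partial assignment variable by variable; prune when the
    optimistic bound cannot exceed the best complete assignment so far."""
    r_max = max(v for table in rewards.values() for v in table.values())
    variables = sorted(domains)
    best = {"value": float("-inf"), "assignment": None}

    def partial_reward(assign):
        return sum(table[(assign[i], assign[j])]
                   for (i, j), table in rewards.items()
                   if i in assign and j in assign)

    def undecided(assign):
        return sum(1 for (i, j) in rewards
                   if i not in assign or j not in assign)

    def dfs(assign, rest):
        if not rest:
            value = partial_reward(assign)
            if value > best["value"]:
                best["value"], best["assignment"] = value, dict(assign)
            return
        # prune: even r_max on every undecided constraint cannot help
        if partial_reward(assign) + undecided(assign) * r_max <= best["value"]:
            return
        var, tail = rest[0], rest[1:]
        for d in domains[var]:
            assign[var] = d
            dfs(assign, tail)
            del assign[var]

    dfs({}, variables)
    return best["value"], best["assignment"]

# Hypothetical toy instance (chosen to match Example 1's optimum of twelve).
domains = {1: ['a', 'b'], 2: ['a', 'b'], 3: ['a', 'b']}
rewards = {
    (1, 2): {('a', 'a'): 1, ('a', 'b'): 5, ('b', 'a'): 6, ('b', 'b'): 2},
    (2, 3): {('a', 'a'): 4, ('a', 'b'): 0, ('b', 'a'): 3, ('b', 'b'): 1},
    (1, 3): {('a', 'a'): 2, ('a', 'b'): 1, ('b', 'a'): 2, ('b', 'b'): 0},
}
value, assignment = branch_and_bound(domains, rewards)
```

The better (i.e., higher) the initial lower bound and the tighter the upper bounds, the more the pruning test fires, which is exactly the leverage the hybrid algorithm of Section 3 gains from the p-optimal preprocessing.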

3 Pseudo-Tree Based Hybrid Algorithm

In this section, we develop a novel pseudo-tree based hybrid algorithm called BnB-ADOPT_p, which utilizes the p-optimal algorithm and BnB-ADOPT. The basic idea of this hybrid algorithm is that we use the p-optimal algorithm in preprocessing and generate lower and upper bounds for each agent. The bounds are then used to guide the search thresholds in BnB-ADOPT. In this algorithm, we assume that each agent knows the pseudo-tree of the constraint graph and r_max, i.e., the maximal value of each reward function.

This hybrid algorithm has the following two phases:

Phase 1: Find a p-optimal solution of the simplified problem obtained by removing several constraints from the problem.

Phase 2: Find an optimal solution to the original problem using the information obtained in Phase 1.

In Phase 1, we compute a p-optimal solution of the sub-tree obtained by removing several edges from a pseudo-tree, i.e., we compute an optimal solution of the remaining graph and use it as the lower bound of the original graph. Then, each node/agent i has the following information:

– A*_{i,p}: p-optimal solution of the sub-tree rooted at i.
– R(A*_{i,p}): reward obtained by A*_{i,p}.
– d*_{i,p}: value of x_i in A*_{i,p}.
– m_i: the number of back edges removed from the sub-tree rooted at i in Phase 1.

In Phase 2, we utilize the information obtained in Phase 1, i.e., A*_{i,p}, R(A*_{i,p}), d*_{i,p} and m_i, and find an optimal solution to the original problem. More specifically, we use the following information as initial values in BnB-ADOPT:

– d = d*_{i,p}: the value assignment of x_i is d.
– TH_i = R(A*_{i,p}): the threshold TH_i is given by R(A*_{i,p}).
– ub_i = R(A*_{i,p}) + (m_i × r_max): the upper bound of x_i is given by the sum of R(A*_{i,p}) and (m_i × r_max).

² Most search algorithms, e.g., ADOPT and BnB-ADOPT, have been developed for solving minimization problems.


We use the following notation. For each node i:

– C_i: the set of children of i in a pseudo-tree.
– CD_i: the set of descendants of i (including its children) that i is involved in edges with.
– p_i: the parent of i.
– A_i: the set of ancestors of i (including its parent).
– SCA_i: the set of ancestors of i (including its parent) that i or one of its descendants is involved in edges with.
– CA_i: the set of ancestors of i (including its parent) that i is involved in edges with.

The pseudo-code of this algorithm is quite similar to that of BnB-ADOPT. The significant differences are that this algorithm solves a maximization problem while BnB-ADOPT solves a minimization problem³, and that this algorithm uses the detailed information from the p-optimal solution obtained in Phase 1. More specifically, each agent has a detailed upper bound as an initial value (line 16), while each agent in BnB-ADOPT has the upper bound ∞.

Each agent i maintains a threshold TH_i (= R(A*_{i,p})), which is initialized to the sum of the rewards of the sub-tree rooted at i (line 5). The threshold of the root agent is the p-optimal value, which is used for pruning during the depth-first search. Each agent i uses the condition max{TH_i, LB_i} to determine whether it should change its own value. If UB_i(d_i) ≤ max{TH_i, LB_i} holds, then it takes on the new value d_i = argmax_{d ∈ D_i} {UB_i(d)} (lines 25 and 26). When agent i sends a VALUE message to its child i' ∈ C_i, the message includes the threshold max{TH_i, UB_i} − δ_i(d_i) − Σ_{i'' ∈ C_i, i'' ≠ i'} ub_{i,i''}(d_i) for the child (line 31). This threshold is chosen such that UB_i(d_i) for the agent reaches max{TH_i, LB_i}, and the agent thus takes on a new value when LB_{i'} for the child reaches this threshold. Agent i changes its context when it receives a VALUE message with a context which is different from its current context. The threshold is also changed to the new threshold in the VALUE message. Agent i initializes lb_{i,i'}(d) = h_{i,i'}(d) and ub_{i,i'}(d) = R(A*_{i,p}) + (m_i × r_max) (lines 15 and 16), as well as LB_i(d), UB_i(d), LB_i and UB_i for all values d ∈ D_i and children i' ∈ C_i (lines 20-24). Agent i takes on the new value d_i = argmax_{d ∈ D_i} {UB_i(d)} (line 26) and repeats the process until execution terminates.

Pseudo-code of BnB-ADOPT_p

procedure Start()
[01] Find a p-optimal solution;
[02] d := d*_{i,p};
[03] X_i := {(i', d*_{i',p}, 0) | i' ∈ SCA_i};
[04] ID_i := 1;
[05] TH_i := R(A*_{i,p});
[06] forall i' ∈ C_i, d ∈ D_i
[07]   InitChild(i', d);
[08] Backtrack();
[09] loop forever
[10]   if (message queue is not empty)
[11]     while (message queue is not empty)
[12]       pop msg off message queue;
[13]       When Received(msg);
[14]     Backtrack();

procedure InitChild(i', d)
[15] lb_{i,i'}(d) := h_{i,i'}(d);
[16] ub_{i,i'}(d) := R(A*_{i,p}) + (m_i × r_max);

procedure InitSelf()
[17] d_i := argmax_{d ∈ D_i} {δ_i(d) + Σ_{i' ∈ C_i} lb_{i,i'}(d)};
[18] ID_i := ID_i + 1;
[19] TH_i := R(A*_{i,p});

procedure Backtrack()
[20] forall d ∈ D_i
[21]   LB_i(d) := δ_i(d) + Σ_{i' ∈ C_i} lb_{i,i'}(d);
[22]   UB_i(d) := δ_i(d) + Σ_{i' ∈ C_i} ub_{i,i'}(d);
[23] LB_i := max_{d ∈ D_i} {LB_i(d)};
[24] UB_i := max_{d ∈ D_i} {UB_i(d)};
[25] if (UB_i(d_i) ≤ max{TH_i, LB_i})
[26]   d_i := argmax_{d ∈ D_i} {UB_i(d)};
[27]   ID_i := ID_i + 1;
[28] if ((i is root and UB_i = LB_i) or termination received)
[29]   Send(TERMINATE) to each i' ∈ C_i;
[30]   terminate execution;
[31] Send(VALUE, i, d_i, ID_i, max{TH_i, LB_i} − δ_i(d_i) − Σ_{i'' ∈ C_i, i'' ≠ i'} ub_{i,i''}(d_i)) to each i' ∈ C_i;
[32] Send(VALUE, i, d_i, ID_i, ∞) to each i' ∈ CD_i \ C_i;
[33] Send(COST, i, X_i, LB_i, UB_i) to p_i if i is not root;

procedure When Received(VALUE, p, d_p, ID_p, TH_p)
[34] X' := X_i;
[35] PriorityMerge((p, d_p, ID_p), X_i);
[36] if (!Compatible(X', X_i))
[37]   forall i' ∈ C_i, d ∈ D_i
[38]     if (p ∈ SCA_{i'})
[39]       InitChild(i', d);
[40]   InitSelf();
[41] if (p = p_i)
[42]   TH_i := TH_p;

procedure When Received(COST, c, X_c, LB_c, UB_c)
[43] X' := X_i;
[44] PriorityMerge(X_c, X_i);
[45] if (!Compatible(X', X_i))
[46]   forall i' ∈ C_i, d ∈ D_i
[47]     if (!Compatible({(i'', d'', ID'') ∈ X' | i'' ∈ SCA_{i'}}, X_i))
[48]       InitChild(i', d);
[49] if (Compatible(X_c, X_i))
[50]   lb_{i,c}(d) := max{lb_{i,c}(d), LB_c} for the unique (i', d, ID) ∈ X_c with i' = i;
[51]   ub_{i,c}(d) := min{ub_{i,c}(d), UB_c} for the unique (i', d, ID) ∈ X_c with i' = i;
[52] if (!Compatible(X', X_i))
[53]   InitSelf();

procedure When Received(TERMINATE)
[54] record termination message received;

³ It is possible to solve a minimization problem with our algorithm. In this paper, since the p-optimal algorithm solves maximization problems, we modified BnB-ADOPT to solve maximization problems. Otherwise, we would use the original BnB-ADOPT and modify the p-optimal algorithm for minimization problems.

4 Experimental Evaluation

In this section, we evaluate our hybrid algorithm and compare it with the DCOP search algorithms BnB-ADOPT-DP2 and BnB-ADOPT. Since BnB-ADOPT outperforms ADOPT, we implement BnB-ADOPT-DP2 and use it instead of ADOPT-DP2 [1] for our comparison. Note that we solve maximization problems. In our evaluations, we use the following problem instances. The domain size of each variable is three, and we choose the reward values uniformly at random from the range [0, 1000]. Each data point in a graph represents an average over 50 problem instances⁴. We generate random graphs with a fixed induced width. For the comparison, we mostly use the setting p=1. For a minimization problem with positive costs, one knows the optimistic bound, which is zero. On the other hand, there is no corresponding value when we solve a maximization problem. Since we know that r_max, i.e., the maximal value of each reward function, is equal to 1000, we set (r_max × m) as the upper bound of BnB-ADOPT instead of ∞⁵, where m is the number of all constraints of the graph. We implemented our hybrid algorithm in Java and carried out all experiments on a 2.53 GHz core with 4 GB of RAM.

Let us explain how we measure the performance of the algorithms in our comparison. We use Non-Concurrent Constraint Checks (NCCCs) [10]. NCCCs are a weighted sum of processing and communication time. Every agent holds a counter of computation steps. Every message carries the value of the sending agent's counter. When an agent receives a message, it stores the received data together with the corresponding counter. When the agent first uses the received counter, it updates its own counter to the larger of its own counter and the stored counter value which was carried by the message. By reporting the cost of the search as the largest counter held by some agent at the end of the search, a measure of non-concurrent search effort that is close to Lamport's logical time is achieved. If, instead of steps of computation, the number of NCCCs is counted, then the local computational effort of agents in each step is measured.

⁴ The results include Phases 1 and 2.
⁵ When we set the upper bound to ∞ in maximization problems, BnB-ADOPT is a brute-force approach.

Fig. 3. Results for graphs with 10 nodes, induced width 5, varying the parameter p from 0 to 5. When p=0, the result shows the average number of NCCCs in BnB-ADOPT.
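The counter bookkeeping behind the NCCC measure can be sketched as follows (class and method names are our own illustration, not from [10]): each agent counts local constraint checks, stamps outgoing messages with its counter, and on receipt raises its counter to the larger of the two.

```python
class NCCCCounter:
    """Lamport-style step counter for the NCCC measure (sketch): messages
    carry the sender's counter; the receiver jumps to max(own, carried)."""
    def __init__(self):
        self.steps = 0

    def constraint_checks(self, n=1):
        self.steps += n                # local computational effort

    def attach(self, payload):
        return (payload, self.steps)   # outgoing message carries counter

    def absorb(self, message):
        payload, carried = message
        self.steps = max(self.steps, carried)
        return payload

def nccc_cost(agents):
    """Reported search cost: the largest counter held by any agent."""
    return max(a.steps for a in agents)
```

For example, if agent a performs 5 checks and sends to agent b (which had performed 2), b's counter jumps to 5; after 3 more local checks by b, the non-concurrent cost of the run is 8, even though 10 checks were performed in total.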

First, we show that our hybrid algorithm can find an optimal solution quickly when the parameter p increases. Figure 3 shows the performance of BnB-ADOPT_p for graphs with 10 nodes, induced width 5, varying the parameter p from 0 to 5. When the parameter p is zero, the result shows the average number of NCCCs in BnB-ADOPT. In the case that the parameter p is five, our algorithm uses the information for optimal solutions in Phase 2. We can see that the average number of NCCCs in our hybrid algorithm becomes smaller when the parameter p increases. The average number of NCCCs is 33887 in BnB-ADOPT and 4175 in BnB-ADOPT_{p=5}. This is because the number of removed edges is small, i.e., the p-optimal algorithm can provide more detailed information on lower and upper bounds for BnB-ADOPT (Phase 1). In our hybrid algorithm, we can adjust the parameter p so that we can trade off smaller run time against memory overhead. If the relaxed problem is not so different from the original problem, i.e., the induced width is small, our algorithm can find an optimal solution quickly.

Next, we show that our hybrid algorithm outperforms a state-of-the-art search algorithm. For the comparison, we use BnB-ADOPT-DP2 and BnB-ADOPT. Figure 4 represents the average number of NCCCs in BnB-ADOPT_{p=1}, BnB-ADOPT-DP2 and BnB-ADOPT for graphs with induced width 3, density 0.3, varying the number of nodes. BnB-ADOPT_{p=1} utilizes the information of the p=1-optimal solution, i.e., the optimal solution of the tree obtained by removing several edges from a pseudo-tree. We can see that BnB-ADOPT_{p=1} outperforms BnB-ADOPT-DP2 and BnB-ADOPT. When the number of nodes is 19, the average number of NCCCs is 28063 in BnB-ADOPT_{p=1}, 33567 in BnB-ADOPT-DP2, and 51306 in BnB-ADOPT. BnB-ADOPT_{p=1} performs approximately 16% better than BnB-ADOPT-DP2 and 45% better than BnB-ADOPT. Also, in the case that the number of nodes is 11, the average number of NCCCs is 344 in BnB-ADOPT_{p=1}, 412 in BnB-ADOPT-DP2, and 640 in BnB-ADOPT. BnB-ADOPT_{p=1} performs approximately 16% better than BnB-ADOPT-DP2 and 46% better than BnB-ADOPT. We obtained similar results varying the density and the induced width.

Fig. 4. Comparison of BnB-ADOPT_{p=1}, BnB-ADOPT-DP2 and BnB-ADOPT for graphs with induced width 3, density 0.3, varying the number of nodes.

Furthermore, we verify our hypothesis experimentally, i.e., that a pseudo-tree based approximate algorithm works well with a pseudo-tree based search algorithm. Figure 5 shows the average number of NCCCs in BnB-ADOPTp=1 for pseudo-trees of depth nine, with 15 nodes and induced width 3, varying the depth from 0 to 9. When the depth is zero, BnB-ADOPTp=1 does not use the information obtained by Phase 1, i.e., it behaves like BnB-ADOPT. When the depth is one, we give the information obtained by Phase 1 only to the root agent of the pseudo-tree; that is, BnB-ADOPTp=1 utilizes only the global lower and upper bounds over all constraints. When the depth is nine, the result shows the average number of NCCCs in BnB-ADOPTp=1 where every agent uses the information obtained by Phase 1. We can see that the average number of NCCCs becomes smaller as the depth increases. This is because the p-optimal algorithm can provide the information on lower and upper bounds to each agent of the pseudo-tree, which is used in BnB-ADOPT (Phase 2); that is, the pseudo-tree based p-optimal algorithm is well-suited to the pseudo-tree based search algorithm BnB-ADOPT. Also, when the depth is zero, the average number of NCCCs in BnB-ADOPTp=1 (BnB-ADOPT) is smaller than in BnB-ADOPTp=1 with depth one. This is because the run time of preprocessing is not counted, since the algorithm behaves like BnB-ADOPT when the depth is zero.

Fig. 5. Results for pseudo-trees whose depth is nine, with 15 nodes, induced width 3. When the depth is zero, the result shows the average number of NCCCs in BnB-ADOPT.

Moreover, the average number of NCCCs is almost the same when the depth is less than six, but it drops sharply at the point where the depth is six. We examined the number of agents that used the information at this critical (tipping) point; it was 9 of 15 agents, i.e., 60% of all agents. We obtained similar results with different values of the parameter p. For BnB-ADOPTp=2, the critical point appears where the depth is five. Our future work includes a more detailed analysis of this critical point. We will examine the number of agents that use the information at the critical point under different parameter settings, e.g., the number of agents, density, and induced width, and also in different graph structures such as scale-free and small-world graphs.

In summary, these experimental results reveal that (i) our hybrid algorithm can find an optimal solution quickly as the parameter p increases, (ii) BnB-ADOPTp=1 outperforms BnB-ADOPT-DP2 and BnB-ADOPT, and (iii) the pseudo-tree based p-optimal algorithm works well with the pseudo-tree based BnB-ADOPT.

Let us describe why BnB-ADOPTp=1 outperforms BnB-ADOPT-DP2. Our algorithm can utilize the detailed information on the lower and upper bounds obtained by the p-optimal algorithm for each agent, which can be used for pruning the search space in BnB-ADOPT. In contrast, BnB-ADOPT-DP2 can utilize only the global upper bound, i.e., only the root node has the information. Our future work includes a more detailed analysis of pruning, e.g., examining the frequency of pruning and at which nodes pruning occurs.
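To make the pruning argument concrete, the following is a minimal, hypothetical branch-and-bound sketch (not the paper's algorithm, and over a toy problem with unary costs only). The precomputed `lb_rest` bound stands in for the per-agent lower bounds produced by preprocessing; with `use_bounds=False` only the incumbent cost prunes, mirroring a weaker, global-only bound and expanding more nodes.

```python
# Toy cost table: COSTS[i][v] is the cost of assigning value v (0 or 1)
# to variable i. All names here are illustrative, not from the paper.
COSTS = [[1, 3], [2, 5], [4, 2]]

def lb_rest(i):
    """Precomputed lower bound on the total cost of variables i..end."""
    return sum(min(row) for row in COSTS[i:])

def bnb(i=0, cost=0, ub=float("inf"), use_bounds=True, stats=None):
    """Depth-first branch and bound; returns the optimal total cost.
    If stats is a dict, counts expanded nodes under stats['nodes']."""
    if stats is not None:
        stats["nodes"] = stats.get("nodes", 0) + 1
    if i == len(COSTS):
        return cost
    best = ub
    for v in (0, 1):
        new_cost = cost + COSTS[i][v]
        bound = new_cost + (lb_rest(i + 1) if use_bounds else 0)
        if bound >= best:      # prune: this branch cannot beat the incumbent
            continue
        best = min(best, bnb(i + 1, new_cost, best, use_bounds, stats))
    return best
```

Running both variants on the same instance finds the same optimum, but the bounded variant expands fewer nodes, which is the effect the per-agent Phase-1 bounds have on BnB-ADOPT's search.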


From the verification results of our hypothesis, we expect that our hybrid algorithm outperforms other hybrid algorithms that combine BnB-ADOPT with approximate algorithms that are not pseudo-tree based, e.g., DALO [16] and the bounded max-sum algorithm [14]. Since they are not pseudo-tree based algorithms, they can provide only global upper bounds. In contrast, our algorithm can provide lower and upper bounds for each agent.

5 Conclusion

In this paper, we focus on preprocessing and develop a novel pseudo-tree based hybrid algorithm called BnB-ADOPTp, which combines the p-optimal algorithm with the representative, state-of-the-art search algorithm BnB-ADOPT. The algorithm uses the p-optimal algorithm in preprocessing to generate lower and upper bounds for each agent of a pseudo-tree, which can be used for pruning the search space in BnB-ADOPT. In the evaluations, we showed that our hybrid algorithm outperforms BnB-ADOPT-DP2 and BnB-ADOPT. Furthermore, we verified experimentally that the pseudo-tree based p-optimal algorithm is well-suited to the pseudo-tree based BnB-ADOPT.

As future work, we will conduct a detailed analysis of the critical points under different parameter settings and graph structures. Furthermore, we will analyze the frequency of pruning in the search space. ADOPT(k) [4] is a search algorithm that generalizes ADOPT and BnB-ADOPT; it behaves like a hybrid of ADOPT and BnB-ADOPT when 1 < k < ∞. A comparison with ADOPT(k) is also future work. In addition, we will compare our algorithm with BnB-ADOPT using other preprocessing techniques [3, 9].

Acknowledgment

This research is supported by a grant for the Systems Resilience project from the Transdisciplinary Research Integration Center in Japan.

References

[1] S. M. Ali, S. Koenig, and M. Tambe. Preprocessing techniques for accelerating the DCOP algorithm ADOPT. In Proceedings of the 4th International Conference on Autonomous Agents and Multiagent Systems, pages 1041–1048, 2005.

[2] R. Dechter. Constraint Processing. Morgan Kaufmann Publishers, 2003.

[3] J. Denzinger and K. Randall. Enhancing tree-based (stochastic) search by learning from previous experience. In Workshop on Stochastic Search Algorithms, pages 37–42, 2003.

[4] P. Gutierrez, P. Meseguer, and W. Yeoh. Generalizing ADOPT and BnB-ADOPT. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 554–559, 2011.

[5] R. Junges and A. L. C. Bazzan. Evaluating the performance of DCOP algorithms in a real world, dynamic problem. In Proceedings of the 7th International Conference on Autonomous Agents and Multiagent Systems, pages 599–606, 2008.

[6] V. Lesser, C. Ortiz, and M. Tambe, editors. Distributed Sensor Networks: A Multiagent Perspective, volume 9. Kluwer Academic Publishers, 2003.

[7] R. T. Maheswaran, M. Tambe, E. Bowring, J. P. Pearce, and P. Varakantham. Taking DCOP to the real world: Efficient complete solutions for distributed multi-event scheduling. In Proceedings of the 3rd International Conference on Autonomous Agents and Multiagent Systems, pages 310–317, 2004.

[8] R. Mailler and V. R. Lesser. Solving distributed constraint optimization problems using cooperative mediation. In Proceedings of the 3rd International Conference on Autonomous Agents and Multiagent Systems, pages 438–445, 2004.

[9] T. Matsui, M.-C. Silaghi, K. Hirayama, M. Yokoo, and H. Matsuo. Directed soft arc consistency in pseudo trees. In Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems, pages 1065–1072, 2009.

[10] A. Meisels, E. Kaplansky, I. Razgon, and R. Zivan. Comparing performance of distributed constraints processing algorithms. In Proceedings of the 13th International Workshop on Distributed Constraint Reasoning, pages 86–93, 2011.

[11] P. Modi, W.-M. Shen, M. Tambe, and M. Yokoo. ADOPT: Asynchronous distributed constraint optimization with quality guarantees. Artificial Intelligence, 161(1–2):149–180, 2005.

[12] T. Okimoto, Y. Joe, A. Iwasaki, M. Yokoo, and B. Faltings. Pseudo-tree-based incomplete algorithm for distributed constraint optimization with quality bounds. In Proceedings of the 17th International Conference on Principles and Practice of Constraint Programming, pages 660–674, 2011.

[13] A. Petcu and B. Faltings. A scalable method for multiagent constraint optimization. In Proceedings of the 19th International Joint Conference on Artificial Intelligence, pages 266–271, 2005.

[14] A. Rogers, A. Farinelli, R. Stranders, and N. Jennings. Bounded approximate decentralised coordination via the max-sum algorithm. Artificial Intelligence, 175(2):730–759, 2011.

[15] T. Schiex, H. Fargier, and G. Verfaillie. Valued constraint satisfaction problems: Hard and easy problems. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 631–639, 1995.

[16] M. Vinyals, E. Shieh, J. Cerquides, J. A. Rodriguez-Aguilar, Z. Yin, M. Tambe, and E. Bowring. Quality guarantees for region optimal DCOP algorithms. In Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems, pages 133–140, 2011.


Combining Fairness and Efficiency in Dynamic Task Allocation with Spatial and Temporal Constraints

Sofia Amador, Steven Okamoto and Roie Zivan,
Department of Industrial Engineering and Management,

Ben-Gurion University of the Negev,
Beer-Sheva, Israel

{amador,okamotos,zivanr}@bgu.ac.il

Abstract. Realistic multiagent team applications often feature distributed dynamic environments with soft deadlines that penalize late execution of tasks. This puts a premium on quickly allocating tasks to agents, but finding the optimal allocation is NP-hard due to temporal and spatial constraints that require tasks to be executed sequentially by agents. We propose a novel task allocation algorithm that allows tasks to be easily sequenced to yield high quality solutions by finding allocations that are fair (envy-free), balancing the load and sharing important tasks between agents, and efficient (Pareto optimal). We compute such allocations in polynomial time using a Fisher market with agents as buyers and tasks as goods, then sequence the allocations by maximizing utility at each step. We empirically compare our algorithm to two state-of-the-art incomplete methods on synthetic problems and on realistic law enforcement problems inspired by real police logs. The results show a clear advantage for our algorithm in measures commonly used by law enforcement authorities.

1 Introduction

Many realistic multiagent team applications feature four properties that make task allocation extremely challenging: task and agent heterogeneity, cooperative task execution, spatial and temporal constraints, and a dynamic environment. In a law enforcement application, police officers conduct routine patrols and respond to reported incidents, each of which has an importance that ranges from low (e.g., noise complaint) to high (e.g., murder) and a workload indicating the amount of work that must be completed for the incident to be processed. Depending on their training and experience, officers may have different capabilities for handling each kind of task, and multiple officers may work together on especially important tasks. Furthermore, delays in arriving at the scene of an incident allow perpetrators to escape and situations to escalate. This is reflected by a soft deadline for each task; the utility an officer receives for performing a task decreases with the time of his arrival. The time it takes an officer to arrive at a task depends not only on the relative locations, but also on the sequence of other tasks that the officer is scheduled to perform. Finally, because new incidents may be reported at any time, it is sometimes necessary for officers to interrupt execution of low-priority tasks in order to respond to new, more important incidents, even though this negatively impacts the performance of the interrupted task. Motivated by this domain, we term the problem of finding an allocation that maximizes team utility under these four conditions a Law Enforcement Problem (LEP), although it also arises in disaster response, multirobot, and military applications.

LEP is naturally modeled as a distributed problem, with the agents making decisions on what tasks to perform based on local observations and information received from other parts of the team. However, current operating procedures dictate that all information be reported to police headquarters, where all allocation decisions are made. As technology is integrated into police equipment (such as squad cars), we expect that organizational policies will adapt to the new capabilities to reap the benefits of distribution, such as more flexibility and responsiveness, and the avoidance of communication and computational bottlenecks.

In this paper we take the first step toward proposing a new distributed task allocation algorithm for this problem by presenting a centralized algorithm for solving the distributed problem. This algorithm serves as a bridge between current police practice and future police practice. It will also serve as a benchmark against which later distributed algorithms can be compared, which will be important for establishing the trade-offs between centralized and distributed approaches. In future work we intend to develop a distributed version of the algorithm presented in this paper, comparing against both state-of-the-art distributed allocation algorithms and our centralized approach.

The LEP is NP-hard even for a single agent, via reduction from the traveling salesman problem, yet dynamism and soft deadlines require that solutions be found quickly, before the environment changes or utility is lost due to agents' late arrivals at tasks. We thus develop a heuristic, FMC_TA, to compute high quality solutions in worst-case polynomial time. Because incidents are reported to police headquarters before officers are dispatched, we focus on a centralized approach.

The key challenge is that an agent's utility depends not just on its allocated tasks, but on the order in which it performs them. Instead of searching this space directly, we allocate to each agent a set of tasks that can be ordered according to a simple heuristic while still maintaining high utility. Our approach borrows two concepts from non-cooperative multiagent systems: envy-freeness and Pareto optimality. An allocation is envy-free if no agent would prefer to have the allocation of another agent to his own (i.e., no agent envies another), and is Pareto optimal if the team utility (also called social welfare) cannot be increased without some agent suffering a decrease in individual utility. We hypothesize that in the LEP, envy-freeness balances the task load among agents to avoid long delays, while Pareto optimality works to allocate tasks to agents with high capability, and thus solutions with both properties will be of high quality.
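For additive utilities, envy-freeness can be checked directly from its definition. The sketch below is illustrative only (the helper name `envy_free` is ours, not the paper's): it compares the value each agent assigns to its own bundle against the value it would assign to every other agent's bundle.

```python
def envy_free(X, R, eps=1e-9):
    """True iff no agent i values another agent j's bundle more than its own.
    X[i][k]: fraction of task k held by agent i.
    R[i][k]: utility agent i derives from performing all of task k
             (utilities are assumed additive over task fractions)."""
    n, m = len(X), len(X[0])

    def bundle_value(i, j):
        # value agent i assigns to the bundle currently held by agent j
        return sum(R[i][k] * X[j][k] for k in range(m))

    return all(bundle_value(i, i) >= bundle_value(i, j) - eps
               for i in range(n) for j in range(n))
```

For example, with preferences R = [[2, 1], [1, 2]], giving each agent its favorite task is envy-free, while swapping the two bundles makes both agents envious.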

A major advantage of this approach is that envy-free, Pareto optimal allocations can be found in polynomial time by using the Fisher market clearing model from economics [1]. In the Fisher model, n buyers, each with a personal allotment of money, wish to divide m divisible goods among themselves. Each buyer has a preference vector over the goods that specifies the amount of utility she derives from a single unit of each good, with fractional allocations awarding corresponding fractional utilities. A market clearing solution is a price for each good so that all money is spent and all goods are bought.
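A market clearing solution can be recognized directly from this definition. The following simplified sketch (our illustration, not the paper's algorithm) assumes unit supply of each good, linear utilities, and a unique bang-per-buck maximizer per buyer, so each buyer's spending is unambiguous; clearing then means the revenue collected on each good equals its price.

```python
def market_clears(prices, budgets, R, eps=1e-9):
    """Check a candidate Fisher market clearing solution.
    prices[j]: price of good j (unit supply of each good).
    budgets[i]: money endowment of buyer i.
    R[i][j]: buyer i's utility per unit of good j (linear utilities).
    Assumes each buyer's best bang-per-buck good is unique at these prices."""
    m = len(prices)
    spend = [0.0] * m
    for budget, prefs in zip(budgets, R):
        # each buyer spends her entire budget on her bang-per-buck maximizer
        best = max(range(m), key=lambda j: prefs[j] / prices[j])
        spend[best] += budget
    # all money is spent by construction; each good sells out
    # exactly when its revenue equals its price
    return all(abs(spend[j] - prices[j]) <= eps for j in range(m))
```

With two symmetric buyers preferring opposite goods, unit prices clear the market, while skewed prices push both buyers onto the same good and leave the other unsold.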


We model agents as buyers and tasks as goods; preferences are the utilities agents gain for performing tasks independently. Buyers are allotted the same amount of money¹ and the market clearing solution and corresponding allocation are computed. The tasks for each agent are then ordered according to a greedy heuristic that constructs a path through the agent's tasks by maximizing utility at each step, taking travel time, workload, and soft deadlines into account. Individual tasks, especially important ones, may be divided between multiple agents who share the workload. The construction of the Fisher market input, the computation of the market clearing solution, and the ordering of the allocation all take polynomial time. This is superior to heuristics that reduce running times in many cases but retain exponential worst-case complexity [10, 6].

Our empirical evaluation includes both random synthetic problems and realistic problems based on real log files provided by the police force in our home city. On both setups, we compared our algorithm with a benchmark incomplete algorithm, simulated annealing [11, 13], and with an enhanced version of the CFLA algorithm used to solve the coalition formation with spatial and temporal constraints problem (CFSTP) [10]. The results indicate that the proposed approach is effective for dynamic task allocation. We also show that on realistic problems, our algorithm surpasses competitors on various measurements that are used by real-world law enforcement authorities to evaluate their performance.

2 Related Work

Market-based approaches for task allocation are typically negotiation- or auction-based methods [14, 2] inspired by the way in which businesses supply or purchase goods and services. In keeping with the business analogy, these approaches are most often used to allocate tasks with hierarchical structures mimicking product supply chains, or used for systems of self-interested agents, although there have also been applications for cooperative agents [5]. A more significant shortcoming for LEP is that solution quality or running time may be very poor; in some cases termination is not even guaranteed [14].

A problem similar to the one we address is the coalition formation with spatial and temporal constraints problem (CFSTP) [9]. In CFSTP, agents form coalitions to jointly work on tasks with spatial constraints and deadlines. While we consider only additive utility functions, CFSTP permits arbitrary utility functions for coalitions; however, these must be provided as input, which leads to an exponential increase in problem size in general. More importantly, CFSTP maximizes the number of tasks completed while LEP maximizes the utility of tasks completed, and CFSTP considers hard deadlines (no value after the deadline) while LEP considers soft deadlines (diminishing value). CFSTP was originally solved using the ad hoc heuristic Coalition Formation with Look-Ahead (CFLA) [10], and later solved with the max-sum DCOP algorithm [9, 6]. LEP is very challenging for the max-sum algorithm because the soft deadlines lead to complete constraint graphs, which max-sum is known to require exponential time to solve [3]. Furthermore, max-sum is a distributed algorithm. We intend to revisit a comparison to max-sum when we complete the design and implementation of the distributed version of our algorithm. Thus we compare our algorithm with a version of CFLA enhanced to handle soft deadlines.

¹ This use of money is purely an internal mechanism of the allocation algorithm.

The preliminary work on market clearing (equilibrium) models is more than a century old; it includes the work of Fisher and Walras from the end of the 19th century and the beginning of the 20th century. Later work by Eisenberg and Gale [4] proposed a convex program for solving the Fisher model. The Arrow-Debreu model, in contrast to Walras's preliminary work, is a market exchange model that guarantees the existence of a solution under some assumptions. More recent work has focused on developing polynomial-time algorithms for solving market clearing models [1]. Our work is, to the best of our knowledge, the first attempt to apply a market clearing approach to realistic dynamic task allocation problems.

An approach similar to ours was used by Google to assign TV advertising slots to advertisers through an ascending combinatorial auction [7]. As in the Fisher model, the proposed auction mechanism increments prices until an equilibrium is reached. The main difference from the Fisher market is that slots are uniquely assigned to advertisers (no sharing of products). Our mechanism allows agents to share tasks, i.e., collaborate in solving problems that require multiple units. In addition, we consider temporal and spatial constraints.

3 Law Enforcement Problem

The Law Enforcement Problem (LEP) is to assign and schedule tasks to agents in order to maximize the team utility in the presence of soft deadlines, task interruption, and dynamic task arrival. We begin by considering the simpler static problem before considering the dynamic version.

3.1 Static Problem

In the static LEP there are n cooperative agents (police units) a1, . . . , an ∈ A and m tasks v1, . . . , vm ∈ V (patrols and reported incidents) situated in a city, with the set of all possible locations denoted by L. The time it takes to travel between two locations is given by the function ρ : L × L → [0, +∞). With slight abuse of notation we write ρ(ai, vj) or ρ(vj, vj′) to denote the travel times between the locations of an agent and a task or between the locations of two tasks, respectively.

The function Cap : A × V → [0, +∞) specifies the intrinsic capability of agents to perform tasks.

There are two kinds of tasks: patrols of neighborhoods and events that require a police response. Each task vj has an importance I(vj) > 0; patrols are generally of less importance than events. Events also have a workload w(vj) specifying how much work (in time units, e.g., securing a crime scene, taking statements, filling out paperwork) must be performed before the task is completed. Patrols have no workload but are ongoing tasks that are never completed. We assume that each task can be performed by a single agent but that multiple agents can also share a single task with additive contributions (e.g., officers interviewing different witnesses); shared events also divide the workload.


An allocation of tasks to agents is denoted by the n × m matrix X, where entry xij is the fraction of task vj that is assigned to ai. Because agents can only perform a single task at a time, the tasks allocated to agent ai must be ordered into a schedule σi = ((vs1, t1, t′1), . . . , (vsMi, tMi, t′Mi)), a sequence of Mi tuples of the form (vsk, tk, t′k), where vsk is the task performed from time tk to t′k. The time spent on each task must equal ai's assigned share of the workload, so that t′k − tk = xisk · w(vsk). Due to spatial constraints (agents can only perform tasks at their current location), we must further require that agents have sufficient time to move between tasks, that is, tk+1 − t′k ≥ ρ(vsk, vsk+1) for 1 ≤ k < Mi.

The utility that an agent derives from working on a task depends on his capability, the task's importance, and the soft deadline function δ(vj, t) : V × [0, +∞) → (0, 1], where t is the time when the agent begins working on vj and δ is monotonically non-increasing for all t ≥ 0. Using the delay until a task is started is informed by real-world law enforcement settings, where conditions tend to worsen (perpetrators escape, confrontations escalate, etc.) until a police officer arrives. Following consultation with law enforcement officers, we use a soft deadline function that decays exponentially with time for events:

δ(vj, t) = β · γ^t

where β ∈ [0, 1] and γ ≥ 0 are constants. There is no deadline for a patrol vj, so δ(vj, t) = 1.
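The two schedule constraints (each task's duration equals the agent's assigned share of the workload, and consecutive tasks leave enough time to travel between them) can be checked mechanically. The sketch below uses hypothetical data structures of our own choosing, not the paper's notation:

```python
def feasible(schedule, share, workload, travel, eps=1e-9):
    """Check a single agent's schedule against the LEP constraints.
    schedule: list of (task, start, end) tuples in execution order.
    share[v]: the agent's fraction x_iv of task v.
    workload[v]: total workload w(v) of task v.
    travel[(u, v)]: travel time rho(u, v) between the tasks' locations."""
    for k, (v, t, t2) in enumerate(schedule):
        # duration must equal the assigned share of the workload
        if abs((t2 - t) - share[v] * workload[v]) > eps:
            return False
        # there must be enough time to travel to the next task
        if k + 1 < len(schedule):
            nv, nt, _ = schedule[k + 1]
            if nt - t2 + eps < travel[(v, nv)]:
                return False
    return True
```

For instance, a schedule that leaves a half-time-unit gap before a task one time unit away is rejected, while the same schedule with a full-unit gap is accepted.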

When an agent is assigned a fraction of a task, it receives a corresponding fraction of the utility of the task. Thus the total utility that an agent derives from its allocation is

U(ai) = Σ_{l=1..Mi} xil · Cap(ai, vl) · I(vl) · δ(vl, tl)

and the total team utility is the sum of the utilities of all agents:

U(A) = Σ_{ai ∈ A} U(ai).
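These formulas can be evaluated directly. In the sketch below the helper names and the constants β, γ are illustrative choices of ours, not values from the paper:

```python
def delta(t, beta=0.9, gamma=0.95):
    """Soft-deadline discount for an event: delta(v, t) = beta * gamma**t."""
    return beta * gamma ** t

def agent_utility(shares, caps, imps, starts):
    """U(a_i) = sum over l of x_il * Cap(a_i, v_l) * I(v_l) * delta(v_l, t_l).
    Inputs are parallel lists over the tasks in agent i's schedule."""
    return sum(x * c * i * delta(t)
               for x, c, i, t in zip(shares, caps, imps, starts))

def team_utility(agents):
    """U(A) = sum over agents of U(a_i); agents is a list of argument tuples."""
    return sum(agent_utility(*a) for a in agents)
```

A single agent performing all of one task (x = 1) with capability 2 and importance 5, starting at t = 0, earns 2 · 5 · β = 9.0 under these constants; the team utility is just the sum over agents.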

3.2 Dynamic Problem

In the dynamic problem, tasks arise over time as incidents are reported to the police or new patrol routes are added. We denote the arrival time of a task v by α(v). Only tasks that have arrived can be assigned to agents. We modify the soft deadline to decrease the value of tasks starting from their arrival time, so that δ(vj, t) = β · γ^(t−α(vj)) for events (it is unchanged for patrols).

Because the arrival of new tasks is unpredictable, the dynamic problem is represented as a sequence of static problems instantiated when each new task arrives. At the time a new task arrives, agents may be busy performing an existing task, called the current task and denoted CTi for ai. Agents can interrupt the performance of their current task. For example, if an officer is processing a (low importance) loitering complaint when a (high importance) murder report is received, he may be ordered to stop what he is doing and attend to the murder. However, doing so incurs a penalty, which depends on the task vj and the amount of work ∆w that has already been performed when the task is interrupted. We denote this by the penalty function π(vj, ∆w). After consultation with law enforcement officials, we assume that the penalty decreases exponentially with the amount of work remaining, down to a minimum value that is always incurred for interrupting a task. This reflects real-world law enforcement events, where the first few minutes commonly resolve the main issues and the remaining time is mostly formalities, which are more easily interrupted. For an event vj this penalty function has the form

π(vj, ∆w) = max{I(vj) · c^∆w, φ · I(vj)},

where c ∈ [0, 1) and φ > 0 are constants, with φ · I(vj) denoting the minimum penalty. There is no penalty for interrupting a patrol, so π(vj, ∆w) = 0 in that case.

Fig. 1. Example 1. (Figure: a 4 × 4 city map showing police units ai, current events vj with event types [k], and a new event v6 of type 2.)
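The penalty formula is straightforward to evaluate; in the sketch below the constants c and φ are illustrative values of ours, not taken from the paper:

```python
def interruption_penalty(importance, work_done, c=0.5, phi=0.1):
    """pi(v, dw) = max(I(v) * c**dw, phi * I(v)) for an event.
    Interrupting a patrol costs 0 (handled by the caller)."""
    return max(importance * c ** work_done, phi * importance)
```

With I(v) = 10, c = 0.5, and φ = 0.1, interrupting after 2 units of work costs max(2.5, 1.0) = 2.5, while interrupting near the end of a long task bottoms out at the minimum penalty φ · I(v) = 1.0.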

Letting U′(ai) = Σ_{l=1..Mi} xil · Cap(ai, vl) · I(vl) · δ(vl, tl), the utility of ai's allocation in the dynamic problem is thus

U(ai) = U′(ai) − π(CTi, ∆w) if CTi ≠ vs1 (i.e., the current task is interrupted), and U(ai) = U′(ai) otherwise.

3.3 Examples

In this subsection we present two examples of the LEP. There are four types of events of decreasing importance, with type 1 (e.g., murder) being the most important and type 4 (e.g., noise complaint) the least important.

Figure 1 depicts a small city of size 4 × 4 (distance units, e.g., kilometers) divided into 4 square 2 × 2 areas, v1, . . . , v4 (neighborhoods), and 4 police units, a1, . . . , a4. There are tasks to patrol each neighborhood. In the example, 3 of the 4 units are on patrol (CT1 = v1, CT2 = v2, CT4 = v4) and the fourth unit is handling an event (CT3 = v5), depicted by the unit placed on a square. At this time a new event, v6 (the star), is reported. The new event is in Neighborhood 3, and the closest unit is a3. However, a3 is currently handling a type 2 event. Another option is to send one of the units that are on patrol. Units a1 and a4 are located at distances d16 = 1.58 and d46 = 1.63, respectively, from l6. Thus, we prefer to allow a3 to complete the handling of v5 and assign unit a1 to the new event v6.

Fig. 2. Example 2. (Figure: the same city map, now with all units handling events and a new type 1 event v9.)

A second example is depicted in Figure 2. Here, all 4 units are handling events: CT1 = v6, CT2 = v8, CT3 = v5, CT4 = v7. This time the new event v9 is of type 1. This event has high importance, and if we had free units we would probably consider assigning two units to handle it. However, all units are busy and we need to compromise. The closest unit is a2, at distance d29 = 1.37, but unit a2 is handling a type 2 event. The second closest unit is a4, at distance d49 = 1.71, currently handling a type 4 event. In this situation we have to decide whether to immediately dispatch one or more units to the new task, or to wait until one of them has completed its current task. This depends in part on the penalty incurred by interrupting each task.

4 FMC-Based Task Allocation

In this section we propose an innovative task allocation algorithm, FMC_TA, that is based on Fisher market clearing (FMC) and designed to solve law enforcement problems (LEPs) in polynomial time. The stages of the algorithm are:

1. Generate a Fisher market instance.
2. Find a Fisher market clearing solution and acquire an allocation of tasks to agents.
3. Order the tasks allocated to each agent.

A standard Fisher market includes n buyers, each endowed with an amount of money, and m goods. An n × m utility matrix R represents the preferences of buyers over goods. A market clearing solution consists of a price vector p specifying a price pj for each good j that allows each buyer i to spend all her money on goods that maximize bang-per-buck (rij/pj) while all goods in the market are sold. An FMC allocation is an n × m matrix X in which each entry 0 ≤ xij ≤ 1 determines the fraction of good j allocated to buyer i. It is determined according to the fractions of the prices of the goods paid by the agents in the market clearing solution. An FMC allocation is Pareto optimal and, when monetary endowments are equal, envy-free as well [12].

A natural way to represent an LEP as a Fisher market is to represent agents andtasks as buyers and goods, respectively, and to endow each agent with an equal amount

22

Page 27: New Mexico State Universitywyeoh/DCR2013/docs/dcr... · Preface The Distributed Constraint Reasoning (DCR) workshop, now in its fourteenth edition, continues a long sequence of meetings

of money. The main difficulty is to construct the matrixR such that the resulting alloca-tion will achieve our goal. We note thatR can only specify personal (unary) preferencesover tasks, i.e., it does not allow the representation of preferred orders over tasks allo-cated to a single agent or conditioned preferences. Thus, we cannot specify the actualutility function as described in Section 3. Instead, we fill the matrix with scalars thatare calculated while taking into consideration the importance of the tasks, the capabili-ties of the agents for performing them, and the current tasks the agents are performing.Specifically, the entry rij in matrix R is composed of a positive reward for performingtask vj , a penalty on the distance from the event, and a penalty for leaving a task thatthe agent is currently performing:

rij = xij · Cap(ai, vj) · I(vj) − ρ(ai, vj) − π(CTi, ∆w),

where the notations are the same as described in Section 3.

In the second stage of the algorithm we use the efficient algorithm proposed by

Devanur et al. [1] to find market clearing prices and produce the allocation matrix X, as described in [12]. We determine the actual allocation using this matrix X as follows: for every entry in X where xij > 0, we add the xij fraction of the task vj to the set of tasks allocated to agent ai.
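Extracting the allocation from X as just described is mechanical; a minimal sketch (the helper name and toy matrix are ours):

```python
# Sketch: turn an FMC allocation matrix X into per-agent task lists.
# Every positive entry x_ij adds the fraction x_ij of task j to agent i.

def allocation_from_matrix(X, eps=1e-12):
    tasks_of = {}
    for i, row in enumerate(X):
        tasks_of[i] = [(j, x) for j, x in enumerate(row) if x > eps]
    return tasks_of

X = [[0.5, 0.0, 0.25],
     [0.5, 1.0, 0.75]]
print(allocation_from_matrix(X))
# {0: [(0, 0.5), (2, 0.25)], 1: [(0, 0.5), (1, 1.0), (2, 0.75)]}
```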

It is clear that the matrix R is not expressive enough to represent the complex team utility function we describe in Section 3. However, the fairness and Pareto optimality properties of the FMC allocation ensure that we achieve an efficient allocation that is balanced over the agents. We demonstrate in Section 5 that this results in higher team utility than simply trying to maximize the utility represented by R.

In the final stage of the algorithm we determine the order in which the agents will perform the fractions of tasks allocated to them. Here we use a greedy heuristic that maximizes the utility that agents will derive from fulfilling a task. Formally, the tasks are ordered by the value of the product of the relevant entries in the X and R matrices, from high to low, e.g., the value that ai assigns to performing the fraction of task vj allocated to it is xij · rij.
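This greedy ordering can be sketched as follows (our illustration; the helper name and toy matrices are assumptions):

```python
# Sketch of the third-stage ordering: agent i performs its task fractions
# in decreasing order of x_ij * r_ij (the product of entries of X and R).

def order_tasks(i, X, R, eps=1e-12):
    fractions = [(j, X[i][j]) for j in range(len(X[i])) if X[i][j] > eps]
    return sorted(fractions, key=lambda jf: jf[1] * R[i][jf[0]], reverse=True)

R = [[4.0, 10.0, 6.0]]
X = [[1.0, 0.5, 0.5]]
# Products are 4.0, 5.0, 3.0, so task 1 comes first.
print(order_tasks(0, X, R))  # [(1, 0.5), (0, 1.0), (2, 0.5)]
```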

5 Experimental Evaluation

In order to evaluate the success of the proposed Fisher market clearing-based task allocation algorithm (FMC_TA), we compare the quality of the allocations it produces with a benchmark incomplete algorithm, simulated annealing (SA) [11, 13].2 In contrast to FMC_TA, SA considers only discrete predefined states (allocations). Thus, we selected in advance the maximal number q of agents that can share a task and considered all allocations in which xij = z/q, where z ∈ {0, 1, . . . , q}. For comparison we applied these restrictions to FMC_TA by rounding the allocation matrix X.
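The z/q rounding used to put FMC_TA on the same discrete grid as SA can be sketched as below. This is our illustration; the paper does not specify how row sums are repaired after rounding, so the sketch rounds each entry independently to the nearest multiple of 1/q.

```python
# Sketch: round each x_ij to the nearest multiple of 1/q, so that at most
# q agents can share a task (z/q with z in {0, 1, ..., q}).

def round_to_grid(X, q):
    return [[round(x * q) / q for x in row] for row in X]

print(round_to_grid([[0.30, 0.70], [0.55, 0.45]], q=4))
# [[0.25, 0.75], [0.5, 0.5]]
```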

We also compared FMC_TA with CFLA+, a version of the CFLA algorithm enhanced for LEPs. As mentioned in Section 2, CFLA is a state-of-the-art heuristic for

2 We tested other incomplete algorithms such as hill climbing with random restarts [8], but found that SA dominated these. This is consistent with previous studies on SA for task allocation problems [13].


Fig. 3. Accumulated team utility as a function of time for abstract problems (FMC_TA, SA, LP, Greedy, CFLA+).

assigning tasks with hard deadlines to coalitions of agents, but it does poorly on LEPs due to soft deadlines, heterogeneous agents and tasks, and all tasks being able to be performed by individual agents. Instead of checking the feasibility of completing pairs of tasks before the hard deadlines in the look-ahead phase, CFLA+ computes the maximum utility for pairs of tasks taking into account the soft deadlines.

In addition, we present the results of two baseline approaches. The first is a greedy allocation that allocates tasks sequentially according to their importance. Each task is allocated to the agent that would derive the highest utility from performing it. The second is identical to FMC_TA except that instead of using FMC in the second step to calculate X, it uses a linear program (LP) that directly maximizes team utility as represented by R. Differences in allocations found by FMC_TA and LP thus illustrate the effects of envy-freeness and Pareto optimality.

The first experiment involved synthetic, static LEPs where agents took no time to move between tasks. These problems were similar to abstract task allocation problems with resource-bounded agents and no spatial constraints. The problems included 10 agents (n = 10) and 10 heterogeneous tasks (m = 10). Each task could be divided among at most four agents, i.e., q = 4. All tasks had the same workload and importance, and the capabilities of agents to perform tasks were drawn independently and uniformly at random between zero and twenty. The discount function used was δ(v, t) = 0.9^t for all v. The results presented here average over 50 independent runs of each algorithm on randomly generated inputs.

Figure 3 presents the average accumulated team utility derived by the different algorithms as a function of time since the tasks were allocated. From the results, FMC_TA and SA both accumulate more utility than CFLA+ and the baseline algorithms, but FMC_TA accumulates utility faster. It is also notable that the greedy algorithm initially accumulates utility as quickly as SA, but does not reach solutions of the same quality. In contrast, CFLA+ accumulates utility more slowly but reaches a higher amount of accumulated utility than the greedy algorithm. The LP algorithm performs worst. Similar results were obtained for different values of discount factors and problem sizes and are omitted for lack of space.

These results indicate that FMC_TA finds high-quality solutions. Moreover, when using FMC_TA, agents perform tasks that result in greater utility earlier than when using


Fig. 4. Accumulated team utility derived by law enforcement units as a function of time.

SA. These results motivate the use of FMC_TA in problems where temporal constraints favor solutions in which tasks are performed sooner.

The realistic problems used in our experiments were dynamic LEPs. Each problem was situated in a city represented by a rectangular section of the Euclidean plane of size 6 × 6, divided into 9 neighborhoods of size 2 × 2, each with a patrol task. We simulated 8-hour shifts as in real police departments. At the beginning of a shift, 9 police units are patrolling, one in each neighborhood. The shifts included different numbers of tasks per shift (i.e., loads), with tasks arriving at a fixed rate during each shift. As in Section 3.3, there were four types of events of decreasing importance from type 1 to type 4. Event types were selected randomly according to the distribution of real event types provided by law enforcement authorities in our home city: in each shift, 30% of events were type 1 events, 40% were type 2 events, 15% were type 3 events, and 15% were type 4 events. The locations of events were selected uniformly at random in the 6 × 6 planar region.

The workloads of events were determined according to their type. Here, we also used the real estimates provided by law enforcement authorities. The workloads of events were drawn from exponential distributions with means 58, 55, 45, and 37 for events of type 1 through 4, respectively.

In addition to the accumulated team utility, we also include two measurements employed by law enforcement authorities for evaluating the performance of their forces. These are the average delay before arriving at events and the percentage of events that were attended but not completed. The results presented are an average of simulations of 20 shifts.

Figure 4 presents the accumulated utility derived by the agents using the different algorithms as a function of time in shifts with 60 events. The advantage of FMC_TA over the other algorithms is apparent. We note that larger differences are obtained for shifts with a smaller number of events. It is notable that on these problems LP outperforms CFLA+ and the greedy algorithm.

Figure 5 presents the average arrival time of agents to events for shifts with different loads. The arrival time is the time from the arrival of an event to the time when the first unit arrives at its location and starts handling it. Rapid responses (low arrival times) to incidents are highly valued by police departments, especially for more important events. The results are clearly in favor of FMC_TA. The advantage is especially pronounced for


Fig. 5. Average arrival time of law enforcement units to events as a function of the number of events per shift.

Fig. 6. Average arrival time of law enforcement units to events of different types.

shifts with lighter loads, but the difference is still substantial for shifts with 60 events as well.

Figure 6 presents the average arrival times of agents to each type of event for shifts with 60 events. The results presented demonstrate that the average differences in favor of FMC_TA depicted in Figure 5 and the higher rate of utility accumulation shown in Figure 4 are mostly influenced by the differences in the arrival time to the most important types of events (types 1 and 2). These results are somewhat counterintuitive, as greedy in particular allocates high importance tasks first, and so might be expected to complete important events very quickly. LP also takes the importance of tasks into account in its allocation, and furthermore it uses the same ordering heuristic as FMC_TA.

The reason for FMC_TA's superiority is that it shares tasks to a much greater degree than the other algorithms, as shown in Figure 7. This occurs because of the envy-freeness of the FMC allocation mechanism, which is especially likely to share high importance tasks. By dividing the workload of important tasks, FMC_TA is able to get agents to begin working on subsequent tasks earlier than LP, greedy, and CFLA+ can.

Figure 8 depicts the average percentage of the events that were started but not completed by agents during shifts with different loads. These results demonstrate that the advantages of FMC_TA over the other algorithms do not come at the expense of completing events. In contrast, agents using SA do not complete most of the tasks they


Fig. 7. Average percentage of events that were shared among more than one agent as a function of the number of events per shift.

Fig. 8. Average percentage of events that were not completed as a function of the number of events per shift.

perform. The strong performance of LP, greedy, and CFLA+ on this measure can be explained by their low rates of task sharing. Without sharing tasks, agents patrol more often and so do not need to be interrupted when new tasks arise. This is especially true for shifts with low loads.

6 Conclusions

Dynamic task allocation problems require fast decision-making. On the other hand, when such problems include spatial and temporal constraints, they are computationally hard to solve optimally. Thus, the study of approximation methods for this type of problem is essential.

In this paper we proposed a new approach for dynamic task allocation that generates fair (envy-free) and efficient (Pareto optimal) task allocations. The hypothesis we examined was that the combination of these two properties results in high quality solutions for task allocation problems in which we want all agents to contribute efficiently in order to achieve the group goal.

Our experiments support this hypothesis. While the Fisher model used to compute the fair and efficient allocations only permits a limited representation of utilities,


the fairness property causes the load to be distributed evenly among the agents, and Pareto optimality ensures that this load distribution is efficient. We demonstrate the advantages of this approach over a state-of-the-art approximation algorithm and over a domain-specific heuristic approximation, both on synthetic, abstract problems and on realistic law enforcement dynamic task allocation problems with spatial and temporal constraints.

In future work we intend to investigate the use of non-linear utility functions. This will allow us to represent cooperative synergies in groups of agents, where the utility derived by a coalition is greater than the sum of the utilities if each of the coalition members acted on its own. It will also serve as a useful modeling tool for applications with additive utilities in order to further encourage cooperation, which we showed in this paper to improve several measures of performance.

We also intend to propose a distributed version of our algorithm and compare it to state-of-the-art distributed incomplete algorithms. We recently learned that an elegant distributed algorithm for finding Fisher market equilibria exists [15]; thus, the main challenge in producing a distributed version of our task allocation algorithm will be implementing distributed ordering heuristics.

References

1. N. R. Devanur, C. H. Papadimitriou, A. Saberi, and V. V. Vazirani. Market equilibrium via a primal-dual-type algorithm. In Proceedings of the 43rd Symposium on Foundations of Computer Science (FOCS 2002), pages 389–395, Washington, DC, USA, 2002.

2. M. B. Dias, R. Zlot, N. Kalra, and A. Stentz. Market-based multirobot coordination: A survey and analysis. Proceedings of the IEEE, 94(7):1257–1270, July 2006.

3. A. Farinelli, A. Rogers, A. Petcu, and N. R. Jennings. Decentralised coordination of low-power embedded devices using the max-sum algorithm. In Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS-08), pages 639–646, 2008.

4. D. Gale. The Theory of Linear Economic Models. McGraw-Hill, 1960.
5. E. G. Jones, M. B. Dias, and A. Stentz. Learning-enhanced market-based task allocation for

oversubscribed domains. In Proceedings of the 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems, San Diego, CA, November 2007.

6. K. Macarthur, R. Stranders, S. Ramchurn, and N. Jennings. A distributed anytime algorithm for dynamic task allocation in multi-agent systems. In Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence (AAAI), pages 701–706. AAAI Press, August 2011.

7. N. Nisan, J. Bayer, D. Chandra, T. Franji, R. Gardner, Y. Matias, N. Rhodes, M. Seltzer, D. Tom, H. Varian, and D. Zigmond. Google's auction for TV ads. Preliminary version, 2008.

8. D. Poole and A. K. Mackworth. Artificial Intelligence: Foundations of Computational Agents. Cambridge University Press, 2010.

9. S. D. Ramchurn, M. Polukarov, A. Farinelli, C. Truong, and N. R. Jennings. Coalition formation with spatial and temporal constraints. In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2010), pages 1181–1188, Richland, SC, 2010.

10. S. D. Ramchurn, A. Farinelli, K. S. Macarthur, and N. R. Jennings. Decentralized coordination in RoboCup Rescue. The Computer Journal, 53(9):1447–1461, 2010.


11. C. R. Reeves, editor. Modern Heuristic Techniques for Combinatorial Problems. John Wiley & Sons, Inc., New York, NY, USA, 1993.

12. J. H. Reijnierse and J. A. M. Potters. On finding an envy-free Pareto-optimal division. Mathematical Programming, 83:291–311, 1998.

13. A. Schoneveld, J. F. de Ronde, and P. M. A. Sloot. On the complexity of task allocation. Journal of Complexity, 3:52–60, 1997.

14. W. E. Walsh and M. P. Wellman. A market protocol for decentralized task allocation. In Proceedings of the International Conference on Multi Agent Systems, pages 325–332, July 1998.

15. L. Zhang. Proportional response dynamics in the Fisher market. In Proceedings of the 36th International Colloquium on Automata, Languages and Programming (ICALP '09), Part II, pages 583–594, Berlin, Heidelberg, 2009.


Distributed Gibbs: A Memory-Bounded Sampling-Based DCOP Algorithm*

Duc Thien Nguyen†, William Yeoh‡, and Hoong Chuin Lau†

†School of Information Systems
Singapore Management University
Singapore 178902
{dtnguyen.2011,hclau}@smu.edu.sg

‡Department of Computer Science
New Mexico State University
Las Cruces, NM 88003, USA

[email protected]

Abstract. Researchers have used distributed constraint optimization problems (DCOPs) to model various multi-agent coordination and resource allocation problems. Very recently, Ottens et al. proposed a promising new approach to solve DCOPs that is based on confidence bounds via their Distributed UCT (DUCT) sampling-based algorithm. Unfortunately, its memory requirement per agent is exponential in the number of agents in the problem, which prohibits it from scaling up to large problems. Thus, in this paper, we introduce a new sampling-based DCOP algorithm called Distributed Gibbs, whose memory requirement per agent is linear in the number of agents in the problem. Additionally, we show empirically that our algorithm is able to find better solutions than DUCT; computationally, it also runs faster than DUCT and solves some large problems that DUCT failed to solve due to memory limitations.

1 Introduction

Distributed constraint optimization problems (DCOPs) are problems where agents need to coordinate their value assignments to maximize the sum of resulting constraint rewards [18, 21, 30]. Researchers have used them to model various multi-agent coordination and resource allocation problems such as the distributed scheduling of meetings [32], the distributed allocation of targets to sensors in a network [5, 33], the distributed allocation of resources in disaster evacuation scenarios [13], the distributed management of power distribution networks [12], the distributed generation of coalition structures [25] and the distributed coordination of logistics operations [14].

The field has matured considerably over the past decade as researchers continue to develop better and better algorithms. Most of these algorithms fall into

* A version of this paper appeared in AAMAS 2013 [19].


one of the following two classes of algorithms: (1) search-based algorithms like ADOPT [18] and its variants [29, 8], AFB [7] and MGM [16], where the agents enumerate through combinations of value assignments in a decentralized manner, and (2) inference-based algorithms like DPOP [21], max-sum [5] and Action-GDL [26], where the agents use dynamic programming to propagate aggregated information to other agents.

More recently, Ottens et al. proposed a promising new approach to solve DCOPs that is based on confidence bounds [20]. They introduced a new sampling-based algorithm called Distributed UCT, which is an extension of UCB [1] and UCT [10]. While the algorithm is shown to outperform competing approximate and complete algorithms, its memory requirement per agent is exponential in the number of agents in the problem, which prohibits it from scaling up to large problems.

Thus, in this paper, we introduce a new sampling-based DCOP algorithm called Distributed Gibbs (D-Gibbs), which is a distributed extension of the Gibbs algorithm [6]. D-Gibbs is memory-bounded: its memory requirement per agent is linear in the number of agents in the problem. While the Gibbs algorithm was designed to approximate joint probability distributions in Markov random fields and solve maximum a posteriori (MAP) problems, we show how one can map such problems into DCOPs in order for Gibbs to operate directly on DCOPs. Our results show that D-Gibbs is able to find better solutions than DUCT, find them faster, and solve some larger problems that DUCT failed to solve due to memory limitations.

2 Background: DCOP

A distributed constraint optimization problem (DCOP) [18, 17, 21] is defined by ⟨X, D, F, A, α⟩, where X = {x1, . . . , xn} is a set of variables; D = {D1, . . . , Dn} is a set of finite domains, where Di is the domain of variable xi; F is a set of binary utility functions, where each utility function Fij : Di × Dj → N ∪ {0, ∞} specifies the utility of each combination of values of variables xi and xj; A = {a1, . . . , ap} is a set of agents and α : X → A maps each variable to one agent. Although the general DCOP definition allows one agent to own multiple variables as well as the existence of n-ary constraints, we restrict our definition here for simplification purposes. One can transform a general DCOP to our DCOP using pre-processing techniques [31, 4, 2]. A solution is a value assignment for a subset of variables. Its utility is the evaluation of all utility functions on that solution. A solution is complete iff it is a value assignment for all variables. The goal is to find a utility-maximal complete solution.
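As a concrete illustration of this definition, a tiny DCOP instance might be encoded as follows (a sketch; the encoding and all names are ours, not from the paper):

```python
# Sketch of the DCOP tuple <X, D, F, A, alpha> with a utility evaluator
# for complete solutions.

variables = ["x1", "x2"]
domains = {"x1": [0, 1], "x2": [0, 1]}
# Binary utility functions F_ij: (value of x_i, value of x_j) -> utility.
utils = {("x1", "x2"): {(0, 0): 5, (0, 1): 2, (1, 0): 2, (1, 1): 8}}
alpha = {"x1": "a1", "x2": "a2"}  # each variable is owned by one agent

def utility(assignment):
    """Sum every utility function evaluated on a complete assignment."""
    return sum(table[(assignment[xi], assignment[xj])]
               for (xi, xj), table in utils.items())

print(utility({"x1": 1, "x2": 1}))  # 8 -- the utility-maximal solution here
```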

A constraint graph visualizes a DCOP instance, where nodes in the graph correspond to variables in the DCOP and edges connect pairs of variables appearing in the same utility function. A DFS pseudo-tree arrangement has the same nodes and edges as the constraint graph and satisfies the conditions that (i) there is a subset of edges, called tree edges, that form a rooted tree and (ii) two variables in a utility function appear in the same branch of that tree. A DFS pseudo-tree


Algorithm 1: Gibbs(z1, . . . , zn)

1 for i = 1 to n do
2     z_i^0 ← Initialize(zi)
3 end
4 for t = 1 to T do
5     for i = 1 to n do
6         z_i^t ← Sample(P(zi | z_1^t, . . . , z_{i−1}^t, z_{i+1}^{t−1}, . . . , z_n^{t−1}))
7     end
8 end

arrangement can be constructed using distributed DFS algorithms [9]. In this paper, we will use Ni to refer to the set of neighbors of variable xi in the constraint graph, Ci to refer to the set of children of variable xi in the pseudo-tree, and Pi to refer to the parent of variable xi in the pseudo-tree.

3 Background: Algorithms

We now provide a brief overview of two relevant sampling-based algorithms: the centralized Gibbs algorithm and the Distributed UCT (DUCT) algorithm.

3.1 Gibbs

The Gibbs sampling algorithm [6] is a Markov chain Monte Carlo algorithm that can be used to approximate joint probability distributions. It generates a Markov chain of samples, each of which is correlated with previous samples. Suppose we have a joint probability distribution P(z1, z2, . . . , zn) over n variables, which we would like to approximate. Algorithm 1 shows the pseudocode of the Gibbs algorithm, where each variable z_i^t represents the t-th sample of variable zi. The algorithm first initializes z_i^0 to an arbitrary value of variable zi (lines 1-3). Then, it iteratively samples z_i^t from the conditional probability distribution assuming that all the other n−1 variables take on their previously sampled values (lines 4-8). This process continues for a fixed number of iterations or until convergence, that is, until the joint probability distribution approximated by the samples no longer changes. It is also common practice to ignore a number of samples at the beginning, as they may not accurately represent the desired distribution. Once the joint probability distribution is found, one can easily identify a complete solution with the maximum likelihood. This problem is called the maximum a posteriori (MAP) estimation problem, which is a common problem in applications such as image processing [3] and bioinformatics [28, 22].

The Gibbs sampling algorithm is desirable because its approximated joint probability distribution (formed from its samples) converges to the true joint probability distribution given a sufficiently large number of samples for most problems. While Gibbs cannot be used to solve DCOPs directly, we will later show how one can slightly modify the problem such that Gibbs can be used to find optimal solutions given a sufficiently large number of samples.
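To make Algorithm 1 concrete, the following sketch (ours, not from the paper) runs a Gibbs sampler on a two-variable pairwise model with potentials θ and reads off the most frequent joint sample after a burn-in period:

```python
import math
import random

# Target: P(z) proportional to exp(theta[(z0, z1)]) over two binary variables.
# theta is symmetric here, so the argument order in the lookup is harmless.
theta = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0}

def sample_conditional(other, rng):
    # P(z_i = v | z_other) proportional to exp(theta[(v, other)]).
    w = [math.exp(theta[(v, other)]) for v in (0, 1)]
    return 0 if rng.random() < w[0] / (w[0] + w[1]) else 1

rng = random.Random(0)
z = [0, 0]
counts = {}
for t in range(5000):
    z[0] = sample_conditional(z[1], rng)  # resample z0 given z1
    z[1] = sample_conditional(z[0], rng)  # resample z1 given new z0
    if t >= 500:  # discard burn-in samples
        counts[tuple(z)] = counts.get(tuple(z), 0) + 1

# The most frequent joint sample approximates the MAP assignment.
print(max(counts, key=counts.get))
```

Since (1, 1) carries the largest potential, it is (with overwhelming probability for this seed and sample count) the most frequent joint sample.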


3.2 Distributed UCT

The Upper Confidence Bound (UCB) [1] and UCB Applied to Trees (UCT) [10] algorithms are two Monte Carlo algorithms that have been successfully applied to find near-optimal policies in large Markov decision processes (MDPs). The Distributed UCT (DUCT) algorithm [20] is a distributed version of UCT that can be used to find near-optimal cost-minimal complete DCOP solutions. We now provide a brief introduction to the algorithm and refer readers to the original article [20] for a more detailed treatment.

DUCT first constructs a pseudo-tree, after which each agent knows its parent, pseudo-parents, children and pseudo-children. Each agent xi maintains the following for all possible contexts X and values d ∈ Di:

• Its current value di.

• Its current context Xi, which is initialized to null. It is its assumption on the current values of its ancestors.

• Its cost yi, which is initialized to ∞. It is the sum of the costs of all cost functions between itself and its ancestors, given that they take on their respective values in its context and it takes on its current value.

• Its counter τi(X, d), which is initialized to 0. It is the number of times it has sampled value d under context X.

• Its counter τi(X), which is initialized to 0. It is the number of times it has received context X from its parent.

• Its cost µi(X, d), which is initialized to ∞. It is the smallest cost found when it sampled d under context X up to the current iteration.

• Its cost µi(X), which is initialized to ∞. It is the smallest cost found under context X up to the current iteration.

At the start, the root agent chooses its value and sends it down in a CONTEXT message to each of its children. When an agent receives a CONTEXT message, it too chooses its value, appends it to the context in the CONTEXT message, and sends the appended context down in a CONTEXT message to each of its children. Each agent xi chooses its value di using:

di = argmin_{d ∈ Di} Bi(d)    (1)

Bi(d) = f(δi(d), µi(Xi, d), τi(Xi, d), Bc)    (2)

δi(d) = Σ_{⟨xj, dj⟩ ∈ Xi} Fij(d, dj)    (3)

where its bound Bi(d) is initialized with a heuristic function f that balances exploration and exploitation. Additionally, each agent xi increments the number of times it has chosen its current value di under its current context Xi using:

τi(Xi, di) = τi(Xi, di) + 1    (4)

τi(Xi) = τi(Xi) + 1    (5)


This process continues until leaf agents receive CONTEXT messages and choose their respective values. Then, each leaf agent calculates its cost and sends it up in a COST message to its parent. When an agent receives a COST message from each of its children, it too calculates its cost, which includes the costs received from its children, and sends it up to its parent. Each agent xi calculates its costs yi, µi(Xi, d) and µi(Xi) using:

yi = δi(di) + Σ_{xc ∈ Ci} yc    (6)

µi(Xi, di) = min{µi(Xi, di), yi}    (7)

µi(Xi) = min{µi(Xi), µi(Xi, di)}    (8)

This process continues until the root agent receives a COST message from each of its children and calculates its own cost. Then, the root agent starts a new iteration, and the process continues until all the agents terminate. An agent xi terminates if its parent has terminated and the following condition holds:

max_{d ∈ Di} { µi(Xi) − [ µi(Xi, d) − √( ln(2/∆) / τi(Xi, d) ) ] } ≤ ε    (9)

where ∆ and ε are parameters of the algorithm.
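As an illustration, the termination test of Equation 9 might be implemented as below. This is a sketch under our own assumptions about data layout: the µ and τ statistics for the current context are passed in as plain dictionaries keyed by value d.

```python
import math

# Sketch of the DUCT termination test: terminate when every value's
# optimistic improvement over the best cost found so far is at most eps.

def should_terminate(mu_X, mu_Xd, tau_Xd, delta, eps):
    gap = max(mu_X - (mu_Xd[d] - math.sqrt(math.log(2 / delta) / tau_Xd[d]))
              for d in mu_Xd)
    return gap <= eps

mu_Xd = {0: 3.0, 1: 3.5}          # best cost found per value d
tau_Xd = {0: 100, 1: 100}         # sample counts per value d
print(should_terminate(3.0, mu_Xd, tau_Xd, delta=0.1, eps=0.5))  # True
```

With only one sample per value the confidence radius is large and the same call returns False, which matches the intuition that more sampling is needed before stopping.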

4 Distributed Gibbs

While DUCT has been shown to be very promising, its memory requirement per agent is O(D^T), where D = max_i |Di| is the largest domain size over all agents and T is the depth of the pseudo-tree. Each agent needs to store a constant number of variables for all possible contexts and values, and the number of possible contexts is exponential in the number of ancestors. This high memory requirement might prohibit the use of DUCT in large problems, especially if the agents have large domain sizes as well. Therefore, we now introduce the Distributed Gibbs algorithm, which is a distributed extension of the Gibbs algorithm adapted to solve DCOPs. Its memory requirement per agent is linear in the number of ancestors.

4.1 Mapping of MAP Estimation Problems to DCOPs

Recall that the Gibbs algorithm approximates a joint probability distribution over all the variables in a problem when only marginal distributions are available. Once the joint probability distribution is found, it finds the maximum a posteriori (MAP) solution. If we can map a DCOP, where the goal is to find a complete solution with maximum utility, to a problem where the goal is to find a complete solution with the maximum likelihood, such that a solution with maximum utility is also a solution with maximum likelihood, then we can use Gibbs to solve DCOPs.


We now describe how to do so. Consider a maximum a posteriori (MAP) estimation problem on a pairwise Markov random field (MRF).1 An MRF can be visualized by an undirected graph ⟨V, E⟩ and is formally defined by:

• A set of random variables X = {xi | ∀i ∈ V}, where each random variable xi can be assigned a value di from a finite domain Di. Each random variable xi is associated with node i ∈ V.

• A set of potential functions θ = {θij(xi, xj) | ∀(i, j) ∈ E}. Each potential function θij(xi, xj) is associated with edge (i, j) ∈ E. Let the probability P(xi = di, xj = dj) be defined as exp(θij(xi = di, xj = dj)). For convenience, we will drop the values in the probabilities and use P(xi, xj) to mean P(xi = di, xj = dj) from now on.

Therefore, a complete assignment x to all the random variables has the probability:

P(x) = (1/Z) Π_{(i,j) ∈ E} exp[θij(xi, xj)] = (1/Z) exp[ Σ_{(i,j) ∈ E} θij(xi, xj) ]    (10)

where Z is the normalization constant. The objective of a MAP estimation problem is to find the most probable assignment to all the variables under P(x), which is equivalent to finding a complete assignment x that maximizes the function:

F(x) = Σ_{(i,j) ∈ E} θij(xi, xj)    (11)

Maximizing the function in Equation 11 is also the objective of a DCOP if each potential function θij corresponds to a utility function Fij. Therefore, if we use the Gibbs algorithm to solve a MAP estimation problem, then the complete solution found for the MAP estimation problem is also a solution to the corresponding DCOP.
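This correspondence is easy to check by brute force on a tiny instance: since exp is monotone, the maximizer of F(x) is also the maximizer of P(x). A sketch (ours, with toy potentials):

```python
import math
from itertools import product

# Potentials theta_ij play the role of DCOP utilities F_ij.
theta = {("x1", "x2"): {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.5, (1, 1): 2.0}}
domain = [0, 1]

def F(assign):
    # F(x) = sum of theta_ij over all edges (Equation 11).
    return sum(t[(assign[i], assign[j])] for (i, j), t in theta.items())

assigns = [dict(zip(("x1", "x2"), v)) for v in product(domain, repeat=2)]
Z = sum(math.exp(F(a)) for a in assigns)  # normalization constant
best_by_F = max(assigns, key=F)
best_by_P = max(assigns, key=lambda a: math.exp(F(a)) / Z)
print(best_by_F == best_by_P, best_by_F)  # True {'x1': 1, 'x2': 1}
```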

4.2 Algorithm Description

We now describe the Distributed Gibbs algorithm. Algorithm 2 shows the pseudocode, where each agent xi maintains the following:

• Its values di and d̄i, which are both initialized to initial value ValInit(xi). They are the agent’s value in the current and previous iterations, respectively.

• Its best value d∗i, which is also initialized to initial value ValInit(xi). It is the agent’s value in the best solution found so far. Note that each agent maintains its own best value only and does not need to know the best values of other agents. The best solution x∗ = (d∗1, . . . , d∗n) can then be constructed upon termination.

• Its current context Xi, which is initialized with all the tuples of neighbors and their initial values. It is its assumption on the current values of its neighbors.

1 We are describing pairwise MRFs so that the mapping to binary DCOPs is clearer.


Algorithm 2: Distributed Gibbs()

1  Create pseudo-tree
2  Each agent xi calls Initialize()

Procedure Initialize()

3  d∗i ← d̄i ← di ← ValInit(xi)
4  Xi ← {〈xj , ValInit(xj)〉 | xj ∈ Ni}
5  ∆∗i ← ∆i ← 0
6  t∗i ← ti ← 0
7  if xi is root then
8    ti ← ti + 1
9    Sample()
10 end

• Its time index ti, which is initialized to 0. It is the number of iterations it has sampled.

• Its time index t∗i, which is initialized to 0. It indicates the most recent iteration in which a better solution was found. The agents use it to know if they should update their respective best values.

• Its value ∆i, which is initialized to 0. It is the difference in solution quality between the current solution and the best solution found in the previous iteration.

• Its value ∆∗i, which is initialized to 0. It is the difference in solution quality between the best solution found in the current iteration and the best solution found so far up to the previous iteration.
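The per-agent bookkeeping above can be collected into a small state record. The class below is our own sketch; the field names mirror the paper's symbols (di, d̄i, d∗i, Xi, ti, t∗i, ∆i, ∆∗i), but nothing here is the authors' code.

```python
from dataclasses import dataclass, field

@dataclass
class GibbsAgentState:
    value: int            # d_i: the agent's value in the current iteration
    prev_value: int       # previous iteration's value
    best_value: int       # d*_i: value in the best solution found so far
    context: dict = field(default_factory=dict)  # X_i: neighbor -> assumed value
    t: int = 0            # t_i: number of iterations sampled
    t_star: int = 0       # t*_i: iteration in which a better solution was last found
    delta: float = 0.0    # Delta_i: current-vs-best solution quality difference
    delta_star: float = 0.0  # Delta*_i: best-vs-best quality difference

def initialize(val_init, neighbor_inits):
    """Mirror of Initialize() in Algorithm 2 (lines 3-6): all values start
    at ValInit, the context holds neighbors' initial values, counters at 0."""
    return GibbsAgentState(value=val_init, prev_value=val_init,
                           best_value=val_init, context=dict(neighbor_inits))

state = initialize(0, {"x2": 1, "x3": 0})
```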

The algorithm starts by constructing a pseudo-tree (line 1) and having each agent initialize its variables to their default values (lines 2-6). The root then starts by sampling, that is, choosing its value di based on the probability:

P(xi | xj ∈ X \ {xi}) = P(xi | xj ∈ Ni) = (1/Z) ∏〈xj,dj〉∈Xi exp[Fij(di, dj)] = (1/Z) exp[ ∑〈xj,dj〉∈Xi Fij(di, dj) ]    (12)

where Z is the normalization constant (lines 9 and 12). It then sends its value in a VALUE message to each of its neighbors (line 19).
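The sampling step of Equation 12 amounts to a softmax over the agent's domain, weighted by the utilities against the current context. The sketch below is our own illustration; the representation of `F_i` (a map from neighbor name to pairwise utility function Fij) is an assumption, not the paper's data structure.

```python
import math
import random

def sample_value(domain, context, F_i):
    """Sample a value d_i with probability proportional to
    exp(sum_j F_ij(d_i, d_j)) over the current context (Equation 12)."""
    scores = [sum(F_ij(d, context[j]) for j, F_ij in F_i.items())
              for d in domain]
    m = max(scores)  # subtract the max before exponentiating, for stability
    weights = [math.exp(s - m) for s in scores]
    # Roulette-wheel selection over the (unnormalized) weights.
    r = random.random() * sum(weights)
    for d, w in zip(domain, weights):
        r -= w
        if r <= 0:
            return d
    return domain[-1]

# Illustrative use: one neighbor x2 with a "reward equal values" utility.
d = sample_value([0, 1], {"x2": 1}, {"x2": lambda a, b: float(a == b)})
```

Dividing by Z is unnecessary in practice: selecting proportionally to the unnormalized weights yields the same distribution.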

When an agent receives a VALUE message, it updates the value of the sender in its context (line 20). If the message is from its parent, then it too samples and sends its value in a VALUE message to each of its neighbors (lines 32, 12 and 19). This process continues until all the leaf agents sample. Each leaf agent then sends a BACKTRACK message to its parent (line 34). When an agent receives a BACKTRACK message from each child (line 38), it too sends a BACKTRACK message to its parent (line 46). This process continues until the root agent receives a BACKTRACK message from each child, which concludes one iteration.

Procedure Sample()

11 d̄i ← di
12 di ← Sample based on Equation 12
13 ∆i ← ∆i + ∑〈xj,dj〉∈Xi [Fij(di, dj) − Fij(d̄i, dj)]
14 if ∆i > ∆∗i then
15   ∆∗i ← ∆i
16   d∗i ← di
17   t∗i ← ti
18 end
19 Send VALUE(xi, di, ∆i, ∆∗i, t∗i) to each xj ∈ Ni

Procedure When Received VALUE(xs, ds, ∆s, ∆∗s, t∗s)

20 Update 〈xs, d′s〉 ∈ Xi with 〈xs, ds〉
21 if xs = Pi then
22   Wait until received VALUE message from all pseudo-parents in this iteration
23   ti ← ti + 1
24   if t∗s = ti then
25     d∗i ← di
26   else if t∗s = ti − 1 and t∗s > t∗i then
27     d∗i ← d̄i
28   end
29   ∆i ← ∆s
30   ∆∗i ← ∆∗s
31   t∗i ← t∗s
32   Sample()
33   if xi is a leaf then
34     Send BACKTRACK(xi, ∆i, ∆∗i) to Pi
35   end
36 end

We now describe how the agents identify, in a decentralized manner and without having to know the values of every other agent in the problem, whether they have found a better solution than the best one found thus far. To do so, the agents use the delta variables ∆i and ∆∗i. These variables are sent down the pseudo-tree in VALUE messages together with the current values of agents (line 19) and up the pseudo-tree in BACKTRACK messages (lines 34 and 46). When an agent receives a VALUE message from its parent, it updates its delta values to its parent’s delta values prior to sampling (lines 29-30). After sampling, each agent calculates its local difference in solution quality ∑〈xj,dj〉∈Xi [Fij(di, dj) − Fij(d̄i, dj)] and adds it to ∆i (line 13). Thus, ∆i can be seen as a sum of local differences from the root to the current agent as it is updated down the pseudo-tree. If this difference ∆i is larger than the maximum difference ∆∗i, which means that the new solution is better than the best solution found thus far, then the agent updates the maximum difference ∆∗i to ∆i and its best value d∗i to its current value di (lines 14-16).

Procedure When Received BACKTRACK(xs, ∆s, ∆∗s)

37 Store ∆s and ∆∗s
38 if Received BACKTRACK message from all children in this iteration then
39   ∆i ← (∑xc∈Ci ∆c) − (|Ci| − 1) · ∆i
40   ∆∗Ci ← (∑xc∈Ci ∆∗c) − (|Ci| − 1) · ∆∗i
41   if ∆∗Ci > ∆∗i then
42     ∆∗i ← ∆∗Ci
43     d∗i ← di
44     t∗i ← ti
45   end
46   Send BACKTRACK(xi, ∆i, ∆∗i) to Pi
47   if xi is root then
48     ∆i ← ∆i − ∆∗i
49     ∆∗i ← 0
50     ti ← ti + 1
51     Sample()
52   end
53 end
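The local difference computed on line 13 is just the change in utility of the agent's own constraints when its value flips from d̄i to di, with all neighbors held at their context values. A minimal sketch (our own notation; `F_i` maps neighbor name to pairwise utility function):

```python
def local_delta(new_value, old_value, context, F_i):
    """Local difference in solution quality (line 13):
    sum over neighbors of F_ij(d_i, d_j) - F_ij(d_bar_i, d_j)."""
    return sum(F_i[j](new_value, context[j]) - F_i[j](old_value, context[j])
               for j in F_i)

# With a utility that rewards matching the neighbor, switching from 0 to 1
# against a neighbor at 1 improves the local quality by 1.
same_bonus = {"x2": lambda a, b: 1.0 if a == b else 0.0}
gain = local_delta(1, 0, {"x2": 1}, same_bonus)
assert gain == 1.0
```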

After finding a better solution, the agent needs to inform other agents to update their respective best values to their current values, since the best solution found thus far assumes that the other agents take on their respective current values. The following three types of agents need to be informed:

• Descendant agents: The agent that has found a better solution updates its time index t∗i to the current iteration (line 17) and sends this variable down to its children via VALUE messages (line 19). If an agent xi receives a VALUE message from its parent with a time index t∗s that equals the current iteration ti, then it updates its best value d∗i to its current value di (lines 24-25). It then updates its time index t∗i to its parent’s time index t∗s and sends it down to its children via VALUE messages (lines 31, 32 and 19). This process continues until all descendant agents update their best values.

• Ancestor agents: The agent that has found a better solution sends its maximum difference ∆∗i up to its parent via BACKTRACK messages (lines 34 and 46). In the simplest case, where an agent xi has only one child xc, if the agent receives a BACKTRACK message with a maximum difference ∆∗c larger than its own maximum difference, then it updates its best value d∗i to its current value di (lines 40, 41 and 43). In the case where an agent has more than one child, it compares the sum of the maximum differences over all children xc, minus the overlaps (∆∗i is counted |Ci| times, i.e., |Ci| − 1 extra times), with its own maximum difference ∆∗i. If the former is larger than the latter, then it updates its best value d∗i to its current value di (lines 40, 41 and 43). It then updates its own maximum difference ∆∗i (line 42) and sends it to its


parent via BACKTRACK messages (line 46). This process continues until all ancestor agents update their best values.

• Sibling subtree agents: Agents in sibling subtrees do not receive VALUE or BACKTRACK messages from each other. Thus, an agent xi cannot update its best value using the above two methods if another agent in a sibling subtree has found a better solution. However, in the next iteration, the common ancestor of these two agents will propagate its time index down to agent xi via VALUE messages. If agent xi receives a time index t∗s that equals the previous iteration ti − 1 and is larger than its own time index t∗i (indicating that it hasn’t found an even better solution in the current iteration), then it updates its best value d∗i to its previous value d̄i (lines 26-28). (It doesn’t update its best value to its current value because the best solution was found in the previous iteration.) Thus, all agents in sibling subtrees also update their best values.
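The overlap correction on line 39 (and analogously line 40) can be restated in a few lines. Each child's delta already contains this agent's own delta, so summing over |Ci| children counts it |Ci| times and |Ci| − 1 copies must be subtracted. This is our own restatement of the rule in the pseudo-code, not the authors' implementation:

```python
def combine_child_deltas(own_delta, child_deltas):
    """Overlap-corrected combination used on BACKTRACK (line 39):
    Delta_i <- (sum over children of Delta_c) - (|C_i| - 1) * Delta_i."""
    n = len(child_deltas)
    return sum(child_deltas) - (n - 1) * own_delta

# With own_delta = 4 and two children whose deltas each include that 4
# (subtree-local contributions of 2 and 3 give child deltas 6 and 7),
# the combination recovers 4 + 2 + 3 = 9.
combined = combine_child_deltas(4.0, [6.0, 7.0])
assert combined == 9.0
```

With a single child the correction vanishes, matching the "simplest case" discussed above.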

Therefore, when a better solution is found, all agents in the Distributed Gibbs algorithm update their best values by the end of the next iteration. The algorithm can either terminate after a given number of iterations or when no better solution is found for a given number of consecutive iterations. We later show that by choosing at least 1/(α·ε) samples, the probability that the best solution found is in the top α-percentile is at least 1 − ε (Theorem 2).2

4.3 Theoretical Properties

Like Gibbs, the Distributed Gibbs algorithm samples values sequentially and samples based on the same equation (Equation 12). The main difference is that Gibbs samples down a pseudo-chain (a pseudo-tree without sibling subtrees), while Distributed Gibbs exploits parallelism by sampling down a pseudo-tree. However, this difference only speeds up the sampling process and does not affect the correctness of the algorithm, since agents in sibling subtrees are independent of each other. Thus, we will show several properties that hold for centralized Gibbs and, thus, also hold for Distributed Gibbs. Some of these properties are well-known (we label them “properties”) and some are, to the best of our knowledge, new (we label them “theorems”). The proofs for the new properties are available in [19].

Property 1. Gibbs is guaranteed to converge.

Property 2. Upon convergence, the probability P(x) of any solution x equals its approximated probability PGibbs(x):

2 One can slightly optimize the algorithm by having the agents (1) send their current values in BACKTRACK messages instead of VALUE messages to their parents; and (2) send smaller VALUE messages, which do not contain delta values and time indices, to all pseudo-children. We describe the unoptimized version here for ease of understanding.


P(x) = PGibbs(x) = exp[F(x)] / ∑x′∈S exp[F(x′)]

where S is the set of all solutions sampled.

Property 3. The expected number of samples NGibbs to get an optimal solution x∗ is

E(NGibbs) ≤ 1/PGibbs(x∗) + L

where L is the number of samples needed before the estimated joint probability converges to the true joint probability.

The process of repeatedly sampling to get an optimal solution is equivalent to sampling Bernoulli trials with success probability PGibbs(x∗). Thus, the corresponding geometric random variable for the number of samples needed to get an optimal solution for the first time has an expectation of 1/PGibbs(x∗) [11]. In the following, we assume that 1/PGibbs(x∗) ≫ L and we will thus ignore L.
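The geometric-variable argument is easy to make concrete: if the optimum is hit with probability p on each (post-burn-in) sample, the expected wait until the first hit is 1/p. A minimal sketch, ignoring the burn-in term L as the text does:

```python
def expected_samples(p_opt):
    """Expectation of a geometric random variable with success probability
    p_opt = P_Gibbs(x*): the mean number of samples until the optimal
    solution is drawn for the first time (burn-in L ignored)."""
    return 1.0 / p_opt

# If the optimum has 20% probability mass under the Gibbs distribution,
# it takes 5 samples on average to draw it.
mean_wait = expected_samples(0.2)
```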

Theorem 1. The expected number of samples to find an optimal solution x∗ with Gibbs is no greater than with a uniform sampling algorithm. In other words,

PGibbs(x∗) ≥ Puniform(x∗)

Definition 1. A set of top α-percentile solutions Sα is a set that contains solutions that are no worse than any solution in the complementary set D \ Sα, with |Sα|/|D| = α.

Theorem 2. After N = 1/(α·ε) samples with Gibbs, the probability that the best solution found thus far, xN, is in the top α-percentile is at least 1 − ε. In other words,

PGibbs(xN ∈ Sα | N = 1/(α·ε)) ≥ 1 − ε

Corollary 1. The quality of the solution found by Gibbs approaches optimal as the number of samples N approaches infinity. In other words,

limε→0 PGibbs(xN ∈ Sα | N = 1/(α·ε)) = 1

While we have only described the above properties and theorems for problems with discrete values, we believe that they can be applied directly to DCOPs with continuous values [23] by changing the summations to integrations.


[Figure 1 plots the solution quality of D-Gibbs, MGM, MGM2, DUCT and DPOP in four settings: (a) varying the number of agents |X| from 15 to 30 (p1 = 0.3, |Di| = 5); (b) varying the domain size |Di| from 5 to 20 (|X| = 25, p1 = 0.3); (c) varying the density p1 from 0.2 to 1.0 (|X| = 25, |Di| = 5); and (d) varying the DUCT parameters ∆ and ε from 0.01 to 0.1 (|X| = 19, p1 = 0.3, |Di| = 5).]

Fig. 1. Results for Graph Coloring Problems

4.4 Complexity Analysis

Each agent xi needs to store a context Xi, which contains the agent-value pairs of all neighboring agents xj ∈ Ni. Additionally, agent xi needs to store the delta values ∆c and ∆∗c for all children xc ∈ Ci. Thus, the memory complexity of each agent is linear in the number of agents in the problem (= O(|X|)).

Each agent xi needs to send a VALUE message to each neighboring agent and a BACKTRACK message to its parent in each iteration, and each message contains a constant number of values (each VALUE message contains 5 values and each BACKTRACK message contains 3 values). Thus, the amount of information passed around the network per iteration is polynomial in the number of agents in the problem (= O(|X|2)).

5 Experimental Results

We now compare Distributed Gibbs (D-Gibbs) to DPOP [21] (an optimal algorithm) and to MGM [16], MGM2 [16] and DUCT [20] (sub-optimal algorithms). In terms of network load, that is, the amount of information passed around the network, DPOP sends an exponential amount of information in total (= O(exp(|X|))), while MGM, MGM2, DUCT and D-Gibbs send a polynomial amount of information in each iteration (= O(|X|2)).

To compare runtimes and solution qualities, we use publicly-available implementations of MGM, MGM2, DUCT and DPOP, which are all implemented on


the FRODO framework [15]. We ran our experiments on a 64-core Linux machine with 2GB of memory per run. We measure runtime using the simulated time metric [24] and evaluate the algorithms on graph coloring problems. For all problems, we set the DUCT parameters ∆ = ε = 0.05, similar to the settings used in the original article [20], unless mentioned otherwise. For fair comparisons, we also let MGM, MGM2 and D-Gibbs run for as long as DUCT did.3 Each data point is averaged over 50 instances.

We used the random graph coloring problem generator provided in the FRODO framework [15] to generate our problems. We varied the size of the problem by increasing the number of agents |X| from 15 to 30, the graph density p1 from 0.2 to 1.0 and the domain size |Di| of each agent xi from 5 to 20. We chose the constraint utilities uniformly at random from the range (0, 10) if the neighboring agents have different values, and 0 if they have the same value. Figure 1 shows our results, where we varied the number of agents |X| in Figure 1(a), the domain size |Di| in Figure 1(b), the density p1 in Figure 1(c) and the DUCT parameters ∆ and ε in Figure 1(d). DPOP ran out of memory for problems with 20 agents and above, and DUCT ran out of memory for problems with domain sizes 19 and 20 and for problems with a density of 1.

In all four figures, DPOP found better solutions (when it did not run out of memory) than D-Gibbs, which found better solutions than MGM, MGM2 and DUCT. The difference in solution quality increases as the number of agents, domain size and density increase.

Additionally, in Figure 1(d), as ∆ and ε decrease, the runtime of DUCT (and thus of all the other algorithms, since we let them run for as long as DUCT) increases, since the tolerance for error decreases. However, the quality of its solutions improves as a result. Interestingly, the quality of the solutions found by D-Gibbs, MGM and MGM2 remained relatively unchanged despite the additional runtime, which means that they found their solutions very early on. Thus, D-Gibbs found close-to-optimal solutions faster (when ∆ = ε = 0.1) than DUCT (when ∆ = ε = 0.01).

6 Conclusions

Researchers had not investigated sampling-based approaches to solve DCOPs until very recently, when Ottens et al. introduced the Distributed UCT (DUCT) algorithm, which uses confidence-based bounds. However, one of its limitations is its memory requirement per agent, which is exponential in the number of agents in the problem. This large requirement prohibits it from scaling up to large problems, such as problems with domain sizes 19 and 20 or problems with a density of 1, as we showed experimentally. Therefore, in this paper, we introduce a new sampling-based algorithm called Distributed Gibbs (D-Gibbs),

3 Exceptions are when DUCT failed to find a solution due to insufficient memory. For domain sizes |Di| = 19 and 20 in Figure 1(b), we let the other algorithms run for as long as DUCT did for domain size |Di| = 18, and for density p1 = 1 in Figure 1(c), we let the other algorithms run for as long as DUCT did for density p1 = 0.9.


whose memory requirement per agent is linear in the number of agents in the problem. It is a distributed extension of Gibbs, which was originally designed to approximate joint probability distributions in Markov random fields. We experimentally show that D-Gibbs finds better solutions than competing local search algorithms like MGM and MGM2, in addition to DUCT. Additionally, we show how one can choose the number of samples based on the desired a priori approximation bound (using Theorem 2). While we have described D-Gibbs for (discrete-valued) DCOPs, we believe that it can easily be extended to solve continuous-valued DCOPs [23] as well. Thus, we would like to compare this approach with Continuous-Valued Max-Sum [23, 27] in the future.

Acknowledgment

This research is supported by the Singapore National Research Foundation under its International Research Centre @ Singapore Funding Initiative and administered by the IDM Programme Office.

References

1. P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2–3):235–256, 2002.

2. F. Bacchus, X. Chen, P. van Beek, and T. Walsh. Binary vs. non-binary constraints. Artificial Intelligence, 140(1–2):1–37, 2002.

3. J. Besag. On the statistical analysis of dirty pictures. Journal of the Royal Statistical Society, Series B, 48(3):259–279, 1986.

4. D. Burke and K. Brown. Efficiently handling complex local problems in distributed constraint optimisation. In Proceedings of ECAI, pages 701–702, 2006.

5. A. Farinelli, A. Rogers, A. Petcu, and N. Jennings. Decentralised coordination of low-power embedded devices using the Max-Sum algorithm. In Proceedings of AAMAS, pages 639–646, 2008.

6. S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(6):721–741, 1984.

7. A. Gershman, A. Meisels, and R. Zivan. Asynchronous Forward-Bounding for distributed COPs. Journal of Artificial Intelligence Research, 34:61–88, 2009.

8. P. Gutierrez, P. Meseguer, and W. Yeoh. Generalizing ADOPT and BnB-ADOPT. In Proceedings of IJCAI, pages 554–559, 2011.

9. Y. Hamadi, C. Bessiere, and J. Quinqueton. Distributed intelligent backtracking. In Proceedings of ECAI, pages 219–223, 1998.

10. L. Kocsis and C. Szepesvari. Bandit based Monte-Carlo planning. In Proceedings of ECML, pages 282–293, 2006.

11. V. Kulkarni. Modeling, Analysis, Design, and Control of Stochastic Systems. Springer, 1999.

12. A. Kumar, B. Faltings, and A. Petcu. Distributed constraint optimization with structured resource constraints. In Proceedings of AAMAS, pages 923–930, 2009.

13. R. Lass, J. Kopena, E. Sultanik, D. Nguyen, C. Dugan, P. Modi, and W. Regli. Coordination of first responders under communication and resource constraints (Short Paper). In Proceedings of AAMAS, pages 1409–1413, 2008.


14. T. Leaute and B. Faltings. Coordinating logistics operations with privacy guarantees. In Proceedings of IJCAI, pages 2482–2487, 2011.

15. T. Leaute, B. Ottens, and R. Szymanek. FRODO 2.0: An open-source framework for distributed constraint optimization. In Proceedings of the Distributed Constraint Reasoning Workshop, pages 160–164, 2009.

16. R. Maheswaran, J. Pearce, and M. Tambe. Distributed algorithms for DCOP: A graphical game-based approach. In Proceedings of PDCS, pages 432–439, 2004.

17. R. Mailler and V. Lesser. Solving distributed constraint optimization problems using cooperative mediation. In Proceedings of AAMAS, pages 438–445, 2004.

18. P. Modi, W.-M. Shen, M. Tambe, and M. Yokoo. ADOPT: Asynchronous distributed constraint optimization with quality guarantees. Artificial Intelligence, 161(1–2):149–180, 2005.

19. D. T. Nguyen, W. Yeoh, and H. C. Lau. Distributed Gibbs: A memory-bounded sampling-based DCOP algorithm. In Proceedings of AAMAS, pages 167–174, 2013.

20. B. Ottens, C. Dimitrakakis, and B. Faltings. DUCT: An upper confidence bound approach to distributed constraint optimization problems. In Proceedings of AAAI, pages 528–534, 2012.

21. A. Petcu and B. Faltings. A scalable method for multiagent constraint optimization. In Proceedings of IJCAI, pages 1413–1420, 2005.

22. D. Sontag, T. Meltzer, A. Globerson, T. Jaakkola, and Y. Weiss. Tightening LP relaxations for MAP using message passing. In Proceedings of UAI, pages 503–510, 2008.

23. R. Stranders, A. Farinelli, A. Rogers, and N. Jennings. Decentralised coordination of continuously valued control parameters using the Max-Sum algorithm. In Proceedings of AAMAS, pages 601–608, 2009.

24. E. Sultanik, R. Lass, and W. Regli. DCOPolis: A framework for simulating and deploying distributed constraint reasoning algorithms. In Proceedings of the Distributed Constraint Reasoning Workshop, 2007.

25. S. Ueda, A. Iwasaki, and M. Yokoo. Coalition structure generation based on distributed constraint optimization. In Proceedings of AAAI, pages 197–203, 2010.

26. M. Vinyals, J. Rodríguez-Aguilar, and J. Cerquides. Constructing a unifying theory of dynamic programming DCOP algorithms via the generalized distributive law. Autonomous Agents and Multi-Agent Systems, 22(3):439–464, 2011.

27. T. Voice, R. Stranders, A. Rogers, and N. Jennings. A hybrid continuous max-sum algorithm for decentralised coordination. In Proceedings of ECAI, pages 61–66, 2010.

28. C. Yanover, T. Meltzer, and Y. Weiss. Linear programming relaxations and belief propagation – an empirical study. Journal of Machine Learning Research, 7:1887–1907, 2006.

29. W. Yeoh, A. Felner, and S. Koenig. BnB-ADOPT: An asynchronous branch-and-bound DCOP algorithm. Journal of Artificial Intelligence Research, 38:85–133, 2010.

30. W. Yeoh and M. Yokoo. Distributed problem solving. AI Magazine, 33(3):53–65, 2012.

31. M. Yokoo, editor. Distributed Constraint Satisfaction: Foundations of Cooperation in Multi-agent Systems. Springer, 2001.

32. R. Zivan. Anytime local search for distributed constraint optimization. In Proceedings of AAAI, pages 393–398, 2008.

33. R. Zivan, R. Glinton, and K. Sycara. Distributed constraint optimization for large teams of mobile sensing agents. In Proceedings of IAT, pages 347–354, 2009.


Solving Customer-Driven Microgrid Optimization Problems as DCOPs

Saurabh Gupta†, Palak Jain‡, William Yeoh†, Satish J. Ranade‡, and Enrico Pontelli†

†Department of Computer Science
New Mexico State University
Las Cruces, NM 88003
{saurabh,wyeoh,epontell}@cs.nmsu.edu

‡Klipsch School of Electrical and Computer Engineering
New Mexico State University
Las Cruces, NM 88003
{palak,sranade}@nmsu.edu

Abstract. In response to the challenge by Ramchurn et al. to solve smart grid optimization problems with artificial intelligence techniques [21, 23], we investigate the feasibility of solving two common smart grid optimization problems as distributed constraint optimization problems (DCOPs). Specifically, we look at two common customer-driven microgrid (CDMG) optimization problems – a comprehensive CDMG optimization problem and an islanding problem. We show how one can model both problems as DCOPs and solve them using off-the-shelf DCOP algorithms, thus showing that researchers in the distributed constraint reasoning community are in a unique position to contribute towards this challenge.

1 Introduction

There is a growing consensus that there is an urgent need to move away from fossil fuels to renewable energy resources, given that the demand for fossil fuels will soon outstrip supply and the carbon emissions from burning fossil fuels will have a long-lasting impact on global warming [21]. In order to meet the demand for power, a large number of renewable power generators will need to be incorporated into the power grid. These renewable generators (e.g., solar panels, wind farms) may be located at the same locations as the power consumers (e.g., homes, factories, office buildings). Thus, unlike the power grids of today, where we have limited control and communication capabilities, the power grids of the future will need new mechanisms to control and coordinate power generation as well as consumption.

One vision of this future power grid is a smart grid, which the US Department of Energy defined as: “A fully automated power delivery network that monitors and controls every customer and node, ensuring a two-way flow of electricity


and information between the power plant and the appliance, and all points in between” [2].

While there is a large body of work by researchers in power systems towards realizing this vision [11, 12, 10, 1, 5, 17, 7, 9, 8, 3, 20], there has been very little inter-disciplinary interaction with researchers in the artificial intelligence (AI) community. Thus, Ramchurn et al. have recently challenged the AI community to work towards realizing this future as well and to develop new algorithms and mechanisms that can solve problems involving large numbers of highly heterogeneous agents (e.g., consumers with different demand profiles and generators with different capabilities), each with their own aims and objectives, and having to operate under uncertainty [21, 23].

Researchers in distributed constraint reasoning [24] are in a unique position to answer this challenge, as a number of optimization problems in smart grids can be modeled as distributed constraint satisfaction problems (DCSPs) and distributed constraint optimization problems (DCOPs). Furthermore, the decentralized formulation is desirable because the problem is inherently distributed; each agent (e.g., home owners, factory owners, and power plant managers) controls a set of local variables (e.g., the amount of power consumed in an agent’s home) and is constrained with other agents in the problem. In this paper, we present a way to use DCOPs to model two such optimization problems – (1) the problem of optimizing power generation, consumption and delivery in a power network, and (2) the problem of identifying whether there is a need to island a group of agents (i.e., a need to disconnect the group of agents from the overall power grid) under unexpected circumstances (e.g., shutdown of power plants). We focus our application domain on customer-driven microgrids (CDMGs), which can be viewed as one possible instantiation of the smart grid, whereby control of power production, consumption and delivery is distributed amongst agents in the system [22].

The structure of this paper is as follows. We first provide some background on CDMGs, describe the two optimization problems mentioned above, and provide a brief overview of DCOPs in Section 2. We then show how we can use DCOPs to model both optimization problems in Section 3 and discuss some related work in Section 4. Finally, we show some empirical evaluations of our models in Section 5 and conclude in Section 6.

2 Background

We now provide some background on DCOPs and CDMGs, and describe the two optimization problems mentioned above.

2.1 Distributed Constraint Optimization Problems

A distributed constraint optimization problem (DCOP) [18, 19, 25, 4, 24] is defined by a tuple 〈A,X , α,D,F〉, where:

– A = {a1, . . . , ap} is a set of agents.


[Figure 1 shows (a) the constraint graph over a1, a2 and a3, (b) a pseudo-tree, and (c) the cost table, which applies to every constrained pair (ai, aj) with i < j:

ai aj | Cost
0  0  | 5
0  1  | 8
1  0  | 20
1  1  | 3 ]

Fig. 1: Example DCOP

– X = {x1, . . . , xm} is a set of variables.
– α : X → A maps each variable to an agent.
– D = {D1, . . . , Dm} is a set of finite domains, where Di is the domain assigned to the variable xi.
– F = {F1, . . . , Fk} is a set of n-ary constraints, where each constraint Fi : Di1 × . . . × Din → N+ ∪ {0,∞} specifies the cost of each combination of values of variables xi1 to xin.

A solution is a value assignment for a subset of variables. The cost of the solution is the evaluation of all constraints on that solution. A solution is complete iff it is a value assignment for all variables. The goal is to find a cost-minimal complete solution X∗:

X∗ = argminX ∑Fi∈F Fi(xi1, . . . , xin)    (1)

A constraint graph visualizes a DCOP instance: nodes in the graph correspond to variables in the DCOP, and edges connect pairs of variables appearing in the same constraint. A DFS pseudo-tree arrangement has the same nodes and edges as the constraint graph, and: (i) there is a subset of edges, called tree edges, that form a rooted tree, and (ii) any two variables appearing in the same constraint are in the same branch of that tree. The other edges are called backedges. Tree edges connect parent-child nodes, while backedges connect a node with its pseudo-parents and its pseudo-children. A DFS pseudo-tree arrangement can be constructed using distributed DFS algorithms [6].

Figure 1(a) shows the constraint graph of a sample DCOP, with three agents controlling variables with domain {0, 1}. Figure 1(b) shows one possible pseudo-tree (the dotted line is a backedge), and the table in Figure 1(c) shows the costs of the three constraints.
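The example in Figure 1 is small enough to evaluate Equation 1 by direct enumeration; the sketch below does exactly that, using the shared cost table from Figure 1(c) on all three constrained pairs. (Brute force is for illustration only; it is not how DCOP algorithms solve the problem.)

```python
import itertools

# Cost table from Figure 1(c), applied to every pair (a_i, a_j) with i < j.
cost = {(0, 0): 5, (0, 1): 8, (1, 0): 20, (1, 1): 3}
pairs = [(0, 1), (0, 2), (1, 2)]  # (a1,a2), (a1,a3), (a2,a3)

def total_cost(x):
    """Summed constraint costs of a complete assignment (Equation 1)."""
    return sum(cost[(x[i], x[j])] for i, j in pairs)

# Enumerate all complete assignments and keep the cost-minimal one.
best = min(itertools.product([0, 1], repeat=3), key=total_cost)
assert best == (1, 1, 1)       # all agents take value 1
assert total_cost(best) == 9   # cost 3 per constraint, three constraints
```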

2.2 Customer-Driven Microgrids

A customer-driven microgrid (CDMG) can be viewed as one possible instantiation of the smart grid, whereby control of power production, consumption and delivery is distributed amongst agents in the system [22]. Figure 2 shows an example CDMG with six homes and a coal power plant. Each home has its individual power demands (e.g., washing machine, dryer, air-conditioner, heater,


[Figure legend: power plant, houses with solar panels, WiFi communication network, power lines, fault, island, circuit breaker (switch)]

Fig. 2: Illustration of a Customer-Driven Microgrid

etc.) and generation capacities (e.g., solar panels). The homes and power plant are connected to each other via a communication network (e.g., a wireless network), which allows them to communicate with each other, as well as via a power network (i.e., a network of power lines), which enables them to transfer power to one another. Each power line has a power-carrying capacity due to physical constraints, such as thermal limits, that prevent it from carrying more power. In normal operations, this CDMG stays connected with the power plant, but in the case of a fault between the homes and the power plant, the homes can isolate themselves by opening a circuit breaker (switch) and island themselves as a new CDMG. This islanded CDMG now needs to solve various power management and control problems, preferably in a distributed manner, because a central controller can be lost due to a fault in this scenario.

2.3 Comprehensive CDMG Optimization Problem

We now describe a comprehensive CDMG optimization problem that has been shown to subsume several classical power systems subproblems such as load shedding, demand response, and restoration [10]. At a high level, the optimization problem frames the question: "How should a CDMG be operated such that the collective cost is minimized for all users in the system, subject to domain-specific constraints?" Figure 3 shows a mathematical formulation of the problem, where

– n is the number of nodes (e.g., power plant, homes) in the network;
– Pgi is the power generation at node i;
– Pli is the power consumption at node i;
– Ptij is the transmission line flow between nodes i and j;
– P̄gi is the maximum generation capacity at node i;


Minimize

$\sum_{i=1}^{n} \Bigg[ \overbrace{e^{-A_i P_{l_i}} \big(P_{l_i} - \bar{P}_{l_i}\big)^2}^{\text{consumption preference}} + \overbrace{\big(1 - e^{-B_i P_{g_i}}\big)\big(P_{g_i} - \bar{P}_{g_i}\big)^2}^{\text{generation preference}} + \underbrace{\big(\alpha_i + \beta_i P_{g_i} + \gamma_i P_{g_i}^2\big)}_{\text{cost function}} \Bigg]$    (2)

subject to the following constraints:

$P_{g_i} - P_{l_i} - \sum_j P_{t_{ij}} = 0 \quad \forall i$    (3)

$P_{t_{ij}} + P_{t_{ji}} = 0 \quad \forall i, j$    (4)

$P_{g_i} \in \{0, \ldots, \bar{P}_{g_i}\} \quad \forall i$    (5)

$P_{l_i} \in \{0, p_{l_i}, \ldots, \bar{P}_{l_i}\} \quad \forall i$    (6)

$P_{t_{ij}} \in \{-\bar{P}_{t_{ij}}, \ldots, \bar{P}_{t_{ij}}\} \quad \forall i, j$    (7)

Fig. 3: Mathematical Program for the Comprehensive CDMG Optimization Problem

– P̄li and pli are the maximum and minimum power consumption at node i, respectively;
– P̄tij is the maximum transmission line capacity between nodes i and j;
– αi, βi, and γi are cost coefficients; and
– Ai and Bi are weights.
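As a sketch of how the objective in Equation (2) is evaluated, the per-node term can be written out directly. The parameter values passed in below are illustrative assumptions, not the paper's experimental settings.

```python
import math

def node_term(Pl, Pg, Pl_max, Pg_max, A, B, alpha, beta, gamma):
    """One node's contribution to the objective in Equation (2)."""
    consumption_pref = math.exp(-A * Pl) * (Pl - Pl_max) ** 2
    generation_pref = (1 - math.exp(-B * Pg)) * (Pg - Pg_max) ** 2
    cost_function = alpha + beta * Pg + gamma * Pg ** 2
    return consumption_pref + generation_pref + cost_function

def objective(nodes):
    """Total objective: the sum of per-node terms over all n nodes.
    `nodes` is a list of parameter dicts, one per node (illustrative layout)."""
    return sum(node_term(**n) for n in nodes)
```

Each of the three addends maps to one of the braced components of Equation (2); the constraints (3)-(7) are handled separately.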

The objective function in Equation (2) is the summation of three parts: the agents' consumption preference, the agents' generation preference, and a global cost function. Figure 4a shows a representative example of the first component of the objective function, the agent's consumption preference. The graph plots the cost of power (in dollars) against the amount of power consumed (in kW). As the cost of power increases, the demand for power decreases. If the cost of power is at its highest, only a small amount of power will be consumed, to satisfy only critical demands. If the cost of power is slightly cheaper, then more power will be consumed, to also satisfy discretionary demands. Finally, if the cost of power is at its lowest, then even more power will be consumed, to also satisfy non-critical demands.

Figure 4b shows a representative example of the second component of the objective function, the agent's generation preference. The graph plots the cost of power (in dollars) against the amount of power generated (in kW). When the generator is switched off (generation is at 0 kW) or operating at peak capacity (generation is at 10 kW), the cost is minimal. If the generator operates below peak capacity, then the cost peaks between the minimal and peak capacities. If the generator operates above peak capacity, then the cost grows exponentially.

Figure 4c shows a representative example of the third component of the objective function, the global cost function. It grows at a polynomial rate with


[Figure panels: (a) cost ($) vs. demand (kW), with regions for critical, discretionary, and non-critical demands; (b) cost ($) vs. generation (kW); (c) cost ($) vs. generation (kW)]

Fig. 4: Comprehensive CDMG Optimization Objective Subfunctions

respect to the overall generation, which is how electric utilities typically model the operating cost of their generators. This cost includes the cost of fuel, the cost to maintain the generators and the power network, etc. These three example graphs can vary depending on the preferences and needs of the agents, as well as the types of power generators.

The optimization problem is subject to the following constraints. The constraint in Equation (3) represents the power balance principle at each node i, where the excess power (the amount of power generated minus the amount of power consumed) must be transferred out of the node to its neighboring nodes. The constraint in Equation (4) ensures that there is no power loss in the system; the power flowing from node i to node j equals the power received by node j from node i. The constraints in Equations (5) and (6) represent the generation and consumption limits at node i, and the constraint in Equation (7) represents the maximum current-carrying capacity of the power line between nodes i and j.
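The constraints (3)-(7) can be checked for a candidate operating point with a straightforward sketch. The data layout below (dicts keyed by node and by directed line) is an assumption made for illustration.

```python
def feasible(Pg, Pl, Pt, Pg_max, Pl_min, Pl_max, Pt_max):
    """Check constraints (3)-(7) for a candidate operating point.

    Pg, Pl: dicts node -> generation / consumption;
    Pt: dict (i, j) -> flow on the directed line from i to j."""
    for i in Pg:
        # (3) power balance: excess power flows out to neighboring nodes
        if Pg[i] - Pl[i] - sum(f for (a, _), f in Pt.items() if a == i) != 0:
            return False
        # (5) generation limits
        if not (0 <= Pg[i] <= Pg_max[i]):
            return False
        # (6) consumption is either zero or within [min, max]
        if not (Pl[i] == 0 or Pl_min[i] <= Pl[i] <= Pl_max[i]):
            return False
    for (i, j), f in Pt.items():
        # (4) lossless lines: flow i->j is the negative of flow j->i
        if f + Pt[(j, i)] != 0:
            return False
        # (7) line capacity
        if abs(f) > Pt_max[(i, j)]:
            return False
    return True
```

For example, a two-node point where node 1 generates 3, consumes 1, and sends 2 to node 2 (which consumes it) satisfies all five constraint families.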

2.4 Islanding Problem

We now describe another important problem in microgrids, namely the islanding problem. Consider the power network shown in Figure 2. If the main power supply to a neighborhood is interrupted due to a fault, then this neighborhood will have to rely on its own power generation. The neighborhood will isolate itself from the rest of the network by opening a switch. This isolated neighborhood is called an island. Now, if the total maximum power generation capacity of this neighborhood is not enough to satisfy its minimum power demands, then it will result in a blackout. To avoid such failures, we model switches at both ends of each power line, which can be opened and closed according to the situation, to facilitate the formation of islands in the neighborhood. A viable island is one that has enough power generation capacity to satisfy its own minimum power demands. The islanding problem frames the following question: "Which circuit breakers should be opened such that the impact is smallest?" For example, if impact is measured by the amount of collective unsatisfied power consumption, then Figure 5 shows a mathematical formulation of the problem, where Sij is the switch between nodes i and j, and all other variables are the same as those in Figure 3.
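The viability test described above reduces to comparing the island's aggregate maximum generation capacity against its aggregate minimum demand; a minimal sketch with illustrative node data:

```python
def viable_island(nodes, Pg_max, Pl_min):
    """A viable island has enough maximum generation capacity to cover
    its own minimum power demands.

    nodes: the set of nodes left in the island after switches are opened;
    Pg_max, Pl_min: dicts node -> capacity / minimum demand (illustrative)."""
    return sum(Pg_max[i] for i in nodes) >= sum(Pl_min[i] for i in nodes)
```

This check is what the mathematical program in Figure 5 enforces implicitly: an island that fails it cannot satisfy Equations (9) and (12) simultaneously.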

The objective function in Equation (8) is to minimize the difference between the largest amount of possible power consumed or, alternatively, the maximum


Minimize

$\sum_{i=1}^{n} \big(\bar{P}_{l_i} - P_{l_i}\big)$    (8)

subject to the following constraints:

$P_{g_i} - P_{l_i} - \sum_j P_{t_{ij}} = 0 \quad \forall i$    (9)

$P_{t_{ij}} + P_{t_{ji}} = 0 \quad \forall i, j$    (10)

$P_{g_i} \in \{0, \ldots, \bar{P}_{g_i}\} \quad \forall i$    (11)

$P_{l_i} \in \{0, p_{l_i}, \ldots, \bar{P}_{l_i}\} \quad \forall i$    (12)

$P_{t_{ij}} \in \{-\bar{P}_{t_{ij}}, \ldots, \bar{P}_{t_{ij}}\} \quad \forall i, j$    (13)

$-S_{ij} \cdot \bar{P}_{t_{ij}} \le P_{t_{ij}} \le S_{ij} \cdot \bar{P}_{t_{ij}} \quad \forall i, j$    (14)

$S_{ij} \le |P_{t_{ij}}| \quad \forall i, j$    (15)

$S_{ij} \in \{0, 1\} \quad \forall i, j$    (16)

Fig. 5: Mathematical Program for the Islanding Problem

power demand in the network, and the largest amount of possible power produced or, alternatively, the maximum production capacity in the network. The optimization problem is subject to the same constraints as those in Figure 3, plus the additional constraints in Equations (14), (15), and (16). Equation (16) restricts the possible values of Sij to 0 and 1, where a value of 0 indicates that the switch between nodes i and j is open or, alternatively, that the transmission line between the two nodes is severed, and a value of 1 indicates otherwise. The constraint in Equation (14) ensures that power flows between nodes i and j only if the switch between the two nodes is closed, and the constraint in Equation (15) ensures that a switch is open if no power is flowing through it.
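The switch constraints in Equations (14)-(16) can be sketched as a simple consistency check; the dict-based layout keyed by directed line is an assumption made for illustration.

```python
def switch_constraints_ok(S, Pt, Pt_max):
    """Check Equations (14)-(16) for every line (i, j).

    S: dict (i, j) -> switch state; Pt: dict (i, j) -> line flow;
    Pt_max: dict (i, j) -> line capacity (all layouts illustrative)."""
    for (i, j), s in S.items():
        if s not in (0, 1):                                          # (16)
            return False
        if not (-s * Pt_max[(i, j)] <= Pt[(i, j)] <= s * Pt_max[(i, j)]):  # (14)
            return False
        if s > abs(Pt[(i, j)]):                                      # (15)
            return False
    return True
```

Note how the two inequalities interlock: (14) forces the flow to zero when the switch is open, while (15) forces the switch open when the flow is zero.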

We have previously solved a more detailed version of this problem in a distributed manner using a distributed version of the Gauss-Seidel method [12]. However, that method does not have any convergence guarantees and usually takes a long time to converge in practice. Therefore, in this paper, we introduce the simpler version described above as another example of a smart grid optimization problem that can be modeled as a DCOP.

3 CDMG Problems as DCOPs

We now show how one can model the CDMG problems described in Sections 2.3 and 2.4 as DCOPs and use existing DCOP algorithms to solve them. We illustrate this model using a 4-node network as an example. However, the techniques proposed here are easily generalizable to a real network. Figure 6 shows our example, where each node is assumed to have an autonomous agent that is capable of controlling the power generation and consumption at its node.


[Figure: a 4-node network with agents a1 (Pg1, Pl1) through a4 (Pg4, Pl4) and line flow variables Pt12, Pt21, Pt23, Pt32, Pt24, Pt42]

Fig. 6: Example 4-Node Network for the Comprehensive CDMG Optimization Problem

[Figure: variables pg_i, pl_i, and pt_ij grouped into dotted boxes, one per agent a1 through a4, with (hyper-)edges for the constraints]

Fig. 7: Constraint Graph for the Network in Figure 6

3.1 Modeling the Comprehensive CDMG Optimization Problem

Using the definition of DCOPs described in Section 2.1, we will now define the tuple ⟨A, X, α, D, F⟩ for our example:

– A consists of four agents {a1, a2, a3, a4}, one for each node of the network.
– X consists of the variables Pgi, Pli, and Ptij for each agent ai and its neighboring agent aj.
– α maps each variable Pgi, Pli, and Ptij to agent ai.
– D consists of the domains DPgi, DPli, and DPtij for each agent ai and its neighboring agent aj, where:
  – DPgi = {0, . . . , P̄gi} is the domain of variable Pgi;
  – DPli = {pli, . . . , P̄li} is the domain of variable Pli;
  – DPtij = {−P̄tij, . . . , P̄tij} is the domain of variable Ptij.
– F consists of the following n-ary constraints for each agent ai:
  – Hard constraint: Pgi − Pli − Σ_{aj ∈ Ni} Ptij = 0, where Ni is the set of neighboring agents of agent ai.
  – Hard constraint: Ptij + Ptji = 0 for each neighboring agent aj.
  – Soft constraint: e^{−Ai·Pli}(Pli − P̄li)² + (1 − e^{−Bi·Pgi})(Pgi − P̄gi)² + αi + βi·Pgi + γi·Pgi².

The cost of a hard constraint is 0 if the constraint is satisfied and ∞ otherwise. The cost of a soft constraint is exactly the evaluation of the constraint.
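This cost convention can be sketched directly; the function names below are illustrative, not part of the paper's formulation.

```python
INF = float("inf")

def hard_cost(satisfied):
    """Hard-constraint convention: cost 0 if satisfied, infinity otherwise."""
    return 0 if satisfied else INF

# Power balance at agent a_i (the hard constraint from Equation (3)):
def power_balance_cost(Pg_i, Pl_i, Pt_out):
    """Pt_out: list of flows from node i to its neighbors (illustrative)."""
    return hard_cost(Pg_i - Pl_i - sum(Pt_out) == 0)

# Line symmetry between neighbors (the hard constraint from Equation (4)):
def line_symmetry_cost(Pt_ij, Pt_ji):
    return hard_cost(Pt_ij + Pt_ji == 0)
```

Because any violated hard constraint contributes infinite cost, a cost-minimal DCOP solution with finite cost necessarily satisfies all hard constraints.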

The solution to such a DCOP maps to a solution of the CDMG optimization problem because of the following:

– The objective function of the CDMG optimization problem in Equation (2) can be decomposed into the sum of individual functions for each agent, and each such function is represented as a soft constraint. A DCOP solution minimizes the sum of constraint costs, and thus minimizes the objective function as well.

– The constraints in Equations (3) and (4) are satisfied, because the DCOP solution would otherwise have a cost of ∞, due to the hard constraints.


[Figure: the 4-node network of Figure 6, with agents a1 (Pg1, Pl1) through a4 (Pg4, Pl4), augmented with switch variables S12, S21, S23, S32, S24, S42 on the lines]

Fig. 8: Example 4-Node Network for the Islanding Problem

[Figure: the constraint graph of Figure 7 augmented with the switch variables S12, S21, S23, S32, S24, S42]

Fig. 9: Constraint Graph for the Network in Figure 8

– The constraints in Equations (5), (6), and (7) are satisfied, because the possible values of the variables Pgi, Pli, and Ptij are bounded from above by P̄gi, P̄li, and P̄tij, respectively, and from below by 0, pli, and −P̄tij, respectively.

Given this model, the constraint graph for our example network is shown in Figure 7, where nodes correspond to variables and (hyper-)edges correspond to constraints. We also group the variables that are owned by the same agent in a dotted box, and the agent owning those variables is shown in the top-left corner of each box.

3.2 Modeling the Islanding Problem

For the islanding problem, our example network includes two new variables Sij and Sji for each transmission line between nodes i and j. Figure 8 shows the example. The DCOP ⟨A, X, α, D, F⟩ is exactly the same as that described in Section 3.1, except for the following:

– X also consists of the variables Sij for each agent ai and its neighboring agent aj.
– α also maps each variable Sij to agent ai.
– D also consists of the domains DSij = {0, 1} for each agent ai and its neighboring agent aj.
– F also consists of the following n-ary constraint for each agent ai:
  – Hard constraint: −Sij × P̄tij ≤ Ptij ≤ Sij × P̄tij for each neighboring agent aj. The cost of this constraint is 0 if the constraint is satisfied and ∞ otherwise.

The solution to such a DCOP maps to a solution of the islanding problem for similar reasons as those described in Section 3.1. Given this model, the constraint graph for our example network is shown in Figure 9.


4 Related Work

Researchers have used DCOPs to model some smart grid optimization problems and used DCOP algorithms to solve them with some success. For example, Miller et al. have used max-sum to solve the problem of optimizing power generation and transmission such that the carbon dioxide emissions from power generators, such as coal power plants, are minimized, assuming the power consumption of each agent that needs to be satisfied is given as an input [16]; and Kumar et al. have used DPOP to solve the problem of reconfiguring power networks to restore power to nodes when power lines fail [13].

The two problems above are similar to ours in that all of the problems model constraints that are derived from the power network (e.g., flow conservation constraints). However, there are two main differences between their work and ours: (1) their objective functions are linear while ours can be non-linear with multiple local optima, and (2) they assume that the consumption of each agent is given as an input while we allow it to be optimized as part of the problem. As a result of the second difference, our constraint graph can have multiple cycles, unlike previous constraint graphs [16].

5 Experimental Results

We now describe some of our experimental results on synthetic problems that are motivated by real-world power networks, for both the comprehensive CDMG optimization problem and the islanding problem. We choose DPOP [19], an optimal DCOP algorithm, and max-sum [4], a sub-optimal DCOP algorithm, to show that these problems can be solved using off-the-shelf DCOP algorithms. We use publicly-available implementations of both algorithms [14]. We measure the performance of the algorithms in terms of the quality of the solutions found, the runtime measured in non-concurrent constraint checks (NCCCs) [15], and the network load measured in the total amount of information exchanged. To test the scalability of DPOP and max-sum, we vary the problem size and complexity along three dimensions: (i) the number of agents, (ii) the size of the domains, and (iii) the complexity of the topology configurations. We average each data point over 10 runs.

5.1 Varying Number of Agents

For this set of experiments, we fix the parameters to the following values: P̄gi = i, P̄li = 11 − i, P̄tij = 2, αi = 10, βi = 0.01, γi = 0.001, Ai = 1/P̄li, and Bi = 1/P̄gi for all nodes i and j. We vary the number of agents from 1 to 10 and arrange them in a chain (see Configuration 1 in Figure 12).

Figure 10a shows the runtime of both algorithms. As expected, the runtime of both algorithms increases with the number of agents. However, DPOP is faster than max-sum. The reason is the following: the computational cost of some messages in both algorithms (UTIL messages in DPOP and function-to-variable messages in max-sum) is exponential in the arity of the constraints [19,


[Figure panels: (a) runtime (NCCCs, log scale) vs. number of agents (1-10) for DPOP and Max-Sum on the CDMG optimization problem and the islanding problem; (b) amount of information exchanged (bytes) vs. number of agents for the same algorithms and problems]

Fig. 10: Experiments Varying the Number of Agents

4]. However, the number of UTIL messages that DPOP constructs is linear in the number of agents in the problem, while the number of function-to-variable messages that max-sum constructs depends on the number of iterations it takes to converge, which is typically larger than the number of agents in the problem.

Figure 10b shows the network load, where both algorithms exchange a comparable amount of information in total. In terms of solution quality, DPOP found optimal solutions for all problem instances, and max-sum found optimal solutions for 40% of the comprehensive CDMG optimization problem instances and 30% of the islanding problem instances. Max-sum found infeasible solutions for the other problem instances.

5.2 Varying Domain Sizes

For this set of experiments, we fix the number of agents to 4 and arrange them in a "T" (see Configuration 2 in Figure 12, with agents A1, A2, A3, and A7). We fix the domain sizes to the following values: P̄gi = k, P̄li = k, and P̄tij = k, where we vary k from 1 to 10. We fix the other parameters to the same values as in the first set of experiments.

Figure 11 shows our results, where the trends are similar to those in Figure 10. In terms of solution quality, DPOP found optimal solutions for all problem instances, and max-sum found optimal solutions for 10% of the comprehensive


[Figure panels: (a) runtime (NCCCs, log scale) vs. domain size (1-10) for DPOP and Max-Sum on the CDMG optimization problem and the islanding problem; (b) amount of information exchanged (bytes) vs. domain size for the same algorithms and problems]

Fig. 11: Experiments Varying the Domain Size

CDMG optimization problem instances and 100% of the islanding problem instances. Max-sum found infeasible solutions for the other problem instances.

5.3 Varying Complexity of Topology Configurations

For this set of experiments, we fix the number of agents to 7. We fix the domain sizes to the following values: P̄gi = 4, P̄li = 4, and P̄tij = 4, and we fix the other parameters to the same values as in the first set of experiments. We vary the topology of the network over the 5 configurations illustrated in Figure 12. The different configurations represent different levels of problem complexity in terms of constraint arity.

Figure 13 shows our results, where the trends are similar to those in Figure 10. In terms of solution quality, DPOP found optimal solutions for all problem instances, and max-sum found optimal solutions for 0% of the comprehensive CDMG optimization problem instances and 30% of the islanding problem instances. Max-sum found infeasible solutions for the other problem instances.

Overall, these sets of experiments show that it is feasible to model CDMG optimization problems and islanding problems as DCOPs and solve them using existing DCOP algorithms. However, some of the trends in our results require more analysis and a deeper understanding of their underlying behavior.


[Figure: five arrangements of agents A1 through A7, ranging from a chain (Configuration 1) to increasingly branched topologies (Configurations 2-5)]

Fig. 12: Topology Configurations

6 Conclusions and Discussions

In this paper, we attempt to answer the challenge posed by Ramchurn et al. to solve smart grid optimization problems with artificial intelligence techniques [21, 23] by showing how one can model two types of customer-driven microgrid (CDMG) optimization problems, namely the comprehensive CDMG optimization problem and the islanding problem, as distributed constraint optimization problems (DCOPs) and solve them using existing off-the-shelf DCOP algorithms. This decentralized formulation is appealing, as it allows the problem to be solved in parallel and maintains the privacy of users by providing only local information to each agent instead of the information of the entire problem. One of the challenges for our future work is to exploit domain properties to increase the scalability of these algorithms in order to deploy them in the real world.

Acknowledgment

This work is partially supported by the Department of Energy under the microgrid education and research grant DE-OE0000098.

References

1. S. Bukowski and S. Ranade. Communication network requirements for the smart grid and a path for an IP based protocol for customer driven microgrids. In Proceedings of the IEEE Energytech, pages 1–6, 2012.

2. U. S. Department of Energy. Grid 2030: A national vision for electricity's second 100 years. Technical report, Department of Energy, United States of America, 2003.


[Figure panels: (a) runtime (NCCCs, log scale) vs. topology configuration (1-5) for DPOP and Max-Sum on the CDMG optimization problem and the islanding problem; (b) amount of information exchanged (bytes) vs. topology configuration for the same algorithms and problems]

(b)

Fig. 13: Experiments Varying the Topology Configuration

3. P. Du and J. K. Nelson. Two-step solution to optimal load shedding in a microgrid. In Proceedings of the IEEE Power and Energy Society (PES) Power Systems Conference and Exposition (PSCE), pages 1–9, 2009.

4. A. Farinelli, A. Rogers, A. Petcu, and N. Jennings. Decentralised coordination of low-power embedded devices using the Max-Sum algorithm. In Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 639–646, 2008.

5. K. Gampa, S. Ranade, P. Jain, M. Balakrishnan, and S. Yemewar. Performance analysis of capacity discovery algorithm on hardware platform. In Proceedings of the North American Power Symposium (NAPS), pages 1–7, 2010.

6. Y. Hamadi, C. Bessiere, and J. Quinqueton. Distributed intelligent backtracking. In Proceedings of the European Conference on Artificial Intelligence (ECAI), pages 219–223, 1998.

7. M. Hassanzahraee and A. Bakhshai. Transient droop control strategy for parallel operation of voltage source converters in an islanded mode microgrid. In Proceedings of the IEEE International Telecommunications Energy Conference (INTELEC), pages 1–9, 2011.

8. Y.-Y. Hong and S.-Y. Ho. Determination of network configuration considering multiobjective in distribution systems using genetic algorithms. IEEE Transactions on Power Systems, 20(2):1062–1069, 2005.

9. O. Ipinnimo, S. Chowdhury, and S. P. Chowdhury. Voltage dip mitigation with DG integration: A comprehensive review. In Proceedings of the IEEE International Conference on Power Electronics, Drives and Energy Systems (PEDES), pages 1–10, 2010.


10. P. Jain, S. Gupta, S. Ranade, and E. Pontelli. Optimum operation of a customer-driven microgrid: A comprehensive approach. In Proceedings of the IEEE International Conference on Power Electronics, Drives and Energy Systems (PEDES), 2012.

11. P. Jain and S. Ranade. Capacity discovery in customer-driven micro-grids. In Proceedings of the North American Power Symposium (NAPS), pages 1–6, 2009.

12. P. Jain, S. Ranade, and S. Srivastava. Island identification in customer-driven micro-grids. In Proceedings of the IEEE Power and Energy Society (PES) Transmission and Distribution Conference and Exposition, pages 1–7, 2010.

13. A. Kumar, B. Faltings, and A. Petcu. Distributed constraint optimization with structured resource constraints. In Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 923–930, 2009.

14. T. Leaute, B. Ottens, and R. Szymanek. FRODO 2.0: An open-source framework for distributed constraint optimization. In Proceedings of the Distributed Constraint Reasoning Workshop, pages 160–164, 2009.

15. A. Meisels, E. Kaplansky, I. Razgon, and R. Zivan. Comparing performance of distributed constraints processing algorithms. In Proceedings of the Distributed Constraint Reasoning Workshop, pages 86–93, 2002.

16. S. Miller, S. Ramchurn, and A. Rogers. Optimal decentralised dispatch of embedded generation in the smart grid. In Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 281–288, 2012.

17. J. Mitra and M. R. Vallem. Determination of storage required to meet reliability guarantees on island-capable microgrids with intermittent sources. IEEE Transactions on Power Systems, 27(4):2360–2367, 2012.

18. P. Modi, W.-M. Shen, M. Tambe, and M. Yokoo. ADOPT: Asynchronous distributed constraint optimization with quality guarantees. Artificial Intelligence, 161(1–2):149–180, 2005.

19. A. Petcu and B. Faltings. A scalable method for multiagent constraint optimization. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 1413–1420, 2005.

20. S. A. Pourmousavi and M. H. Nehrir. Demand response for smart microgrid: Initial results. In Proceedings of the IEEE Power and Energy Society (PES) Innovative Smart Grid Technologies, pages 1–6, 2011.

21. S. Ramchurn, P. Vytelingum, A. Rogers, and N. Jennings. Putting the 'smarts' into the smart grid: A grand challenge for artificial intelligence. Communications of the ACM, 55(4):86–97, 2012.

22. S. Ranade, J. Mitra, S. Suryanarayanan, and P. Riberio. A holistic approach to customer-driven microgrids. Technical Report NSF Grants ECCS#0702208 and #0757956, 2011.

23. A. Rogers, S. Ramchurn, and N. Jennings. Delivering the smart grid: Challenges for autonomous agents and multi-agent systems research. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pages 2166–2172, 2012.

24. W. Yeoh and M. Yokoo. Distributed problem solving. AI Magazine, 33(3):53–65, 2012.

25. R. Zivan. Anytime local search for distributed constraint optimization. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pages 393–398, 2008.


Large Scale Multiagent-Based Simulation using NetLogo for implementation and evaluation of the distributed constraints

Ionel Muscalagiu1, Popa Horia Emil2 and Jose Vidal3

1 The Faculty of Engineering of Hunedoara, The "Politehnica" University of Timisoara, Revolutiei, 5, Romania, [email protected]
2 The Faculty of Mathematics and Informatics, The University of the West, Timisoara, V. Parvan 4, Romania, [email protected]
3 Computer Science and Engineering, University of South Carolina, Columbia SC 29208, [email protected]

Abstract. Distributed Constraint programming (DisCSP/DCOP) is a programming approach used to describe and solve large classes of problems, such as search, combinatorial, and planning problems. This type of distributed modeling appears naturally in many problems for which the information is distributed among many agents. Modeling and simulation are essential tools in many areas of science and engineering, including computer science. The purpose of this paper is to present an open-source solution for the implementation and evaluation of distributed constraints in NetLogo using computer clusters. Our tool allows the use of various search techniques and also the evaluation and analysis of the performance of asynchronous search techniques. We also explain our methodology for running the NetLogo models in a cluster computing environment or on a single machine, varying the parameter values and/or the random number of agents.

1 Introduction

Most multiagent systems are characterized by a set of autonomous agents, each with local information and the ability to perform an action, where the set of actions of all agents must be coordinated in order to achieve a desired global behavior. Constraint programming is a programming approach used to describe and solve large classes of problems, such as search, combinatorial, and planning problems. A Distributed Constraint Satisfaction Problem (DisCSP) is a constraint satisfaction problem in which variables and constraints are distributed among multiple agents [15], [7]. A Distributed Constraint Optimization Problem (DCOP) is similar, except that the constraints return a real number instead of a Boolean value, and the goal is to minimize the value of these constraint violations.
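The contrast between the two frameworks can be sketched with a toy inequality constraint; the constraint itself is entirely illustrative.

```python
def discsp_constraint(x, y):
    """DisCSP style: a constraint returns a Boolean.
    Satisfied iff the two agents pick different values."""
    return x != y

def dcop_constraint(x, y):
    """DCOP style: the same relation as a real-valued cost to be
    minimised (0 when satisfied, 1 per violation)."""
    return 0 if x != y else 1
```

In the DisCSP setting the agents search for an assignment satisfying every such Boolean constraint; in the DCOP setting they search for an assignment minimising the total cost, which may tolerate some violations.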


2 Ionel Muscalagiu et al.

Distributed Constraint Satisfaction/Distributed Constraint Optimization is a framework for describing a problem in terms of constraints that are known and enforced by distinct participants (agents). The constraints are defined over variables with predefined domains, and the different agents must assign these variables mutually consistent values. This type of distributed modeling appears naturally in many problems for which the information is distributed among many agents. DisCSPs are composed of agents, each owning its local constraint network. Variables in different agents are connected by constraints, forming a network of constraints. Agents must assign values to their variables so that all constraints between agents are satisfied. In a DCOP, by contrast, a group of agents must distributedly choose values for a set of variables so that the cost of a set of constraints over the variables is either minimized or maximized. Distributed networks of constraints have proven their success in modeling real problems.

There are several algorithms for performing distributed search in cooperative multiagent systems where each agent has some local information and where the goal is to bring all the agents into a set of states that is optimal for the system. There exist complete asynchronous search techniques for solving DisCSPs in this constraint network, such as ABT (Asynchronous Backtracking), AWCS (Asynchronous Weak Commitment) [15], ABTDO (Dynamic Ordering for Asynchronous Backtracking) [7], AAS (Asynchronous Search with Aggregations) [13], DisDB (Distributed Dynamic Backtracking) [2] and DBS (Distributed Backtracking with Sessions) [9]. Also, for DCOP there are many algorithms, among which we name ADOPT (Asynchronous Distributed OPTimization) [8] and DPOP (Dynamic Programming Optimization Protocol) [11]. Many multiagent problems can be reduced to a distributed constraint problem, so the asynchronous search techniques for solving DisCSPs have many different applications.

Developing evaluation and testing tools for these search techniques has become a necessity. There are very few platforms for implementing and solving DisCSP problems: DisChoco [17], DCOPolis [12] and FRODO [4]. Such a tool allows the use of various search techniques, so that we can decide which is the most suitable one for a particular problem. These tools can also be used to study agents' behavior in several situations, such as the priority order of the agents, the synchronous and asynchronous cases, and the appearance of delays in message transmission, thereby leading to the identification of possible enhancements of the performance of asynchronous search techniques.

Asynchronous search techniques involve concurrent (distributed) programming. The agents can be processes residing on a single computer or on several computers distributed within a network. The implementation of any asynchronous search technique supposes building the agents and the existing constraints, and implementing the links between the agents and the communication channels between them. Asynchronous search techniques can be implemented in any programming language that allows distributed programming, such as Java, C or C++. Nevertheless, for the study of such techniques, for their analysis and evaluation, it is easier and more efficient to


DisCSP-Netlogo - an open-source tool in NetLogo for distributed constraints 3

implement the techniques under a certain distributed environment, such as the new generation of multiagent modeling languages (NetLogo [16], [19], [20], [5]).

NetLogo is regarded as one of the most complete and successful agent simulation platforms [16], [5]. NetLogo is a high-level platform providing a simple yet powerful programming language, built-in graphical interfaces and the experiment visualization tools necessary for quick development of a simulation user interface. It offers a collection of complex modeling systems, developed over time. The models can give instructions to hundreds or thousands of independent agents, all operating in parallel. It is an environment written entirely in Java, so it can be installed and run on all major platforms. Although excellent for "modeling social and emergent phenomena", i.e. agent-based simulations consisting of a large number of reactive agents, it lacks the facilities to easily model more complex goal-oriented agent behaviors.

Modeling and simulation are essential tools in many areas of science and engineering, including computer science, for example for analyzing the performance of asynchronous search techniques. Asynchronous search techniques are notable for the large number of variations that can be introduced without affecting the completeness of the algorithm: processing messages in packets or individually, storing nogood messages or not, message filtering, and learning nogoods and storing them for each value. Each run is affected by delays that appear in message transmission. Thus, a correct evaluation assumes a large number of runs with different data sets. The purpose of this paper is to present an open-source solution for the implementation and evaluation of asynchronous search techniques in NetLogo for a great number of agents, a model that can be run on a cluster of computers. This model can also be used to study agents' behavior in several situations, such as the priority order of the agents and the synchronous and asynchronous cases.

Our goal is to supply the programmer with sources that can be updated and developed, so that each can profit from the experience of those before them; in fact, this is the development idea adopted in the Linux operating system. In this paper such an implementation and evaluation solution is proposed for existing algorithms in the DisCSP/DCOP framework. The proposed approach is open source and can be used for implementing any search technique. Any researcher has access to the existing implementations and can start from these to develop new ones. The platform offers modules for various evaluation problems, such as random binary problems, random graph coloring and multi-robot exploration.

This paper synthesizes all the attempts at modeling and implementing asynchronous search techniques in NetLogo. Many preliminary studies on the modeling, implementation and evaluation of asynchronous search techniques in NetLogo have been done and published. Many implementations were produced for a class of algorithms from the ABT and AWCS families (DisCSP) and for ADOPT (DCOP), respectively. They can be downloaded from the websites [19], [20].

This paper is organized as follows. Section 2 presents a solution for modeling and simulating distributed constraints in NetLogo. In Section 3, we explain


our methodology for running the NetLogo models in a cluster computing environment; this section also presents an architecture for the multiagent system. Section 4 briefly discusses the facilities offered by the NetLogo platform. The conclusions of the paper are in Section 5.

2 Modeling and implementing the asynchronous search techniques in NetLogo

In this section we present a solution for modeling and implementing the agents' process of execution in the case of asynchronous search techniques. This open-source solution, called DisCSP-NetLogo, is extended so that it is able to run with a larger number of agents, as a model runnable on a cluster of computers, and is presented below. This modeling can be used for any of the asynchronous search techniques, such as those from the AWCS family [15], the ABT family [2], DisDB [2] and DBS [9]. Implementation examples for these techniques can be found on the sites in [19], [20]. This modeling approach in NetLogo was first presented in [10] in a preliminary form.

The modeling of the agents' execution process is structured on two levels, corresponding to the two stages of implementation [10], [18]. The definition of the way in which asynchronous techniques are programmed so that the agents run concurrently and asynchronously constitutes the internal level of the model. The second level refers to the way of representing the surface of the implemented applications; this is the exterior level.

In any NetLogo agent simulation, four entities (objects) participate:

– The Observer, which is responsible for simulation initialisation and control. This is a central agent.

– Patches, i.e. components of a user-defined static grid (world), a 2D or 3D world inhabited by turtles. Patches are useful in describing environment behavior.

– Turtles, agents that "live" and interact in the world formed by patches. Turtles are organised in breeds: user-defined groups sharing some characteristics, such as shape, but most importantly breed-specific user-defined variables that hold the agents' state.

– Links, agents that "connect" two turtles, usually representing a spatial/logical relation between them.

Patches, turtles and links all carry their own internal state, stored in a set of system and user-defined variables local to each agent. The definition of turtle-specific variables allows them to carry their own state and facilitates the encoding of complex behavior. Agents' behavior can be specified in the domain-specific NetLogo programming language, which supports functions (called reporters) and procedures. The language includes a large variety of primitives for turtle motion, environment inspection, etc.


2.1 Agents’ simulation and initialization

We implement in open-source NetLogo the agents' process of execution in the case of the asynchronous search techniques [10], [18]:

S1. Agents' simulation and initialization in DisCSP-NetLogo. First of all, the agents are represented by breed-type objects (these are of the turtles type). Figure 1 shows the way the agents are defined, together with the global data structures belonging to the agents.

breeds [agents]
globals [...]   ; variables that simulate the memory shared by all the agents
agent-own [Message-queue Current-view MyValue Nogoods
           nr-constraintc messages-received-ok messages-received-nogood AgentC-Cost]
; Message-queue contains the received messages.
; Current-view is a list indexed on the agent's number, of the form [v0 v1 ...];
;   vi = -1 if we don't know the value of that agent.
; Nogoods is the list of inconsistent values [0 1 1 0 ...], where 1 is inconsistent.
; messages-received-ok, etc., count the number of messages received by an agent.
; nr-cycles - the number of cycles; nr-constraintc - the number of constraints checked.
; AgentC-Cost - the number of non-concurrent constraint checks.

Fig. 1. Agents’ definition in DisCSP-Netlogo for the asynchronous search techniques

This type of simulation can be applied to different problems used for evaluation and testing:

– the distributed n-queens problem, characterized by the number of queens;

– the distributed problem of coloring a randomly generated graph, characterized by the number of nodes, the number of colors and the number of connections between the nodes;

– the randomly generated (binary) CSPs, characterized by the 4-tuple (n, m, p1, p2), where: n is the number of variables; m is the uniform domain size; p1 is the proportion of the n · (n − 1)/2 possible constraints present in the constraint graph; p2 is the proportion of the m · m value pairs in each constraint that are disallowed by the constraint;

– the randomly generated problems whose constraint graph has the structure of a scale-free network [1]. A DisCSP instance with a scale-free network structure has a number of variables with a fixed domain and is characterized by the 5-tuple (n, m, t, md, γ), where: n is the number of variables; m is the domain size of each variable; t (the constraint tightness) determines the proportion of value combinations forbidden by each constraint; md is the minimal degree of each node; and γ is the exponent that depends on each network structure. A scale-free network is characterized by a power-law degree distribution p(k) ∝ k−γ [1];

– the multi-robot exploration problem [3], characterized by the 6-tuple (n, m, p1, sr, cr, obsd), where:


– n is the number of robots exploring an environment; the robots interact and communicate with their spatial neighbors and share some common information (information about already explored areas);

– m = 8 is the domain size of each variable; Dom(xi) is the set of all 8 cardinal directions from which a robot Ai can choose to plan its next movement;

– p1 is the network connectivity; sr is the sensor range of a robot; cr is the communication range of a robot;

– obsd is the obstacle density. We have considered environments with different levels of complexity depending on the number, size and density of the obstacles.

For these types of evaluation problems there are NetLogo modules that can be included in future implementations. The modules are available on the website [19]. For each module, procedures are available for the random generation of instances of the chosen problems, together with several ways of statically ordering the agents. There are also procedures for saving the generated instances in files and reusing them in the implementation of other asynchronous search techniques. On the website [19] can be found many modules that generate problem instances (both solvable and unsolvable) with various structures for the previous problems, depending on various parameters (uniform random binary DisCSP generator, scale-free network instance generator for DisCSP). An example is presented in Figure 2 for the random binary problems.
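As an illustration of the (n, m, p1, p2) model described above, the following Python sketch generates a random binary CSP instance. It is not the actual NetLogo module from [19]; all function and variable names are our own.

```python
import random

def random_binary_csp(n, m, p1, p2, seed=0):
    """Sketch of the (n, m, p1, p2) model: a proportion p1 of the
    n*(n-1)/2 possible constraints exist; within each constraint,
    a proportion p2 of the m*m value pairs is disallowed."""
    rng = random.Random(seed)  # fixed seed so instances can be saved/reused
    constraints = {}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p1:          # keep this edge of the graph
                forbidden = {(a, b) for a in range(m) for b in range(m)
                             if rng.random() < p2}
                constraints[(i, j)] = forbidden
    return constraints

csp = random_binary_csp(n=10, m=5, p1=0.5, p2=0.3)
print(all(0 <= i < j < 10 for i, j in csp))  # True: edges over 10 variables
```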

S2. Representation and manipulation of the messages. Any asynchronous search technique is based on the agents using messages to communicate the various information needed for obtaining the solution. The agents' communication is done according to the communication model introduced in [15].

The communication model existing in the DisCSP framework supposes, first of all, the existence of FIFO-type communication channels that can store the messages received by each agent. The main messages are represented as follows:
◦ (list "type message" contents Agent-costs)

The message queues of each agent can be simulated using NetLogo lists, for which we define treatment routines corresponding to the FIFO principle. These data structures are defined at the same time as the agents. In the implementations proposed in this paper, that structure is called message-queue. This structure, a property of each agent, will contain all the messages received by that agent.

The manipulation of these channels can be managed by a central agent (which in NetLogo is called the observer) or by the agents themselves. For this purpose we propose building a procedure called go for the global manipulation of the message channels. It will also have a role in detecting the termination of the asynchronous search technique's execution. The go procedure is a kind of "main program", a command center for the agents. The procedure should also allow the management of the messages transmitted by the agents. It needs to call, for each agent, another procedure which treats each message according to its type. This procedure will be called handle-message, and will be used to handle the messages specific to each asynchronous search technique.
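The go/handle-message scheme just described can be sketched as follows. This is an illustrative Python analogue of the NetLogo procedures, not the paper's code; the send helper and the (type, payload) message format are assumptions.

```python
from collections import deque

# Hypothetical sketch: each agent owns a FIFO message-queue; a central
# "observer" loop drains the queues and dispatches each message by type.
agents = {i: {"queue": deque(), "value": None} for i in range(3)}

def send(dst, msg):
    agents[dst]["queue"].append(msg)      # messages are (type, payload) pairs

def handle_message(agent_id, msg):
    kind, payload = msg
    if kind == "ok?":                     # a neighbour proposed a value
        agents[agent_id]["value"] = payload
    # a real solver would also treat "nogood" and obsolete messages, etc.

def go():
    cycles = 0
    # quiescence test: stop once every message queue is empty
    while any(a["queue"] for a in agents.values()):
        cycles += 1
        for aid, a in agents.items():
            while a["queue"]:
                handle_message(aid, a["queue"].popleft())
    return cycles

send(1, ("ok?", 42))
print(go(), agents[1]["value"])  # 1 42
```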


breeds [agents-nodes]
breeds [edges]
; nodes = agents; each undirected edge goes from a to b
; edges = link agents that connect two agents-nodes,
;   usually representing a spatial/logical relation between them
globals [Orders done nr-cycles domain-colour-list no-more-messages]
agent-own [Message-queue Current-view MyValue Nogoods Neighbours-list ChildrenA
           ParentA nr-constraintc messages-received-ok messages-received-nogood AgentC-Cost]

__includes ["RBP.nls" "StaticOrders.nls"]
; the modules for generating instances and choosing a static order for the agents are included

to setup                        ; Setup the model for a run, build a constraints graph.
  setup-globals                 ; setup global variables
  setup-patches                 ; initialize the work surface on which the agents move
  setup-turtles                 ; generate the objects of the turtles type that simulate the agents
  setup-random-binary-problems  ; or LoadRBRFile: generate the types of problems used in the
                                ;   evaluation, or load a previously generated and saved instance
  CalculateOrdersMaxCardinality ; selected from the variable-ordering heuristics
  setup-DisCSP                  ; initialize the data structures necessary for the DisCSP algorithm

end

Fig. 2. Templates for agents' definition in DisCSP-Netlogo for the random binary problems

S3. Definition and representation of the user interface.

Concerning the interface part, it can be used for the graphical representation of the DisCSP problem's objects (agents, nodes, queens, robots, obstacles, links, etc.) of the patch type. It is recommended to create an initialization procedure for the display surface where the agents' values will be displayed.

To model the surface of the application, objects of the patches type are used. Depending on the significance of those agents, they are represented on the NetLogo surface. Figure 3 presents the way agents are represented in NetLogo.

S4. Running the DisCSP problems.

The initialization of the application supposes building the agents and their working surface. Usually the working context of the agent, the message queues and the variables that count the effort carried out by the agent are initialized. The working surface of the application should contain NetLogo objects through which the parameters of each problem can be controlled in real time: the number of agents (nodes, robots), the density of the constraint graph, etc. These objects allow the definition and monitoring of each problem's parameters. For launching the simulation, we propose introducing a graphical object of the button type and setting its forever property. That way, the attached code, in the form of a NetLogo procedure (that is applied to each agent), will run con-


(a) NetLogo representation for the graph coloring problem

(b) The 2D square lattice representation for the multi-robot exploration problem

(c) The 3D square lattice representation

(d) A scale-free network with 900 nodes

Fig. 3. Representation of the environment in the case of three problems

tinuously, until the message queues are emptied and the stop command is reached. Another important observation is tied to attaching the graphical button to the observer [10]. This approach yields an implementation with synchronization of the agents' execution. In that case, the observer agent will be the one that initiates the stopping of the DisCSP algorithm


execution (the go procedure is attached to and handled by the observer). These elements lead to a multiagent system with synchronization of the agents' execution. If a system with asynchronous operation is desired, the second method of detection will be used, which supposes another update routine [10], [19]. That new go routine will be attached to a graphical object of the button type which is attached to and handled by the turtle-type agents.

(a) NetLogo's graphical interface (b) NetLogo's code tab

Fig. 4. NetLogo implementation of the ABT with temporary links for the random binary problems, n = 100 agents

Figure 4 shows an implementation of the ABT with temporary links technique for the random binary problems that uses the model presented. The update procedure is attached to and handled by the turtle-type agents (Figure 4). These elements lead to a multiagent system whose agents handle messages asynchronously. Implementation examples for the ABT family, DisDB, DBS and the AWCS family can be downloaded from the website [19].

More details of the implementation of DisCSP/DCOP in NetLogo are not presented here but are available as a tutorial and downloadable software from [19].

2.2 The evaluation of the asynchronous search techniques

Another important thing that can be achieved in NetLogo is the evaluation of the asynchronous algorithms. The model presented within this paper allows the monitoring of various types of metrics:

– the number of messages transmitted during the search: messages-received-ok,messages-received-nogood, messages-received-nogood-obsolete, etc.

– the number of cycles. A cycle consists of the activities that all the agents need in order to read the incoming messages, execute their local calculations and send messages to the corresponding agents. This metric allows the evaluation of the global effort for a certain technique.


– the number of constraints checked. The time complexity can also be evaluated using the total number of constraints verified by each agent. It is a measurement of the global time consumed by the agents involved, and allows the evaluation of the local effort of each agent. The number of constraints verified by each agent can be monitored using a variable belonging to each agent called nr-constraintc.

– the number of non-concurrent constraint checks. This can be monitored by introducing a variable belonging to each agent, called AgentC-Cost, which holds the number of constraint checks concurrent for the agent. This value is sent to the agents to which it is connected. Each agent, when receiving a message that contains a value SenderC-Cost, will update its own monitor AgentC-Cost with the new value.

– the total distance traveled by the robots. This metric is specific to the multi-robot exploration problem [3]. It makes it possible to evaluate whether an algorithm is effective for mobile agents located in an unknown environment.

The models presented allow real-time visualization of metrics. During runtime, using graphic controls, various metrics are displayed and updated in real time after each computing cycle. The metrics' evolution can also be represented graphically using plot-like constructions (the models from [19] include some templates).
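The non-concurrent constraint-check counter described in the list above can be sketched in a few lines (illustrative Python, not the NetLogo code). One assumption is made explicit here: the update on message receipt takes the maximum of the local counter and the received SenderC-Cost, which is the rule commonly used for this metric.

```python
# Sketch of the non-concurrent constraint-check (NCCC) counter.
# Assumed update rule: on receiving a message carrying SenderC-Cost,
# take the max with the local AgentC-Cost; each local constraint
# check then increments the counter by one.
def receive(agent_cost, sender_cost):
    return max(agent_cost, sender_cost)

def check_constraints(agent_cost, n_checks):
    return agent_cost + n_checks

a_cost = 0
a_cost = check_constraints(a_cost, 5)   # agent A performs 5 checks
b_cost = receive(0, a_cost)             # B receives A's message
b_cost = check_constraints(b_cost, 3)   # B performs 3 more checks
print(b_cost)  # 8: the checks on the A -> B chain are non-concurrent
```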

3 Running on a Linux cluster

In this section we present a methodology for running the proposed NetLogo models in a cluster computing environment or on a single machine. We utilize the Java API of NetLogo as well as LoadLeveler. LoadLeveler is a job scheduler written by IBM to control the scheduling of batch jobs. This solution is not restricted to this configuration: it can be used on any cluster with Java support, and it operates with other job schedulers as well (such as Condor).

Such a solution allows running a large number of agents (nodes, variables, robots, queens, etc.). The first tests allowed running as many as 500 agents under the conditions of a high-density constraint graph. The first experiments were done on the InfraGRID cluster from the UVT HPC Centre [21], on 100 computing systems (a hybrid x86 and NVIDIA Tesla based system). InfraGRID is a Linux-only cluster based on a mixture of RedHat Enterprise Linux 6 and CentOS 6. For workload management, job execution is managed at the lowest level by IBM LoadLeveler.

The methodology proposed in the previous section, which uses the GUI, runs on a single computer. Here we present a new solution, without a GUI, that can run on a single computer or on a cluster.

The proposed approach uses the NetLogo model presented previously, runnable without the GUI, with a few modifications. To run the model in this manner, a tool named BehaviorSpace, which comes with NetLogo, is used. BehaviorSpace is a software tool integrated with NetLogo that allows you to perform experiments with models in "headless" mode, that is, from the command line,


without any graphical user interface (GUI). This is useful for automating runson a single machine, and can also be used for running on a cluster.

BehaviorSpace runs a model many times, systematically varying the model's settings and recording the results of each run. Using this tool we develop an experiment that can be run on a single computer (with a small number of agents) or, in headless mode, on a cluster (with a large number of agents).

We will now present the methodology for creating such an experiment [6]. The steps necessary for the implementation of a multiagent system are as follows:
S1. Create a NetLogo model according to the previous model for the asynchronous search techniques and for the types of problems used in the evaluation. To run it on the cluster without a GUI, some adaptations have to be made. First, the NetLogo model must have a procedure called setup to instantiate the model and to prepare the output files. At a minimum it needs the lines of code in Figure 5.

to setup                ; Setup the model for a run, build a constraints graph.
  setup-globals         ; setup global variables
  setup-patches         ; initialize the work surface on which the agents move
  setup-turtles         ; generate the objects of the turtles type that simulate the agents
  setup-random-problem  ; generate the types of problems used in the evaluation
  setup-DisCSP          ; initialize the data structures necessary for the DisCSP algorithm
end

Fig. 5. The Setup Procedure in DisCSP-Netlogo.

Next, all models must also have a go (update) procedure. The go procedure is a bit different from the usual NetLogo program: the wrapper runs the NetLogo program by asking it to loop a certain number of times, and allows the finalizing of the DisCSP algorithm.

Usually, for the DisCSP algorithms, the solution is detected only after a break in the sending of messages (meaning there is no message being transmitted, a state called quiescence). This situation can be detected by checking the message queues, which need to be empty. In such a procedure, which needs to run continuously (until the message queues are emptied), the message queue of each agent is verified (to detect a possible break in message transmission).

The procedure should also allow the management of the messages transmitted by the agents. It needs to call, for each agent, another procedure (called handle-message) used to handle the messages specific to each asynchronous search technique. These two procedures are the most important from the point of view of whether messages are handled asynchronously or synchronously (the way of working that defines the asynchronous techniques).

The first solution for termination detection is based on some of the facilities of NetLogo: the ask-concurrent command, which allows the execution of the computations of each agent, and the existence of the central observer agent. The handling of the communication channels is performed by this central agent. These elements lead to a variant of implementation in which the


to go                            ; The running procedure
  set no-more-messages true
  set nr-cycles nr-cycles + 1
  ask-concurrent agents [
    if (not empty? message-queue) [set no-more-messages false]
  ]
  if (no-more-messages) [
    WriteSolution
    stop
  ]
  ask-concurrent agents [handle-message]
end

Fig. 6. The Go Procedure in DisCSP-Netlogo for the asynchronous search techniques with synchronization of the agents' execution

synchronization of the agents' execution is done. Sample code for the go procedure in the case of asynchronous search techniques can be found in Figure 6.

S2. Create an experiment using BehaviorSpace and parse the NetLogo file into an input XML file (so that it can be run in headless mode, that is, without a GUI). A simple example of the XML file is presented in Figure 7.

<experiments>
  <experiment name="experiment" repetitions="10">
    <setup>setup</setup>
    <go>go-mrp</go>
    <final>WriteMetrics</final>
    <exitCondition>Final</exitCondition>
    <enumeratedValueSet variable="p1-network-connectivity">
      <value value="0.2"/>
    </enumeratedValueSet>
    ...
  </experiment>
</experiments>

Fig. 7. The XML file for the multi-robot exploration problem

To finalize the run and collect the results, the use of a NetLogo reporter and a routine that writes the results is recommended. The run stops when this reporter becomes true.

# use up to 1 GB of RAM; UTF-8 encoding for cross-platform compatibility;
# org.nlogo.headless.Main runs headless rather than through the GUI
java -Xmx1024m -Dfile.encoding=UTF-8 \
     -classpath NetLogo.jar org.nlogo.headless.Main \
     --model NetLogo-Model.nlogo \
     --experiment name-of-experiment

Fig. 8. The script for multiple runs on a cluster for DisCSP

S3. Create a Linux shell script (in sh or bash) that describes the job for LoadLeveler. Once the NetLogo model is completed with the experiment created


with the BehaviorSpace tool, it is time to prepare the system for multiple runs. To do this, first create a script that allows running with no GUI; an example script is presented in Figure 8.
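A job description wrapping the Figure 8 command for LoadLeveler might look like the following sketch. The "# @" directive syntax is LoadLeveler's, but the job name, output file patterns and file name are placeholders, not the actual configuration used on InfraGRID.

```shell
# Hypothetical LoadLeveler job command file (run-netlogo.ll).
# @ job_name = discsp-netlogo
# @ job_type = serial
# @ output   = discsp.$(jobid).out
# @ error    = discsp.$(jobid).err
# @ queue
java -Xmx1024m -Dfile.encoding=UTF-8 \
     -classpath NetLogo.jar org.nlogo.headless.Main \
     --model NetLogo-Model.nlogo \
     --experiment name-of-experiment
```

The file would then be submitted with llsubmit, with one job per parameter combination when sweeping an experiment.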

Fig. 9. Architecture of a multiagent system with synchronization of the agents' execution

This multiagent system's architecture for running on a cluster of computers is presented in Figure 9.

4 Discussion

There are very few platforms for implementing and solving DisCSP problems: DisChoco [17], DCOPolis [12], and FRODO [4]. According to [17], a DisCSP/DCOP platform should have the following features:

- be reliable and modular, so it is easy to personalize and extend;
- be independent from the communication system;
- allow the simulation of multiagent systems on a single machine;
- make it easy to implement a real distributed framework;
- allow the design of agents with local constraint networks.

The solution presented in this paper, based on NetLogo, has these features:

- the modules can be adapted and personalised for each algorithm, and there is a very large community of NetLogo users that can help with development;
- it allows communication between agents without the need to call the communication system directly (it is independent of the network support);


14 Ionel Muscalagiu et al.

- the models allow the simulation of multiagent systems on a single machine, and also on a cluster;
- DisCSP-NetLogo provides a special agent, Observer, that is responsible for simulation initialisation and the control interface. The AgentObserver allows the user to track the operations of a DisCSP algorithm during its execution. There are also four tools (Globals Monitor, Turtle Monitor, Patch Monitor and Link Monitor) that allow monitoring the values of global variables and of the variables associated with the agents;
- there are facilities such as agentsets that allow the implementation of agents that manage several variables;
- manipulating large quantities of information, for example for nogood management, requires the use of databases; using the SQL extension of NetLogo, values can be stored in and accessed from databases;
- NetLogo allows users to write new commands and reporters in Java and use them in their models (using extensions).

5 Conclusions

In this paper we introduce a model, called DisCSP-NetLogo, for the study and evaluation of asynchronous search techniques in NetLogo, using the typical problems employed for evaluation.

We present an open-source solution for the implementation and evaluation of asynchronous search techniques in NetLogo for a large number of agents, a model that can be run on a cluster of computers. Such a tool allows the use of various search techniques so that the most suitable one can be chosen.

In this paper we have developed a methodology to run NetLogo models in a cluster computing environment or on a single machine, varying parameter values and/or the number of robots. We utilize the Java API of NetLogo as well as LoadLeveler. The solution without a GUI can be run on a cluster of computers in the synchronized mode, as opposed to the GUI solution, which runs on a single computer and supports both modes: with synchronization or completely asynchronously.

The open-source solution presented in this paper can be used as an alternative for testing asynchronous search techniques, in parallel with established platforms such as DisChoco, DCOPolis, FRODO, etc.

Future research will include running more sets of experiments with two large families, the ABT family and the AWCS family, applied to the typical evaluation problems (the distributed m-coloring of a randomly generated graph, the multi-robot exploration problem, and random binary CSPs), as well as building modules for integrating MySQL and PostgreSQL databases for nogood management.

References

1. A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286:509–512, 1999.


2. C. Bessiere, I. Brito, A. Maestre, and P. Meseguer. Asynchronous Backtracking without Adding Links: A New Member in the ABT Family. Artificial Intelligence, 161:7–24, 2005.

3. A. Doniec, N. Bouraqadi, M. Defoort, V.-T. Le, and S. Stinckwich. Multi-robot exploration under communication constraint: a DisCSP approach. In 5th National Conference on "Control Architecture of Robots", Douai, France, May 18–19, 2010.

4. T. Léauté, B. Ottens, and R. Szymanek. FRODO 2.0: An open-source framework for distributed constraint optimization. In Proceedings of the IJCAI'09 Distributed Constraint Reasoning Workshop (DCR'09), pages 160–164, Pasadena, California, USA, 2009. Available: http://liawww.epfl.ch/frodo/.

5. S. L. Lytinen and S. F. Railsback. The evolution of agent-based simulation platforms: A review of NetLogo 5.0 and ReLogo. In Proceedings of the Fourth International Symposium on Agent-Based Modeling and Simulation, Vienna, 2012.

6. M. Koehler, B. Tivnan, and S. Upton. Clustered Computing with NetLogo and Repast J: Beyond Chewing Gum and Duct Tape. In Proceedings of the Agent 2005 Conference, Chicago, IL, 2005.

7. A. Meisels. Distributed Search by Constrained Agents: Algorithms, Performance, Communication. Springer Verlag, London, 2008.

8. P. Modi, W.-M. Shen, M. Tambe, and M. Yokoo. ADOPT: Asynchronous distributed constraint optimization with quality guarantees. Artificial Intelligence, 161(1–2):149–180, 2005.

9. P. Monier, S. Piechowiak, and R. Mandiau. A complete algorithm for DisCSP: Distributed Backtracking with Sessions (DBS). In Second International Workshop on Optimisation in Multi-Agent Systems (OptMas), Eighth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2009), Budapest, Hungary, 2009.

10. I. Muscalagiu, H. Jiang, and H. E. Popa. Implementation and evaluation model for the asynchronous techniques: from a synchronously distributed system to an asynchronous distributed system. In Proceedings of the 8th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC 2006), Timisoara, pages 209–216. IEEE Computer Society Press, 2006.

11. A. Petcu. A Class of Algorithms for Distributed Constraint Optimization. PhD Thesis No. 3942, Swiss Federal Institute of Technology (EPFL), Lausanne, 2007.

12. E. A. Sultanik, R. N. Lass, and W. C. Regli. DCOPolis: a framework for simulating and deploying distributed constraint reasoning algorithms. In AAMAS Demos, pages 1667–1668. IFAAMAS, 2008.

13. M. C. Silaghi, D. Sam-Haroud, and B. Faltings. Asynchronous Search with Aggregations. In Proceedings of AAAI'00, pages 917–922, 2000.

14. S. Tisue and U. Wilensky. NetLogo: Design and implementation of a multi-agent modeling environment. In Proceedings of the Agent 2004 Conference, 2004.

15. M. Yokoo, E. H. Durfee, T. Ishida, and K. Kuwabara. The distributed constraint satisfaction problem: formalization and algorithms. IEEE Transactions on Knowledge and Data Engineering, 10(5):673–685, 1998.

16. U. Wilensky. NetLogo. Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, 1999. Available: http://ccl.northwestern.edu/netlogo/.

17. M. Wahbi, R. Ezzahir, C. Bessiere, and E. H. Bouyakhf. DisChoco 2: A Platform for Distributed Constraint Reasoning. In Proceedings of the IJCAI'11 Workshop on Distributed Constraint Reasoning (DCR'11), pages 112–121, Barcelona, 2011.

18. J. Vidal. Fundamentals of Multiagent Systems with NetLogo Examples. Available: http://multiagent.com/p/fundamentals-of-multiagent-systems.html.

19. MAS NetLogo Models-a. Available: http://discsp-netlogo.fih.upt.ro/.

20. MAS NetLogo Models-b. Available: http://jmvidal.cse.sc.edu/netlogomas/.

21. InfraGRID Cluster. Available: http://hpc.uvt.ro/infrastructure/infragrid/.


Applying MaxSum to DCOP MST

Harel Yedidsion and Roie Zivan,
Department of Industrial Engineering and Management,
Ben-Gurion University of the Negev, Beer-Sheva, Israel

{yedidsio,zivanr}@bgu.ac.il

Abstract. The DCOP MST model allows representing and solving problems that involve a team of mobile sensors used for coverage of targets in different environments. Unlike the standard DCOP model, DCOP MST handles dynamic problems in which the sets of alternative assignments for agents and their sets of neighbors derive from their physical locations, which are dynamic. Local (incomplete) search algorithms that were proposed for solving DCOP MST were found to converge fast to a deployment with high coverage quality. However, Max-sum, an inference algorithm that has become popular in recent years and has been applied to a number of realistic applications, was not previously used within the DCOP MST framework. In this paper we describe the modifications needed to adapt the Max-sum algorithm to the DCOP MST framework, specifically the application of techniques geared towards reducing the computation Max-sum performs in each iteration. We compare Max-sum with the existing local search algorithms for solving DCOP MST. Our results show that the Max-sum algorithm outperforms standard DCOP algorithms but is inferior to the local search algorithms that were specifically designed to solve DCOP MST. Furthermore, the complexity of Max-sum prevents it from solving large and dense problems, and from improving when technology offers larger sensing and mobility ranges.

1 Introduction

Some of the most challenging applications of multi-agent systems include a team of mobile agents with sensing abilities that are required to cover a given area to achieve a common goal. Various examples are a network of sensors tracking enemy targets, rescue teams searching for survivors in a disaster area, and teams of unmanned vehicles (UVs) searching an unfamiliar terrain. A common feature of the above applications is that the agents need to perform in a dynamic environment. Zivan et al. [32] proposed a model, DCOP MST, and corresponding algorithms that make a team of mobile sensing agents robust to changes in the problems they face. The DCOP MST model is an extension of the well-known distributed constraint optimization problem (DCOP) model in which agents adjust their location in order to adapt to the dynamically changing environment and the dynamic changes in the quality of information reported by sensors in the team.

Local distributed search algorithms were proposed for solving DCOP MST [32]. These distributed self-adjusting algorithms were based on the Maximum Gain Messages (MGM) algorithm [12, 16] and the Distributed Stochastic Algorithm (DSA) [28]. Both are distributed (message-passing) local search algorithms with quick convergence, an essential property in a dynamic environment.

While the naive implementations of MGM and DSA for DCOP MST were found to be dependent on the ranges of sensing and mobility that the technology of the sensors provides, specifically designed algorithms with advanced exploration methods were found to perform well even with limited technology. However, the local search algorithms proposed in [32] were compared only with other local search algorithms. They were not compared with the popular inference algorithm, Max-sum, which has been applied to sensor network applications in recent studies [22, 21].

Max-sum is an incomplete algorithm that does not follow the standard structure of distributed local search algorithms and has drawn much attention recently [6, 19]. Max-sum is an incomplete GDL algorithm [1]. In contrast to standard local search algorithms, agents in Max-sum do not propagate assignments but rather calculate utilities (or costs) for each possible value assignment of their neighboring agents' variables. The general structure of the algorithm is exploitive, i.e., the agents attempt to compute the best costs/utilities for possible value assignments according to their own problem data and recent information they received via messages from their neighbors. The growing interest in the Max-sum algorithm in recent years included its use for solving DCOPs representing various multi-agent applications, e.g., sensor systems [22, 21] and task allocation for rescue teams in disaster areas [18]. While it was found that Max-sum does not produce high quality solutions on dense problems that include cycles of various sizes [31], in realistic structured scenarios as mentioned above it seems that Max-sum produces high quality solutions. However, this solution quality comes with a price. Many realistic scenarios include k-ary constraints that slow the performance of Max-sum, since the computation performed by Max-sum in each iteration is exponential in the number of agents involved in a constraint [6]. While a number of techniques were proposed in order to reduce the complexity in scenarios that include k-ary constraints [22], the complexity remains exponential in k and thus the use of Max-sum is limited to either very small problems or to problems with limited constraint arity. The rest of this paper is organized as follows: Related work is presented in Section 2. Standard DCOPs and the DCOP MST model are presented in Section 3. Section 4 deals with the adaptation of Max-sum to the DCOP MST model. Section 5 includes an evaluation of the Max-sum algorithm and a detailed analysis of its performance. Conclusions are presented in Section 6.

In this paper we extend the work on DCOP MST by adjusting the Max-sum algorithm to the DCOP MST model and comparing its performance to the previously proposed local search algorithms. Besides applying the complexity reduction techniques proposed in [22], namely reducing domain sizes by a preprocessing procedure and performing a branch and bound algorithm when searching for the best costs a function node can provide for each value in the domain of its neighboring variable nodes, we demonstrate that, in fact, the effective domain size a function node needs to consider when Max-sum is applied to DCOP MST is two. This is because the only information required for the function node (the target) to compute the endured cost is whether the variable (the sensor) covers it or not. Thus, it is enough to select the covering and non-covering values with maximal cost.

We demonstrate that even with a domain size of two, the exponential computation of the Max-sum function node still limits the technology it can use. The comparison to local search algorithms reveals that Max-sum outperforms standard local search algorithms, but is inferior to the algorithms proposed in [32].
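The domain-reduction observation above can be sketched as follows. This is a hedged illustration, not the authors' implementation; the function names and the cost convention are assumptions.

```python
# Sketch: a target (function node) in DCOP_MST only distinguishes a
# sensor's positions by whether they cover it, so a neighboring
# variable's domain can be reduced to two representatives.

def reduce_domain(positions, covers, cost):
    """Keep only the best covering and best non-covering position.

    positions: iterable of candidate positions
    covers:    pos -> bool, True if the position covers the target
    cost:      pos -> float, the value the function node would propagate
    """
    covering = [p for p in positions if covers(p)]
    non_covering = [p for p in positions if not covers(p)]
    pick = lambda ps: max(ps, key=cost) if ps else None
    return pick(covering), pick(non_covering)

# Four candidate positions; positions 1 and 2 cover the target
costs = {1: 5.0, 2: 7.0, 3: 1.0, 4: 9.0}
print(reduce_domain([1, 2, 3, 4], lambda p: p in (1, 2), costs.get))
# (2, 4)
```

Whatever the original domain size, the function node's message computation then only has to consider these two representatives per neighbor.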

2 Related Work

DCOP is a general model for distributed problem solving that has generated significant interest from researchers [12, 15, 17, 28, 4, 8]. A number of studies on DCOPs presented complete algorithms [15, 17, 20, 7]. However, since DCOPs are NP-hard, there has been growing interest in recent years in local (incomplete) DCOP algorithms [16, 28, 30, 24, 25]. Although local search algorithms do not guarantee that the obtained solution is optimal, they are applicable for large problems and are compatible with real-time applications. We note that the study of DCOP algorithms in dynamic environments is in a preliminary stage [13, 10]. In [13], two distributed constraint satisfaction (DisCSP) algorithms were adjusted to solve dynamic problems. This pioneering work was the first to evaluate algorithms according to their performance through time and not only after convergence. On the other hand, the problems on which the algorithms were compared were three-coloring problems that included dynamic constraints but no other dynamic elements. Zivan et al. [32] addressed distributed optimization problems that include more realistic dynamic elements.

A number of papers considered DCOPs for solving static sensor networks; some examples are [3, 24]. In [9, 23], the performance of DCOP local search algorithms when the reward function is uncertain is investigated. This property is related to mobile sensor nets when agents do not know the reward of taking a position (assignment) before they actually take it. In [9], experiments in which DCOP algorithms were used to solve a realistic problem of robots seeking to maximize radio signals were presented.

An alternative approach to solving the mobile sensor placement problem was presented by Stranders et al. [22, 21]. Their approach stems from the Max-sum algorithm. Max-sum, however, is an incomplete inference algorithm and not a search algorithm. The assignment selection is not a part of the algorithm, and assignment selections are not propagated to other agents. In DCOP MST, selection and propagation of assignments are used to determine the content of domains and the constraint network, and therefore must be part of the solving algorithm. In this paper we added assignment selection and propagation to Max-sum, and as a result we needed to make changes in the factor graph with each assignment selection in order to represent the dynamics of DCOP MST.

3 Problem Statement

In this section we formalize the problem confronting mobile sensor teams, then describe the conventional DCOP representation and present our novel DCOP MST formulation.


Fig. 1. DCOP MST example.

A simple example problem that will serve to illustrate the different aspects of the model is depicted in Figure 1.

3.1 Mobile Sensor Teams

The agents A = {A1, A2, . . . , An} in a mobile sensor team are physically situated in the environment, modeled as a metric space with distance function d. The current position of agent Ai is denoted by cur_pos_i. We assume that the locations that can be occupied by agents are a set of discrete points that form a subset of the total environment. These points can either be a discretization of the underlying space or locations that dominate other nearby points in terms of the sensing quality they afford agents located there. In Figure 1, the environment is the Euclidean plane, agents are depicted by small robots, and the possible locations are shown by "X"s.

We assume that time is discretized so that agents compute movements between possible positions. The maximum distance that Ai can travel in a single time step is its mobility range MRi. The mobility range of each agent is shown in Figure 1 by the fainter, outer circle centered on the agent. All "X"s within the circle are locations that the agent can move to in a single time step.

Agents are only able to effectively sense targets within a limited sensing range. Agents may be equipped with different kinds of sensors, resulting in heterogeneous sensing ranges; the sensing range of agent Ai is denoted by SRi. Because of the sensing range constraint, each agent Ai can observe all targets within a distance SRi from cur_pos_i, and cannot observe any target that is farther away. The sensing ranges are depicted in Figure 1 by the inner circle centered at each agent.

Agents may also differ in the quality of their sensing abilities, a property termed their credibility. The credibility of agent Ai is denoted by the positive real number Cred_i, with higher values indicating better sensing ability. We assume that Cred_i is exogenously provided (for instance, calculated by a reputation model) and accurately represents the agent's sensing ability; dealing with inaccurate scores is of interest but beyond the scope of this work. An agent's credibility changes over time due to sensor failures, environmental conditions, and movement of the agent. The credibility of each agent is shown in Figure 1 by the number on the respective sensing range circle.

The individual credibilities of agents sensing the same target are combined using a joint credibility function F : 2^A → R, where 2^A denotes the power set of A. We require that F be monotonic, so that additional sensing agents can only improve the joint credibility. Formally, for two sets S, S′ ⊆ A with S ⊆ S′, we require that F(S) ≤ F(S′).

The targets are represented implicitly by the environmental requirement function ER, which maps each point in the environment to a non-negative real number representing the joint credibility required for that point to be adequately sensed. In this representation, targets are the points p with ER(p) > 0. Because targets may arise, move, or disappear, ER changes dynamically. Moreover, ER can change as the agent team becomes aware of new targets. A major aspect of the mobile sensing team problem is to explore the environment sufficiently to be aware of the presence of targets. In the example presented in Figure 1 there are a number of targets (red/dark points), and the numbers represent their ER values.

Agents within sensing range of a target are said to cover the target, and the remaining coverage requirement of the target, denoted Cur_REQ, is the environmental requirement diminished by the joint credibility of the agents currently covering the target, with a minimum value of 0. Denoting the set of agents within sensing range of a point p by SR(p) = {Ai ∈ A | d(p, cur_pos_i) ≤ SRi}, this is formalized as

Cur_REQ(p) = max{0, ER(p) ⊖ F(SR(p))},

where ⊖ : R × R → R is a binary operator (written in infix notation) that decreases the environmental requirement by the joint credibility. For x, y, z ∈ R with y > z, we require that x ⊖ y < x ⊖ z, so that decreasing the environmental requirement by a higher joint credibility results in a lower remaining coverage requirement.

The global goal of the agents is to position themselves in order to minimize the values of Cur_REQ for all targets. In some cases it may be possible to reduce the values of Cur_REQ to zero for all targets, indicating perfect coverage. However, in other cases this may not be possible, either because of insufficient numbers or quality of agents, or because of the choice of F and ⊖. For these cases we consider two specific objectives. The first is to minimize the sum of remaining coverage requirements over all targets, while the second is to minimize the maximum remaining coverage requirement over all targets. Minimizing either of these objectives is NP-hard [26].
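As a concrete illustration, the remaining-coverage computation can be sketched in Python as follows, assuming a Euclidean environment, the sum joint credibility function, and ordinary subtraction for the decrement operator; all names (Agent, cur_req, ...) are illustrative, not the authors' implementation.

```python
# Sketch of Cur_REQ(p) = max(0, ER(p) - F(SR(p))) under the stated
# assumptions (Euclidean distance, sum credibility, subtraction).
import math
from dataclasses import dataclass

@dataclass
class Agent:
    pos: tuple             # cur_pos_i
    sensing_range: float   # SR_i
    credibility: float     # Cred_i

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def agents_in_range(p, agents):
    """SR(p): agents whose sensing range reaches point p."""
    return [a for a in agents if dist(p, a.pos) <= a.sensing_range]

def cur_req(p, er, agents, F=lambda S: sum(a.credibility for a in S)):
    """Remaining coverage requirement of point p, clipped at zero."""
    return max(0.0, er(p) - F(agents_in_range(p, agents)))

# Example: a target at (0, 0) requiring joint credibility 3;
# only the first agent is close enough to cover it.
agents = [Agent((1, 0), 2.0, 1.0), Agent((0, 5), 2.0, 1.0)]
er = lambda p: 3.0 if p == (0, 0) else 0.0
print(cur_req((0, 0), er, agents))  # 2.0
```

Swapping in another monotone F or another decrement operator only changes the `F` argument and the subtraction in `cur_req`.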

Examples. We now consider three specific choices of F and ⊖ for different applications. The sum joint credibility function simply sums the individual credibilities of the agents:

Fsum(S) = ∑_{Ai ∈ S} Cred_i

This can be used to model applications of tracking targets with simple sensors capable of determining distance but not direction. This requires readings from three different agents to triangulate the position of a target, represented by ER(p) = 3 for each target p, binary credibilities of 1 for functioning sensors and 0 for non-functioning sensors, and choosing ⊖ to be the standard subtraction operator. The sum joint credibility function can also be used to protect against sensor failure, with ER(p) being the desired level of redundancy for target p. This approach can be crudely extended to sensors that may have different failure rates, as reflected by non-binary credibilities.

A more nuanced approach for robustness models sensor failures probabilistically and seeks to guarantee that each target is covered by a working sensor with some minimum target-specific probability. In this case, Cred_i is the probability that the sensor on Ai will not fail, and ER(p) is the minimum desired probability that target p is covered by at least one working sensor. This is represented with the complementary probabilistic joint credibility function

Fcprob(S) = 1 − ∏_{Ai ∈ S} (1 − Cred_i)

to compute the probability that at least one working sensor covers the target, and choosing ⊖ to be the subtraction operator.

A related approach can be used for applications that detect sporadic events with sensors that may give false negatives. The goal is to minimize the probabilities that events occur without being detected. In this case, Cred_i is the probability of an accurate reading from Ai, and ER(p) is the probability that the event occurs at p. This is modeled using the probabilistic joint credibility function

Fprob(S) = ∏_{Ai ∈ S} (1 − Cred_i)

to compute the probability that no sensor detects an event, and choosing ⊖ to be the multiplication operator, so that Cur_REQ(p) is the probability that an event occurs at p without being detected.
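The three example pairings of joint credibility function and decrement operator can be sketched as follows; the function names and the sample numbers are illustrative assumptions, not taken from the paper.

```python
# Sketch of the three (F, decrement) pairings: Fsum with subtraction,
# Fcprob with subtraction, and Fprob with multiplication.
from functools import reduce

def f_sum(creds):
    return sum(creds)

def f_cprob(creds):
    # probability that at least one sensor works, treating each cred
    # as the probability that the sensor does NOT fail
    return 1.0 - reduce(lambda acc, c: acc * (1.0 - c), creds, 1.0)

def f_prob(creds):
    # probability that no sensor detects the event, where each cred is
    # the probability of an accurate reading
    return reduce(lambda acc, c: acc * (1.0 - c), creds, 1.0)

def remaining(er, joint, decrement):
    """Cur_REQ for one target: decrement ER by the joint credibility."""
    return max(0.0, decrement(er, joint))

sub = lambda x, y: x - y
mul = lambda x, y: x * y

print(remaining(3.0, f_sum([1.0, 1.0]), sub))     # 1.0: one reading short
print(remaining(0.95, f_cprob([0.9, 0.9]), sub))  # 0.0: 0.99 >= 0.95
print(remaining(0.5, f_prob([0.8]), mul))         # ~0.1: event undetected
```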

3.2 Distributed Constraint Optimization

Distributed constraint optimization is a general formulation of multiagent coordination problems that has previously been used for static sensor networks and many other applications. A distributed constraint optimization problem (DCOP) is a tuple 〈A, X, D, R〉, where A = {A1, A2, . . . , An} is a finite set of agents, X = {X1, X2, . . . , Xm} is a finite set of variables, D = {D1, D2, . . . , Dm} is the set of finite domains for the variables, and R is a finite set of relations, also called constraints. Each variable Xi is held (or owned) by an agent, who chooses a value to assign to it from the finite set of values Di; each agent may hold multiple variables. Each constraint C ∈ R is a function C : D_{i1} × D_{i2} × . . . × D_{ik} → R+ ∪ {0} that maps assignments of a subset of the variables to a non-negative cost. The cost of a full assignment of values to all variables is computed by aggregating the costs of all constraints. Addition is the aggregation operator most commonly considered, so that the total cost is the sum of the constraint costs, but other operators, such as the maximum, have also been considered [14]. The goal of a DCOP is to find a full assignment with minimum cost.
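A minimal sketch of evaluating a full assignment under this definition, with sum as the aggregation operator (max can be swapped in); the names are illustrative and not a specific DCOP library API.

```python
# Sketch: each constraint maps the values of its scope to a
# non-negative cost; the total cost aggregates the constraint costs.

def total_cost(assignment, constraints, aggregate=sum):
    """assignment: dict var -> value; constraints: list of (scope, fn)
    pairs, where fn maps the scope's values to a non-negative cost."""
    return aggregate(fn(*(assignment[v] for v in scope))
                     for scope, fn in constraints)

# Two variables with one soft "not-equal" constraint costing 1 on equality
constraints = [(("x1", "x2"), lambda a, b: 1 if a == b else 0)]
print(total_cost({"x1": "r", "x2": "r"}, constraints))  # 1
print(total_cost({"x1": "r", "x2": "g"}, constraints))  # 0
```

Passing `aggregate=max` instead of the default models the alternative aggregation operator mentioned above.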


Control in DCOPs is distributed, with agents only able to assign values to variables that they hold. Furthermore, agents are assumed to know only of the constraints involving variables that they hold, thereby distributing knowledge of the structure of the DCOP. In order to coordinate, agents must communicate via message passing. Agents can only communicate with agents who hold variables constrained with their own variables, called their neighbors. While transmission of messages may be delayed, it is assumed that messages are received in the order that they were sent.

3.3 The DCOP MST Model

The DCOP MST model is a dynamic DCOP formulation that exploits the structure of mobile sensor team problems without requiring an explicit model of the dynamics. Instead, the agents consider local changes in their position and react to changes as they occur. The key innovation comes from introducing a type of dynamism not found in the conventional DCOP formalism: dynamic domains. Each agent Ai holds a variable for its position, but with a dynamic domain of all locations within MRi of cur_pos_i; as the agent moves between locations, the domain changes.

Dynamic domains induce a change in the constraints. Because of the restricted domains, not all variables can take values within sensing range of all targets, and hence the constraints need no longer be n-ary. Instead, the constraint Cp for a target p only involves those agents Ai whose domains include a location within SRi of p. As the domains change, the constraints change as well.

A consequence of this is that the set of neighbors for each agent is no longer the full team, and it changes over time as the agents move. In DCOP MST, two agents are considered neighbors if their sensing areas overlap after they both move as much as possible in a single time step toward each other. Denoting the set of neighbors of Ai by cur_nei_i, we formalize this by cur_nei_i = {Aj | d(cur_pos_i, cur_pos_j) ≤ MRi + MRj + SRi + SRj}. Because agents can only communicate with their neighbors, agents in DCOP MST can only communicate with other agents who are physically nearby, which is a more realistic formulation than the conventional DCOP one. As with domains and constraints, neighborhoods in DCOP MST are dynamic.
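The neighborhood test can be written directly from this formula; a hedged sketch assuming a Euclidean plane, with illustrative names:

```python
# Sketch: A_i and A_j are neighbors iff
# d(cur_pos_i, cur_pos_j) <= MR_i + MR_j + SR_i + SR_j.
import math

def is_neighbor(pos_i, mr_i, sr_i, pos_j, mr_j, sr_j):
    d = math.hypot(pos_i[0] - pos_j[0], pos_i[1] - pos_j[1])
    return d <= mr_i + mr_j + sr_i + sr_j

# Two agents 10 apart, each with MR = 2 and SR = 3: 2+2+3+3 = 10,
# so their sensing areas can just meet after one step toward each other.
print(is_neighbor((0, 0), 2, 3, (10, 0), 2, 3))  # True
print(is_neighbor((0, 0), 2, 3, (11, 0), 2, 3))  # False
```

Recomputing this predicate after every move yields the dynamic neighborhoods described above.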

3.4 Adjusting standard local search algorithms for DCOP MST

For lack of space we do not include a detailed description of the implementation of DSA and MGM in DCOP MST.1 We note that standard local search algorithms such as DSA and MGM require that agents be able to compute the best alternative assignment for their variable. In DCOP MST, the best alternative is a position that allows the coverage of the targets with the highest current coverage requirement in range. It is also important to mention that, as self-adjusting algorithms, the algorithms should run infinitely, i.e., after the algorithm converges to a solution it remains active in order to be sensitive to changes [5].

Besides the different method for detecting the best alternative assignment, the major adjustment of the algorithms to the DCOP MST model is to consider a different domain and a different set of neighbors in each iteration. However, because of the myopic nature of these algorithms, this challenge is mostly technical.

1 A detailed description can be found in [32].

3.5 Exploration methods

Classic local search combines exploitation methods, in order to converge to local optima, with exploration methods, in order to escape them [29]. The adjusted MGM MST algorithm is strictly exploitive (monotone). It benefits from quick convergence and avoids costly moves by the sensors. However, once a target is beyond the agents' ranges it remains uncovered. DSA MST has an inherent exploration element. However, agents do not consider assignments that reduce their local benefit, and thus the inherent level of exploration in DSA is not enough to escape local optima in MST problems.

We note that algorithms that implement exploration methods were proposed for standard DCOPs [12, 28, 16]. However, some of the methods that are most effective in standard DCOPs, such as k-optimality [12, 16] and the anytime framework proposed in [30], are not effective for DCOP MST (see [32]).

In order to explore the area for new targets while maintaining coverage of targets that were previously detected, three simple but powerful exploration methods were proposed in [32]. Two are combined with the MGM MST algorithm and one with DSA MST. These three methods change the parameters of the algorithm temporarily in order to escape local minima. This approach was found successful for local search in DisCSPs [2].

1. Periodic Double Mobility Range (PDMR) simply allows an agent to consider points within a larger (double) range than its MR for a small number of iterations. This method assumes that a wider range is possible even though the iteration will take longer. Therefore, the agents consider a wider range only in part of the algorithm's iterations, which repeat periodically (in our experiments, for example, for two iterations out of every five we used 2·MR instead of MR).

2. Periodic Incremented Largest Reduction (PILR) allows agents, in some iterations, to move to a position that results in an increase of the Cur REQ function of up to a constant bound c. More specifically, line 8 of the algorithm is changed in these iterations to:

   8. if (LR + c > 0)

   Again, this reduced condition is only temporary and is applied periodically. This means that for a small number of iterations the importance (coverage requirement) of targets in the area is reduced. Notice that the c parameter defines by how much it is reduced, and thus avoids abandoning targets.

3. DSA PILR is similar to PILR, only here the same approach of a periodically reduced condition is implemented within the DSA algorithm rather than within MGM. More specifically, in the iterations where the condition is reduced, the algorithm performs moves even if the reduction is negative, down to −c.

In all three methods, agents are not expected to leave targets of high importance in order to search for new targets. This is obvious in PDMR since, as in the case of MGM MST, only moves that result in a gain are performed. In the case of PILR and DSA PILR, the c parameter defines the reduced importance of the targets that are already covered. Thus, c is a bound on the increase to the Cur REQ function that the method can create by a single move.
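The periodic schedule shared by the three methods can be sketched as follows. This is a hedged illustration only: the function names are ours, the surrounding MGM/DSA loop is omitted, and the period of five iterations with two exploration iterations simply mirrors the example setup described above.

```python
def effective_params(iteration, mr, c, period=5, explore_iters=2):
    """Return (mobility_range, gain_bound) for this iteration.

    For `explore_iters` out of every `period` iterations the agent
    explores: PDMR doubles the mobility range, while PILR relaxes the
    improvement condition from LR > 0 to LR + c > 0.
    """
    exploring = (iteration % period) < explore_iters
    mobility = 2 * mr if exploring else mr   # PDMR: temporarily 2*MR
    gain_bound = -c if exploring else 0      # PILR: accept LR > -c
    return mobility, gain_bound

def should_move(local_reduction, gain_bound):
    # PILR's relaxed condition: move even at a bounded loss.
    return local_reduction > gain_bound
```

In exploration iterations an agent may thus accept a move whose local reduction is negative, but never worse than −c, which is why highly important targets are not abandoned.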

4 Applying Max-sum to DCOP MST

In contrast to standard local search algorithms, agents in Max-sum do not propagate assignments but rather calculate utilities (or costs) for each possible value assignment of their neighboring agents' variables.2

While assignment selection is not a part of the Max-sum algorithm, the selection of assignments is an inherent element in DCOP MST and it directly affects the structure of the constraint network. In other words, if we try to apply Max-sum to DCOP MST, each assignment selection may yield a different factor graph. Thus, we apply Max-sum to DCOP MST by:

1. Selecting a random assignment (as in any local search algorithm).
2. Generating a factor graph according to the current assignment, where each sensor is a variable node in the factor graph and each target is a function node. A variable node is connected by an edge to a function node iff the distance between them is less than or equal to the sum of the sensor's mobility range and sensing range (MRi + SRi), i.e., the sensor can cover the target after a single move.
3. The agents perform the Max-sum algorithm for a predefined number of iterations.
4. The sensors move to the best position (value assignment) found for them by the algorithm.
5. A new factor graph is generated according to the new assignment selection and the process repeats itself.
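Step 2 above, the regeneration of the factor graph around the current positions, can be sketched as follows. The data structures here are assumptions made for illustration, not the implementation used in our experiments.

```python
import math

def build_factor_graph(sensor_pos, targets, mr, sr):
    """Connect sensor i (variable node) to target t (function node)
    iff dist(pos_i, t) <= MR_i + SR_i, i.e., the sensor can cover
    the target after a single move."""
    edges = {t: [] for t in targets}
    for i, p in sensor_pos.items():
        for t in targets:
            if math.dist(p, t) <= mr[i] + sr[i]:
                edges[t].append(i)
    return edges
```

After the sensors move (step 4), the graph is simply rebuilt from the new positions (step 5), so the edge set always reflects which targets each sensor could cover within one move.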

We note that, in general, we could have allowed the algorithm to consider two or more steps in each iteration (double mobility range). However, this would result in a dense graph, i.e., more sensors in the range of each target, and would make the complexity of the calculations made in each iteration infeasible.

The number of iterations of the Max-sum algorithm performed before an assignment (position) selection must be chosen with care. On the one hand, we would like to allow the information regarding the coverage capabilities of sensors to propagate to other sensors. On the other hand, selecting a large number of iterations can cause a deterioration in the quality of the solution as a result of cycles, as reported in [6, 31]. Furthermore, these iterations of Max-sum constitute only a single iteration of the global multi-iteration deployment algorithm, so we do not want to generate unnecessary delays. In our experiments we found that Max-sum converges very fast, and thus a small number of iterations (5) was enough to get the best performance.

The complexity bottleneck of Max-sum is the generation of messages by the function nodes (targets in the case of DCOP MST). This computation is exponential in the number of agents: in standard Max-sum, the base is the domain size and the exponent is the number of variables involved in the function (the degree of the constraint). A number of papers proposed techniques to reduce the complexity of the calculation required for the generation of messages by the function nodes in Max-sum [22, 11]. We implemented all of the proposed methods in Max-sum MST, the version of Max-sum we adjusted to DCOP MST.

2 For lack of space we do not describe Max-sum. The reader is referred to [6, 31] for a description of the algorithm.

The first technique we implemented was the preprocessing method proposed in [22], which reduces the size of the variables' domains by detecting and eliminating values that are dominated by others. The technique works as follows:

1. A lower bound (LB) and an upper bound (UB) on the utility that can be assigned to a value assignment are calculated for each value assignment in the domain.

2. All dominated assignments are removed from the domain. A dominated value assignment (position in DCOP MST) is one whose UB is lower than the maximal LB in the domain (in a maximization problem).

3. The revised domains are propagated to the functions, which recalculate the costs accordingly and propagate them further.

4. This process is repeated until no more eliminations can be made.
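A minimal sketch of this pruning loop, for a maximization problem, follows. Here `bounds_fn`, which recomputes the bounds after each round of eliminations, stands in for the propagation of step 3 and is an assumption of this illustration.

```python
def prune_dominated(domain, lb, ub):
    """domain: list of values; lb/ub: dicts mapping value -> bound.
    Keep only values whose UB reaches the best LB (maximization)."""
    best_lb = max(lb[v] for v in domain)
    return [v for v in domain if ub[v] >= best_lb]

def prune_to_fixpoint(domain, bounds_fn):
    """Repeat elimination until no more values can be removed.
    bounds_fn(domain) returns the (lb, ub) dicts for the current domain."""
    while True:
        lb, ub = bounds_fn(domain)
        pruned = prune_dominated(domain, lb, ub)
        if len(pruned) == len(domain):
            return pruned
        domain = pruned
```

Each elimination can tighten the bounds seen by other nodes, which is why the process is iterated to a fixed point rather than run once.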

In our experiments, this preprocessing method pruned between 25% and 30% of the value assignments.

The second technique we applied was proposed in [11].3 It reduces to two the effective size of the domains used to calculate the function values. This is done by observing that the only information needed for each value assignment in order to calculate the function value is whether the sensor covers the target at this position or not. Thus, by identifying and considering only the value assignment with the largest utility among the covering positions of a sensor and the value assignment with the largest utility among the non-covering positions, we avoid making redundant calculations.

Thus, if the original domain size was d, the number of neighbors of a target is k, and the first technique pruned 30% off d, then the complexity of calculating a single message for the target is: (k − 1) · 0.7d + 0.7d · 2^(k−1).
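The reduction and the resulting operation count can be illustrated as follows; this is a sketch under the assumption that per-position utilities and coverage flags are available, and the names are ours.

```python
def dual_domain(utilities, covers):
    """utilities: position -> utility; covers: position -> bool.
    Summarize a sensor's whole domain by just two numbers: the best
    utility among covering positions and among non-covering ones
    (either may be None if that class of positions is empty)."""
    cov = [u for v, u in utilities.items() if covers[v]]
    non = [u for v, u in utilities.items() if not covers[v]]
    return (max(cov) if cov else None, max(non) if non else None)

def message_ops(d_pruned, k):
    """Operation count for one function-node message, where d_pruned
    is the domain size after pruning (about 0.7d in our experiments):
    (k-1)*d_pruned to build the dual domains of the other sensors,
    plus d_pruned * 2^(k-1) to enumerate their cover/no-cover combos."""
    return (k - 1) * d_pruned + d_pruned * 2 ** (k - 1)
```

The exponent k − 1 remains, which is why the techniques below still leave the feasible number of neighbors per target small.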

The last technique implemented was the use of branch and bound search to calculate the utility for a value assignment of a neighboring sensor, instead of a naive search through all possible combinations of the dual domains of the other sensors.

It is important to notice that none of the above techniques reduces the exponent of the complexity, and thus the effective number of neighboring sensors that a target can have remains limited (and small).

5 Experimental evaluation

We compared Max-sum MST with the DCOP MST algorithms proposed in [32] by using a simulator representing a mobile sensing agents' team problem. The simulated problem is of an area in which the possible positions form an m × m grid. Each of the points in the area has an ER value between 0 and 100. The mobility and sensing

3 This method is also known as "Fast Max-sum".

Fig. 2. Sum of coverage differences for all targets, as a function of the number of iterations, for different sensors' mobility ranges (100*100 grid, 50 agents, 20 targets, SR = 3, MR = 3–7).

ranges were given in terms of distance on the grid and were varied in our experiments to demonstrate their effect on the success of the algorithms. The credibility of an agent could vary between zero (for an agent with no credibility) and 100 (for an agent with maximal credibility). The method for calculating the joint coverage of the agents within the sensing range of a target is a standard sum of the agents' credibility, and the operator was a standard minus, i.e.:

Cur REQ(p) = max{0, ER(p) − Σ_{Ai ∈ SR(p)} Cred_i}
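This requirement function translates directly into code; the following is a trivial Python rendering of the formula above, with names of our own choosing.

```python
def cur_req(er_p, credibilities_in_range):
    """Remaining (uncovered) requirement of point p.

    er_p: importance ER(p) of point p (0..100).
    credibilities_in_range: credibilities of the agents whose sensing
    range covers p; their sum is the joint coverage of p.
    """
    return max(0, er_p - sum(credibilities_in_range))
```

For example, a target of importance 100 covered by three agents of credibility 30 has a remaining requirement of 10, and is fully covered (Cur REQ = 0) once a fourth such agent arrives.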

The agents' credibility was initially set to 30 in the experiments. These values were chosen so that targets with maximal importance (100) require the cooperation of multiple agents. In addition, this setup allows complete coverage (i.e., Cur REQ = 0) in the optimal case, and thus we can evaluate the success of the proposed algorithms relative to the optimum.

The reputation model used in our experiments was inspired by SPORAS [27]. As in SPORAS, all agents are initiated with similar credibility (or "reputation value" [27])4 and the effect of events on the credibility of agents is relative to their current level of credibility.

All results depicted in this section are an average over 50 runs of each algorithm, on 50 different random problems. The first experiment investigated the relation between the performance of Max-sum and the sensing and mobility ranges of the agents (i.e., technology limitations). This experiment included 50 agents (sensors) and 20 targets deployed randomly in a 100*100 grid. Each target had an importance of 100. Therefore, the sum over all uncovered targets is at most 2000.

In [32], experiments with the same settings, in which the sensing range (SR) was set to 10 and the mobility range (MR) was varied between 3 and 15, demonstrated that the simplest myopic algorithm (MGM) is enough for achieving high-quality coverage when the ranges are high enough. Similar results were obtained for MR = 10 and SR between 3 and 15.

4 In contrast to SPORAS, the initial credibility is not zero, since in MSTs we are not concerned with agents using different pseudonyms.


Fig. 3. Sum of coverage differences for all targets, as a function of the number of iterations, for different sensors' sensing ranges (100*100 grid, 50 agents, 20 targets, MR = 3, SR = 3–7).

Fig. 4. Sum of coverage differences for all targets, as a function of the number of iterations, for the different algorithms (DBA, DSA, MGM, Max-sum, PILR, PDMR, DSA_PILR) and the optimal solution (20*20 grid, 5 agents, 5 targets, SR = MR = 3).

When we tried to repeat this experiment using Max-sum we could not complete it, since the large sensing and mobility ranges generated a large number of neighbors for each target, and the exponential calculation required in each iteration failed to complete in a reasonable time. Thus, in order to demonstrate the effect of increasing ranges on the Max-sum algorithm, we had to select much smaller ranges.

Figure 2 presents the sum of coverage differences for all targets in the area achieved by Max-sum MST when SR = 3 and MR is varied between 3 and 7. There is a consistent improvement in the results as the mobility range increases. However, the results are far from the high-quality results reported for MGM in [32]. We assume that if Max-sum could have been run with larger ranges it would have produced high-quality results as well; however, we were not able to complete the experiments with larger ranges.

Similar results are presented for varying sensing ranges in Figure 3. Here MR was set to 3 and SR varied between 3 and 7.

Figure 4 presents a comparison between Max-sum MST, the standard local search DCOP algorithms that were adjusted to DCOP MST in [32], the explorative algorithms


Fig. 5. Sum of coverage differences for all targets, as a function of the number of iterations, for the different algorithms (100*100 grid, 50 agents, 20 targets, SR = MR = 5).

The times it took to run 10 experiments (in seconds, 100*100 grid, 50 agents, 20 targets) are given in the following tables; running times for larger values exceeded 24 hours.

SR=3     MR=3  MR=4  MR=5  MR=6   MR=7
MaxSum     39   145   504  2060  16290
MGM         8    10    13    15     17

MR=3     SR=3  SR=4  SR=5  SR=6   SR=7
MaxSum     47   269   800  2493  15508
MGM         8    11    19    32     51

Fig. 6. Seconds needed to complete an experiment as a function of the sensors' mobility range (SR = 3) and, analogously, their sensing range (MR = 3).

proposed in [32], and the optimal solution. The scenario selected was small enough both to run Max-sum and to produce the optimal solution using exhaustive search.

The results demonstrate the advantage of Max-sum MST as the algorithm that converges fastest.5 Its final result is also better than that of the standard DCOP algorithms. However, it is inferior to the exploration methods proposed in [32].

Similar results are presented in Figure 5. Here the problem included 50 sensors with SR = MR = 5 and the area was simulated by a 100*100 grid. It is apparent that Max-sum performs better than the other standard DCOP algorithms, both in its fast convergence and in its final result. However, there is a large gap in coverage difference between the local search algorithms with enhanced exploration and all the standard DCOP algorithms, including Max-sum.

Figure 6 demonstrates the exponential growth of runtime as a function of the sensors' mobility range. In this set of experiments all sensors had SR = 3 and the MR was as specified in the figure. It is apparent that while the runtime needed to complete an experiment with Max-sum grows exponentially, the increase in runtime when using MGM is negligible. Similar results were obtained for fixed MR and increasing SR.

5 In fact, Max-sum converges in the smallest number of iterations, but each of its iterations takes much more time to compute than in standard local search.


6 Summary and Conclusions

The DCOP MST model represents dynamic applications of a team of mobile sensing agents that is expected to be robust to changes in the environment in which the sensors operate, changes in the team's tasks, and technology failures.

In previous work, local search algorithms were proposed for DCOP MST. In this paper we extended the study of the model by adjusting the Max-sum algorithm to it. While Max-sum has complexity limitations, some of them were identified in previous work, and various techniques exist to reduce its complexity.

Our results demonstrate that while Max-sum performs well in terms of solution quality and speed of convergence, it is limited in the size of the problems it can be applied to and in the degree of the constraints, i.e., to limited sensing and mobility ranges. On small problems it outperformed standard DCOP local search algorithms, but was outperformed by explorative local search algorithms specifically designed for DCOP MST.

References

1. S. M. Aji and R. J. McEliece. The generalized distributive law. IEEE Transactions on Information Theory, 46(2):325–343, 2000.

2. M. Basharu, I. Arana, and H. Ahriz. Solving coarse-grained DisCSPs with Multi-DisPeL and DisBO-wd. In IAT '07: Proceedings of the 2007 IEEE/WIC/ACM International Conference on Intelligent Agent Technology, pages 335–341, Washington, DC, USA, 2007.

3. R. Bejar, C. Domshlak, C. Fernandez, K. Gomes, B. Krishnamachari, B. Selman, and M. Valls. Sensor networks and distributed CSP: communication, computation and complexity. Artificial Intelligence, 161(1-2):117–148, January 2005.

4. A. Chechetka and K. Sycara. No-commitment branch and bound search for distributed constraint optimization. In AAMAS '06: Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems, pages 1427–1429, New York, NY, USA, 2006.

5. Z. Collin, R. Dechter, and S. Katz. Self-stabilizing distributed constraint satisfaction. Chicago Journal of Theoretical Computer Science, 5, 1999.

6. A. Farinelli, A. Rogers, A. Petcu, and N. R. Jennings. Decentralised coordination of low-power embedded devices using the max-sum algorithm. In 7th International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS-08), pages 639–646, 2008.

7. A. Gershman, A. Meisels, and R. Zivan. Asynchronous forward-bounding for distributed constraints optimization. In Proc. ECAI-06, pages 103–107, August 2006.

8. A. Gershman, A. Meisels, and R. Zivan. Asynchronous forward bounding. Journal of Artificial Intelligence Research, 34:25–46, 2009.

9. M. Jain, M. E. Taylor, M. Yokoo, and M. Tambe. DCOPs meet the real world: Exploring unknown reward matrices with applications to mobile sensor networks. In Proc. Twenty-First International Joint Conference on Artificial Intelligence (IJCAI-09), Pasadena, CA, USA, July 2009.

10. R. N. Lass, E. A. Sultanik, and W. C. Regli. Dynamic distributed constraint reasoning. In AAAI, pages 1466–1469, Chicago, IL, USA, 2008.

11. K. S. Macarthur, R. Stranders, S. D. Ramchurn, and N. R. Jennings. A distributed anytime algorithm for dynamic task allocation in multi-agent systems. In AAAI, 2011.

12. R. T. Maheswaran, J. P. Pearce, and M. Tambe. Distributed algorithms for DCOP: A graphical-game-based approach. In Proc. Parallel and Distributed Computing Systems (PDCS), pages 432–439, September 2004.


13. R. Mailler. Comparing two approaches to dynamic, distributed constraint satisfaction. In AAMAS, pages 1049–1056, Utrecht, Netherlands, 2005.

14. P. J. Modi, W. Shen, M. Tambe, and M. Yokoo. An asynchronous complete method for distributed constraint optimization. In Proc. Auton. Agents and Multi-Agent Sys., 2003.

15. P. J. Modi, W. Shen, M. Tambe, and M. Yokoo. Adopt: asynchronous distributed constraint optimization with quality guarantees. Artificial Intelligence, 161(1-2):149–180, January 2005.

16. J. P. Pearce and M. Tambe. Quality guarantees on k-optimal solutions for distributed constraint optimization problems. In IJCAI, Hyderabad, India, January 2007.

17. A. Petcu and B. Faltings. A scalable method for multiagent constraint optimization. In IJCAI, pages 266–271, 2005.

18. S. D. Ramchurn, A. Farinelli, K. S. Macarthur, and N. R. Jennings. Decentralized coordination in RoboCup Rescue. Comput. J., 53(9):1447–1461, 2010.

19. A. Rogers, A. Farinelli, R. Stranders, and N. R. Jennings. Bounded approximate decentralised coordination via the max-sum algorithm. Artif. Intell., 175(2):730–759, 2011.

20. M. C. Silaghi and M. Yokoo. Nogood based asynchronous distributed optimization (ADOPT-ng). In AAMAS, pages 1389–1396, 2006.

21. R. Stranders, F. M. Delle-Fave, A. Rogers, and N. R. Jennings. A decentralised coordination algorithm for mobile sensors. In AAAI, 2010.

22. R. Stranders, A. Farinelli, A. Rogers, and N. R. Jennings. Decentralised coordination of mobile sensors using the max-sum algorithm. In IJCAI, pages 299–304, 2009.

23. M. E. Taylor, M. Jain, Y. Jin, M. Yokoo, and M. Tambe. When should there be a "me" in "team"?: distributed multi-agent optimization under uncertainty. In Proc. of the 9th Conference on Autonomous Agents and Multi-Agent Systems (AAMAS 2010), pages 109–116, May 2010.

24. W. T. L. Teacy, A. Farinelli, N. J. Grabham, P. Padhy, A. Rogers, and N. R. Jennings. Max-sum decentralised coordination for sensor systems. In AAMAS '08: Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems, pages 1697–1698, 2008.

25. W. Yeoh, X. Sun, and S. Koenig. Trading off solution quality for faster computation in DCOP search algorithms. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 354–360, July 2009.

26. G. Wang, G. Cao, P. Berman, and T. F. La Porta. A bidding protocol for deploying mobile sensors. In Proceedings of IEEE ICNP, pages 315–324, 2003.

27. G. Zacharia, R. Moukas, and P. Maes. Collaborative reputation mechanisms in electronic marketplaces. In HICSS, 1999.

28. W. Zhang, Z. Xing, G. Wang, and L. Wittenburg. Distributed stochastic search and distributed breakout: properties, comparison and applications to constraint optimization problems in sensor networks. Artificial Intelligence, 161(1-2):55–88, January 2005.

29. S. Zilberstein. Using anytime algorithms in intelligent systems. AI Magazine, 17(3):73–83, 1996.

30. R. Zivan. Anytime local search for distributed constraint optimization. In AAAI, pages 393–398, Chicago, IL, USA, 2008.

31. R. Zivan and H. Peled. Max/min-sum distributed constraint optimization through value propagation on an alternating DAG. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS '12), pages 265–272, 2012.

32. R. Zivan, R. Glinton, and K. Sycara. Distributed constraint optimization for large teams of mobile sensing agents. In International Joint Conference on Web Intelligence and Intelligent Agent Technology, pages 347–354, 2009.
