AED: An Anytime Evolutionary DCOP Algorithm
Saaduddin Mahmud, Moumita Choudhury, Md. Mosaddek Khan, Long Tran-Thanh, Nicholas R. Jennings
Saaduddin Mahmud
Department of Computer Science and Engineering, University of Dhaka
[email protected]
Moumita Choudhury
Department of Computer Science and Engineering, University of Dhaka
[email protected]
Md. Mosaddek Khan
Department of Computer Science and Engineering, University of Dhaka
[email protected]
Long Tran-Thanh
School of Electronics and Computer Science, University of Southampton
[email protected]
Nicholas R. Jennings
Departments of Computing and Electrical and Electronic Engineering, Imperial College London
[email protected]
ABSTRACT
Evolutionary optimization is a generic population-based metaheuristic that can be adapted to solve a wide variety of optimization problems and has proven very effective for combinatorial optimization problems. However, the potential of this metaheuristic has not been utilized in Distributed Constraint Optimization Problems (DCOPs), a well-known class of combinatorial optimization problems prevalent in Multi-Agent Systems. In this paper, we present a novel population-based algorithm, Anytime Evolutionary DCOP (AED), that uses evolutionary optimization to solve DCOPs. In AED, the agents cooperatively construct an initial set of random solutions and gradually improve them through a new mechanism that considers an optimistic approximation of local benefits. Moreover, we present a new anytime update mechanism for AED that identifies the best among a distributed set of candidate solutions and notifies all the agents when a new best is found. In our theoretical analysis, we prove that AED is anytime. Finally, we present empirical results indicating that AED outperforms the state-of-the-art DCOP algorithms in terms of solution quality.
KEYWORDS
Distributed Problem Solving, DCOPs
ACM Reference Format:
Saaduddin Mahmud, Moumita Choudhury, Md. Mosaddek Khan, Long Tran-Thanh, and Nicholas R. Jennings. 2020. AED: An Anytime Evolutionary DCOP Algorithm. In Proc. of the 19th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2020), Auckland, New Zealand, May 9–13, 2020, IFAAMAS, 9 pages.
1 INTRODUCTION
Distributed Constraint Optimization Problems (DCOPs) are a widely used framework to model constraint handling problems in cooperative Multi-Agent Systems (MAS). In particular, agents in this framework need to coordinate value assignments to their variables in such a way that minimizes constraint violations by optimizing their aggregated costs [28]. This framework has been applied to various areas of multi-agent coordination, including distributed
meeting scheduling [20], sensor networks [6][3] and smart grids [8].
Over the last two decades, several algorithms have been proposed to solve DCOPs, and they can be broadly classified into two classes: exact and non-exact. The former always provide an optimal solution of a given DCOP. Among the exact algorithms, SyncBB [12], ADOPT [21], DPOP [24], AFB [11], BnB-ADOPT [27], and PT-FB [18] are widely used. Since solving DCOPs optimally is NP-hard, scalability becomes an issue as the system grows. In contrast, non-exact algorithms trade some solution quality for scalability. As a consequence, diverse classes of non-exact algorithms have been developed to deal with large-scale DCOPs. Among them, local search based algorithms are generally the least expensive in terms of computational and communication cost. Some well-known algorithms of this class are DSA [29], MGM & MGM2 [19], and GDBA [22]. In order to further enhance solution quality and incorporate an anytime property in local search based algorithms, the Anytime Local Search (ALS) framework [30] was introduced. Meanwhile, inference based non-exact approaches such as Max-Sum [7][15][14] and Max-Sum_ADVP [31] have also gained attention due to their ability to handle n-ary constraints explicitly and to guarantee optimality on acyclic constraint graph representations of DCOPs. The third class of non-exact approaches is sample-based algorithms (e.g. DUCT [23] and PD-Gibbs [25]), in which the cooperative agents sample the search space in a decentralized manner to solve DCOPs. More recently, a new class of non-exact DCOP algorithms has emerged in the literature through the introduction of a population-based algorithm, ACO_DCOP [2].
ACO_DCOP is derived from a centralized population-based approach called Ant Colony Optimization (ACO) [4]. It has been empirically shown that ACO_DCOP produces solutions of better quality than the aforementioned classes of non-exact DCOP algorithms [2]. It is worth noting that although a wide variety of centralized population-based algorithms exist, ACO is the only such method that has been adapted to solve DCOPs. Among the remaining centralized population-based algorithms, a large portion are considered evolutionary optimization techniques (e.g. Genetic Algorithms [13], Evolutionary Programming [9]). Evolutionary optimization, as a population-based metaheuristic, has proven very effective in solving combinatorial optimization problems such as the Traveling Salesman Problem [10], the Constraint Satisfaction Problem [26], and many others besides. However, no prior work exists that adapts evolutionary optimization techniques to solve DCOPs. The effectiveness of evolutionary optimization techniques in solving combinatorial optimization problems, along with the potential of population-based DCOP solvers demonstrated by ACO_DCOP, motivates us to explore this nascent area.
Against this background, this paper proposes a novel population-based algorithm that uses evolutionary optimization to solve DCOPs. We call this Anytime Evolutionary DCOP (AED). In more detail, AED maintains a set of candidate solutions that are distributed among the agents, and the agents search for new improved solutions by modifying these candidate solutions. This modification is done through a new mechanism that considers an optimistic approximation of local benefits and utilizes the cooperative nature of the agents. Moreover, we introduce a new anytime update mechanism in order to identify the best among this distributed set of candidate solutions and to help the agents coordinate value assignments to their variables based on the best candidate solution.
Our theoretical analysis proves that AED is anytime, and empirical evaluation shows its superior solution quality compared to the state-of-the-art non-exact DCOP algorithms.

2 BACKGROUND
In this section, we first describe DCOPs and evolutionary optimization in more detail. Then, we discuss the challenges that need to be addressed in order to effectively extend evolutionary optimization to the context of DCOPs.
Formally, a DCOP is defined by a tuple ⟨X, D, F, A, δ⟩ [21] where:
• A is a set of agents {a_1, a_2, ..., a_n}.
• X is a set of discrete variables {x_1, x_2, ..., x_m}, which are being controlled by the set of agents A.
• D is a set of discrete and finite variable domains {D_1, D_2, ..., D_m}, where each D_i is a set containing the values which may be assigned to its associated variable x_i.
• F is a set of constraints {f_1, f_2, ..., f_l}, where f_i ∈ F is a function of a subset of variables x^i ⊆ X defining the relationship among the variables in x^i. Thus, the function f_i : ×_{x_j ∈ x^i} D_j → R denotes the cost for each possible assignment of the variables in x^i.
• δ : X → A is a variable-to-agent mapping function [16] which assigns the control of each variable x_i ∈ X to an agent of A. Each variable is controlled by a single agent. However, each agent can hold several variables.
Within the framework, the objective of a DCOP algorithm is to produce X*, a complete assignment that minimizes the aggregated cost of the constraints, as shown in Equation 1 (for a maximization problem, argmin is replaced with argmax):

X* = argmin_X Σ_{i=1}^{l} f_i(x^i)    (1)

For ease of understanding, we assume that each agent controls one variable. Thus, the terms 'variable' and 'agent' are used interchangeably throughout this paper.

[Figure 1: Example DCOP — (a) a constraint graph; (b) the corresponding cost tables.]

Figure 1a illustrates a sample
DCOP using a constraint graph where each node represents an agent a_i ∈ A labelled by the variable x_i ∈ X that it controls, and each edge represents a function f_i ∈ F connecting all x_j ∈ x^i. Figure 1b shows the corresponding cost tables.

Evolutionary optimization is a generic population-based metaheuristic inspired by biological evolutionary mechanisms such as Selection, Reproduction and Migration. The core mechanism of evolutionary optimization techniques can be summarized in three steps. In the first step, an initial population is generated randomly. A population is a set of 'individuals', each of which is a candidate solution of the corresponding optimization problem. Besides, a fitness function is defined to evaluate the quality of an individual with respect to a global objective. The fitness of all the individuals in the initial population is also calculated. In the second step, a subset of the population is selected based on their fitness to reproduce new individuals. This process is known as Selection. In the final step, new individuals are created using the selected subset of the population, and their fitness is evaluated. New individuals then replace a subset of the old individuals. Evolutionary optimization performs the second and third steps iteratively, which results in a gradual improvement in the quality of the individuals. An additional step is performed at regular intervals by some parallel/distributed evolutionary optimization models that concurrently maintain multiple sub-populations instead of a single population. In this step, individuals are exchanged between sub-populations. This process is known as Migration, and the interval is known as the Migration Interval.
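The three-step loop described above can be sketched in a few lines of Python. The objective (a toy quadratic minimized at x = 3), the population size and the Gaussian mutation are illustrative placeholder choices, not part of any DCOP algorithm:

```python
import random

random.seed(0)

def fitness(x):
    # Toy objective: lower is better, minimized at x = 3.
    return (x - 3) ** 2

# Step 1: random initial population of candidate solutions.
population = [random.uniform(-10, 10) for _ in range(20)]

for _ in range(100):
    # Step 2 (Selection): keep the better half of the population as parents.
    population.sort(key=fitness)
    parents = population[:10]
    # Step 3 (Reproduction): mutate parents; offspring replace the worse half.
    offspring = [p + random.gauss(0, 0.5) for p in parents]
    population = parents + offspring

best = min(population, key=fitness)
print(round(best, 2))  # close to 3
```

Because the parents survive each generation, the best fitness in the population never worsens; Migration only arises in the multi-population variants mentioned above.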
We need to address the following challenges in order to develop an effective anytime algorithm that uses evolutionary optimization to solve DCOPs:
• Individual and fitness: We need to define an individual that represents a solution of a DCOP, along with a fitness function to evaluate its quality with respect to Equation 1. We also need to provide a method for calculating this fitness function in a distributed manner.
• Population: We need to provide a strategy to maintain the population collectively among the agents. Although creating an initial random population is a trivial task for centralized problems, we need to find a distributed method to construct an initial random population for a DCOP.
• Reproduction mechanism: In the DCOP framework, information related to the entire problem is not available to any single agent. So it is necessary to design a Reproduction method that can utilize the information available to a single agent along with the cooperative nature of the agents.
• Anytime update mechanism: We need to design an anytime update mechanism that can successfully perform the following tasks: (i) identify the best individual in a population that is distributed among the agents; (ii) notify all the agents when a new best individual is found; (iii) help coordinate the variable assignment decisions based on the best individual in the population.
In the following section, we describe our method that addresses the above challenges.

[Figure 2: BFS Pseudo-Tree.]
3 THE AED ALGORITHM
AED is a synchronous iterative algorithm that consists of two phases: Initialization and Optimization. During the former, agents first order themselves into a pseudo-tree and then initialize the necessary variables and parameters. Finally, they make a random assignment to the variables they control and cooperatively construct the initial population. During the latter phase, agents iteratively improve this initial set of solutions with the cooperation of their neighbours. When an agent detects a better solution, it notifies the other agents. Moreover, all the agents synchronously update their assignments based on the best of the individuals reported to them so far. This results in a monotonic improvement of the global objective. Algorithm 1 shows the pseudo-code for AED. For ease of understanding, we show the processes of initialization and anytime update separately in Procedure 1 and Procedure 2, respectively. Note that the initialization phase addresses the first two of our challenges, while the optimization phase addresses the rest.
The Initialization Phase of AED consists of two parts: pseudo-tree construction and running INIT (Procedure 1), which initializes the population, parameters and variables (Algorithm 1: Lines 1-2). This phase starts by ordering the agents into a pseudo-tree. This ordering serves two purposes: it helps in the construction of the initial population, and it facilitates ANYTIME-UPDATE (Procedure 2) during the optimization phase. Even though either a BFS or a DFS pseudo-tree can be used, AED uses a BFS pseudo-tree (we suggest the construction algorithm described in [1]). This is because it generally produces a pseudo-tree with smaller height, which improves the performance of ANYTIME-UPDATE (see the Theoretical Analysis section for details). Figure 2 shows an example of a BFS pseudo-tree constructed from the constraint graph shown in Figure 1a with x_1 as the root. Here, the height of a pseudo-tree is the length of its longest path (H = 2 in this example); it is calculated at construction time, for instance by utilizing the LAYER information of [1], and is maintained by all the agents.

Algorithm 1: Anytime Evolutionary DCOP
 1  Construct pseudo-tree
 2  Every agent a_i calls INIT()
 3  while stop condition not met, each agent a_i does
 4      P_selected ← Select_rp(|N_i| ∗ ER)
 5      Partition P_selected into equal-size subsets {P_new^n1, ..., P_new^n|Ni|}
 6      for n_j ∈ N_i do
 7          Modify the individuals in P_new^nj by Equations 5, 6, 7, 9
 8          Send message P_new^nj to n_j
 9      for P_received^nj received from n_j ∈ N_i do
10          Modify the individuals in P_received^nj by Equations 8, 9
11          Send P_received^nj back to n_j
12      for n_j ∈ N_i do
13          Receive P_new^nj back from n_j
14      P_ai ← P_ai ∪ P_new
15      B ← argmin_{I ∈ P_ai} I.fitness
16      ANYTIME-UPDATE(B)
17      P_ai ← Select_wrp(|N_i| ∗ ER)
18      if Itr = Itr_M + MI then
19          for n_j ∈ N_i do
20              Send Select_wrp(ER) to n_j
21          for P_received^nj received from n_j ∈ N_i do
22              P_ai ← P_ai ∪ P_received^nj

From this point, N_i refers to the set of neighbours, C_i ⊆ N_i refers to the set of child nodes, and PR_i refers to the parent of an agent a_i in the pseudo-tree. For instance, we can see in Figure 2 that N_2 = {a_1, a_3, a_4}, C_2 = {a_3, a_4} and PR_2 = a_1 for agent a_2. After the pseudo-tree construction, all the agents synchronously call the procedure INIT (Algorithm 1: Line 2). INIT starts by initializing all the parameters and variables to their default values (AED takes a default value for each of the parameters as input; the default values of the variables are discussed later in this section). Then each agent a_i sets its variable x_i to a random value from its domain D_i. Lines 3 to 25 of Procedure 1 describe the initial population construction process. In AED, we define the population P as a set of individuals that are collectively maintained by all the agents, and the local population P_ai ⊆ P as the subset of the population maintained by agent a_i. An individual in AED is represented by a complete assignment of the variables in X, and its fitness is calculated using the fitness function shown in Equation 2. This function calculates the aggregated cost of the constraints yielded by the assignment; hence, optimizing this fitness function results in an optimal solution for the corresponding DCOP.

fitness = Σ_{f_i ∈ F} f_i(x^i)    (2)

Note that a single agent cannot calculate the fitness function on its own. Rather, it is calculated in parts with the cooperation of all the agents during the construction process. Moreover, the fitness value is added to the representation of an individual because it enables an agent to recalculate the fitness, when a new individual is constructed, using only local information.
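The role of the cached fitness value can be illustrated with a small sketch (the two cost tables below are made up, not those of Figure 1b): an individual stores a complete assignment plus its Equation 2 fitness, and an agent can keep that cache consistent after a local change using only the constraints it is involved in:

```python
# Hypothetical symmetric cost tables for constraints (x1,x2) and (x2,x3).
cost = {
    ("x1", "x2"): {(0, 0): 4, (0, 1): 1, (1, 0): 2, (1, 1): 6},
    ("x2", "x3"): {(0, 0): 3, (0, 1): 5, (1, 0): 1, (1, 1): 2},
}

def full_fitness(assign):
    """Equation 2: aggregated cost over all constraints."""
    return sum(t[(assign[a], assign[b])] for (a, b), t in cost.items())

def local_delta(assign, var, new_value):
    """Fitness change from re-assigning `var`, using only incident constraints."""
    delta = 0
    for (a, b), t in cost.items():
        if var in (a, b):
            trial = dict(assign, **{var: new_value})
            delta += t[(trial[a], trial[b])] - t[(assign[a], assign[b])]
    return delta

individual = {"x1": 0, "x2": 0, "x3": 1}
fit = full_fitness(individual)            # cached in the individual (here: 9)
delta = local_delta(individual, "x2", 1)  # only (x1,x2) and (x2,x3) are touched
individual["x2"] = 1
fit += delta
assert fit == full_fitness(individual)    # cache stays consistent
print(fit)  # 3
```

The point of the cache is that `local_delta` never needs cost tables outside the changing agent's neighbourhood, which mirrors why AED stores the fitness inside each individual.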
We take I = {x_1 = d_1, x_2 = d_2, x_3 = d_3, x_4 = d_4, fitness = c} as the generic form of a complete individual from the DCOP shown in Figure 1, where d_i is the value assigned to x_i and c is the aggregated cost of that assignment. We use dot (.) notation to refer to a specific element of an individual; for example, I.x_1 refers to x_1 in I. Additionally, we define a Merge operation on two individuals under construction, I_1 and I_2, as Merge(I_1, I_2). This operation constructs a new individual I_3 by aggregating the assignments of I_1 and I_2 and setting I_3.fitness = I_1.fitness + I_2.fitness. We define an extended Merge operation on two ordered sets of individuals S_1 and S_2 as Merge(S_1, S_2) = {Merge(S_1.I_i, S_2.I_i)}, where I_i is the i-th individual in a set.
At the beginning of the construction process, each agent a_i sets P_ai to a set of empty individuals, that is, individuals with no assignment and fitness set to 0. The size of the initial P_ai is defined by the parameter IN. Then, for each individual I ∈ P_ai, agent a_i makes a random assignment to I.x_i. After that, each agent a_i executes a Merge operation on P_ai with each local population maintained by the agents in N_i (Procedure 1: Lines 2-8). At this point, an individual I ∈ P_ai consists of an assignment of the variables controlled by a_i and by the agents in N_i, with fitness set to zero. For example, an individual of P_a3 assigns x_3 and the variables of the neighbours of a_3, with fitness = 0. The fitness of each individual is then set to the local cost according to its current assignment (Procedure 1: Lines 9-10); the individual from the previous example thus carries the aggregated cost of the constraints incident to a_3. In the next step, each agent a_i executes a Merge operation on P_ai with each local population maintained by the agents in C_i. Then each agent a_i, apart from the root, sends P_ai to PR_i (Procedure 1: Lines 11-18). At the end of this step, the local population maintained by the root consists of complete individuals. However, their fitness is twice its actual value, since each constraint is counted twice.
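The Merge operation and the double counting it produces can be sketched as follows, on a hypothetical three-variable chain rather than the Figure 1 problem (and with plain dictionaries, not the paper's exact data structures). Fitnesses add up under Merge; since both endpoints of every constraint contribute its cost, the root finally halves the total:

```python
def merge(i1, i2):
    """Merge two partial individuals: union of assignments, fitnesses added."""
    assign = {**i1["assign"], **i2["assign"]}
    return {"assign": assign, "fitness": i1["fitness"] + i2["fitness"]}

# Partial individuals held by three agents on a chain x1 - x2 - x3, each
# fitness set to the agent's local cost (sum over its incident constraints).
# Suppose cost(x1,x2) = 2 and cost(x2,x3) = 5 under the chosen assignment.
p1 = {"assign": {"x1": 0}, "fitness": 2}      # a1 sees (x1,x2)
p2 = {"assign": {"x2": 1}, "fitness": 2 + 5}  # a2 sees (x1,x2) and (x2,x3)
p3 = {"assign": {"x3": 0}, "fitness": 5}      # a3 sees (x2,x3)

root = merge(merge(p1, p2), p3)
root["fitness"] //= 2  # every constraint was counted once per endpoint
print(root)  # {'assign': {'x1': 0, 'x2': 1, 'x3': 0}, 'fitness': 7}
```

After the halving, the cached fitness (7 = 2 + 5) equals the true Equation 2 value of the complete assignment.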
Therefore, the root agent at this stage corrects all the fitness values (Procedure 1: Lines 20-21). Finally, the local population of the root agent is distributed through the network so that the agents can initialize their local populations (Procedure 1: Lines 22-25). This concludes the initialization phase; after it, all the agents synchronously start the optimization phase in order to improve this initial population iteratively.
The Optimization Phase of AED consists of five steps, namely Selection, Reproduction, ANYTIME-UPDATE, Reinsertion and Migration. An agent a_i begins an iteration of this phase by selecting individuals from P_ai for the Reproduction step (Algorithm 1: Line 4). Prior to this selection, all the individuals are ranked in (0, R_max] based on their relative fitness in the local population P_ai. The rank R_j of an individual I_j ∈ P_ai is calculated using Equation 3. Here, I_best and I_worst are the individuals with the lowest and highest fitness in P_ai, respectively (for minimization problems, a lower fitness value is better). We define Select_rp(S) as the process of taking a sample with replacement of size S from the population P_ai, based on the probabilities calculated using Equation 4; any individual can be selected more than once. As α increases in Equation 4, the fitness vs. selection probability curve gets steeper. As a consequence, individuals with better fitness get selected more often. In this way, α controls the exploration and exploitation dynamics of the Selection mechanism (see Section 5 for more details).
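The rank-based Select_rp of Equations 3 and 4 can be sketched as below. The exact form of Equation 3 (in particular the +1 terms) is reconstructed here from the surviving probability 0.027 of the paper's worked example, so treat it as an inferred reading rather than a verbatim transcription:

```python
import random

def ranks(fitnesses, r_max):
    """Equation 3 (reconstructed): ranks in (0, R_max]; lower fitness -> higher rank."""
    worst, best = max(fitnesses), min(fitnesses)
    return [r_max * (abs(worst - f) + 1) / (abs(worst - best) + 1) for f in fitnesses]

def selection_probabilities(fitnesses, r_max, alpha):
    """Equation 4: normalized rank^alpha; alpha sharpens selection pressure."""
    r = ranks(fitnesses, r_max)
    total = sum(x ** alpha for x in r)
    return [x ** alpha / total for x in r]

def select_rp(population, fitnesses, size, r_max, alpha, rng=random):
    """Select_rp: sample `size` individuals with replacement by rank probability."""
    probs = selection_probabilities(fitnesses, r_max, alpha)
    return rng.choices(population, weights=probs, k=size)

# The worked example: three individuals with fitness 16, 30, 40 and R_max = 5.
probs = selection_probabilities([16, 30, 40], r_max=5, alpha=1)
print([round(p, 3) for p in probs])  # [0.676, 0.297, 0.027]
```

With α = 3 the same fitnesses give probabilities of roughly 0.921, 0.078 and 0.0001, showing how α shifts the balance from exploration towards exploitation.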
Procedure 1: INIT()
 1  Initialize algorithm parameters IN, ER, R_max, α, β, MI and variables LB, GB, FM, UM
 2  x_i ← a random value from D_i
 3  P_ai ← a set of IN empty individuals
 4  for individual I ∈ P_ai do
 5      I.x_i ← a random value from D_i
 6  Send P_ai to the agents in N_i
 7  for P_nj received from n_j ∈ N_i do
 8      P_ai ← Merge(P_ai, P_nj)
 9  for individual I ∈ P_ai do
10      I.fitness ← Σ_{n_j ∈ N_i} Cost_{i,j}(I.x_i, I.x_j)
11  if |C_i| = 0 then
12      Send P_ai to PR_i
13  else
14      Wait until P_cj received from all c_j ∈ C_i
15      for P_cj received from c_j ∈ C_i do
16          P_ai ← Merge(P_ai, P_cj)
17      if a_i ≠ root then
18          Send P_ai to PR_i
19      else
20          for individual I ∈ P_ai do
21              I.fitness ← I.fitness / 2
22          Send P_ai to all the agents in C_i
23  if P_PRi received from PR_i then
24      P_ai ← P_PRi
25      Send P_ai to all the agents in C_i

For example, assume P_ai consists of three individuals I_1, I_2 and I_3 with fitness 16, 30 and 40, respectively, and R_max =
5. Then Equations 3 and 4 will yield P(I_1) = 0.676, P(I_2) = 0.297, P(I_3) = 0.027 if α = 1, and P(I_1) = 0.921, P(I_2) = 0.078, P(I_3) ≈ 0.0001 if α =
3. During this step, each agent a_i selects |N_i| ∗ ER individuals from P_ai, which we define as P_selected.

R_j = R_max ∗ (|I_worst.fitness − I_j.fitness| + 1) / (|I_worst.fitness − I_best.fitness| + 1)    (3)

P(I_j) = R_j^α / Σ_{I_k ∈ P_ai} R_k^α    (4)

Now, Lines 5 to 11 of Algorithm 1 illustrate our proposed Reproduction mechanism. The agents start this step by partitioning P_selected into |N_i| subsets of size ER. Each subset is then randomly assigned to a unique neighbour; the subset assigned to n_j ∈ N_i is denoted by P_new^nj. An agent a_i creates a new individual from each I ∈ P_new^nj with the cooperation of neighbour n_j. Initially, agent a_i changes the assignment I.x_i by sampling from its domain D_i using Equations 5, 6 and 7. Then, P_new^nj is sent to n_j. Agent n_j updates its assignment of I.x_j for each received individual I ∈ P_new^nj using Equation 8. Additionally, both agents a_i and n_j update the fitness of the individual I by adding δ_i and δ_j to I.fitness, respectively. Here, δ_∗ is calculated using Equation 9, where I.x_∗^new and I.x_∗^old are the new and old values of I.x_∗, respectively.

O_di = Σ_{n_k ∈ N_i \ {n_j}} Cost_{i,k}(d_i, I.x_k) + min_{d_j ∈ D_j} Cost_{i,j}(d_i, d_j)    (5)

W_di = O_max ∗ (|O_worst − O_di| + 1) / (|O_worst − O_best| + 1)    (6)

P(d_i) = W_di^β / Σ_{d_k ∈ D_i} W_dk^β    (7)

I.x_j = argmin_{d_j ∈ D_j} Σ_{n_k ∈ N_j} Cost_{j,k}(d_j, I.x_k)    (8)

δ_∗ = Σ_{n_k ∈ N_∗} [Cost_{∗,k}(I.x_∗^new, I.x_k) − Cost_{∗,k}(I.x_∗^old, I.x_k)]    (9)

For example, an agent a_i of Figure 1 creates a new individual from an individual I with the help of a neighbour n_j, where both domains are {1, 2}. Initially, agent a_i calculates P(1) = 0.90 and P(2) = 0.10 using Equations 5, 6 and 7, and updates I.x_i by sampling from this probability distribution. The fitness is also updated by adding δ_i (= −11). The updated I is then sent to n_j.
Based on Equation 8, the new value of I.x_j should be 1. Agent n_j therefore updates I.x_j along with the fitness, adding δ_j (= −16), and sends I back to a_i; agent a_i thus receives an individual whose fitness has improved by δ_i + δ_j = −27.
To summarize the Reproduction mechanism: each agent a_i picks a neighbour n_j randomly for each I ∈ P_selected. Agent a_i then updates I.x_i by sampling based on the most optimistic cost (i.e. the lowest cost) of the constraint between a_i and n_j, plus the aggregated cost of the remaining local constraints. This cost represents the optimistic local benefit of each domain value. Then n_j sets I.x_j to the value that best complements the optimistic change in I.x_i. The key insight of this mechanism is that it not only takes into account the improvement in fitness that the change in I.x_i brings, but also considers the potential improvement that the change in I.x_j will bring. Moreover, note that the parameter β in Equation 7 plays a similar role to the parameter α in Equation 4 (see Section 5 for details). After the newly constructed individuals are collected back from the neighbours, they are added to P_ai (Algorithm 1: Lines 12-14). Then the best individual B in P_ai is passed to ANYTIME-UPDATE (Algorithm 1: Lines 15-16).
To facilitate the anytime update mechanism, each agent maintains four variables: LB, GB, FM and UM. LB (Local Best) and GB (Global Best) are initialized to empty individuals with fitness set to infinity. FM and UM are initialized to ∅. Additionally, GB is stored with a version tag, and each agent maintains the previous versions of GB having version tags in the range [Itr − H + 1, Itr] (see the Theoretical Analysis section for details). Here, Itr refers to the current iteration number. We use GB_j to refer to the latest version of GB with version tag not exceeding j. Our proposed anytime update mechanism works as follows. Each agent keeps track of two different bests, LB and GB. Whenever the fitness of LB becomes less than that of GB, LB has the potential to be the global best solution.
So it gets reported to the root through the propagation of a Found message up the pseudo-tree. Since the root gets reports from all the agents, it can identify the true global best solution and notify all the agents by propagating an Update message down the pseudo-tree. The root also adds the version tag in the Update message to help coordinate the variable assignment.

Procedure 2: ANYTIME-UPDATE(B)
 1  if B.fitness < LB.fitness then
 2      LB ← B
 3  if LB.fitness < GB_Itr.fitness then
 4      if a_i = root then
 5          GB_Itr ← LB
 6          UM ← {Version: Itr, Individual: LB}
 7      else
 8          FM ← {Individual: LB}
 9  Send Update message UM to the agents in C_i and Found message FM to PR_i
10  FM ← ∅
11  UM ← ∅
12  if Update message M received and M ≠ ∅ then
13      GB_M.Version ← M.Individual
14      LB ← the best of LB and M.Individual
15      UM ← M
16  if Found message M received and M ≠ ∅ and M.Individual.fitness < LB.fitness then
17      LB ← M.Individual
18  if Itr ≥ H then
19      x_i ← GB_{Itr−H+1}.x_i

Now, ANYTIME-UPDATE starts by keeping LB updated with the best individual B in P_ai. In Line 3 of Procedure 2, an agent tries to identify whether LB is a potential global best. If it is, and the identifying agent is the root, it is the true global best and an Update message UM is constructed. If the agent is not the root, it is a potential global best and a Found message FM is constructed (Procedure 2: Lines 4-8). Each agent forwards the message UM to the agents in C_i and the message FM to PR_i. Upon receiving these messages, an agent takes the following actions:
• If an Update message is received, the agent updates both its GB and LB. Additionally, the agent saves the Update message in UM and sends it to all the agents in C_i during the next iteration (Procedure 2: Lines 12-15).
• If a Found message is received and it is better than LB, only LB is updated. If this remains a potential global best, it will be sent to PR_i during the next iteration (Procedure 2: Lines 16-17).
An agent a_i then updates the assignment of x_i using GB_{Itr−H+1} (Procedure 2: Lines 18-19). The agents make decisions based on GB_{Itr−H+1} instead of the potentially newer GB_Itr so that all decisions are made on the same version of GB; GB_{Itr−H+1} will be the same for all the agents, since it takes at most H iterations for an Update message to propagate to all of them. For example, assume an agent at the bottom level of the pseudo-tree in Figure 2 finds a potential best individual I at Itr =
3. Unless it gets replaced by a better individual, it will reach the root a_1 via agent a_2 through Found messages at Itr = 4. Then a_1 constructs an Update message {Version: 5, Individual: I} at Itr = 5. This message will reach all the agents within H iterations, and they will set GB_5 = I. From then on, the agents make their assignment decisions based on GB_{Itr−H+1}, so all of them assign their variables from GB_5 = I, the best individual found at Itr = 3.
The Reinsertion step adds the newly constructed individuals in P_new to P_ai (Algorithm 1: Line 14). After that, each agent a_i updates its P_ai by keeping a sample of size |N_i| ∗ ER and discarding the rest based on fitness (Algorithm 1: Line 17). This sample is taken using Select_wrp(S), which is the same as Select_rp(S) except that the agents sample without replacement, so each individual can be selected at most once. This sampling method keeps the local population P_ai diverse by selecting a unique set of individuals.
Finally, Migration, an essential step of AED, takes place every MI iterations. We sketch this in Lines 18-22 of Algorithm 1. For this step, we define Itr_M as the iteration number at which the last Migration occurred. Migration is a simple process of exchanging individuals among neighbours. In AED, the Reproduction mechanism utilizes local cooperation, so only a subset of the variables of an individual changes at one agent. However, because of Migration, different agents get to change different subsets of variables, as individuals traverse the network through this mechanism. Hence, this step plays an essential role in the optimization process of AED. During this step, an agent a_i selects a sample of size ER using Select_wrp(S) for each n_j ∈ N_i and sends a copy of those individuals to that neighbour. Upon collecting individuals from all the neighbours, an agent a_i adds them to its local population P_ai. This concludes an iteration of the optimization phase, and every step repeats during the subsequent iterations.

4 THEORETICAL ANALYSIS
In this section, we first prove that AED is anytime, that is, the quality of the solutions found by AED improves monotonically. Then we analyze the complexity of AED in terms of communication, computation and memory requirements.

Lemma 4.1.
At iteration i + H, the root agent is aware of the best individual in P at least up to iteration i.

Proof. Suppose the best individual up to iteration i is found at iteration i′ ≤ i by agent a_x at level l′. Afterwards, one of the following two cases occurs at each iteration:
• Case 1: the individual is reported to the parent of the current agent through a Found message.
• Case 2: the individual gets replaced by a better individual on its way to the root, at iteration i∗ > i′ by agent a_y at level l∗.
When only Case 1 occurs, the individual reaches the root at iteration i′ + l′ ≤ i + H (since l′ can be at most H). If Case 2 occurs, the replacing individual reaches the root by iteration i∗ + l∗ = {i∗ − (l′ − l∗)} + {(l′ − l∗) + l∗} = i′ + l′ ≤ i + H, since the message climbs one level per iteration. The same argument applies when the new individual also gets replaced. In either case, at iteration i + H the root will be aware of the best individual in P up to iteration i, or of a better individual in P found at iteration i∗ > i; meaning the root will be aware of the best individual in P at least up to iteration i. □

Lemma 4.2. The variable assignment decisions made by all the agents at iteration i + 2H − 1 yield a global cost equal to the fitness of the best individual in P at least up to iteration i.
Proof. At iteration i + 2H − 1, all the agents make decisions about their variable assignments using GB_{i+H}. However, GB_{i+H} is the best individual known to the root up to iteration i + H. We know from Lemma 4.1 that, at iteration i + H, the root is aware of the best individual in P at least up to iteration i. Hence, the fitness of GB_{i+H} is at least as good as that of the best individual in P up to iteration i. Therefore, at iteration i + 2H − 1, the decisions yield a global cost equal to the fitness of the best individual in P at least up to iteration i. □
AED is anytime.
Proof. From Lemma 4.2, the decisions regarding the variable assignments at iterations i + 2H − 1 and i + 2H − 1 + δ yield global costs equal to the fitness of the best individual in P at least up to iterations i and i + δ (δ ≥ 0), respectively. Now, the fitness of the best individual in P up to iteration i + δ is at most the fitness up to iteration i. So the global cost at iteration i + 2H − 1 + δ is less than or equal to that at iteration i + 2H − 1. As a consequence, the quality of the solution monotonically improves as the number of iterations increases. Hence, AED is anytime. □

We now consider algorithm complexity. Assume n is the number of agents, |N| is the number of neighbours and |D| is the domain size of an agent. In every iteration, an agent sends 2|N| messages during the Reproduction step. Additionally, at most |N| messages are passed for each of the ANYTIME-UPDATE and Migration steps. Now, |N| can be at most n (complete graph). Hence, the total number of messages transmitted per agent during an iteration is O(|N|) = O(n). Since the main component of a message in AED is a set of individuals, the size of a single message can be calculated as the size of an individual multiplied by the number of individuals. During the Reproduction, Migration and ANYTIME-UPDATE steps, at most ER individuals, each of which has size O(n), are sent in a single message. As a result, the size of a single message is O(ER ∗ n). This makes the total message size per agent during an iteration O(ER ∗ n ∗ n) = O(n²).
Before Reproduction, |P_ai| can be at most 2 ∗ ER ∗ |N| (if Migration occurred in the previous iteration), and Reproduction will add ER ∗ |N| individuals. So the memory requirement per agent is O(3 ∗ ER ∗ |N| ∗ n) = O(n²).
Finally, Reproduction using Equations 5, 6, 7, 8 and 9 requires |D_i| ∗ |N| operations, and in total ER ∗ |N| individuals are reproduced during an iteration per agent. Hence, the total computation complexity per agent during an iteration is O(ER ∗ |N| ∗ |D| ∗ |N|) = O(|D| ∗ n²).

In this section, we empirically evaluate the quality of solutions produced by AED compared to six different state-of-the-art DCOP algorithms. We show that AED asymptotically converges to solutions of higher quality than these six state-of-the-art algorithms. We select these algorithms to represent all four classes of non-exact algorithms. Firstly, among the local search algorithms, we pick DSA (type C, P = 0.8; this value of P yielded the best performance in our settings), MGM-2 (with offer probability p = 0.5) and GDBA (N, NM, T; reported to perform best [22]). Secondly, among the inference-based non-exact algorithms, we compare with Max-Sum_ADVP, as it has empirically been shown to perform significantly better than
Figure 3: Comparison of AED and the benchmarking algorithms on sparse configurations of random DCOPs.
Max-Sum [31]. We used the switching-parameter value that yielded the best result between n and 2n, where n is the number of agents. Thirdly, we consider a sampling-based algorithm, namely PD-Gibbs, which is the only such algorithm that is suitable for large-scale DCOPs [25]. Finally, we compare with ACO_DCOP, as it is the only available population-based DCOP algorithm. To evaluate ACO_DCOP, we use the parameter values recommended in [2]. We discuss the parameter settings of AED in detail later in this section. Additionally, we used the ALS framework for non-monotonic algorithms having no anytime update mechanism.

We compare these algorithms on three different benchmarks. We consider random DCOPs for our first benchmark. Specifically, we set the number of agents to 70 and the domain size to 10. We use Erdős–Rényi topology (i.e. random graphs) to generate the constraint graphs, with the edge probability p set to produce sparse graphs and constraint costs selected uniformly at random. Our second benchmark is identical to the first setting, except that p is set to produce dense graphs. For our third benchmark, we consider weighted graph colouring problems with p = 0.05 and constraint violation costs selected uniformly at random. In all three settings, we run all algorithms on 70 independently generated problems and 30 times on each problem. Moreover, for the stopping condition we consider both max-iteration and max-time. For max-iteration, we stop each of the algorithms after the 1000-th iteration. For max-time, we run each algorithm for 4 seconds, 25 seconds and 6 seconds on benchmarks 1, 2 and 3, respectively. In order to conduct these experiments, we use a GCP n2-highcpu-64 instance (64 Intel Skylake vCPUs @ 2.0 GHz and 58 GB RAM), a cloud computing service that is publicly accessible at cloud.google.com. It is worth noting that all differences shown in Figures 3, 4, 5 and Table 1 are statistically significant. For implementing Select_wrp(.), we use the reservoir-sampling algorithm [17]. For performing set operations, we use constant-time polynomial hashing.
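Reference [17] describes reservoir sampling for uniform selection from a stream; since Select_wrp is a weighted-random selection, a weighted variant is needed in practice. The sketch below uses the well-known Efraimidis–Spirakis keying scheme as one way to realise such a one-pass weighted selection; the function name, the 1/(1 + cost) weighting in the usage example, and the data layout are our own illustrative assumptions, not the paper's actual interface.

```python
import heapq
import random

def weighted_reservoir(items, k, weight, rng=random):
    """One-pass selection of k items with probability proportional to
    weight(item): draw u ~ U(0,1) per item and keep the k items with the
    largest key u ** (1 / w) (Efraimidis-Spirakis keying)."""
    heap = []  # min-heap of (key, index, item); the index breaks key ties
    for idx, it in enumerate(items):
        w = weight(it)
        if w <= 0:
            continue  # zero-weight items can never be selected
        key = rng.random() ** (1.0 / w)
        if len(heap) < k:
            heapq.heappush(heap, (key, idx, it))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, idx, it))
    return [it for _, _, it in heap]

# usage: pick 3 individuals, favouring lower cost via an assumed 1/(1+cost) weight
individuals = [("x%d" % i, c) for i, c in enumerate([50, 10, 90, 5, 70])]
chosen = weighted_reservoir(individuals, 3, lambda ind: 1.0 / (1 + ind[1]),
                            rng=random.Random(0))
print(len(chosen))  # 3
```

The heap keeps the running "top k" keys, so the pass is O(|items| log k) in time and O(k) in memory, which matches the streaming setting reservoir sampling is designed for.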
Figure 4: Comparison of AED and the benchmarking algorithms on dense configurations of random DCOPs.
Figure 5: Comparison of AED and the benchmarking algorithms on weighted graph colouring problems.

Both AED and ACO_DCOP keep improving until the end of the run due to their superior capability of exploration. However, it can be observed from Table 1 that AED produces a 1.7% better solution than ACO_DCOP after running for an equal amount of time. In contrast, most of the local search algorithms converge to local optima within 400 iterations (see Figure 3), with GDBA producing the best performance among them. After running for an equal amount of time, AED outperforms GDBA by a 9% margin and DSA by a 14.8% margin. The superiority of AED in this experiment indicates that the Selection method, along with the new Reproduction mechanism based on optimistic local benefit, achieves a better balance between exploration and exploitation. This helps AED to explore until the end of the run and produce solutions of better quality than the state-of-the-art algorithms.

Figure 4 shows a comparison between AED and the other benchmarking algorithms on the dense random DCOP benchmark. It clearly shows the advantage of AED over its competitors: AED outperforms all of the benchmarking algorithms by a notable margin.

Table 1: Comparison of AED and the benchmarking algorithms using Max-Time as the stopping condition.

Algorithm   EXP-1   EXP-2   EXP-3
DSA         6076    56799   781
MGM-2       5775    56780   486
GDBA        5770    56051   310
PD-Gibbs    6021    56985   682
MS_ADVP     5877    56786   625
ACO_DCOP    5380    55735   291
AED         5289    55347   229
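The quoted percentage margins can be reproduced from the EXP-1 column of Table 1, assuming each margin is computed relative to AED's cost (whether the paper rounds or truncates the last digit is our assumption):

```python
# EXP-1 (sparse random DCOPs) costs from Table 1
costs = {"DSA": 6076, "MGM-2": 5775, "GDBA": 5770,
         "PD-Gibbs": 6021, "MS_ADVP": 5877, "ACO_DCOP": 5380}
aed = 5289

# margin of AED over each algorithm, relative to AED's own cost
margins = {alg: 100.0 * (c - aed) / aed for alg, c in costs.items()}
print(round(margins["ACO_DCOP"], 1))  # 1.7
print(round(margins["GDBA"], 1))      # 9.1
print(round(margins["DSA"], 1))       # 14.9
```

Under this reading, the computed values line up with the ~1.7%, 9% and ~14.8% figures discussed in the text.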
Figure 5 shows a comparison between AED and the other benchmarking algorithms on weighted graph colouring problems. In this experiment, AED demonstrates its excellent performance by outperforming the other algorithms by a significant margin. Among the benchmarking algorithms, ACO_DCOP is the closest, but it is still outperformed by AED by a 27% margin. Among the local search algorithms, GDBA is the most competitive, but AED still finds solutions that are 35% better. Finally, AED keeps improving the quality of its solutions until the end of the run.

Figure 6: Performance of AED for different α and β on sparse configurations of random DCOPs.

Now we consider the effects of the different parameters on the benchmarks. Firstly, for all three benchmarks, we set IN, which defines the initial population size, to 50. A small value of IN will affect the exploration of AED. However, beyond 50, it does not have any significant effect on the solution quality. Secondly, for all three benchmarks, we set the migration interval MI to 5. Through the Migration process, individuals get to traverse the network and different agents get to change different variables of an individual. Hence, Migration works as an implicit cooperation mechanism. If the value of MI is set too high, convergence will slow down due to a lack of cooperation. On the other hand, when it is set too low, the population will lack diversity as the different sub-populations will mix too fast. Thirdly, we show the effect of the parameter ER on solution quality and memory requirement in Table 2. ER effectively determines the population size. When it is set too low, exploration will suffer. However, as we increase ER beyond a certain threshold, it does not improve the solution quality by any significant margin. We highlight the values of ER used in the different benchmarks in Table 2. Notice that even with a small value of ER, AED produces high-quality solutions.

Table 2: Solution quality and memory requirement per agent of AED for different values of ER.

       Solution Quality           Memory (KB)
ER     EXP-1   EXP-2   EXP-3     EXP-1   EXP-2   EXP-3
5      5378    55550   275       39      188     53
10     5346    55450   252       69      367     97
20     5316    –       240       129     –       248
50     5285    55310   223       308     1899    445

Finally, we show the effect of α and β in Figure 6. While keeping α constant, as we increase β, both the solution quality and the convergence rate increase up to a threshold. After that, the convergence rate does not change much, but the solution quality starts to suffer. As we increase β, the Reproduction mechanism starts to exploit more than it explores. At the threshold value, the balance between exploitation and exploration becomes optimal; beyond it, exploration starts to suffer. Hence, this phenomenon occurs. In Figure 6, we observe that this β threshold is 5 for Benchmark 1. For Benchmark 2, we have also found this threshold to be 5, and the value is 2 for Benchmark 3. On the other hand, as we increase α, the convergence rate increases but the solution quality decreases. In order to mitigate this problem, we use a variable α. To be precise, in the first 150 iterations we use α = 3; we then use α = 1. Figure 6 shows that α = VAR yields a similar solution quality to α = 1; however, its convergence rate is near that of α = 3.

In this paper, we introduce a novel algorithm called AED that effectively uses evolutionary optimization to solve DCOPs. To incorporate the anytime property in AED, we also present a new anytime update mechanism. In our theoretical evaluation, we prove that AED is anytime. Finally, we present empirical results showing that AED outperforms the state-of-the-art non-exact algorithms on sparse random DCOPs, on dense random DCOPs and, most notably, on weighted graph colouring problems.

This research is partially supported by the ICT Division of the Bangladesh Government and the University Grants Commission of Bangladesh.
REFERENCES
[1] Ziyu Chen, Zhen He, and Chen He. 2017. An improved DPOP algorithm based on breadth first search pseudo-tree for distributed constraint optimization. Applied Intelligence 47 (2017), 607–623.
[2] Ziyu Chen, Tengfei Wu, Yanchen Deng, and Cheng Zhang. 2018. An Ant-Based Algorithm to Solve Distributed Constraint Optimization Problems. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence.
[3] Moumita Choudhury, Saaduddin Mahmud, and Md. Mosaddek Khan. 2019. A Particle Swarm Based Algorithm for Functional Distributed Constraint Optimization Problems. ArXiv abs/1909.06168 (2019).
[4] Marco Dorigo, Mauro Birattari, and Thomas Stützle. 2006. Ant colony optimization: artificial ants as a computational intelligence technique.
[5] Paul Erdős and Alfréd Rényi. 1960. On the evolution of random graphs. Institute of Mathematics, Hungarian Academy of Sciences.
[6] Alessandro Farinelli, Alex Rogers, and Nicholas R. Jennings. 2014. Agent-based decentralised coordination for sensor networks using the max-sum algorithm. Autonomous Agents and Multi-Agent Systems 28 (2014), 337–380.
[7] Alessandro Farinelli, Alex Rogers, Adrian Petcu, and Nicholas R. Jennings. 2008. Decentralised coordination of low-power embedded devices using the max-sum algorithm. In Proceedings of the 7th International Conference on Autonomous Agents and Multiagent Systems.
[8] Ferdinando Fioretto, William Yeoh, Enrico Pontelli, Ye Ma, and Satishkumar J. Ranade. 2017. A Distributed Constraint Optimization (DCOP) Approach to the Economic Dispatch with Demand Response. In Proceedings of the 16th International Conference on Autonomous Agents and Multiagent Systems.
[9] David B. Fogel. 1966. Artificial Intelligence through Simulated Evolution.
[10] David B. Fogel. 1988. An evolutionary approach to the traveling salesman problem. Biological Cybernetics 60 (1988), 139–144.
[11] Amir Gershman, Amnon Meisels, and Roie Zivan. 2009. Asynchronous Forward Bounding for Distributed COPs. Journal of Artificial Intelligence Research 34 (2009), 61–88.
[12] Katsutoshi Hirayama and Makoto Yokoo. 1997. Distributed partial constraint satisfaction problem. In Principles and Practice of Constraint Programming - CP97. 222–236.
[13] John Henry Holland et al. 1975. Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence. MIT Press.
[14] Md. Mosaddek Khan, Long Tran-Thanh, and Nicholas R. Jennings. 2018. A Generic Domain Pruning Technique for GDL-Based DCOP Algorithms in Cooperative Multi-Agent Systems. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems. 1595–1603.
[15] Md. Mosaddek Khan, Long Tran-Thanh, Sarvapali D. Ramchurn, and Nicholas R. Jennings. 2018. Speeding Up GDL-Based Message Passing Algorithms for Large-Scale DCOPs. The Computer Journal 61 (2018), 1639–1666.
[16] Md. Mosaddek Khan, Long Tran-Thanh, William Yeoh, and Nicholas R. Jennings. 2018. A Near-Optimal Node-to-Agent Mapping Heuristic for GDL-Based DCOP Algorithms in Multi-Agent Systems. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems. 1604–1612.
[17] Kim-Hung Li. 1994. Reservoir-Sampling Algorithms of Time Complexity O(n(1 + log(N/n))). ACM Transactions on Mathematical Software 20 (1994), 481–493.
[18] Omer Litov and Amnon Meisels. 2017. Forward bounding on pseudo-trees for DCOPs and ADCOPs. Artificial Intelligence 252 (2017), 83–99.
[19] Rajiv T. Maheswaran, Jonathan P. Pearce, and Milind Tambe. 2004. Distributed Algorithms for DCOP: A Graphical-Game-Based Approach. In Proceedings of the ISCA PDCS. 432–439.
[20] Rajiv T. Maheswaran, Milind Tambe, Emma Bowring, Jonathan P. Pearce, and Pradeep Varakantham. 2004. Taking DCOP to the Real World: Efficient Complete Solutions for Distributed Multi-Event Scheduling. In Proceedings of the 3rd International Conference on Autonomous Agents and Multiagent Systems.
[21] Pragnesh Jay Modi, Wei-Min Shen, Milind Tambe, and Makoto Yokoo. 2005. ADOPT: Asynchronous distributed constraint optimization with quality guarantees. Artificial Intelligence 161 (2005), 149–180.
[22] Steven Okamoto, Roie Zivan, Aviv Nahon, et al. 2016. Distributed Breakout: Beyond Satisfaction. In Proceedings of the 25th International Joint Conference on Artificial Intelligence.
[23] Brammert Ottens, Christos Dimitrakakis, and Boi Faltings. 2012. DUCT: An Upper Confidence Bound Approach to Distributed Constraint Optimization Problems. ACM Transactions on Intelligent Systems and Technology.
[24] Adrian Petcu and Boi Faltings. 2005. A Scalable Method for Multiagent Constraint Optimization. In Proceedings of the 19th International Joint Conference on Artificial Intelligence.
[25] Duc Thien Nguyen, William Yeoh, Hoong Chuin Lau, and Roie Zivan. 2019. Distributed Gibbs: A Linear-Space Sampling-Based DCOP Algorithm. Journal of Artificial Intelligence Research 64 (2019), 705–748.
[26] Edward P. K. Tsang and Terry Warwick. 1990. Applying genetic algorithms to constraint satisfaction optimization problems. In Proceedings of the 9th European Conference on Artificial Intelligence.
[27] William Yeoh, Ariel Felner, and Sven Koenig. 2008. BnB-ADOPT: An Asynchronous Branch-and-Bound DCOP Algorithm. Journal of Artificial Intelligence Research 38 (2008), 85–133.
[28] Makoto Yokoo, Edmund H. Durfee, Toru Ishida, and Kazuhiro Kuwabara. 1998. The distributed constraint satisfaction problem: Formalization and algorithms. IEEE Transactions on Knowledge and Data Engineering 10 (1998), 673–685.
[29] Weixiong Zhang, Guandong Wang, Zhao Xing, and Lars Wittenburg. 2005. Distributed stochastic search and distributed breakout: properties, comparison and applications to constraint optimization problems in sensor networks. Artificial Intelligence 161 (2005), 55–87.
[30] Roie Zivan, Steven Okamoto, and Hilla Peled. 2014. Explorative anytime local search for distributed constraint optimization. Artificial Intelligence 212 (2014), 1–26.
[31] Roie Zivan and Hilla Peled. 2012. Max/min-sum distributed constraint optimization through value propagation on an alternating DAG. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems.