A Generic Approach to Accelerating Belief Propagation based Incomplete Algorithms for DCOPs via A Branch-and-Bound Technique

Ziyu Chen, Xingqiong Jiang, Yanchen Deng∗, Dingding Chen and Zhongshi He
College of Computer Science, Chongqing University, Chongqing, China
{chenziyu, jxq, dingding, zshe}@cqu.edu.cn, [email protected]

Abstract
Belief propagation approaches, such as Max-Sum and its variants, are important methods for solving large-scale Distributed Constraint Optimization Problems (DCOPs). However, for problems with n-ary constraints, these algorithms face a huge challenge since their computational complexity scales exponentially with the number of variables a function holds. In this paper, we present a generic and easy-to-use method based on a branch-and-bound technique to address the issue, called Function Decomposing and State Pruning (FDSP). We theoretically prove that FDSP provides monotonically non-increasing upper bounds and speeds up belief propagation based incomplete DCOP algorithms without affecting solution quality. Our empirical evaluation also indicates that FDSP can prune at least 97% of the search space and effectively accelerate Max-Sum, compared with the state-of-the-art.

Introduction
Distributed Constraint Optimization Problems (DCOPs), which require agents to coordinate their decisions to optimize a global objective, are a fundamental framework for modeling multi-agent coordination in multi-agent systems (Hirayama and Yokoo 1997). Thus, DCOPs are widely deployed in real-world coordination tasks including meeting scheduling (Enembreck and Barths 2012), sensor networks (Farinelli, Rogers, and Jennings 2014), power networks (Fioretto et al. 2017), etc.

Algorithms for DCOPs can be classified into two categories, complete and incomplete, according to whether they guarantee to find the optimal solution. Complete algorithms (Hirayama and Yokoo 1997; Modi et al. 2005; Petcu and Faltings 2005; Yeoh, Felner, and Koenig 2008; Vinyals, Rodriguez-Aguilar, and Cerquides 2009; Gershman, Meisels, and Zivan 2009) can find the optimal solutions but incur exponential communication or computation overheads since DCOPs are NP-hard. In contrast, incomplete algorithms (Maheswaran, Pearce, and Tambe 2004; Arshad and Silaghi 2004; Zhang et al. 2005; Ottens, Dimitrakakis, and Faltings 2012; Nguyen, Yeoh, and Lau 2013;

∗ Corresponding author.
Okamoto, Zivan, and Nahon 2016) trade accuracy for computation time and memory so that they can be applied to large-scale problems. As a kind of incomplete algorithm based on belief propagation, Max-Sum (Farinelli et al. 2008) and its variants (Rogers et al. 2011; Zivan and Peled 2012; Chen, Deng, and Wu 2017) have drawn a lot of attention since they can easily be deployed to any DCOP setting. Moreover, they can explicitly handle n-ary constraints and more variables per agent (Cerquides et al. 2014). In more detail, agents in Max-Sum propagate and accumulate beliefs through the whole factor graph, and each agent only holds its belief about the utility for each possible assignment, continuously updating that belief based on the messages received from its neighbors.

In spite of the many advantages of belief propagation approaches, they suffer from a huge challenge in scalability. Specifically, they perform maximization operations repeatedly to locally accumulate beliefs for the involved variables, given the local utility function and a set of incoming messages. The computational complexity of this step grows exponentially with the constraint arity. In other words, when a constraint function holds n variables and the domain size of each variable is d, Max-Sum needs to perform d^n maximization operations to yield the best assignment for each variable.

To address the issue, two kinds of methods were proposed to improve the scalability of belief propagation approaches. The first kind is the algorithms based on a branch-and-bound technique, including BnB-MS (Stranders et al. 2009) and BnB-FMS (Macarthur et al. 2011), which both consist of a preprocessing phase and a pruning phase. In the preprocessing phase, the two algorithms use localized message-passing to simplify DCOPs. Specifically, BnB-MS reduces the number of moves that each agent needs to consider in coordinating mobile sensors, while BnB-FMS removes tasks that an agent should never perform in dynamic task allocation.
In the pruning phase, both algorithms reduce the search space using a branch-and-bound technique to speed up maximization operations. Unfortunately, these algorithms require a lot of messages to be exchanged in their preprocessing phases. Moreover, the bounds in these algorithms are computed by either brute force or domain-specific knowledge, which limits their applicability.

The second kind of approaches is sorting based, such as G-FBP (Kim and Lesser 2013) and GDP (Khan et al. 2018), which is applicable to all DCOP settings. G-FBP uses partially sorted lists to adapt FBP. Specifically, it selects and sorts the top cd^{n-1} values of the search space, presuming that the maximum value can be found within the selected range. Here, c is a constant. However, G-FBP incurs additional computation once the maximum value cannot be found within the selected range. Different from G-FBP, the main idea of GDP is to explore only the rows that can cover the difference between the sum of the maximal utility of each message and the message utility corresponding to the assignment that produces the largest local utility. Thus, in the preprocessing phase GDP sorts the local utilities of each function-node independently by each assignment of each variable, where V_i denotes the sorted result for assignment i. Then, GDP returns a pruned range [p, q] or [p, q) according to whether q = p − t, where p = max(V_i), q = max{c ∈ V_i : c ≤ p − t} and t = m − b. Here, m is the sum of the maximum values of the messages received from other variable-nodes, and b is the sum of the corresponding message values for the assignment attaining p. However, GDP needs additional time to perform sorting operations in the preprocessing phase.
More importantly, GDP is a one-shot pruning procedure that cannot use the experience learned from the explored assignment combinations to dynamically prune the search space.

Given this background, we are devoted to developing a generic and fast method for belief propagation based on a branch-and-bound technique, called Function Decomposing and State Pruning (FDSP). In more detail, we propose a domain-independent approach based on dynamic programming to effectively evaluate the upper bound of a given partial assignment, which overcomes the aforementioned drawbacks of BnB-MS and BnB-FMS. We further tighten the upper bounds by exploiting the fact that the assignment of the target variable is given. Finally, we prune the search space whenever the upper bound of a partial assignment is no greater than the best lower bound explored so far. The experimental results show the effectiveness of FDSP, which can prune at least 97% of the search space when solving complex problems.

Background
Distributed Constraint Optimization Problems
A distributed constraint optimization problem can be represented by a tuple ⟨A, X, D, F⟩ such that:
• A = {a_1, a_2, ..., a_h} is a set of agents.
• X = {x_1, x_2, ..., x_q} is a set of variables.
• D = {D_1, D_2, ..., D_q} is a set of finite and discrete domains, variable x_i taking an assignment value in D_i.
• F = {F_1, F_2, ..., F_r} is a set of constraints, where each constraint F_k : x^k → R+ denotes how much utility is assigned to each possible combination of assignments of the involved variables x^k ⊆ X.

Thus, a constraint function F_k(x^k) denotes the utility for each possible assignment combination of the variables in x^k, n = |x^k| represents the arity of F_k, and d = |D_i| denotes the domain size of variable x_i. Note that the variables in x^k are ordered according to their own indices, where a variable x_{k,i} is ordered before a variable x_{k,j} if i < j.

[Figure 1: A DCOP instance using a factor graph.]

Given this, the goal of a DCOP is to find the joint variable assignment X* that maximizes a given global objective function. Generally, the objective function is described as the sum over F:

X* = argmax_X Σ_{F_k(x^k) ∈ F, x^k ⊆ X} F_k(x^k)

Max-Sum Algorithm
As a belief propagation approach, Max-Sum is a message-passing inference-based algorithm operating on factor graphs, which comprise variable-nodes and function-nodes. Function-nodes, which represent the constraints in a DCOP, are connected to the variable-nodes they depend on, while variable-nodes, which represent the variables in a DCOP, are connected to the function-nodes they are involved in. As shown in Fig. 1, F_1 and F_2 are two function-nodes and x_1, x_2, x_3, x_4 and x_5 are variable-nodes, where x_1, x_2, x_3 and x_4 are connected to F_1, and x_3, x_4 and x_5 are connected to F_2. Here, F_2 (i.e., F_2(x^2), where x^2 = {x_3, x_4, x_5}) is a 3-ary constraint and F_1 (i.e., F_1(x^1), where x^1 = {x_1, x_2, x_3, x_4}) is a 4-ary one.

In Max-Sum, beliefs are propagated and accumulated through the whole factor graph via the messages exchanged between variable-nodes and function-nodes. The message from a variable-node x_i to a function-node F_k(x^k) is called the query message. It is defined by

Q_{x_i→F_k}(x_i) = α_{ik} + Σ_{F_j ∈ N_i \ F_k} R_{F_j→x_i}(x_i)   (1)

where α_{ik} is a scalar set such that Σ_{x_i} Q_{x_i→F_k}(x_i) = 0, x_i ∈ x^k, and N_i \ F_k is the set of neighbors of x_i except the target function-node F_k. The response message sent from a function-node F_k(x^k) to a variable-node x_i is given by

R_{F_k→x_i}(x_i) = max_{x^k \ x_i} ( F_k(x^k) + Σ_{x_j ∈ x^k \ x_i} Q_{x_j→F_k}(x_j) )   (2)

When a variable-node x_i makes its decision, it first accumulates the belief for each possible assignment from all messages it receives. Then, it selects a value to maximize the total utility. The procedure can be formalized by:

x_i* = argmax_{x_i} Σ_{F_k ∈ N_i} R_{F_k→x_i}(x_i)   (3)
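Eq. (2) is the operation that FDSP later accelerates. As a baseline, it can be computed by plain enumeration over all assignment combinations of the non-target variables. A minimal sketch, where the variable names, utilities and message values are illustrative assumptions rather than the paper's data:

```python
from itertools import product

def response_message(scope, domains, F, queries, target):
    """Naive Eq. (2): for each value of the target variable, maximize the
    local utility plus the incoming query messages over all assignments of
    the remaining variables (d^(n-1) combinations per target value)."""
    others = [v for v in scope if v != target]
    msg = {}
    for tv in domains[target]:
        best = float("-inf")
        for values in product(*(domains[v] for v in others)):
            assign = dict(zip(others, values))
            assign[target] = tv
            full = tuple(assign[v] for v in scope)
            util = F[full] + sum(queries[v][assign[v]] for v in others)
            best = max(best, util)
        msg[tv] = best
    return msg

# Toy ternary function: the utility of a tuple is its enumeration index.
domains = {"x1": "RG", "x2": "RG", "x3": "RG"}
F = {a: i for i, a in enumerate(product("RG", repeat=3))}
queries = {"x1": {"R": 9, "G": 20}, "x2": {"R": 17, "G": 5}}
print(response_message(("x1", "x2", "x3"), domains, F, queries, "x3"))
# {'R': 41, 'G': 42}
```

The enumeration makes the exponential cost explicit; FDSP's contribution is to avoid visiting most of these combinations.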
[Figure 2: The utility matrix and message exchange for F_1. (a) the utility matrix of F_1 over x_1, x_2, x_3, x_4 ∈ {R, G}; (b) the message R_{F_1→x_4}(x_4) and the incoming messages.]

In this paper, we use the term "target variable" to denote the destination variable-node of the outgoing message being computed, and M_{x_{k,i}} represents the message from variable x_{k,i}.

Proposed Method
Motivation
The maximization operation in Eq. (2) is the most computationally expensive operation in Max-Sum. For an n-ary DCOP, the complexity of computing the response message is O(d^n). Take Fig. 2 as an example. Assume that each variable takes a value in {R, G} and the function-node F_1 has received the messages M_{x_1}, M_{x_2} and M_{x_3} from x_1, x_2 and x_3, respectively. Then, F_1 requires d^n = 2^4 = 16 operations to generate the message R_{F_1→x_4}(x_4), since its domain size is d = 2 and its arity is n = 4. Obviously, the complexity of this step grows exponentially as d and n scale up. Therefore, this is a huge challenge for the scalability of belief propagation algorithms.

As mentioned earlier, some efforts have been made to optimize this maximization operation. Nevertheless, the improved algorithms based on a branch-and-bound technique, including BnB-MS and BnB-FMS, require a number of messages to be passed in the preprocessing phase. Moreover, these algorithms were proposed for specific applications, which makes it difficult to apply them directly to general DCOPs. Besides, the algorithms based on sorting, such as G-FBP and GDP, suffer from some drawbacks although they are generic. Specifically, G-FBP cannot guarantee that the maximum value can be found in the selected range, which can lead to a complete traversal of all the possible combinations in the worst case, while GDP requires sorting for each value in the domain of each variable in the preprocessing phase, which makes its use prohibitively expensive. Additionally, GDP cannot use the knowledge learned from the explored combinations to dynamically prune the search space. In other words, GDP is actually a one-shot pruning method. Taking Fig. 2 for example, according to GDP, the local utilities of F_1 are sorted independently by each value of the domain. When computing the pruned range of value G, we have p = max(V_G) = 13 and the base case t = (20 + 17 + 10) − (9 + 17 + 8) = 13. Hence, p − t = 0.
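The threshold computation just described can be checked numerically. The message maxima (20, 17, 10) and the values at the assignment attaining p (9, 17, 8) follow the running example; the variable names are illustrative:

```python
# GDP's one-shot pruning threshold t = m - b: m sums the maxima of the
# incoming messages, b sums their values at the assignment that yields the
# best sorted local utility p.
msg_max = {"x1": 20, "x2": 17, "x3": 10}   # maxima of incoming messages
msg_at_p = {"x1": 9, "x2": 17, "x3": 8}    # message values at the assignment of p
p = 13                                     # best sorted local utility for value G
t = sum(msg_max.values()) - sum(msg_at_p.values())  # 47 - 34 = 13
print(p - t)  # 0: every row with utility above 0 must be explored
```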
Accordingly, GDP returns a fixed pruned range for value G. Obviously, this range contains the entire search space of value G and cannot be reduced in the subsequent search process.

Under such circumstances, we propose a generic, fast and easy-to-use approach based on a branch-and-bound technique, called FDSP, that can use the experience learned from the explored combinations to dynamically prune the search space.

FDSP
FDSP generally consists of two components: estimation, which provides upper bounds, and pruning, which reduces the search space. To provide the optimal upper bound for a partial assignment, the estimation must return the upper bounds for both the local function and the incoming messages. FDSP computes the function estimations in the preprocessing phase, called Function Decomposing (FD), while the message estimations are (re)constructed once the messages have changed. Pruning is implemented by a procedure called State Pruning (SP), which is based on a branch-and-bound technique. That is, the algorithm does not expand any partial assignment whose upper bound is less than the known lower bound. FDSP can easily be applied to any belief propagation based incomplete algorithm to deal with DCOPs with n-ary constraints.

Function Decomposing serves in a preprocessing phase to compute the function estimation for each variable of a function-node F_k(x^k). Given a partial assignment PA|_{x_{k,1}}^{x_{k,i}} to the variables {x_{k,w} ∈ x^k | 1 ≤ w ≤ i}, the upper bound of the local function is the maximization of F_k(x^k) over the remaining unassigned variables. That is,

FunEst_{x_{k,i}}(PA|_{x_{k,1}}^{x_{k,i}}) = max_{z = {x_{k,j} | j > i}} F_k(PA|_{x_{k,1}}^{x_{k,i}}, z)   (4)

Here, FunEst_{x_{k,i}}(·) is the uninformed function estimation for the i-th variable x_{k,i} in x^k, which provides optimistic upper bounds on the utilities of the subsequent search spaces of PA|_{x_{k,1}}^{x_{k,i}}. Macarthur et al. (2011) tried to compute the estimation by brute force, which incurs exponential operations for each partial assignment. They also suggested that domain characteristics can be used to compute the estimation, which has limited generalization and cannot guarantee tightness. In contrast, our proposed FD is a one-shot preprocessing procedure that uses dynamic programming to compute the estimation, which can significantly reduce the computation efforts.
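The dynamic program is formalized in Eqs. (5) and (6) below; the idea can be sketched by tabulating suffix maxima of a toy ternary function. Variable names and utilities here are illustrative assumptions:

```python
from itertools import product

# Toy 3-ary function over (x1, x2, x3) with domain {R, G}: the utility of
# an assignment tuple is just its enumeration index.
F = {a: i for i, a in enumerate(product("RG", repeat=3))}

def uninformed_estimations(F):
    """est[i][prefix]: max of F over all completions of the length-i prefix
    (the uninformed function estimation, cf. Eq. (5))."""
    n = len(next(iter(F)))
    est = {n: dict(F)}  # the estimation for the last variable is F itself
    for i in range(n - 1, 0, -1):
        est[i] = {}
        for assign, util in est[i + 1].items():
            prefix = assign[:i]
            est[i][prefix] = max(util, est[i].get(prefix, float("-inf")))
    return est

def informed_estimations(F, j, value):
    """Same recursion, but with the later variable at index j pinned to
    `value` (the informed function estimation, cf. Eq. (6))."""
    return uninformed_estimations(
        {a: u for a, u in F.items() if a[j] == value})

est = uninformed_estimations(F)
inf_est = informed_estimations(F, 2, "R")  # pin x3 = R
print(est[1][("R",)], inf_est[1][("R",)])  # 3 2
```

Each table entry bounds every completion of its prefix, and pinning a later variable can only tighten the bound, which is exactly the property SP exploits.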
Specifically, the estimations are computed recursively according to Eq. (5):

FunEst_{x_{k,i}} = F_k(x^k) if i = n; max_{x_{k,i+1}} FunEst_{x_{k,i+1}} otherwise   (5)

That is, the estimation for a variable is the maximization of the one for the next variable. Particularly, the estimation for the last variable is the function itself. Note that, compared to the exponential overhead for each partial assignment in BnB-MS, our proposed FD only requires O(d^{i+1}) operations to compute the function estimation for each variable x_{k,i} in the preprocessing phase.

Algorithm 1: Function Decomposing (FD) for each function-node F_k(x^k)
  Function Decompose(F_k(x^k))
    n ← |x^k|
    for i ← n to 1 do
      compute FunEst_{x_{k,i}} by Eq. (5)
    for j ← n to 2 do
      foreach v_{k,j} ∈ D_{x_{k,j}} do
        foreach x_{k,i} with i < j do
          compute FunEst_{x_{k,i}}^{x_{k,j}=v_{k,j}} by Eq. (6)

Figure 3: The sketch of function decomposing

In fact, the uninformed function estimation FunEst_{x_{k,i}} for x_{k,i} could provide a tighter upper bound if we knew the assignment of a variable x_{k,j} such that j > i. In this way, we can compute a tight upper bound even if there are many unassigned variables between the last assigned variable and the target variable in a partial assignment. By considering all the possible assignments of each variable ordered after x_{k,i}, we further tighten the upper bound and propose the informed function estimations. Eq. (6) gives the formal definition of the informed function estimation for x_{k,i} in terms of x_{k,j}, where j > i:

FunEst_{x_{k,i}}^{x_{k,j}=v_{k,j}} = FunEst_{x_{k,j}}(v_{k,j}) if i = j − 1; max_{x_{k,i+1}} FunEst_{x_{k,i+1}}^{x_{k,j}=v_{k,j}} otherwise   (6)

Similar to the uninformed function estimations, the informed function estimations are computed in a recursive fashion by maximizing the estimation for the next variable, and the estimation for the last variable before x_{k,j} is the corresponding uninformed function estimation with the given assignment x_{k,j} = v_{k,j}.

Fig. 3 gives the sketch of FD. The procedure begins by computing the uninformed function estimation for each variable in F_k according to Eq. (5), from the last one to the first one (line 1-3). Then, for every possible assignment of each variable, we compute the informed function estimation for each variable whose index is smaller than the current variable according to Eq. (6) (line 4-6).

Taking Fig.
2 for example, we can compute the uninformed function estimations for variables x_1, x_2, x_3 and x_4 as follows: FunEst_{x_4} = F_1, FunEst_{x_3} = max_{x_4} FunEst_{x_4}, FunEst_{x_2} = max_{x_3} FunEst_{x_3}, and FunEst_{x_1} = max_{x_2} FunEst_{x_2}. These estimations provide the upper bounds for the partial assignments with respect to their variables. Besides, the informed function estimations in terms of x_4 = R are computed as follows: FunEst_{x_3}^{x_4=R} = FunEst_{x_4}(x_4 = R), FunEst_{x_2}^{x_4=R} = max_{x_3} FunEst_{x_3}^{x_4=R}, and FunEst_{x_1}^{x_4=R} = max_{x_2} FunEst_{x_2}^{x_4=R}.

State Pruning is geared towards speeding up the computation of the messages from function-nodes to variable-nodes by branch and bound. That is, when the upper bound
of a partial assignment is no greater than the lower bound, the search space corresponding to the partial assignment will be discarded. Fig. 4 gives the pseudo code of SP.

Algorithm 2: State Pruning (SP) for each function-node F_k(x^k)
  Function FDSP(F_k(x^k), x_{k,t}, M_{x^k \ x_{k,t}})
    n ← |x^k|
    Result ← ∅
    lb ← −∞
    msgChanged ← false
    for i ← 1 to n do
      if M_{x_{k,i}} has changed then
        msgChanged ← true
    if msgChanged = true then
      ∀x_{k,i} ∈ x^k \ x_{k,t}, compute MsgEst_{x_{k,i}} by Eq. (7)
    start ← 1
    if x_{k,start} = x_{k,t} then
      start ← 2
    foreach v_{k,t} ∈ D_{x_{k,t}} do
      Assign_{x_{k,t}} ← v_{k,t}
      util* ← FDSPRec(F_k(x^k), x_{k,start}, x_{k,t}, M_{x^k \ x_{k,t}}, 0)
      Result ← Result ∪ {(v_{k,t}, util*)}
    return Result

  Function FDSPRec(F_k(x^k), x_{k,i}, x_{k,t}, M_{x^k \ x_{k,t}}, msgUtil)
    next ← i + 1
    if x_{k,next} = x_{k,t} then
      next ← next + 1
    util* ← −∞
    foreach v_{k,i} ∈ D_{x_{k,i}} do
      Assign_{x_{k,i}} ← v_{k,i}
      msgUtil_{x_{k,i}} ← msgUtil + M_{x_{k,i}}(v_{k,i})
      compute ub_{x_{k,i}} by Eq. (8)
      if ub_{x_{k,i}} > lb then
        if x_{k,next} ≠ null then
          retUtil ← FDSPRec(F_k(x^k), x_{k,next}, x_{k,t}, M_{x^k \ x_{k,t}}, msgUtil_{x_{k,i}})
          util* ← max(util*, retUtil)
        else
          retUtil ← F_k(Assign) + msgUtil_{x_{k,i}}
          util* ← max(util*, retUtil)
          lb ← max(lb, util*)
    return util*

Figure 4: The sketch of state pruning

The algorithm begins with calculating the message estimation for each variable x_{k,i} ∈ x^k \ x_{k,t}, which gives the maximal message utility with respect to all non-target variables after it, given that these variables are unassigned, according to Eq. (7) (line 7-15):

MsgEst_{x_{k,i}} = Σ_{j>i ∧ j≠t} max(M_{x_{k,j}})   (7)

Here, x_{k,t} is the target variable and MsgEst_{x_{k,i}} denotes the upper bound on the messages received from the variables after x_{k,i}, except x_{k,t}. In order to reduce unnecessary computation, F_k recomputes the message estimations for each variable ordered before x_{k,i} only when the message from x_{k,i} changes. Besides, instead of directly computing the message estimations according to Eq. (7), F_k further reduces the computation efforts by recursively backing up the maximal message utilities from the last non-target variable to the first one. That is, the message estimation of a variable is the sum of the maximal message utility and the message estimation of the non-target variable next to it. Consider the function-node F_1 in Fig. 2(b).
When we are computing the message for x_4, the message estimations for x_1, x_2 and x_3 are computed as follows:

MsgEst_{x_3} = 0
MsgEst_{x_2} = MsgEst_{x_3} + max M_{x_3} = 10
MsgEst_{x_1} = MsgEst_{x_2} + max M_{x_2} = 10 + 17 = 27

Then, F_k computes the maximum utility util* of each assignment of the target variable x_{k,t} in D_{x_{k,t}} (line 16-22). Specifically, F_k assigns the value v_{k,t} to x_{k,t} according to the order of values in D_{x_{k,t}} (line 19). Thus, the current partial joint state is Assign = {∅, ..., v_{k,t}, ..., ∅}, where ∅ represents an unassigned variable (line 20). After that, FDSPRec is called for the variable x_{k,start}, which is the first unassigned variable, to recursively expand the search space (line 21). Finally, F_k stores util* in Result when the util* for the current assignment v_{k,t} is returned (line 22). The procedure (line 20-22) repeats until all the assignments of x_{k,t} have been visited.

In FDSPRec, F_k first finds x_{k,next}, the unassigned variable next to x_{k,i} (line 24-26). Note that x_{k,t} is an assigned variable. Then, F_k decides whether to expand the search space or to update the maximum utility and the lower bound (line 28-39). In more detail, F_k expands the search space by appending the assignment v_{k,i} of x_{k,i} to the partial joint state (line 29). Then, F_k computes the utility contributed by the incoming messages (i.e., msgUtil_{x_{k,i}}) with respect to the current Assign by summing the accumulated msgUtil with the entry of M_{x_{k,i}} for the assignment v_{k,i} (line 30). Then, F_k computes the current upper bound ub_{x_{k,i}} according to Eq. (8) (line 31).
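The backward accumulation above can be sketched as follows. The message maxima (20, 17, 10) mirror the running example with x_4 as the target; this is a sketch, not the paper's implementation:

```python
# MsgEst for each variable: the sum of the maxima of the messages from the
# later non-target variables (Eq. 7), accumulated from the back in O(n).
def message_estimations(scope, target, msg_max):
    est, acc = {}, 0
    for v in reversed(scope):
        if v == target:
            continue
        est[v] = acc        # bound on the messages of variables after v
        acc += msg_max[v]
    return est

print(message_estimations(("x1", "x2", "x3", "x4"), "x4",
                          {"x1": 20, "x2": 17, "x3": 10}))
# {'x3': 0, 'x2': 10, 'x1': 27}, matching MsgEst_{x3}, MsgEst_{x2}, MsgEst_{x1}
```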
ub_{x_{k,i}} = msgUtil_{x_{k,i}} + MsgEst_{x_{k,i}} + FunEst_{x_{k,i}}(Assign|_{x_{k,1}}^{x_{k,i}}) if i > t; msgUtil_{x_{k,i}} + MsgEst_{x_{k,i}} + FunEst_{x_{k,i}}^{x_{k,t}=v_{k,t}}(Assign|_{x_{k,1}}^{x_{k,i}}) if i < t   (8)

Specifically, if i > t (i.e., x_{k,i} is after x_{k,t}), which means the assignments of all variables before x_{k,t} have been given, the current upper bound of the local function is provided by the uninformed function estimation of variable x_{k,i}. Otherwise, the upper bound is computed by the informed function estimation. In other words, the informed function estimation is used to compute a tight upper bound whenever it is applicable.

Next, F_k decides whether to expand the search space according to the lower bound (line 32-39). If the upper bound ub_{x_{k,i}} is greater than the current lower bound lb and x_{k,i} is not the last variable, the algorithm proceeds by calling the recursive function FDSPRec to expand the search space (line 33-35). Otherwise, the search space corresponding to Assign can be discarded. If x_{k,i} is the last non-target variable, i.e., the search space has been fully expanded, F_k computes the current utility retUtil of the complete assignment by adding the local utility F_k(Assign) and msgUtil_{x_{k,i}}. Then, F_k updates the maximum utility util* and the lower bound lb (line 36-39). Finally, when all the assignments of x_{k,i} have been visited, the algorithm returns util* (line 40).
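Putting the pieces together, the depth-first expansion with the Eq. (8) bound can be sketched as below. For brevity the estimations are recomputed naively inside the function (FDSP precomputes them in the preprocessing phase), and all data are toy values rather than the paper's:

```python
from itertools import product

def fdsp_message_entry(scope, domains, F, msgs, target, target_value):
    """Best utility over completions with `target` fixed, pruning partial
    assignments whose upper bound cannot beat the best found so far."""
    others = [v for v in scope if v != target]

    def fun_est(assign):
        # max of F over all full assignments consistent with `assign`
        best = float("-inf")
        for values in product(*(domains[v] for v in scope)):
            full = dict(zip(scope, values))
            if all(full[v] == a for v, a in assign.items()):
                best = max(best, F[tuple(full[v] for v in scope)])
        return best

    def msg_est(i):
        # max utility still attainable from the unassigned variables' messages
        return sum(max(msgs[v].values()) for v in others[i:])

    best = float("-inf")

    def rec(i, assign, msg_util):
        nonlocal best
        if i == len(others):  # fully assigned: update the lower bound
            best = max(best, F[tuple(assign[v] for v in scope)] + msg_util)
            return
        v = others[i]
        for value in domains[v]:
            assign[v] = value
            ub = msg_util + msgs[v][value] + msg_est(i + 1) + fun_est(assign)
            if ub > best:  # expand only when the bound can beat lb (Eq. 8)
                rec(i + 1, assign, msg_util + msgs[v][value])
        del assign[v]

    rec(0, {target: target_value}, 0)
    return best

domains = {"x1": "RG", "x2": "RG", "x3": "RG"}
F = {a: i for i, a in enumerate(product("RG", repeat=3))}
msgs = {"x1": {"R": 9, "G": 20}, "x2": {"R": 17, "G": 5}}
print(fdsp_message_entry(("x1", "x2", "x3"), domains, F, msgs, "x3", "R"))  # 41
```

The result agrees with the naive enumeration of Eq. (2), as Theorem 1 below guarantees, while subtrees whose bound cannot exceed the running lower bound are never entered.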
[Figure 5: Calculating R_{F_1→x_4}(x_4 = R) using FDSP.]

Fig. 5 shows an example of calculating the message from function F_1 to variable x_4 when x_4 = R (i.e., R_{F_1→x_4}(x_4 = R)) in Fig. 2, where the circled numbers represent the trace of SP. Since x_4 is fixed to assignment R, F_1 needs to compute the maximum utility util* by extending the partial assignment Assign = {∅, ∅, ∅, R}. Firstly, F_1 visits the first assignment R of x_1 and computes ub_{x_1} = 9 + 27 + 26 = 62 by Eq. (8). Then, it expands Assign = {R, ∅, ∅, R} by visiting the first assignment R of x_2 since ub_{x_1} > lb (= −∞). Similarly, F_1 expands Assign = {R, R, ∅, R}. At this point, since ub_{x_3} > lb and Assign is fully assigned, F_1 computes the utility retUtil of Assign = {R, R, R, R}: retUtil = F_1(Assign) + msgUtil_{x_3} = 4 + (9 + 17 + 8) = 38, and updates util* = 38 and lb = 38. Next, F_1 visits the next assignment G of x_3. Similarly, F_1 computes the current upper bound ub_{x_3} = 62 and the current utility retUtil = 62 corresponding to Assign = {R, R, G, R}, and updates lb and util*. After that, F_1 visits the second assignment G of x_2 since all assignments of x_3 have been exhausted. F_1 computes ub_{x_2} = 32, which is less than lb, so Assign = {R, G, ∅, R} is discarded. Similarly, Assign = {G, ∅, ∅, R} is also discarded. Finally, F_1 finds the maximum utility util* = 62, i.e., R_{F_1→x_4}(x_4 = R) = 62.

As seen from this example, FDSP prunes at least 75% of the search space when computing the message from the function-node F_1 to the variable-node x_4, where d = 2 and n = 4.

Theoretical Analysis
In this section, we theoretically prove that FDSP speeds up belief propagation based incomplete algorithms without affecting solution quality, i.e., FDSP provides monotonically non-increasing upper bounds and never prunes the optimal assignment with the maximum utility util*.

Lemma 1.
For a function-node F_k(x^k) and a given partial assignment PA with (x_{k,t} = v_{k,t}) in which (x_{k,i} = v_{k,i}) is the last non-target entry, the upper bound of any direct subsequent partial assignment PA′ = PA ∪ (x_{k,j} = v_{k,j}) is at least as low as that of PA, where x_{k,j} is the variable next to x_{k,i} such that j ≠ t.

Proof. Recall that the upper bound of a given partial assignment is computed according to either the uninformed function estimation or the informed function estimation, depending on the index of the target variable. Thus, three cases need to be discussed: 1) all the upper bounds are computed according to the uninformed function estimations; 2) the upper bound of PA is computed according to the informed function estimation, while that of PA′ is computed according to the uninformed function estimation; 3) all the upper bounds are computed according to the informed function estimations. Here, we only give the proof for case 2) (i.e., i + 1 = t, t + 1 = j) due to the limited space. A similar analysis applies to cases 1) and 3).

ub_{x_{k,i}}(PA) = FunEst_{x_{k,i}}^{x_{k,t}=v_{k,t}}(PA) + msgUtil_{x_{k,i}} + MsgEst_{x_{k,i}}
  = FunEst_{x_{k,t}}(PA) + Σ_{l≤i} M_{x_{k,l}}(v_{k,l}) + Σ_{l>i ∧ l≠t} max(M_{x_{k,l}})
  ≥ FunEst_{x_{k,j}}(PA′) + Σ_{l≤i} M_{x_{k,l}}(v_{k,l}) + Σ_{l>i ∧ l≠t} max(M_{x_{k,l}})
  ≥ FunEst_{x_{k,j}}(PA′) + Σ_{l≤j ∧ l≠t} M_{x_{k,l}}(v_{k,l}) + Σ_{l>j} max(M_{x_{k,l}})
  = ub_{x_{k,j}}(PA′)

Here, the second step holds since i = t − 1; thus, according to Eq. (6), we have FunEst_{x_{k,i}}^{x_{k,t}=v_{k,t}}(PA) = FunEst_{x_{k,t}}(PA). Besides, the third and fourth steps hold since FunEst_{x_{k,t}}(PA) = max_{x_{k,j}} FunEst_{x_{k,j}}(PA, x_{k,j}) ≥ FunEst_{x_{k,j}}(PA′) (Eq. (5)) and max(M_{x_{k,j}}) ≥ M_{x_{k,j}}(v_{k,j}), respectively. Thus the lemma is proved.

Theorem 1.
FDSP does not affect the optimality of Eq. (2).

Proof. We prove this by contradiction. For a function-node F_k(x^k), assume that the optimal assignment of Eq. (2) is Assign*, that the corresponding utility value is val(Assign*), and that FDSP has missed that assignment. Then there must exist a partial assignment PA ⊂ Assign* such that ub(PA) < lb ≤ val(Assign*). According to Lemma 1, the upper bound is monotonically non-increasing, i.e., ub(PA) ≥ ub(Assign*). Note that

ub_{x_{k,n}}(Assign*) = FunEst_{x_{k,n}}(Assign*) + msgUtil_{x_{k,n}} + MsgEst_{x_{k,n}}
  = F_k(Assign*) + Σ_{l≤n ∧ l≠t} M_{x_{k,l}}(v_{k,l})
  = val(Assign*)

Here, n = |x^k|. Thus, we have ub(PA) ≥ ub(Assign*) = val(Assign*), which contradicts the assumption. Therefore, the upper bound of a partial assignment cannot be less than the value of any subsequent full assignment, and the optimality is hereby guaranteed.

Complexity Analysis
Each variable x_{k,i} needs to compute and store one uninformed function estimation and d(n − i) informed function estimations in the preprocessing phase. Thus, each variable requires O([1 + d(n − i)]d^{i+1}) time and O([1 + d(n − i)]d^i) space, where the value of i becomes smaller as FD proceeds. Hence, FDSP only incurs a small overhead in the preprocessing phase.

Besides, since each function-node needs to explore the search space with respect to the target variable, the time complexity in the worst case is O(d^n). However, with SP, only a small part of the search space needs to be explored. Therefore, the overall overhead is small. Our empirical evaluation also verifies that FDSP only requires little time to run.

Empirical Evaluation
We empirically evaluate the performance of FDSP and GDP, both applied to Max-Sum, on four configurations of n-ary random DCOPs. Since BnB-MS and BnB-FMS are not generic algorithms and G-FBP is inferior to GDP (Khan et al. 2018), we do not include them for comparison. The complexity of an n-ary DCOP can be quantified by the number of function-nodes, the average/maximal arity and the domain size (Kim and Lesser 2013; Khan et al. 2018). In addition to these parameters, we also find that the number of variable-nodes can affect the complexity. Intuitively, given the number of function-nodes and the average arity per function-node, the graph density is actually determined by the number of variable-nodes. Therefore, we introduce a new parameter called variable tightness (denoted as var_T) to depict the complexity from another perspective, which is defined as follows:

var_T = 1 − (number of variable-nodes) / (total number of arities)

It can be concluded that, given the function-node number and the total arity number, the number of variable-nodes decreases as var_T increases, which generates a denser and more complex problem since each variable-node has to connect to more function-nodes.

For each configuration other than the first one, we generate sparse factor graphs and dense factor graphs by randomly selecting var_T from [0.1, 0.5] and (0.5, 0.9], respectively. In the first configuration, we set the number of function-nodes to 100 and the minimal arity to 2, and uniformly select the costs, the domain size and the maximal arity from [1, 100], [2, 10] and [2, 7], respectively, while var_T varies from 0.1 to 0.9. In the second configuration, we vary the maximal arity from 2 to 7. In the third configuration, we vary the number of function-nodes from 10 to 100. In the last one, we set the number of function-nodes to 50 and vary the domain size from 2 to 7. Also, we benchmark Max-Sum_ADVP+FDSP to demonstrate the generalization of FDSP.
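The var_T formula can be illustrated with a factor graph shaped like Fig. 1 (5 variable-nodes shared by a 4-ary and a 3-ary function-node; the concrete structure is an assumption drawn from the earlier example):

```python
# var_T = 1 - (#variable-nodes / total number of arities).
def var_tightness(num_variable_nodes, arities):
    return 1 - num_variable_nodes / sum(arities)

print(round(var_tightness(5, [4, 3]), 4))  # 0.2857: a rather sparse graph
```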
To guarantee that Max-Sum_ADVP converges, we alternate its directions every 100 iterations. All the omitted parameters except var_T in each configuration are the same as the ones in the first configuration. For each setting, we generate 25 random instances and the results are averaged over all instances. The algorithms terminate after 200 iterations for each instance.

Fig. 6 gives the comparison under different var_T. It can be observed that FDSP clearly outperforms GDP under different var_T, and the gap widens as var_T grows. Concretely, FDSP can prune at least 97% of the search space while GDP only prunes at most 87% of the search space when computing Eq. (2). Moreover, FDSP performs similarly as var_T grows, which indicates that FDSP is less sensitive to the complexity of problems. On the other hand, the performance of GDP decreases as var_T increases, and GDP performs poorly when solving problems with large var_T. This is because the sum of the differences between the maximal value of each message and the value corresponding to the maximal local utility increases when the graph density increases as var_T grows. As a result, GDP provides a large pruned range and thus prunes only a small proportion of the search space.

[Figure 6: Performance comparison on different var_T.]

Fig. 7 shows the performance comparison on different maximal arities. It can be concluded that FDSP outperforms GDP in both sparse and dense factor graphs, especially in dense factor graphs. FDSP prunes around 60%-99% of the search space in both sparse and dense factor graphs, while GDP can only prune at most 80% and 36%, respectively. That is because FDSP provides tighter bounds, which make Max-Sum explore fewer combinations.

[Figure 7: Performance comparison on different arities.]

Fig. 8 presents the results under different numbers of function-nodes. Similar to the first configuration, FDSP prunes at least 97% of the search space in both sparse and dense factor graphs, while GDP can only prune at most 88% and 55% of the search space in sparse and dense factor graphs, respectively. This is because GDP is a one-shot pruning procedure that cannot use the experience learned from the explored assignment combinations to dynamically prune the search space.

Fig. 9 gives the runtime under different domain sizes. It can be seen that our FDSP exhibits great superiority over GDP and Max-Sum_ADVP when solving problems with large domain sizes, which indicates that FDSP scales well and only requires few computation efforts. GDP would
10 20 30 40 50 60 70 80 90 100Function-node Number0102030405060708090100 % o f S e a r c h Sp a c e P r un e d FDSP-Sparse SettingsGDP-Sparse SettingsFDSP-Dense SettingsGDP-Dense Settings
Figure 8: Performance comparison on different function-node numbers R un T i m e ( s ) FDSP-Sparse SettingsGDP-Sparse SettingsMax-Sum_ADVP-Sparse SettingsMax-Sum_ADVP+FDSP-Sparse SettingsFDSP-Dense SettingsGDP-Dense SettingsMax-Sum_ADVP-Dense SettingsMax-Sum_ADVP+FDSP-Dense Settings
Figure 9: Runtime on different domain sizesperform even worse in practice since the runtime presentedin Fig. 9 actually does not take sorting, which is quite expen-sive when the domain size is large, into consideration. Be-sides, one can easily observe that Max-Sum ADVP+FDSPis superior to Max-Sum ADVP when solving the problemswith large domain sizes in sparse and dense factor graphs,which indicates that FDSP can also effectively accelerate thevariants of Max-Sum.
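To illustrate the kind of branch-and-bound state pruning evaluated above, the following minimal sketch (our own illustration, not the authors' implementation) searches the assignment combinations of a utility function depth-first and skips every partial assignment whose optimistic upper bound cannot beat the best complete assignment found so far:

```python
def bnb_max_assignment(domains, utility, upper_bound):
    """Find the assignment maximizing utility(assignment) via branch and bound.

    domains: list of per-variable domains, one per variable.
    utility: maps a complete assignment (tuple) to a number.
    upper_bound: maps a partial assignment (tuple) to an optimistic
        estimate of its best completion; it must never underestimate,
        so pruning never discards the maximum-utility assignment.
    """
    best_value = float("-inf")
    best_assignment = None

    def search(partial):
        nonlocal best_value, best_assignment
        if len(partial) == len(domains):
            value = utility(partial)
            if value > best_value:
                best_value, best_assignment = value, partial
            return
        for v in domains[len(partial)]:
            candidate = partial + (v,)
            # State pruning: only descend into subtrees whose optimistic
            # bound can still improve on the incumbent solution.
            if upper_bound(candidate) > best_value:
                search(candidate)

    search(())
    return best_assignment, best_value
```

The caller supplies upper_bound; in the spirit of function decomposing, a cheap per-variable estimate (the utility of the assigned part plus the maximum attainable contribution of each unassigned variable) keeps the bound inexpensive to evaluate while never underestimating.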
Conclusion
In this paper, we propose FDSP, a generic, fast and easy-to-use method based on branch and bound that significantly accelerates belief propagation based incomplete DCOP algorithms. Specifically, we first propose function decomposing (FD) to efficiently compute the function estimation, which dramatically reduces the overhead of computing an upper bound for a partial assignment. Then, we present state pruning (SP) based on branch and bound to reduce the search space. Besides, we theoretically prove that our bounds are monotonically non-increasing during the search process and that FDSP never prunes the assignment with the maximum utility. Our experimental results clearly show that FDSP can prune around 97%-99% of the search space and requires little time, especially on large and complex problems.

Acknowledgment
This research is funded by the Chongqing Research Program of Basic Research and Frontier Technology (No. cstc2017jcyjAX0030), the Fundamental Research Funds for the Central Universities (No. 2018CDXYJSJ0026) and the Graduate Research and Innovation Foundation of Chongqing, China (Grant No. CYS18047).
References

[Arshad and Silaghi 2004] Arshad, M., and Silaghi, M. C. 2004. Distributed simulated annealing. Distributed Constraint Problem Solving and Reasoning in Multi-Agent Systems.
Computer Journal.
Proc. of the 16th Conference on AAMAS, 195-202.
[Enembreck and Barths 2012] Enembreck, F., and Barths, J. P. A. 2012. Distributed constraint optimization with MULBS: A case study on collaborative meeting scheduling. Journal of Network & Computer Applications.
[Farinelli et al. 2008] Farinelli, A.; Rogers, A.; Petruzzi, A.; and Jennings, N. R. 2008. Decentralised coordination of low-power embedded devices using the max-sum algorithm. In Proc. of the 7th Conference on AAMAS, 639-646.
[Farinelli, Rogers, and Jennings 2014] Farinelli, A.; Rogers, A.; and Jennings, N. R. 2014. Agent-based decentralised coordination for sensor networks using the max-sum algorithm. Autonomous Agents and Multi-Agent Systems.
[Fioretto et al. 2017] Fioretto, F.; Yeoh, W.; Pontelli, E.; Ma, Y.; and Ranade, S. J. 2017. A distributed constraint optimization (DCOP) approach to the economic dispatch with demand response. In Proc. of the 16th Conference on AAMAS, 999-1007.
[Gershman, Meisels, and Zivan 2009] Gershman, A.; Meisels, A.; and Zivan, R. 2009. Asynchronous forward bounding for distributed COPs. Journal of Artificial Intelligence Research.
[Hirayama and Yokoo 1997] Hirayama, K., and Yokoo, M. 1997. Distributed partial constraint satisfaction problem. In International Conference on Principles and Practice of Constraint Programming, 222-236. Springer.
[Khan et al. 2018] Khan, M.; Tran-Thanh, L.; Jennings, N.; et al. 2018. A generic domain pruning technique for GDL-based DCOP algorithms in cooperative multi-agent systems. In Proc. of the 17th Conference on AAMAS, 1595-1603.
[Kim and Lesser 2013] Kim, Y., and Lesser, V. 2013. Improved max-sum algorithm for DCOP with n-ary constraints. In International Conference on Autonomous Agents and Multi-Agent Systems, 191-198.
[Macarthur et al. 2011] Macarthur, K. S.; Stranders, R.; Ramchurn, S. D.; and Jennings, N. R. 2011. A distributed anytime algorithm for dynamic task allocation in multi-agent systems. In AAAI Conference on Artificial Intelligence, 701-706.
[Maheswaran, Pearce, and Tambe 2004] Maheswaran, R. T.; Pearce, J. P.; and Tambe, M. 2004. Distributed algorithms for DCOP: A graphical-game-based approach. In Proceedings of the International Conference on Parallel and Distributed Computing Systems (PDCS), 432-439.
[Modi et al. 2005] Modi, P. J.; Shen, W.-M.; Tambe, M.; and Yokoo, M. 2005. Adopt: Asynchronous distributed constraint optimization with quality guarantees. Artificial Intelligence.
[Nguyen, Yeoh, and Lau 2013] Nguyen, D. T.; Yeoh, W.; and Lau, H. C. 2013. Distributed Gibbs: A memory-bounded sampling-based DCOP algorithm. In Proc. of the 12th Conference on AAMAS, 167-174.
[Okamoto, Zivan, and Nahon 2016] Okamoto, S.; Zivan, R.; and Nahon, A. 2016. Distributed breakout: Beyond satisfaction. In Proc. of the 25th IJCAI, 447-453.
[Ottens, Dimitrakakis, and Faltings 2012] Ottens, B.; Dimitrakakis, C.; and Faltings, B. 2012. DUCT: An upper confidence bound approach to distributed constraint optimization problems. In Twenty-Sixth AAAI Conference on Artificial Intelligence, 528-534.
[Petcu and Faltings 2005] Petcu, A., and Faltings, B. 2005. A scalable method for multiagent constraint optimization. In Proc. of the 19th IJCAI, 266-271.
[Rogers et al. 2011] Rogers, A.; Farinelli, A.; Stranders, R.; and Jennings, N. R. 2011. Bounded approximate decentralised coordination via the max-sum algorithm. Artificial Intelligence.
[Stranders et al. 2009] Stranders, R.; Farinelli, A.; Rogers, A.; and Jennings, N. R. 2009. Decentralised coordination of mobile sensors using the max-sum algorithm. In Proc. of the 21st IJCAI, 299-304.
[Vinyals, Rodriguez-Aguilar, and Cerquides 2009] Vinyals, M.; Rodriguez-Aguilar, J. A.; and Cerquides, J. 2009. Generalizing DPOP: Action-GDL, a new complete algorithm for DCOPs. In Proc. of the 8th Conference on AAMAS, 1239-1240.
[Yeoh, Felner, and Koenig 2008] Yeoh, W.; Felner, A.; and Koenig, S. 2008. BnB-ADOPT: An asynchronous branch-and-bound DCOP algorithm. In Proc. of the 7th Conference on AAMAS, 591-598.
[Zhang et al. 2005] Zhang, W.; Wang, G.; Xing, Z.; and Wittenburg, L. 2005. Distributed stochastic search and distributed breakout: Properties, comparison and applications to constraint optimization problems in sensor networks. Artificial Intelligence.