On the Practical Use of Variable Elimination in Constraint Optimization Problems: 'Still-life' as a Case Study

Larrosa, Morancho & Niso
Figure 1: A: A 3×3 still-life. B: Constraint graph of a simple WCSP instance with four variables and three cost functions. C: The constraint graph after assigning variable x4. D: The constraint graph after clustering variables x3 and x4. E: The constraint graph after eliminating variable x4.

in the next iteration, (2) if a cell has exactly three living neighbors then it is alive in the next iteration, and (3) if a cell has fewer than two or more than three living neighbors, then it is dead in the next iteration. Although defined in terms of extremely simple rules, the game of life has proven mathematically rich and it has attracted the interest of both mathematicians and computer scientists.

The still-life problem SL(n) consists in finding an n×n stable pattern of maximum density in the game of life. All cells outside the pattern are assumed to be dead. Considering the rules of the game, it is easy to see that each cell (i, j) must satisfy the following three conditions: (1) if the cell is alive, it must have exactly two or three living neighbors; (2) if the cell is dead, it must not have exactly three living neighbors; and (3) if the cell is at the grid boundary (i.e., i = 1 or i = n or j = 1 or j = n), it cannot be part of a sequence of three consecutive living cells along the boundary. The last condition is needed because three consecutive living cells at a boundary would produce living cells outside the grid.

Example 1 Figure 1.A shows a solution to SL(3). It is easy to verify that all its cells satisfy the previous conditions, hence it is stable. The pattern is optimal because it has 6 living cells and no 3×3 stable pattern with more than 6 living cells exists.

2.2 Weighted CSP

A weighted constraint satisfaction problem (WCSP) (Bistarelli, Montanari, & Rossi, 1997) is defined by a tuple (X, D, F), where X = {x1, ..., xn} is a set of variables taking values from their finite domains Di ∈ D.
F is a set of weighted constraints (i.e., cost functions). Each f ∈ F is defined over a subset of variables, var(f), called its scope. The objective function is the sum of all functions in F,

F = Σ_{f∈F} f

and the goal is to find the instantiation of the variables that minimizes the objective function.

Example 2 Consider a WCSP with four variables X = {x1, x2, x3, x4} with domains Di = {0, 1} and three cost functions: f1(x1, x4) = x1 + x4, f2(x2, x3) = x2·x3 and f3(x2, x4) = x2 + x4. The objective function is F(x1, x2, x3, x4) = x1 + x4 + x2·x3 + x2 + x4. Clearly, the optimal cost is 0, which is obtained with every variable taking value 0.

Constraints can be given explicitly by means of tables, or implicitly as mathematical expressions or computing procedures. Infeasible partial assignments are specified by constraints that assign cost ∞ to them. The assignment of value a to variable xi is noted xi = a. A partial assignment is a tuple t = (xi1 = v1, xi2 = v2, ..., xij = vj). The extension of t to xi = a is noted t·(xi = a). WCSP instances are graphically depicted by means of their interaction or constraint graph, which has one node per variable and one edge connecting any two nodes that appear together in the scope of some cost function. For instance, Figure 1.B shows the constraint graph of the problem in the previous example.

2.3 Overview of Some Solving Techniques

In this subsection we review some solving techniques widely used when reasoning with constraints.

2.3.1 Search

WCSPs are typically solved with depth-first search. Search algorithms can be defined in terms of instantiating functions.

Definition 1 Let P = (X, D, F) be a WCSP instance, f a function in F, xi a variable in var(f), and v a value in Di. Instantiating f with xi = v produces a new function with scope var(f) − {xi} which returns, for each tuple t, f(t·(xi = v)).
Instantiating P with xi = v produces a new problem P|xi=v = (X − {xi}, D − {Di}, F′), where F′ is obtained by instantiating with xi = v all the functions in F that mention xi.

For instance, instantiating the problem of Example 2 with x4 = 1 produces a new problem with three variables {x1, x2, x3} and three cost functions: f1(x1, x4 = 1) = x1 + 1, f2(x2, x3) = x2·x3 and f3(x2, x4 = 1) = x2 + 1. Figure 1.C shows the corresponding constraint graph, obtained from the original graph by removing the instantiated variable x4 and all adjacent edges. Observe that the new graph depends on the instantiated variable, but does not depend on the value assigned to it.

Search algorithms transform the current problem P into a set of subproblems. Usually this is done by selecting one variable xi, which is instantiated with its different domain values (P|xi=v1, P|xi=v2, ..., P|xi=vd). This transformation is called branching. In each subproblem the same process is recursively applied, which defines a tree of subproblems. Search algorithms expand subproblems until a trivial case is reached: there is no variable left, or a pruning condition is detected. In optimization problems, pruning conditions are usually defined in terms of lower and upper bounds. Search keeps the cost of the best solution so far, which is an upper bound of the optimal cost. At each node, a lower bound of the best cost obtainable underneath is computed. If the lower bound is greater than or equal to the upper bound, it is safe to backtrack.

The size of the search tree is O(d^n) (d being the size of the largest domain), which bounds the time complexity. If the tree is traversed depth-first, the space complexity is polynomial.

2.3.2 Clustering

A well-known technique for constraint processing is clustering (Dechter & Pearl, 1989).
It merges several variables into one meta-variable, while preserving the problem semantics. Clustering variables xi and xj produces a meta-variable xk whose domain is Di × Dj. Cost functions must be accordingly clustered. For instance, in the problem of Example 2, clustering variables x3 and x4 produces variable xc with domain Dc = {(0,0), (0,1), (1,0), (1,1)}. Cost functions f2 and f3 are clustered into fc(x2, xc) = f2 + f3. With the new variable notation, fc = x2·xc[1] + x2 + xc[2], where xc[i] denotes the i-th component of xc. Function f1 needs to be reformulated as f1(x1, xc) = x1 + xc[2]. The constraint graph of the resulting problem is obtained by merging the clustered variables and connecting the meta-node with all nodes that were adjacent to some of the clustered variables. Figure 1.D shows the constraint graph after the clustering of x3 and x4. The typical use of clustering is to transform a cyclic constraint graph into an acyclic one, which can be solved efficiently thereafter.

2.3.3 Variable Elimination

Variable elimination is based on the following two operations.

Definition 2 The sum of two functions f and g, noted (f + g), is a new function with scope var(f) ∪ var(g) which returns for each tuple the sum of the costs of f and g,

(f + g)(t) = f(t) + g(t)

Definition 3 The elimination of variable xi from f, noted f ⇓ xi, is a new function with scope var(f) − {xi} which returns for each tuple t the cost of the best extension of t to xi,

(f ⇓ xi)(t) = min_{a∈Di} {f(t·(xi = a))}

Observe that when f is a unary function (i.e., it has arity one), eliminating the only variable in its scope produces a constant.

Definition 4 Let P = (X, D, F) be a WCSP instance. Let xi ∈ X be an arbitrary variable and let Bi be the set of all cost functions having xi in their scope (Bi is called the bucket of xi).
We define gi as

gi = ( Σ_{f∈Bi} f ) ⇓ xi

The elimination of xi transforms P into a new problem P ⇓ xi = (X − {xi}, D − {Di}, (F − Bi) ∪ {gi}). In words, P ⇓ xi is obtained by replacing xi and all the functions in its bucket by gi.

P and P ⇓ xi have the same optimal cost because, by construction, gi compensates for the absence of xi. The constraint graph of P ⇓ xi is obtained by forming a clique with all the nodes adjacent to node xi and then removing xi and all its adjacent edges. For example, eliminating x4 in the problem of Example 2 produces a new problem with three variables {x1, x2, x3} and two cost functions: f2 and g4. The scope of g4 is {x1, x2} and it is defined as g4 = (f1 + f3) ⇓ x4 = (x1 + x4 + x2 + x4) ⇓ x4 = x1 + x2. Figure 1.E shows the constraint graph after the elimination.

In the previous example, the new function g4 could be expressed as a mathematical expression. Unfortunately, in general, the result of summing functions or eliminating variables cannot be expressed intensionally, and new cost functions must be stored extensionally in tables. Consequently, the space complexity of computing P ⇓ xi is proportional to the number of entries of gi, which is Θ(Π_{xj∈var(gi)} |Dj|). Since xj ∈ var(gi) iff xj is adjacent to xi in the constraint graph, the previous expression can be rewritten as Θ(Π_{xj∈N(i,GP)} |Dj|), where GP is the constraint graph of P and N(i, GP) is the set of neighbors of xi in GP. The time complexity of computing P ⇓ xi is its space complexity multiplied by the cost of computing each entry of gi.

Bucket elimination (BE) works in two phases. In the first phase, it eliminates variables one at a time in reverse order. In the elimination of xi, the new gi function is computed and added to the corresponding bucket. The elimination of x1 produces an empty-scope function (i.e., a constant) which is the optimal cost of the problem.
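The sum and elimination operations and the first (elimination) phase of BE can be sketched in a few lines of Python. This is a minimal illustrative sketch, not the paper's implementation: it assumes cost functions are represented as (scope, callable) pairs, and all helper names are ours. It is exercised on the WCSP of Example 2.

```python
def add_functions(f, g):
    """Sum of two cost functions (Definition 2): scope is the union."""
    scope = tuple(dict.fromkeys(f[0] + g[0]))  # ordered union of scopes
    return scope, lambda t: f[1](t) + g[1](t)

def eliminate(f, var, domain=(0, 1)):
    """Eliminate `var` from f (Definition 3): minimize over its values."""
    scope = tuple(x for x in f[0] if x != var)
    return scope, lambda t: min(f[1]({**t, var: a}) for a in domain)

def bucket_elimination_cost(variables, functions, domain=(0, 1)):
    """First phase of BE: eliminate variables one at a time, in reverse
    order, replacing each bucket by its g function.  The sum of the
    remaining empty-scope functions is the optimal cost."""
    funcs = list(functions)
    for var in reversed(variables):
        bucket = [f for f in funcs if var in f[0]]
        if not bucket:
            continue
        funcs = [f for f in funcs if var not in f[0]]
        combined = bucket[0]
        for f in bucket[1:]:
            combined = add_functions(combined, f)
        funcs.append(eliminate(combined, var, domain))
    return sum(f[1]({}) for f in funcs)

# The WCSP of Example 2:
f1 = (("x1", "x4"), lambda t: t["x1"] + t["x4"])
f2 = (("x2", "x3"), lambda t: t["x2"] * t["x3"])
f3 = (("x2", "x4"), lambda t: t["x2"] * 0 + t["x2"] + t["x4"] - t["x4"] + t["x2"] * 0 + t["x2"] + t["x4"]) if False else (("x2", "x4"), lambda t: t["x2"] + t["x4"])
```

For instance, `eliminate(add_functions(f1, f3), "x4")` reproduces the g4 of the example above, and `bucket_elimination_cost(("x1", "x2", "x3", "x4"), [f1, f2, f3])` returns the optimal cost 0. Note that this sketch represents functions intensionally for brevity, whereas, as discussed above, a real implementation stores the new functions in tables.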
In the second phase, BE considers the variables in increasing order and generates the optimal assignment of the variables. The time and space complexity of BE is exponential in a structural parameter of the constraint graph, called the induced width, which captures the maximum arity among all the gi functions. Without any additional overhead, BE can also compute the number of optimal solutions (see Dechter, 1999, for details).

2.3.4 Super-buckets

In some cases, it may be convenient to eliminate a set of variables simultaneously (Dechter & Fattah, 2001). The elimination of the set of variables Y is performed by collecting in BY the set of functions mentioning at least one variable of Y. Variables in Y and functions in BY are replaced by a new function gY defined as

gY = ( Σ_{f∈BY} f ) ⇓ Y

The set BY is called a super-bucket. Note that the elimination of Y can be seen as the clustering of its variables into a meta-variable xY followed by its elimination.

2.3.5 Mini-buckets

When the space complexity of BE is too high, an approximation called mini-buckets (Dechter & Rish, 2003) can be used. Consider the elimination of xi, with its associated bucket Bi = {fi1, ..., fik}. BE would compute

gi = ( Σ_{f∈Bi} f ) ⇓ xi

The time and space complexity of this computation depend on the arity of gi. If it is beyond our available resources, we can partition bucket Bi into so-called mini-buckets Bi1, ..., Bik, where the number of variables in the scopes of each mini-bucket is bounded by a parameter. Then we can compute

gij = ( Σ_{f∈Bij} f ) ⇓ xi,   j = 1..k
Figure 2: A constraint graph and its evolution over a sequence of variable eliminations and instantiations.

where each gij has a bounded arity. Since

( Σ_{f∈Bi} f ) ⇓ xi  ≥  Σ_{j=1..k} ( Σ_{f∈Bij} f ) ⇓ xi

(the left-hand side is gi and each summand on the right-hand side is gij), the elimination of variables using mini-buckets yields a lower bound of the actual optimal cost.

2.3.6 Combining Search and Variable Elimination

When plain BE is too costly in space, we can combine it with search (Larrosa & Dechter, 2003). Consider a WCSP whose constraint graph is depicted in Figure 2.A. Suppose that we want to eliminate a variable but we do not want to compute and store constraints with arity higher than two. Then we can only take into consideration variables connected to at most two variables. In the example, variable x7 is the only one that can be selected. Its elimination transforms the problem into another one whose constraint graph is depicted in Figure 2.B. Now x6 has its degree decreased to two, so it can also be eliminated. The new constraint graph is depicted in Figure 2.C. At this point, every variable has degree greater than two, so we switch to a search scheme which selects a variable, say x3, branches over its values and produces a set of subproblems, one for each value in its domain. All of them have the same constraint graph, depicted in Figure 2.D. For each subproblem, it is possible to eliminate variables x8 and x4. After their elimination it is possible to eliminate x2 and x9, and subsequently x5 and x1. Eliminations after branching have to be done at every subproblem, since the new constraints with which the eliminated variables are replaced differ from one subproblem to another. In the example, only one branching has been made. Therefore, the elimination of variables has reduced the search tree size from d^9 to d, where d is the size of the domains. In the example, we bounded the arity of the new constraints to two, but the method generalizes to an arbitrary arity bound.
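The graph side of this scheme is easy to simulate. The sketch below is illustrative only: the graph is a small made-up example (not the graph of Figure 2) and the names are ours. It repeatedly eliminates any node of degree at most k, turning the node's neighborhood into a clique as described above, until only branching candidates remain.

```python
def eliminate_node(graph, v):
    """Graph effect of eliminating v: connect v's neighbors into a
    clique, then remove v and all its adjacent edges."""
    neighbors = graph.pop(v)
    for a in neighbors:
        graph[a].discard(v)
        graph[a] |= (neighbors - {a})

def eliminate_low_degree(graph, k=2):
    """Eliminate nodes of degree <= k while possible; returns the
    elimination order.  The nodes left in `graph` have degree > k and
    would be handled by branching."""
    order = []
    while True:
        candidates = [v for v, nbrs in graph.items() if len(nbrs) <= k]
        if not candidates:
            return order
        v = candidates[0]
        eliminate_node(graph, v)
        order.append(v)

# A made-up graph: a 4-clique {1, 2, 3, 4} plus a pendant node 5 on 4.
graph = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {1, 2, 3, 5}, 5: {4}}
order = eliminate_low_degree(graph, k=2)
# Only the pendant node is eliminable with k = 2; the 4-clique remains.
```

With k = 2, only node 5 can be eliminated here; the remaining 4-clique has all degrees above the bound, so search would branch on one of its variables, just as in the example above.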
3. Solving Still-life with Variable Elimination

SL(n) can be easily formulated as a WCSP. The most natural formulation associates one variable xij with each cell (i, j). Each variable has two domain values: if xij = 0 the cell is dead, and if xij = 1 it is alive. There is a cost function fij for each variable xij. The scope of fij is xij and all its neighbors. It evaluates the stability of xij: if xij is unstable given its neighbors, fij returns ∞; else fij returns 1 − xij (see footnote 4). The objective function to be minimized is

F = Σ_{i=1..n} Σ_{j=1..n} fij

If the instantiation X represents an unstable pattern, F(X) returns ∞; else it returns the number of dead cells. fij can be stored as a table with 2^9 entries and evaluated in constant time.

Figure 3: A: Structure of the constraint graph of SL(n). The node in the center, associated to cell (i, j), is linked to all the cells it interacts with. The shadowed area indicates the scope of fij. B (left): Constraint graph of SL(6) after clustering cells into row variables. B (from left to right): Evolution of the constraint graph during the execution of BE.

Figure 3.A illustrates the structure of the constraint graph of SL(n). The picture shows an arbitrary node xij linked to all the nodes it interacts with. For instance, there is an edge between xij and xi,j+1 because xi,j+1 is a neighbor of xij in the grid and, consequently, both variables are in the scope of fij. There is an edge between xij and xi−1,j−2 because both cells are neighbors of xi−1,j−1 in the grid and, therefore, both appear in the scope of fi−1,j−1. The shadowed area represents the scope of fij (namely, xij and all its neighbors). The complete graph is obtained by extending this connectivity pattern to all nodes in the graph.

For the sake of clarity, we use an equivalent but more compact SL(n) formulation that makes BE easier to describe and implement: we cluster all the variables of each row into a single meta-variable. Thus, xi denotes the state of the cells in the i-th row (namely, xi = (xi1, xi2, ..., xin) with xij ∈ {0, 1}). Accordingly, it takes values over the sequences of n bits or, equivalently, over the natural numbers in the interval [0..2^n − 1].
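The entries of the fij table follow directly from the stability conditions of Section 2.1. A minimal sketch (function name and interface are ours; it covers the in-grid stability rule only, not the extra boundary condition on three consecutive living border cells):

```python
import math
from itertools import product

def cell_cost(alive, living_neighbors):
    """Cost contributed by one cell, in the spirit of the fij described
    above: infinity if the cell is unstable given its neighbors, else
    1 - xij (cost 1 for a dead cell, 0 for a living one)."""
    if alive:
        stable = living_neighbors in (2, 3)   # living cell survives
    else:
        stable = living_neighbors != 3        # dead cell is not born
    return (1 - alive) if stable else math.inf

# Building the 2^9-entry table of fij amounts to evaluating cell_cost on
# every combination of the cell's value and its eight neighbors.
table = {t: cell_cost(t[0], sum(t[1:])) for t in product((0, 1), repeat=9)}
```

Storing the 512 entries once makes each later evaluation of fij a constant-time table lookup, as noted above.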
Cost functions are accordingly clustered: there is a cost function fi associated with each row i, defined as

fi = Σ_{j=1..n} fij

(Footnote 4: Recall that, as a WCSP, the task is to minimize the number of dead cells. Therefore, we give cost 1 to dead cells and cost 0 to living cells.)

For internal rows, the scope of fi is {xi−1, xi, xi+1}. The cost function of the top row, f1, has scope {x1, x2}. The cost function of the bottom row, fn, has scope {xn−1, xn}. If there is some unstable cell in xi, then fi(xi−1, xi, xi+1) = ∞. Else, it returns the number of dead cells in xi. Evaluating fi is Θ(n) because all the bits of the arguments need to be checked. The new, equivalent, objective function is

F = Σ_{i=1..n} fi

Figure 3.B (left) shows the constraint graph of SL(6) with this formulation. An arbitrary variable xi is connected with the two variables above and the two variables below. The sequential structure of the constraint graph makes BE very intuitive. It eliminates the variables in decreasing order. The elimination of xi produces a new function gi = (fi−1 + gi+1)
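Under the row formulation, fi can be evaluated by scanning the bits of the three rows involved. The sketch below is ours, not the paper's code: rows are encoded as integers with bit j representing column j, the boundary conditions along the border are omitted for brevity, and the test pattern is a 6-cell 3×3 still-life (one optimal SL(3) solution, not necessarily the exact pattern drawn in Figure 1.A).

```python
import math

def bits(row, n):
    """The n cells of a row, taken from the bits of an integer."""
    return [(row >> j) & 1 for j in range(n)]

def row_cost(prev_row, row, next_row, n):
    """Sketch of fi: infinity if some cell of `row` is unstable given
    the rows above and below (0 plays the role of an all-dead row for
    the top and bottom cases), else the number of dead cells in `row`."""
    above, cur, below = bits(prev_row, n), bits(row, n), bits(next_row, n)
    dead = 0
    for j in range(n):
        living = 0
        for k in (j - 1, j, j + 1):
            if 0 <= k < n:
                living += above[k] + below[k]
                if k != j:
                    living += cur[k]
        if cur[j] == 1 and living not in (2, 3):
            return math.inf            # living cell would die: unstable
        if cur[j] == 0 and living == 3:
            return math.inf            # dead cell would be born: unstable
        dead += 1 - cur[j]
    return dead

# A 6-cell 3x3 still-life, rows top to bottom: 110, 101, 011
# (bit j encodes column j, so row "110" is the integer 0b011).
rows = (0b011, 0b101, 0b110)
total = (row_cost(0, rows[0], rows[1], 3)          # f1 (top row)
         + row_cost(rows[0], rows[1], rows[2], 3)  # f2 (internal row)
         + row_cost(rows[1], rows[2], 0, 3))       # f3 (bottom row)
```

Here `total` evaluates to 3, the number of dead cells of the pattern, consistent with Example 1: a stable 3×3 pattern with 6 living cells. The scan is Θ(n) per function, matching the evaluation cost stated above.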