Automated Aggregator -- Rewriting with the Counting Aggregate

Abstract

Answer set programming is a leading declarative constraint programming paradigm with wide use for complex knowledge-intensive applications. Modern answer set programming languages support many equivalent ways to model constraints and specifications in a program. However, so far answer set programming has failed to develop systematic methodologies for building representations that would uniformly lend well to automated processing. This suggests that encoding selection, in the same way as algorithm selection and portfolio solving, may be a viable direction for improving performance of answer-set solving. The necessary precondition is automating the process of generating possible alternative encodings. Here we present an automated rewriting system, the Automated Aggregator or AAgg, that given a non-ground logic program, produces a family of equivalent programs with complementary performance when run under modern answer set programming solvers. We demonstrate this behavior through experimental analysis and propose the system's use in automated answer set programming solver selection tools.


F. Ricca, A. Russo et al. (Eds.): Proc. 36th International Conference on Logic Programming (Technical Communications) 2020 (ICLP 2020), EPTCS 325, 2020, pp. 96–109, doi:10.4204/EPTCS.325.17


Michael Dingess and Miroslaw Truszczynski ∗
Department of Computer Science, University of Kentucky, United States


Developers of answer set programming (ASP) solutions often face situations where individual constraints of a problem or even the problem as a whole can be expressed in several syntactically different but semantically equivalent ways. Picking the right representation is crucial to designing these solutions because, given an instance, certain representations perform better (often much better) than others when processed with modern ASP grounders and solvers. However, techniques for selecting a particular representation are often ad hoc and tailored to the needs of the particular application, and require significant programming expertise from the programmer.

About a decade ago, Gebser et al. presented a set of "rules-of-thumb" used by their team in manual tuning of ASP solutions [6]. These rules include suggestions on program rewritings that often result in substantial performance gains. This was verified experimentally by Gebser et al., with all rewritings used in their experiments generated manually [6]. Buddenhagen and Lierler studied the impact of these rewritings on an ASP-based natural language processor and reported orders of magnitude gains in memory and time consumption as a result of some program transformations they executed manually [2]. Because of these promising results, researchers proposed to automate the task of program rewriting. Bichler et al. investigated rewritings of long rules guided by the tree decomposition of graphs built of program rules (a form of join optimization) [1]. Hippen and Lierler proposed a method of applying projection to rewrite rules based on estimates of the size of the ground program [8].

These projects demonstrated that automated program rewritings may lead to programs performing better than the original ones when run under current ASP solvers such as gringo/clasp [5]. However, the effects of rewritings are not uniformly the same.
In fact, depending on the actual instance they are run with, rewritten programs may perform worse than the original ones. This is a problem because non-uniform behavior makes the process of selecting the uniformly best encoding ill defined. Nevertheless, automated rewritings can potentially improve the state of the art of ASP solving significantly. Namely, the non-uniform behavior of programs obtained by rewriting opens the possibility of using families of equivalent alternative programs in algorithm selection and portfolio solving [13, 14, 9]. In fact, a recent study of a suite of six encodings of the Hamiltonian cycle problem shows that one can train performance prediction models to select, given an instance, a program from the suite to run on that instance, guaranteeing a much better overall performance on large sets of instances than that of each of the six programs alone [12].

In this work, we focused on rewritings in which rules using counting based on explicit naming of a required number of objects are rewritten with the use of a counting aggregate. We designed a software system, Automated Aggregator (AAgg), for automating such rewritings into several equivalent forms. We studied the software's applicability and effectiveness. In our experiments, we applied the software to programs submitted to past ASP Competitions. We found that while many of them already included aggregate expressions and AAgg did not detect any rules to which it could apply, in several cases it was applicable. For those cases, we studied the performance of the original program and the rewritings produced by AAgg. The results showed that, depending on the instance, rewritings produced by our software often performed better than the original programs. In other words, the family of encodings generated by AAgg (the original program and its rewritings) showed complementary performance both on the instances used in the ASP Competitions and on instances which we generated ourselves. These results show that AAgg can be used as a tool for generating collections of encodings to be used in algorithm selection and portfolio solving. A systematic experimental verification of this claim will be the subject of future work.

∗ Research supported by National Science Foundation grant 1707371.

In this section we describe the aggregate equivalence rewriting, its input and output forms. Currently, we support one input form and three output forms. The input form is a rule that expresses a constraint that there are $b$ different objects with a certain property by explicitly introducing $b$ variables to name these $b$ objects. The output forms model the same property by relying, in some way, on the counting aggregate. For each of the rewritings we establish its correctness and experimentally study its performance. The correctness follows from Theorems A.3 and A.5, presented and proved in the appendix.

We follow the ASP-Core-2 Input Language Format [3]. We consider rules of the form $\mathit{head} \leftarrow \mathit{body}$. The head may consist of a single literal or be empty, the latter representing a contradiction, making the rule a constraint. The body may contain one or more literals or be empty, in which case the rule is a fact. Literals are composed of an atom, which may be preceded by not. Negative literals include not; positive literals do not. Atoms have the form $p(t_1, \dots, t_k)$, where $p$ is a predicate symbol of arity $k$ and each $t_i$ is a term, that is, a constant, a variable, or an expression of the form $f(t_1, \dots, t_k)$, where $f$ is a function symbol of arity $k > 0$ and each $t_i$ is a term. Atoms may also take the form of an aggregate expression. In this work, we focus on counting aggregates of the following form (AAgg also accepts other term expressions following the definition used by clingo [5]):

$$s_1 \prec_1 \mathit{count}\{t_1 : L_1; \dots; t_n : L_n\} \prec_2 s_2 \tag{1}$$

In (1), $t_i$ and $L_i$ form an aggregate element, which is a non-empty tuple of terms and literals, respectively. The count operation simply counts the number of unique term tuples $t_i$ whose corresponding condition $L_i$ holds. The result of the count function is compared by the comparison predicates $\prec_1$ and $\prec_2$ to the terms $s_1$ and $s_2$. These comparison predicates may be one of $\{<, \le, =, \ne\}$. One or both of these comparisons can be omitted [5].

The aggregate equivalence rewriting takes as input rules of the form:

$$H \leftarrow \bigwedge_{1 \le i \le b} F(X_i, Y),\ \bigwedge_{1 \le i < j \le b} X_i \ne X_j,\ G. \tag{2}$$

where
• $H$ is the head of a rule ($H$ may be empty, making the rule a constraint)
• $F$ is a predicate of arity $1 + |Y|$
• $X_1, \dots, X_b$ are variables, all in the same position in $F$
• $Y$ is a comma-separated list of variables, identical in variables and variable positions for all $F$ in the rule
• $G$ is the remaining body of the rule, possibly empty,
and the following hold true:
• $b \ge 2$
• $H$, $G$, and $Y$ have no occurrences of variables $X_1, \dots, X_b$
• The terms $\bigwedge_{1 \le i < j \le b} X_i \ne X_j$ may instead be a continuous chain of comparisons: $\bigwedge_{1 \le i \le b-1} X_i < X_{i+1}$ or $\bigwedge_{1 \le i \le b-1} X_i > X_{i+1}$.

Note that the $X_i$'s need not be in the first position in $F$, so long as they are all in the same position in $F$ and the other variables in $F$ (if any) are identical in all occurrences of $F$ in the rule. Additionally, some other forms logically equivalent to $\bigwedge_{1 \le i < j \le b} X_i \ne X_j$ are also acceptable. For instance, the condition $X_1 \ne X_2$ may be expressed as $X_1 + a \ne X_2 + a$, for some integer $a$.

The form of the output depends on the size of $Y$. When $|Y| = 0$, the output form is:

$$H \leftarrow b \le \mathit{count}\{X : F(X)\},\ G. \tag{3}$$

where
• $H$, $b$, $F$, and $G$ are as above
• Aggregate element $X : F(X)$ follows the form term : literal as defined above. The former, $X$, is a tuple of one term, which in this case is a variable. The latter, $F(X)$, is a literal.

When $|Y| > 0$, we perform projection to project out the variable $X$ from $F$. This gives us an output form consisting of two rules:

$$\begin{aligned} H &\leftarrow b \le \mathit{count}\{X : F(X, Y)\},\ F'(Y),\ G. \\ F'(Y) &\leftarrow F(X, Y). \end{aligned} \tag{4}$$

where
• $H$, $b$, $F$, $Y$, and $G$ are as above
• Aggregate element $X : F(X, Y)$ follows the form term : literal as above
• $F'$ is a new predicate symbol of arity equal to the size of $Y$, that is, equal to the arity of $F$ minus one; introducing $F'$ ensures that variables in $Y$ are universally quantified.

The correctness of this rewriting follows from the results presented in the appendix (Theorem A.3). Specifically, we show there that if a program is obtained from another program by rewriting one of its rules in the way described above, then both programs have the same answer sets (modulo atoms $F'(y)$, if $F'$ is introduced).

Two alternative, logically equivalent output forms are also available, each derived from the output form presented above. First, we observe that, for the value $c$ of the count, $b \le c$ is logically equivalent to the negation of $c < b$. We can then restate the original literal as follows:

$$\mathit{not}\ \mathit{count}\{X : F(X, Y)\} < b. \tag{5}$$

Second, we note that the input language we consider permits integer-only arithmetic. Consequently, the expression $\mathit{not}\ a < b$ for some integers $a$ and $b$ is equivalent to the conjunctive expression:

$$\neg(a = -\infty) \wedge \neg(a = -\infty + 1) \wedge \dots \wedge \neg(a = b - 2) \wedge \neg(a = b - 1).$$

Additionally, the count never returns a negative number. Therefore, we can restate the aggregate literal in this alternative output form as a conjunction of aggregate literals having the form:

$$\mathit{not}\ \mathit{count}\{X : F(X, Y)\} = 0,\ \mathit{not}\ \mathit{count}\{X : F(X, Y)\} = 1,\ \dots,\ \mathit{not}\ \mathit{count}\{X : F(X, Y)\} = b - 1. \tag{6}$$

Note that, due to the precise semantics of logic programs, the equivalence of these two alternative forms relies on additional assumptions about the input program (it has to be splittable) and the rule to be rewritten. Theorem A.5 provides conditions under which the rewriting is guaranteed to be correct. These conditions are checked by our software, and only when they hold does the software proceed with rewriting into an alternative form (5) or (6), as selected by the user.

We now present the Automated Aggregator (AAgg) software system for performing the aggregate equivalence rewriting. The software provides an automated way to detect rules within a given program following the input format (2) and rewrite those rules into an equivalent output format (3/4), (5), or (6).
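The counting-level intuition behind forms (2), (3/4), (5), and (6) can be checked by brute force. The following is a minimal sketch, not part of AAgg: it fixes the values of $Y$, represents the objects satisfying $F$ as a plain Python collection, and compares the four conditions as ordinary Boolean tests. The function names are hypothetical, and the sketch deliberately ignores the stable-model subtleties that Theorem A.5 addresses for forms (5) and (6).

```python
from itertools import combinations, product


def form2(extension, b):
    """Input form (2): b variables bound to F-objects, pairwise different."""
    return any(
        all(x != y for x, y in combinations(binding, 2))
        for binding in product(extension, repeat=b)
    )


def form3(extension, b):
    """Output form (3/4): b <= count{X : F(X)}."""
    return b <= len(set(extension))


def form5(extension, b):
    """Output form (5): not count{X : F(X)} < b."""
    return not (len(set(extension)) < b)


def form6(extension, b):
    """Output form (6): the count differs from each of 0, 1, ..., b-1."""
    count = len(set(extension))
    return all(count != k for k in range(b))


def forms_agree(max_objects=5, max_b=4):
    """Compare all four conditions on every small extension and bound b >= 2."""
    for n in range(max_objects + 1):
        extension = list(range(n))
        for b in range(2, max_b + 1):
            results = {f(extension, b) for f in (form2, form3, form5, form6)}
            if len(results) != 1:
                return False
    return True
```

Calling `forms_agree()` exhaustively compares the conditions for up to five objects and bounds up to four; this is only a sanity check of the arithmetic, not a proof of the program-level equivalence.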

The software relies on the clingo Python module provided by the Potassco suite [7]. The module is written in Python 2.7. As such, Python 2.7 is required to run the Automated Aggregator system. Installation information is provided in the software's README file. The Automated Aggregator is invoked as follows:

    python aagg/main.py [-h,--help] [-o,--output FILENAME] [--no-rewrite]
        [--no-prompt] [--use-anonymous-variable]
        [--aggregate-form ID] [-d,--debug] [-r,--run-clingo]
        [encoding_1 encoding_2 ...]

The -h flag lists the help options. The encoding(s) are the filename(s) of the input encoding(s), and the output is the desired name for the output file. If no output filename is given, one is generated based on the first input filename given. When a candidate rule is discovered, the user is shown the proposed rewriting and prompted for confirmation. If the --no-rewrite flag is given, no prompts are given and no rewriting is performed. If the --no-prompt flag is given, no prompts are given and rewriting is performed where possible. The ID supplied to the --aggregate-form argument informs the program which aggregate form to use when performing rewrites: its values 1, 2, and 3 correspond to aggregate forms (3/4), (5), and (6), respectively. The -d (--debug) flag directs the application to operate verbosely, printing details during the rewriting candidate discovery process and printing some statistics after the application's conclusion. The -r (--run-clingo) flag directs the application to run the resulting program through clingo after any rewritings are performed.

Finally, the --use-anonymous-variable flag indicates an additional modification of the output form to be performed. It uses the anonymous variable '_' in place of the variable X in the output forms listed above, with some additional modifications of the rule to ensure the transformation is correct. Specifically, we replace the aggregate element X : F(X, Y) with F(_, Y) : F(_, Y) rather than with : F(_, Y), because the latter is not valid gringo syntax. We mention this option for the sake of completeness, since it is available in our implementation. However, we found that the programs generated when using and when not using the anonymous variable are identical after grounding, so we neither discuss it further nor use these rewritings in our experiments.

By default all boolean flags are disabled and the aggregate-form ID is set to 1, indicating output form (3/4). At least one input encoding filename must be specified.
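For scripted experiments it can be convenient to assemble the invocation programmatically. The sketch below builds an argument list using only the flags documented above; the helper name `build_aagg_command` and the example filename are hypothetical, and the sketch assumes that `python` resolves to a Python 2.7 interpreter and that the command is run from the directory containing `aagg/main.py`.

```python
def build_aagg_command(encodings, output=None, aggregate_form=1,
                       no_rewrite=False, no_prompt=False,
                       debug=False, run_clingo=False):
    """Return an argv list for AAgg; the helper itself is illustrative only."""
    if not encodings:
        raise ValueError("at least one input encoding filename must be specified")
    if aggregate_form not in (1, 2, 3):
        raise ValueError("aggregate form ID must be 1, 2, or 3")

    command = ["python", "aagg/main.py"]
    if output is not None:
        command += ["--output", output]
    if no_rewrite:
        command.append("--no-rewrite")
    if no_prompt:
        command.append("--no-prompt")
    # 1, 2, 3 select output forms (3/4), (5), (6), respectively.
    command += ["--aggregate-form", str(aggregate_form)]
    if debug:
        command.append("--debug")
    if run_clingo:
        command.append("--run-clingo")
    # Input encodings come last, as in the usage line.
    command += list(encodings)
    return command
```

The resulting list can be handed to `subprocess.run`; for example, `build_aagg_command(["enc.lp"], aggregate_form=3, no_prompt=True)` requests output form (6) without interactive confirmation.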
The methodology used for discovering whether a rule is a candidate for the aggregate equivalence rewriting is as follows. The given logic program(s) are parsed by the clingo module, generating an abstract syntax tree for each rule. Each such tree is passed to a transformer class for preprocessing. After preprocessing, some information is gathered from the program as a whole; specifically, predicate dependencies are determined, which in turn determine when output forms (5) and (6) are appropriate for a given rule (see the appendix for more details). Rules are then passed individually to an equivalence transformer class for processing. After processing, and if the requested rewriting is possible and confirmed by the user, the rewritten form of the rule is returned. Otherwise the original rule is returned. All returned rules, rewritten or not, are collected and output to the desired output file location. Optionally, the resulting program is then run using clingo.

The system and all encodings, test instances, and driver programs can be found online at: https://drive.google.com/drive/folders/1lqRsy9HGIDHvyX_Pvkc8zKt1-xvbwcAp?usp=sharing

When a rule is passed to the equivalence transformer for processing, it first undergoes a process of exploration, which traverses the rule's abstract syntax tree, recording comparison literals between two variables as well as other pertinent information found along the way. These comparison literals are scrutinized to determine whether a subset of the comparisons follows the form given in (2) or some equivalent format as detailed in Section 2.2. The process also identifies those variables that play the role of $X_1, \dots, X_b$ as in (2). The rule is then analyzed to determine whether there are $b$ occurrences of some positive literal of predicate $F$ with the arguments $X_i$ and $Y$, where each $X_i$, $1 \le i \le b$, occurs at least once in the set of occurrences, and $Y$ is the same for each occurrence and contains no variables $X_1, \dots, X_b$. Let us call this set of $b$ literals, combined with the corresponding comparisons following the form given in (2) or equivalent, the rule's counting literals. Similarly, we call the variables playing the role of the variables $X_i$ in (2) the rule's counting variables.

After gathering counting literals and counting variables, the equivalence transformer verifies that the counting variables are not used within literals anywhere in the rule excluding literals within the set of counting literals. If this verification fails, or if any of the constraints for the counting literals cannot be satisfied, or if no such set of counting variables can be obtained, then the rule is not fit for rewriting. As a result, no rewriting is performed on the rule and the original rule is returned.

In the other case, if the verification succeeds (the counting literals and the counting variables satisfy all the required constraints), then we proceed as follows. If the requested output form is (3/4), then we check whether $|Y| > 0$, which, as described above, decides between form (3) and the projected form (4). [...]

Here we discuss the limitations of the Automated Aggregator in its current form.

1. The rewriting is one-directional. The software will only rewrite rules from the form (2) into rules of the forms (3/4), (5), and (6). As it stands, the system will not rewrite rules given in any of the forms (3/4), (5), or (6) into rules of any other form.
2. In the cases when multiple rewritings are possible for a single rule, only one rewriting will be detected and performed. To illustrate, if the form given in (2) occurs twice in one rule over a disjoint set of variables and predicates, where both sets of counting literals consider the other set as part of $G$ and all conditions hold, then only the set of counting literals with the highest number of counting variables will be used for rewriting. If both sets contain the same number of counting [...]

Table 1: Problem Domains of ASP Competition Encodings with Rules for Rewriting

ASP Competition Year    Problem Domain of Encodings

The Automated Aggregator system has been made available for download online. Encodings with which the application was tested, their corresponding instances, and (Python) scripts for driving such tests are included there too.
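The comparison-pattern check performed during candidate discovery can be sketched in a few lines. This is a simplified illustration rather than the AAgg implementation: it assumes the comparison literals have already been extracted from the rule as (left, operator, right) triples over variable names, it tests the whole list instead of searching for a suitable subset, and it does not handle equivalent forms such as $X_1 + a \ne X_2 + a$. The function name `counting_variables` is hypothetical.

```python
from itertools import combinations


def counting_variables(comparisons):
    """Return the counting variables if the comparisons match form (2), else None."""
    variables = sorted({v for left, _, right in comparisons for v in (left, right)})
    b = len(variables)
    if b < 2:
        return None
    operators = {op for _, op, _ in comparisons}

    # Case 1: X_i != X_j must be stated for every pair of variables.
    if operators == {"!="}:
        seen = {frozenset((left, right)) for left, _, right in comparisons}
        needed = {frozenset(pair) for pair in combinations(variables, 2)}
        return variables if seen == needed else None

    # Case 2: one continuous chain X_1 < X_2 < ... < X_b (or the same with >).
    if operators in ({"<"}, {">"}):
        successor = {left: right for left, _, right in comparisons}
        if len(comparisons) != b - 1 or len(successor) != b - 1:
            return None
        starts = set(variables) - {right for _, _, right in comparisons}
        if len(starts) != 1:
            return None
        chain = [starts.pop()]
        while chain[-1] in successor and len(chain) < b:
            chain.append(successor[chain[-1]])
        return chain if len(chain) == b else None

    return None
```

For a rule body such as `hc(X,Y), hc(X,Z), Y!=Z`, the single comparison `("Y", "!=", "Z")` yields the counting variables `["Y", "Z"]`, so $b = 2$. A full implementation would additionally verify that the counting variables occur only in the counting literals, as described above.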

The Automated Aggregator system was applied to logic programs in gringo syntax submitted to the 2009, 2014, and 2015 Answer Set Programming Competitions. Of the 58 encodings given to the application, five contained rules which were candidates for the rewriting described. Table 1 lists the encoding problem names and the ASP Competitions for which they were developed.

Additionally, to provide an example of the functionality of AAgg, the system was applied to an encoding for the Hamiltonian Cycle problem. The original program is shown in Figure 1. The rewritten program, as output by AAgg, is shown in Figure 2. We see that the first two constraints in the program are both rewritten with the necessary projection performed as in (4). Experimental results for these encodings when applied to a generated set of hard instances are given in the following section.
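To make the shape of such rewritten constraints concrete, the following sketch generates the text of output forms (3/4), (5), and (6) for a constraint whose body consists only of counting literals. It is an illustration under stated assumptions, not AAgg's code: the function names are hypothetical, the projection predicate name (`<predicate>_proj`) is invented for the example and need not match the name AAgg generates, and the aggregate is written in clingo's `#count{ ... }` notation.

```python
def count_literals(pred, args, var, b, form):
    """Body literals that replace the b counting literals of `pred`."""
    atom = f"{pred}({','.join(args)})"
    count = f"#count{{ {var} : {atom} }}"
    if form == 1:   # form (3/4): b <= count
        return [f"{b} <= {count}"]
    if form == 2:   # form (5): not count < b
        return [f"not {count} < {b}"]
    if form == 3:   # form (6): count differs from 0, ..., b-1
        return [f"not {count} = {k}" for k in range(b)]
    raise ValueError("form must be 1, 2, or 3")


def rewrite_constraint(pred, args, var, b, form=1):
    """Rewrite ':- F(X_1,Y), ..., F(X_b,Y), <pairwise inequalities>.' as text."""
    body = count_literals(pred, args, var, b, form)
    shared = [a for a in args if a != var]  # the variables Y
    rules = []
    if shared:
        # |Y| > 0: add the projection atom F'(Y) and its defining rule, as in (4).
        projection = f"{pred}_proj({','.join(shared)})"
        body.append(projection)
        rules.append(f"{projection} :- {pred}({','.join(args)}).")
    return [":- " + ", ".join(body) + "."] + rules
```

Applied to the first constraint of Figure 1, `rewrite_constraint("hc", ["X", "Y"], "Y", 2)` returns the constraint `:- 2 <= #count{ Y : hc(X,Y) }, hc_proj(X).` together with the projection rule `hc_proj(X) :- hc(X,Y).`.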

Results were gathered by systematically grounding and solving each instance-encoding pair within families of encodings for each problem type. The data sets of instances we used are available together with the AAgg tool (see the URL listed earlier). Each encoding in each family of encodings was run for each instance. The total grounding plus solving time of each instance-encoding pair was recorded and compared.

The precise encodings used as input to the AAgg software for gathering results are listed in Table 2. Two output encodings are generated for each input encoding and together they form the encoding family for that problem domain. The two output forms used were those shown in equations (3/4) and (6); previous experiments showed extreme similarity between forms (3/4) and (5), so (5) was left out to preserve machine time. The machine used for testing contained an Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz with 16GB RAM.

https://drive.google.com/drive/folders/1lqRsy9HGIDHvyX_Pvkc8zKt1-xvbwcAp?usp=sharing

We thank Daniel Houston and Liu Liu for providing us with the data sets of instances for the Latin Square and the Hamiltonian Cycle problems, respectively. The data sets for the remaining problems were generated by software tools we developed for the purpose. These tools and descriptions of instance sets used in experiments can be found at the AAgg site.

    node(X) :- edge(X,Y).
    node(X) :- edge(Y,X).
    { hc(X,Y) } :- edge(X,Y).
    :- hc(X,Y), hc(X,Z), Y!=Z.
    :- hc(X,Y), hc(Z,Y), X!=Z.
    reach(X,Y) :- hc(X,Y).
    reach(X,Y) :- hc(X,Z), reach(Z,Y).
    :- node(X), node(Y), not reach(X,Y).

Figure 1: Example Hamiltonian Cycle problem encoding. Original version.

    node(X) :- edge(X,Y).
    node(X) :- edge(Y,X).
    { hc(X,Y) } :- edge(X,Y).
    :- 2 <=

Figure 2: Example Hamiltonian Cycle problem encoding. Rewritten version.

Table 2: Sources of Encodings Used for Experimental Results

Source of Encoding      Problem Domain of Encodings
ASP Competition 2009    Wire Routing
ASP Competition 2015    Steiner Tree & Graceful Graphs
Home-Brewed             Latin Squares & Hamiltonian Cycle (a)

(a) We are thankful to Daniel Houston and Liu Liu for supplying the Latin Squares and Hamiltonian Cycle instance sets, respectively.

Table 3: Result Statistics by Problem Domain

Encoding                    Wins            Exclusive Wins   Wins by 20%     Wins by 50%
Wire Routing
  Input Encoding            122 (58.4%)     34 (27.8%)       115 (92.0%)     97 (77.6%)
  AAgg Output Form (3/4)    50 (24.0%)      26 (52.0%)       48 (87.3%)      38 (69.1%)
  AAgg Output Form (6)      37 (17.7%)      13 (35.1%)       39 (90.7%)      23 (53.5%)
Steiner Tree
  Input Encoding            130 (33.9%)     0 (0%)           1 (0.8%)        0 (0%)
  AAgg Output Form (3/4)    123 (32.1%)     0 (0%)           4 (3.3%)        0 (0%)
  AAgg Output Form (6)      130 (33.9%)     0 (0%)           6 (4.6%)        0 (0%)
Graceful Graphs
  Input Encoding            212 (37.3%)     62 (29.2%)       182 (85.8%)     128 (60.4%)
  AAgg Output Form (3/4)    97 (17.1%)      23 (23.7%)       82 (84.5%)      58 (59.8%)
  AAgg Output Form (6)      259 (45.6%)     51 (19.7%)       229 (88.4%)     169 (63.7%)
Latin Squares
  Input Encoding            5611 (77.7%)    5 (0.1%)         4435 (79.0%)    2003 (35.7%)
  AAgg Output Form (3/4)    1432 (19.8%)    0 (0%)           655 (45.7%)     86 (6.0%)
  AAgg Output Form (6)      176 (2.4%)      0 (0%)           64 (36.4%)      7 (4.0%)
Hamiltonian Cycle
  Input Encoding            72 (28.7%)      47 (65.3%)       59 (81.9%)      34 (47.2%)
  AAgg Output Form (3/4)    102 (40.6%)     69 (67.7%)       79 (77.5%)      52 (51.0%)
  AAgg Output Form (6)      77 (30.7%)      45 (58.4%)       64 (83.1%)      44 (57.1%)

Result statistics are shown in Table 3. Data is grouped by problem domain. The first line of each grouping shows statistics for the input encoding. The second line shows statistics for the encoding output by the AAgg software using the first line as input and the output rule form (3/4) as the selected output form. The third and final line of each grouping shows statistics for the encoding output by the AAgg software again using the first line as input but now selecting the output form (6) as the chosen output form.

A win is when an encoding grounds and solves for an instance in the shortest amount of time as compared with the other two encodings in its grouping. The percentage value beside the win count is the proportion of instances which that encoding won relative to the number of instances in the instance set for which at least one of the three encodings terminated within the total time limit set for grounding and solving. This time limit was set to 200 seconds in all experiments, except those with encodings of the Hamiltonian Cycle problem, which used a timeout value of 400 seconds due to the relative difficulty of the instance set (in all problem domains except for the Hamiltonian Cycle domain, the number of instances for which no encoding computed an answer within the time limit was very small; for the Hamiltonian Cycle problem, even with the increased time limit, it was significant).

An exclusive win is when the encoding is the only encoding in its grouping to find a solution (or determine there is no solution) for an instance while both of the other two encodings failed to do so within the time limit. The column shows the number and the percentage of wins which were exclusive wins for the encoding. The next column shows the number and the percentage of wins by a margin of at least 20% (this is when the best encoding runs at least 20% faster than the second one; in the case of exclusive wins, we count a win as by at least 20% if it is faster by at least 20% than the time limit used). The data in the last column shows the numbers and percentages of wins by at least 50% (it is to be interpreted similarly to the data in the previous column).

Results indicate that in some cases, the rewriting can provide complementary performance of encodings. Specifically, the results for the Wire Routing, Graceful Graphs, and Hamiltonian Cycle problems support the claim. This is shown first by the fact that for each problem, each of the encodings scores a significant proportion of wins, and that among those wins a significant proportion are exclusive wins, and an even greater proportion (about 50% or more) are wins by at least 50%. This means that about half of the times when an encoding outperforms the two other encodings, it outperforms both by a factor of at least two.

The Steiner Tree results indicate that sometimes the rewriting produces little to no effect at all. While each encoding for the problem registers a similar proportion of wins, there are very few instances (just eleven out of 283) when the best encoding outperforms the other two by 20% or more, and no instances when the best encoding would outperform the other two by 50% or more. The Latin Squares results are mixed in the sense that for the most part the original encoding works best. However, one of the rewritings registers almost 20% of wins. Moreover, about 45% of those wins are by 20% or more and 6% by 50% or more.

In summary, we see that the rewriting can provide programs that perform complementary to the original. This complementarity is perhaps somewhat surprising, because aggregates are assumed to lead to better performance. Our results show that the picture is more complicated and whether rewriting with aggregates yields better performance is instance-dependent. It is also interesting to note that for some encodings replacing simple counting, like that used in the Latin Square encodings (no two identical elements in a row or column), with the count aggregates does not lead to substantial improvements. Finally, for some problems (in our experiments for the Steiner Tree problem) where complementary behavior does emerge, the differences in performance are relatively small.
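The win statistics defined above can be sketched as a small classification function. This is an illustrative reading of the definitions, not the authors' driver script: the name `classify_win` and the encoding labels are hypothetical, "at least 20% faster" is interpreted as the best time being at most 80% of the reference time (consistent with the remark that a 50% win corresponds to a factor of two), and ties are not treated specially because the text does not say how they were handled.

```python
def classify_win(times, limit):
    """Classify one instance.

    times maps an encoding name to its grounding-plus-solving time in seconds,
    or to None if the encoding did not terminate within `limit`.
    Returns None when no encoding terminated.
    """
    finished = {name: t for name, t in times.items() if t is not None and t <= limit}
    if not finished:
        return None

    ranked = sorted(finished, key=finished.get)
    winner = ranked[0]
    best = finished[winner]
    exclusive = len(finished) == 1
    # Exclusive wins are compared against the time limit instead of a runner-up.
    reference = limit if exclusive else finished[ranked[1]]
    return {
        "winner": winner,
        "exclusive": exclusive,
        "by_20": best <= 0.8 * reference,
        "by_50": best <= 0.5 * reference,
    }
```

Aggregating the returned records over an instance set, and dividing win counts by the number of instances for which the function does not return `None`, gives percentages of the kind reported in the Wins column of Table 3.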

As detailed in Section 3.3, the Automated Aggregator software can be improved. Extending the software so that it rewrites rules by eliminating the counting aggregate, that is, by inverting the current rewriting, seems to be potentially the most beneficial direction. Indeed, our results show that introducing aggregates does not guarantee uniformly improved performance. This gives reason to think that removing the counting aggregate, that is, rewriting it in an explicit way, has the potential to generate encodings that may perform better on many instances. Just like the present version of the AAgg system, this could yield collections of encodings of complementary strengths. Moreover, this form of rewriting would be quite widely applicable, as the counting aggregate is commonly used.

Expanding the software to detect and perform multiple aggregate equivalence rewritings on a single rule and to detect more obscure forms of representations are two other directions for improvement. While these extensions are necessary to give the software the broadest possible scope of applicability, we do not expect them to have a major practical impact due to the low frequency with which such less intuitive and convoluted forms of modeling are found in programs.

Automated rewriting has been studied before. The PROJECTOR system [8] and the lpopt system [1] are two notable examples. That earlier work sought to develop rewritings improving the performance over the original ones. That goal is in general difficult to meet; both systems were shown to offer gains, but the rewritten encodings do not always perform better. Our goal was different. We aimed at rewritings of varying relative performance depending on input instances. With the AAgg system we showed that even a rather simple rewriting consisting of introducing the count aggregate often leads to families of encodings with complementary strengths (areas of superior performance). This opens a way for using machine learning to develop models in support of effective encoding selection or encoding portfolio solving [12]. In such work, to generate promising collections of encodings, one could use the AAgg system, with the extensions mentioned above, but also other program rewriting software such as PROJECTOR and lpopt which, as noted, often yield encodings better than the original one but do not perform uniformly better.

References

[1] Manuel Bichler, Michael Morak & Stefan Woltran (2016): lpopt: A Rule Optimization Tool for Answer Set Programming. In Manuel V. Hermenegildo & Pedro López-García, editors: Proceedings of the 26th International Symposium on Logic-Based Program Synthesis and Transformation, LOPSTR 2016, Selected Papers, Lecture Notes in Computer Science 10184, Springer, pp. 114–130.
[2] Matthew Buddenhagen & Yuliya Lierler (2015): Performance Tuning in Answer Set Programming. In Francesco Calimeri, Giovambattista Ianni & Miroslaw Truszczynski, editors: Proceedings of the 13th International Conference on Logic Programming and Nonmonotonic Reasoning, LPNMR 2015, Lecture Notes in Computer Science 9345, Springer, pp. 186–198.
[3] Francesco Calimeri, Wolfgang Faber, Martin Gebser, Giovambattista Ianni, Roland Kaminski, Thomas Krennwallner, Nicola Leone, Marco Maratea, Francesco Ricca & Torsten Schaub (2020): ASP-Core-2 Input Language Format. Theory Pract. Log. Program. 20(2), pp. 294–309.
[4] Paolo Ferraris, Joohyung Lee, Vladimir Lifschitz & Ravi Palla (2009): Symmetric Splitting in the General Theory of Stable Models. In Craig Boutilier, editor: Proceedings of the 21st International Joint Conference on Artificial Intelligence, IJCAI 2009, pp. 797–803. Available at http://ijcai.org/Proceedings/09/Papers/137.pdf.
[5] M. Gebser, R. Kaminski, B. Kaufmann, M. Lindauer, M. Ostrowski, J. Romero, T. Schaub & S. Thiele (2015): Potassco User Guide. Available at https://github.com/potassco/guide/releases/.
[6] Martin Gebser, Roland Kaminski, Benjamin Kaufmann & Torsten Schaub (2011): Challenges in Answer Set Solving. In Marcello Balduccini & Tran Cao Son, editors: Logic Programming, Knowledge Representation, and Nonmonotonic Reasoning - Essays Dedicated to Michael Gelfond on the Occasion of His 65th Birthday, Lecture Notes in Computer Science 6565, Springer, pp. 74–90.
[7] Martin Gebser, Roland Kaminski, Benjamin Kaufmann & Torsten Schaub (2019): Multi-shot ASP solving with clingo. Theory Pract. Log. Program. 19(1), pp. 27–82.
[8] Nicholas Hippen & Yuliya Lierler (2019): Automatic Program Rewriting in Non-Ground Answer Set Programs. In José Júlio Alferes & Moa Johansson, editors: Proceedings of the 21st International Symposium on Practical Aspects of Declarative Languages, PADL 2019, Lecture Notes in Computer Science 11372, Springer, pp. 19–36.
[9] Holger H. Hoos, Marius Lindauer & Torsten Schaub (2014): claspfolio 2: Advances in Algorithm Selection for Answer Set Programming. Theory Pract. Log. Program. 14(4-5), pp. 569–585.
[10] Yuliya Lierler (2019): Strong Equivalence and Program's Structure in Arguing Essential Equivalence Between First-Order Logic Programs. In José Júlio Alferes & Moa Johansson, editors: Proceedings of the 21st International Symposium on Practical Aspects of Declarative Languages, PADL 2019, Lecture Notes in Computer Science 11372, Springer, pp. 1–18.
[11] Vladimir Lifschitz & Hudson Turner (1994): Splitting a Logic Program. In Pascal Van Hentenryck, editor: Proceedings of the 11th International Conference on Logic Programming, ICLP 1994, MIT Press, pp. 23–37.
[12] Liu Liu & Miroslaw Truszczynski (2019): Encoding Selection for Solving Hamiltonian Cycle Problems with ASP. In Bart Bogaerts, Esra Erdem, Paul Fodor, Andrea Formisano, Giovambattista Ianni, Daniela Inclezan, Germán Vidal, Alicia Villanueva, Marina De Vos & Fangkai Yang, editors: Proceedings of the 35th International Conference on Logic Programming, ICLP 2019, Technical Communications, EPTCS 306, pp. 302–308.
[13] John R. Rice (1976): The Algorithm Selection Problem. Advances in Computers 15, pp. 65–118.
[14] Lin Xu, Frank Hutter, Holger H. Hoos & Kevin Leyton-Brown (2008): SATzilla: Portfolio-based Algorithm Selection for SAT. J. Artif. Intell. Res. 32, pp. 565–606.

A Correctness of Rewritings

Let us consider the following program rule ($H$ may be $\bot$):

$$H \leftarrow \bigwedge_{1 \le i \le b} F(X_i, \mathbf{Z}, \mathbf{Z}'),\ Q(\mathbf{X}),\ G(\mathbf{Z}', \mathbf{Z}''). \tag{7}$$

where $F$ is a predicate, $\mathbf{X}$ is a tuple of variables $X_1, \ldots, X_b$, $Q(\mathbf{X})$ is a list of literals over the variables $X_1, \ldots, X_b$, and $\mathbf{Z}$, $\mathbf{Z}'$ and $\mathbf{Z}''$ are three pairwise disjoint tuples of variables, each disjoint from $\mathbf{X}$, and $H$ contains none of the $X_i$.

Let $P$ be a program containing rule (7) and let $P'$ be the program obtained from $P$ by replacing that rule with the two rules

$$\begin{aligned} H &\leftarrow \bigwedge_{1 \le i \le b} F(X_i, \mathbf{Z}, \mathbf{Z}'),\ Q(\mathbf{X}),\ G(\mathbf{Z}', \mathbf{Z}''),\ F'(\mathbf{Z}). \\ F'(\mathbf{Z}) &\leftarrow F(X_1, \mathbf{Z}, \mathbf{Z}'). \end{aligned} \tag{8}$$

where $F'$ is a predicate not occurring in $P$.

Theorem A.1. The programs $P$ and $P'$ have the same answer sets modulo ground atoms of the form $F'(\mathbf{z})$.

Proof. (Sketch) Consider a ground instance $r$ of rule (7) and let $a$ be the first atom in the body of $r$ (that is, a ground instance of the atom $F(X_1, \mathbf{Z}, \mathbf{Z}')$). In $P'$ there are ground rules $r'$ and $r''$ obtained from (8) using the same variable instantiation as that used to produce $r$. Clearly, $r$ contributes to the reduct of $\mathit{ground}(P)$ if and only if $r'$ and $r''$ contribute to the reduct of $\mathit{ground}(P')$. Moreover, if they do, $r$ "fires" in the least model computation if and only if $r'$ fires in the least model computation. It follows that an interpretation $I$ of $P$ is an answer set of $P$ if and only if $I \cup J$ is an answer set of $P'$, where $J$ consists of all atoms $F'(\mathbf{z})$ such that $F(x, \mathbf{z}, \mathbf{z}') \in I$ for some constant $x$ and some tuple of constants $\mathbf{z}'$.

Next, we recall the following theorem proved by Lierler [10].

Theorem A.2.

Let $H$ be an atom ($H$ may be $\bot$), $G$ a list of literals, $X$ a variable, $\mathbf{Z}$ a tuple of variables, each different from $X$ and each with at least one occurrence in a literal in $G$, and $F$ a predicate symbol of arity $1 + |\mathbf{Z}|$. If $b$ is an integer, and $X_1, \ldots, X_b$ are variables without any occurrence in $H$ and $G$, then the logic program rule

$$H \leftarrow \bigwedge_{1 \le i \le b} F(X_i, \mathbf{Z}),\ \bigwedge_{1 \le i < j \le b} X_i \ne X_j,\ G. \tag{9}$$

is strongly equivalent to the logic program rule

$$H \leftarrow b \le \mathit{count}\{X : F(X, \mathbf{Z})\},\ G. \tag{10}$$

where $\bigwedge$ is used to represent a sequence of expressions separated by commas.

Combining the two results leads to the following result, which proves the correctness of the first rewriting implemented by our tool AAgg.

Theorem A.3.

Let $P$ be a program containing a rule $r$ of the form

$$H \leftarrow \bigwedge_{1 \le i \le b} F(X_i, \mathbf{Z}, \mathbf{Z}'),\ \bigwedge_{1 \le i < j \le b} X_i \ne X_j,\ G(\mathbf{Z}', \mathbf{Z}''). \tag{11}$$

under the same assumptions about the variable tuples $\mathbf{X}$, $\mathbf{Z}$, $\mathbf{Z}'$ and $\mathbf{Z}''$ as before. If $\mathbf{Z}$ is empty, let $P'$ be $P$. Otherwise, let $P'$ be obtained from $P$ by replacing $r$ with the rules (8), adjusting the first of them to contain $\bigwedge_{1 \le i < j \le b} X_i \ne X_j$ in place of $Q(\mathbf{X})$. Let us call the first of these two rules $r$ (reusing the name of the original rule). Finally, let $P''$ be obtained from $P'$ by replacing $r$ with the corresponding rule (10). Then the programs $P$ and $P''$ have the same answer sets (when $\mathbf{Z}$ is not empty, the same answer sets modulo atoms $F'(\mathbf{z})$).

Proof. Clearly, rule $r$ is a special case of a rule of the form (7). By Theorem A.1, programs $P$ and $P'$ have the same answer sets (when $\mathbf{Z}$ is not empty, the same answer sets modulo atoms $F'(\mathbf{z})$). The rule $r$ in $P'$ is of the form (9) required by Theorem A.2. Applying this theorem shows that programs $P'$ and $P''$ have the same answer sets, and the assertion follows.

Once a program has a rule of the form (10), we can often modify it further by exploiting alternative encodings of the aggregate expressions. In particular, under some assumptions about the structure of the program, we can replace a rule (10) by

$$H \leftarrow \mathit{not}\ \mathit{count}\{X : F(X, \mathbf{Z})\} < b,\ G. \tag{12}$$

where we assume that the variable $X$ is not a variable in $\mathbf{Z}$, and that all variables in $\mathbf{Z}$ appear in $G$.

We recall that a partition $(P_b, P_t)$ of a program $P$ is a splitting of $P$ if no predicate appearing in the head of a rule from $P_t$ appears in $P_b$ [11, 4]. A well-known result on splitting states that answer sets of programs that have a splitting can be described in terms of answer sets of the programs that form the splitting.

Theorem A.4.

Let $P$ be a logic program and let $(P_b, P_t)$ be a splitting of $P$. For every answer set $I_b$ of $P_b$, every answer set of the program $P_t \cup I_b$ is an answer set of $P$. Conversely, for every answer set $I$ of $P$, there is an answer set $I_b$ of $P_b$ such that $I$ is an answer set of $P_t \cup I_b$.

This result implies that in programs that have a splitting, a rule in $P_t$ containing in its body an aggregate expression involving only predicates appearing in $P_b$ can be replaced by a rule in which this aggregate expression is replaced by any of its (classically) equivalent forms. We formally state this result for the case of rules of the form (10).

Theorem A.5. Let $P$ be a logic program and let $(P_b, P_t)$ be a splitting of $P$. If $P_t$ contains a rule of the form (10) and $F$ appears in $P_b$, then $P$ and the program $P'$ obtained from $P$ by replacing the rule (10) by the rule (12) have the same answer sets.

Proof. (Sketch) Let $P'_t$ be the program obtained from $P_t$ by replacing the rule (10) by the rule (12). It is clear that $(P_b, P'_t)$ is a splitting of $P'$. Let $I$ be an answer set of $P$. By Theorem A.4, there is an answer set $I_b$ of $P_b$ such that $I$ is an answer set of $P_t \cup I_b$. In particular, $I$ is an answer set of the program $I_b \cup \mathit{ground}(P_t)$. Let $Q$ be the program obtained by simplifying the bodies of the rules in $\mathit{ground}(P_t)$ as follows. If a conjunct $c$ in the body of a rule in $\mathit{ground}(P_t)$ involves only atoms from the Herbrand base $\mathit{HB}(P_b)$ of $P_b$, we remove $c$ if $I_b \models c$, and we remove the rule if $I_b \not\models c$. Because atoms from $\mathit{HB}(P_b)$ do not appear in the heads of the rules in $\mathit{ground}(P_t)$, $I$ is an answer set of $Q \cup I_b$.

We denote by $Q'$ the program obtained by the same simplification process from $\mathit{ground}(P'_t)$. From the definition of $P'_t$ it follows that $Q' = Q$ (indeed, the only difference between $P_t$ and $P'_t$ is in the bodies of some rules, in which an aggregate built entirely from the atoms in $\mathit{HB}(P_b)$ is replaced by a classically equivalent one; thus, both expressions evaluate in the same way under $I_b$, and the contribution of the corresponding rules to $Q$ and $Q'$ is the same in each case). Consequently, $I$ is an answer set of $Q' \cup I_b$. Because atoms from $\mathit{HB}(P_b)$ do not appear in the heads of the rules in $\mathit{ground}(P'_t)$, $I$ is an answer set of $\mathit{ground}(P'_t) \cup I_b$ and, because $(P_b, P'_t)$ is a splitting of $P'$, also an answer set of $P'$.
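
The step that both aggregate expressions evaluate in the same way under $I_b$ can be checked mechanically on small cases. The Python sketch below is only an illustration and is not part of AAgg: it brute-forces the agreement of the bound form of rule (10), the negated form of rule (12), an equality-based form that excludes every count from 0 to b-1, and the pairwise distinct witnesses used in the body of rule (9). The function names, and the choice to represent the values x with F(x, z) true under a fixed $I_b$ as a Python collection, are assumptions made for this example.

```python
from itertools import combinations


def count_at_least(values, b):
    """Bound form, as in rule (10): b <= count{X : F(X, z)}."""
    return b <= len(set(values))


def not_count_less(values, b):
    """Negated form, as in rule (12): not count{X : F(X, z)} < b."""
    return not (len(set(values)) < b)


def no_count_below(values, b):
    """Equality-based form (assumed range): for each i in 0..b-1, not i = count{...}."""
    n = len(set(values))
    return all(not (i == n) for i in range(b))


def distinct_witnesses(values, b):
    """Body of rule (9): b pairwise distinct X_1..X_b with F(X_i, z)."""
    # combinations over the distinct values yields a tuple iff at least b exist
    return any(True for _ in combinations(set(values), b))


def forms_agree(max_size=4, max_bound=5):
    """Check all four forms on every extension over a small universe."""
    forms = (count_at_least, not_count_less, no_count_below, distinct_witnesses)
    universe = range(max_size)
    for k in range(max_size + 1):
        for extension in combinations(universe, k):
            for b in range(max_bound + 1):
                if len({form(extension, b) for form in forms}) != 1:
                    return False
    return True


if __name__ == "__main__":
    print(forms_agree())
```

The check only covers evaluation under a fixed interpretation, which is the situation the splitting argument reduces to; it says nothing about programs in which F is defined through the head of the rewritten rule.
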
A similar argument shows that answer sets of $P'$ are also answer sets of $P$.

It is clear that the same argument applies to other similar rewritings, for instance, to the one that replaces the rule (9) by the rule

$$H \leftarrow \bigwedge_{0 \le i \le b-1} \mathit{not}\ i = \mathit{count}\{X : F(X, \mathbf{Z})\},\ G. \tag{13}$$

where, as before, we assume that the variable $X$ is not a variable in $\mathbf{Z}$, and that all variables in