Efficient Incremental Modelling and Solving
Gökberk Koçak, Özgür Akgün, Nguyen Dang, Ian Miguel
School of Computer Science, University of St Andrews, UK
{gk34,ozgur.akgun,nttd,ijm}@st-andrews.ac.uk

Abstract.
In various scenarios, a single phase of modelling and solving is either not sufficient or not feasible to solve the problem at hand. A standard approach to solving AI planning problems, for example, is to incrementally extend the planning horizon and solve the problem of trying to find a plan of a particular length. Indeed, any optimisation problem can be solved as a sequence of decision problems in which the objective value is incrementally updated. Another example is constraint dominance programming (CDP), in which search is organised into a sequence of levels. The contribution of this work is to enable a native interaction between SAT solvers and the automated modelling system Savile Row to support efficient incremental modelling and solving. This allows adding new decision variables, posting new constraints and removing existing constraints (via assumptions) between incremental steps. Two additional benefits of the native coupling of modelling and solving are the ability to retain learned information between SAT solver calls and to enable SAT assumptions, further improving flexibility and efficiency. Experiments on one optimisation problem and five pattern mining tasks demonstrate that the native interaction between the modelling system and SAT solver consistently improves performance significantly.

Keywords:
Constraint Programming · Constraint Modelling · Incremental Solving · Constraint Optimisation · Planning · Data Mining · Itemset Mining · Pattern Mining · Dominance Programming
1 Introduction

When approaching the solution of a class of problems, in many cases a simple single-phase approach works well: formulate a model parameterised on the data that defines an individual instance of the problem class, and solve each instance in a single solving phase. In some scenarios, however, as we will illustrate below, this approach is either not sufficient or not feasible to solve the problem at hand. Instead, a larger or more difficult problem instance is solved as a sequence of smaller or simpler related instances. In this situation, communication between a modelling system that prepares an instance for solution by a low-level solver and the solver itself can become a bottleneck, with much work repeated between consecutive, very similar instances.

Incremental modelling and solving is the process of constructing an initial low-level instance and obtaining further instances in a sequence by modelling and encoding just the differences between the previous and the new instance. Most SAT solvers are capable of working incrementally, by allowing new irrevocable clauses to be appended or by setting assumptions that hold only for a single solver call.
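To make this interface concrete, the following Rust sketch shows the shape of an incremental solving loop. The IncrementalSolver trait and its methods are hypothetical stand-ins for an ipasir-style binding, not the API of any particular solver.

    // Minimal sketch of incremental SAT solving: clauses are permanent once
    // added, while assumptions hold only for the solve call they are passed to.
    enum SolveResult { Sat, Unsat }

    trait IncrementalSolver {
        fn add_clause(&mut self, lits: &[i32]);                  // irrevocable
        fn solve(&mut self, assumptions: &[i32]) -> SolveResult; // per-call
    }

    fn two_calls(solver: &mut dyn IncrementalSolver) {
        solver.add_clause(&[1, 2]);     // permanent: x1 or x2
        let _r1 = solver.solve(&[-1]);  // first call temporarily assumes not x1
        solver.add_clause(&[-2, 3]);    // permanent: not x2 or x3
        let _r2 = solver.solve(&[]);    // the clauses persist, the assumption does not
    }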
To illustrate, consider the task of pattern mining, the process of extracting useful patterns from large data sets. The most well-known pattern mining task, frequent itemset mining [1], requires us to find the sets of items whose number of occurrences together (known as the support) in a transactional database exceeds a specified threshold. Specialised, efficient tools exist for standard pattern mining tasks [26]. However, finding all frequent patterns is rarely useful, since it usually produces a very large volume of results. Rather, an end-user is typically interested in focusing on a much smaller set of patterns for further inspection. One approach is to seek patterns that compactly represent the full set of patterns [23]; another is to consider domain-specific side constraints [4] that further reduce the volume of patterns returned. Both methods require a more sophisticated search for patterns and hence carry an increase in computational cost.

Constraint-based mining [8] offers a general means of modelling more sophisticated pattern mining tasks. Its flexibility means that side constraints can easily be added to the basic model of a pattern mining problem, which is difficult to do with a specialised mining tool. We distinguish local and non-local constraints in modelling pattern mining problems. The former, such as the frequent itemset property, can be expressed simply on a candidate solution, e.g. by constraining the support of a candidate itemset to be equal to or greater than the threshold. Non-local constraints, however, must be expressed between candidate solutions and are therefore more challenging to model. Closed frequent itemset mining [23,13], which is one approach to representing the full set of frequent itemsets more compactly, is an illustrative example: it stipulates that a frequent itemset is closed if its support exceeds that of all of its supersets.

Constraint Dominance Programming (CDP) [19] provides a method of supporting constraints between solutions via dominance blocking constraints: every time a new solution is found, a new blocking constraint is added to disallow solutions that it would dominate. An extension to CDP, CDP+I [11,12] exploits incomparability between solutions (solutions A and B are incomparable if A does not dominate B and B does not dominate A) so that they may be found in batches. The search is organised into levels in which all solutions are incomparable, and hence may be found together through a single call to a solver without the need for additional per-solution blocking constraints. Solving with CDP, which requires posting new constraints after each solution, and solving with CDP+I, which has similar requirements but per batch of solutions, are both examples of incremental modelling and solving.

Other problem types that might be considered for incrementality are constrained optimisation problems (COPs), where an objective function is given in addition to a standard constraint satisfaction problem, and AI planning problems, where we can incrementally extend the planning horizon and solve the problem of trying to find a plan of a particular length.

CP solvers like Minion [9] or Chuffed [7] are typically capable of supporting COP directly in addition to CSP. However, other solver types, such as standard SAT solvers, sometimes lack the facility to represent objective values. Instead of using a standard SAT encoding for the problem, a maximum satisfiability (MaxSAT) encoding can be used to represent the objective function.
However, converting a SAT encoding to a MaxSAT encoding may be time consuming, depending on the size of the instance.
Alternatively, SAT or SMT solvers can be used for optimisation and planning problems via a sequence of solver calls in an incremental structure. The COP can be encoded as a series of CSPs, with a different objective value bound encoded into each CSP. These CSP instances can then be solved for satisfiability. The threshold at which the problem switches from SAT to UNSAT (or the other way around) indicates the proven optimal value for the original COP instance. For example, in a maximisation problem whose optimum is 7, the decision instances with a bound of at most 7 are satisfiable, while those with a larger bound are unsatisfiable. Searching for this threshold requires multiple solver calls, and the search strategy can be adjusted for efficiency.
Contribution
This paper proposes to enable a native interaction between the SAT solver and the automated modelling system that organises the CDP+I mining process and the optimisation process using a SAT backend. This removes a major bottleneck in how consecutive SAT calls are performed. Two additional benefits of this native coupling are the ability to retain learned information between SAT solver calls and to enable SAT assumptions, further improving efficiency by reducing redundant search between levels.

Our experiments on one optimisation problem and five pattern mining tasks demonstrate that the native interaction between the modelling system and SAT solver consistently improves the performance of each system significantly.

2 Background

2.1 The Essence Pipeline and Constraint Dominance Programming

Essence [2] is an abstract high-level constraint specification language. It has the power to represent complex abstract structures, such as sets, multisets, sequences, and partitions. It supports arbitrary nesting of these structures and also supports quantification over decision variables. Hence, the language is ideally suited to expressing data mining problems.
Essence can be refined into a constraint model in Essence Prime [21] using Conjure [2]. Due to the high-level abstract nature of the specification, there are multiple ways of compiling Essence to Essence Prime. Conjure has a number of built-in heuristics to make modelling decisions automatically. Alternatively, the modelling decisions can be selected manually. Savile Row translates Essence Prime into input suitable for a variety of black-box solvers, while applying solver-specific optimisations to the model, such as rewriting constraint expressions, common sub-expression elimination and using Minion to enforce strong levels of consistency in a preprocessing step [22].

A constraint satisfaction problem (CSP) consists of decision variables (V), their domains (D) and problem constraints (C). CDP extends CSPs by adding a dominance relation (R), which defines the condition under which an assignment to the decision variables is dominated by another assignment. In CDP, an assignment is a solution if it is not dominated by any other solution. When enumerating all solutions of a CDP instance, dominance blocking constraints can be generated for each solution as soon as it is found. These constraints will eliminate all future dominated assignments. However, a post-processing step may still be needed [19].
language Essence

letting ITEM be domain int(...)
letting SUPPORT be domain int(...)
given db : mset of set of ITEM
given minSupport : int
find itemset : set of ITEM
find support : SUPPORT
such that
    support = sum entry in db . toInt(itemset subsetEq entry),
    support >= minSupport,
    SideConstraints
dominanceRelation
    (itemset subsetEq fromSolution(itemset)) -> (support != fromSolution(support))
incomparabilityFunction descending |itemset|
Fig. 1: Closed Frequent Itemset Mining in Essence. The dominance relation defines the closedness property between the currently sought solution and the previous solutions via fromSolution. The incomparability function is defined on cardinality using a descending order, since closedness is defined by a superset relation.
CDP+I extends CDP by defining an incomparability function (I), which defines when two assignments are incomparable (mutually non-dominating).

An itemset mining problem can be specified naturally in Essence as a multiset of transactions. Depending on the nature of the mining task, each transaction can be represented using a set of integer item labels, or ornamented (using tuples or records) with additional information such as a class label. Figure 1 presents the specification of the Closed Frequent Itemset Mining problem in three parts. The first part is the declaration of the parameters, the decision variables and any constraints that concern a single solution. The second part gives the dominance relation in terms of previously found solutions. The third part defines the incomparability function, which in this problem declares incomparable any two solutions that have the same itemset cardinality.

Algorithm 1 makes use of both the dominance relation and the incomparability function when solving CDP+I instances. The CDP+I algorithm aims to find all non-dominated solutions. It achieves this by partitioning the search space into levels extracted from the incomparability function. For example, for the closed itemset mining problem, a separate search is conducted for every value in the domain of |itemset|. For every level, we take the base CSP model and start by adding a level restriction constraint to it.
Algorithm 1 The CDP+I algorithm

    (V, D, C, R, I) ← CDP+I instance
    levels ← getLevels(I)
    for l ∈ levels do
        C ← C ∪ levelRestriction(l)
        CSP ← (V, D, C)
        S ← findAllSolutions(CSP)
        B ← generateDominanceBlocking(R, S)
        C ← C − levelRestriction(l)
        C ← C ∪ B

In our running example, adding the level restriction corresponds to posting a cardinality constraint on the itemset. Then, we enumerate all solutions and generate the corresponding dominance blocking constraints. The problem constraints are then updated to remove the level restriction constraint before adding the new dominance blocking constraints.

Previous implementations of CDP+I made a separate solver call for each level when using an AllSAT solver, and a separate solver call for each solution when using a standard SAT solver. This allows for a simple implementation of the CDP+I algorithm at the cost of losing learned clauses between separate solver calls. The performance of modern SAT solvers relies heavily on learned clauses [16]. Section 4 presents our approach for enabling native interaction with SAT and AllSAT solvers. Through the use of assumptions in SAT, we achieve improved performance without changing the high-level problem specifications.
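Algorithm 1 can also be read as the following Rust-style sketch. The types and the helpers find_all_solutions and generate_dominance_blocking are illustrative names for the operations described above, not an existing API.

    // Sketch of the CDP+I loop of Algorithm 1 (all names illustrative).
    struct Constraint;   // a posted constraint
    struct Solution;     // an assignment to the decision variables
    struct Csp { constraints: Vec<Constraint> }

    fn cdp_plus_i(mut csp: Csp, levels: Vec<Constraint>) {
        for level_restriction in levels {
            // Restrict search to one incomparability level,
            // e.g. a fixed itemset cardinality.
            csp.constraints.push(level_restriction);
            let solutions = find_all_solutions(&csp);
            let blocking = generate_dominance_blocking(&solutions);
            // Drop the level restriction; keep the dominance blocking constraints.
            csp.constraints.pop();
            csp.constraints.extend(blocking);
        }
    }

    fn find_all_solutions(_csp: &Csp) -> Vec<Solution> { unimplemented!() }
    fn generate_dominance_blocking(_s: &[Solution]) -> Vec<Constraint> { unimplemented!() }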
The use of Essence for specifying the problems allows access to a large number of different models (via Conjure options), different preprocessing options (via Savile Row options), and different solvers (SAT and AllSAT).

2.2 Optimisation Strategies with a SAT Backend

A COP can be rewritten as a series of CSPs in which the objective function value is encoded differently in each of them. A naive but inefficient approach would be to try all possible values exhaustively and pick the best one that satisfies the instance. Alternatively, we can search for the optimal objective value within its domain. Three search strategies supported by Savile Row can be considered for this purpose, namely Linear, UNSAT and Bisect. They are explained as follows (assuming that we are solving a maximisation problem).

Linear search
Linear search is a straightforward strategy for finding the optimal value. It starts from the lowest value in the objective's domain and increases the bound by one until the problem becomes unsatisfiable; the last satisfiable bound is the optimum.
UNSAT search
This is also a straightforward strategy: it starts from the highest objective function value and decreases it one by one until the problem becomes satisfiable; the first satisfiable bound is the optimum.
Bisect search
This is a binary search strategy, also known as dichotomic search. It starts by splitting the objective function's domain into two halves, giving two CSP instances, each with half of the domain. The satisfiable half is chosen and the same procedure is repeated until the objective function's domain is reduced to a single value (the optimal objective value).
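For illustration, the following sketch implements the three strategies over an abstract satisfiability check. Here sat(b) is a hypothetical callback asking whether the instance has a solution with objective value at least b; it is assumed to be monotone (if sat(b) holds, so does sat(b') for every b' <= b).

    // The three optimisation strategies for a maximisation problem (sketch).
    fn linear_search(lo: i64, hi: i64, sat: impl Fn(i64) -> bool) -> Option<i64> {
        // Raise the bound until the instance becomes unsatisfiable.
        (lo..=hi).take_while(|&b| sat(b)).last()
    }

    fn unsat_search(lo: i64, hi: i64, sat: impl Fn(i64) -> bool) -> Option<i64> {
        // Lower the bound until the instance becomes satisfiable.
        (lo..=hi).rev().find(|&b| sat(b))
    }

    fn bisect_search(mut lo: i64, mut hi: i64, sat: impl Fn(i64) -> bool) -> Option<i64> {
        // Halve the bound's domain, keeping the satisfiable side.
        let mut best = None;
        while lo <= hi {
            let mid = lo + (hi - lo) / 2;
            if sat(mid) { best = Some(mid); lo = mid + 1; } else { hi = mid - 1; }
        }
        best
    }

Bisect needs a number of solver calls logarithmic in the size of the objective domain, whereas the two linear strategies may need linearly many; which is fastest in practice also depends on whether satisfiable or unsatisfiable calls are cheaper for the instance at hand.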
3 Problem Classes

Throughout this paper we experiment on six problem classes to demonstrate the enhancements we introduce. Five of these problem classes are pattern mining problems encoded in CDP+I, and the instances we use are taken from the supplementary material of [12]. The sixth problem class is the Multi-Mode Resource-Constrained Project Scheduling Problem (MRCPSP).

The pattern mining problems are variations of the frequent itemset mining problem, each parameterised over a dataset of transactions. The task is to find a set of frequent items that satisfies minimum value and maximum cost side constraints. In addition, each problem class has a different constraint among assignments, which encodes the dominance relationship.
Closed frequent itemset mining (CFIS)
A frequent itemset is closed if and only if its support is greater than that of all of its supersets [23]. The support of an itemset is the number of transactions in the database in which all of its items occur together. Maximal itemset mining is a similar problem class, where the only difference is that a frequent itemset is maximal if none of its supersets are frequent. We do not include maximal itemset mining in our experiments since it is a simpler version of closed itemset mining.
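As a small worked example (toy data, not from our benchmarks): over the transactions {a,b,c}, {a,b} and {a,c}, the itemset {b} has support 2, but so does its superset {a,b}, so {b} is not closed; {a,b} is closed because its only superset {a,b,c} has support 1. The following self-contained snippet checks these support values.

    // Toy support computation for the closedness example above (illustrative).
    use std::collections::BTreeSet;

    fn support(db: &[BTreeSet<char>], itemset: &BTreeSet<char>) -> usize {
        db.iter().filter(|t| itemset.is_subset(t)).count()
    }

    fn main() {
        let t = |s: &str| s.chars().collect::<BTreeSet<char>>();
        let db = vec![t("abc"), t("ab"), t("ac")];
        assert_eq!(support(&db, &t("b")), 2);
        assert_eq!(support(&db, &t("ab")), 2);  // equal support: {b} is not closed
        assert_eq!(support(&db, &t("abc")), 1); // strictly smaller: {a,b} is closed
    }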
Generator frequent itemset mining (GFIS)
Generator itemsets (also called free itemsets or key itemsets) [5] are frequent itemsets which do not have any frequent subsets with the same support.
Minimal rare itemset mining (MRIM)
A minimal rare itemset is an infrequent itemset whose subsets are all frequent [25].
Closed discriminative itemset mining (DFIS)
Discriminative itemset mining [6] is parameterised over a dataset of transactions that also have a class label (positive/negative). Instead of a single support value, we maintain two support values: the positive support of an itemset is the number of transactions that are labelled positive and have the itemset as a subset. The negative support, similarly, is the number of transactions that are labelled negative and have the itemset as a subset. A discriminative itemset is one where the difference between the positive and the negative support is greater than a given threshold. A closed discriminative itemset is a discriminative itemset whose support is greater than that of all of its supersets.
Relevant subgroup discovery (RSD)
Relevant subgroup discovery [15] is similar to discriminative itemset mining. While discriminative itemset mining reasons on the support numbers of different classes of transactions, relevant subgroup discovery reasons using the actual sets of transactions that provide the support [19]. A relevant subgroup X is an itemset where at least one of the following conditions holds: 1) for positive transactions, no other itemset covers a superset of the transactions covered by X; 2) for negative transactions, no other itemset covers a subset of the transactions covered by X; or 3) for both kinds of transactions, no other itemset that has the same total cover is a superset of X.

Multi-mode resource constrained project scheduling problem (MRCPSP)
This is a variant of the project scheduling problem [14], a classical and well-known optimisation problem in operations research. We are given a number of activities and a set of renewable resources. Each activity is associated with a duration and demands for some resources. The activities cannot be interrupted, and there are precedence constraints which state that some activities can only start once some others have finished. The variant considered in this paper is the multi-mode variant [18], where each activity may have multiple modes. Each mode dictates the duration and resource demands of the activity. The goal is to schedule the activities and choose a mode for each of them so that the makespan (the latest completion time) is minimised. An Essence specification of this problem is presented in Appendix A (Figure 7).
4 Native Interaction

The main CDP+I algorithm (Algorithm 1) and the SAT optimisation backend require multiple solver calls. For CDP+I, a solver call occurs once per level when using an AllSAT solver, and once per solution when using a standard SAT solver. Solutions from a level are used to produce dominance blocking constraints for the next level. Furthermore, level restriction constraints are both added and removed between levels. Likewise, for optimisation problems using a standard SAT backend, multiple solver calls occur when applying the three optimisation strategies to reach the optimal value. In addition to adding temporary constraints, the ability to remove added constraints is also required. Adding constraints during search is relatively common, even without an incremental process. However, removing constraints requires special treatment by the solver in question. A direct implementation of these algorithms would indeed call the solver several times and consequently would not benefit from any learned clauses between solver calls.

There are two main ways of maintaining learned clauses between solver calls. The first option works by extracting learned clauses once the solver finishes the search and post-processing them to keep a relevant subset for a future solver invocation. A similar approach is used in [24] to learn candidate implied constraints from a learning solver. The second option works by keeping the solver active, modifying the active model by posting additional constraints and restarting search. Adding new variables and constraints in this way is a relatively common operation, available in ipasir, an incrementality API for SAT solvers used in SAT competitions [10]. Removing constraints requires the assumptions machinery that is available in most modern SAT solvers. Constraints that are going to be removed are posted as new clauses conditional on an assumption. Hence, when the assumption is lifted (and the constraint is removed), any learned clauses which depend on that assumption can be deactivated.

We define a new API for SAT solvers that shares most of the functionality of ipasir, including methods for adding new clauses, adding assumptions, solving and retrieving solutions. We extend this basic API to also include methods for reporting detailed statistics about learned clauses and the solver's state, in addition to triggering solution callbacks. Our extended API is implemented in the Rust programming language. It works with the SAT solvers GLUCOSE, CADICAL and MINISAT, and the AllSAT solver NBC MINISAT ALL. Our Rust implementation encapsulates the required functionality of these solvers and compiles them into a shared library.
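The removal mechanism can be sketched as follows. The Solver trait mirrors the shape of an ipasir-style interface; all names are illustrative rather than our actual API.

    // Posting a removable constraint by guarding its clauses with a fresh
    // selector literal s: each clause c becomes (not s or c), so the clauses
    // are active only while s is assumed.
    trait Solver {
        fn new_var(&mut self) -> i32;
        fn add_clause(&mut self, lits: &[i32]);
        fn solve(&mut self, assumptions: &[i32]) -> bool;
    }

    fn post_removable(solver: &mut dyn Solver, clauses: &[Vec<i32>]) -> i32 {
        let s = solver.new_var();
        for c in clauses {
            let mut guarded = vec![-s];
            guarded.extend_from_slice(c);
            solver.add_clause(&guarded);
        }
        s // assume s to activate the constraint
    }

    fn example(solver: &mut dyn Solver) {
        let level = post_removable(solver, &[vec![1, 2], vec![-1, 3]]);
        let _ = solver.solve(&[level]); // constraint active
        let _ = solver.solve(&[]);      // constraint inactive; clauses learnt under the
                                        // assumption contain (not level) and no longer
                                        // constrain the search
    }

To retire such a constraint permanently, the unit clause (not s) can be added, after which the solver may discard everything that depends on s.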
The entire pipeline of tools starts with Conjure, which produces an Essence Prime model for each problem class. A modified Savile Row is then used to instantiate the problem class model using a given data file, preprocess it using Minion to shave domains, and then encode it into SAT using the standard encodings found in Savile Row [20]. Prior to our work, Savile Row worked by producing a DIMACS file containing the entire encoding and calling a SAT solver on this file. Thanks to the new API we define and implement, Savile Row now skips building this file and directly makes calls to the SAT solver to create the model.

Our solver API layer is implemented in Rust, while Savile Row is implemented in Java. We use the Java Native Interface (JNI) to integrate the API layer into Savile Row.
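For illustration, a native method exported from the Rust side for Savile Row to call might have the following shape. This is a raw-JNI sketch: the class name savilerow.NativeSolver and the method are hypothetical, and the environment and array parameters are left as opaque pointers that a real implementation would access through a JNI binding.

    // JNI naming convention: Java_<package>_<Class>_<method>. This symbol
    // implements a Java declaration such as
    //   package savilerow;
    //   class NativeSolver { static native void addClause(int[] lits); }
    type JniEnvPtr = *mut core::ffi::c_void;
    type JIntArray = *mut core::ffi::c_void;

    #[no_mangle]
    pub extern "system" fn Java_savilerow_NativeSolver_addClause(
        _env: JniEnvPtr,
        _class: *mut core::ffi::c_void,
        _lits: JIntArray,
    ) {
        // A real implementation would read the literals out of `_lits`
        // via the JNI environment and forward them to the solver.
    }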
5 Experimental Evaluation

To demonstrate the effectiveness of keeping SAT learnt clauses between levels during the optimisation process using native interaction, we evaluate the three optimisation strategies explained in Section 2.2 on 928 MRCPSP instances from PSPLIB [14]. The SAT solver GLUCOSE [3] is combined with each of the three optimisation strategies. We also compare the resulting performance with Open-WBO [17], a MaxSAT solver, and with Chuffed [7], a learning CP solver.

Each run on an instance is given a time limit of one CPU hour and is repeated three times; the average solving time is recorded. The effect of native interaction on GLUCOSE is shown in Figure 2. The results suggest that, for all three strategies, native interaction boosts efficiency significantly on all tested instances.

Comparisons against Open-WBO and Chuffed are plotted in Figure 3. The first plot includes only the default SAT strategies, while the second replaces them with their native equivalents. The results suggest that native interaction creates a drastic performance improvement for the GLUCOSE backend, and the results on these problem instances are competitive with the two established optimisation solvers.
Fig. 2: Solving time of GLUCOSE with versus without native interaction on 928 MRCPSP instances. (Three scatter plots: glucose-bisect, glucose-linear and glucose-UNSAT; axes: solving time without vs. with native interaction.)
Fig. 3: Solving time of GLUCOSE with three settings (bisect, linear and UNSAT), Open-WBO and Chuffed on 928 MRCPSP instances. GLUCOSE's results are shown (a) without and (b) with native interaction. (Axes: instances vs. time in seconds.)
Computational Evaluation with an AllSAT Solver

In order to evaluate the effectiveness of maintaining learned clauses and using SAT assumptions between CDP+I levels, we solve 240 instances across 5 problem classes (see Section 3). Within a 6-hour time limit, the native version solves 210 instances, whereas pure CDP+I solves only 173 instances. We believe this is due to needing fewer search nodes, which is made possible by pruning large parts of the search tree via the learned clauses.

Figure 4 presents the median number of search nodes per level. Since instances have different numbers of levels, we normalise the number of levels on the horizontal axis. The plot also shows that the performance of default CDP+I can vary amongst different instances, while the performance of CDP+I-native is more stable, indicating that CDP+I-native is more robust.
Fig. 4: Median solver nodes per CDP+I level for (a) CFIS, (b) GFIS, (c) MRIM, (d) DFIM, (e) RSD and (f) all problem classes. Error bars range between the 45th and the 55th percentile. The horizontal axis represents normalised levels between instances. Native CDP+I uses significantly fewer search nodes, thanks to accumulated learned clauses between levels.

CDP+I-native uses fewer search nodes than pure CDP+I, due to maintaining a subset of learned clauses between levels. Figure 6a presents a comparison of the total solver run time of the two CDP+I variants on NBC MINISAT ALL and shows that native interaction clearly results in faster run times as well. On PAR2 average (penalised average runtime, counting each timeout as twice the time limit), CDP+I-native spends 493 seconds per instance, whereas pure CDP+I spends 8,210 seconds.
A Case Study on the CFIS Tumor 20% Instance
To evaluate whether keeping learned clauses improves efficiency, we examine one particular instance in detail as a case study. Figure 5 presents two plots. The first shows that CDP+I-native uses fewer search nodes on each level. The second illustrates the increased number of SAT clauses in each level that results from keeping learnt clauses. The improved efficiency seen in the first plot is a direct result of the search space being restricted by the additional clauses.
Fig. 5: A comparison on one CDP+I instance (CFIS, Tumor with 20% frequency) with and without native interaction, using the NBC MINISAT ALL AllSAT solver. Each plot is averaged from a single model and multiple random seeds. The plot on the left shows the number of solver nodes on each level (CDP+I vs. CDP+I-native); the plot on the right shows the total number of SAT clauses on each level.
Computational Evaluation with a Standard SAT Solver
CDP+I on a standard SAT solver operates by generating solution blocking clauses between each solver call within a level. Once a level is completed, the dominance blocking clauses generated by Savile Row are encoded and passed on to the next level. The solution blocking clauses are not encoded again, since they are redundant: they are already implied by the dominance blocking constraints.
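Generating a solution blocking clause is straightforward: it is the disjunction of the negations of the literals describing the found solution, restricted to the variables of interest (for example, the itemset variables). A sketch, with a hypothetical clause-posting callback:

    // Block a found solution by adding the clause (not l1 or ... or not ln).
    fn block_solution(add_clause: &mut dyn FnMut(&[i32]), solution_lits: &[i32]) {
        let blocking: Vec<i32> = solution_lits.iter().map(|&l| -l).collect();
        add_clause(&blocking);
    }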
Implementing a native interactive system on a standard SAT solver brings both costs and benefits to its performance. AllSAT solvers are already capable of keeping learned information within a level, due to their all-solution enumeration behaviour. Native interaction grants the standard SAT solver this capability, in addition to making the learned information persistent between levels. Thus, the increase in the standard SAT solver's performance should be relatively much higher than the increase in the AllSAT solver's performance. However, since we still use solution blocking clauses within a level, and since the system cannot eliminate the redundant solution blocking clauses once the level is done, the standard SAT model might grow far beyond its non-native equivalent. AllSAT solvers are not susceptible to this because they can operate without the use of solution blocking clauses, regardless of whether they use native interaction.

Fig. 6: Comparison between pure CDP+I and CDP+I-native: (a) total solver time using the AllSAT solver NBC MINISAT ALL; (b) total solver time using the standard SAT solver GLUCOSE. The time limit is 6 hours per instance. Each data point is averaged from a single model and multiple random seeds.
Figure 6b compares CDP+I with and without native interaction using the standard SAT solver GLUCOSE. Native interaction increases performance significantly across all instances. The results also suggest that the anticipated decrease in performance due to the growth of the model did not outweigh the increase provided by native interaction.

In this section we have evaluated the effect of native interaction on the performance of CDP+I, conducting our analysis on an AllSAT solver and a standard SAT solver. In the next section we conclude and discuss future work, including the configuration space of CDP+I-native.
6 Conclusions and Future Work

We have proposed and implemented a new native interaction component to bridge the gap between low-level SAT solving and higher-level model compilation in Savile Row. We integrated this component into Savile Row so that it can be used in the CDP+I framework and for optimisation problems. Our experiments on different pattern mining tasks and an optimisation problem (MRCPSP) show that the native component boosts solving performance significantly. This interaction enables access to SAT assumptions to encode level information in a transparent way, and also makes learned information persistent across multiple runs.

Future work includes evaluating the native interaction component on different problem classes. We believe this native interaction can be a viable option for multi-objective optimisation tasks as well. Additionally, there is a large space of possible configurable options yet to cover, including different modelling and reformulation methods, other SAT solvers and SMT solvers.
Acknowledgements
This work is supported by EPSRC grant EP/P015638/1. Nguyen Dang is a Leverhulme Trust Early Career Fellow (ECF-2020-168).
References
1. Agrawal, R., Srikant, R., et al.: Fast algorithms for mining association rules. In: 20th Int. Conf. Very Large Data Bases, VLDB. vol. 1215, pp. 487–499 (1994)
2. Akgün, Ö., Frisch, A.M., Gent, I.P., Hussain, B.S., Jefferson, C., Kotthoff, L., Miguel, I., Nightingale, P.: Automated symmetry breaking and model selection in Conjure. In: International Conference on Principles and Practice of Constraint Programming. pp. 107–116. Springer (2013)
3. Audemard, G., Simon, L.: On the Glucose SAT solver. International Journal on Artificial Intelligence Tools (01), 1840001 (2018)
4. Bonchi, F., Lucchese, C.: On closed constrained frequent pattern mining. In: Fourth IEEE International Conference on Data Mining (ICDM'04). pp. 35–42. IEEE (2004)
5. Boulicaut, J.F., Bykowski, A., Rigotti, C.: Approximation of frequency queries by means of free-sets. In: European Conference on Principles of Data Mining and Knowledge Discovery. pp. 75–85. Springer (2000)
6. Cheng, H., Yan, X., Han, J., Hsu, C.W.: Discriminative frequent pattern analysis for effective classification. In: 2007 IEEE 23rd International Conference on Data Engineering. pp. 716–725. IEEE (2007)
7. Chu, G., Stuckey, P.J.: Chuffed solver description (2014)
8. De Raedt, L., Guns, T., Nijssen, S.: Constraint programming for itemset mining. In: SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 204–212. ACM (2008)
9. Gent, I.P., Jefferson, C., Miguel, I.: Minion: A fast scalable constraint solver. In: ECAI. vol. 141, pp. 98–102 (2006)
10. Järvisalo, M., Le Berre, D., Roussel, O., Simon, L.: The international SAT solver competitions. AI Magazine (1), 89–92 (2012)
11. Koçak, G., Akgün, Ö., Guns, T., Miguel, I.: Towards improving solution dominance with incomparability conditions: A case-study using generator itemset mining. arXiv preprint arXiv:1910.00505 (2019)
12. Koçak, G., Akgün, Ö., Guns, T., Miguel, I.: Exploiting incomparability in solution dominance: Improving general purpose constraint-based mining. In: ECAI (2020)
13. Koçak, G., Akgün, Ö., Miguel, I., Nightingale, P.: Closed frequent itemset mining with arbitrary side constraints. In: 2018 IEEE International Conference on Data Mining Workshops (ICDMW). pp. 1224–1232. IEEE (2018)
14. Kolisch, R., Sprecher, A.: PSPLIB - a project scheduling problem library: OR software - ORSEP operations research software exchange program. European Journal of Operational Research (1), 205–216 (1997)
15. Lemmerich, F., Rohlfs, M., Atzmueller, M.: Fast discovery of relevant subgroup patterns. In: Twenty-Third International FLAIRS Conference (2010)
16. Marques-Silva, J., Lynce, I., Malik, S.: Conflict-driven clause learning SAT solvers. In: Handbook of Satisfiability, pp. 131–153. IOS Press (2009)
17. Martins, R., Manquinho, V., Lynce, I.: Open-WBO: A modular MaxSAT solver. In: International Conference on Theory and Applications of Satisfiability Testing. pp. 438–445. Springer (2014)
18. Mori, M., Tseng, C.C.: A genetic algorithm for multi-mode resource constrained project scheduling problem. European Journal of Operational Research (1), 134–141 (1997)
19. Negrevergne, B., Dries, A., Guns, T., Nijssen, S.: Dominance programming for itemset mining. In: 2013 IEEE 13th International Conference on Data Mining. pp. 557–566. IEEE (2013)
20. Nightingale, P., Akgün, Ö., Gent, I.P., Jefferson, C., Miguel, I., Spracklen, P.: Automatically improving constraint models in Savile Row. Artificial Intelligence, 35–61 (2017)
21. Nightingale, P., Rendl, A.: Essence' description (2016), arXiv:1601.02865 [cs.AI]
22. Nightingale, P., Spracklen, P., Miguel, I.: Automatically improving SAT encoding of constraint problems through common subexpression elimination in Savile Row. In: International Conference on Principles and Practice of Constraint Programming. pp. 330–340. Springer (2015)
23. Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L.: Discovering frequent closed itemsets for association rules. In: International Conference on Database Theory. pp. 398–416. Springer (1999)
24. Shishmarev, M., Mears, C., Tack, G., de la Banda, M.G.: Learning from learning solvers. In: International Conference on Principles and Practice of Constraint Programming. pp. 455–472. Springer (2016)
25. Szathmary, L., Napoli, A., Valtchev, P.: Towards rare itemset mining. In: 19th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2007). vol. 1, pp. 305–312. IEEE (2007)
26. Zaki, M.J.: Scalable algorithms for association mining. IEEE Transactions on Knowledge and Data Engineering (3), 372–390 (2000)

A Essence specification for MRCPSP

language Essence

given nonRenewableResources new type enum
given renewableResources new type enum
given jobs new type enum
given startDummy, endDummy : jobs
given modes new type enum
given renewableLimits : function (total) renewableResources --> int
given nonRenewableLimits : function (total) nonRenewableResources --> int
given successors : function (total) jobs --> set of jobs
given renewableResourceUsage : function (jobs, modes, renewableResources) --> int
given nonRenewableResourceUsage : function (jobs, modes, nonRenewableResources) --> int
given duration : function (jobs, modes) --> int
given horizon : int
letting timesRange be domain int(1..horizon)
find start : function (total) jobs --> timesRange
find mode : function (total) jobs --> modes
find jobActive : function (total) (jobs, timesRange) --> bool
such that
    forAll job : jobs . forAll jobSuccessor in successors(job) .
        start(jobSuccessor) >= start(job) + duration((job, mode(job)))
such that
    forAll job : jobs . forAll time : timesRange .
        jobActive((job, time)) <->
            (time >= start(job) /\ time < start(job) + duration((job, mode(job))))
such that
    forAll resource : nonRenewableResources .
        sum([nonRenewableResourceUsage((job, mode(job), resource)) | job : jobs])
            <= nonRenewableLimits(resource)
such that
    forAll resource : renewableResources . forAll time : timesRange .
        sum([renewableResourceUsage((job, mode(job), resource))
                | job : jobs, jobActive((job, time))])
            <= renewableLimits(resource)
such that start(startDummy) = 1
minimising start(endDummy)