Collaborative Management of Benchmark Instances and their Attributes
Markus Iser, Luca Springer, Carsten Sinz
Karlsruhe Institute of Technology (KIT), Germany, {markus.iser,carsten.sinz}@kit.edu
Abstract. Experimental evaluation is an integral part of the design process of algorithms. Publicly available benchmark instances are widely used to evaluate methods in SAT solving. For the interpretation of results and the design of algorithm portfolios, their attributes are crucial. Capturing the interrelation of benchmark instances and their attributes is considerably simplified by our specification of a benchmark instance identifier. Our tool thus increases the availability of both by providing means to manage and retrieve benchmark instances by their attributes and vice versa. In this way, it facilitates the design and analysis of SAT experiments and the exchange of results.
1 Introduction

SAT benchmark instances are used to compare and evaluate the performance of state-of-the-art SAT solvers, e.g., in international competitive events [2]. Most experiments in research on SAT solving are based on benchmark instances submitted to and compiled for the annual SAT competitions [10,11].

Research in automated algorithm selection and configuration shows that the performance of the algorithmic method under test can heavily depend on specific attributes of the benchmark instances in use [15,18,5]. Some strategies in SAT solving work well on specific types of problems but not on others.

In our project "Global Benchmark Database" (GBD), we collect attributes of SAT instances and develop tools to organize, distribute and query that data. The core contribution of GBD is the specification of a benchmark instance identifier, which we use to associate benchmark attributes such as solver runtimes or problem families. The initial ideas have been presented and discussed at the Pragmatics of SAT Workshop 2018 [13].

GBD fills a gap in practical SAT research for several reasons. Benchmark instance feature data, which is crucial for a deep analysis of experimental results, is not always available, and even when it is, it is not easily retrievable. Furthermore, the association of instance feature data with actual instances based on filenames is unreliable and sometimes a matter of guessing. Compilations of SAT instances with specific attributes are hard to obtain, and many existing compilations contain duplicate instances.

2 Benchmark Instance Identification
Maintenance and distribution of benchmark instances and their attributes face the problem of instance identification. Instances can be huge, and their filenames are not reliable for identification, such that compilations of benchmark instances often contain duplicates. We solve these problems by specifying a hash-based instance identifier for files in DIMACS CNF [9]. Due to its ubiquity, we use the md5sum hash function. For robustness of this identifier, we specify a sequence of normalization steps, including the removal of comments and the normalization of white-space characters. Function GBD Hash specifies the complete procedure.

The exact specification of the normalization steps used in GBD Hash has been discussed at the Pragmatics of SAT Workshop 2018. A strong argument against capturing further kinds of isomorphisms with GBD Hash, such as clause order or variable names, is that solver runtimes associated with two given isomorphic instances can diverge tremendously, e.g., due to branching order.

Associating benchmark instance attributes with GBD Hash has the advantage that it becomes easy to exchange meta-information about benchmark instances. Attributes such as instance family, author or result (SAT or UNSAT) can easily be made available and used to compile instance sets with specific properties. Furthermore, it is much easier to persist and aggregate that data for future analysis. In order to capture equivalence classes, dedicated identifiers can be associated with GBD Hash.

Note that we recorded no hash collisions while using GBD Hash. However, compilations of benchmark instances might contain duplicates. We analyzed the nominal and actual numbers of distinct benchmark instances used in previous competitive events associated with the SAT Conference from 2006 to 2019. In some past competitions the organizers corrected the initial nominal number of benchmark instances and evaluated the results with respect to the actual number of distinct instances. However, in many cases these deviations have not been noticed. Table 1 lists these numbers for all tracks where they diverge without having been noticed by the organizers. The most extreme case concerns the instances used in the Agile Tracks of SAT Competitions 2016 and 2017 [6,7]: only a fraction of these instances have no duplicate in the dataset, while for others multiple duplicate files exist.
Function GBD Hash(Benchmark Instance)
  Input:  Benchmark Instance (DIMACS)
  Output: GBD Hash
  1. remove comments and header
  2. replace sequences of white-spaces and line-breaks by a single blank
  3. append trailing zero to last clause if missing
  4. return md5sum of the remaining content

Table 1: Divergent nominal and actual numbers of distinct benchmark instances

  Competitive Event                     Number of Instances
  Competition             Track         Nominal   Actual
  SAT Competition 2011    MUS           300       299
  SAT Challenge 2012      Application   600       596
  SAT Challenge 2012      Portfolio     600       599
  SAT Competition 2014    Application   300       299
  SAT Race 2015           Main          300       291
  SAT Race 2015           Parallel      100       96
  SAT Competition 2016    Agile         5000      1580
  SAT Competition 2016    Application   300       299
  SAT Competition 2017    Agile         5000      2379
  SAT Competition 2017    Random        300       294
  SAT Race 2019           Main          400       399
3 GBD Tools

GBD Tools include the GBD command-line tool gbd (Section 3.1) and the GBD web service gbd-server (Section 3.2). Both applications are available in the Python Package Index (PyPI) [1] and can be installed via pip3 install gbd-tools. For contributions, we maintain a public repository on GitHub [14].

In order to use gbd or gbd-server, a path to a database in the form of an SQLite file has to be specified, either via the appropriate command-line parameter or via the environment variable GBD_DB. We maintain a public database providing several benchmark attributes for all instances available at SAT competition websites [2], including the benchmark instances of SAT Competitions 2006 to 2019. Our database can be downloaded from http://gbd.iti.kit.edu/getdatabase.

In order to complete the setup of GBD Tools, the paths to the locally available benchmark instances should be registered in the database with the command gbd init ⟨path⟩, which recursively scans the directory under ⟨path⟩ and saves the association of local benchmark instance paths and their GBD Hash. We strongly encourage users to enable parallel initialization with --jobs=⟨cores⟩. Further gbd commands, e.g., import and bootstrap, assist in bootstrapping and managing instance attributes. The command gbd get ⟨query⟩ -r ⟨attr⟩ is used to query for instances and their attributes. Table 2 summarizes the commands of the GBD command-line interface. Details about command usage are given via gbd ⟨command⟩ --help.

Figure 1 depicts the query language of GBD Tools in Extended Backus-Naur Form (EBNF) [3]. GBD Tools thus facilitate querying for benchmark instances with specific attributes by automatically translating the simplified queries to SQL commands.

Table 2: Commands of GBD Command Line Interface

  Initialization and Bootstrapping
    gbd init        Initialize Database with Local Instance Paths
    gbd bootstrap   Bootstrap Database with Hard-coded Set of Attributes
    gbd import      Import Attributes from Given CSV File
  Attribute Management
    gbd group       Create new Attribute in Database
    gbd set         Set Attribute Value for Given Instance
  Attribute Retrieval
    gbd get         Query for Instances and/or Attributes
    gbd hash        Calculate GBD Hash for Given File
3.1 The Command-Line Tool gbd

The GBD command-line tool is used for the management and retrieval of instance attributes. Its seamless integration facilitates experimentation and the analysis of results. Example 1 shows an exemplary query for the benchmark instances used in SAT Race 2019; the query result is resolved to display their local paths.

Example 1. gbd get "competition_track = main_2019" -r local

In an experiment, the paths obtained in this way can, e.g., be used as input to a SAT solver. Runtimes or other newly calculated instance attributes can then be associated with the corresponding GBD Hash and stored in the database as well. The resulting dataset can be further analyzed with respect to other available instance attributes.

Example 2 shows an exemplary query for benchmark instances with more than 5,000,000 variables; the query result is resolved to display their numbers of variables and clauses.

Example 2. gbd get "variables > 5000000" -r variables clauses
Example 3 shows an exemplary query for benchmark instances where more than 90% of the clauses are Horn clauses; the query result is resolved to display their instance family.

Example 3. gbd get "(clauses_horn / clauses) > .9" -r family

  ⟨start⟩      = ⟨query⟩ | ε
  ⟨query⟩      = '(' , ⟨query⟩ , ')' | ⟨query⟩ , ( 'and' | 'or' ) , ⟨query⟩ | ⟨constraint⟩
  ⟨constraint⟩ = ⟨name⟩ , ( '=' | '!=' ) , ⟨value⟩
               | ⟨name⟩ , 'like' , [ '%' ] , ⟨value⟩ , [ '%' ]
               | '(' , ⟨term⟩ , ( '=' | '!=' | '<' | '>' ) , ⟨term⟩ , ')'
  ⟨term⟩       = ⟨name⟩ | ⟨number⟩ | '(' , ⟨term⟩ , ( '+' | '-' | '*' | '/' ) , ⟨term⟩ , ')'
  ⟨name⟩       = ⟨letter⟩ , { ⟨letter⟩ | ⟨digit⟩ | '_' }
  ⟨number⟩     = [ '-' ] , ⟨digit⟩ , { ⟨digit⟩ } , [ '.' , ⟨digit⟩ , { ⟨digit⟩ } ]
  ⟨value⟩      = { ⟨letter⟩ | ⟨digit⟩ | '_' | '.' | '-' | '/' }
  ⟨letter⟩     = 'a' | 'b' | ... | 'z' | 'A' | 'B' | ... | 'Z'
  ⟨digit⟩      = '0' | '1' | ... | '9'

Fig. 1: Query Language of GBD Command Line Interface

Fig. 2: Screenshot of GBD Website https://gbd.iti.kit.edu
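The translation of such queries to SQL can be illustrated with a minimal sketch. It handles only flat ⟨name⟩ ⟨op⟩ ⟨value⟩ constraints combined with 'and'/'or' (no parenthesized arithmetic terms), and the table and column names are assumptions, not GBD's actual schema or translation.

```python
import re

# One flat constraint of the form <name> <op> <value>
CONSTRAINT = re.compile(r"(\w+)\s*(!=|=|<|>)\s*([\w./-]+)")

def query_to_sql(query, table="benchmarks"):
    """Rewrite a simplified GBD-style query into a SQL SELECT statement
    (illustrative only; covers a fragment of the grammar in Fig. 1)."""
    def quote(match):
        name, op, value = match.groups()
        if not re.fullmatch(r"-?\d+(\.\d+)?", value):
            value = f"'{value}'"  # quote non-numeric values for SQL
        return f"{name} {op} {value}"
    return f"SELECT hash FROM {table} WHERE {CONSTRAINT.sub(quote, query)}"
```

For instance, the query of Example 1 would become SELECT hash FROM benchmarks WHERE competition_track = 'main_2019' under these assumptions.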
3.2 The Web Service gbd-server

GBD Tools also include the tool gbd-server, which provides a web interface (Figure 2) to the GBD database. The web interface allows querying the database from the web browser in order to download benchmark instances or their attributes from our archive. Our instance of GBD Server is hosted at http://gbd.iti.kit.edu and provides all benchmark instances of competitive events affiliated with the SAT Conference, dating back to SAT Competition 2006.

Furthermore, gbd-server exposes a couple of micro-services (Table 3) which can be used to access specific attributes or to download specific benchmark instances via the instance identifier. Example 4 shows an exemplary access to our database via a GBD micro-service: the URI returns the known SAT/UNSAT result of the instance with the given GBD Hash.

Example 4. wget http://gbd.iti.kit.edu/attribute/result/⟨gbd-hash⟩
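The micro-service URIs listed in Table 3 can be assembled programmatically before being fetched with any HTTP client. The base URL is taken from the paper; the helper names below are our own.

```python
BASE_URL = "http://gbd.iti.kit.edu"

def attribute_uri(name, gbd_hash):
    """URI returning a single attribute value for an instance."""
    return f"{BASE_URL}/attribute/{name}/{gbd_hash}"

def info_uri(gbd_hash):
    """URI returning all attribute values for an instance."""
    return f"{BASE_URL}/info/{gbd_hash}"

def file_uri(gbd_hash):
    """URI for downloading the instance file itself."""
    return f"{BASE_URL}/file/{gbd_hash}"
```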
Table 3: URI Schemes of GBD Micro-Services

  Attribute Value        http://gbd.iti.kit.edu/attribute/⟨name⟩/⟨gbd-hash⟩
  All Attribute Values   http://gbd.iti.kit.edu/info/⟨gbd-hash⟩
  Download Instance      http://gbd.iti.kit.edu/file/⟨gbd-hash⟩

4 Related Work

An early approach to create a structured public collection of benchmark instances for experiments in SAT solving was SATLIB [12]. Data about the instance compilations used in the annual SAT Competitions can be found in the respective proceedings [11]. Calculable instance attributes are used for instance classification in order to solve the per-instance algorithm selection problem [18,15,4,8,5] and have also been employed to reduce redundancy in experimentation [16]. Big data analysis has recently also been used to predict the usefulness of learned clauses [17].
5 Conclusion and Future Work

GBD Hash enables the compilation of sets of unique benchmark instances based on their attributes. Furthermore, improved accessibility of benchmark instances and their attributes facilitates the differentiated analysis of new algorithms and heuristics, leading to a better understanding of the method under analysis.

Recent work on GBD Tools includes the development of specialized functionality for compiling competition sets of benchmark instances and for analyzing runtime experiments of competitive events. We plan to assist in the transfer to other domains by providing an architecture that facilitates the integration of dedicated identifiers of benchmark instances used in other domains.

We continuously integrate further instance attributes into our database. This could include the runtimes of historic SAT solvers in order to study the development and progress in SAT solving. Future work includes the integration of tools for instance classification and data visualization. As the focus of GBD Tools is on collaboration, future versions will include features to simplify the aggregation of instance attributes from multiple data sources.
References
1. Global Benchmark Database Tool, https://pypi.org/project/global-benchmark-database-tool/
2. SAT Competition Website,
3. ISO/IEC 14977:1996 Information technology - Syntactic metalanguage - Extended BNF (1996)
4. Alfonso, E.M., Manthey, N.: New CNF features and formula classification. In: Fifth Pragmatics of SAT Workshop, a workshop of the SAT 2014 conference, July 13, 2014, Vienna, Austria (2014)
5. Ansótegui, C., Bonet, M.L., Giráldez-Cru, J., Levy, J.: Structure features for SAT instances classification. J. Applied Logic, 27–39 (2017)
6. Balyo, T., Heule, M.J.: Proceedings of SAT Competition 2016: Solver and Benchmark Descriptions (2016)
7. Balyo, T., Heule, M.J., Järvisalo, M.: Proceedings of SAT Competition 2017: Solver and Benchmark Descriptions (2017)
8. Bischl, B., Kerschke, P., Kotthoff, L., Lindauer, M.T., Malitsky, Y., Fréchette, A., Hoos, H.H., Hutter, F., Leyton-Brown, K., Tierney, K., Vanschoren, J.: ASlib: A benchmark library for algorithm selection. Artif. Intell., 41–58 (2016)
9. DIMACS: Satisfiability suggested format (1993)
10. Heule, M.J.H., Järvisalo, M., Suda, M.: Proceedings of SAT Competition 2018: Solver and Benchmark Descriptions (2018)
11. Heule, M.J.H., Järvisalo, M., Suda, M.: Proceedings of SAT Competition 2019: Solver and Benchmark Descriptions (2019)
12. Hoos, H., Stützle, T.: SATLIB: An online resource for research on SAT. SAT 2000, pp. 283–292 (2000)
13. Iser, M., Sinz, C.: A problem meta-data library for research in SAT. In: Proceedings of Pragmatics of SAT 2018, Oxford, UK, July 7, 2018, pp. 144–152 (2018)
14. Iser, M., Springer, L.: GBD, https://github.com/Udopia/gbd
15. Kadioglu, S., Malitsky, Y., Sellmann, M., Tierney, K.: ISAC - instance-specific algorithm configuration. In: ECAI 2010 - 19th European Conference on Artificial Intelligence, Lisbon, Portugal, August 16-20, 2010, Proceedings, pp. 751–756 (2010)
16. Möhle, S., Manthey, N.: Better evaluations by analyzing benchmark structure. In: Seventh Pragmatics of SAT Workshop, a workshop of the SAT 2016 conference, July 4, 2016, Bordeaux, France (2016)
17. Soos, M., Kulkarni, R., Meel, K.S.: CrystalBall: Gazing in the black box of SAT solving. In: Janota, M., Lynce, I. (eds.) Theory and Applications of Satisfiability Testing - SAT 2019 - 22nd International Conference, SAT 2019, Lisbon, Portugal, July 9-12, 2019, Proceedings. Lecture Notes in Computer Science, vol. 11628, pp. 371–387 (2019)
18. Xu, L., Hutter, F., Hoos, H.H., Leyton-Brown, K.: SATzilla: Portfolio-based algorithm selection for SAT. CoRR abs/1111.2249 (2011)