Collaborative Management of Benchmark Instances and their Attributes
Markus Iser, Luca Springer, Carsten Sinz
Karlsruhe Institute of Technology (KIT), Germany, {markus.iser,carsten.sinz}@kit.edu
Abstract. Experimental evaluation is an integral part of the design process of algorithms. Publicly available benchmark instances are widely used to evaluate methods in SAT solving. For the interpretation of results and the design of algorithm portfolios, their attributes are crucial. Capturing the interrelation of benchmark instances and their attributes is considerably simplified by our specification of a benchmark instance identifier. Our tool thus increases the availability of both by providing means to manage and retrieve benchmark instances by their attributes and vice versa. In this way, it facilitates the design and analysis of SAT experiments and the exchange of results.
1 Introduction

SAT benchmark instances are used to compare and evaluate the performance of state-of-the-art SAT solvers, e.g., in international competitive events [2]. Most experiments in research on SAT solving are based on benchmark instances submitted to and compiled for the annual SAT competitions [10,11].

Research in automated algorithm selection and configuration shows that the performance of the algorithmic method under test can heavily depend on specific attributes of the benchmark instances in use [15,18,5]. Some strategies in SAT solving work well on specific types of problems but not on others.

In our project "Global Benchmark Database" (GBD), we collect attributes of SAT instances and develop tools to organize, distribute and query that data. The core contribution of GBD is the specification of a benchmark instance identifier, which we use to associate benchmark attributes such as solver runtimes or problem families. The initial ideas have been presented and discussed at the Pragmatics of SAT Workshop 2018 [13].

GBD fills a gap in practical SAT research for several reasons. Benchmark instance feature data, which is crucial for a deep analysis of experimental results, is not always available, and even when it is, it is not easily retrievable. Furthermore, the association of instance feature data with actual instances based on filenames is unreliable and sometimes a matter of guessing. Compilations of SAT instances with specific attributes are hard to obtain, and many existing compilations contain duplicate instances.

2 Benchmark Instance Identification
Maintenance and distribution of benchmark instances and their attributes face the problem of instance identification. Instances can be huge, and their filenames are not reliable for identification, such that compilations of benchmark instances often contain duplicates. We solve these problems by specifying a hash-based instance identifier for files in DIMACS CNF [9]. Due to its ubiquity, we use the md5sum hash function. For robustness of this identifier, we specify a sequence of normalization steps, including the removal of comments and the normalization of white-space characters. Function GBD Hash specifies the complete procedure.

The exact specification of the normalization steps used in GBD Hash has been discussed at the Pragmatics of SAT Workshop 2018. A strong argument against capturing further kinds of isomorphisms with GBD Hash, such as clause order or variable names, is that solver runtimes associated with two given isomorphic instances can diverge tremendously, e.g., due to branching order.

Associating benchmark instance attributes with GBD Hash has the advantage that it becomes easy to exchange meta-information about benchmark instances. Attributes such as instance family, author or result (SAT or UNSAT) can easily be made available and used to compile instance sets with specific properties. Furthermore, it is much easier to persist and aggregate that data for future analysis. In order to capture equivalence classes, dedicated identifiers can be associated with GBD Hash.

Note that we recorded no hash collisions while using GBD Hash. However, compilations of benchmark instances might contain duplicates. We analyzed the nominal and actual numbers of distinct benchmark instances used in previous competitive events associated with the SAT Conference from 2006 to 2019. In some past competitions the organizers corrected the initial nominal number of benchmark instances and evaluated the results with respect to the actual number of distinct instances. However, in many cases these deviations have not been noticed. Table 1 lists these numbers for all tracks where they diverge without having been noticed by the organizers. The most extreme case concerns the instances used in the Agile Tracks of SAT Competitions 2016 and 2017 [6,7]: only a fraction of these instances have no duplicate in the dataset, while for others multiple duplicate files exist.
Function GBD Hash(Benchmark Instance)
  Input:  Benchmark Instance (DIMACS)
  Output: GBD Hash
  1. remove comments and header
  2. replace sequences of white-spaces and line-breaks by a single blank
  3. append trailing zero to last clause if missing
  4. return md5sum of the remaining content

Table 1: Divergent nominal and actual numbers of distinct benchmark instances

  Competitive Event                     Number of Instances
  Competition             Track         Nominal   Actual
  SAT Competition 2011    MUS           300       299
  SAT Challenge 2012      Application   600       596
  SAT Challenge 2012      Portfolio     600       599
  SAT Competition 2014    Application   300       299
  SAT Race 2015           Main          300       291
  SAT Race 2015           Parallel      100       96
  SAT Competition 2016    Agile         5000      1580
  SAT Competition 2016    Application   300       299
  SAT Competition 2017    Agile         5000      2379
  SAT Competition 2017    Random        300       294
  SAT Race 2019           Main          400       399
3 GBD Tools

GBD Tools include the GBD command-line tool gbd (Section 3.1) and the GBD web service gbd-server (Section 3.2). Both applications are available in the Python Package Index (PyPI) [1] and can be installed via pip3 install gbd-tools. For contributions, we maintain a public repository on GitHub [14].

In order to use gbd or gbd-server, a path to a database in the form of an SQLite file has to be specified, either via the appropriate command-line parameter or via the environment variable GBD_DB. We maintain a public database providing several benchmark attributes for all instances available at SAT competition websites [2], including the benchmark instances of SAT Competitions 2006 to 2019. Our database can be downloaded from http://gbd.iti.kit.edu/getdatabase.

In order to complete the setup of GBD Tools, the paths to the locally available benchmark instances should be registered in the database with the command gbd init ⟨path⟩, which recursively scans the directory under ⟨path⟩ and saves the association of local benchmark instance paths and their GBD Hash. We strongly encourage users to enable parallel initialization with --jobs=⟨cores⟩. Further gbd commands, e.g., import and bootstrap, assist in bootstrapping and managing instance attributes. The command gbd get ⟨query⟩ -r ⟨attr⟩ is used to query for instances and their attributes. Table 2 summarizes the commands of the GBD command-line interface. Details about command usage are given via gbd ⟨command⟩ --help.

Figure 1 depicts the query language of GBD Tools in Extended Backus-Naur Form (EBNF) [3]. GBD Tools thus facilitate querying for benchmark instances with specific attributes by automatically translating the simplified queries to SQL commands.

Table 2: Commands of GBD Command Line Interface

  Initialization and Bootstrapping
    gbd init        Initialize Database with Local Instance Paths
    gbd bootstrap   Bootstrap Database with Hard-coded Set of Attributes
    gbd import      Import Attributes from Given CSV File
  Attribute Management
    gbd group       Create new Attribute in Database
    gbd set         Set Attribute Value for Given Instance
  Attribute Retrieval
    gbd get         Query for Instances and/or Attributes
    gbd hash        Calculate GBD Hash for Given File
3.1 The Command-Line Tool gbd

The GBD command-line tool is used for the management and retrieval of instance attributes. Its seamless integration facilitates experimentation and the analysis of results. Example 1 shows an exemplary query for the benchmark instances used in SAT Race 2019; the query result is resolved to display their local paths.

Example 1. gbd get "competition_track = main_2019" -r local

In an experiment, the paths obtained in this way can, e.g., be used as input to a SAT solver. Runtimes or other newly calculated instance attributes can then be associated with the corresponding GBD Hash and stored in the database as well. The resulting dataset can be further analyzed with respect to other available instance attributes.

Example 2 shows an exemplary query for benchmark instances with more than 5,000,000 variables; the query result is resolved to display their numbers of variables and clauses.

Example 2. gbd get "variables > 5000000" -r variables clauses
Example 3 shows an exemplary query for benchmark instances where more than 90% of the clauses are Horn clauses; the query result is resolved to display their instance family.

Example 3. gbd get "(clauses_horn / clauses) > .9" -r family

  ⟨start⟩      = ⟨query⟩ | ε
  ⟨query⟩      = '(' , ⟨query⟩ , ')' | ⟨query⟩ , ( 'and' | 'or' ) , ⟨query⟩ | ⟨constraint⟩
  ⟨constraint⟩ = ⟨name⟩ , ( '=' | '!=' ) , ⟨value⟩
               | ⟨name⟩ , 'like' , [ '%' ] , ⟨value⟩ , [ '%' ]
               | '(' , ⟨term⟩ , ( '=' | '!=' | '<' | '>' ) , ⟨term⟩ , ')'
  ⟨term⟩       = ⟨name⟩ | ⟨number⟩ | '(' , ⟨term⟩ , ( '+' | '-' | '*' | '/' ) , ⟨term⟩ , ')'
  ⟨name⟩       = ⟨letter⟩ , { ⟨letter⟩ | ⟨digit⟩ | '_' }
  ⟨number⟩     = [ '-' ] , ⟨digit⟩ , { ⟨digit⟩ } , [ '.' , ⟨digit⟩ , { ⟨digit⟩ } ]
  ⟨value⟩      = { ⟨letter⟩ | ⟨digit⟩ | '_' | '.' | '-' | '/' }
  ⟨letter⟩     = 'a' | 'b' | ... | 'z' | 'A' | 'B' | ... | 'Z'
  ⟨digit⟩      = '0' | '1' | ... | '9'

Fig. 1: Query Language of GBD Command Line Interface

Fig. 2: Screenshot of GBD Website https://gbd.iti.kit.edu
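The translation of such queries to SQL can be illustrated with a minimal sketch. It handles only flat ⟨name⟩ ⟨op⟩ ⟨value⟩ constraints combined with 'and'/'or' (no parenthesized arithmetic terms), and the table and column names are assumptions, not GBD's actual schema or translation.

```python
import re

# One flat constraint of the form <name> <op> <value>
CONSTRAINT = re.compile(r"(\w+)\s*(!=|=|<|>)\s*([\w./-]+)")

def query_to_sql(query, table="benchmarks"):
    """Rewrite a simplified GBD-style query into a SQL SELECT statement
    (illustrative only; covers a fragment of the grammar in Fig. 1)."""
    def quote(match):
        name, op, value = match.groups()
        if not re.fullmatch(r"-?\d+(\.\d+)?", value):
            value = f"'{value}'"  # quote non-numeric values for SQL
        return f"{name} {op} {value}"
    return f"SELECT hash FROM {table} WHERE {CONSTRAINT.sub(quote, query)}"
```

For instance, the query of Example 1 would become SELECT hash FROM benchmarks WHERE competition_track = 'main_2019' under these assumptions.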
3.2 The Web Service gbd-server

GBD Tools also include the tool gbd-server, which provides a web interface (Figure 2) to the GBD database. The web interface allows querying the database from the web browser in order to download benchmark instances or their attributes from our archive. Our instance of GBD Server is hosted at http://gbd.iti.kit.edu and provides all benchmark instances of competitive events affiliated with the SAT Conference, dating back to SAT Competition 2006.

Furthermore, gbd-server exposes a couple of micro-services (Table 3) which can be used to access specific attributes or to download specific benchmark instances via the instance identifier. Example 4 shows an exemplary access to our database via a GBD micro-service: the URI returns the known SAT/UNSAT result of the instance with the given GBD Hash.

Example 4. wget http://gbd.iti.kit.edu/attribute/result/⟨gbd-hash⟩
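The micro-service URIs listed in Table 3 can be assembled programmatically before being fetched with any HTTP client. The base URL is taken from the paper; the helper names below are our own.

```python
BASE_URL = "http://gbd.iti.kit.edu"

def attribute_uri(name, gbd_hash):
    """URI returning a single attribute value for an instance."""
    return f"{BASE_URL}/attribute/{name}/{gbd_hash}"

def info_uri(gbd_hash):
    """URI returning all attribute values for an instance."""
    return f"{BASE_URL}/info/{gbd_hash}"

def file_uri(gbd_hash):
    """URI for downloading the instance file itself."""
    return f"{BASE_URL}/file/{gbd_hash}"
```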
Table 3: URI Schemes of GBD Micro-Services

  Attribute Value        http://gbd.iti.kit.edu/attribute/⟨name⟩/⟨gbd-hash⟩
  All Attribute Values   http://gbd.iti.kit.edu/info/⟨gbd-hash⟩
  Download Instance      http://gbd.iti.kit.edu/file/⟨gbd-hash⟩

4 Related Work

An early approach to create a structured public collection of benchmark instances for experiments in SAT solving was SATLIB [12]. Data about the instance compilations used in the annual SAT Competitions can be found in the respective proceedings [11]. Calculable instance attributes are used for instance classification in order to solve the per-instance algorithm selection problem [18,15,4,8,5] and have also been employed to reduce redundancy in experimentation [16]. Big data analysis has recently also been used to predict the usefulness of learned clauses [17].
5 Conclusion and Future Work

GBD Hash enables the compilation of sets of unique benchmark instances based on their attributes. Furthermore, improved accessibility of benchmark instances and their attributes facilitates the differentiated analysis of new algorithms and heuristics, leading to a better understanding of the method under analysis.

Recent work on GBD Tools includes the development of specialized functionality for compiling competition sets of benchmark instances and for analyzing runtime experiments of competitive events. We plan to assist in the transfer to other domains by providing an architecture that facilitates the integration of dedicated identifiers of benchmark instances used in other domains.

We continuously integrate further instance attributes into our database. This could include the runtimes of historic SAT solvers in order to study the development and progress in SAT solving. Future work includes the integration of tools for instance classification and data visualization. As the focus of GBD Tools is on collaboration, future versions will include features to simplify the aggregation of instance attributes from multiple data sources.
References
1. Global Benchmark Database Tool, https://pypi.org/project/global-benchmark-database-tool/
2. SAT Competition Website,
3. ISO/IEC 14977:1996 Information technology - Syntactic metalanguage - Extended BNF (1996)
4. Alfonso, E.M., Manthey, N.: New CNF features and formula classification. In: Fifth Pragmatics of SAT Workshop, a workshop of the SAT 2014 conference, July 13, 2014, Vienna, Austria (2014)
5. Ansótegui, C., Bonet, M.L., Giráldez-Cru, J., Levy, J.: Structure features for SAT instances classification. J. Applied Logic, 27–39 (2017)
6. Balyo, T., Heule, M.J.: Proceedings of SAT Competition 2016: Solver and Benchmark Descriptions (2016)
7. Balyo, T., Heule, M.J., Järvisalo, M.: Proceedings of SAT Competition 2017: Solver and Benchmark Descriptions (2017)
8. Bischl, B., Kerschke, P., Kotthoff, L., Lindauer, M.T., Malitsky, Y., Fréchette, A., Hoos, H.H., Hutter, F., Leyton-Brown, K., Tierney, K., Vanschoren, J.: ASlib: A benchmark library for algorithm selection. Artif. Intell., 41–58 (2016)
9. DIMACS: Satisfiability suggested format (1993)
10. Heule, M.J.H., Järvisalo, M., Suda, M.: Proceedings of SAT Competition 2018: Solver and Benchmark Descriptions (2018)
11. Heule, M.J.H., Järvisalo, M., Suda, M.: Proceedings of SAT Competition 2019: Solver and Benchmark Descriptions (2019)
12. Hoos, H., Stützle, T.: SATLIB: An online resource for research on SAT. SAT 2000, pp. 283–292 (2000)
13. Iser, M., Sinz, C.: A problem meta-data library for research in SAT. In: Proceedings of Pragmatics of SAT 2018, Oxford, UK, July 7, 2018, pp. 144–152 (2018)
14. Iser, M., Springer, L.: GBD, https://github.com/Udopia/gbd
15. Kadioglu, S., Malitsky, Y., Sellmann, M., Tierney, K.: ISAC - instance-specific algorithm configuration. In: ECAI 2010 - 19th European Conference on Artificial Intelligence, Lisbon, Portugal, August 16-20, 2010, Proceedings, pp. 751–756 (2010)
16. Möhle, S., Manthey, N.: Better evaluations by analyzing benchmark structure. In: Seventh Pragmatics of SAT Workshop, a workshop of the SAT 2016 conference, July 4, 2016, Bordeaux, France (2016)
17. Soos, M., Kulkarni, R., Meel, K.S.: CrystalBall: Gazing in the black box of SAT solving. In: Janota, M., Lynce, I. (eds.) Theory and Applications of Satisfiability Testing - SAT 2019 - 22nd International Conference, SAT 2019, Lisbon, Portugal, July 9-12, 2019, Proceedings. Lecture Notes in Computer Science, vol. 11628, pp. 371–387 (2019)
18. Xu, L., Hutter, F., Hoos, H.H., Leyton-Brown, K.: SATzilla: Portfolio-based algorithm selection for SAT. CoRR abs/1111.2249 (2011)