Interactive Visualization of Saturation Attempts in Vampire
Bernhard Gleiss, Laura Kovács, and Lena Schnedlitz
TU Wien, Austria; Chalmers University of Technology, Sweden
Abstract.
Many applications of formal methods require automated reasoning about system properties, such as system safety and security. To improve the performance of automated reasoning engines, such as SAT/SMT solvers and first-order theorem provers, it is necessary to understand both the successful and the failing attempts of these engines towards producing formal certificates, such as logical proofs and/or models. Such an analysis is challenging due to the large number of logical formulas generated during proof/model search. In this paper we focus on saturation-based first-order theorem proving and introduce the SatVis tool for interactively visualizing saturation-based proof attempts in first-order theorem proving. We build SatVis on top of the world-leading theorem prover Vampire, by interactively visualizing the saturation attempts of Vampire in SatVis. Our work combines the automatic layout and visualization of the derivation graph induced by the saturation attempt with interactive transformations and search functionality. As a result, we are able to analyze and debug (failed) proof attempts of Vampire. Thanks to its interactive visualization, we believe SatVis helps both experts and non-experts in theorem proving to understand first-order proofs and to analyze/refine failing proof attempts of first-order provers.

Introduction

Many applications of formal methods, such as program analysis and verification, require automated reasoning about system properties, such as program safety, security and reliability. Automated reasoners, such as SAT/SMT solvers [1,5] and first-order theorem provers [9,13], have therefore become a key backbone of rigorous system engineering. For example, proving properties over the computer memory relies on first-order reasoning with both quantifiers and integer arithmetic.

Saturation-based theorem proving is the leading approach for automating reasoning in full first-order logic. In a nutshell, this approach negates a given goal and saturates its given set of input formulas (including the negated goal) by deriving logical consequences of the input using a logical inference system, such as binary resolution or superposition. Whenever a contradiction (false) is derived, the saturation process terminates, reporting validity of the input goal. State-of-the-art theorem provers, such as Vampire [9] and E [13], implement saturation-based proof search using the (ordered) superposition calculus [11]. These provers rely on powerful indexing algorithms, selection functions and term orderings for making saturation-based theorem proving efficient and scalable to large sets of first-order formulas, as evidenced in the yearly CASC system competition of first-order provers [14].

Over the past years, saturation-based theorem proving has been extended to first-order logic with theories, such as arithmetic, the theory of arrays and algebraic datatypes [8]. Further, first-class boolean sorts as well as if-then-else and let-in constructs have been introduced as extensions to the input syntax of first-order theorem provers [7]. Thanks to these recent developments, first-order theorem provers have become better suited for applications of formal methods, being for example a competitive alternative to SMT solvers [1,5] in software verification and program analysis. Recent editions of the SMT-COMP (https://smt-comp.github.io/) and CASC system competitions show, for example, that Vampire successfully competes against the leading SMT solvers Z3 [5] and CVC4 [1], and vice versa.

By leveraging the best practices in first-order theorem proving in combination with SMT solving, in our recent work [3] we showed that correctness of a software program can be reduced to a validity problem in first-order logic. We use Vampire to prove the resulting encodings, outperforming SMT solvers. Our initial results demonstrate that first-order theorem proving is well suited for applications of (relational) verification, such as safety and non-interference. Yet, our results also show that the performance of the prover crucially depends on the logical representation of its input problem and on the reasoning strategies deployed during proof search. As such, users and developers of first-order provers, and of automated reasoners in general, typically face the burden of analyzing (failed) proof attempts produced by the prover, with the ultimate goal of refining the input and/or the proof strategies so that the prover succeeds in proving its input. Understanding (some of) the reasons why the prover failed is, however, very hard and requires a considerable amount of work by highly qualified experts in theorem proving, thus hindering the use of theorem provers in many application domains.

In this paper we address this challenge and introduce the SatVis tool to ease the task of analyzing failed proof attempts in saturation-based reasoning. We designed SatVis to support interactive visualization of the saturation algorithm used in Vampire, with the goal of easing the manual analysis of Vampire proofs as well as of failed proof attempts in Vampire. Inputs to SatVis are proofs (or proof attempts) produced by Vampire. Our tool consists of (i) an explicit visualization of the DAG structure of the saturation proof (attempt) of Vampire and (ii) interactive transformations of the DAG for pruning and reformatting the proof (attempt). In its current setting, SatVis can be used only in the context of Vampire. Yet, by parsing/translating proofs (or proof attempts) of other provers into the Vampire proof format, SatVis can be used in conjunction with other provers as well.

When fed Vampire proofs, SatVis supports both users and developers of Vampire in understanding and refactoring Vampire proofs, and in manually proof-checking the soundness of Vampire proofs. When used on failed proof attempts of Vampire, SatVis supports users and developers of Vampire in analyzing how Vampire explored its search space during proof search, that is, in understanding which clauses were derived and why certain clauses were not derived at various steps during saturation. By doing so, the SatVis proof visualization framework gives valuable insights on how to revise the input problem encoding of Vampire and/or implement domain-specific optimizations in Vampire. We therefore believe that SatVis improves the state of the art in the use and applications of theorem proving at least in the following scenarios: (i) helping Vampire developers to debug and further improve Vampire; (ii) helping Vampire users to tune Vampire to their applications, by not treating Vampire as a black box but by understanding and using its appropriate proof search options; and (iii) helping users inexperienced in saturation-based theorem proving to learn to use Vampire and, more generally, first-order proving.
Contributions.
The contribution of this paper is the design of the SatVis tool for analyzing proofs as well as proof attempts of the Vampire theorem prover. SatVis is available at https://github.com/gleiss/saturation-visualization. We overview the proof search steps in Vampire that are relevant to SatVis (Section 2), discuss the challenges we faced in analyzing proof attempts of Vampire (Section 3), and describe implementation-level details of SatVis.

Related work.
While standardizing the input format of automated reasoners is an active research topic, see e.g. the SMT-LIB [2] and TPTP [14] standards, coming up with a standard for representing and analyzing proofs and proof attempts of automated reasoners has so far received very little attention. The TSTP library [14] provides input/output standards for automated theorem proving systems. Yet, unlike SatVis, TSTP does not analyze proof attempts but only supports the examination of first-order proofs. We note that Vampire proofs (and proof attempts) contain first-order formulas with theories, which are not fully supported by TSTP.

Using a graph-layout framework, for instance Graphviz [6], it is relatively straightforward to visualize the DAG derivation graph induced by a saturation attempt of a first-order prover. For example, the theorem prover E [13] is able to directly output its saturation attempt as an input file for Graphviz. The visualizations generated in this way are, however, useful only for analyzing small derivations with at most 100 inferences, and cannot practically be used to analyze and manipulate larger proof attempts. We note that it is quite common to have first-order proofs and proof attempts with more than 1,000 or even 10,000 inferences, especially in applications of theorem proving in software verification, see e.g. [3]. In our SatVis framework, the interactive features of our tool allow one to analyze such larger proof attempts.

The framework [12] eases the manual analysis of proof attempts in Z3 [5] by visualizing quantifier instantiations, case splits and conflicts. While both [12] and SatVis are built for analyzing (failed) proof attempts, they target different architectures (SMT solving and superposition-based proving, respectively) and therefore differ in their input format and in the information they visualize. The frameworks [4,10] visualize proofs derived in a natural deduction/sequent calculus.
Unlike these approaches, SatVis targets clausal derivations generated by saturation-based provers using the superposition inference system. As a consequence, our tool can focus only on the clauses that have been actively used during proof search, instead of having to visualize the entire set of clauses, including clauses left unused during proof search. We finally note that proof checkers, such as DRAT-trim [15], support the soundness analysis of each inference step of a proof, but neither focus on failing proof attempts nor visualize proofs.

Proof Search in Vampire
We first present the key ingredients of proof search in Vampire that are relevant to analyzing saturation attempts.
Derivations and proofs. An inference $I$ is a tuple $(F_1, \ldots, F_n, F)$, where $F_1, \ldots, F_n, F$ are formulas. The formulas $F_1, \ldots, F_n$ are called the premises of $I$ and $F$ is called the conclusion of $I$. In our setting, an inference system is a set of inferences, and we rely on the superposition inference system [11]. An axiom of an inference system is any inference with $n = 0$. Given an inference system $\mathcal{I}$, a derivation from axioms $A$ is a directed acyclic graph (DAG), where (i) each node is a formula and (ii) each node either is an axiom in $A$ and does not have any incoming edges, or is a formula $F \notin A$ such that the incoming edges of $F$ are exactly $(F_1, F), \ldots, (F_n, F)$ and there exists an inference $(F_1, \ldots, F_n, F) \in \mathcal{I}$. A refutation of axioms $A$ is a derivation which contains the empty clause $\bot$ as a node. A derivation of a formula $F$ is called a proof of $F$ if it is finite and all leaves in the derivation are axioms.

Proof search in Vampire.
Given an input set of axioms $A$ and a conjecture $G$, Vampire searches for a refutation of $A \cup \{\neg G\}$ by using a preprocessing phase followed by a saturation phase. In the preprocessing phase, Vampire generates a derivation from $A \cup \{\neg G\}$ such that each sink node of the DAG, that is, each node without outgoing edges, is a clause. Then, Vampire enters the saturation phase, where it extends the existing derivation by applying its saturation algorithm, using the sink nodes from the preprocessing phase as the input clauses to saturation. The saturation phase of Vampire terminates in one of the following three cases: (i) the empty clause $\bot$ is derived (hence, a proof of $G$ was found); (ii) no more clauses are derived and the empty clause $\bot$ was not derived (hence, the input is saturated and $A \cup \{\neg G\}$ is satisfiable, so $G$ is not valid); or (iii) an a priori given time/memory limit on the Vampire run is reached (hence, it is unknown whether $G$ is valid).

At a high level, the saturation phase of Vampire works as follows. The saturation algorithm divides the set of clauses in the search space of Vampire into a set of Active and a set of Passive clauses, and iteratively refines these sets using its superposition inference system: the Active set keeps the clauses between which all possible inferences have been performed, whereas the Passive set stores the clauses which have not yet been added to Active and are candidates for being used in future steps of the saturation algorithm. During saturation, Vampire distinguishes between so-called simplifying and generating inferences. Intuitively, simplifying inferences delete clauses from the search space and hence are crucial for keeping the search space small. A generating inference is a non-simplifying one, and hence adds new clauses to the search space. Concretely, at every iteration of the saturation algorithm, a new clause from Passive is selected and added to Active, after which all generating inferences between the selected clause and the clauses in Active are applied. The conclusions of these inferences are new clauses, which are added to Passive to be selected in future iterations of saturation. Additionally, simplifying inferences and deletion of clauses are allowed at any step of the saturation algorithm.

...
[SA] passive: 160. v = a(l11(s(nl8)),$sum(i(main_end),1)) [superposition 70,118]
[SA] active: 163. i(main_end) != -1 [term algebras distinctness 162]
[SA] active: 92. ~'Sub'(X5,p(X4)) | 'Sub'(X5,X4) | zero = X4 [superposition 66,44]
[SA] new: 164. 'Sub'(p(p(X0)),X0) | zero = X0 | zero = p(X0) [resolution 92,94]
[SA] passive: 164. 'Sub'(p(p(X0)),X0) | zero = X0 | zero = p(X0) [resolution 92,94]
[SA] active: 132. v = a(l11(s(s(zero))),2) [superposition 70,124]
[SA] new: 165. v = a(l8(s(s(zero))),2) | i(l8(s(s(zero)))) = 2 [superposition 132,72]
[SA] new: 166. v = a(l8(s(s(zero))),2) | i(l8(s(s(zero)))) = 2 [superposition 72,132]
[SA] active: 90. s(X1) != X0 | p(X0) = X1 | zero = X0 [superposition 22,44]
[SA] new: 167. X0 != X1 | p(X0) = p(X1) | zero = X1 | zero = X0 [superposition 90,44]
[SA] new: 168. p(s(X0)) = X0 | zero = s(X0) [equality resolution 90]
[SA] new: 169. p(s(X0)) = X0 [term algebras distinctness 168]
...
Fig. 1. Screenshot of a saturation attempt of Vampire.
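The given-clause loop described above can be illustrated by a small, self-contained sketch. It is not Vampire's implementation; as stated assumptions, clauses are propositional (frozensets of integer literals), binary resolution stands in for superposition as the only generating inference, tautology and duplicate deletion stand in for simplification, and clause selection is plain FIFO. The returned event list only mimics the new/passive/active log shown in Figure 1.

```python
from collections import deque

# Assumption for illustration: a clause is a frozenset of nonzero integers,
# where -n stands for the negation of propositional atom n.
Clause = frozenset
EMPTY = Clause()


def resolvents(c1, c2):
    """Generating inference (stand-in for superposition): binary resolution."""
    return [Clause((c1 - {lit}) | (c2 - {-lit})) for lit in c1 if -lit in c2]


def is_tautology(c):
    """Simplification stand-in: a clause containing both n and -n is deleted."""
    return any(-lit in c for lit in c)


def saturate(input_clauses, max_iterations=10_000):
    """Given-clause loop over Active/Passive; returns (status, event log)."""
    active, passive, seen, events = [], deque(), set(), []

    def add_new(c):
        # Simplified-away and duplicate clauses never reach Passive.
        if is_tautology(c) or c in seen:
            return
        seen.add(c)
        events.append(("new", c))
        passive.append(c)                 # added to Passive in the same iteration
        events.append(("passive", c))

    for c in input_clauses:
        add_new(Clause(c))

    for _ in range(max_iterations):
        if EMPTY in seen:
            return "refutation", events   # case (i): empty clause derived
        if not passive:
            return "saturated", events    # case (ii): no clause left to select
        given = passive.popleft()         # clause selection: plain FIFO here
        active.append(given)
        events.append(("active", given))
        for other in active:              # all inferences with Active, incl. itself
            for r in resolvents(given, other):
                add_new(r)
    return "unknown", events              # case (iii): iteration limit reached
```

For example, `saturate([[1], [-1, 2], [-2]])` derives the empty clause and reports a refutation, whereas `saturate([[1, 2], [-1]])` runs out of Passive clauses and reports that the input is saturated. A real prover would additionally use term orderings, literal selection and a weighted clause selection heuristic, which are omitted here for brevity.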
We now discuss how to efficiently analyze saturation attempts of Vampire in SatVis.

Analyzing saturation attempts.
To understand saturation (attempts), we have to analyze the generating inferences performed during saturation (attempts). On the one hand, we are interested in the useful clauses: that is, the derived and activated clauses that are part of the proof we expect Vampire to find. In particular, we check whether these clauses occur in Active. (i) If this is the case for a given useful clause (or a simplified variant of it), we are done with processing this useful clause and optionally check the derivation of that clause against the expected derivation. (ii) If not, we have to identify the reason why the clause was not added to Active, which can be either because (ii.a) the clause (or a simplified version of it) was never chosen from Passive to be activated, or because (ii.b) the clause was not even added to Passive. In case (ii.a), we investigate why the clause was not activated. This involves checking which simplified version of the clause was added to Passive and checking how clause selection in Vampire evaluates that clause. In case (ii.b), we need to understand why the clause was not added to Passive, that is, why no generating inference between suitable premise clauses was performed. This could, for instance, be the case because one of the premises was not added to Active, in which case we recurse with the analysis on that premise, or because clause selection in Vampire prevented the inference.

On the other hand, we are interested in the useless clauses: that is, the clauses which were generated or even activated but are unrelated to the proof Vampire will find. These clauses often slow down the proof search by several orders of magnitude. It is therefore crucial to limit their generation, or at least their activation. To identify the useless clauses that are activated, we need to analyze the set Active, whereas to identify the useless clauses that are generated but never activated, we have to investigate the set Passive.

Saturation output.
We now discuss how SatVis reconstructs the clause sets Active and Passive from a Vampire saturation (attempt). Vampire is able to log a list of events, where each event is classified as either (i) new $C$, (ii) passive $C$, or (iii) active $C$, for a given clause $C$. The list of events produced by Vampire satisfies the following properties: (a) each clause is newly created at most once, added to Passive at most once, and added to Active at most once; (b) if a clause is added to Passive, it was newly created in the same iteration; and (c) if a clause is added to Active, it was newly created and added to Passive at some point. Figure 1 shows a part of the output logged by Vampire while performing a saturation attempt (SA).

Starting from an empty derivation and two empty sets, the derivation graph and the sets Active and Passive corresponding to a given saturation attempt of Vampire are computed in SatVis by traversing the list of events produced by Vampire and iteratively changing the derivation and the sets Active and Passive, as follows:
(i) new $C$: add the new node $C$ to the derivation and construct the edges $(C_i, C)$ for every premise $C_i$ of the inference deriving $C$. The sets Active and Passive remain unchanged;
(ii) passive $C$: add the node $C$ to Passive. The derivation and Active remain unchanged;
(iii) active $C$: remove the node $C$ from Passive and add it to Active. The derivation remains unchanged.
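The three replay rules above, combined with the case distinction (i)/(ii.a)/(ii.b) from the analysis of useful clauses, can be sketched as follows. The event tuples `(kind, clause_id, premise_ids)`, the `SaturationState` class and the `classify` helper are hypothetical names chosen for illustration, not the SatVis code. As a simplifying assumption, clauses are identified by their Vampire clause number, so the sketch does not recognize simplified variants of an expected clause.

```python
from dataclasses import dataclass, field


@dataclass
class SaturationState:
    """Derivation graph and clause sets reconstructed from a Vampire event log."""
    premises: dict = field(default_factory=dict)  # clause id -> premise ids
    passive: set = field(default_factory=set)
    active: set = field(default_factory=set)


def replay(events):
    """Apply (kind, clause_id, premise_ids) events in order, per rules (i)-(iii)."""
    state = SaturationState()
    for kind, clause_id, premise_ids in events:
        if kind == "new":          # (i) add node C and edges (C_i, C)
            state.premises[clause_id] = list(premise_ids)
        elif kind == "passive":    # (ii) add C to Passive
            state.passive.add(clause_id)
        elif kind == "active":     # (iii) move C from Passive to Active
            state.passive.discard(clause_id)
            state.active.add(clause_id)
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return state


def classify(state, clause_id):
    """Hypothetical helper: which analysis case applies to an expected clause."""
    if clause_id in state.active:
        return "(i) activated"
    if clause_id in state.passive:
        return "(ii.a) in Passive, never activated"
    if clause_id in state.premises:
        return "(ii.b) derived, never added to Passive"
    return "(ii.b) never derived"
```

As a usage example, replaying only the events for clauses 164, 165 and 132 from Figure 1 (treated as if they were the entire log) classifies clause 132 as activated, clause 164 as waiting in Passive, and clause 165 as derived but never added to Passive. In case (ii.b), the recursion on premises described above would continue with the entries of `state.premises`.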
Interactive Visualization.
The large number of inferences during saturation in Vampire makes the direct analysis of saturation attempts of Vampire impossible within a reasonable amount of time. In order to overcome this problem, SatVis interactively visualizes the derivation graph of the Vampire saturation. The graph-based visualization of SatVis brings the following benefits:
• Navigating through the graph visualization of a Vampire derivation is easier for users than working with the Vampire derivation encoded as a list of hyper-edges. In particular, both (i) navigating to the premises of a selected node/clause and (ii) searching for inferences having a selected node/clause as premise are fast in SatVis.
• SatVis visualizes only the nodes/clauses that are part of a derivation of an activated clause, and in this way ignores uninteresting inferences.
• SatVis merges the preprocessing inferences, such that each clause resulting from preprocessing has as direct premise the input formula it is derived from.
Yet, a straightforward graph-based visualization of Vampire saturations in SatVis would bring the following practical limitations:
(i) displaying additional meta-information on graph nodes, such as the inference rule used to derive a node, is computationally very expensive, due to the large number of inferences used during saturation;
(ii) manual search for particular or already processed nodes in relatively large derivations would take too much time;
(iii) subderivations are often interleaved with other subderivations due to an imperfect automatic layout of the graph.
SatVis addresses the above challenges using the following interactive features:
– SatVis displays meta-information only for a selected node/clause;
– SatVis supports different ways to locate and select clauses, such as full-text search, search for direct children and premises of the currently selected clauses, and search for clauses whose derivation contains all currently selected nodes;
– SatVis supports transformations/fragmentations of derivations. In particular, it is possible to restrict the visualization to the clauses that form the derivation of a selected clause, or to visualize only clauses whose derivation contains a selected clause;
– SatVis allows one to (permanently) highlight one or more clauses in the derivation.
Fig. 2. Screenshot of SatVis showing visualized derivation and interaction menu.
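Two of the benefits listed above, pruning the graph to clauses that occur in the derivation of some activated clause and quickly finding the inferences that use a selected clause as premise, reduce to simple graph traversals. The following is a minimal sketch, assuming a hypothetical `premises` dictionary that maps each clause id to the ids of its premises; it is not taken from the SatVis sources.

```python
from collections import defaultdict


def derivation_of(premises, roots):
    """All clause ids occurring in the derivations of the given root clauses."""
    seen, stack = set(), list(roots)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(premises.get(node, ()))
    return seen


def visible_nodes(premises, active):
    """Prune the graph to nodes that belong to a derivation of an activated clause."""
    return derivation_of(premises, active)


def children_index(premises):
    """Reverse adjacency (premise id -> derived clause ids) for fast navigation."""
    children = defaultdict(list)
    for node, node_premises in premises.items():
        for p in node_premises:
            children[p].append(node)
    return children
```

Building the reverse index once turns "which inferences use this clause as a premise?" into a dictionary lookup instead of a scan over all hyper-edges, which is the kind of navigation the first benefit refers to.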
Figure 2 illustrates some of the above features of SatVis, using output from Vampire similar to Figure 1 as input to SatVis.

We implemented SatVis as a web application, allowing SatVis to be easily used on any platform. Written in Python 3, SatVis contains about 2,200 lines of code. For the generation of graph layouts we rely on pygraphviz, whereas graph/derivation visualizations are created with vis.js. We experimented with SatVis on the verification examples of [3], using an Intel Core i5 3.1 GHz machine with 16 GB of RAM, which allowed us to refine the inputs and successfully generate Vampire proofs for the non-interference and information-flow examples of [3].

SatVis workflow.
SatVis takes as input a text file containing the output of a Vampire saturation attempt. An example of a partial input to SatVis is given in Figure 1. SatVis then generates a DAG representing the derivation of the considered Vampire saturation output, as presented in Section 3 and detailed below. Next, SatVis generates the graph layout for the generated DAG, enriched with configured style information. Finally, SatVis renders and visualizes the Vampire derivation corresponding to its input and allows interactive visualization of its output, as discussed in Section 3 and detailed below.
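The first step of this workflow, reading the textual Vampire output line by line, can be illustrated with a regular expression. The pattern below is inferred only from the excerpt in Figure 1 and is an assumption: it is neither the actual SatVis parser nor a complete grammar of Vampire's output (for instance, inference names containing capitals, digits or hyphens would not match). The `Event` and `parse_event` names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the Figure 1 excerpt only (an assumption):
#   [SA] <kind>: <id>. <clause> [<rule> <comma-separated premise ids>]
EVENT_RE = re.compile(
    r"^\[SA\]\s+(?P<kind>new|passive|active):\s+"
    r"(?P<id>\d+)\.\s+(?P<clause>.*?)\s+"
    r"\[(?P<rule>[a-z ]+?)(?:\s+(?P<premises>\d+(?:,\d+)*))?\]\s*$"
)


class Event(NamedTuple):
    kind: str
    clause_id: int
    clause: str
    rule: str
    premises: tuple


def parse_event(line: str) -> Optional[Event]:
    """Return an Event for a matching log line, or None for any other output."""
    m = EVENT_RE.match(line.strip())
    if m is None:
        return None
    premises = tuple(int(p) for p in m["premises"].split(",")) if m["premises"] else ()
    return Event(m["kind"], int(m["id"]), m["clause"], m["rule"], premises)
```

Applied to the line `[SA] new: 168. p(s(X0)) = X0 | zero = s(X0) [equality resolution 90]` from Figure 1, the sketch yields an event of kind `new` for clause 168, derived by `equality resolution` from premise 90; lines that are not saturation events are skipped.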
DAG generation of saturation outputs.
SatVis parses its input line by line using regex pattern matching in order to generate the nodes of the graph. Next, SatVis uses a post-order traversal algorithm to sanitize nodes and remove redundant ones. The result is then passed to pygraphviz to generate a graph layout. While pygraphviz finds layouts for thousands of nodes in less than three seconds, we would like to improve the scalability of the tool further. It would be beneficial to preprocess and render nodes incrementally, while ensuring stable layouts for SatVis graph transformations. We leave this engineering task for future work.

pygraphviz: https://pygraphviz.github.io; vis.js: https://visjs.org/

Interactive visualization.
The interactive features of SatVis support (i) various node searching mechanisms, (ii) graph transformations, and (iii) the display of meta-information about a specific node. We can efficiently search for nodes by (partial) clause, find parents or children of a node, and find common consequences of a number of nodes. Graph transformations in SatVis allow one to render only a certain subset of nodes from the SatVis DAG, for example, displaying only the transitive parents or children of a certain node.
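The search for common consequences, that is, for clauses whose derivation contains all selected nodes, can be expressed as an intersection of descendant sets, and search by (partial) clause as a substring filter. The sketch below assumes a hypothetical `premises` dictionary (clause id to premise ids) and a hypothetical `clauses` dictionary (clause id to printed clause); neither is the SatVis API, and a production tool would cache the reverse index rather than rebuild it per query.

```python
from collections import defaultdict


def build_children(premises):
    """premise id -> set of clause ids derived directly from it."""
    children = defaultdict(set)
    for node, node_premises in premises.items():
        for p in node_premises:
            children[p].add(node)
    return children


def consequences(children, node):
    """Transitive children of node, including node itself."""
    seen, stack = set(), [node]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(children.get(n, ()))
    return seen


def common_consequences(premises, selected):
    """Clauses whose derivation contains every selected node."""
    children = build_children(premises)
    sets = [consequences(children, s) for s in selected]
    return set.intersection(*sets) if sets else set()


def search_text(clauses, fragment):
    """Ids of clauses whose printed form contains the (partial) clause text."""
    return {cid for cid, text in clauses.items() if fragment in text}
```

The same `consequences` traversal also implements the transformation that shows only the transitive children of a node, while following `premises` instead of the reverse index gives the transitive parents.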
We described the SatVis tool for interactively visualizing proofs and proof attempts of the first-order theorem prover Vampire. Our work analyzes proof search in Vampire and reconstructs first-order derivations corresponding to Vampire proofs and proof attempts. The interactive features of SatVis ease the task of understanding both successful and failing proof attempts in Vampire and hence can be used to further develop and use Vampire, both by experts and by non-experts in first-order theorem proving.
Acknowledgements.
This work was funded by the ERC Starting Grant 2014 SYMCAR 639270, the ERC Proof of Concept Grant 2018 SYMELS 842066, the Wallenberg Academy Fellowship 2014 TheProSE and the Austrian FWF project W1255-N23.
References
1. C. Barrett, C. L. Conway, M. Deters, L. Hadarean, D. Jovanović, T. King, A. Reynolds, and C. Tinelli. CVC4. In CAV, pages 171–177, 2011.
2. C. Barrett, P. Fontaine, and C. Tinelli. The SMT-LIB standard: Version 2.6. Technical report, Department of Computer Science, The University of Iowa, 2017.
3. G. Barthe, R. Eilers, P. Georgiou, B. Gleiss, L. Kovács, and M. Maffei. Verifying Relational Properties using Trace Logic. In FMCAD, 2019. To appear.
4. J. Byrnes, M. Buchanan, M. Ernst, P. Miller, C. Roberts, and R. Keller. Visualizing proof search for theorem prover development. ENTCS, 226:23–38, 2009.
5. L. De Moura and N. Bjørner. Z3: An efficient SMT solver. In TACAS, pages 337–340, 2008.
6. E. R. Gansner and S. C. North. An Open Graph Visualization System and its Applications to Software Engineering. Software: Practice and Experience, 30(11):1203–1233, 2000.
7. E. Kotelnikov, L. Kovács, and A. Voronkov. A FOOLish Encoding of the Next State Relations of Imperative Programs. In IJCAR, pages 405–421, 2018.
8. L. Kovács, S. Robillard, and A. Voronkov. Coming to terms with quantified reasoning. In POPL, pages 260–270. ACM, 2017.
9. L. Kovács and A. Voronkov. First-Order Theorem Proving and Vampire. In CAV, pages 1–35, 2013.
10. T. Libal, M. Riener, and M. Rukhaia. Advanced Proof Viewing in ProofTool. In UITP, pages 35–47, 2014.
11. R. Nieuwenhuis and A. Rubio. Paramodulation-Based Theorem Proving. In Handbook of Automated Reasoning, pages 371–443. 2001.
12. F. Rothenberger. Integration and analysis of alternative SMT solvers for software verification. Master's thesis, ETH Zürich, 2016.
13. S. Schulz. E – a Brainiac Theorem Prover. AI Communications, 15(2-3):111–126, 2002.
14. G. Sutcliffe. TPTP, TSTP, CASC, etc. In CSR, pages 7–23, 2007.
15. N. Wetzler, M. J. H. Heule, and W. A. Hunt. DRAT-trim: Efficient checking and trimming using expressive clausal proofs. In