Interactive Visualization of Saturation Attempts in Vampire

Abstract

Many applications of formal methods require automated reasoning about system properties, such as system safety and security. To improve the performance of automated reasoning engines, such as SAT/SMT solvers and first-order theorem provers, it is necessary to understand both the successful and failing attempts of these engines towards producing formal certificates, such as logical proofs and/or models. Such an analysis is challenging due to the large number of logical formulas generated during proof/model search. In this paper we focus on saturation-based first-order theorem proving and introduce the SATVIS tool for interactively visualizing saturation-based proof attempts in first-order theorem proving. We build SATVIS on top of the world-leading theorem prover VAMPIRE, by interactively visualizing the saturation attempts of VAMPIRE in SATVIS. Our work combines the automatic layout and visualization of the derivation graph induced by the saturation attempt with interactive transformations and search functionality. As a result, we are able to analyze and debug (failed) proof attempts of VAMPIRE. Thanks to its interactive visualization, we believe SATVIS helps both experts and non-experts in theorem proving to understand first-order proofs and analyze/refine failing proof attempts of first-order provers.


Bernhard Gleiss, Laura Kovács, and Lena Schnedlitz
TU Wien, Austria; Chalmers University of Technology, Sweden

Introduction

Many applications of formal methods, such as program analysis and verification, require automated reasoning about system properties, such as program safety, security and reliability. Automated reasoners, such as SAT/SMT solvers [1,5] and first-order theorem provers [9,13], have therefore become a key backbone of rigorous system engineering. For example, proving properties over the computer memory relies on first-order reasoning with both quantifiers and integer arithmetic.

Saturation-based theorem proving is the leading approach for automating reasoning in full first-order logic. In a nutshell, this approach negates a given goal and saturates its given set of input formulas (including the negated goal), by deriving logical consequences of the input using a logical inference system, such as binary resolution or superposition. Whenever a contradiction (false) is derived, the saturation process terminates reporting validity of the input goal. State-of-the-art theorem provers, such as VAMPIRE [9] and E [13], implement saturation-based proof search using the (ordered) superposition calculus [11]. These provers rely on powerful indexing algorithms, selection functions and term orderings for making saturation-based theorem proving efficient and scalable to a large set of first-order formulas, as evidenced in the yearly CASC system competition of first-order provers [14].

Over the past years, saturation-based theorem proving has been extended to first-order logic with theories, such as arithmetic, theory of arrays and algebraic datatypes [8]. Further, first-class boolean sorts and if-then-else and let-in constructs have also been introduced as extensions to the input syntax of first-order theorem provers [7]. Thanks to these recent developments, first-order theorem provers have become better suited to applications of formal methods, being for example a competitive alternative to SMT solvers [1,5] in software verification and program analysis. Recent editions of the SMT-COMP (https://smt-comp.github.io/) and CASC system competitions show, for example, that VAMPIRE successfully competes against the leading SMT solvers Z3 [5] and CVC4 [1] and vice versa.

By leveraging the best practices in first-order theorem proving in combination with SMT solving, in our recent work [3] we showed that correctness of a software program can be reduced to a validity problem in first-order logic. We use VAMPIRE to prove the resulting encodings, outperforming SMT solvers. Our initial results demonstrate that first-order theorem proving is well-suited for applications of (relational) verification, such as safety and non-interference. Yet, our results also show that the performance of the prover crucially depends on the logical representation of its input problem and the deployed reasoning strategies during proof search. As such, users and developers of first-order provers, and automated reasoners in general, typically face the burden of analysing (failed) proof attempts produced by the prover, with the ultimate goal of refining the input and/or proof strategies so that the prover succeeds in proving its input. Understanding (some of) the reasons why the prover failed is however very hard and requires a considerable amount of work by highly qualified experts in theorem proving, thus hindering the use of theorem provers in many application domains.

In this paper we address this challenge and introduce the SATVIS tool to ease the task of analysing failed proof attempts in saturation-based reasoning. We designed SATVIS to support interactive visualization of the saturation algorithm used in VAMPIRE, with the goal of easing the manual analysis of VAMPIRE proofs as well as failed proof attempts in VAMPIRE. Inputs to SATVIS are proofs (or proof attempts) produced by VAMPIRE. Our tool consists of (i) an explicit visualization of the DAG-structure of the saturation proof (attempt) of VAMPIRE and (ii) interactive transformations of the DAG for pruning and reformatting the proof (attempt). In its current setting, SATVIS can be used only in the context of VAMPIRE. Yet, by parsing/translating proofs (or proof attempts) of other provers into the VAMPIRE proof format, SATVIS can be used in conjunction with other provers as well.

When feeding VAMPIRE proofs to SATVIS, SATVIS supports both users and developers of VAMPIRE in understanding and refactoring VAMPIRE proofs, and in manually checking the soundness of VAMPIRE proofs. When using SATVIS on failed proof attempts of VAMPIRE, SATVIS supports users and developers of VAMPIRE in analysing how VAMPIRE explored its search space during proof search, that is, in understanding which clauses were derived and why certain clauses have not been derived at various steps during saturation. By doing so, the SATVIS proof visualization framework gives valuable insights on how to revise the input problem encoding of VAMPIRE and/or implement domain-specific optimizations in VAMPIRE. We therefore believe that SATVIS improves the state-of-the-art in the use and applications of theorem proving at least in the following scenarios: (i) helping VAMPIRE developers to debug and further improve VAMPIRE; (ii) helping VAMPIRE users to tune VAMPIRE to their applications, by not treating VAMPIRE as a black-box but by understanding and using its appropriate proof search options; and (iii) helping inexperienced users in saturation-based theorem proving to learn to use VAMPIRE and first-order proving in general.

Contributions.

The contribution of this paper is the design of the SATVIS tool for analysing proofs, as well as proof attempts, of the VAMPIRE theorem prover. SATVIS is available at: https://github.com/gleiss/saturation-visualization. We overview proof search steps in VAMPIRE specific to SATVIS (Section 2), discuss the challenges we faced for analysing proof attempts of VAMPIRE (Section 3), and describe implementation-level details of SATVIS.

Related work.

While standardizing the input format of automated reasoners is an active research topic, see e.g. the SMT-LIB [2] and TPTP [14] standards, coming up with an input standard for representing and analysing proofs and proof attempts of automated reasoners has so far received very little attention. The TSTP library [14] provides input/output standards for automated theorem proving systems. Yet, unlike SATVIS, TSTP does not analyse proof attempts but only supports the examination of first-order proofs. We note that VAMPIRE proofs (and proof attempts) contain first-order formulas with theories, which are not fully supported by TSTP.

Using a graph-layout framework, for instance Graphviz [6], it is relatively straightforward to visualize the DAG derivation graph induced by a saturation attempt of a first-order prover. For example, the theorem prover E [13] is able to directly output its saturation attempt as an input file for Graphviz. The visualizations generated in this way are, however, useful only for analyzing small derivations with at most 100 inferences, and cannot practically be used to analyse and manipulate larger proof attempts. We note that it is quite common to have first-order proofs and proof attempts with more than 1,000 or even 10,000 inferences, especially in applications of theorem proving in software verification, see e.g. [3]. In our SATVIS framework, the interactive features of our tool allow one to analyze such large(r) proof attempts.

The framework [12] eases the manual analysis of proof attempts in Z3 [5] by visualizing quantifier instantiations, case splits and conflicts. While both [12] and SATVIS are built for analyzing (failed) proof attempts, they target different architectures (SMT solving and superposition-based proving, respectively) and therefore differ in their input format and in the information they visualize. The frameworks [4,10] visualize proofs derived in a natural deduction/sequent calculus. Unlike these approaches, SATVIS targets clausal derivations generated by saturation-based provers using the superposition inference system. As a consequence, our tool can be used to focus only on the clauses that have been actively used during proof search, instead of having to visualize the entire set of clauses, including clauses unused during proof search. We finally note that proof checkers, such as DRAT-trim [15], support the soundness analysis of each inference step of a proof, but neither focus on failing proof attempts nor visualize proofs.

Proof Search in VAMPIRE

We first present the key ingredients for proof search in VAMPIRE, relevant to analysing saturation attempts.

Derivations and proofs.

An inference $I$ is a tuple $(F_1, \ldots, F_n, F)$, where $F_1, \ldots, F_n, F$ are formulas. The formulas $F_1, \ldots, F_n$ are called the premises of $I$ and $F$ is called the conclusion of $I$. In our setting, an inference system is a set of inferences and we rely on the superposition inference system [11]. An axiom of an inference system is any inference with $n = 0$. Given an inference system $\mathcal{I}$, a derivation from axioms $A$ is a directed acyclic graph (DAG), where (i) each node is a formula and (ii) each node either is an axiom in $A$ and does not have any incoming edges, or is a formula $F \notin A$, such that the incoming edges of $F$ are exactly $(F_1, F), \ldots, (F_n, F)$ and there exists an inference $(F_1, \ldots, F_n, F) \in \mathcal{I}$. A refutation of axioms $A$ is a derivation which contains the empty clause $\bot$ as a node. A derivation of a formula $F$ is called a proof of $F$ if it is finite and all leaves in the derivation are axioms.

Proof search in Vampire.

Given an input set of axioms $A$ and a conjecture $G$, VAMPIRE searches for a refutation of $A \cup \{\neg G\}$, by using a preprocessing phase followed by a saturation phase. In the preprocessing phase, VAMPIRE generates a derivation from $A \cup \{\neg G\}$ such that each sink-node of the DAG is a clause (a sink-node is a node such that no edge emerges out of it). Then, VAMPIRE enters the saturation phase, where it extends the existing derivation by applying its saturation algorithm using the sink-nodes from the preprocessing phase as the input clauses to saturation. The saturation phase of VAMPIRE terminates in one of the following three cases: (i) the empty clause $\bot$ is derived (hence, a proof of $G$ was found), (ii) no more clauses are derived and the empty clause $\bot$ was not derived (hence, the input is saturated and $A \cup \{\neg G\}$ is satisfiable, so $G$ does not follow from $A$), or (iii) an a priori given time/memory limit on the VAMPIRE run is reached (hence, it is unknown whether $G$ follows from $A$).

At a high level, the saturation phase of VAMPIRE proceeds as follows. The saturation algorithm divides the set of clauses from the proof space of VAMPIRE into a set of Active and Passive clauses, and iteratively refines these sets using its superposition inference system: the Active set keeps the clauses between which all possible inferences have been performed, whereas the Passive set stores the clauses which have not been added to Active yet and are candidates for being used in future steps of the saturation algorithm. During saturation, VAMPIRE distinguishes between so-called simplifying and generating inferences. Intuitively, simplifying inferences delete clauses from the search space and hence are crucial for keeping the search space small. A generating inference is a non-simplifying one, and hence adds new clauses to the search space. As such, at every iteration of the saturation algorithm, a new clause from Passive is selected and added to Active, after which all generating inferences between the selected clause and the clauses in Active are applied. Conclusions of these inferences yield new clauses which are added to Passive to be selected in future iterations of saturation. Additionally, at any step of the saturation algorithm, simplifying inferences and deletion of clauses are allowed.

    ...
    [SA] passive: 160. v = a(l11(s(nl8)),$sum(i(main_end),1)) [superposition 70,118]
    [SA] active: 163. i(main_end) != -1 [term algebras distinctness 162]
    [SA] active: 92. ~'Sub'(X5,p(X4)) | 'Sub'(X5,X4) | zero = X4 [superposition 66,44]
    [SA] new: 164. 'Sub'(p(p(X0)),X0) | zero = X0 | zero = p(X0) [resolution 92,94]
    [SA] passive: 164. 'Sub'(p(p(X0)),X0) | zero = X0 | zero = p(X0) [resolution 92,94]
    [SA] active: 132. v = a(l11(s(s(zero))),2) [superposition 70,124]
    [SA] new: 165. v = a(l8(s(s(zero))),2) | i(l8(s(s(zero)))) = 2 [superposition 132,72]
    [SA] new: 166. v = a(l8(s(s(zero))),2) | i(l8(s(s(zero)))) = 2 [superposition 72,132]
    [SA] active: 90. s(X1) != X0 | p(X0) = X1 | zero = X0 [superposition 22,44]
    [SA] new: 167. X0 != X1 | p(X0) = p(X1) | zero = X1 | zero = X0 [superposition 90,44]
    [SA] new: 168. p(s(X0)) = X0 | zero = s(X0) [equality resolution 90]
    [SA] new: 169. p(s(X0)) = X0 [term algebras distinctness 168]
    ...

Fig. 1. Screenshot of a saturation attempt of VAMPIRE.
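As an illustration of the Active/Passive loop described above, the following is a minimal sketch and not VAMPIRE's implementation. It assumes propositional clauses represented as frozensets of non-zero integers, uses binary resolution as the only generating inference, deletes tautologies and duplicates as a stand-in for simplification, and selects clauses in FIFO order. The names `saturate`, `resolvents` and `max_iterations` are hypothetical.

```python
from collections import deque


def resolvents(c1, c2):
    """All binary resolvents of two propositional clauses (frozensets of ints)."""
    return [frozenset((c1 - {lit}) | (c2 - {-lit})) for lit in c1 if -lit in c2]


def is_tautology(clause):
    """A clause containing both a literal and its negation is always true."""
    return any(-lit in clause for lit in clause)


def saturate(input_clauses, max_iterations=10_000):
    """Return 'refutation', 'saturated', or 'unknown' (iteration limit reached)."""
    passive = deque(frozenset(c) for c in input_clauses)
    active = []
    seen = set(passive)
    for _ in range(max_iterations):
        if not passive:
            return "saturated"            # case (ii): nothing left to select
        given = passive.popleft()         # clause selection (FIFO in this sketch)
        if not given:
            return "refutation"           # case (i): empty clause derived
        active.append(given)
        # Generating inferences between the selected clause and all of Active.
        for other in active:
            for new in resolvents(given, other):
                if is_tautology(new) or new in seen:
                    continue              # deletion of redundant clauses
                if not new:
                    return "refutation"
                seen.add(new)
                passive.append(new)
    return "unknown"                      # case (iii): limit reached
```

For example, `saturate([[1], [-1]])` derives the empty clause, whereas `saturate([[1, 2], [-1, 2]])` stops once no new clauses can be generated.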

Analysis of Saturation Attempts of VAMPIRE

We now discuss how to efficiently analyze saturation attempts of VAMPIRE in SATVIS.

Analyzing saturation attempts.

To understand saturation (attempts), we have to analyze the generating inferences performed during saturation (attempts).

On the one hand, we are interested in the useful clauses: that is, the derived and activated clauses that are part of the proof we expect VAMPIRE to find. In particular, we check whether these clauses occur in Active. (i) If this is the case for a given useful clause (or a simplified variant of it), we are done with processing this useful clause and optionally check the derivation of that clause against the expected derivation. (ii) If not, we have to identify the reason why the clause was not added to Active, which can either be because (ii.a) the clause (or a simplified version of it) was never chosen from Passive to be activated or (ii.b) the clause was not even added to Passive. In case (ii.a), we investigate why the clause was not activated. This involves checking which simplified version of the clause was added to Passive and checking the value of clause selection in VAMPIRE on that clause. In case (ii.b), we need to understand why the clause was not added to Passive, that is, why no generating inference between suitable premise clauses was performed. This could for instance be the case because one of the premises was not added to Active, in which case we recurse with the analysis on that premise, or because clause selection in VAMPIRE prevented the inference.

On the other hand, we are interested in the useless clauses: that is, the clauses which were generated or even activated but are unrelated to the proof VAMPIRE will find. These clauses often slow down the proof search by several orders of magnitude. It is therefore crucial to limit their generation or at least their activation. To identify the useless clauses that are activated, we need to analyze the set Active, whereas to identify the useless clauses which are generated but never activated, we have to investigate the set Passive.

Saturation output.

We now discuss how SATVIS reconstructs the clause sets Active and Passive from a VAMPIRE saturation (attempt). VAMPIRE is able to log a list of events, where each event is classified as either (i) new $C$, (ii) passive $C$, or (iii) active $C$, for a given clause $C$. The list of events produced by VAMPIRE satisfies the following properties: (a) any clause is at most once newly created, added to Passive and added to Active; (b) if a clause is added to Passive, it was newly created in the same iteration; and (c) if a clause is added to Active, it was newly created and added to Passive at some point. Figure 1 shows a part of the output logged by VAMPIRE while performing a saturation attempt (SA).

Starting from an empty derivation and two empty sets, the derivation graph and the sets Active and Passive corresponding to a given saturation attempt of VAMPIRE are computed in SATVIS by traversing the list of events produced by VAMPIRE and iteratively changing the derivation and the sets Active and Passive, as follows:
(i) new $C$: add the new node $C$ to the derivation and construct the edges $(C_i, C)$, for any premise $C_i$ of the inference deriving $C$. The sets Active and Passive remain unchanged;
(ii) passive $C$: add the node $C$ to Passive. The derivation and Active remain unchanged;
(iii) active $C$: remove the node $C$ from Passive and add it to Active. The derivation remains unchanged.
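The three update rules above, together with the case analysis (i), (ii.a) and (ii.b) for useful clauses, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the SATVIS code: events are assumed to be already parsed into tuples `(kind, clause_id, premise_ids)`, and the names `replay`, `status` and `EVENTS` are hypothetical. Splitting case (ii.b) into "generated, not passive" and "never generated" is a refinement made for this example only.

```python
def replay(events):
    """Replay (kind, clause_id, premise_ids) events in log order."""
    nodes, edges = set(), set()       # derivation graph: nodes and (premise, clause) edges
    passive, active = set(), set()    # clause sets, stored by clause id
    for kind, cid, premises in events:
        if kind == "new":             # rule (i): extend the derivation only
            nodes.add(cid)
            edges.update((p, cid) for p in premises)
        elif kind == "passive":       # rule (ii): add to Passive
            passive.add(cid)
        elif kind == "active":        # rule (iii): move from Passive to Active
            passive.discard(cid)
            active.add(cid)
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return nodes, edges, passive, active


def status(cid, nodes, passive, active):
    """Classify a clause along cases (i), (ii.a) and (ii.b) of the analysis."""
    if cid in active:
        return "active"                      # case (i)
    if cid in passive:
        return "passive, never activated"    # case (ii.a)
    if cid in nodes:
        return "generated, not passive"      # case (ii.b), clause was created
    return "never generated"                 # case (ii.b), no inference produced it


# Hypothetical event list: clauses 1 and 2 are inputs, 3 and 4 are derived from them.
EVENTS = [
    ("new", 1, ()), ("passive", 1, ()),
    ("new", 2, ()), ("passive", 2, ()),
    ("active", 1, ()), ("active", 2, ()),
    ("new", 3, (1, 2)), ("passive", 3, (1, 2)),
    ("new", 4, (2, 1)),
]
NODES, EDGES, PASSIVE, ACTIVE = replay(EVENTS)
```

With this event list, clauses 1 and 2 end up in Active, clause 3 remains in Passive, and clause 4 is part of the derivation but in neither set.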

Interactive Visualization.

The large number of inferences during saturation in VAMPIRE makes the direct analysis of saturation attempts of VAMPIRE impossible within a reasonable amount of time. In order to overcome this problem, in SATVIS we interactively visualize the derivation graph of the VAMPIRE saturation. The graph-based visualization of SATVIS brings the following benefits:
• Navigating through the graph visualization of a VAMPIRE derivation is easier for users than working with the VAMPIRE derivation encoded as a list of hyper-edges. In particular, both (i) navigating to the premises of a selected node/clause and (ii) searching for inferences having a selected node/clause as premise are fast in SATVIS.
• SATVIS visualizes only the nodes/clauses that are part of a derivation of an activated clause, and in this way ignores uninteresting inferences.
• SATVIS merges the preprocessing inferences, such that each clause resulting from preprocessing has as direct premise the input formula it is derived from.

Yet, a straightforward graph-based visualization of VAMPIRE saturations in SATVIS would bring the following practical limitations on using SATVIS:
(i) displaying additional meta-information on graph nodes, such as the inference rule used to derive a node, is computationally very expensive, due to the large number of inferences used during saturation;
(ii) manual search for particular/already processed nodes in relatively large derivations would take too much time;
(iii) subderivations are often interleaved with other subderivations due to an imperfect automatic layout of the graph.

SATVIS addresses the above challenges using the following interactive features:
– SATVIS displays meta-information only for a selected node/clause;
– SATVIS supports different ways to locate and select clauses, such as full-text search, search for direct children and premises of the currently selected clauses, and search for clauses whose derivation contains all currently selected nodes;
– SATVIS supports transformations/fragmentations of derivations. In particular, it is possible to restrict and visualize the derivation containing only the clauses that form the derivation of a selected clause, or visualize only clauses whose derivation contains a selected clause;
– SATVIS allows one to (permanently) highlight one or more clauses in the derivation.

Fig. 2. Screenshot of SATVIS showing visualized derivation and interaction menu.
Figure 2 illustrates some of the above features of SATVIS, using output from VAMPIRE similar to Figure 1 as input to SATVIS.

Implementation of SATVIS

We implemented SATVIS as a web application, allowing SATVIS to be easily used on any platform. Written in Python3, SATVIS contains about 2,200 lines of code. For the generation of graph layouts, we rely on pygraphviz, whereas graph/derivation visualizations are created with vis.js. We experimented with SATVIS on the verification examples of [3], using an Intel Core i5 3.1 GHz machine with 16 GB of RAM, allowing us to refine and successfully generate VAMPIRE proofs for non-interference and information-flow examples of [3].

SATVIS workflow.

SATVIS takes as input a text file containing the output of a VAMPIRE saturation attempt. An example of a partial input to SATVIS is given in Figure 1. SATVIS then generates a DAG representing the derivation of the considered VAMPIRE saturation output, as presented in Section 3 and discussed later. Next, SATVIS generates the graph layout for the generated DAG, enriched with configured style information. Finally, SATVIS renders and visualizes the VAMPIRE derivation corresponding to its input, and allows interactive visualizations of its output, as discussed in Section 3 and detailed below.
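The first step of this workflow, reading log lines of the shape shown in Figure 1, could be sketched as follows. The regular expressions below are assumptions inferred from the Figure 1 excerpt only; they are neither the patterns used in SATVIS nor a complete description of VAMPIRE's output format. The names `LINE_RE`, `JUST_RE` and `parse_line` are hypothetical.

```python
import re

# "[SA] <kind>: <id>. <clause> [<rule> <premise ids>]", as seen in Figure 1.
LINE_RE = re.compile(
    r"^\[SA\] (?P<kind>new|passive|active): (?P<id>\d+)\. "
    r"(?P<clause>.*) \[(?P<just>[^\[\]]*)\]$"
)
# Justification: a rule name, optionally followed by comma-separated premise ids.
JUST_RE = re.compile(r"^(?P<rule>.*?)(?:\s+(?P<premises>\d+(?:,\d+)*))?$")


def parse_line(line):
    """Parse one log line into a dict, or return None if it is not an event."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    just = JUST_RE.match(match.group("just"))
    premise_text = just.group("premises")
    premises = [int(p) for p in premise_text.split(",")] if premise_text else []
    return {
        "kind": match.group("kind"),
        "id": int(match.group("id")),
        "clause": match.group("clause"),
        "rule": just.group("rule"),
        "premises": premises,
    }
```

Lines that do not match, such as the `...` markers in Figure 1, yield `None` and can simply be skipped by the caller.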

DAG generation of saturation outputs.

SATVIS parses its input line by line using regex pattern matching in order to generate the nodes of the graph. Next, SATVIS uses a postorder traversal algorithm to sanitize nodes and remove redundant ones. The result is then passed to pygraphviz to generate a graph layout. While pygraphviz finds layouts for thousands of nodes in less than three seconds, we would like to improve the scalability of the tool further. It would be beneficial to preprocess and render nodes incrementally, while ensuring stable layouts for SATVIS graph transformations. We leave this engineering task for future work.

(pygraphviz: https://pygraphviz.github.io; vis.js: https://visjs.org/)

Interactive visualization.

The interactive features of SATVIS support (i) various node searching mechanisms, (ii) graph transformations, and (iii) the display of meta-information about a specific node. We can efficiently search for nodes by (partial) clause, find parents or children of a node, and find common consequences of a number of nodes. Graph transformations in SATVIS allow rendering only a certain subset of nodes from the SATVIS DAG, for example, displaying only the transitive parents or children of a certain node.
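The graph transformations and the search for common consequences mentioned above amount to reachability queries on the derivation DAG. The following is a minimal sketch, assuming the derivation is given as a set of `(premise, conclusion)` pairs of clause ids; the function names `reachable`, `restrict` and `common_consequences` are hypothetical and do not refer to the SATVIS code base.

```python
from collections import defaultdict


def reachable(start, edges, direction="parents"):
    """Nodes reachable from `start` (inclusive) along (premise, conclusion) edges."""
    step = defaultdict(set)
    for premise, conclusion in edges:
        if direction == "parents":
            step[conclusion].add(premise)     # walk towards the premises
        elif direction == "children":
            step[premise].add(conclusion)     # walk towards the consequences
        else:
            raise ValueError(f"unknown direction: {direction!r}")
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in step[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def restrict(edges, keep):
    """Sub-derivation containing only edges between the nodes in `keep`."""
    return {(p, c) for p, c in edges if p in keep and c in keep}


def common_consequences(selected, edges):
    """Clauses whose derivation contains every clause in `selected`."""
    closures = [reachable(node, edges, "children") for node in selected]
    return set.intersection(*closures) if closures else set()
```

For instance, `restrict(edges, reachable(c, edges, "parents"))` yields the derivation of a selected clause `c`, which corresponds to displaying only its transitive parents.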

Conclusion

We described the SATVIS tool for interactively visualizing proofs and proof attempts of the first-order theorem prover VAMPIRE. Our work analyses proof search in VAMPIRE and reconstructs first-order derivations corresponding to VAMPIRE proofs/proof attempts. The interactive features of SATVIS ease the task of understanding both successful and failing proof attempts in VAMPIRE and hence can be used to further develop and use VAMPIRE both by experts and non-experts in first-order theorem proving.

Acknowledgements.

This work was funded by the ERC Starting Grant 2014 SYMCAR 639270, the ERC Proof of Concept Grant 2018 SYMELS 842066, the Wallenberg Academy Fellowship 2014 TheProSE and the Austrian FWF project W1255-N23.

References

1. C. Barrett, C. L. Conway, M. Deters, L. Hadarean, D. Jovanović, T. King, A. Reynolds, and C. Tinelli. CVC4. In CAV, pages 171–177, 2011.
2. C. Barrett, P. Fontaine, and C. Tinelli. The SMT-LIB standard: Version 2.6. Technical report, Department of Computer Science, The University of Iowa, 2017.
3. G. Barthe, R. Eilers, P. Georgiou, B. Gleiss, L. Kovács, and M. Maffei. Verifying Relational Properties using Trace Logic. In FMCAD, 2019. To appear.
4. J. Byrnes, M. Buchanan, M. Ernst, P. Miller, C. Roberts, and R. Keller. Visualizing proof search for theorem prover development. ENTCS, 226:23–38, 2009.
5. L. De Moura and N. Bjørner. Z3: An efficient SMT solver. In TACAS, pages 337–340, 2008.
6. E. R. Gansner and S. C. North. An Open Graph Visualization System and its Applications to Software Engineering. Software: Practice and Experience, 30(11):1203–1233, 2000.
7. E. Kotelnikov, L. Kovács, and A. Voronkov. A FOOLish Encoding of the Next State Relations of Imperative Programs. In IJCAR, pages 405–421, 2018.
8. L. Kovács, S. Robillard, and A. Voronkov. Coming to terms with quantified reasoning. In POPL, pages 260–270. ACM, 2017.
9. L. Kovács and A. Voronkov. First-Order Theorem Proving and Vampire. In CAV, pages 1–35, 2013.
10. T. Libal, M. Riener, and M. Rukhaia. Advanced Proof Viewing in ProofTool. In UITP, pages 35–47, 2014.
11. R. Nieuwenhuis and A. Rubio. Paramodulation-Based Theorem Proving. In Handbook of Automated Reasoning, pages 371–443. 2001.
12. F. Rothenberger. Integration and analysis of alternative SMT solvers for software verification. Master's thesis, ETH Zürich, 2016.
13. S. Schulz. E - a Brainiac Theorem Prover. AI Communications, 15(2-3):111–126, 2002.
14. G. Sutcliffe. TPTP, TSTP, CASC, etc. In CSR, pages 7–23, 2007.
15. N. Wetzler, M. J. H. Heule, and W. A. Hunt. DRAT-trim: Efficient checking and trimming using expressive clausal proofs. In