Partial Regularization of First-Order Resolution Proofs
Jan Gorzny, Ezequiel Postan, and Bruno Woltzenlogel Paleo
School of Computer Science, University of Waterloo, 200 University Ave. W., Waterloo, ON N2L 3G1, Canada
Universidad Nacional de Rosario, Av. Pellegrini 250, S2000BTP Rosario, Santa Fe, Argentina
Vienna University of Technology, Karlsplatz 13, 1040 Vienna, Austria
April 19, 2018
Abstract
Resolution and superposition are common techniques which have seen widespread use with propositional and first-order logic in modern theorem provers. In these cases, resolution proof production is a key feature of such tools; however, the proofs that they produce are not necessarily as concise as possible. For propositional resolution proofs, there is a wide variety of proof compression techniques. There are fewer techniques for compressing first-order resolution proofs generated by automated theorem provers. This paper describes an approach to compressing first-order logic proofs based on lifting proof compression ideas used in propositional logic to first-order logic. One method for propositional proof compression is partial regularization, which removes an inference η when it is redundant in the sense that its pivot literal already occurs as the pivot of another inference in every path from η to the root of the proof. This paper describes the generalization of the partial-regularization algorithm RecyclePivotsWithIntersection [10] from propositional logic to first-order logic. The generalized algorithm performs partial regularization of resolution proofs containing resolution and factoring inferences with unification. An empirical evaluation of the generalized algorithm and its combinations with the previously lifted GreedyLinearFirstOrderLowerUnits algorithm [12] is also presented.
1 Introduction

First-order automated theorem provers, commonly based on refinements and extensions of resolution and superposition calculi [23, 26, 35, 21, 3, 7, 18], have recently achieved a high degree of maturity. Proof production is a key feature that has been gaining importance, as proofs are crucial for applications that require certification of a prover's answers or that extract additional information from proofs (e.g. unsat cores, interpolants, instances of quantified variables). Nevertheless, proof production is non-trivial [27], and the most efficient provers do not necessarily generate the shortest proofs. One reason for this is that efficient resolution provers use refinements that restrict the application of inference rules. Although fewer clauses are generated and the search space is reduced, refinements may exclude short proofs whose inferences do not satisfy the restriction.

Longer and larger proofs take longer to check, may consume more memory during proof-checking and occupy more storage space, and may have a larger unsat core, if more input clauses are used in the proof, and a larger Herbrand sequent, if more variables are instantiated [36, 37, 14, 15, 22]. For these technical reasons, it is worth pursuing efficient algorithms that compress proofs after they have been found. Furthermore, the problem of proof compression is closely related to Hilbert's 24th Problem [30], which asks for criteria to judge the simplicity of proofs. Proof length is arguably one possible criterion for some applications.

For propositional resolution proofs, such as those typically generated by SAT- and SMT-solvers, there is a wide variety of proof compression techniques. Algebraic properties of the resolution operation that are potentially useful for compression were investigated in [9]. Compression algorithms based on rearranging and sharing chains of resolution inferences have been developed in [1] and [28].
Cotton [6] proposed an algorithm that compresses a refutation by repeatedly splitting it into a proof of a heuristically chosen literal ℓ and a proof of its complement, and then resolving them to form a new refutation. The Reduce&Reconstruct algorithm [25] searches for locally redundant subproofs that can be rewritten into subproofs of stronger clauses and with fewer resolution steps. Bar-Ilan et al. [2] and Fontaine et al. [10] described a linear-time proof compression algorithm based on partial regularization, which removes an inference η when it is redundant in the sense that its pivot literal already occurs as the pivot of another inference in every path from η to the root of the proof.

In contrast, although proof output has been a concern in first-order automated reasoning for a longer time than in propositional SAT-solving, there has been much less work on simplifying first-order proofs. For tree-like sequent calculus proofs, algorithms based on cut-introduction [20, 13] have been proposed. However, converting a DAG-like resolution or superposition proof, as usually generated by current provers, into a tree-like sequent calculus proof may increase the size of the proof. For arbitrary proofs in the Thousands of Problems for Theorem Provers (TPTP) [29] format (including DAG-like first-order resolution proofs), there is an algorithm [32] that looks for terms that occur often in any Thousands of Solutions from Theorem Provers (TSTP) [29] proof and abbreviates them.

The work reported in this paper is part of a new trend that aims at lifting successful propositional proof compression algorithms to first-order logic. Our first target was the propositional LowerUnits (LU) algorithm [10], which delays resolution steps with unit clauses, and we lifted it to a new algorithm that we called GreedyLinearFirstOrderLowerUnits (GFOLU) [12]. Here we continue this line of research by lifting the RecyclePivotsWithIntersection (RPI) algorithm [10], which improves the RecyclePivots (RP) algorithm [2] by detecting nodes that can be regularized even when they have multiple children.

Section 2 introduces the well-known first-order resolution calculus with notations that are suitable for describing and manipulating proofs as first-class objects. Section 3 summarizes the propositional RPI algorithm. Section 4 discusses the challenges that arise in the first-order case (mainly due to unification), which are not present in the propositional case, and concludes with conditions useful for first-order regularization. Section 5 describes an algorithm that overcomes these challenges. Section 6 presents experimental results obtained by applying this algorithm, and its combinations with GFOLU, on hundreds of proofs generated with the SPASS theorem prover on TPTP benchmarks [29] and on randomly generated proofs. Section 7 concludes the paper.

It is important to emphasize that this paper targets proofs in a pure first-order resolution calculus (with resolution and factoring rules only), without refinements or extensions, and without equality rules. As most state-of-the-art resolution-based provers use variations and extensions of this pure calculus and there exists no common proof format, the presented algorithm cannot be directly applied to the proofs generated by most provers, and even SPASS had to be specially configured to disable its extensions in order to generate pure resolution proofs for our experiments. By targeting the pure first-order resolution calculus, we address the common theoretical basis for the calculi of various provers. In the Conclusion (Section 7), we briefly discuss what could be done to tackle common variations and extensions, such as splitting and equality reasoning. Nevertheless, they remain topics for future research beyond the scope of this paper.
As usual, our language has infinitely many variable symbols (e.g. x, y, z, x₁, x₂, ...), constant symbols (e.g. a, b, c, a₁, a₂, ...), function symbols of every arity (e.g. f, g, f₁, f₂, ...) and predicate symbols of every arity (e.g. P, Q, P₁, P₂, ...). A term is any variable, constant or the application of an n-ary function symbol to n terms. An atomic formula (atom) is the application of an n-ary predicate symbol to n terms. A literal is an atom or the negation of an atom. The complement of a literal ℓ is denoted ℓ̄ (i.e. for any atom P, the complement of P is ¬P and the complement of ¬P is P). The underlying atom of a literal ℓ is denoted |ℓ| (i.e. for any atom P, |P| = P and |¬P| = P). A clause is a multiset of literals. ⊥ denotes the empty clause. A unit clause is a clause with a single literal. Sequent notation is used for clauses (i.e. P₁, ..., P_n ⊢ Q₁, ..., Q_m denotes the clause {¬P₁, ..., ¬P_n, Q₁, ..., Q_m}). Var(t) (resp. Var(ℓ), Var(Γ)) denotes the set of variables in the term t (resp. in the literal ℓ and in the clause Γ). A substitution {x₁\t₁, x₂\t₂, ...} is a mapping from variables {x₁, x₂, ...} to, respectively, terms {t₁, t₂, ...}. The application of a substitution σ to a term t, a literal ℓ or a clause Γ results in, respectively, the term tσ, the literal ℓσ or the clause Γσ, obtained from t, ℓ and Γ by replacing all occurrences of the variables in σ by the corresponding terms in σ. A literal ℓ matches another literal ℓ′ if there is a substitution σ such that ℓσ = ℓ′. A unifier of a set of literals is a substitution that makes all literals in the set equal. We write X ⊑ Y to denote that X subsumes Y, i.e. that there exists a substitution σ such that Xσ ⊆ Y.

The resolution calculus used in this paper has the following inference rules:

Definition 2.1 (Resolution).
From premises η_L: Γ′_L ∪ {ℓ_L} and η_R: Γ′_R ∪ {ℓ_R}, derive ψ: Γ′_L σ_L ∪ Γ′_R σ_R, where σ_L and σ_R are substitutions such that ℓ_L σ_L is the complement of ℓ_R σ_R. The literals ℓ_L and ℓ_R are the resolved literals, whereas ℓ_L σ_L and ℓ_R σ_R are its instantiated resolved literals. The pivot is the underlying atom of its instantiated resolved literals (i.e. |ℓ_L σ_L| or, equivalently, |ℓ_R σ_R|).

Definition 2.2 (Factoring). From a premise η: Γ′ ∪ {ℓ₁, ..., ℓ_n}, derive ψ: Γ′σ ∪ {ℓ}, where σ is a unifier of {ℓ₁, ..., ℓ_n} and ℓ = ℓ_i σ for any i ∈ {1, ..., n}.

A resolution proof is a directed acyclic graph of clauses where the edges correspond to the inference rules of resolution and factoring, as explained in detail in Definition 2.3. A resolution refutation is a resolution proof with root ⊥.

Definition 2.3 (First-Order Resolution Proof). A directed acyclic graph ⟨V, E, Γ⟩, where V is a set of nodes and E is a set of edges labeled by sets of literals and substitutions (i.e. E ⊂ V × 2^L × S × V, where L is the set of all literals and S is the set of all substitutions, and v₁ →[ℓ, σ] v₂ denotes an edge from node v₁ to node v₂ labeled by the literal(s) ℓ and the substitution σ), is a proof of a clause Γ iff it is inductively constructible according to the following cases:

• Axiom: If Γ is a clause, ⌊Γ⌋ denotes some proof ⟨{v}, ∅, Γ⟩, where v is a new (axiom) node.

• Resolution: If ψ_L is a proof ⟨V_L, E_L, Γ_L⟩ and ψ_R is a proof ⟨V_R, E_R, Γ_R⟩, where Γ_L and Γ_R satisfy the requirements of Definition 2.1, then ψ_L ⊙[ℓ_L, ℓ_R; σ_L, σ_R] ψ_R denotes a proof ⟨V, E, Γ⟩ s.t.

V = V_L ∪ V_R ∪ {v}
E = E_L ∪ E_R ∪ { ρ(ψ_L) →[{ℓ_L}, σ_L] v,  ρ(ψ_R) →[{ℓ_R}, σ_R] v }
Γ = Γ′_L σ_L ∪ Γ′_R σ_R

where v is a new (resolution) node and ρ(φ) denotes the root node of φ.

• Factoring: If ψ′ is a proof ⟨V′, E′, Γ′ ∪ {ℓ₁, ..., ℓ_n}⟩ satisfying the requirements of Definition 2.2, then ⌊ψ′⌋[{ℓ₁, ..., ℓ_n}; σ] denotes a proof ⟨V, E, Γ⟩ s.t.

V = V′ ∪ {v}
E = E′ ∪ { ρ(ψ′) →[{ℓ₁, ..., ℓ_n}, σ] v }
Γ = Γ′σ ∪ {ℓ}

where v is a new (factoring) node, and ρ(φ) denotes the root node of φ.

Example 2.1.
An example first-order resolution proof is shown below.

η₁: Q(x), Q(a) ⊢ P(b)      η₂: P(b) ⊢      η₄: ⊢ P(b), Q(y)
η₃: Q(x), Q(a) ⊢   (from η₁ and η₂)
η₃′: Q(a) ⊢   (by factoring on η₃)
η₅: ⊢ Q(y)   (from η₂ and η₄)
ψ: ⊥   (from η₃′ and η₅)

The nodes η₁, η₂, and η₄ are axioms. Node η₃ is obtained by resolution on η₁ and η₂, where ℓ_L = P(b), ℓ_R = ¬P(b), and σ_L = σ_R = ∅. The node η₃′ is obtained by factoring on η₃ with σ = {x\a}. The node η₅ is the result of resolution on η₂ and η₄ with ℓ_L = ¬P(b), ℓ_R = P(b), σ_L = σ_R = ∅. Lastly, the conclusion node ψ is the result of a resolution of η₃′ and η₅, where ℓ_L = ¬Q(a), ℓ_R = Q(y), σ_L = ∅, and σ_R = {y\a}. The directed acyclic graph representation of the proof (with edge labels omitted) is shown in Figure 1. (This rule is referred to as "binary resolution" elsewhere, with the understanding that "binary" refers to the number of resolved literals, rather than the number of premises of the inference rule.)

Figure 1: The proof in Example 2.1.
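The two inference rules can be made concrete with a small executable sketch that replays the inferences of Example 2.1. The data representation and helper names below are ours, not the paper's; for simplicity a single most general unifier plays the role of both σ_L and σ_R, which is adequate when the premises share no variables.

```python
# A sketch of the resolution and factoring rules of Definitions 2.1 and 2.2.
# A term is a string (a variable when listed in VARS) or a tuple
# (symbol, arg1, ..., argn); a literal is ("+", atom) or ("-", atom);
# a clause is a frozenset of literals.
VARS = {"x", "y", "z", "u", "v", "w"}

def walk(t, sig):
    while isinstance(t, str) and t in VARS and t in sig:
        t = sig[t]
    return t

def unify(t, s, sig):
    """Extend substitution sig to unify t and s, or return None.
    (Occurs-check omitted for brevity.)"""
    t, s = walk(t, sig), walk(s, sig)
    if t == s:
        return sig
    if isinstance(t, str) and t in VARS:
        return {**sig, t: s}
    if isinstance(s, str) and s in VARS:
        return {**sig, s: t}
    if isinstance(t, tuple) and isinstance(s, tuple) and t[0] == s[0] and len(t) == len(s):
        for a, b in zip(t[1:], s[1:]):
            sig = unify(a, b, sig)
            if sig is None:
                return None
        return sig
    return None

def subst(t, sig):
    """Apply the (triangular) substitution sig to a term or literal."""
    t = walk(t, sig)
    if isinstance(t, tuple):
        return (t[0],) + tuple(subst(a, sig) for a in t[1:])
    return t

def resolve(left, right, l_left, l_right):
    """One resolution step (Definition 2.1) on variable-disjoint premises:
    unify the atoms of the complementary resolved literals and instantiate
    the remaining literals of both premises."""
    assert l_left in left and l_right in right and l_left[0] != l_right[0]
    sig = unify(l_left[1], l_right[1], {})
    return None if sig is None else frozenset(
        subst(l, sig) for l in (left - {l_left}) | (right - {l_right}))

def factor(clause, lits):
    """One factoring step (Definition 2.2): unify the chosen literals and
    apply the unifier to the whole clause (duplicates then collapse)."""
    sig = {}
    for l in lits[1:]:
        sig = unify(lits[0][1], l[1], sig)
        assert sig is not None
    return frozenset(subst(l, sig) for l in clause)

# Replaying Example 2.1:
eta3 = frozenset({("-", ("Q", "x")), ("-", ("Q", "a"))})       # Q(x), Q(a) |-
eta3p = factor(eta3, [("-", ("Q", "x")), ("-", ("Q", "a"))])   # Q(a) |-
eta5 = resolve(frozenset({("-", ("P", "b"))}),                 # P(b) |-
               frozenset({("+", ("P", "b")), ("+", ("Q", "y"))}),
               ("-", ("P", "b")), ("+", ("P", "b")))           # |- Q(y)
psi = resolve(eta3p, eta5, ("-", ("Q", "a")), ("+", ("Q", "y")))
assert psi == frozenset()                                      # the empty clause
```

The factoring call computes σ = {x\a} and the final resolution computes σ_R = {y\a}, exactly as in the example.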
RecyclePivotsWithIntersection
This section explains
RecyclePivotsWithIntersection (RPI) [10], which aims to compress irregular propositional proofs. It can be seen as a simple but significant modification of the RP algorithm described in [2], from which it derives its name. Although in the worst case full regularization can increase the proof length exponentially [31], these algorithms show that many irregular proofs can have their length decreased if a careful partial regularization is performed.

We write ψ[η] to denote a proof-context ψ[ ] with a single placeholder replaced by the subproof η. We say that a proof of the form ψ[η ⊙_p ψ′[η′ ⊙_p η″]] is irregular.

Example 3.1.
Consider an irregular proof and assume, without loss of generality, that p ∈ η₁ and p ∈ η′, as in the proof of ψ below. The proof of ψ can be written as (η₁ ⊙_p (η₂ ⊙_r (η′ ⊙_p η″))), or (η₁ ⊙_p ψ′[(η′ ⊙_p η″)]), where ψ′[(η′ ⊙_p η″)] = (η₂ ⊙_r (η′ ⊙_p η″)) is the sub-proof of ¬p.

η₁: p      η₂: ¬r, ¬p      η′: p      η″: ¬p, r
(η′ ⊙_p η″): r
(η₂ ⊙_r (η′ ⊙_p η″)): ¬p
ψ: ⊥

Then, if η′ ⊙_p η″ is replaced by η″ within the proof-context ψ′[ ], the clause of η₁ ⊙_p ψ′[η″] subsumes the clause of η₁ ⊙_p ψ′[η′ ⊙_p η″], because even though the literal ¬p of η″ is propagated down, it gets resolved against the literal p of η₁ later on below in the proof. More precisely, even though it might be the case that ¬p ∈ ψ′[η″] while ¬p ∉ ψ′[η′ ⊙_p η″], it is necessarily the case that ¬p ∉ η₁ ⊙_p ψ′[η′ ⊙_p η″] and ¬p ∉ η₁ ⊙_p ψ′[η″]. In this case, the proof can be regularized as follows.

η₁: p      η₂: ¬r, ¬p      η″: ¬p, r
(η₂ ⊙_r η″): ¬p
ψ: ⊥

Although the remarks above suggest that it is safe to replace η′ ⊙_p η″ by η″ within the proof-context ψ′[ ], this is not always the case. If a node in ψ′[ ] has a child in ψ[ ], then the literal ¬p might be propagated down to the root of the proof, and hence the clause of ψ[η₁ ⊙_p ψ′[η″]] might not subsume the clause of ψ[η₁ ⊙_p ψ′[η′ ⊙_p η″]].
Therefore, it is only safe to do the replacement if the literal ¬p gets resolved in all paths from η″ to the root or if it already occurs in the root clause of the original proof ψ[η ⊙_p ψ′[η′ ⊙_p η″]].

These observations lead to the idea of traversing the proof in a bottom-up manner, storing for every node a set of safe literals that get resolved in all paths below it in the proof (or that already occurred in the root clause of the original proof). Moreover, if one of the node's resolved literals belongs to the set of safe literals, then it is possible to regularize the node by replacing it by one of its parents (cf. Algorithm 1).

Figure 2: An RPI example: (a) a propositional proof before compression by RPI; (b) the proof after compression by RPI.

The regularization of a node should replace a node by one of its parents, and more precisely by the parent whose clause contains the resolved literal that is safe. After regularization, all nodes below the regularized node may have to be fixed. However, since the regularization is done with a bottom-up traversal, and only nodes below the regularized node need to be fixed, it is again possible to postpone fixing and do it with only a single traversal afterwards. Therefore, instead of replacing the irregular node by one of its parents immediately, its other parent is marked as deletedNode, as shown in Algorithm 2. Only later, during fixing, is the irregular node actually replaced by its surviving parent (i.e. the parent that is not marked as deletedNode).

The set of safe literals of a node η can be computed from the set of safe literals of its children (cf. Algorithm 3). In the case when η has a single child ς, the safe literals of η are simply the safe literals of ς together with the resolved literal p of ς belonging to η (p is safe for η, because whenever p is propagated down the proof through η, p gets resolved in ς). It is important to note, however, that if ς has been marked as regularized, it will eventually be replaced by η, and hence p should not be added to the safe literals of η. In this case, the safe literals of η should be exactly the same as the safe literals of ς. When η has several children, the safe literals of η w.r.t. a child ς_i contain literals that are safe on all paths that go from η through ς_i to the root. For a literal to be safe for all paths from η to the root, it should therefore be in the intersection of the sets of safe literals w.r.t. each child.

The RP and the RPI algorithms differ from each other mainly in the computation of the safe literals of a node that has many children. While RPI returns the intersection, as shown in Algorithm 3, RP returns the empty set (cf. Algorithm 4). Additionally, while in RPI the safe literals of the root node contain all the literals of the root clause, in RP the root node's set of safe literals is initialized to the empty set (cf. Algorithm 4).

input: A proof ψ
output: A possibly less-irregular proof ψ′

ψ′ ← ψ;
traverse ψ′ bottom-up and foreach node η in ψ′ do
    if η is a resolvent node then
        setSafeLiterals(η);
        regularizeIfPossible(η)
ψ′ ← fix(ψ′);
return ψ′;

Algorithm 1: RPI

For a proof with n nodes, lines 5 to 10 are executed at most 2n times, and the algorithm remains linear. In our prototype implementation, the sets of safe literals are instances of Scala's mutable.HashSet class. Being mutable, new elements can be added efficiently. And being HashSets, membership checking is done in constant time in the average case, and set intersection (line 12) can be done in O(k·s), where k is the number of sets and s is the size of the smallest set.

Example 3.2.
When applied to the proof ψ shown in Figure 2a, the algorithm RPI assigns {a, c} and {a, ¬c} as the safe literals of the two children of one of the inner nodes, η. The safe literals of η w.r.t. these children are respectively {a, c, b} and {a, ¬c, b}, and hence the safe literals of η are {a, b} (their intersection). Since the right resolved literal of η (a) belongs to η's safe literals, η is correctly detected as a redundant node and hence regularized: η is replaced by its right parent. The resulting proof is shown in Figure 2b.

In this section, we describe the challenges that have to be overcome in order to successfully adapt RPI to the first-order case. The first example illustrates the need to take unification into account. The other two examples discuss complex issues that can arise when unification is taken into account in a naive way.
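Before turning to those challenges, the propositional baseline of Section 3 can be sketched concretely. The code below (our own minimal representation, not the paper's Scala implementation) computes safe literals and detects the redundant node on the irregular proof of Example 3.1; the proof is tree-shaped, so each inner node has a single child and the intersection of Algorithm 3 is trivial.

```python
# Safe-literal computation (in the spirit of Algorithm 3) and the
# redundancy check (Algorithm 2) on a tree-shaped propositional proof.
def neg(l):
    return l[1:] if l.startswith("~") else "~" + l

class Node:
    """An axiom (clause given) or a resolvent (left, right, pivot given);
    the pivot occurs in the left parent and its complement in the right."""
    def __init__(self, clause=None, left=None, right=None, pivot=None):
        self.left, self.right, self.pivot = left, right, pivot
        if clause is None:
            clause = (left.clause - {pivot}) | (right.clause - {neg(pivot)})
        self.clause = frozenset(clause)

def safe_literals(root):
    """Traverse from the root (the 'bottom' of the proof) toward the
    axioms: the root's safe literals are its own clause; a parent inherits
    its child's safe literals plus the resolved literal it contributes."""
    S = {id(root): set(root.clause)}
    def visit(n):
        for parent, contributed in ((n.left, n.pivot), (n.right, neg(n.pivot))):
            S[id(parent)] = S[id(n)] | {contributed}
            if parent.left is not None:
                visit(parent)
    if root.left is not None:
        visit(root)
    return S

def regularizable(n, S):
    """If one of n's resolved literals is safe, n can be replaced by the
    parent containing that literal ('left' or 'right')."""
    if n.left is None:
        return None
    if n.pivot in S[id(n)]:
        return "left"
    if neg(n.pivot) in S[id(n)]:
        return "right"
    return None

# The irregular proof of Example 3.1:
a1, a2 = Node({"p"}), Node({"~r", "~p"})
a3, a4 = Node({"p"}), Node({"~p", "r"})
n1 = Node(left=a3, right=a4, pivot="p")    # clause {r}
n2 = Node(left=a2, right=n1, pivot="~r")   # clause {~p}
root = Node(left=a1, right=n2, pivot="p")  # the empty clause

S = safe_literals(root)
# n1 resolves on p although ~p is safe below it: replace n1 by a4.
```

Algorithm 3's intersection over several children, and the deletedNode/fix pass of Algorithms 1 and 2, are omitted here; on DAG-shaped proofs the per-child sets must be intersected before the check.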
Example 4.1.
Consider the following proof ψ. When computed as in the propositional case, the safe literals for η₄ are {Q(c), P(a, x)}.

η₁: P(y, b) ⊢      η₂: ⊢ P(w, x)      η₃: P(w, x) ⊢ Q(c)      η₅: Q(c) ⊢ P(a, x)
η₄: ⊢ Q(c)   (from η₂ and η₃)
η₆: ⊢ P(a, x)   (from η₄ and η₅)
ψ: ⊥   (from η₆ and η₁)

As neither of η₄'s resolved literals is syntactically equal to a safe literal, the propositional RPI algorithm would not change ψ. However, η₄'s left resolved literal P(w, x) ∈ η₂ is unifiable with the safe literal P(a, x). Regularizing η₄, by deleting the edge between η₃ and η₄ and replacing η₄ by η₂, leads to the further deletion of η₅ (because it is not resolvable with η₂) and finally to the much shorter proof below.

η₂: ⊢ P(w, x)      η₁: P(y, b) ⊢
ψ′: ⊥

Unlike in the propositional case, where a resolved literal must be syntactically equal to a safe literal for regularization to be possible, the example above suggests that, in the first-order case, it might suffice that the resolved literal be unifiable with a safe literal. However, there are cases, as shown in the example below, where mere unifiability is not enough and greater care is needed.

input: A node η
output: nothing (but the proof containing η may be changed)

if η.rightResolvedLiteral ∈ S(η) then
    mark left parent of η as deletedNode;
    mark η as regularized
else if η.leftResolvedLiteral ∈ S(η) then
    mark right parent of η as deletedNode;
    mark η as regularized

Algorithm 2: regularizeIfPossible

input: A node η
output: nothing (but the node η gets a set of safe literals)

if η is a root node with no children then
    S(η) ← η.clause
else
    foreach η′ ∈ η.children do
        if η′ is marked as regularized then
            safeLiteralsFrom(η′) ← S(η′)
        else if η is left parent of η′ then
            safeLiteralsFrom(η′) ← S(η′) ∪ {η′.rightResolvedLiteral}
        else if η is right parent of η′ then
            safeLiteralsFrom(η′) ← S(η′) ∪ {η′.leftResolvedLiteral}
    S(η) ← ⋂_{η′ ∈ η.children} safeLiteralsFrom(η′)

Algorithm 3: setSafeLiterals

input: A node η
output: nothing (but the node η gets a set of safe literals)

if η is a root node with no children then
    S(η) ← ∅
else if η has only one child η′ then
    if η′ is marked as regularized then
        S(η) ← S(η′)
    else if η is left parent of η′ then
        S(η) ← S(η′) ∪ {η′.rightResolvedLiteral}
    else if η is right parent of η′ then
        S(η) ← S(η′) ∪ {η′.leftResolvedLiteral}
else
    S(η) ← ∅

Algorithm 4: setSafeLiterals for RP

η₁: Q(f(a, e), c) ⊢      η₂: ⊢ P(c, d)      η₃: P(u, v) ⊢ Q(f(a, v), u)      η₄: Q(f(a, x), y), Q(t, x) ⊢ Q(f(a, z), y)      η₆: ⊢ Q(r, s)
η₅: P(u, v), Q(t, v) ⊢ Q(f(a, z), u)   (from η₃ and η₄)
η₇: P(u, v) ⊢ Q(f(a, z), u)   (from η₅ and η₆)
η₈: ⊢ Q(f(a, z), c)   (from η₇ and η₂)
ψ: ⊥   (from η₈ and η₁)

Figure 3: An example where pre-regularizability is not sufficient.
Example 4.2.
The node η₄ below appears to be a candidate for regularization when the safe literals are computed as in the propositional case and unification is considered naïvely. Note that S(η₄) = {Q(c), P(a, x)}, and the resolved literal P(a, c) is unifiable with the safe literal P(a, x).

η₁: P(y, b) ⊢      η₂: ⊢ P(a, c)      η₃: P(a, c) ⊢ Q(c)      η₅: Q(c) ⊢ P(a, x)
η₄: ⊢ Q(c)   (from η₂ and η₃)
η₆: ⊢ P(a, x)   (from η₄ and η₅)
ψ: ⊥   (from η₆ and η₁)

However, if we attempt to regularize the proof, the same series of actions as in Example 4.1 would require resolution between η₂ and η₁, which is not possible.

One way to prevent the problem depicted above would be to require the resolved literal not only to be unifiable with, but to subsume, a safe literal. A weaker (and better) requirement is possible, and requires a slight modification of the concept of safe literals, taking into account the unifications that occur on the paths from a node to the root.
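The failure in Example 4.2 is precisely the gap between unifiability and matching: P(a, c) is unifiable with P(a, x), but P(a, c) does not match P(a, x), since matching may only instantiate the first literal's own variables. A minimal sketch (term representation ours: a term is a variable name in VARS, a constant name, or a tuple (symbol, args...)):

```python
# Unifiability (two-way) vs. matching (one-way) on first-order atoms.
VARS = {"x", "y", "z", "u", "v", "w"}

def walk(t, sig):
    while isinstance(t, str) and t in VARS and t in sig:
        t = sig[t]
    return t

def unify(t, s, sig):
    """Two-way: extend sig to a unifier of t and s, or return None.
    (Occurs-check omitted for brevity.)"""
    t, s = walk(t, sig), walk(s, sig)
    if t == s:
        return sig
    if isinstance(t, str) and t in VARS:
        return {**sig, t: s}
    if isinstance(s, str) and s in VARS:
        return {**sig, s: t}
    if isinstance(t, tuple) and isinstance(s, tuple) and t[0] == s[0] and len(t) == len(s):
        for a, b in zip(t[1:], s[1:]):
            sig = unify(a, b, sig)
            if sig is None:
                return None
        return sig
    return None

def match(t, s, sig):
    """One-way: extend sig so that t instantiated by sig equals s."""
    if isinstance(t, str) and t in VARS:
        if t in sig:
            return sig if sig[t] == s else None
        return {**sig, t: s}
    if isinstance(t, tuple) and isinstance(s, tuple) and t[0] == s[0] and len(t) == len(s):
        for a, b in zip(t[1:], s[1:]):
            sig = match(a, b, sig)
            if sig is None:
                return None
        return sig
    return sig if t == s else None

# P(a, c) is unifiable with P(a, x) but does not match it:
assert unify(("P", "a", "c"), ("P", "a", "x"), {}) == {"x": "c"}
assert match(("P", "a", "c"), ("P", "a", "x"), {}) is None
# whereas the resolved literal P(w, x) of Example 4.1 does match P(a, x):
assert match(("P", "w", "x"), ("P", "a", "x"), {}) is not None
```

The definitions that follow refine the safe-literal computation so that the relevant test becomes a matching test against instantiated safe literals.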
Definition 4.1.
The set of safe literals for a node η in a proof ψ with root clause Γ, denoted S(η), is such that ℓ ∈ S(η) if and only if ℓ ∈ Γ or, for all paths from η to the root of ψ, there is an edge v₁ →[ℓ′, σ] v₂ with ℓ′σ = ℓ.

As in the propositional case, safe literals can be computed in a bottom-up traversal of the proof. Initially, at the root, the safe literals are exactly the literals that occur in the root clause. As we go up, the safe literals S(η′) of a parent node η′ of η, where η′ →[ℓ, σ] η, are set to S(η) ∪ {ℓσ}. Note that we apply the substitution to the resolved literal before adding it to the set of safe literals (cf. Algorithm 3, lines 8 and 10). In other words, in the first-order case, the set of safe literals has to be a set of instantiated resolved literals.

In the case of Example 4.2, computing safe literals as defined above would result in S(η) = {Q(c), P(a, b)}, where clearly the pivot P(a, c) in η is not safe. A generalization of this requirement is formalized below.

Definition 4.2.
Let η be a node with safe literals S(η) and parents η₁ and η₂, and assume, without loss of generality, that η₁ →[{ℓ₁}, σ₁] η. The node η is said to be pre-regularizable in the proof ψ if ℓ₁σ₁ matches a safe literal ℓ* ∈ S(η).

This property states that a node is pre-regularizable if one of its instantiated resolved literals matches a safe literal. The notion of pre-regularizability can be thought of as a necessary condition for recycling the node η.

Example 4.3. Satisfying pre-regularizability is not sufficient. Consider the proof ψ in Figure 3. After collecting the safe literals, S(η) = {¬Q(r, v), ¬P(c, d), Q(f(a, e), c)} for the node η whose pivot is Q(f(a, v), u), and this pivot matches the safe literal Q(f(a, e), c). Attempting to regularize η would lead to the removal of the premise Q(f(a, x), y), Q(t, x) ⊢ Q(f(a, z), y), the replacement of η by P(u, v) ⊢ Q(f(a, v), u), and the removal of the node resolving against ⊢ Q(r, s) (because it no longer contains the required pivot), with that node also being replaced by its surviving parent. Then resolution of the remaining premises results in η′, which cannot be resolved with Q(f(a, e), c) ⊢, as shown below.

Q(f(a, e), c) ⊢      ⊢ P(c, d)      P(u, v) ⊢ Q(f(a, v), u)
η′: ⊢ Q(f(a, d), c)   (from the latter two)
ψ′: ??

The literal Q(f(a, v), u), which would have been resolved against Q(f(a, e), c), was changed to Q(f(a, d), c) by the resolution with ⊢ P(c, d). Thus we additionally require that the following condition be satisfied.
Definition 4.3.
Let η be pre-regularizable, with safe literals S(η) and parents η₁ and η₂, with clauses Γ₁ and Γ₂ respectively, assuming without loss of generality that η₁ →[{ℓ₁}, σ₁] η and that ℓ₁σ₁ matches a safe literal ℓ* ∈ S(η). The node η is said to be strongly regularizable in ψ if Γ₁σ₁ ⊑ S(η).

This condition ensures that the remainder of the proof does not expect a variable of the surviving parent η₁ to be unified to different values simultaneously. This property is not necessary in the propositional case, as the literals of the replacement node would not change lower in the proof. The notion of strong regularizability can be thought of as a sufficient condition.

Theorem 4.4.
Let ψ be a proof with root clause Γ and let η be a node in ψ. Let ψ† = ψ \ {η} and let Γ† be the root clause of ψ†. If η is strongly regularizable, then Γ† ⊑ Γ.

Proof. By definition of strong regularizability, η is such that there is a node η′ with clause Γ′ such that η′ →[{ℓ′}, σ′] η, ℓ′σ′ matches a safe literal ℓ* ∈ S(η), and Γ′σ′ ⊑ S(η). Firstly, in ψ†, η has been replaced by η′. Since Γ′σ′ ⊑ S(η), by definition of S(η), every literal ℓ in Γ′ either subsumes a single literal that occurs as a pivot on every path from η to the root in ψ (and hence on every new path from η′ to the root in ψ†) or subsumes literals ℓσ₁, ..., ℓσ_n in Γ. In the former case, ℓ is resolved away in the construction of ψ† (by contracting the descendants of ℓ with the pivots in each path). In the latter case, the literal ℓσ_k (1 ≤ k ≤ n) in Γ is a descendant of ℓ through a path k and the substitution σ_k is the composition of all substitutions on this path. When η is replaced by η′, two things may happen to ℓσ_k. If the path k does not go through η, ℓσ_k remains unchanged (i.e. ℓσ_k ∈ Γ† unless the path k ceases to exist in ψ†). If the path k goes through η, the literal is changed to ℓσ†_k, where σ†_k is such that σ_k = σ′σ†_k.

Secondly, when η is replaced by η′, the edge from η's other parent η″ to η ceases to exist in ψ†. Consequently, any literal ℓ in Γ that is a descendant of a literal ℓ″ in the clause of η″ through a path via η will not belong to Γ†.

Thirdly, a literal from Γ that descends neither from η′ nor from η″ either remains unchanged in Γ† or, if the path to the node from which it descends ceases to exist in the construction of ψ†, does not belong to Γ† at all.

Therefore, by the three facts above, Γ†σ′ ⊑ Γ, and hence Γ† ⊑ Γ.

As the name suggests, strong regularizability is stronger than necessary. In some cases, nodes may be regularizable even if they are not strongly regularizable. A weaker condition (conjectured to be sufficient) is presented below. This alternative relies on knowledge of how literals are changed after the deletion of a node in a proof (and it is inspired by the post-deletion unifiability condition described for FirstOrderLowerUnits in [12]). However, since weak regularizability is more complicated to check, it is not as suitable for implementation as strong regularizability.
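The strong regularizability test Γσ ⊑ S(η) is an ordinary clause-subsumption check: a single substitution must send every literal of the surviving parent's clause into the set of safe literals. A minimal backtracking sketch (term representation ours; a literal is a sign/atom pair), exercised on the subsumption checks implicit in Examples 4.1 and 4.5:

```python
# Clause subsumption by one-way matching with backtracking over the choice
# of target safe literal. A variable is a name in VARS; a literal is
# ("+", atom) or ("-", atom); matching also compares the signs.
VARS = {"x", "y", "z", "u", "v", "w"}

def match(t, s, sig):
    """Extend sig so that t instantiated by sig equals s, or return None."""
    if isinstance(t, str) and t in VARS:
        if t in sig:
            return sig if sig[t] == s else None
        return {**sig, t: s}
    if isinstance(t, tuple) and isinstance(s, tuple) and t[0] == s[0] and len(t) == len(s):
        for a, b in zip(t[1:], s[1:]):
            sig = match(a, b, sig)
            if sig is None:
                return None
        return sig
    return sig if t == s else None

def subsumed(clause, safe, sig=None):
    """True iff one substitution maps every literal of `clause` to some
    literal of `safe` (backtracking over the choice of target literal)."""
    sig = sig or {}
    if not clause:
        return True
    first, rest = clause[0], clause[1:]
    return any(
        (sig2 := match(first, s, sig)) is not None and subsumed(rest, safe, sig2)
        for s in safe
    )

# Example 4.1: the surviving parent's clause {P(w, x)} is subsumed by the
# safe literals {Q(c), P(a, x)}:
assert subsumed([("+", ("P", "w", "x"))],
                [("+", ("Q", "c")), ("+", ("P", "a", "x"))])
# Example 4.5 (below): {~P(x), ~Q(x), ~R(a)} is NOT subsumed by
# {~P(w), ~Q(z), ~R(a)}, because x cannot be sent to both w and z:
assert not subsumed([("-", ("P", "x")), ("-", ("Q", "x")), ("-", ("R", "a"))],
                    [("-", ("P", "w")), ("-", ("Q", "z")), ("-", ("R", "a"))])
```

The worst case of such a test is exponential in the clause length, but the clauses involved here are the single parent clause and the safe set, which are typically small.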
Definition 4.4.
Let η be a pre-regularizable node with parents η and η , assuming withoutloss of generality that η { (cid:96) } −−−→ σ η such that (cid:96) is unifiable with some (cid:96) ∗ ∈ S ( η ). For eachsafe literal (cid:96) = (cid:96) s σ s ∈ S ( η ), let η (cid:96) be a node on the path from η to the root of the proofsuch that | (cid:96) | is the pivot of η (cid:96) . Let R ( η (cid:96) ) be the set of all resolved literals (cid:96) (cid:48) s such that η (cid:48) { (cid:96) s } −−−→ σ s η (cid:96) , η (cid:48) { (cid:96) (cid:48) s } −−−→ σ (cid:48) s η (cid:96) , and (cid:96) s σ s = (cid:96) (cid:48) s σ (cid:48) s , for some nodes η (cid:48) and η (cid:48) and unifier σ (cid:48) s ; if no suchnode η (cid:96) exists, define R ( η (cid:96) ) = ∅ . The node η is said to be weakly regularizable in ψ if, for all (cid:96) ∈ S ( η ), all elements in R † ( η (cid:96) ) ∪ { (cid:96) † } are unifiable, where (cid:96) † is the literal in ψ \ { η } thatused to be (cid:96) in ψ and R † ( η (cid:96) ) is the set of literals in ψ \ { η } that used to be the literals of R ( η (cid:96) ) in ψ .This condition requires the ability to determine the underlying (uninstantiated) literalfor each safe literal of a weakly regularizable node η . To achieve this, one could store safeliterals as a pair ( (cid:96) s , σ s ), rather than as an instantiated literal (cid:96) s σ s , although this is notnecessary for the previous conditions.Note further that there is always at least one node η (cid:96) as assumed in the definition forany safe literal which was not contained in the root clause of the proof: the node whichresulted in (cid:96) = (cid:96) s σ s ∈ S ( η ) being a safe literal for the path from η to the root of the proof.Furthermore, it does not matter which node η (cid:96) is used. To see this, consider some node η (cid:48) (cid:96) (cid:54) = η (cid:96) with the same pivot | (cid:96) | = | (cid:96) s σ s | . 
Consider arbitrary nodes η₁ and η₂ such that η₁ −[ℓ_s, σ_s]→ η_ℓ and η₂ −[ℓ, σ]→ η_ℓ, where ℓ_s σ_s is the dual of ℓσ. Now consider arbitrary nodes η′₁ and η′₂ such that η′₁ −[ℓ_s, σ_s]→ η′_ℓ and η′₂ −[ℓ′, σ′]→ η′_ℓ, where ℓ_s σ_s is the dual of ℓ′σ′. Since the pivots of η_ℓ and η′_ℓ are equal, we must have |ℓ_s σ_s| = |ℓσ| and |ℓ_s σ_s| = |ℓ′σ′|, and thus |ℓσ| = |ℓ′σ′|. This shows that it does not matter which η_ℓ we use: the instantiated resolved literals will always be equal, implying that both of the resolved literals ℓ and ℓ′ will be contained in both R(η_ℓ) and R(η′_ℓ).

Informally, a node η is weakly regularizable in a proof if it can be replaced by one of its parents η₁ such that, for each ℓ ∈ S(η), |ℓ| can still be used as a pivot in order to complete the proof. Weakly regularizable nodes differ from strongly regularizable nodes in that the entire parent η₁ replacing the resolution η is not required to be simultaneously matched to a subset of S(η); checking weak regularizability, however, requires knowledge of how literals will be instantiated after the removal of η and η₂ from the proof.

Example 4.5.
This example illustrates a case where a node is weakly regularizable but not strongly regularizable. Table 1 shows the sets S(η), R(η) and R†(η) for the nodes η in the proof below. Observe that η₆ is pre-regularizable, since ¬P(x) is unifiable with ¬P(w) ∈ S(η₆). In fact, η₆ is the only pre-regularizable node in the proof, and thus R(η) = ∅ for all η ≠ η₆. (Because of the removal of η, ℓ† may differ from ℓ.)

    Node | S(η)                         | R(η)         | R†(η)
    η₁   | {P(w)}                       | ∅            | ∅
    η₂   | {¬P(w)}                      | ∅            | ∅
    η₃   | {R(a), ¬P(w)}                | ∅            | ∅
    η₄   | {¬R(a), ¬P(w)}               | ∅            | ∅
    η₅   | {Q(z), ¬R(a), ¬P(w)}         | ∅            | ∅
    η₆   | {¬P(w), ¬Q(z), ¬R(a)}        | {P(u), P(y)} | {P(u)}
    η₇   | {P(y), ¬P(w), ¬Q(z), ¬R(a)}  | ∅            | ∅
    η₈   | {¬P(y), ¬P(w), ¬Q(z), ¬R(a)} | ∅            | ∅

    Table 1: The sets S(η) and R(η) for each node η in the first proof of Example 4.5.

Note that η₆ is not strongly regularizable: there is no unifier σ such that {¬P(x), ¬Q(x), ¬R(x)}σ ⊆ S(η₆). The proof's nodes carry the following clauses (the tree structure of the derivation is not reproduced here):

    ⊢ P(u)    P(z) ⊢ Q(z)    P(x), Q(x), R(a) ⊢    ⊢ P(y)    Q(y), R(a) ⊢
    P(z), R(a) ⊢    ⊢ R(a)    P(z) ⊢    ψ: ⊥

We show that η₆ is weakly regularizable and can therefore be removed. Recalling that η₆ is pre-regularizable, observe that R†(η₆) ∪ {¬P(w)} is unifiable.
Consider the following proof of ψ \ {η₆}, whose nodes carry the clauses:

    ⊢ P(u)    P(x), Q(x), R(a) ⊢    P(z) ⊢ Q(z)    η′: P(z), P(z), R(a) ⊢
    P(z), R(a) ⊢    ⊢ R(a)    P(z) ⊢    ψ: ⊥

Now observe that for each safe literal ℓ we have the following, showing that η₆ is weakly regularizable:

• ℓ = ¬Q(y): ℓ† = ¬Q(x), which is unifiable with Q(z)
• ℓ = ¬R(a): ℓ† = ¬R(a), which is (trivially) unifiable with R(a)
• ℓ = ¬P(w): ℓ† = ¬P(z), which is unifiable with P(u)
• ℓ = ¬P(y): ℓ† = ¬P(z), which is unifiable with P(u)

If a node η with parents η₁ and η₂ is pre-regularizable and strongly regularizable in ψ, then η is also weakly regularizable in ψ.

FirstOrderRecyclePivotsWithIntersection (FORPI) (cf. Algorithm 5) is a first-order generalization of the propositional
RPI. FORPI traverses the proof in a bottom-up manner, storing for every node a set of safe literals. The set of safe literals for a node ψ is computed from the sets of safe literals of its children (cf. Algorithm 7), similarly to the propositional case, but additionally applying unifiers to the resolved literals (cf. Example 4.2). If one of the node's resolved literals matches a literal in the set of safe literals, then it may be possible to regularize the node by replacing it by one of its parents. In the first-order case, we additionally check for strong regularizability (cf. lines 2 and 6 of Algorithm 6). Similarly to RPI, instead of replacing the irregular node by one of its parents immediately, its other parent is marked as a deletedNode, as shown in Algorithm 6. As in the propositional case, fixing of the proof is postponed to another (single) traversal, as regularization proceeds top-down and only nodes below a regularized node may require fixing. During fixing, the irregular node is actually replaced by the parent that is not marked as deletedNode. During proof fixing, factoring inferences can be applied, in order to compress the proof further.

    input : a first-order proof ψ
    output: a possibly less-irregular first-order proof ψ′
    ψ′ ← ψ;
    traverse ψ′ bottom-up and foreach node η in ψ′ do
        if η is a resolvent node then
            setSafeLiterals(η);
            regularizeIfPossible(η)
    ψ′ ← fix(ψ′);
    return ψ′;

    Algorithm 5: FORPI

    input : a node ψ = ψ_L ⊙ ψ_R, with resolved literals ℓ_L, ℓ_R and unifiers σ_L, σ_R
    output: nothing (but the proof containing ψ may be changed)
    1: if ∃σ and ℓ ∈ S(ψ) such that ℓ = ℓ_R σ_R σ then
    2:     if ψ_R σ_R σ ⊆ S(ψ) then
    3:         mark ψ_L as deletedNode;
    4:         mark ψ as regularized
    5: else if ∃σ and ℓ ∈ S(ψ) such that ℓ = ℓ_L σ_L σ then
    6:     if ψ_L σ_L σ ⊆ S(ψ) then
    7:         mark ψ_R as deletedNode;
    8:         mark ψ as regularized

    Algorithm 6: regularizeIfPossible for FORPI

Note that, in order to reduce notation clutter in the pseudocode, we slightly abuse notation and do not explicitly distinguish proofs, their root nodes and the clauses stored in their root nodes. It is clear from the context whether ψ refers to a proof, to its root node or to its root clause. A prototype version of
FORPI has been implemented in the functional programming language Scala as part of the Skeptik library. This library also includes an implementation of GFOLU [12]. In order to evaluate the algorithm's effectiveness, FORPI was tested on two data sets: proofs generated by a real theorem prover and randomly generated resolution proofs. The proofs are included in the source code repository, available at https://github.com/jgorzny/Skeptik.

    input : a first-order resolution node ψ
    output: nothing (but the node ψ gets a set of safe literals)
    if ψ is a root node with no children then
        S(ψ) ← ψ.clause
    else
        foreach ψ′ ∈ ψ.children do
            if ψ′ is marked as regularized then
                safeLiteralsFrom(ψ′) ← S(ψ′)
            else if ψ′ = ψ ⊙ ψ_R for some ψ_R then
                safeLiteralsFrom(ψ′) ← S(ψ′) ∪ {ℓ_R σ_R}
            else if ψ′ = ψ_L ⊙ ψ for some ψ_L then
                safeLiteralsFrom(ψ′) ← S(ψ′) ∪ {ℓ_L σ_L}
        S(ψ) ← ⋂_{ψ′ ∈ ψ.children} safeLiteralsFrom(ψ′)

    Algorithm 7: setSafeLiterals for FORPI

Note that by implementing the algorithms in this library, we have a relative guarantee that the compressed proofs are correct, as in Skeptik every inference rule (e.g. resolution, factoring) is implemented as a small class (each at most 178 lines of code that is assumed correct) with a constructor that checks whether the conditions for the application of the rule are met, thereby preventing the creation of objects representing incorrect proof nodes (i.e. unsound inferences). We only need to check that the root clause of the compressed proof is equal to or stronger than the root clause of the input proof, and that the set of axioms used in the compressed proof is a (possibly non-proper) subset of the set of axioms used in the input proof.

First,
FORPI was evaluated on the same proofs used to evaluate GFOLU. These proofs were generated by executing the SPASS theorem prover on 1032 real-world unsatisfiable first-order problems without equality from the TPTP Problem Library [29]. In order to generate pure resolution proofs, the advanced inference rules of SPASS were disabled: the only enabled inference rules were "Standard Resolution" and "Condensation". The proofs were originally generated on the Euler Cluster at the University of Victoria with a time limit of 300 seconds per problem. Under these conditions, SPASS was able to generate 308 proofs. The proofs generated by SPASS were small: proof lengths varied from 3 to 49, and the number of resolutions in a proof ranged from 1 to 32.

In order to test FORPI's effectiveness on larger proofs, a total of 2280 proofs were randomly generated and then used as a second benchmark set. The randomly generated proofs were much larger than those of the first data set: proof lengths varied from 95 to 700, while the number of resolutions in a proof ranged from 48 to 368.
Additional proofs were generated by the following procedure: start with a root node whose conclusion is ⊥, and make two premises η₁ and η₂ using a randomly generated literal such that the desired conclusion is the result of resolving η₁ and η₂. For each node η_i, determine the inference rule used to make its conclusion: with a fixed probability, η_i is the result of a resolution; otherwise, it is the result of factoring.

    Algorithm       | Proofs compressed                         | Nodes removed
                    | TPTP       | Random       | Total         | TPTP       | Random         | Total
    GFOLU(p)        | 55 (17.9%) | 817 (35.9%)  | 872 (33.7%)   | 107 (4.8%) | 17,769 (4.5%)  | 17,876 (4.5%)
    FORPI(p)        | 23 (7.5%)  | 666 (29.2%)  | 689 (26.2%)   | 36 (1.6%)  | 28,904 (7.3%)  | 28,940 (7.3%)
    GFOLU(FORPI(p)) | 55 (17.9%) | 1303 (57.1%) | 1358 (52.5%)  | 120 (5.4%) | 48,126 (12.2%) | 48,246 (12.2%)
    FORPI(GFOLU(p)) | 23 (7.5%)  | 1302 (57.1%) | 1325 (51.2%)  | 120 (5.4%) | 48,434 (12.3%) | 48,554 (12.3%)
    Best            | 59 (19.2%) | 1303 (57.1%) | 1362 (52.5%)  | 120 (5.4%) | 55,530 (14.1%) | 55,650 (14.0%)

    Table 2: Number of proofs compressed and number of overall nodes removed

Literals are generated by uniformly choosing a number from {1, . . . , k, k + 1}, where k is the number of predicates generated so far; if the chosen number j is between 1 and k, the j-th predicate is used; otherwise, if the chosen number is k + 1, a new predicate with a new random arity (at most four) is generated and used. Each argument is a constant with a fixed probability, and a variable otherwise. If η should be the result of a resolution, then with a fixed probability we generate a left parent η_ℓ and a right parent η_r for η (i.e. η = η_ℓ ⊙ η_r) having a common parent η_c (i.e. η_ℓ = (η_ℓ)_ℓ ⊙ η_c and η_r = η_c ⊙ (η_r)_r, for some newly generated nodes (η_ℓ)_ℓ and (η_r)_r). The common parent ensures that non-tree-like DAG proofs are also generated.

This procedure is recursively applied to the generated parent nodes. Each parent of a resolution has each of its terms not contained in the pivot replaced by a fresh variable with probability p = 0.
7. At each recursive call, the additional minimum height required for the remainder of the branch is decreased by one with probability p = 0.5. Thus, if each branch always decreases the additional required height, the proof has height equal to the initial minimum value. The process stops when every branch is required to add a subproof of height zero or after a timeout is reached. In any case, the topmost generated node of each branch is generated as an axiom node.

The minimum height was set to 7 (which is the minimum number of nodes in an irregular proof plus one) and the timeout was set to 300 seconds (the same timeout allowed for SPASS). The probability values used in the random generation were carefully chosen to produce random proofs similar in shape to the real proofs obtained by SPASS. For instance, the probability of a new node being a resolution (respectively, factoring) is approximately the same as the frequency of resolutions (respectively, factorings) observed in the real proofs produced by SPASS.

For consistency, the same system and metrics were used. Proof compression and proof generation were performed on a laptop (2.8 GHz Intel Core i7 processor with 4 GB of RAM (1333 MHz DDR3) available to the Java Virtual Machine). For each proof ψ, we measured the time needed to compress the proof (t(ψ)) and the compression ratio ((|ψ| − |α(ψ)|)/|ψ|), where |ψ| is the number of resolutions in the proof, and α(ψ) is the result of applying a compression algorithm or some composition of FORPI and
GFOLU. Note that we consider only the number of resolutions in order to compare the results of these algorithms to their propositional variants (where factoring is implicit). Moreover, factoring could be made implicit within resolution inferences even in the first-order case, and we use explicit factoring only for technical convenience.

Table 2 summarizes the results of FORPI and its combinations with GFOLU. The first set of columns describes the percentage of proofs that were compressed by each compression algorithm. The algorithm 'Best' runs both combinations of GFOLU and FORPI and returns the shortest proof output by either of them. The total number of proofs is 308 + 2280 = 2588, and the total number of resolution nodes is 2,249 + 393,883 = 396,132. The percentages of removed nodes are computed as (Σ_{ψ∈Ψ} |ψ| − Σ_{ψ∈Ψ} |α(ψ)|) / (Σ_{ψ∈Ψ} |ψ|) for each data set Ψ (TPTP, Random, or Both).

    First-Order Compression | All   | Compressed Only || Propositional Compression [4]
    GFOLU(p)                | 4.5%  | 13.5%           || LU(p)      | 7.5%
    FORPI(p)                | 6.2%  | 23.2%           || RPI(p)     | 17.8%
    GFOLU(FORPI(p))         | 10.6% | 23.0%           || LU(RPI(p)) | 21.7%
    FORPI(GFOLU(p))         | 11.1% | 21.5%           || RPI(LU(p)) | 22.0%
    Best                    | 12.6% | 24.4%           || Best       | 22.0%

    Table 3: Mean compression results

The use of FORPI alongside
GFOLU allows at least an additional 17.5% of proofs to be compressed. Furthermore, the use of both algorithms removes almost twice as many nodes as any single algorithm.

Table 3 compares the results of
FORPI and its combinations with
GFOLU with their propositional variants as evaluated in [4]. The first column describes the mean compression ratio for each algorithm including proofs that were not compressed by the algorithm, while the second column calculates the mean compression ratio considering only compressed proofs. It is unsurprising that the first column is lower than the propositional mean for each algorithm: there are stricter requirements to apply these algorithms to first-order proofs. In particular, additional properties must be satisfied before a unit can be lowered, or before a pivot can be recycled. On the other hand, when first-order proofs are compressed, the compression ratios are on par with or better than their propositional counterparts.

Figure 4 (a) shows the number of proofs (compressed and uncompressed) per grouping based on number of resolutions in the proof. The red (resp. dark grey) data shows the number of compressed (resp. uncompressed) proofs for the TPTP data set, while the green (resp. light grey) data shows the number of compressed (resp. uncompressed) proofs for the random proofs. The number of proofs in each group is the sum of the heights of each coloured bar in that group. The overall percentage of proofs compressed in a group is indicated on each bar. Dark colors indicate the number of proofs compressed by
FORPI, GFOLU, and both compositions of these algorithms; light colors indicate cases where
FORPI succeeded, but at least one of
GFOLU or a combination of these algorithms achieved zero compression. Given the size of the TPTP proofs, it is unsurprising that few are compressed: small proofs are a priori less likely to contain irregularities. On the other hand, at least 43% of the randomly generated proofs in each size group could be compressed.

Figure 4 (b) is a scatter plot comparing the number of resolutions of the input proof against the number of resolutions in the compressed proof for each algorithm. The results on the TPTP data are magnified in the sub-plot. For the randomly generated proofs (points outside of the sub-plot), it is often the case that the compressed proof is significantly shorter than the input proof. Interestingly,
GFOLU appears to reduce the number of resolutions by a linear factor in many cases. This is likely due to a linear growth in the number of non-interacting irregularities (i.e. irregularities for which the lowered units share no common literals with any other sub-proofs), which leads to a linear number of nodes removed.

[Figure 4: GFOLU & FORPI Combination Results. (a) Number of (non-)compressed proofs; (b) compressed length against input length; (c) FORPI(GFOLU(p)) vs. GFOLU(FORPI(p)); (d) cumulative proof compression.]

Figure 4 (c) is a scatter plot comparing the size of compression obtained by applying
FORPI before
GFOLU versus
GFOLU before
FORPI. Data obtained from the TPTP data set is marked in red; the remaining points are obtained from randomly generated proofs. Points that lie on the diagonal line have the same size after each combination. There are 249 points beneath the line and 326 points above the line. Therefore, as in the propositional case [10], it is not a priori clear which combination will compress a proof more. Nevertheless, the distinctly greater number of points above the line suggests that it is more often the case that
FORPI should be applied after
GFOLU. Not only is this combination more likely to achieve compression, but the achieved compression also tends to be larger.

Figure 4 (d) shows a plot comparing the difference between the cumulative number of resolutions of the first x input proofs and the cumulative number of resolutions in the first x proofs after compression (i.e. the cumulative number of removed resolutions). The TPTP data is displayed in the sub-plot; note that the lines for everything except FORPI largely overlap (since the values are almost identical; cf. Table 2). Observe that it is always better to use both algorithms than to use a single algorithm. The data also shows that using
FORPI after
GFOLU is normally the preferred order of composition, as it typically results in a greater number of nodes removed than the other combination. An even better approach is to try both combinations and choose the best result (as shown in the 'Best' curve).
SPASS required approximately 40 minutes of CPU time (running on a cluster) to generate all the 308 TPTP proofs. The total time to apply both
FORPI and
GFOLU on all these proofs was just over 8 seconds on a simple laptop computer. The random proofs were generated in 70 minutes, and took approximately 461 seconds (or 7.5 minutes) to compress, both measured on the same computer. All times include parsing time. These compression algorithms continue to be very fast in the first-order case, and may simplify the proof considerably for a relatively small cost in time.
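The aggregate figures reported in the evaluation can be cross-checked with a few lines of arithmetic (all numbers taken from Table 2 and the surrounding text):

```python
# Cross-check of the aggregate counts reported in Table 2 and the text.
tptp_proofs, random_proofs = 308, 2280
tptp_resolutions, random_resolutions = 2249, 393883

total_proofs = tptp_proofs + random_proofs                  # 2588
total_resolutions = tptp_resolutions + random_resolutions   # 396132

# GFOLU(p): 872 of 2588 proofs compressed, 17876 resolution nodes removed.
gfolu_compressed = round(100 * 872 / total_proofs, 1)       # 33.7 (%)
gfolu_removed = round(100 * 17876 / total_resolutions, 1)   # 4.5 (%)

# 'Best': 55650 resolution nodes removed overall.
best_removed = round(100 * 55650 / total_resolutions, 1)    # 14.0 (%)

print(total_proofs, total_resolutions,
      gfolu_compressed, gfolu_removed, best_removed)
```

This reproduces the totals 2588 and 396,132 and the percentages 33.7%, 4.5% and 14.0% stated in Table 2.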
The main contribution of this paper is the lifting of the propositional proof compression algorithm
RPI to the first-order case. As indicated in Section 4, the generalization is challenging, because unification instantiates literals and, consequently, a node may be regularizable even if its resolved literals are not syntactically equal to any safe literal. Therefore, unification must be taken into account when collecting safe literals and marking nodes for deletion.

We first evaluated the algorithm on all 308 real proofs that the
SPASS theorem prover (with only standard resolution enabled) was capable of generating when executed on unsatisfiable TPTP problems without equality. Although the compression achieved by the first-order
FORPI algorithm was not as good as the compression achieved by the propositional
RPI algorithm on real proofs generated by SAT and SMT solvers [10], this is due to the fact that the 308 proofs were too short (less than 32 resolutions) to contain a significant amount of irregularities. In contrast, the propositional proofs used in the evaluation of the propositional
RPI algorithm had thousands (and sometimes hundreds of thousands) of resolutions.

Our second evaluation used larger, but randomly generated, proofs. The compression achieved by
FORPI in a short amount of time on this data set was compatible with our expectations and previous experience at the propositional level. The obtained results indicate that
FORPI is a promising compression technique to be reconsidered when first-order theorem provers become capable of producing larger proofs. Although we carefully selected generation probabilities in accordance with frequencies observed in real proofs, it is important to note that randomly generated proofs may still differ from real proofs in shape and may be more or less likely to contain irregularities exploitable by our algorithm. Resolution restrictions and refinements (e.g. ordered resolution [16, 33], hyper-resolution [19, 24], unit-resulting resolution [17, 18]) may result in longer chains of resolutions and, therefore, in proofs with a possibly larger height-to-length ratio. As the number of irregularities increases with height, such proofs could have a higher number of irregularities in relation to length.

In this paper, for the sake of simplicity, we considered a pure resolution calculus without restrictions, refinements or extensions. However, in practice, theorem provers do use restrictions and extensions. It is conceptually easy to adapt the algorithm described here to many variations of resolution. For instance, restricted forms of resolution (e.g. ordered resolution, hyper-resolution, unit-resulting resolution) can simply be regarded as (chains of) unrestricted resolutions for the purpose of proof compression. The compression process would break the chains and change the structure of the proof, but the compressed proof would still be a correct unrestricted resolution proof, albeit not necessarily satisfying the restrictions that the input proof satisfied. In the case of extensions for equality reasoning using paramodulation-like inferences, it might be necessary to apply the paramodulations to the corresponding safe literals. Alternatively, equality inferences could be replaced by resolutions with instances of equality axioms, and the proof compression algorithm could be applied to the proof resulting from this replacement.
Another common extension of resolution is the splitting technique [34]. When splitting is used, each split sub-problem is solved by a separate refutation, and the compression algorithm described here could be applied to each refutation independently.
References

[1] H. Amjad. Compressing propositional refutations. Electronic Notes in Theoretical Computer Science, 185:3-15, 2007.
[2] O. Bar-Ilan, O. Fuhrmann, S. Hoory, O. Shacham, and O. Strichman. Linear-time reductions of resolution proofs. In Haifa Verification Conference, LNCS, pages 114-128. Springer, 2008.
[3] P. Baumgartner, J. Bax, and U. Waldmann. Beagle - a hierarchic superposition theorem prover. In Felty and Middeldorp [8], pages 367-377.
[4] J. Boudou and B. Woltzenlogel Paleo. Compression of propositional resolution proofs by lowering subproofs. In Galmiche and Larchey-Wendling [11], pages 59-73.
[5] E. M. Clarke and A. Voronkov, editors. Logic for Programming, Artificial Intelligence, and Reasoning - 16th International Conference, Dakar, Senegal, Revised Selected Papers, LNCS. Springer, 2010.
[6] S. Cotton. Two techniques for minimizing resolution proofs. In O. Strichman and S. Szeider, editors, SAT 2010, LNCS, pages 306-312. Springer, 2010.
[7] S. Cruanes. Extending superposition with integer arithmetic, structural induction, and beyond. PhD thesis, École polytechnique, 2015.
[8] A. P. Felty and A. Middeldorp, editors. Automated Deduction - CADE-25 - 25th International Conference on Automated Deduction, Berlin, Germany, August 1-7, 2015, Proceedings, volume 9195 of Lecture Notes in Computer Science. Springer, 2015.
[9] P. Fontaine, S. Merz, and B. Woltzenlogel Paleo. Exploring and exploiting algebraic and graphical properties of resolution. In , 2010.
[10] P. Fontaine, S. Merz, and B. Woltzenlogel Paleo. Compression of propositional resolution proofs via partial regularization. In Automated Deduction - CADE-23 - 23rd International Conference on Automated Deduction, Wroclaw, Poland, July 31 - August 5, 2011. Proceedings, volume 6803 of LNCS, pages 237-251. Springer, 2011.
11] D. Galmiche and D. Larchey-Wendling, editors.
Automated Reasoning with Analytic Tableaux andRelated Methods - 22th International Conference, TABLEAUX 2013, Nancy, France, September 16-19, 2013. Proceedings , volume 8123 of
Lecture Notes in Computer Science . Springer, 2013.[12] J. Gorzny and B. Woltzenlogel Paleo. Towards the compression of first-order resolution proofs bylowering unit clauses. In Felty and Middeldorp [8], pages 356–366.[13] S. Hetzl, A. Leitsch, G. Reis, and D. Weller. Algorithmic introduction of quantified cuts.
TheoreticalComputer Science , 549:1–16, 2014.[14] S. Hetzl, A. Leitsch, D. Weller, and B. Woltzenlogel Paleo. Herbrand sequent extraction. In
IntelligentComputer Mathematics, 9th Int. Conference, AISC 2008, 15th Symposium, Calculemus 2008, 7thInt. Conference, MKM 2008, Birmingham, UK, July 28 - August 1, 2008. Proceedings , LNCS, pages462–477. Springer, 2008.[15] S. Hetzl, T. Libal, M. Riener, and M. Rukhaia. Understanding resolution proofs through herbrand’stheorem. In Galmiche and Larchey-Wendling [11], pages 157–171.[16] J. Hsiang and M. Rusinowitch. Proving refutational completeness of theorem-proving strategies: thetransfinite semantic tree method.
J. ACM , 38(3):558–586, 1991.[17] J. McCharen, R. Overbeek, and L. Wos. Complexity and related enhancements for automated theorem-proving programs.
Computers and Mathematics with Applications , 2:1–16, 1976.[18] W. McCune. Prover9 and mace4. , 2005–2010.[19] R. A. Overbeek. An implementation of hyper-resolution.
Computers & Mathematics with Applications ,1(2):201 – 214, 1975.[20] B. Woltzenlogel Paleo. Atomic cut introduction by resolution: Proof structuring and compression. InClarke and Voronkov [5], pages 463–480.[21] V. Prevosto and U. Waldmann. SPASS+T. In G. Sutcliffe, R. Schmidt, and S. Schulz, editors,
ESCoR ,CEUR Workshop Proceedings, pages 18–33, 2006.[22] G. Reis. Importing SMT and connection proofs as expansion trees. In Cezary Kaliszyk and AndreiPaskevich, editors,
Proceedings Fourth Workshop on Proof eXchange for Theorem Proving, PxTP2015, Berlin, Germany, August 2-3, 2015. , volume 186 of
EPTCS , pages 3–10, 2015.[23] A. Riazanov and A. Voronkov. The design and implementation of vampire.
AI Commun. , (2-3):91–110,2002.[24] J. A. Robinson. Automatic deduction with hyper-resolution.
International Journal of Computing andMathematics , 1:227–234, 1965.[25] S. F. Rollini, R. Bruttomesso, and N. Sharygina. An efficient and flexible approach to resolution proofreduction. In
Hardware and Software: Verification and Testing , LNCS, pages 182–196. Springer, 2011.[26] S. Schulz. System description: E 1.8. In K. L. McMillan, A. Middeldorp, and A. Voronkov, editors,
Logic for Programming, Artificial Intelligence, and Reasoning - 19th International Conference, LPAR-19, Stellenbosch, South Africa, December 14-19, 2013. Proceedings , volume 8312 of
Lecture Notes inComputer Science , pages 735–743. Springer, 2013.[27] S. Schulz and G. Sutcliffe. Proof generation for saturating first-order theorem provers. In D. Delahayeand B. Woltzenlogel Paleo, editors,
All about Proofs, Proofs for All , volume 55 of
Mathematical Logicand Foundations . College Publications, London, UK, 2015.[28] C. Sinz. Compressing propositional proofs by common subproof extraction. In R. Moreno-D´ıaz, F. Pich-ler, and A. Quesada-Arencibia, editors,
EUROCAST , LNCS, pages 547–555. Springer, 2007.[29] G. Sutcliffe. The TPTP Problem Library and Associated Infrastructure: The FOF and CNF Parts,v3.5.0.
Journal of Automated Reasoning , 43(4):337–362, 2009.[30] R. Thiele. Hilbert’s twenty-fourth problem.
The American Mathematical Monthly , 110(1):1–24, 2003.[31] G. S. Tseitin. On the complexity of derivation in propositional calculus. In J. Siekmann and G. Wright-son, editors,
Automation of Reasoning: Classical Papers in Computational Logic 1967-1970 . Springer-Verlag, 1983.[32] J. Vyskocil, D. Stanovsk´y, and J. Urban. Automated proof compression by invention of new definitions.In Clarke and Voronkov [5], pages 447–462.[33] U. Waldmann. Ordered resolution. In B. Woltzenlogel Paleo, editor,
Towards an Encyclopaedia ofProof Systems , pages 12–12. College Publications, London, UK, 1 edition, 1 2017.
34] C. Weidenbach. Combining superposition, sorts and splitting. In J. A. Robinson and A. Voronkov,editors,
Handbook of Automated Reasoning (in 2 volumes) , pages 1965–2013. Elsevier and MIT Press,2001.[35] C. Weidenbach, D. Dimova, A. Fietzke, R. Kumar, M. Suda, and P. Wischnewski. SPASS version3.5. In R. A. Schmidt, editor,
Automated Deduction - CADE-22, 22nd International Conference onAutomated Deduction, Montreal, Canada, August 2-7, 2009. Proceedings , volume 5663 of
Lecture Notesin Computer Science , pages 140–145. Springer, 2009.[36] B. Woltzenlogel Paleo. Herbrand sequent extraction. M.sc. thesis, Technische Universit¨at Dresden;Technische Universit¨at Wien, Dresden, Germany; Wien, Austria, 07 2007.[37] B. Woltzenlogel Paleo.
Herbrand Sequent Extraction [M.Sc. Thesis] . VDM-Verlag, Saarbr¨ucken, Ger-many, 2008.. VDM-Verlag, Saarbr¨ucken, Ger-many, 2008.