Dependencies in Formal Mathematics: Applications and Extraction for Coq and Mizar
Jesse Alama, Lionel Mamane, and Josef Urban
New University of Lisbon · Bereldange, Luxembourg · Radboud University Nijmegen
Abstract.
Two methods for extracting detailed formal dependencies from the Coq and Mizar systems are presented and compared. The methods are used for dependency extraction from two large mathematical repositories: the Coq Repository at Nijmegen and the Mizar Mathematical Library. Several applications of the detailed dependency analysis are described and proposed. Motivated by the different applications, we discuss the various kinds of dependencies that we are interested in, and the suitability of various dependency extraction methods.
1 Introduction

This paper presents two methods for extracting detailed formal dependencies from two state-of-the-art interactive theorem provers (ITPs) for mathematics: the Coq system and the Mizar system. Our motivation for dependency extraction is application-driven. We are interested in using detailed dependencies for fast refactoring of large mathematical libraries and wikis, for AI methods in automated reasoning that learn from previous proofs, for improved interactive editing of formal mathematics, and for foundational research over formal mathematical libraries.

These applications require different notions of formal dependency. We discuss these different requirements, and as a result provide implementations that differ in several important aspects from previous methods. For Mizar, the developed method captures practically all dependencies needed for successful re-verification of a particular formal text (i.e., also notational dependencies, automations used, etc.), and the method tries hard to determine the minimal set of such dependencies. For Coq, the method goes farther towards re-verification of formal texts than previous methods [5,13,4] that relied solely on the final proof terms. For example, we can already track Coq dependencies that appear during tactic interpretation but do not end up being used in the final proof term.

The paper is organized as follows. Section 2 briefly discusses the notion of formal dependency. Section 3 describes the implementation of dependency extraction in the Coq system, and Section 4 describes the implementation in the Mizar system. Section 5 compares the two implemented approaches to dependency computation. Section 6 describes several experiments and measurements conducted using our implementations on the CoRN and MML libraries, including training of AI/ATP proof assistance systems on the data, and estimating the speed-up for collaborative large-library developments. Section 8 concludes.
2 Formal Dependencies

Generally, we say that a definition or a theorem T depends on some definition, lemma or other theorem T′ (or equivalently, that T′ is a dependency of T) if T "needs" T′ to exist or hold. The main way such a "need" arises is that the well-formedness, justification, or provability of T does not hold in the absence of T′. We consider formal mathematics done in a concrete proof assistant, so we consider mathematical and logical constructs not only as abstract entities depending on each other, but also as concrete objects (e.g., texts, syntax trees, etc.) in the proof assistants. For our applications, there are different notions of "dependency" we are interested in:

– Purely semantic/logical view. One might claim, for example, that the lambda term (or proof object in the underlying formal framework) contains all sufficient dependencies for a particular theorem, regardless of any notational conventions, library mechanisms, etc.

– Purely pragmatic view. Such dependencies are met if the particular item still compiles in a particular high-level proof assistant framework, regardless of possibly changed underlying semantics. This view takes into account the proof assistant as the major dependency, with its sophisticated mechanisms like auto hint databases, notations, type automations, definition expansions, proof search depth, parser settings, hidden arguments, etc.

Formal dependencies can also be implicit and explicit. In the simple world of first-order automated theorem proving, proofs and their dependencies are generally quite detailed and explicit about (essentially) all logical steps, even very small ones (such as the steps taken in a resolution proof). But ITPs are generally oriented toward human mathematicians: one of their goals is to let users express themselves with minimal logical verbosity, and ITPs therefore come with a number of implicit mechanisms. Examples are type mechanisms (e.g., type-class automations of various flavors in Coq [14] and Isabelle [8], Prolog-like types in Mizar [17,15]) and hint mechanisms (in Coq and Isabelle). If we are interested in giving a complete answer to the question of what a formalized proof depends upon, we must expose such implicit facts and inferences.

Formal dependencies reported by ITPs are typically sufficient. Depending on the extraction mechanism, redundant dependencies can be reported. Bottom-up procedures like congruence closure and type closure in Mizar (and, e.g., type-class mechanisms in other ITPs) are examples of mechanisms in which the ITP uses available knowledge exhaustively, often drawing in many unnecessary dependencies from the context. For applications, it is obviously better if such unnecessary dependencies can be removed.
3 Dependency Extraction in Coq

Recall that Coq is based on the Curry-Howard isomorphism, meaning that:

1. A statement (formula) is encoded as a type.
2. There is, at the "bare" logical level, no essential difference between a definition and a theorem: both are the binding (in the environment) of a name to a type (the type of the definition, or the statement of the theorem) and a term (the body of the definition, or the proof of the theorem).
3. Similarly, there is no essential difference between an axiom and a parameter: both are the binding (in the environment) of a name to a type (the statement of the axiom, or the type of the parameter, e.g. "natural number").
4. There is, as far as Coq is concerned, no difference between the notions of theorem, lemma, corollary, etc.

Thus, in this section, and in other sections when talking of Coq, we do not always repeat "axiom or parameter", nor "definition or theorem or lemma or corollary or . . . ". We will use "axiom" for "axiom or parameter", and "theorem" or "definition" for "definition or theorem or lemma or corollary or . . . ". Similarly for "proof" and "definition body".

There are essentially three groups of Coq commands that need to be treated by the dependency tracking:
1. Commands that register a new logical construct (definition or axiom), either:
   – From scratch. That is, commands that take as arguments a name and a type and/or a body, and that add the definition binding this name to this type and/or body. The canonical examples are Definition Name : type := body and Axiom Name : type. The type can also be given implicitly as the inferred type of the body, as in Definition Name := body.
   – By saving the current (completely proven) theorem in the environment. These are the "end of proof" commands, such as Qed, Save, Defined.
2. Commands that make progress in the current proof. These commands update (by adding exactly one node) the internal Coq structure called the "proof tree".
3. Commands that open a new theorem, as in Theorem Name : type or Definition Name : type (as far as logical constructs are concerned), which will then be proven in multiple steps: (a) opening the theorem, (b) an arbitrary, strictly positive number of proof steps, and (c) saving that theorem in the environment.

The dependency tracking is implemented as suitable hooks in the Coq functions that these three kinds of commands eventually call. When a new construct is registered in the environment, the dependency tracking walks over the type and body (if present) of the new construct and collects all constructs that are referenced. When a proof tree is updated, the dependency tracking examines the top node of the new proof tree (note that this is always the only change with respect to the previous proof tree). The commands that update the proof tree (that is, make a step in the current proof) are called tactics. Coq's tactic interpretation goes through three main phases:

1. parsing;
2. Ltac expansion;
3. evaluation.

(Ltac is Coq's tactical language, used to combine tactics and to define new user-defined tactics.) The tactic structure after each of these phases is stored in the proof tree. This makes it possible to collect all construct references mentioned at any of these three levels. For example, if the tactic Foo T is defined as

  try apply BolzanoWeierstrass; solve [ T | auto ]

and the user invokes it as Foo FeitThompson, then the first level will contain (in parsed form) Foo FeitThompson, the second level will contain (in parsed form) try apply BolzanoWeierstrass; solve [ FeitThompson | auto ], and the third level can contain any of:

– refine (BolzanoWeierstrass ...),
– refine (FeitThompson ...),
– something else, if the proof was found by auto.

The third level typically contains only a few applications of the basic atomic fundamental rules (tactics), such as refine, intro, rename or convert, and combinations thereof.

3.1 Dependency availability, format, and protocol
Coq supports several interaction protocols: the coqtop, emacs, and coq-interface protocols. Dependency tracking is available in the program implementing the coq-interface protocol, which is designed for machine interaction. The dependency information is printed in a special message for each potentially progress-making command that can give rise to a dependency. A potentially progress-making command is one whose purpose is to change Coq's state. For example, the command Print Foo, which displays the previously loaded mathematical construct Foo, is not a potentially progress-making command. (Although such a command obviously needs the item Foo to be defined in order to succeed, the dependency tracking does not output that information. This is not a problem in practice, because such commands are usually issued by a user interface to treat an interactive user request, for example "show me item Foo", but are not saved into the script stored on disk. Even if they were saved into the script, adding them to, or removing them from, the script does not change the semantics of the script.)

Any tactic invocation is a potentially progress-making command. For example, the tactic auto silently succeeds (without any effect) if it does not completely solve the goal it is assigned to solve. In that case, although that particular invocation did not make any actual progress in the proof, auto is still considered a potentially progress-making command, and the dependency tracking outputs the message "dependencies: (empty list)". Other kinds of progress-making commands include, for example, notation declarations and morphism registrations. Some commands, although they change Coq's state, might not give rise to a dependency. For example, the Set Firstorder Depth command, taking only an integer argument, changes the maximum depth at which the firstorder tactic will search for a proof. For such a command, no dependency message is output.

One command may give rise to several dependency messages when it changes Coq's state in several different ways. For example, the intuition tactic can, mainly for efficiency reasons, construct an ad hoc lemma, register it in the global environment, and then use that lemma to prove the goal it has been assigned to solve, instead of introducing the ad hoc lemma as a local hypothesis through a cut. (The intuition tactic is a decision procedure for intuitionistic propositional calculus based on the contraction-free sequent calculi LJT* of Roy Dyckhoff, extended to hand over subgoals which it cannot solve to another tactic.) This is mainly an optimization: the ad hoc lemma is defined as "opaque", meaning that the typechecking (proofchecking) algorithm is not allowed to unfold the body (proof) of the lemma when the lemma is invoked, and thus will not spend any time doing so. By contrast, a local hypothesis is always "transparent", and the typechecking algorithm is allowed to unfold its body. For the purpose of dependency tracking this means that intuition makes two conceptually different steps:

1. register a new global lemma under a fresh name;
2. solve the current subgoal in the proof currently in progress.

Each of these steps gives rise to different dependencies. For example, if the current proof is BolzanoWeierstrass, then the new global lemma gives rise to dependencies of the form

  "BolzanoWeierstrass_subproofN depends on . . . "

where the _subproofN suffix is Coq's way of generating a fresh name. Closing of the subgoal by use of BolzanoWeierstrass_subproofN then gives rise to the dependency

  "BolzanoWeierstrass depends on BolzanoWeierstrass_subproofN".

The
Coq dependency tracking is already quite extensive, and sufficient for the whole Nijmegen CoRN corpus. Some restrictions remain in parts of the Coq internals that the second author does not yet fully understand (such as when and how dynamics are used in tactic expressions, or a complete overview of all the datatypes tactics take as arguments). Our interests (and experiments) include not only purely mathematical dependencies that can be found in the proof terms (for previous work see also [13,4]), but also fast recompilation modes for easy authoring of formal mathematics in large libraries and formal wikis. The Coq dependency tracking code currently finds all logically relevant dependencies from the proof terms, even those that arise from automation tactics. It does not yet handle non-logical dependencies. Examples include notation declarations, morphism and equivalence relation declarations (which let the tactics for equality handle one's user-defined equality), and auto hint database registrations, but also tactic interpretation. At this stage, we do not handle most of these, but as already explained, the internal structure of Coq lends itself well to collecting dependencies that appear at the various levels of tactic interpretation. This means that we can already handle the (non-semantic) dependencies on logical constructs that appear during tactic interpretation, but that do not end up being used in the final proof term.

Some of the non-logical dependencies are a more difficult issue. For example, several dependencies related to tactic parametrization (auto hint databases, firstorder proof search depth) need specific knowledge of how the tactic is influenced by parameters, or information available only to the internals of the tactic. (auto not only needs the necessary lemmas to be available in the environment; it also needs to be specifically instructed to try to use them, through a mechanism whereby the lemmas are registered in a "hint database". Each invocation of auto can specify which hint databases to use.) The best approach to handling such dependencies seems to be to change (at the OCaml source level in Coq) the type of a tactic, so that the tactic itself is responsible for providing such dependencies. This will however have to be validated in practice, provided that we manage to persuade the greater Coq community of the importance and practical usefulness of complete dependency tracking for formal mathematics and for research based on it.

Coq also presents an interesting corner case as far as opacity of dependencies is concerned. On the one hand, Coq has explicit management of opacity of items: an item originally declared as opaque can only be used generically with regard to its type; no information arising from its body can be used, and the only information available to other items is the type. Lemmas and theorems are usually declared opaque, and definitions usually transparent, but this is not forced by the system. In some cases, applications of lemmas need to be transparent. Coq provides an easy way to decide whether a dependency is opaque or transparent: dependencies on opaque objects can only be opaque, and dependencies on transparent objects are to be considered transparent.

Note, however, that the predicative calculus of inductive constructions (pCIC) uses a universe level structure, where the universes have to be ordered in a well-founded way at all times. The ordering constraints between the universes are hidden from the user, and are absent from the types (statements) the user writes. Changing the proof of a theorem T can thus potentially have an influence on the universe constraints of the theorem.
Thus, changing the body of an opaque item T′ appearing in the proof of T can change the universe constraints attached to it, potentially in a way that is incompatible with the way it was previously used in the body of T. Detecting whether the universe constraints have changed is not completely straightforward, and needs specific knowledge of the pCIC. Unless one does so, for complete certainty of correctness of the library as a whole, one has to consider all dependencies as transparent. In practice, however, universe constraint incompatibilities are quite rare. A large library may thus optimize its rechecking after a small change, and not immediately follow opaque reverse dependencies. Instead, fully correct universe constraint checking could be done in a postponed way, for example by rechecking the whole library from scratch once per week or per month.
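The recheck-propagation policy just described can be sketched as a small graph traversal: follow reverse dependencies transitively, but stop at opaque edges, accepting the small risk of a universe-constraint incompatibility. The function name and the edge-labelled data layout below are our own illustrative assumptions, not Coq's actual implementation:

```python
def items_to_recheck(changed, reverse_deps):
    """Items to re-verify after the body (not the type) of `changed` is edited.

    reverse_deps maps an item to a list of (dependent_item, kind) pairs,
    with kind either 'opaque' or 'transparent'.  An opaque dependency uses
    only the type of the item it depends on, so (ignoring the rare universe
    constraint incompatibilities discussed above) its dependent need not be
    rechecked when only the body changes.
    """
    pending = [changed]
    seen = {changed}
    while pending:
        item = pending.pop()
        for dependent, kind in reverse_deps.get(item, []):
            if kind == 'opaque':
                continue  # the dependent uses only the (unchanged) type
            if dependent not in seen:
                seen.add(dependent)
                pending.append(dependent)
    seen.discard(changed)  # the changed item itself is rechecked anyway
    return seen
```

The fully safe mode (treating every dependency as transparent until a postponed library-wide recheck) corresponds to the same traversal with the opaque cut removed.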
4 Dependency Extraction in Mizar

Dependency computation in Mizar differs from the implementation provided for Coq, being in some sense much simpler, but at the same time also more robust with respect to potential future changes of the Mizar codebase. For a comparison of the techniques, see Section 5. For a more detailed discussion of Mizar, see [11] or [7].

In Mizar, every article A has its own environment E_A specifying the context (theorems, definitions, notations, etc.) that is used to verify the article. E_A is usually a rather conservative overestimate of the items that the article actually needs. For example, even if an article A needs only one definition (or theorem, or notation, or scheme, or . . . ) from article B, all the definitions (theorems, notations, schemes, . . . ) from B will be present in E_A. The dependencies for an article A are computed as the smallest environment E′_A under which A is still Mizar-verifiable (and has the same semantics as A did under E_A, thereby following the mathematical principle of proof irrelevance). To get the dependencies of a particular Mizar item I (theorem, definition, etc.), we first create a microarticle containing essentially just the item I, and compute the dependencies of this microarticle.

More precisely, computing fine-grained dependencies in Mizar takes three steps:
Normalization. Rewrite every article of the Mizar Mathematical Library so that:

– Each definition block defines exactly one concept. Definition blocks that contain multiple definitions or notations can lead to false positive dependencies. For example, if two functions f and g are defined in a single definition block, and a theorem φ uses f but not g, then we want to be able to say that φ depends on f but is independent of g. Without splitting definition blocks, we would get the specious dependency of φ upon g.

– All toplevel logical linking is replaced by explicit reference: constructions such as

  φ; then ψ;

whereby the statement ψ is justified by the statement φ, are replaced by

  Label1: φ;
  Label2: ψ by Label1;

where Label1 and Label2 are new labels. By doing this transformation, we ensure that the only way one statement is justified by another is through explicit reference.

– Segments of reserved variables all have length exactly 1. For example, a construction such as

  reserve A for set,
          B for non empty set,
          f for Function of A,B,
          M for Cardinal;

which is a single reservation statement that assigns types to four variables (A, B, f, and M), is replaced by four reservation statements, each of which assigns a type to a single variable:

  reserve A for set;
  reserve B for non empty set;
  reserve f for Function of A,B;
  reserve M for Cardinal;

When reserved variables are normalized in this way, one can eliminate some false positive dependencies. A theorem in which, say, the variable f occurs freely but which has nothing to do with cardinal numbers has the type Function of A,B in the presence of both the first and the second sequences of reserved variables. If the first reservation statement is deleted, the theorem becomes ill-formed because f no longer has a type. But that reservation statement itself directly requires that the type Cardinal of cardinal numbers be available, and thus indirectly requires a part of the development of cardinal numbers. If the theorem has nothing to do with cardinal numbers, this dependency is clearly specious. After rewriting reserved variables in the second way, though, one sees that the fourth reservation statement can safely be deleted, thereby eliminating this false dependency.

These rewritings do not affect the semantics of the Mizar article.
Decomposition. For every normalized article A in the Mizar Mathematical Library, extract the sequence ⟨I_A1, I_A2, . . . , I_An⟩ of its toplevel items, each of which is written to a "microarticle" A_k that contains only I_Ak and whose environment is that of A and contains each A_j (j < k).

Minimization. For every article A of the Mizar Mathematical Library and every microarticle A_n of A, do a brute-force minimization to find the smallest environment E_An such that A_n is Mizar-verifiable.

The brute-force minimization works as follows. Given a microarticle A, we successively trim the environment for each of the Mizar environment item kinds. Each item kind is associated with a sequence s of imported items ⟨a_1, . . . , a_n⟩, and the task is to find a minimal sublist s′ of s such that A is Mizar-verifiable. We apply a simple binary search algorithm to s to compute the minimal sublist s′.

Applying this approach for all Mizar item kinds, for all microarticles A_k, for all articles A of the Mizar Mathematical Library is a rather expensive computation (for some Mizar articles, this process can take several hours). It is much slower than the method used for Coq described in Section 3. However, the result is truly minimized, which is important for many applications of dependencies. Additionally, we have already developed some heuristics that help to find s′, and these already perform tolerably fast.

5 Comparison of the Coq and Mizar Methods

Some observations comparing the
Coq described in Section 3. However the resultis truly minimized, which is important for many applications of dependencies.Additionally, we have already developed some heuristics that help to find s ′ , andthese already do perform tolerably fast. Some observations comparing the
Coq and
Mizar dependency computation canbe drawn generally, without comparing the actual data as done in the followingsections. Dependencies in the case of
CoRN are generated by hooking into theactual code and are thus quite exactly mirroring the work of the proof assistant.In the case of
Mizar , dependencies are approximated from above. The depen-dency graph in this case starts with an over-approximation of what is known tobe sufficient for an item to be
Mizar -verifiable and then successively refines thisover-approximation toward a minimal set of sufficient conditions. A significantdifference is that the dependencies in
Coq are not minimized: the dependencytracking there tells us exactly the dependencies that were used by
Coq (in theparticular context) when a certain command is run. Thus, if for example thecontext is rich, and redundant dependencies are used by some exhaustive strate-gies, we will not detect their redundancy. On the other hand, in
Mizar we do not Namely, theorems, schemes, top-level lemmas, definitional theorems, definientia, pat-terns, registrations, and constructors. See [7] for a discussion of these item kinds. There is always one minimal sublist, since we assume that A is Mizar -verifiable tobegin with. ely on the proof assistant reporting how it exactly works, and instead try to ex-haustively minimize the set of dependencies, until an error occurs. This processis more computationally intensive, however, it guarantees minimality (relativeto the proof assistant’s power) which is interesting for many of the applicationsmentioned below.Another difference is in the coverage of non-logical constructs. Practicallyevery resource necessary for a verification of a
Mizar article is an explicit partof the article’s environment. Thus, it is easy to minimize not just the strictlylogical dependencies, but also the non-logical ones, like the sets of symbols andnotations needed for a particular item, or particular automations like definitionalexpansions. For LCF-based proof assistants, this typically implies further workon the dependency tracking.
6 Experiments and Applications

6.1 Dependency Extraction for CoRN and MML
We use the dependency extraction methods described in Sections 3 and 4 to obtain fine-grained dependency data for the CoRN library and an initial 100-article fragment of the MML. As described above, for CoRN we use the dependency exporter implemented directly on top of the Coq code base. The export is thus approximately as fast as the Coq processing of CoRN itself, taking about 40 minutes on contemporary hardware. The product is, for each CoRN file, a corresponding file with dependencies; altogether these amount to about 65 MB. This information is then post-processed by standard UNIX and other tools into the dependency graph discussed below.

For Mizar and the MML we use the brute-force dependency extraction approach discussed above. This takes significantly longer than Mizar processing alone, also because of the number of preprocessing and normalization steps that need to be done when splitting articles into microarticles. For our data this now takes about one day for the initial 100-article fragment of the MML, the main share of this time being spent on minimizing the large numbers of items used implicitly by Mizar. Note that in this implementation we are initially more interested in achieving completeness and minimality than efficiency, and a number of available optimizations can reduce this time significantly. (For example, a very simple recent optimization done for theorems, definitions, and schemes has cut the processing time in half.) The data obtained are again post-processed by standard UNIX tools into the dependency graphs.

In order to compare the benefits of having fine dependencies, we also compute for each library the full file-based dependency graph for all items. These graphs emulate the current naive file-based treatment of dependencies in these libraries: each time an item is changed in some file, all items in the depending files have to be re-verified. The two kinds of graphs for both libraries are compared in Table 1.

The graphs confirm our initial intuition that having the fine dependencies will significantly speed up partial recompilation of the large libraries, which is especially interesting for the CoRN and MML formal wikis that we develop. For example, the average number of items that need to be recompiled when a random item is changed has dropped about seven times for CoRN, and about five times for Mizar. The medians of these numbers are even more telling, with the improvement rising to a factor of fifteen for Mizar. The difference between the MML and CoRN is also quite interesting, but it is hard to draw any conclusions: the corpora differ in their content and use different styles and techniques.
          CoRN/item   CoRN/file  MML-100/item  MML-100/file
Items         9 462       9 462         9 553         9 553
Deps        175 407   2 214 396       704 513    21 082 287
TDeps     3 614 445  24 385 358     7 258 546    34 974 804
P (%)             8          54             …             …
ARL               …           …             …             …
MRL               …           …             …             …

Deps: number of dependency edges. TDeps: number of transitive dependency edges. P: probability that, given two randomly chosen items, one depends (directly or indirectly) on the other, or vice versa. ARL: average number of items recompiled if one item is changed. MRL: median number of items recompiled if one item is changed.

Table 1. Statistics of the item-based and file-based dependencies for CoRN and MML
MML i t e m s CoRN/item: reverse dependencies 010002000300040005000600070008000900010000 0 1500 3000 4500 6000 7500 9000 i t e m s CoRN/file: reverse dependencies
Fig. 1.
Cumulative transitive reverse dependencies for
CoRN : file-based vs. item-based
Another interesting new statistic, given in Table 2, is the information about the number and structure of explicit and implicit dependencies that we have collected for Mizar. Explicit dependencies are anything that is already mentioned in the original text. Implicit dependencies are everything else, for example dependencies on type mechanisms. Note that the ratio of implicit dependencies is very significant, which suggests that handling them precisely can be quite necessary for the learning and ATP experiments conducted in the next section.

(The wikis mentioned above are available at http://mws.cs.ru.nl/mwiki/ and http://mws.cs.ru.nl/cwiki/ .)

Fig. 2. Cumulative transitive reverse dependencies for MML: file-based vs. item-based

        theorem  top-level lemma  definition  scheme  registration
from     550134            44120       44216    7053         58622
to       314487             2384      263486    6510        108449

Table 2. Statistics of Mizar direct dependencies from and to different kinds of items
6.2 Dependencies for Machine Learning and Automated Theorem Proving

The knowledge of how a large number of theorems are proved is used by mathematicians to direct their new proof attempts and theory developments. In the same way, the precise formal proof knowledge that we now have can be used for directing formal automated theorem proving (ATP) systems and meta-systems over the large mathematical libraries. In [3] we provide an initial evaluation of the usefulness of our MML dependency data for machine learning of such proof guidance of first-order ATPs.

These experiments are conducted on a set of 2078 problems extracted from the Mizar library and translated to first-order ATP format. We emulate the growth of the Mizar library (limited to the 2078 problems) by considering all previous theorems and definitions when a new conjecture is attempted (i.e., when a new theorem is formulated by an author, requiring a proof). The ATP problems thus become very large, containing thousands of the previously proved formulas as available axioms, which obviously makes automated theorem proving quite difficult; see e.g. [16] and [12] for details. We run the state-of-the-art Vampire-SInE [9] ATP system on these large problems, and solve 567 of them (with a 10-second time limit). Then, instead of attacking such large problems directly, we learn proof relevance from all previous fine-grained proof dependencies, using machine learning with a naive Bayes classifier. This technique works surprisingly well: in comparison with running Vampire-SInE directly on the large problems, the problems pruned by the trained machine learner can be proved by Vampire in 717 cases, i.e., the efficiency of the automated theorem proving is raised by about 30% when we apply the knowledge about previous proof dependencies. This is a very significant advance in the world of automated theorem proving, where the search complexity is typically superexponential.

In [2] we further leverage this automated reasoning technique by scaling the dependency analysis to the whole MML, and attempting a fully automated proof for every MML theorem. This yields the so-far largest number of fully automated proofs over the whole MML, allowing us (using the precise formal dependencies of the ATP and MML proofs) to attempt an initial comparison of human and automated proofs in general mathematics.
A particular practical use of fine dependencies (which initially motivated the work on Coq dependencies described in Section 3) is advanced interactive editing. tmEgg [10] is a TeXmacs-based user interface to Coq. Its main purpose is to integrate formal mathematics done in Coq into a more general document (such as course notes or a journal article) without forcing the document to follow the structure of the formal mathematics contained therein.

For example, it does not require that the order in which the mathematical constructs appear in the document be the same as the order in which they are presented to Coq. As one would expect, the latter must respect the constraints inherent to the incremental construction of the formal mathematics, such as a lemma being proven before it is used in the proof of a theorem, or a definition being made before the defined construct is used. However, the presentation the author would like to put in the document may not strictly respect these constraints. For example, clarity of exposition may benefit from first presenting the proof of the main theorem, making it clear how each lemma is useful, and only then going through all the lemmas. Or a didactic presentation of a subject may first go through some examples before presenting the full definitions of the concepts being manipulated.

tmEgg thus allows the mathematical constructs to appear in any order in the document, and uses the dependency information to dynamically and lazily load any construct necessary to perform the requested action. For example, if the requested action is "check the proof of this theorem", it will automatically load all definitions and lemmas used by the statement or proof of the theorem.

An interactive editor presents slightly different requirements than the batch recompilation scenario of a mathematical library described in Section 6.1. One such difference is that an interactive editor needs the dependency information, as part of the interactive session, for partial in-progress proofs. Indeed, if an in-progress proof depends on an item T, and the user wishes to change or unload (remove from the environment) T, then the part of the in-progress proof that depends on T has to be undone, even if the dependency is opaque.
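The lazy loading behavior described above can be sketched as a depth-first traversal of the dependency graph: before a construct is sent to the prover, everything it depends on is loaded first, and nothing already in the environment is loaded twice. The function and the `deps` mapping below are illustrative assumptions, not tmEgg's actual interface; dependency graphs of formal libraries are assumed acyclic.

```python
def load_with_dependencies(item, deps, loaded, load_action):
    """Lazily load `item` into the proof environment.

    `deps` maps each item to the items its statement or proof uses;
    `loaded` is the set of items already in the environment, so each
    construct is sent to the prover at most once.  Dependencies are
    loaded recursively before the item itself."""
    if item in loaded:
        return
    for d in deps.get(item, []):
        load_with_dependencies(d, deps, loaded, load_action)
    load_action(item)   # e.g., send the construct's text to Coq
    loaded.add(item)

# Checking main_thm pulls in its lemma and the underlying definition first.
deps = {"main_thm": ["lemma1", "def_group"], "lemma1": ["def_group"]}
load_order = []
load_with_dependencies("main_thm", deps, set(), load_order.append)
# load_order is now ["def_group", "lemma1", "main_thm"]
```

The same traversal, run in reverse over the recorded reverse dependencies, identifies which in-progress work must be undone when an item is changed or unloaded.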
The dependency tracking for Coq was actually started by the second author as part of the development of tmEgg, and this facility has already been integrated in the official release of Coq. It was since extended to be able to treat the whole of the CoRN library; these changes are not yet included in the official release of Coq.

Related Work
Related work exists in the first-order ATP field, where a number of systems can today output the axioms needed for a particular proof. Purely semantic (proof-object) dependencies have been extracted several times for several ITPs, for example by Bertot and the Helm project for Coq [5,13,4], and by Obua and McLaughlin for HOL Light and Isabelle. The focus of the latter two dependency extractions is on cross-verification, and they are based on quite low-level (proof-object) mechanisms. A higher-level semantic dependency exporter for HOL Light was recently implemented by Adams [1] for his work on HOL Light re-verification in HOL Zero. This could be usable as a basis for extending our applications to the core HOL Light library and the related large Flyspeck library. The Coq/CoRN approach quite likely scales easily to other large Coq libraries, such as the one developed in the Math Components project [6]. Our focus in this work is wider than the semantic-only efforts: we attempt to get the full information about all implicit mechanisms (including syntactic mechanisms), and we are interested in using the information for smart recompilation, which requires tracking much more than just the purely semantic or low-level information.
Conclusion

In this paper we have tried to show the importance and attractiveness of formal dependencies. We have implemented and used two very different techniques to elicit fine-grained proof dependencies for two very different proof assistants and two very different large formal mathematical libraries. This provides enough confidence that our approaches will scale to other important libraries and assistants, and that our techniques and the derived benefits will be usable in other contexts.

Mathematics is being increasingly encoded in a computer-understandable (formal) and in-principle-verifiable way. The results are increasingly large interdependent computer-understandable libraries of mathematical knowledge. (Collaborative) development and refactoring of such large libraries requires advanced computer support, providing fast computation and analysis of dependencies, and fast re-verification methods based on the dependency information. As such automated assistance tools reach greater and greater reasoning power, the cost/benefit ratio of doing formal mathematics decreases.

Given our previous work on several parts of this program, providing exact dependency analysis and linking it to the other important tools seems to be a straightforward choice. Even though the links to proof automation, fast large-scale refactoring, and proof analysis are very fresh, it is our hope that the significant performance boosts already sufficiently demonstrate the importance of good formal dependency analysis for formal mathematics, and for future mathematics in general.

By "higher-level" we mean tracking higher-level constructs, such as the use of theorems and tactics, not just tracking the low-level primitive steps done in the proof assistant's kernel.

References
1. Adams, M.: Introducing HOL Zero. In: Fukuda, K., van der Hoeven, J., Joswig, M., Takayama, N. (eds.) Mathematical Software - ICMS 2010. LNCS, vol. 6327, pp. 142–143. Springer (2010), http://dx.doi.org/10.1007/978-3-642-15582-6_25
2. Alama, J., Kühlwein, D., Urban, J.: Automated and human proofs in general mathematics: An initial comparison. In: LPAR. Lecture Notes in Computer Science, Springer (2012), accepted
3. Alama, J., Kühlwein, D., Tsivtsivadze, E., Urban, J., Heskes, T.: Premise selection for mathematics by corpus analysis and kernel methods. CoRR abs/1108.3446 (2011), http://arxiv.org/abs/1108.3446
4. Asperti, A., Padovani, L., Coen, C.S., Guidi, F., Schena, I.: Mathematical knowledge management in HELM. Ann. Math. Artif. Intell. 38(1-3), 27–46 (2003)
5. Bertot, Y., Pons, O., Pottier, L.: Dependency graphs for interactive theorem provers. Tech. rep., INRIA (2000), report RR-4052
6. Garillot, F., Gonthier, G., Mahboubi, A., Rideau, L.: Packaging mathematical structures. In: Nipkow, T., Urban, C. (eds.) TPHOLs. LNCS, vol. 5674. Springer (2009), http://hal.inria.fr/inria-00368403/en/
7. Grabowski, A., Korniłowicz, A., Naumowicz, A.: Mizar in a nutshell. Journal of Formalized Reasoning 3(2), 153–245 (2010), http://jfr.cib.unibo.it/article/view/1980/1356
8. Haftmann, F., Wenzel, M.: Constructive type classes in Isabelle. In: Altenkirch, T., McBride, C. (eds.) TYPES. Lecture Notes in Computer Science, vol. 4502, pp. 160–174. Springer (2006)
9. Hoder, K., Voronkov, A.: Sine qua non for large theory reasoning. In: CADE-23 (2011), to appear
10. Mamane, L., Geuvers, H.: A document-oriented Coq plugin for TeXmacs. In: Kauers, M., Kerber, M., Miner, R., Windsteiger, W. (eds.) MKM 2007 - Work in Progress. RISC Report, vol. 07-06, pp. 47–60. University of Linz, Austria (2007)
11. Matuszewski, R., Rudnicki, P.: Mizar: the first 30 years. Mechanized Mathematics and Its Applications 4, 3–24 (2005)
12. Meng, J., Paulson, L.C.: Lightweight relevance filtering for machine-generated resolution problems. J. Applied Logic 7(1), 41–57 (2009)
13. Pons, O., Bertot, Y., Rideau, L.: Notions of dependency in proof assistants. In: UITP 1998. Eindhoven University of Technology (1998)
14. Spitters, B., van der Weegen, E.: Type classes for mathematics in type theory. CoRR abs/1102.1323 (2011)
15. Urban, J.: MoMM - fast interreduction and retrieval in large libraries of formalized mathematics. International Journal on Artificial Intelligence Tools 15(1), 109–130 (2006)
16. Urban, J., Hoder, K., Voronkov, A.: Evaluation of automated theorem proving on the