Dependencies in Formal Mathematics: Applications and Extraction for Coq and Mizar
Jesse Alama, Lionel Mamane, and Josef Urban
New University of Lisbon · Bereldange, Luxembourg · Radboud University Nijmegen
Abstract.
Two methods for extracting detailed formal dependencies from the Coq and Mizar systems are presented and compared. The methods are used for dependency extraction from two large mathematical repositories: the Coq Repository at Nijmegen and the Mizar Mathematical Library. Several applications of the detailed dependency analysis are described and proposed. Motivated by the different applications, we discuss the various kinds of dependencies that we are interested in, and the suitability of various dependency extraction methods.
1 Introduction

This paper presents two methods for extracting detailed formal dependencies from two state-of-the-art interactive theorem provers (ITPs) for mathematics: the Coq system and the Mizar system. Our motivation for dependency extraction is application-driven. We are interested in using detailed dependencies for fast refactoring of large mathematical libraries and wikis, for AI methods in automated reasoning that learn from previous proofs, for improved interactive editing of formal mathematics, and for foundational research over formal mathematical libraries.

These applications require different notions of formal dependency. We discuss these different requirements, and as a result provide implementations that differ in several important aspects from previous methods. For Mizar, the developed method captures practically all dependencies needed for successful re-verification of a particular formal text (i.e., also notational dependencies, automations used, etc.), and the method tries hard to determine the minimal set of such dependencies. For Coq, the method goes farther towards re-verification of formal texts than previous methods [5,13,4] that relied solely on the final proof terms. For example, we can already track Coq dependencies that appear during tactic interpretation but do not end up being used in the final proof term.

The paper is organized as follows. Section 2 briefly discusses the notion of formal dependency. Section 3 describes the implementation of dependency extraction in the Coq system, and Section 4 describes the implementation in the Mizar system. Section 5 compares the two implemented approaches to dependency computation. Section 6 describes several experiments and measurements conducted using our implementations on the CoRN and MML libraries, including training of AI/ATP proof assistance systems on the data, and estimating the speed-up for collaborative large-library developments. Section 8 concludes.
2 Formal Dependencies

Generally, we say that a definition or a theorem T depends on some definition, lemma or other theorem T′ (or equivalently, that T′ is a dependency of T) if T "needs" T′ to exist or hold. The main way such a "need" arises is that the well-formedness, justification, or provability of T does not hold in the absence of T′. We consider formal mathematics done in a concrete proof assistant, so we consider mathematical and logical constructs not only as abstract entities depending on each other, but also as concrete objects (e.g., texts, syntax trees, etc.) in the proof assistants. For our applications, there are different notions of "dependency" we are interested in:

– Purely semantic/logical view. One might claim, for example, that the lambda term (or proof object in the underlying formal framework) contains all sufficient dependencies for a particular theorem, regardless of any notational conventions, library mechanisms, etc.

– Purely pragmatic view. Such dependencies are met if the particular item still compiles in a particular high-level proof assistant framework, regardless of possibly changed underlying semantics. This view takes into account the proof assistant as the major dependency, with its sophisticated mechanisms like auto hint databases, notations, type automations, definition expansions, proof search depth, parser settings, hidden arguments, etc.

Formal dependencies can also be implicit and explicit. In the simple world of first-order automated theorem proving, proofs and their dependencies are generally quite detailed and explicit about (essentially) all logical steps, even very small ones (such as the steps taken in a resolution proof). But ITPs are generally oriented toward human mathematicians: one of their goals is to let users express themselves with minimal logical verbosity, and ITPs therefore come with a number of implicit mechanisms. Examples are type mechanisms (e.g., type-class automations of various flavors in Coq [14] and Isabelle [8], Prolog-like types in Mizar [17,15]) and hint mechanisms (in Coq and Isabelle). If we are interested in giving a complete answer to the question of what a formalized proof depends upon, we must expose such implicit facts and inferences.

Formal dependencies reported by ITPs are typically sufficient. Depending on the extraction mechanism, redundant dependencies can be reported. Bottom-up procedures like congruence closure and type closure in Mizar (and, e.g., type-class mechanisms in other ITPs) are examples of mechanisms in which the ITP uses available knowledge exhaustively, often drawing in many unnecessary dependencies from the context. For applications, it is obviously better if such unnecessary dependencies can be removed.
3 Dependency Extraction in Coq

Recall that Coq is based on the Curry-Howard isomorphism, meaning that:

1. A statement (formula) is encoded as a type.
2. There is, at the "bare" logical level, no essential difference between a definition and a theorem: both are the binding (in the environment) of a name to a type (the type of the definition, or the statement of the theorem) and a term (the body of the definition, or the proof of the theorem).
3. Similarly, there is no essential difference between an axiom and a parameter: both are the binding (in the environment) of a name to a type (the statement of the axiom, or the type of the parameter, e.g. "natural number").
4. There is, as far as Coq is concerned, no difference between the notions of theorem, lemma, corollary, etc.

Thus, in this section, and in other sections when talking of Coq, we do not always repeat "axiom or parameter", nor "definition or theorem or lemma or corollary or . . . ". We will use "axiom" for "axiom or parameter", and "theorem" or "definition" for "definition or theorem or lemma or corollary or . . . ". Similarly for "proof" and "definition body".

There are essentially three groups of Coq commands that need to be treated by the dependency tracking:
1. Commands that register a new logical construct (definition or axiom), either:
   – From scratch. That is, commands that take as arguments a name and a type and/or a body, and that add the definition binding this name to this type and/or body. The canonical examples are Definition Name : type := body and Axiom Name : type. The type can also be given implicitly as the inferred type of the body, as in Definition Name := body.
   – By saving the current (completely proven) theorem in the environment. These are the "end of proof" commands, such as Qed, Save, Defined.
2. Commands that make progress in the current proof. These commands update (by adding exactly one node) the internal Coq structure called the "proof tree".
3. Commands that open a new theorem, as in Theorem Name : type or Definition Name : type (as far as logical constructs are concerned), which will then be proven in multiple steps: (a) opening the theorem, (b) an arbitrary, strictly positive number of proof steps, and (c) saving that theorem in the environment.

The dependency tracking is implemented as suitable hooks in the Coq functions that these three kinds of commands eventually call. When a new construct is registered in the environment, the dependency tracking walks over the type and body (if present) of the new construct and collects all constructs that are referenced. When a proof tree is updated, the dependency tracking examines the top node of the new proof tree (note that this is always the only change with respect to the previous proof tree). The commands that update the proof tree (that is, make a step in the current proof) are called tactics. Coq's tactic interpretation goes through three main phases:

1. parsing;
2. Ltac expansion;
3. evaluation.

(Ltac is Coq's tactical language, used to combine tactics and to define new user-defined tactics.) The tactic structure after each of these phases is stored in the proof tree. This makes it possible to collect all construct references mentioned at any of these three levels. For example, if the tactic Foo T is defined as

  try apply BolzanoWeierstrass; solve [ T | auto ]

and the user invokes it as Foo FeitThompson, then the first level will contain (in parsed form) Foo FeitThompson, the second level will contain (in parsed form) try apply BolzanoWeierstrass; solve [ FeitThompson | auto ], and the third level can contain any of:

– refine (BolzanoWeierstrass ...),
– refine (FeitThompson ...),
– something else, if the proof was found by auto.

The third level typically contains only a few applications of the basic atomic fundamental rules (tactics), such as refine, intro, rename or convert, and combinations thereof.

3.1 Dependency availability, format, and protocol
Coq supports several interaction protocols: the coqtop, emacs, and coq-interface protocols. Dependency tracking is available in the program implementing the coq-interface protocol, which is designed for machine interaction. The dependency information is printed in a special message for each potentially progress-making command that can give rise to a dependency. A potentially progress-making command is one whose purpose is to change Coq's state. For example, the command Print Foo, which displays the previously loaded mathematical construct Foo, is not a potentially progress-making command. (Although such a command obviously needs the item Foo to be defined in order to succeed, the dependency tracking does not output that information. This is not a problem in practice, because such commands are usually issued by a user interface to treat an interactive user request, for example "show me item Foo", but are not saved into the script stored on disk. Even if they were saved into the script, adding them to, or removing them from, the script does not change the semantics of the script.)

Any tactic invocation is a potentially progress-making command. For example, the tactic auto silently succeeds (without any effect) if it does not completely solve the goal it is assigned to solve. In that case, although that particular invocation did not make any actual progress in the proof, auto is still considered a potentially progress-making command, and the dependency tracking outputs the message "dependencies: (empty list)". Other kinds of progress-making commands include, for example, notation declarations and morphism registrations. Some commands, although they change Coq's state, might not give rise to a dependency. For example, the Set Firstorder Depth command, taking only an integer argument, changes the maximum depth at which the firstorder tactic will search for a proof. For such a command, no dependency message is output.

One command may give rise to several dependency messages when it changes Coq's state in several different ways. For example, the intuition tactic can, mainly for efficiency reasons, construct an ad hoc lemma, register it in the global environment, and then use that lemma to prove the goal it has been assigned to solve, instead of introducing the ad hoc lemma as a local hypothesis through a cut. (The intuition tactic is a decision procedure for intuitionistic propositional calculus based on the contraction-free sequent calculi LJT* of Roy Dyckhoff, extended to hand over subgoals which it cannot solve to another tactic.) This is mainly an optimization: the ad hoc lemma is defined as "opaque", meaning that the typechecking (proofchecking) algorithm is not allowed to unfold the body (proof) of the lemma when the lemma is invoked, and thus will not spend any time doing so. By contrast, a local hypothesis is always "transparent", and the typechecking algorithm is allowed to unfold its body. For the purpose of dependency tracking this means that intuition makes two conceptually different steps:

1. register a new global lemma under a fresh name;
2. solve the current subgoal in the proof currently in progress.

Each of these steps gives rise to different dependencies. For example, if the current proof is BolzanoWeierstrass, then the new global lemma gives rise to dependencies of the form

  "BolzanoWeierstrass_subproofN depends on . . . "

where the _subproofN suffix is Coq's way of generating a fresh name. Closing of the subgoal by use of BolzanoWeierstrass_subproofN then gives rise to the dependency

  "BolzanoWeierstrass depends on BolzanoWeierstrass_subproofN".

The
Coq dependency tracking is already quite extensive, and sufficient for the whole Nijmegen CoRN corpus. Some restrictions remain in parts of the Coq internals that the second author does not yet fully understand (such as when and how dynamics are used in tactic expressions, or a complete overview of all the datatypes tactics take as arguments). Our interests (and experiments) include not only purely mathematical dependencies that can be found in the proof terms (for previous work see also [13,4]), but also fast recompilation modes for easy authoring of formal mathematics in large libraries and formal wikis. The Coq dependency tracking code currently finds all logically relevant dependencies from the proof terms, even those that arise from automation tactics. It does not yet handle non-logical dependencies. Examples include notation declarations, morphism and equivalence relation declarations (which let the tactics for equality handle one's user-defined equality), and auto hint database registrations, but also tactic interpretation. At this stage, we do not handle most of these, but as already explained, the internal structure of Coq lends itself well to collecting dependencies that appear at the various levels of tactic interpretation. This means that we can already handle the (non-semantic) dependencies on logical constructs that appear during tactic interpretation, but that do not end up being used in the final proof term.

Some of the non-logical dependencies are a more difficult issue. For example, several dependencies related to tactic parametrization (auto hint databases, firstorder proof search depth) need specific knowledge of how the tactic is influenced by parameters, or information available only to the internals of the tactic. (auto not only needs the necessary lemmas to be available in the environment; it also needs to be specifically instructed to try to use them, through a mechanism whereby the lemmas are registered in a "hint database". Each invocation of auto can specify which hint databases to use.) The best approach to handling such dependencies seems to be to change (at the OCaml source level in Coq) the type of a tactic, so that the tactic itself is responsible for providing such dependencies. This will however have to be validated in practice, provided that we manage to persuade the greater Coq community of the importance and practical usefulness of complete dependency tracking for formal mathematics and for research based on it.

Coq also presents an interesting corner case as far as opacity of dependencies is concerned. On the one hand, Coq has explicit management of opacity of items: an item originally declared as opaque can only be used generically with regard to its type; no information arising from its body can be used, and the only information available to other items is the type. Lemmas and theorems are usually declared opaque, and definitions usually transparent, but this is not forced by the system. In some cases, applications of lemmas need to be transparent. Coq provides an easy way to decide whether a dependency is opaque or transparent: dependencies on opaque objects can only be opaque, and dependencies on transparent objects are to be considered transparent.

Note, however, that the predicative calculus of inductive constructions (pCIC) uses a universe level structure, where the universes have to be ordered in a well-founded way at all times. The ordering constraints between the universes are hidden from the user, and are absent from the types (statements) the user writes. Changing the proof of a theorem T can thus potentially have an influence on the universe constraints of the theorem.
Thus, changing the body of an opaque item T′ appearing in the proof of T can change the universe constraints attached to it, potentially in a way that is incompatible with the way it was previously used in the body of T. Detecting whether the universe constraints have changed is not completely straightforward, and needs specific knowledge of the pCIC. Unless one does so, for complete certainty of correctness of the library as a whole, one has to consider all dependencies as transparent. In practice, however, universe constraint incompatibilities are quite rare. A large library may thus optimize its rechecking after a small change, and not immediately follow opaque reverse dependencies. Instead, fully correct universe constraint checking could be done in a postponed way, for example by rechecking the whole library from scratch once per week or per month.
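The recheck-propagation policy just described can be sketched as a small graph traversal: follow reverse dependencies transitively, but stop at opaque edges, accepting the small risk of a universe-constraint incompatibility. The function name and the edge-labelled data layout below are our own illustrative assumptions, not Coq's actual implementation:

```python
def items_to_recheck(changed, reverse_deps):
    """Items to re-verify after the body (not the type) of `changed` is edited.

    reverse_deps maps an item to a list of (dependent_item, kind) pairs,
    with kind either 'opaque' or 'transparent'.  An opaque dependency uses
    only the type of the item it depends on, so (ignoring the rare universe
    constraint incompatibilities discussed above) its dependent need not be
    rechecked when only the body changes.
    """
    pending = [changed]
    seen = {changed}
    while pending:
        item = pending.pop()
        for dependent, kind in reverse_deps.get(item, []):
            if kind == 'opaque':
                continue  # the dependent uses only the (unchanged) type
            if dependent not in seen:
                seen.add(dependent)
                pending.append(dependent)
    seen.discard(changed)  # the changed item itself is rechecked anyway
    return seen
```

The fully safe mode (treating every dependency as transparent until a postponed library-wide recheck) corresponds to the same traversal with the opaque cut removed.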
4 Dependency Extraction in Mizar

Dependency computation in Mizar differs from the implementation provided for Coq, being in some sense much simpler, but at the same time also more robust with respect to potential future changes of the Mizar codebase. For a comparison of the techniques, see Section 5. For a more detailed discussion of Mizar, see [11] or [7].

In Mizar, every article A has its own environment E_A specifying the context (theorems, definitions, notations, etc.) that is used to verify the article. E_A is usually a rather conservative overestimate of the items that the article actually needs. For example, even if an article A needs only one definition (or theorem, or notation, or scheme, or . . . ) from article B, all the definitions (theorems, notations, schemes, . . . ) from B will be present in E_A. The dependencies for an article A are computed as the smallest environment E′_A under which A is still Mizar-verifiable (and has the same semantics as A did under E_A, thereby following the mathematical principle of proof irrelevance). To get the dependencies of a particular Mizar item I (theorem, definition, etc.), we first create a microarticle containing essentially just the item I, and compute the dependencies of this microarticle.

More precisely, computing fine-grained dependencies in Mizar takes three steps:
Normalization. Rewrite every article of the Mizar Mathematical Library so that:

– Each definition block defines exactly one concept. Definition blocks that contain multiple definitions or notations can lead to false positive dependencies. For example, if two functions f and g are defined in a single definition block, and a theorem φ uses f but not g, then we want to be able to say that φ depends on f but is independent of g. Without splitting definition blocks, we would get the specious dependency of φ upon g.

– All toplevel logical linking is replaced by explicit reference: constructions such as

  φ; then ψ;

whereby the statement ψ is justified by the statement φ, are replaced by

  Label1: φ;
  Label2: ψ by Label1;

where Label1 and Label2 are new labels. By doing this transformation, we ensure that the only way one statement is justified by another is through explicit reference.

– Segments of reserved variables all have length exactly 1. For example, a construction such as

  reserve A for set,
          B for non empty set,
          f for Function of A,B,
          M for Cardinal;

which is a single reservation statement that assigns types to four variables (A, B, f, and M), is replaced by four reservation statements, each of which assigns a type to a single variable:

  reserve A for set;
  reserve B for non empty set;
  reserve f for Function of A,B;
  reserve M for Cardinal;

When reserved variables are normalized in this way, one can eliminate some false positive dependencies. A theorem in which, say, the variable f occurs freely but which has nothing to do with cardinal numbers has the type Function of A,B in the presence of both the first and the second sequences of reserved variables. If the first reservation statement is deleted, the theorem becomes ill-formed because f no longer has a type. But that reservation statement itself directly requires that the type Cardinal of cardinal numbers be available, and thus indirectly requires a part of the development of cardinal numbers. If the theorem has nothing to do with cardinal numbers, this dependency is clearly specious. After rewriting reserved variables in the second way, though, one sees that the fourth reservation statement can safely be deleted, thereby eliminating this false dependency.

These rewritings do not affect the semantics of the Mizar article.
Decomposition. For every normalized article A in the Mizar Mathematical Library, extract the sequence ⟨I_A1, I_A2, . . . , I_An⟩ of its toplevel items, each of which is written to a "microarticle" A_k that contains only I_Ak and whose environment is that of A and contains each A_j (j < k).

Minimization. For every article A of the Mizar Mathematical Library and every microarticle A_n of A, do a brute-force minimization to find the smallest environment E_An such that A_n is Mizar-verifiable.

The brute-force minimization works as follows. Given a microarticle A, we successively trim the environment for each of the Mizar environment item kinds. Each item kind is associated with a sequence s of imported items ⟨a_1, . . . , a_n⟩, and the task is to find a minimal sublist s′ of s such that A is Mizar-verifiable. We apply a simple binary search algorithm to s to compute the minimal sublist s′.

Applying this approach for all Mizar item kinds, for all microarticles A_k, for all articles A of the Mizar Mathematical Library is a rather expensive computation (for some Mizar articles, this process can take several hours). It is much slower than the method used for Coq described in Section 3. However, the result is truly minimized, which is important for many applications of dependencies. Additionally, we have already developed some heuristics that help to find s′, and these already perform tolerably fast.

5 Comparison of the Coq and Mizar Methods

Some observations comparing the
Coq described in Section 3. However the resultis truly minimized, which is important for many applications of dependencies.Additionally, we have already developed some heuristics that help to find s ′ , andthese already do perform tolerably fast. Some observations comparing the
Coq and
Mizar dependency computation canbe drawn generally, without comparing the actual data as done in the followingsections. Dependencies in the case of
CoRN are generated by hooking into theactual code and are thus quite exactly mirroring the work of the proof assistant.In the case of
Mizar , dependencies are approximated from above. The depen-dency graph in this case starts with an over-approximation of what is known tobe sufficient for an item to be
Mizar -verifiable and then successively refines thisover-approximation toward a minimal set of sufficient conditions. A significantdifference is that the dependencies in
Coq are not minimized: the dependencytracking there tells us exactly the dependencies that were used by
Coq (in theparticular context) when a certain command is run. Thus, if for example thecontext is rich, and redundant dependencies are used by some exhaustive strate-gies, we will not detect their redundancy. On the other hand, in
Mizar we do not Namely, theorems, schemes, top-level lemmas, definitional theorems, definientia, pat-terns, registrations, and constructors. See [7] for a discussion of these item kinds. There is always one minimal sublist, since we assume that A is Mizar -verifiable tobegin with. ely on the proof assistant reporting how it exactly works, and instead try to ex-haustively minimize the set of dependencies, until an error occurs. This processis more computationally intensive, however, it guarantees minimality (relativeto the proof assistant’s power) which is interesting for many of the applicationsmentioned below.Another difference is in the coverage of non-logical constructs. Practicallyevery resource necessary for a verification of a
Mizar article is an explicit partof the article’s environment. Thus, it is easy to minimize not just the strictlylogical dependencies, but also the non-logical ones, like the sets of symbols andnotations needed for a particular item, or particular automations like definitionalexpansions. For LCF-based proof assistants, this typically implies further workon the dependency tracking.
6 Experiments and Applications

6.1 Dependency Extraction for CoRN and MML
We use the dependency extraction methods described in Sections 3 and 4 to obtain fine-grained dependency data for the CoRN library and an initial 100-article fragment of the MML. As described above, for CoRN we use the dependency exporter implemented directly on top of the Coq code base. The export is thus approximately as fast as the Coq processing of CoRN itself, taking about 40 minutes on contemporary hardware. The product is, for each CoRN file, a corresponding file with dependencies; altogether these amount to about 65 MB. This information is then post-processed by standard UNIX and other tools into the dependency graph discussed below.

For Mizar and the MML we use the brute-force dependency extraction approach discussed above. This takes significantly longer than Mizar processing alone, also because of the number of preprocessing and normalization steps that need to be done when splitting articles into microarticles. For our data this now takes about one day for the initial 100-article fragment of the MML, the main share of this time being spent on minimizing the large numbers of items used implicitly by Mizar. Note that in this implementation we are initially more interested in achieving completeness and minimality than efficiency, and a number of available optimizations can reduce this time significantly. (For example, a very simple recent optimization done for theorems, definitions, and schemes has cut the processing time in half.) The data obtained are again post-processed by standard UNIX tools into the dependency graphs.

In order to compare the benefits of having fine dependencies, we also compute for each library the full file-based dependency graph for all items. These graphs emulate the current naive file-based treatment of dependencies in these libraries: each time an item is changed in some file, all items in the depending files have to be re-verified. The two kinds of graphs for both libraries are compared in Table 1.

The graphs confirm our initial intuition that having the fine dependencies will significantly speed up partial recompilation of the large libraries, which is especially interesting for the CoRN and MML formal wikis that we develop. For example, the average number of items that need to be recompiled when a random item is changed has dropped about seven times for CoRN, and about five times for Mizar. The medians of these numbers are even more telling, with the improvement rising to a factor of fifteen for Mizar. The difference between the MML and CoRN is also quite interesting, but it is hard to draw any conclusions: the corpora differ in their content and use different styles and techniques.
          CoRN/item   CoRN/file  MML-100/item  MML-100/file
Items         9 462       9 462         9 553         9 553
Deps        175 407   2 214 396       704 513    21 082 287
TDeps     3 614 445  24 385 358     7 258 546    34 974 804
P (%)             8          54             …             …
ARL               …           …             …             …
MRL               …           …             …             …

Deps: number of dependency edges. TDeps: number of transitive dependency edges. P: probability that, given two randomly chosen items, one depends (directly or indirectly) on the other, or vice versa. ARL: average number of items recompiled if one item is changed. MRL: median number of items recompiled if one item is changed.

Table 1. Statistics of the item-based and file-based dependencies for CoRN and MML
MML i t e m s CoRN/item: reverse dependencies 010002000300040005000600070008000900010000 0 1500 3000 4500 6000 7500 9000 i t e m s CoRN/file: reverse dependencies
Fig. 1.
Cumulative transitive reverse dependencies for
CoRN : file-based vs. item-based
Another interesting new statistic, given in Table 2, is the information about the number and structure of explicit and implicit dependencies that we have collected for Mizar. Explicit dependencies are anything that is already mentioned in the original text. Implicit dependencies are everything else, for example dependencies on type mechanisms. Note that the ratio of implicit dependencies is very significant, which suggests that handling them precisely can be quite necessary for the learning and ATP experiments conducted in the next section.

(The wikis mentioned above are available at http://mws.cs.ru.nl/mwiki/ and http://mws.cs.ru.nl/cwiki/ .)

Fig. 2. Cumulative transitive reverse dependencies for MML: file-based vs. item-based

        theorem  top-level lemma  definition  scheme  registration
from     550134            44120       44216    7053         58622
to       314487             2384      263486    6510        108449

Table 2. Statistics of Mizar direct dependencies from and to different kinds of items
6.2 Dependencies for Machine Learning and Automated Theorem Proving

The knowledge of how a large number of theorems are proved is used by mathematicians to direct their new proof attempts and theory developments. In the same way, the precise formal proof knowledge that we now have can be used for directing formal automated theorem proving (ATP) systems and meta-systems over the large mathematical libraries. In [3] we provide an initial evaluation of the usefulness of our MML dependency data for machine learning of such proof guidance of first-order ATPs.

These experiments are conducted on a set of 2078 problems extracted from the Mizar library and translated to first-order ATP format. We emulate the growth of the Mizar library (limited to the 2078 problems) by considering all previous theorems and definitions when a new conjecture is attempted (i.e., when a new theorem is formulated by an author, requiring a proof). The ATP problems thus become very large, containing thousands of the previously proved formulas as available axioms, which obviously makes automated theorem proving quite difficult; see e.g. [16] and [12] for details. We run the state-of-the-art Vampire-SInE [9] ATP system on these large problems, and solve 567 of them (with a 10-second time limit). Then, instead of attacking such large problems directly, we learn proof relevance from all previous fine-grained proof dependencies, using machine learning with a naive Bayes classifier. This technique works surprisingly well: in comparison with running Vampire-SInE directly on the large problems, the problems pruned by the trained machine learner can be proved by Vampire in 717 cases, i.e., the efficiency of the automated theorem proving is raised by about 30% when we apply the knowledge about previous proof dependencies. This is a very significant advance in the world of automated theorem proving, where the search complexity is typically superexponential.

In [2] we further leverage this automated reasoning technique by scaling the dependency analysis to the whole MML, and attempting a fully automated proof for every MML theorem. This yields the so-far largest number of fully automated proofs over the whole MML, allowing us (using the precise formal dependencies of the ATP and MML proofs) to attempt an initial comparison of human and automated proofs in general mathematics.
A particular practical use of fine dependencies (which initially motivated the work on Coq dependencies described in Section 3) is advanced interactive editing. tmEgg [10] is a TeXmacs-based user interface to Coq. Its main purpose is to integrate formal mathematics done in Coq into a more general document (such as course notes or a journal article) without forcing the document to follow the structure of the formal mathematics contained therein.

For example, it does not require that the order in which the mathematical constructs appear in the document be the same as the order in which they are presented to Coq. As one would expect, the latter must respect the constraints inherent to the incremental construction of the formal mathematics, such as a lemma being proven before it is used in the proof of a theorem, or a definition being made before the defined construct is used. However, the presentation the author would like to put in the document may not strictly respect these constraints. For example, clarity of exposition may benefit from first presenting the proof of the main theorem, making it clear how each lemma is useful, and only then going through all the lemmas. Or a didactic presentation of a subject may first go through some examples before presenting the full definitions of the concepts being manipulated.

tmEgg thus allows the mathematical constructs to appear in any order in the document, and uses the dependency information to dynamically and lazily load any construct necessary to perform the requested action. For example, if the requested action is "check the proof of this theorem", it will automatically load all definitions and lemmas used by the statement or proof of the theorem.

An interactive editor presents slightly different requirements than the batch recompilation scenario of a mathematical library described in Section 6.1. One such difference is that an interactive editor needs the dependency information, as part of the interactive session, for partial in-progress proofs. Indeed, if an in-progress proof depends on an item T, and the user wishes to change or unload (remove from the environment) T, then the part of the in-progress proof that depends on T has to be undone, even if the dependency is opaque.
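The lazy loading behavior described above can be sketched as a depth-first traversal of the dependency graph: before a construct is sent to the prover, everything it depends on is loaded first, and nothing already in the environment is loaded twice. The function and the `deps` mapping below are illustrative assumptions, not tmEgg's actual interface; dependency graphs of formal libraries are assumed acyclic.

```python
def load_with_dependencies(item, deps, loaded, load_action):
    """Lazily load `item` into the proof environment.

    `deps` maps each item to the items its statement or proof uses;
    `loaded` is the set of items already in the environment, so each
    construct is sent to the prover at most once.  Dependencies are
    loaded recursively before the item itself."""
    if item in loaded:
        return
    for d in deps.get(item, []):
        load_with_dependencies(d, deps, loaded, load_action)
    load_action(item)   # e.g., send the construct's text to Coq
    loaded.add(item)

# Checking main_thm pulls in its lemma and the underlying definition first.
deps = {"main_thm": ["lemma1", "def_group"], "lemma1": ["def_group"]}
load_order = []
load_with_dependencies("main_thm", deps, set(), load_order.append)
# load_order is now ["def_group", "lemma1", "main_thm"]
```

The same traversal, run in reverse over the recorded reverse dependencies, identifies which in-progress work must be undone when an item is changed or unloaded.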
The dependency tracking for Coq was actually started by the second author as part of the development of tmEgg, and this facility has already been integrated in the official release of Coq. It was since extended to be able to treat the whole of the CoRN library; these changes are not yet included in the official release of Coq.

Related Work
Related work exists in the first-order ATP field, where a number of systems can today output the axioms needed for a particular proof. Purely semantic (proof-object) dependencies have been extracted several times for several ITPs, for example by Bertot and the Helm project for Coq [5,13,4], and by Obua and McLaughlin for HOL Light and Isabelle. The focus of the latter two dependency extractions is on cross-verification, and they are based on quite low-level (proof-object) mechanisms. A higher-level semantic dependency exporter for HOL Light was recently implemented by Adams [1] for his work on HOL Light re-verification in HOL Zero. This could be usable as a basis for extending our applications to the core HOL Light library and the related large Flyspeck library. The Coq/CoRN approach quite likely scales easily to other large Coq libraries, such as the one developed in the Math Components project [6]. Our focus in this work is wider than the semantic-only efforts: we attempt to get the full information about all implicit mechanisms (including syntactic mechanisms), and we are interested in using the information for smart recompilation, which requires tracking much more than just the purely semantic or low-level information.
Conclusion

In this paper we have tried to show the importance and attractiveness of formal dependencies. We have implemented and used two very different techniques to elicit fine-grained proof dependencies for two very different proof assistants and two very different large formal mathematical libraries. This provides enough confidence that our approaches will scale to other important libraries and assistants, and that our techniques and the derived benefits will be usable in other contexts.

Mathematics is being increasingly encoded in a computer-understandable (formal) and in-principle-verifiable way. The results are increasingly large interdependent computer-understandable libraries of mathematical knowledge. (Collaborative) development and refactoring of such large libraries requires advanced computer support, providing fast computation and analysis of dependencies, and fast re-verification methods based on the dependency information. As such automated assistance tools reach greater and greater reasoning power, the cost/benefit ratio of doing formal mathematics decreases.

Given our previous work on several parts of this program, providing exact dependency analysis and linking it to the other important tools seems to be a straightforward choice. Even though the links to proof automation, fast large-scale refactoring, and proof analysis are very fresh, it is our hope that the significant performance boosts already sufficiently demonstrate the importance of good formal dependency analysis for formal mathematics, and for future mathematics in general.

By "higher-level" we mean tracking higher-level constructs, such as the use of theorems and tactics, not just tracking the low-level primitive steps done in the proof assistant's kernel.

References
1. Adams, M.: Introducing HOL Zero. In: Fukuda, K., van der Hoeven, J., Joswig, M., Takayama, N. (eds.) Mathematical Software - ICMS 2010. LNCS, vol. 6327, pp. 142–143. Springer (2010), http://dx.doi.org/10.1007/978-3-642-15582-6_25
2. Alama, J., Kühlwein, D., Urban, J.: Automated and human proofs in general mathematics: An initial comparison. In: LPAR. Lecture Notes in Computer Science, Springer (2012), accepted
3. Alama, J., Kühlwein, D., Tsivtsivadze, E., Urban, J., Heskes, T.: Premise selection for mathematics by corpus analysis and kernel methods. CoRR abs/1108.3446 (2011), http://arxiv.org/abs/1108.3446
4. Asperti, A., Padovani, L., Coen, C.S., Guidi, F., Schena, I.: Mathematical knowledge management in HELM. Ann. Math. Artif. Intell. 38(1-3), 27–46 (2003)
5. Bertot, Y., Pons, O., Pottier, L.: Dependency graphs for interactive theorem provers. Tech. rep., INRIA (2000), report RR-4052
6. Garillot, F., Gonthier, G., Mahboubi, A., Rideau, L.: Packaging mathematical structures. In: Nipkow, T., Urban, C. (eds.) TPHOLs. LNCS, vol. 5674. Springer (2009), http://hal.inria.fr/inria-00368403/en/
7. Grabowski, A., Korniłowicz, A., Naumowicz, A.: Mizar in a nutshell. Journal of Formalized Reasoning 3(2), 153–245 (2010), http://jfr.cib.unibo.it/article/view/1980/1356
8. Haftmann, F., Wenzel, M.: Constructive type classes in Isabelle. In: Altenkirch, T., McBride, C. (eds.) TYPES. Lecture Notes in Computer Science, vol. 4502, pp. 160–174. Springer (2006)
9. Hoder, K., Voronkov, A.: Sine qua non for large theory reasoning. In: CADE-23 (2011), to appear
10. Mamane, L., Geuvers, H.: A document-oriented Coq plugin for TeXmacs. In: Kauers, M., Kerber, M., Miner, R., Windsteiger, W. (eds.) MKM 2007 - Work in Progress. RISC Report, vol. 07-06, pp. 47–60. University of Linz, Austria (2007)
11. Matuszewski, R., Rudnicki, P.: Mizar: the first 30 years. Mechanized Mathematics and Its Applications 4, 3–24 (2005)
12. Meng, J., Paulson, L.C.: Lightweight relevance filtering for machine-generated resolution problems. J. Applied Logic 7(1), 41–57 (2009)
13. Pons, O., Bertot, Y., Rideau, L.: Notions of dependency in proof assistants. In: UITP 1998. Eindhoven University of Technology (1998)
14. Spitters, B., van der Weegen, E.: Type classes for mathematics in type theory. CoRR abs/1102.1323 (2011)
15. Urban, J.: MoMM - fast interreduction and retrieval in large libraries of formalized mathematics. International Journal on Artificial Intelligence Tools 15(1), 109–130 (2006)
16. Urban, J., Hoder, K., Voronkov, A.: Evaluation of automated theorem proving on the