Small, Fast, Concurrent Proof Checking for the lambda-Pi Calculus Modulo Rewriting
Michael Färber
Université Paris-Saclay, ENS Paris-Saclay, CNRS, Inria, LSV, 91190, Gif-sur-Yvette, France [email protected]
Abstract
Several proof assistants, such as Isabelle or Coq, can concurrently check multiple proofs. In contrast, the vast majority of today's small proof checkers either do not support concurrency at all or only limited forms thereof, restricting the efficiency of proof checking on multi-core processors. This work presents a small proof checker with support for concurrent proof checking, achieving state-of-the-art performance in both concurrent and non-concurrent settings. The proof checker implements the lambda-Pi calculus modulo rewriting, which is an established framework to uniformly express a multitude of logical systems. The proof checker is faster than the reference proof checker for this calculus, Dedukti, on all of five evaluated datasets obtained from proof assistants and interactive theorem provers.
Keywords: concurrency, performance, sharing, rewriting, reduction, verification, type checking, Dedukti, Rust
Perfection is attained not when there is nothing more to add, but when there is nothing more to remove.
— Antoine de Saint-Exupéry, Wind, Sand and Stars
Proof assistants are tools that provide a syntax to rigorously specify mathematical statements and their proofs, in order to mechanically verify them. A strong motivation to use proof assistants is to increase the trust in the correctness of mathematical results, such as the Kepler conjecture [17], which has been verified using the proof assistants HOL Light [21] and Isabelle
[38], and the Four-Colour Theorem [16], which has been verified using Coq [8]. However, why should we believe that a proof is indeed correct when a proof assistant says so? We might trust such a statement if we were certain that the proof assistant was correct, i.e. that the proof assistant only accepts valid proofs. To verify the correctness of the proof assistant, we can either inspect it by hand or verify it with another proof assistant in whose correctness we trust. However, many proof assistants are too complex and change too often to make such an endeavour worthwhile. Still, even if we ignore the correctness of a proof assistant, we may trust its statements, provided that the proof assistant justifies all statements in such a way that we can comprehend the justifications and write a program to verify them. A proof assistant “satisfying the possibility of independent checking by a small program is said to satisfy the de Bruijn criterion” [5]. We call such small programs proof checkers.

The logical framework Dedukti has been suggested as a universal proof checker for many different proof assistants [4]. Its underlying calculus, the lambda-Pi calculus modulo rewriting [13], is sufficiently powerful to efficiently express a variety of logics, such as those underlying the proof assistants HOL and Matita [3], PVS [15], and the B method [18].

The Dedukti theories generated by proof assistants and automated theorem provers can be in the order of gigabytes and take considerable amounts of time to verify. The current architecture of Dedukti allows only for a restricted form of concurrent proof checking, restricting the efficiency of proof checking on multi-core processors. Like Dedukti, most other existing small proof checkers do not exploit multiple cores.
This article deals with the following question: how to implement a small and efficient concurrent proof checker, and how much time can be saved using such a checker? This work sets out to answer the research question by reimplementing a small, yet expressive subset of the Dedukti kernel in the programming language Rust. The resulting kernel allows for concurrent proof checking, compromising neither conciseness of the code nor performance. The proof checker built around this kernel is faster than Dedukti on all of five evaluated datasets. To the best of my knowledge, this is also the first proof checker that is both small in the sense of the de Bruijn criterion as well as concurrent.

The contributions of this work are the following: the presentation of an architecture allowing for concurrent proof checking, the implementation of a proof checking kernel for aforementioned architecture, and an evaluation of its performance with benchmarks derived from automated and interactive theorem provers.

I introduce here the lambda-Pi calculus modulo rewriting. Saillard's PhD thesis presents it in more detail [30]. A term t ∈ 𝒯 has the shape

t ≔ c | s | t u | x | λx : t. u | Πx : t. u,

where c ∈ 𝒞 is a constant, s ≔ Type | Kind is a sort, t and u are terms, and x is a bound variable. We denote the substitution of x in t by u as t[u/x]. A rewrite pattern p has the shape p ≔ x | c p₁ … pₙ, where x is a bound variable, c ∈ 𝒞 is a constant, and p₁ … pₙ is a potentially empty sequence of rewrite patterns applied to c. A rewrite rule r ∈ ℛ has the shape r ≔ c p₁ … pₙ ↪ t, where we call c p₁ … pₙ the left-hand side, t the right-hand side, and c the head symbol of r. The free variables of the right-hand side are required to be a subset of the free variables of the left-hand side, i.e. ⋃ᵢ ℱ𝒱ar(pᵢ) ⊇ ℱ𝒱ar(t).
A signature for a set of constants 𝒞 is a pair (T, R) ∈ (𝒞 → 𝒯) × (𝒞 ↛ 𝒫(ℛ)), where T c returns the type of the constant c and R c returns the set of rewrite rules having c as head symbol. Here, R is a partial function with the property that c ∈ dom R iff we want to allow the addition of rewrite rules having c as head symbol. Restricting the rewritability of a constant is useful to assure that it is injective.

A context Γ is a function from variables to terms. For a signature Σ = (T, R), we say that Σ ⊢ c : A if T c = A, and Σ ⊢ t ↪ u if (t ↪ u) ∈ R c, where c is the head symbol of t. Furthermore, we say that for a context Γ, Γ ⊢ x : A if Γ x = A. We say that the term t has the type A under the signature Σ if we can find a derivation of Σ, Γ ⊢ t : A using the rules in Figure 1 [adapted from 30, Figure 2.4], where Γ is a context with an empty domain.

We can beta-reduce terms via (λx. t) u →β t[u/x]. Additionally, we have a reduction rule t →γΣ u, which is admissible when there is a term rewrite rule Σ ⊢ t′ ↪ u′ and a substitution σ, so that σt′ = t and σu′ = u. (To simplify the presentation, I only introduce first-order rewriting. Note that Dedukti uses higher-order rewriting [26].)
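As a concrete illustration of substitution and a β-step, here is a minimal Rust sketch on a toy term type with named variables. The type and helper names are hypothetical and not taken from Dedukti or Kontroli; a real checker would use e.g. de Bruijn indices to avoid variable capture.

```rust
// Toy term type with named variables; for illustration only.
#[derive(Clone, Debug, PartialEq)]
enum Term {
    Const(String),
    Var(String),
    App(Box<Term>, Box<Term>),
    Abs(String, Box<Term>), // λx. t (type annotation omitted)
}

// t[u/x]: replace free occurrences of x in t by u.
// Naive: assumes u is closed, so no variable capture can occur.
fn subst(t: Term, x: &str, u: &Term) -> Term {
    match t {
        Term::Var(y) if y == x => u.clone(),
        Term::Abs(y, body) if y != x => Term::Abs(y, Box::new(subst(*body, x, u))),
        Term::App(f, a) => Term::App(Box::new(subst(*f, x, u)), Box::new(subst(*a, x, u))),
        // constants, variables other than x, and abstractions binding x (shadowing)
        other => other,
    }
}

// One β-step at the root: (λx. t) u →β t[u/x].
fn beta(t: Term) -> Term {
    match t {
        Term::App(f, u) => match *f {
            Term::Abs(x, body) => subst(*body, &x, &u),
            f => Term::App(Box::new(f), u),
        },
        other => other,
    }
}

fn main() {
    // (λx. x) c →β c
    let c = Term::Const("c".to_string());
    let t = Term::App(
        Box::new(Term::Abs("x".to_string(), Box::new(Term::Var("x".to_string())))),
        Box::new(c.clone()),
    );
    assert_eq!(beta(t), c);
}
```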
Type:
    Σ, Γ ⊢ Type : Kind

Application:
    if Σ, Γ ⊢ t : Πx : A. B and Σ, Γ ⊢ u : A, then Σ, Γ ⊢ t u : B[u/x]

Abstraction:
    if Σ, Γ{x → A} ⊢ t : B and Σ, Γ ⊢ Πx : A. B : s, then Σ, Γ ⊢ λx : A. t : Πx : A. B

Product:
    if Σ, Γ ⊢ A : Type and Σ, Γ{x → A} ⊢ t : s, then Σ, Γ ⊢ Πx : A. t : s

Conversion:
    if Σ, Γ ⊢ t : A, Σ, Γ ⊢ B : s, and A ∼Σ B, then Σ, Γ ⊢ t : B

Figure 1. Inference rules.

Let →Σ = →β ∪ →γΣ be our reduction relation. We say that two terms t, u are Σ-convertible, i.e. t ∼Σ u, when there exists a term v such that t →*Σ v and u →*Σ v. Type inference determines a unique term A for a term t and a signature Σ such that Σ ⊢ t : A. Type checking verifies for terms t and A and a signature Σ whether Σ ⊢ t : A. If the reduction relation →*Σ is type-preserving, terminating, and confluent, then type inference and type checking terminate [30, Theorem 6.3.1].

In mathematics, a theory can be seen as a directed graph of definitions, axioms, and theorems. In contrast to this, I consider a theory to be a sequence of commands, which I will introduce in this section. In the following, I explain how to verify a theory.

To verify a theory, it suffices to treat the sequence of its commands one by one. The set of constants 𝒞 and the signature Σ for 𝒞 capture the state of the theory verification, i.e. the essence of all previously treated commands. We start from an empty set of constants 𝒞 = ∅ and an initial signature (T, R) with
T c = ⊥ for all c and dom R = ∅. For every command, we update the constants 𝒞 and the signature (T, R). The theory is verified if this process does not fail at any step. I will now define commands, before showing how to update the current state with a command.

A command either introduces a new constant or adds a rewrite rule. A new constant c can be introduced in three different ways: (a) A declaration c : A introduces an injective constant c of type A. (b) A definition introduces a constant c with either a type A (c : A), a defining term t (c ≔ t), or both (c : A ≔ t). Only constants introduced by definition may occur as head symbol of the left-hand side of rewrite rules. If t is given, then occurrences of c in subsequent commands are replaced by t, and if A is additionally given, then the type of t must be convertible with A. (c) A theorem c : A ≔ t introduces a constant c of type A with a proof term t, where the type of t must be convertible with A and occurrences of c in subsequent commands are not replaced by t; that is, t is an opaque proof. (The implementations of the calculus optionally eta-reduce terms via λx. t x →η t.)

Example 2.1.
Consider the following theory about implication: The first two commands declare prop : Type and imp : prop → prop → prop, where t → u is notation for Πx : t. u. The next command defines prf : prop → Type. Having defined (rather than declared) prf allows the next command to add a rewrite rule for prf, namely prf (imp x y) ↪ prf x → prf y. The last command then introduces a theorem imp_refl : Πx : prop. prf (imp x x) ≔ λx : prop. λp : prf x. p.

I now introduce a new concept to abstract from the several ways by which constants can be introduced. A typing is a triple (c, A, t) ∈ 𝒞 × 𝒯 × (𝒯 ∪ {□}), asserting c : A when t = □, otherwise c : A ≔ t. We infer a typing from declarations, definitions, and theorems with a signature Σ as follows: (a) A declaration or definition c : A becomes a typing (c, A, □) if Σ ⊢ A : Kind or Σ ⊢ A : Type. (b) A definition or theorem c : A ≔ t becomes a typing (c, A, t) if Σ ⊢ A : A′ for some A′ and A ≠ Kind. (c) A definition c ≔ t becomes a typing (c, A, t) if Σ ⊢ t : A and A ≠ Kind. We check a typing with Σ by verifying that either t = □ or Σ ⊢ t : A. Note that checking a typing can only fail when the typing was inferred from a definition or theorem c : A ≔ t.

Now we can finally state how to update the current state with a new command. That is, given some command, how to update a set of constants 𝒞 and a signature Σ = (T, R) for 𝒞 to a set of constants 𝒞′ and a signature Σ′ = (T′, R′) for 𝒞′? For this, we distinguish the two kinds of commands.

If the command introduces a new constant c, then we fail if c ∈ 𝒞, otherwise we infer and check the typing c : A ≔ t with Σ and set 𝒞′ = 𝒞 ∪ {c} as well as T′ = T{c → A}. If the command does not introduce c by a definition, then R′ = R, otherwise we initialise the set of rewrite rules for c by R′ = R{c → ∅} if t = □ and R′ = R{c → {c ↪ t}} otherwise.

If the command adds a rewrite rule r ∈ ℛ with a head symbol c, then we keep 𝒞′ = 𝒞 and T′ = T.
Furthermore, we add the new rewrite rule r to the existing rules for c by R′ = R{c → R c ∪ {r}}. Note that this fails if c ∉ dom R; that is, if c was not introduced via a definition. (To assure termination of type inference and checking, it is necessary to assure subject reduction, termination, and confluence of rewrite rules. However, even a theory verifier that does not verify these properties can be useful, because many theories add only a fixed number of rewrite rules. This makes it feasible to establish aforementioned properties for such theories by other means.)
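The state-update rules above can be sketched in Rust as follows. All types and names here are hypothetical illustrations, not Kontroli's actual API: terms are left as opaque strings, and left-hand-side patterns of rules are omitted.

```rust
use std::collections::{HashMap, HashSet};

type Constant = String;

#[derive(Clone, PartialEq, Debug)]
struct Term(String); // placeholder term representation

struct Rule {
    head: Constant,
    rhs: Term, // lhs pattern omitted for brevity
}

#[derive(Default)]
struct State {
    constants: HashSet<Constant>,     // 𝒞
    types: HashMap<Constant, Term>,   // T
    rules: HashMap<Constant, Vec<Rule>>, // R (partial: dom R = rewritable constants)
}

impl State {
    // Introduce a constant; `definable` is true iff it is introduced by a definition.
    fn intro(&mut self, c: Constant, ty: Term, definable: bool) -> Result<(), String> {
        if !self.constants.insert(c.clone()) {
            return Err(format!("constant {c} already introduced"));
        }
        self.types.insert(c.clone(), ty);
        if definable {
            self.rules.insert(c, Vec::new()); // allow future rewrite rules for c
        }
        Ok(())
    }

    // Add a rewrite rule; fails if the head symbol is not in dom R.
    fn add_rule(&mut self, r: Rule) -> Result<(), String> {
        match self.rules.get_mut(&r.head) {
            Some(rs) => {
                rs.push(r);
                Ok(())
            }
            None => Err(format!("{} is not rewritable", r.head)),
        }
    }
}

fn main() {
    let mut s = State::default();
    s.intro("prop".into(), Term("Type".into()), false).unwrap();
    s.intro("prf".into(), Term("prop -> Type".into()), true).unwrap();
    // adding a rule for a defined constant succeeds
    assert!(s.add_rule(Rule { head: "prf".into(), rhs: Term("rhs".into()) }).is_ok());
    // adding a rule for a declared (injective) constant fails
    assert!(s.add_rule(Rule { head: "prop".into(), rhs: Term("rhs".into()) }).is_err());
    // re-introducing a constant fails
    assert!(s.intro("prop".into(), Term("Type".into()), false).is_err());
}
```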
Listing 1. Structural and physical equality in OCaml.

let a = Some(0) in
let b = a in
let c = Some(0) in
assert (a = b);
assert (b = c);
assert (a == b);
assert (not (b == c));
Listing 2. Structural and physical equality in Rust.

let a = Rc::new(Some(0));
let b = a.clone();
let c = Rc::new(Some(0));
assert!(a == b);
assert!(b == c);
assert!(Rc::ptr_eq(&a, &b));
assert!(!Rc::ptr_eq(&b, &c));
There are two closely related technical subjects relevant to this work: sharing and concurrency.

Sharing enables multiple references to the same memory region. We call such references physically equal. Sharing and physical equality are exploited in Dedukti; for example, we consider physically equal terms to be convertible. In many garbage-collected programming languages, such as Haskell and OCaml, sharing is implicit, i.e. members of any type may be shared, whereas in many programming languages without garbage collector, such as C++ and Rust, sharing is explicit, i.e. only members of special types are shared. Such special types include C++'s shared_ptr and Rust's Rc. To check for physical equality in Rust, we need to explicitly wrap objects with a type such as Rc (Listing 2), whereas in OCaml, such wrapping is implicit (Listing 1).

One common technique to manage memory of shared objects is reference counting: A reference-counted object keeps a counter to register how often it is referenced. Whenever a reference to an object is created, its counter is increased, and whenever a reference to an object goes out of scope, its counter is decreased. Finally, when an object's counter turns zero, the object is freed.

While reference counting is efficient in single-threaded scenarios, it can pose performance problems in multi-threaded scenarios: When a reference-counted object is shared between multiple threads, its counter has to be modified atomically, to ensure that multiple concurrent modifications to the counter do not interfere. Non-atomic modifications can result in memory corruption (a counter turning 0 despite references still pending) and memory leaks (a counter remaining greater than 0 despite no references left). However, atomic modifications imply a significant runtime overhead. We call data structures that can be safely shared between threads thread-safe.

Figure 2. Venn diagram of common Rust pointer types (&, Box, Rc, Arc) and their properties (thread-safe, fast, shared).

Unlike implicitly sharing languages, explicitly sharing languages allow us to minimise concurrency overhead by choosing appropriate types for sharing. In Rust, wrapping objects with different smart pointer types marks them as either shareable only within one thread (Rc, i.e. reference-counted), shareable between multiple threads (Arc, i.e. atomically reference-counted), or not shareable at all (
Box). Any of these smart pointer types has two out of three properties: thread-safety (Box, Arc), sharing (Rc, Arc), and performance (Box, Rc); see Figure 2. In addition, we have a non-smart pointer type, namely references (&), which has all three desiderata mentioned above, but requires us to prove that it points to a valid object. In summary, for concurrent type checking, we need to carefully choose our pointer types, as this choice has a direct impact on performance.
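For instance, a term shared between threads must be wrapped in Arc. The following sketch (illustrative names, not Kontroli's code) clones an Arc into several threads, each clone incrementing the reference counter atomically; replacing Arc by Rc here would be rejected by the compiler, because Rc is not Send.

```rust
use std::sync::Arc;
use std::thread;

fn main() {
    // A "term" shared between four threads: Arc's counter is updated atomically.
    let term = Arc::new(String::from("prop -> prop"));
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let t = Arc::clone(&term); // atomic increment of the reference counter
            thread::spawn(move || t.len()) // each thread reads the shared term
        })
        .collect();
    for h in handles {
        assert_eq!(h.join().unwrap(), 12);
    }
    // Replacing Arc by Rc would not compile: Rc is not `Send`,
    // so the compiler rejects moving it into `thread::spawn`.
}
```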
(Rust is a memory-safe language, so unlike e.g. C/C++, the compiler throws an error if we attempt to use a reference pointing to an invalid object. This protects against a large class of memory-related bugs.)

Concurrent verification designates the simultaneous verification of different parts of a theory. Following Wenzel's terminology [37], concurrency can happen at different levels of granularity. I distinguish concurrent verification on the level of theories (granularity 0) and on the level of commands/proofs (granularity 1). (Wenzel gives yet another level of granularity, namely sub-proofs. However, there is no concept of sub-proofs in Dedukti.)
Figure 3. Theory dependency graph of Fermat's little theorem in Matita, encoded in STTfa.
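Granularity-0 concurrency schedules whole theories by their dependencies. A minimal sketch of this idea (hypothetical helper, not Kontroli's code) groups theories into "levels" such that all theories in one level only depend on earlier levels and can therefore be verified concurrently; the edges below are an invented fragment for illustration only.

```rust
use std::collections::HashMap;

// Kahn-style levelling: theories in the same level have all their
// dependencies in earlier levels, so they can be verified concurrently.
fn levels(deps: &HashMap<&str, Vec<&str>>) -> Vec<Vec<String>> {
    let mut remaining = deps.clone();
    let mut out = Vec::new();
    while !remaining.is_empty() {
        // ready: all dependencies already scheduled in earlier levels
        let ready: Vec<&str> = remaining
            .iter()
            .filter(|(_, ds)| ds.iter().all(|d| !remaining.contains_key(d)))
            .map(|(t, _)| *t)
            .collect();
        assert!(!ready.is_empty(), "dependency cycle");
        for t in &ready {
            remaining.remove(t);
        }
        let mut level: Vec<String> = ready.iter().map(|s| s.to_string()).collect();
        level.sort(); // deterministic output for illustration
        out.push(level);
    }
    out
}

fn main() {
    // Tiny invented fragment of a dependency graph.
    let deps = HashMap::from([
        ("nat", vec![]),
        ("exp", vec!["nat"]),
        ("bigops", vec!["nat"]),
        ("sigmapi", vec!["bigops"]),
    ]);
    let ls = levels(&deps);
    assert_eq!(ls[0], vec!["nat"]);
    assert_eq!(ls[1], vec!["bigops", "exp"]); // verifiable concurrently
    assert_eq!(ls[2], vec!["sigmapi"]);
}
```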
Theory-concurrent verification exploits the fact that we can divide a theory into smaller theories, as long as the theory dependencies form a directed acyclic graph. To verify a theory, all of its (transitive) dependencies must be verified before. Theories that do not transitively depend on each other can be checked concurrently.

An example of a theory dependency graph is shown in Figure 3 for a formalisation of Fermat's little theorem in Matita. The "breadth" of the graph determines the maximum number of theories that can be concurrently verified; for example, for Figure 3 we can verify at most six theories concurrently, namely exp, bigops, gcd, cong, fact, and permutation.

Theory-concurrent verification can be implemented by launching a verification process for every theory, producing for every theory a signature that contains the typings and rewrite rules contained in that theory. To verify a theory, it is necessary to load the signatures of the theory's dependencies. As loading of signatures comes with some overhead, dividing a theory into smaller theories increases the number of theories that can be verified concurrently, at the cost of the individual theories taking longer to verify.

Command-concurrent verification allows for the concurrent verification of commands regardless of the theory structure.

Listing 3. Definition of a term type for parsing.

enum PTerm {
    Constant(String),
    Application(Box<PTerm>, Box<PTerm>),
}
In this section, I show how to concretely implement the abstract theory verification procedure introduced in subsection 2.2 such that commands can be verified concurrently.
The central data structure of our proof checker is the term. Let us have a closer look at how terms are defined. See subsection 2.3 for an explanation of the pointer types used here.

There are several term transformations during proof checking. It turns out to be beneficial to define multiple term types corresponding to these transformations. The first of these term types, called PTerm, is produced by the parser. As parsing does not perform any sharing, it makes sense to make PTerm unshared, which will allow it to be efficiently sent across threads. A simplified version of this type, restricted to constants and binary application, is shown in Listing 3. We can see that PTerm represents constants by Strings, which are unshared. Furthermore, in the application, we see that PTerm is wrapped in the pointer type Box, which is also unshared. As a result, PTerm as a whole is unshared.

It turns out that all term types we need can be defined by abstracting over the type of constants C (String in PTerm) and the type of term pointers T (Box<PTerm> in PTerm); see the abstract term type in Listing 4, which can be instantiated to define PTerm as well as two other types, STerm and TTerm.

Listing 4. Definition of term types for parsing, scoping, and typing, using an abstract term type.

enum Term<C, T> {
    Constant(C),
    Application(T, T),
}
struct PTerm(Term<String, Box<PTerm>>);
struct STerm<'c>(Term<&'c str, Box<STerm<'c>>>);
struct TTerm<'c>(Term<&'c str, Ptr<TTerm<'c>>>);

STerm (for "scoped term") is a term type that uses &str, i.e. string references, instead of
String to represent constants. String references are a natural choice here because comparing and hashing &str (which are frequent operations during typing) takes constant time and dereferencing &str is always guaranteed to yield a string. Furthermore, using &str instead of e.g. Rc<String> avoids reference-counting overhead.
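A minimal sketch (hypothetical names, not Kontroli's scoping code) of how a set of owned Strings can hand out &str references, so that two occurrences of the same constant become physically equal:

```rust
use std::collections::HashSet;

// A set of owned constant names that hands out string references.
struct Symbols(HashSet<String>);

impl Symbols {
    fn insert(&mut self, s: &str) {
        self.0.insert(s.to_string());
    }
    // Return the canonical &str for a constant, if it was introduced.
    fn get(&self, s: &str) -> Option<&str> {
        self.0.get(s).map(|s| s.as_str())
    }
}

fn main() {
    let mut syms = Symbols(HashSet::new());
    syms.insert("imp");
    let a = syms.get("imp").unwrap();
    let b = syms.get("imp").unwrap();
    // Both references point into the same allocation, so scoping two
    // occurrences of "imp" yields physically equal constants.
    assert!(std::ptr::eq(a, b));
    assert_eq!(a, "imp");
}
```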
TTerm is a term type that is used during typing, i.e. type inference and type checking. As these operations reduce terms, it is beneficial to allow for the sharing of TTerms. However, we would like to use variants of TTerm that use the efficient, thread-unsafe pointer type Rc when typing using a single thread and the slower, thread-safe pointer type Arc when typing using multiple threads. This can be achieved by generating two versions of the kernel, where in one, Ptr is substituted by Rc and in the other, Ptr is substituted by Arc. This generates one kernel version for sequential and one for concurrent scenarios.

Having defined the basic term types, we can define other data structures that contain terms. In particular, we define PCommand and SCommand to be commands that contain PTerms and STerms, respectively.

(If we mapped constants to, say, integers instead of string references, which have similar performance characteristics, we would need to manually maintain the invariant that every integer represents a valid constant. This can be avoided by using string references, at the price of needing to assure their lifetime. A more elegant and flexible solution might be enabled by the implementation of generic associated types (similar to higher-kinded types) in Rust, allowing for the definition of an abstract type Term that is generic over its pointer type.)
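The substitution of Ptr by Rc or Arc could, for instance, be realised with a compile-time switch. The sketch below assumes a hypothetical Cargo feature named "concurrent" and is not necessarily how Kontroli selects its pointer type.

```rust
// One kernel source, two compiled versions: with the (hypothetical)
// Cargo feature "concurrent" enabled, Ptr is Arc; otherwise it is Rc.
#[cfg(feature = "concurrent")]
use std::sync::Arc as PtrImpl;
#[cfg(not(feature = "concurrent"))]
use std::rc::Rc as PtrImpl;

type Ptr<T> = PtrImpl<T>;

fn main() {
    // Kernel code is written once against Ptr, regardless of the feature.
    let t = Ptr::new("prop");
    let u = Ptr::clone(&t); // shared: both point to the same allocation
    assert!(Ptr::ptr_eq(&t, &u));
}
```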
In the command processing flow shown in Figure 4, plain nodes (e.g. PCommand) represent temporary data relevant to the current run of the procedure, boxed nodes (the current "File", the set of constants 𝒞, and the signature Σ) represent data persisting between runs of the procedure, and unboxed nodes (e.g. "parse") represent functions.

The procedure performs the following: First, we parse a command from the current position of the file and update the current position. Parsing yields a PCommand. We then scope the PCommand, which assures that all unbound symbols occurring in the PCommand's terms or rewrite rules were previously introduced into the set of constants. Scoping yields an SCommand, in which identical constants are mapped to physically equal objects. After scoping, we perform the procedure outlined in subsection 2.2, by distinguishing the type of the SCommand: If the command adds a rewrite rule (the Rule case), we add the rewrite rule to the signature Σ. If the command introduces a new constant c (the Intro case), we add c to the set of constants 𝒞. We then convert any STerm inside the command to a TTerm, in order to infer and check a typing using Σ. Finally, we add the typing to Σ.

We are now going to illustrate how the procedure in Figure 4 deals with a sequence of commands. To simplify the resulting image, let us assume that we only deal with commands that introduce constants. This results in the lattice-like graph shown in Figure 5. This lattice shows the flow of the persistent data structures ("File", 𝒞, Σ) from top to bottom and the flow of temporary data structures (PCommand, SCommand, Typing) from left to right. To obtain an item at an outgoing arrow, all incoming arrows have to be (transitively) satisfied first; for example, to obtain the second SCommand, it is necessary to first have obtained the second PCommand and the constants resulting from the first PCommand. However, the second SCommand requires neither the first SCommand nor the first Typing.

A crucial detail is the special handling of infer and check: The signature update is handled uniquely by infer and does not depend on check, which makes it possible to defer checking. In the remainder of this section, I will show how to exploit this to parallelise checking.
The lattice in Figure 5 specifies dependencies between data; however, it does not impose a particular order of operations. For this, I have selected three orders in Figure 6.

The first order in Figure 6a corresponds to single-threaded execution, which is what is implemented in Dedukti. The leftmost parse block has to be supplied with a file, a set of constants, and a signature. It passes these three components between its succeeding scope/infer/check operations, which have to succeed before a new command is treated by the next connected block, which is fed with the updated file, constants, and signature.

The second order in Figure 6b corresponds to two-threaded execution, where parsing is separated from the rest. Parsing is a natural candidate because it is the largest operation that can be separated from the others without requiring slow thread-safe data structures. This is due to the fact that the output of parsing, namely PCommand, can be transferred safely between threads due to being unshared, whereas other data structures, such as Typing, cannot be transferred between threads without overhead due to being shared. Figure 6b shows that parsing is performed at the same time as scope/infer/check, and the outputs of the parse thread (PCommands) are passed to the main thread. Here, communication across threads is indicated by lines ending in arrows.

The third order in Figure 6c corresponds to multi-threaded execution scaling to an arbitrary number of threads. This cuts the main thread shown in Figure 6b into two parts, namely scope/infer and check. It is safe to do this because the main thread does not depend on the result of the check thread, other than reporting errors from it. Usually, the check function is more expensive than the scope and infer functions, so it makes sense to move it to its own thread. However, as discussed before, this comes at a cost: Because we move a shared data structure (Typing) between threads, we have to use thread-safe terms in all data structures used by the kernel, including the signature.
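The two-threaded strategy of Figure 6b can be sketched with a channel over which the parse thread sends unshared commands to the main thread. The types below are simplified stand-ins, not Kontroli's actual definitions.

```rust
use std::sync::mpsc;
use std::thread;

// Stand-in for PCommand: unshared, hence cheaply `Send`able across threads.
struct PCommand(String);

fn main() {
    let (tx, rx) = mpsc::channel();

    // "Parse thread": produces commands one by one.
    let parser = thread::spawn(move || {
        for line in ["prop : Type", "imp : prop -> prop -> prop"] {
            tx.send(PCommand(line.to_string())).unwrap();
        }
        // dropping tx closes the channel, ending the receiver loop below
    });

    // "Main thread": scope/infer/check each command as it arrives.
    let mut processed = 0;
    for PCommand(cmd) in rx {
        assert!(!cmd.is_empty()); // placeholder for scope/infer/check
        processed += 1;
    }
    parser.join().unwrap();
    assert_eq!(processed, 2);
}
```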
Using thread-safely shared data structures implies a considerable overhead compared to thread-unsafely shared ones, as I will show in section 6. I will now discuss a few ideas to avoid the usage of thread-safe sharing in the future.

The most critical point for concurrency is between scoping and typing. Typing operations operate on shared data structures. If we store shared data structures in the main thread, then they have to be thread-safely shared in order to enable concurrency. This, as shown in section 6, brings a considerable overhead. As the signature itself usually contains mostly entries whose size is linear in the size of the input (except for types that are inferred from a definition), it seems natural to use unshared structures for the signature. However, we then have to
Figure 4. The command processing flow.
Figure 5. The introduction command processing lattice.
(a) Sequential processing. (b) Parallel processing with two threads (parse thread and main thread). (c) Parallel processing with more than two threads (parse thread, main thread, and check threads).
Figure 6.
Execution strategies.

convert between unshared and shared structures when typing. Doing this whenever an entry from the signature is retrieved is quite costly and outweighs the savings from avoiding thread-safe sharing.

An alternative is to cache the signature entries for constants we have seen during typing of one command, and to reuse the cached shared data structures afterwards. In my experiments, this also turned out to be more costly than using thread-safe sharing. One future idea is to cache shared signature entries persistently across commands. However, this can significantly increase memory consumption, because in the worst case, all check threads keep the whole signature in memory. This could be remedied by smarter schemes, such as a cache with fixed size, removing typing info for constants not seen for a certain time. Also, this complicates the structure of the proof checker, because it requires keeping a per-thread data structure, which requires a more complicated program structure than the currently used map-reduce style processing.

One might be tempted to use only a single, global signature and to pass it to all check threads. However, at this point it becomes important to recall that the simplification we used so far does not consider rewrite rules. Rewrite rules can change typing behaviour, so if we were to use a single global signature, then adding a rewrite rule might influence checking of commands given before the rewrite rule. To prevent such situations, we duplicate the signature whenever we pass it to a check thread. This is an inexpensive operation, as we use for the signature a functional hash map that can be copied with minimal overhead.

Could we go further, by dividing between multiple threads the verification of a single command? To explore this question, I parallelised substitution and pattern matching, two of the most frequently used operations during type checking.
In particular, due to σ(f t₁ … tₙ) = f (σt₁) … (σtₙ), we can calculate multiple σtᵢ in parallel and then merge the results. The problem here is that some σ functions we use are thread-unsafe. In particular, the evaluation of terms uses a σ that maps variables to shared lazy terms that are evaluated from a shared mutable state. Because this σ is normally not shared between threads (even when multiple check operations are executed in parallel), the data type of σ contains four instances of thread-unsafe types (for sharing, lazy evaluation, again sharing, and mutation). To parallelise this σ, it is thus necessary to replace each of these types by a slower thread-safe version. To parallelise matching, i.e. to check for multiple rewrite patterns in parallel whether they match a shared lazy term, the same type replacement as for parallelising substitution is required. As a result, the type checking performance with either parallelised substitution or parallelised matching is far below the performance of single-threaded execution.

I implemented a small, parallel verifier named
Kontroli, supporting all execution strategies shown in section 4. It is implemented in Rust, a functional programming language that forgoes a garbage collector in favour of RAII [23].

Kontroli tries to satisfy the following properties:

∙ Minimality: The size of a proof checker is relevant for our trust in its correctness, as a smaller code base can be more easily evaluated and understood than a larger one. Furthermore, as can be observed from other systems with small kernels, such as the proof assistant HOL Light [21] or the theorem prover leanCoP [28], such systems lend themselves well to modifications.

∙ Concurrency: The kernel of Kontroli is concurrency-agnostic, that is, it does not presuppose its usage in either concurrent or non-concurrent settings. This allows the kernel to perform without overhead in a non-concurrent context, while allowing at the same time to be used concurrently.

∙ Performance: The theories exported from proof assistants and automated theorem provers can be large, with several gigabytes not being an exception. It is therefore important to process data efficiently in order to reduce the waiting time for users.

∙ Safety: Imperative programming languages such as C/C++ often offer performance at the expense of (memory) safety, whereas functional programming languages such as Haskell or OCaml often offer safety at the expense of performance. However, for a proof checker operating on large amounts of data, both properties are highly desirable. The Rust programming language is a hybrid between imperative and functional programming, offering both performance and memory safety.

∙ Compatibility: In order to use the many datasets available for Dedukti, Kontroli aims to be compatible with Dedukti.

We will now look at the structure of Kontroli, see Figure 7. Kontroli is divided into two parts: a library and a command-line program, where the library is divided into a prekernel and a kernel.
(The name "Kontroli" is Esperanto for "to check", "to verify".) The division between these parts obeys the following principles: the library is the largest possible part of the type checker that contains only I/O-free functions, and the prekernel is the largest possible part of the library that is free of shareable smart (i.e. reference-counted) pointers. Consequentially,
Figure 7. The structure of Kontroli (General 497 LOC, Parsing 313 LOC, Scoping 213 LOC, Typing 642 LOC, Program 389 LOC), grouped into prekernel, kernel, and library.

Figure 8. Lines of code of different Kontroli components (Reduction 263, Typing 170, Substitution 65, Convertibility 52, Terms 48, Sharing 26, API 18).

the kernel is the smallest possible part of the library dealing with shareable smart pointers, and the program is the smallest possible part of the type checker dealing with I/O. The I/O-freeness of the library is verified by the Rust compiler (using a dedicated keyword). This allows the library to be used in restricted environments, such as web applications. The fact that the prekernel uses no shareable smart pointers can be syntactically established. This means that Rust allows us to come up with and to enforce clear boundaries between the different parts of the type checker.

What functionalities does the library provide? The prekernel implements parsing and scoping of terms and commands. The kernel implements type inference & checking, lazy evaluation via an abstract machine [2], substitution, convertibility checking, and first-order rewriting, whose respective sizes are shown in Figure 8. As described in section 4, the kernel is realised as a functor-like module that has a hole for a shared pointer type
Ptr, allowing us to instantiate the kernel with both thread-safe and thread-unsafe pointer types. The Kontroli kernel consists of 642 lines of code, whereas the Dedukti kernel consists of 3470 lines. The Kontroli prekernel consists of 526 lines of code, whereas the Dedukti prekernel consists of 780 lines. Dedukti was obtained from https://github.com/Deducteam/Dedukti, rev. 38e0c57. Kontroli was obtained from https://github.com/01mf02/kontroli-rs, rev. 5fb49d8. Lines of code include neither comments nor blank lines. I used Tokei 11.0.0 to count the lines for
To achieve its small size, Kontroli omits several features present in Dedukti:

∙ Higher-order rewriting, that is, rewrite patterns of the shape 𝜆𝑥.𝑝 [26].
∙ Matching modulo AC [12].
∙ Rewrite rule verification: Several properties of the calculus demand that rewrite rules added as part of a theory preserve confluence, termination, and types. While these properties can be difficult to verify, there are many interesting theories for which the rewrite rules are fixed and few, compared to the total size of the theory. For this reason, it makes sense to omit the verification of rewrite rule properties from the type checker kernel, while leaving open the possibility to extend the type checker by appropriate checks outside of the kernel.
∙ Evaluation & Assertion: Dedukti has built-in commands to output the weak-head and strong normal forms of terms, as well as to check the convertibility of terms. These functionalities are implemented in the kernel, although they are not strictly necessary for proof checking. We can emulate the convertibility check command in Kontroli: to verify that two terms 𝑡, 𝑢 : 𝐴 are convertible, it suffices to declare Eq : 𝑥 : 𝐴 → Type and eq : 𝑥 : 𝐴 → Eq 𝑥, before checking that eq 𝑡 has the type Eq 𝑢.

Furthermore, Kontroli does not implement several optimisations present in Dedukti, one of them being decision trees: decision trees accelerate the matching of terms in the presence of many rewrite rules [22]. On the other hand, for the theories we consider in this article, decision trees are not strictly necessary for performance.

I am now going to evaluate the performance of Dedukti and several execution orders of Kontroli. The evaluation
Kontroli by tokei src/kernel and tokei src/pre, and for Dedukti by tokei kernel and tokei parsing && sed '/ˆ\s*$/d' parsing/*.ml{l,y} | wc -l (summing the results for the last command). was performed on a cluster with 56 2.2 GHz Intel Haswell CPUs, 120 GB RAM and a 50 GB HDD.

I evaluate Kontroli and Dedukti on two kinds of datasets: ATP and ITP problems. ATP datasets consist of theory files that can be checked independently, whereas ITP datasets consist of theory files that depend on each other. Among the ATP datasets, I evaluate proofs of TPTP problems generated by iProver Modulo and proofs of theorems from B method set theory generated by Zenon Modulo [9]. For the ITP datasets, I evaluate parts of the standard libraries from HOL Light (up to finite Cartesian products) and Isabelle/HOL (up to
HOL.List), as well as Fermat's little theorem proved in Matita [34]. An evaluation of Coq datasets is unfortunately not possible, because its current encoding relies on higher-order rewriting.

I evaluate different configurations of Dedukti and Kontroli. For Dedukti, I evaluate a configuration running at most one instance at a time (DK) and one configuration running as many instances as possible (DKj), using theory-level concurrency, see section 3. For Kontroli, I evaluate a configuration running at most one thread at a time (KO), one configuration where we parse at the same time as the rest (KOp), and several configurations using
Arc-shared data structures with a maximum number of 𝑛 simultaneous check and scope threads (KO𝑛). For comparison, I also evaluate a version with simultaneous parsing and scope/infer, but without checking (KOi). This version serves as a lower bound for the runtime of KO𝑛.

For the ATP datasets, proof checking is an embarrassingly parallel problem, because we can check any number of problems at the same time. Therefore, I use for both Kontroli and Dedukti a configuration that resembles DKj, but I limit the maximal number of running instances to 32.

All datasets were evaluated ten times, and their average running time as well as the standard deviation were obtained.

I now discuss the results for the ITP datasets shown in Figure 9. The single-instance Dedukti configuration, DK, takes the most time across all datasets. Running several instances of Dedukti (DKj) significantly improves performance for those datasets that are divided into multiple theories, i.e. Matita and HOL Light. Still, on all datasets, the single-threaded Kontroli configuration (KO) is faster than running a maximum number of Dedukti instances in parallel. This can be explained by the theory graphs of the ITP libraries not being very "broad", which restricts the effectiveness of Dedukti's theory-level concurrency. This might be related to Wenzel's observation that the standard library of Isabelle was among the Isabelle libraries where parallelism yielded the least gain [36]. Parsing and scoping/checking in parallel (KOp) can improve performance, but as it comes with some overhead, it may also have the adverse effect, as we can see from the HOL Light dataset. One reason for this is the per-command overhead stemming from using channels, which becomes dominant when we have a large number of small commands, as is the case for the HOL Light dataset.

I now discuss the concurrent configurations. KO1 is always the slowest of all Kontroli configurations.
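The KOp/KO𝑛 setup just described can be sketched as follows. This is a minimal illustration with hypothetical types, not Kontroli's actual code: a parser thread sends commands over a channel (as in KOp), and each received command is checked in its own thread, shared via Arc (as in KO𝑛). With Rc instead of Arc, the spawns would not compile, since Rc is not Send.

```rust
use std::sync::{mpsc, Arc};
use std::thread;

// Hypothetical command type; a stand-in for Kontroli's parsed commands.
struct Command(String);

// Stand-in for type checking a single command.
fn check(cmd: &Command) -> bool {
    !cmd.0.is_empty()
}

// Parser thread feeds a channel; received commands are checked
// concurrently, each thread sharing its command via `Arc`.
fn run(lines: Vec<String>) -> usize {
    let (tx, rx) = mpsc::channel();
    let parser = thread::spawn(move || {
        for l in lines {
            // each command pays one channel send/receive: this is the
            // per-command overhead that dominates for many small commands
            tx.send(Arc::new(Command(l))).unwrap();
        }
    });
    let checkers: Vec<_> = rx
        .into_iter()
        .map(|cmd: Arc<Command>| thread::spawn(move || check(&cmd)))
        .collect();
    parser.join().unwrap();
    // count the commands that checked successfully
    checkers
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|&ok| ok)
        .count()
}
```

Replacing `Arc` by `Rc` here without removing the `thread::spawn` calls is rejected by the compiler, which is why a sequential configuration needs the pointer type to be swapped out entirely rather than merely left unused.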
That is because this configuration suffers from the Arc-induced overhead, but does not actually run scoping and typing concurrently. KO1 differs from KOp only in that it uses
Arc instead of Rc. In that sense, KO1 is just a reference configuration that is there to show the overhead of Arc compared to Rc and to serve as a baseline for the concurrent configurations. Starting from 𝑛 =
2, all KO𝑛 configurations are faster than KOp, and for larger 𝑛, the KO𝑛 configurations are the fastest overall. Furthermore, for large 𝑛, the time needed for Kontroli to check a theory converges towards the time needed for Kontroli to only parse and scope the theory. For example, parsing and scoping the Isabelle dataset without checking requires 406 seconds (KOi), which increases only by two seconds to 408 seconds (+0.45%) when checking with eight threads in parallel (KO8). In comparison, checking with a single thread with simultaneous parsing (KOp) increases total time to 454 seconds (+11.8%), and without simultaneous parsing (KO), time further rises to 475 seconds (+17.0%).

For the ATP datasets shown in Figure 10, Kontroli is likewise faster than Dedukti.

In conclusion, on the evaluated datasets, Kontroli consistently improves performance over Dedukti, both in sequential and in concurrent settings.

The related work can be divided by two criteria, namely size and concurrency. Work related to small size is mostly about proof checkers, and work related to concurrency is about proof assistants. To the best of my knowledge, this work is the first that combines the two aspects by creating a proof checker that is both concurrent and small.
Figure 9. ITP dataset evaluation (runtimes in seconds for the Matita, HOL Light, and Isabelle/HOL datasets).

Figure 10. ATP dataset evaluation (runtimes in seconds for the iProver and Zenon datasets).

The type-theoretic logical framework LF is closely related to Dedukti, being based on the lambda-Pi calculus by Harper et al. [20]. Appel et al. have created a proof checker for LF that is similar to this work due to their pursuit of small size [1]. Their proof checker consists of 803 LOC, where the kernel (dealing with type checking, term equality, DAG creation and manipulation) consists of only 278 LOC and the prekernel (dealing with parsing) consists of 428 LOC. The small size of the proof checker is remarkable considering that it is written in C and does not rely on external libraries.

LFSC is a logical framework that extends LF by side conditions. It is used for the verification of SMT proofs, where LFSC acts as a meta-logic for different SMT provers, similarly to Dedukti acting as a meta-logic for different proof assistants [31]. Stump et al. have created a proof checker generator for LFSC that creates a proof checker from a signature of proof rules [32]. The size of the generator is 5912 LOC of C++, and the kernel of a proof checker generated for SAT problems is 600 LOC of C++. (LFSC was obtained from https://github.com/CVC4/LFSC, rev. 11fefc6, and measured with tokei src/ -e CMake* and lfscc --compile-scc sat.plf && tokei scccode.*.)

Checkers is a proof checker based on foundational proof certificates (FPCs) developed by Chihani et al. [11]. Unlike Dedukti, which requires a translation of proofs into its calculus, FPCs allow for the interpretation of the proofs in the original proof calculus (modulo syntactic transformations), given an interpretation for the original calculus. The proof checker is implemented in 𝜆Prolog and is the smallest work evaluated in this section, consisting of only 98 LOC. Where LFSC generates a proof checker from a signature, Checkers generates a problem checker from a signature and a proof certificate, due to relying on 𝜆Prolog for parsing signatures and proof certificates. Chihani et al. evaluated Checkers on a set of proofs generated by E-Prover, which unfortunately does not permit a comparison with either Dedukti or Kontroli, as they currently do not support E-Prover proofs.

Metamath is a language for formalising mathematics based on set theory [25]. There exist several proof verifiers for Metamath, one of the smallest being written in 308 LOC of Python.
Furthermore, Metamath allows importing OpenTheory proofs and thus verifying proofs from HOL Light, HOL4, and Isabelle [10].

The aut program is a proof checker for the Automath system developed by Wiedijk [39]. It is written in C and consists of 3048 LOC. It can verify the formalisation of Landau's "Grundlagen der Analysis".

HOL Light is a proof assistant whose small kernel (396 LOC of OCaml) qualifies it as a proof checker [21]. However, the code in HOL Light that extends the syntax of its host language OCaml is comparatively large (2753 LOC). Among others, HOL Light has been used to certify SMT [31] as well as tableaux proofs [14, 24]. Checking external proofs in a proof assistant also benefits its users, who can use external tools as automation for their own work and have their proofs certified.

(Checkers was obtained from https://github.com/proofcert/checkers, rev. 241b3c8, and measured with sed -e '/ˆ$/d' -e '/ˆ%/d' lkf-kernel.mod | wc -l. The Metamath verifier was obtained from https://github.com/david-a-wheeler/mmverify.py, rev. fb2e141, and measured with tokei mmverify.py. HOL Light was obtained from https://github.com/jrh13/hol-light, rev. 4c324a2, and measured with tokei fusion.ml and tokei pa_j_4.xx_7.xx.ml.)
Concurrent proof checking is nowadays mostly found in interactive theorem provers. Early work includes the Distributed Larch Prover [35] and the MP refiner [27]. The Paral-ITP project improved parallelism in provers that were initially designed to be sequentially executed, such as Coq and Isabelle [6]. Among others, as part of the Paral-ITP project, Barras et al. introduced parallel proof checking in Coq that resembles this work in the sense that it delegates the checking of opaque proofs [7]. However, unlike this work, Coq checks the opaque proofs using processes instead of threads, requiring marshalling of data between the prover and the checker processes. Isabelle features concurrency on multiple levels: aside from concurrently checking both theories and toplevel proofs (similar to Dedukti and Kontroli), it also concurrently checks sub-proofs. Furthermore, it executes some tactics in parallel, for example the simplification of independent subgoals [36, 37]. Like Isabelle, ACL2 checks theories and toplevel proofs in parallel, but differs from Isabelle by automatically generating subgoals that are verified in parallel [29]. In both Isabelle and ACL2, threads are used to handle concurrent verification.
I have presented Kontroli, a small proof checker for the lambda-Pi calculus modulo rewriting. Despite its small size, Kontroli can verify proofs concurrently, without incurring any concurrency overhead when verifying proofs sequentially. I achieved this by abstracting over the shared pointer type in the kernel using a functor-like technique, which allows parametrising the kernel both with slow thread-safe and fast thread-unsafe pointer types. Already in sequential mode, Kontroli is faster than Dedukti on all evaluated datasets, and in concurrent mode, Kontroli further significantly reduces checking time.

The current implementation of Kontroli omits several features present in Dedukti, such as detailed error reporting and higher-order rewriting. Despite this, Kontroli can be used to verify a considerable set of theories, including proof exports from HOL Light, Isabelle, Matita, iProver Modulo, and Zenon Modulo. The limited detail of errors reported by Kontroli makes it a candidate for automated testing, where errors are the exception. One particular application of Kontroli could be in continuous integration: for example, whenever the kernel of a proof assistant or of an automated theorem prover is changed, one could export the proofs generated by the tool on some corpus and verify them with Kontroli. Kontroli could also be used to verify proofs generated during competitions like CASC [33]. In such scenarios, errors detected by Kontroli can then be diagnosed further with Dedukti.

This work can serve as a case study for implementing automated reasoning tools in Rust, whose combination of absent garbage collection and compiler-verified memory/thread safety currently makes it a unique programming language. It is encouraging to see that the resulting proof checker is not only performant, but also concise and safe. Having a proof checker in Rust simplifies future concurrency-related experiments, such as overhead-free concurrent verification.
Acknowledgments
I would like to thank François Thiré for the inspiration to write this article. Furthermore, I would like to thank Gaspard Ferey, Guillaume Genestier and Gabriel Hondet for explaining to me the inner workings of Dedukti, and Emilie Grienenberger for providing me with the Dedukti export of the HOL Light standard library. Finally, I would like to thank Thibault Gauthier, Guillaume Genestier, Emilie Grienenberger, Gabriel Hondet, and François Thiré for their helpful comments on drafts of this article.
References

[1] Andrew W. Appel, Neophytos G. Michael, Aaron Stump, and Roberto Virga. 2003. A Trustworthy Proof Checker. J. Autom. Reasoning 31, 3-4 (2003), 231–260. https://doi.org/10.1023/B:JARS.0000021013.61329.58
[2] Andrea Asperti, Wilmer Ricciotti, Claudio Sacerdoti Coen, and Enrico Tassi. 2009. A compact kernel for the calculus of inductive constructions. Sadhana 34 (2009), 71–144. https://doi.org/10.1007/s12046-009-0003-3
[3] Ali Assaf. 2015. A framework for defining computational higher-order logics. Ph.D. Dissertation. École Polytechnique, Palaiseau, France. https://tel.archives-ouvertes.fr/tel-01235303
[4] Ali Assaf, Guillaume Burel, Raphaël Cauderlier, David Delahaye, Gilles Dowek, Catherine Dubois, Frédéric Gilbert, Pierre Halmagrand, Olivier Hermant, and Ronan Saillard. [n.d.]. Dedukti: a Logical Framework based on the 𝜆Π-Calculus Modulo Theory. ([n.d.]).
[5] Henk Barendregt and Freek Wiedijk. 2005. The challenge of computer mathematics. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences. https://doi.org/10.1098/rsta.2005.1650
[6] Bruno Barras, Lourdes Del Carmen González-Huesca, Hugo Herbelin, Yann Régis-Gianas, Enrico Tassi, Makarius Wenzel, and Burkhart Wolff. 2013. Pervasive Parallelism in Highly-Trustable Interactive Theorem Proving Systems. In Intelligent Computer Mathematics (CICM 2013) (LNCS, Vol. 7961). Springer, 359–363. https://doi.org/10.1007/978-3-642-39320-4_29
[7] Bruno Barras, Carst Tankink, and Enrico Tassi. 2015. Asynchronous Processing of Coq Documents: From the Kernel up to the User Interface. In Interactive Theorem Proving (ITP 2015) (LNCS, Vol. 9236). Springer, 51–66. https://doi.org/10.1007/978-3-319-22102-1_4
[8] Yves Bertot. 2008. A Short Presentation of Coq. In Theorem Proving in Higher Order Logics (TPHOLs 2008) (LNCS, Vol. 5170). Springer, 12–16. https://doi.org/10.1007/978-3-540-71067-7_3
[9] Guillaume Burel, Guillaume Bury, Raphaël Cauderlier, David Delahaye, Pierre Halmagrand, and Olivier Hermant. 2020. First-Order Automated Reasoning with Theories: When Deduction Modulo Theory Meets Practice. J. Autom. Reasoning 64, 6 (2020), 1001–1050. https://doi.org/10.1007/s10817-019-09533-z
[10] Mario M. Carneiro. 2016. Conversion of HOL Light proofs into Metamath. J. Formalized Reasoning 9, 1 (2016), 187–200. https://doi.org/10.6092/issn.1972-5787/4596
[11] Zakaria Chihani, Tomer Libal, and Giselle Reis. 2015. The Proof Certifier Checkers. In Automated Reasoning with Analytic Tableaux and Related Methods (TABLEAUX 2015) (LNCS, Vol. 9323). Springer, 201–210. https://doi.org/10.1007/978-3-319-24312-2_14
[12] Evelyne Contejean. 2004. A Certified AC Matching Algorithm. In Rewriting Techniques and Applications (RTA 2004) (LNCS, Vol. 3091). Springer, 70–84. https://doi.org/10.1007/978-3-540-25979-4_5
[13] Denis Cousineau and Gilles Dowek. 2007. Embedding Pure Type Systems in the Lambda-Pi-Calculus Modulo. In Typed Lambda Calculi and Applications (TLCA 2007) (LNCS, Vol. 4583). Springer, 102–117. https://doi.org/10.1007/978-3-540-73228-0_9
[14] Michael Färber and Cezary Kaliszyk. 2019. Certification of Nonclausal Connection Tableaux Proofs. In Automated Reasoning with Analytic Tableaux and Related Methods (TABLEAUX 2019) (LNCS, Vol. 11714). Springer, 21–38. https://doi.org/10.1007/978-3-030-29026-9_2
[15] Frédéric Gilbert. 2018. Extending higher-order logic with predicate subtyping: Application to PVS. Ph.D. Dissertation. Sorbonne Paris Cité, France. https://tel.archives-ouvertes.fr/hal-01673518
[16] Georges Gonthier. 2008. Formal Proof—The Four-Color Theorem. Notices of the American Mathematical Society 55, 11 (2008), 1382–1393.
[17] Thomas C. Hales, Mark Adams, Gertrud Bauer, Dat Tat Dang, John Harrison, Truong Le Hoang, Cezary Kaliszyk, Victor Magron, Sean McLaughlin, Thang Tat Nguyen, Truong Quang Nguyen, Tobias Nipkow, Steven Obua, Joseph Pleso, Jason Rute, Alexey Solovyev, An Hoai Thi Ta, Trung Nam Tran, Diep Thi Trieu, Josef Urban, Ky Khac Vu, and Roland Zumkeller. 2017. A formal proof of the Kepler conjecture. Forum of Mathematics, Pi. https://doi.org/10.1017/fmp.2017.1
[18] Pierre Halmagrand. 2016. Automated Deduction and Proof Certification for the B Method. Ph.D. Dissertation. Conservatoire national des arts et métiers, Paris, France. https://tel.archives-ouvertes.fr/tel-01420460
[19] David R. Hanson. 1990. Fast Allocation and Deallocation of Memory Based on Object Lifetimes. Softw. Pract. Exp. 20, 1 (1990), 5–12. https://doi.org/10.1002/spe.4380200104
[20] Robert Harper, Furio Honsell, and Gordon D. Plotkin. 1993. A Framework for Defining Logics. J. ACM 40, 1 (1993), 143–184. https://doi.org/10.1145/138027.138060
[21] John Harrison. 2009. HOL Light: An Overview. In Theorem Proving in Higher Order Logics (TPHOLs 2009) (LNCS, Vol. 5674). Springer, 60–66. https://doi.org/10.1007/978-3-642-03359-9_4
[22] Gabriel Hondet and Frédéric Blanqui. 2020. The New Rewriting Engine of Dedukti (System Description). In FSCD 2020. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 35:1–35:16. https://doi.org/10.4230/LIPIcs.FSCD.2020.35
[23] Ralf Jung, Jacques-Henri Jourdan, Robbert Krebbers, and Derek Dreyer. 2018. RustBelt: securing the foundations of the Rust programming language. Proc. ACM Program. Lang. https://doi.org/10.1145/3158154
[24] Cezary Kaliszyk, Josef Urban, and Jiří Vyskočil. 2015. Certified Connection Tableaux Proofs for HOL Light and TPTP. In Certified Programs and Proofs (CPP 2015). ACM, 59–66. https://doi.org/10.1145/2676724.2693176
[25] Norman D. Megill and David A. Wheeler. 2019. Metamath: A Computer Language for Mathematical Proofs. Lulu Press, Morrisville, North Carolina. http://us.metamath.org/downloads/metamath.pdf
[26] Dale Miller. 1991. A Logic Programming Language with Lambda-Abstraction, Function Variables, and Simple Unification. J. Log. Comput. 1, 4 (1991), 497–536. https://doi.org/10.1093/logcom/1.4.497
[27] Roderick Moten. 1998. Exploiting Parallelism in Interactive Theorem Provers. In Theorem Proving in Higher Order Logics (TPHOLs'98) (LNCS, Vol. 1479). Springer, 315–330. https://doi.org/10.1007/BFb0055144
[28] Jens Otten. 2017. Non-clausal Connection Calculi for Non-classical Logics. In Automated Reasoning with Analytic Tableaux and Related Methods (TABLEAUX 2017) (LNCS, Vol. 10501). Springer, 209–227. https://doi.org/10.1007/978-3-319-66902-1_13
[29] David L. Rager, Warren A. Hunt Jr., and Matt Kaufmann. 2013. A Parallelized Theorem Prover for a Logic with Parallel Execution. In Interactive Theorem Proving (ITP 2013) (LNCS, Vol. 7998). Springer, 435–450. https://doi.org/10.1007/978-3-642-39634-2_31
[30] Ronan Saillard. 2015. Typechecking in the lambda-Pi-Calculus Modulo: Theory and Practice. Ph.D. Dissertation. Mines ParisTech, France. https://tel.archives-ouvertes.fr/tel-01299180
[31] Aaron Stump, Duckki Oe, Andrew Reynolds, Liana Hadarean, and Cesare Tinelli. 2013. SMT proof checking using a logical framework. Formal Methods Syst. Des. 42, 1 (2013), 91–118. https://doi.org/10.1007/s10703-012-0163-3
[32] Aaron Stump, Andrew Reynolds, Cesare Tinelli, Austin Laugesen, Harley Eades, Corey Oliver, and Ruoyu Zhang. 2012. LFSC for SMT Proofs: Work in Progress. In Proof Exchange for Theorem Proving (PxTP 2012) (CEUR Workshop Proceedings, Vol. 878). CEUR-WS.org, 21–27. http://ceur-ws.org/Vol-878/paper1.pdf
[33] Geoff Sutcliffe. 2016. The CADE ATP System Competition - CASC. AI Magazine 37, 2 (2016), 99–101. https://doi.org/10.1609/aimag.v37i2.2620
[34] François Thiré. 2018. Sharing a Library between Proof Assistants: Reaching out to the HOL Family. In Logical Frameworks and Meta-Languages: Theory and Practice (LFMTP@FSCD 2018) (EPTCS, Vol. 274). 57–71. https://doi.org/10.4204/EPTCS.274.5
[35] Mark T. Vandevoorde and Deepak Kapur. 1996. Distributed Larch Prover (DLP): An Experiment in Parallelizing a Rewrite-Rule Based Prover. In Rewriting Techniques and Applications (RTA-96) (LNCS, Vol. 1103). Springer, 420–423. https://doi.org/10.1007/3-540-61464-8_71
[36] Makarius Wenzel. 2009. Parallel Proof Checking in Isabelle/Isar. In Programming Languages for Mechanized Mathematics Systems (PLMMS 2009). ACM Digital Library.
[37] Makarius Wenzel. 2013. Shared-Memory Multiprocessing for Interactive Theorem Proving. In Interactive Theorem Proving (ITP 2013) (LNCS, Vol. 7998). Springer, 418–434. https://doi.org/10.1007/978-3-642-39634-2_30
[38] Makarius Wenzel, Lawrence C. Paulson, and Tobias Nipkow. 2008. The Isabelle Framework. In Theorem Proving in Higher Order Logics (TPHOLs 2008) (LNCS, Vol. 5170). Springer, 33–38. https://doi.org/10.1007/978-3-540-71067-7_7
[39] Freek Wiedijk. 2002. A New Implementation of Automath. J. Autom. Reasoning 29, 3-4 (2002), 365–387. https://doi.org/10.1023/A:1021983302516