Smart Proofs via Smart Contracts: Succinct and Informative Mathematical Derivations via Decentralized Markets
Sylvain Carré, Franck Gabriel, Clément Hongler, Gustavo Lacerda, Gloria Capano
Abstract.
Modern mathematics is built on the idea that a proof should be translatable into a formal proof, whose validity is an objective question, decidable by a computer. In practice, however, proofs are informal, succinct, and omit numerous uninteresting details: their goal is to share insight among a community of agents. An agent considers a proof valid if they trust that it could (in principle) be expanded into a machine-verifiable proof. A proof's validity can thus become a subjective matter, possibly leading to a debate; if agents' incentives are not aligned, it may be hard to reach a consensus. Hence, while the concept of valid proof is well-defined in principle, the process to establish a proof's validity is itself a complex multi-agent problem.

In this paper, we introduce the SPRIG protocol, which allows agents to propose and verify succinct and informative proofs in a decentralized fashion; trust is established by agents being able to request more details at steps where they feel there could be problems; debates, if they arise, need to isolate specific details of proofs; if they persist, they must go down to machine-level details, where they can be settled automatically. A structure of fees, bounties, and stakes is set to incentivize the agents to act in good faith, i.e. not to publish problematic proofs and not to ask for trivial details.

We propose a game-theoretic discussion of SPRIG interactions, illustrating how agents with different types of information interact, leading to a verification tree with an appropriate level of detail and to the invalidation of problematic proofs, and we discuss resilience against various attacks. We then provide an in-depth treatment of a simplified model, characterize its equilibria, and analytically compute the agents' level of trust.

The SPRIG protocol is designed so that it can run fully autonomously as a smart contract on a decentralized blockchain platform, without the need for a central trusted institution. This allows agents to participate anonymously in the verification debate, being incentivized to contribute with their information. The smart contract mediates all the interactions between the agents, settles debates on the validity of proofs, and guarantees that bounties and stakes are paid as specified by the protocol.

SPRIG also allows for a number of other applications, in particular the issuance of bounties for solving open problems and the creation of derivatives markets, enabling agents to inject more information pertaining to mathematical proofs.

1. Introduction
1.1. Mathematical Proofs.
Mathematical derivation, also sometimes called logical reasoning, rigorous derivation, formal rational reasoning, or mathematical proof, is a process that allows one to derive mathematical statements from other mathematical statements. By relying on a collection of statements accepted to be fundamentally true, called axioms, this mechanism allows one to derive new mathematical truths, called proven statements. Depending on the context, such proven statements are also called propositions, theorems (when they are deemed interesting), or lemmas (when they are ancillary in the derivation of theorems); the derivation leading to a statement (starting from another statement, assumed or already established to be true) is called its proof.

This form of reasoning is at the heart of rational thinking (in mathematics, all the sciences, and way beyond), crucially leading to:
• One's trust in the truth of the statements derived.
• One's insight into the reasons why such statements hold true.
These two aspects of the question are discussed in Sections 1.1.1 and 1.1.2 below.
Date: 4 February 2021. ∗ Equal Contribution. ♠ Higher School of Economics, International College of Economics and Finance. † École Polytechnique Fédérale de Lausanne, Institute of Mathematics, Chair of Statistical Field Theory.

1.1.1. Proofs as a Means of Trust.
In many ways, mathematically proven statements achieve the highest possible level of certainty one can have. The validity of an established theorem (say, for instance, Euclid's theorem on the existence of an infinite number of primes) does not fluctuate with the evolution of knowledge. Conversely, for statements for which no known proof exists, the trust in their validity only grows progressively with empirical or heuristic supporting evidence, and it never quite reaches that of mathematically proven statements.

Once the trust in a statement is established via a proof, the statement can be used as a basis for establishing trust in new statements: the proofs of all the statements obtained this way can (if needed) be 'unrolled' down to the axioms. Hence, by propagation, various agents are able to build together a set of trusted statements relying upon each other (a 'tower of knowledge') without necessarily knowing all the details of all the proofs. This is the way modern mathematics is built.

More recently, due to the development of computer technologies, proofs have become fundamental not only for the construction of mathematics and science, but they have also become objects manipulated by e.g. cryptography or program verification systems. Most digital interactions are now mediated by cryptographic primitives, which aim in particular at establishing trust in confidentiality and authenticity: for instance, a party can prove their identity by providing a digital signature (more precisely, the mathematically proven statement is: either the party knows a secret encryption key, or they are extremely lucky, or a certain widely believed algorithmic hardness assumption is in fact wrong). Similarly, for critical systems, there is often a need for a proof that a piece of code meets some specification, such as termination or type safety.

Of course, in any case, the trust in a mathematically proven statement relies on the trust that the underlying proof is indeed correct, i.e.
that:
• The derivation rules are clearly defined, computable, and consistent.
• They are applied uniformly throughout the proof of the statement, as verified by computers or other mathematicians.
• The premises and axioms underlying the reasoning can be trusted.
These issues and the underlying challenges are briefly discussed in Section 1.2 below.

1.1.2. Proofs as a Means of Explanation and Insight.
Arguably more important to most mathematicians' minds than the idea that a mathematical derivation answers the question 'how do we know this statement is true?' is the resulting insight that it provides into the nature of the underlying problem (see e.g. [Thur94]). For instance, there are famous statements, such as Goldbach's conjecture, that are already widely believed to be true (as supported by various heuristics and numerical verifications), despite being unproven. Finding a proof of Goldbach's conjecture would be considered a breakthrough not so much because it would tell us that it is true (which would hardly surprise anyone), but because it would tell us why, and because it may give some new deep insight into the nature of prime numbers.

The idea that proofs give insight is arguably a central driving force behind mathematical teaching and exposition: it motivates the great effort that goes into the presentation of theorems' proofs. Similarly, the work towards finding new, simpler, more intuitive, or just different proofs of existing theorems is greatly valued: multiple proofs of the same statement can bring insights on various issues, a complementary value [AiZi10], or enhance one's intuition of a result.

Conversely, certain proofs appear to bring little insight because of their complexity (for instance, computer-generated proofs such as the one of the four-color theorem [ApHa77, Thur94]). While the trust in such proofs is very high, they are considered somewhat unsatisfactory by many mathematicians, as they cannot be comprehended as well as more 'elegant' proofs.

Often, insightful proofs appear to be centered around a limited number of new ideas. In fact, this seems to be how educated agents should convince each other: convey credence about statements through the transmission of a limited amount of relevant information.

A balance between this idea of proof and that of Section 1.1.1, i.e.
between 'a short collection of insightful statements' and 'the list of all the statements needed to establish perfect trust', is in principle possible, though somewhat delicate, as discussed in Section 1.2 below. The SPRIG protocol allows a system of agents with various levels of interest and information to reach this balance.

1.2. Nature of Mathematical Derivations.
In this subsection, we describe the modern view of the idea of mathematical derivation, as it emerged at the beginning of the 20th century, that now serves as the basis of all contemporary mathematics. This view also lies at the heart of all computer-based proof systems (Section 1.3) and has, in recent years, faced a number of practical challenges (Section 1.4).

1.2.1. Hilbert's Program and Logicism.
The clarification of the foundations of mathematics, as advocated by Hilbert's program, progressed greatly in the early 20th century. One key step was the idea of logicism [Korn60]: that mathematical statements should be written in a formal language (or unambiguously translatable into one), and that a mathematical derivation ought to consist of a sequence of well-defined manipulations of such statements. This naturally connected to the then-emerging field of computer science: mathematical derivations, written in the appropriate language, ought to be verifiable by a computer program.

These new foundations led to two key developments:
• On the one hand, the emergence of the study of valid formal mathematical derivations as central objects, as a field in itself (namely proof theory), primarily associated with mathematical logic and theoretical computer science (see Section 1.2.2 below);
• On the other hand, the emergence of modern mathematics, with more rigorous and standardized definitions, theorems, and proofs, implicitly relying on the new foundations (see Section 1.2.3 below).
Interestingly, despite immense progress in computer technology, these two developments have seen only little interaction. Arguably, this is due to practical (rather than fundamental) reasons, and recent challenges and developments suggest that this has led to a somewhat unfortunate situation (see Section 1.3 below). The objective of the present paper is indeed to present a new way to make this interaction practical and fruitful.

1.2.2. Logic View of Mathematical Derivations.

"The development of mathematics toward greater precision has led, as is well known, to the formalization of large tracts of it, so that one can prove any theorem using nothing but a few mechanical rules." – K. Gödel

The controversies of the late 19th century led to convergence on the foundational theory of mathematics, by which the disagreements could be resolved. Ideas of Frege, Hilbert, Gödel, Turing, and others led to a definition of formal proof, connected with the notion of computer (itself defined in terms of Turing machines; see e.g. [Wigd19] for a modern account).
Definition 1.
A formal proof, or formal derivation, or machine-level proof, or proof object, is a finite sequence of sentences in a formal language, each of which is an axiom, an assumption, or follows from the preceding sentences in the sequence by a rule of inference. The validity of the application of the rules of inference can be checked by a computer.

In practice, most of modern mathematics, by convention, relies on a standard set of axioms (based on Zermelo–Fraenkel set theory, with some version of the Axiom of Choice), and on higher-order logic. A number of computer proof systems implement such a framework (see Section 1.3.1 below).

Besides the important clarifications that they bring, the strength of formal proofs is that the verification of their validity is completely mechanical. As a result, they can be checked reliably by computers. In principle, computers can then also be used to try to produce proofs, as was already suggested by Gödel in his Lost Letter to Von Neumann [Lipt10]. The use of a formal language also allows, in principle, various areas of mathematics to communicate with each other, by allowing them to inter-operate unambiguously.

Unfortunately, formal proofs are extremely long in practice [Wied03] and hard to produce, even with modern computer proof assistants (see Section 1.3.1 below); and most importantly, they are often quite different from the way mathematicians think of proofs (a formal proof may bring little insight to a mathematician). Still, the stronger rigor of such proofs has influenced the shaping of modern mathematical derivations as mathematicians use them today, as described in Section 1.2.3 below: they must stay 'in the back of mathematicians' minds' as they write and communicate their proofs, something that the SPRIG protocol introduced here allows us to formalize.
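As a minimal illustration of Definition 1, here is a sketch of what a machine-level derivation looks like in a modern proof assistant (we use Lean 4 for concreteness; the statements are toy examples of ours, not drawn from this paper):

```lean
-- A fully formal, machine-checked derivation: the claim is a sentence in a
-- formal language, and its justification is an application of an existing
-- lemma, verified mechanically by the proof checker.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- Even "obvious" facts require explicit machine-level justification;
-- here the checker verifies the equality by computation (`rfl`).
example : 2 + 2 = 4 := rfl
```

The point is that validity is decided entirely by the checker, with no appeal to a reader's judgment.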
1.2.3. Modern Mathematical Derivations in Practice.
The core of mathematical activity has seen little change over the last century, and the focus of mathematics has shifted away from formal verification and foundational issues, in favor of introducing new objects, discovering new ideas, and solving interesting problems. At the same time, mathematics still emphasizes strict rigor: for instance, heuristic arguments or numerical simulations, however convincing, are not accepted as being parts of mathematical derivations and proofs, while for instance in theoretical physics derivations, they are often deemed sufficient.

The following defines what it means for a mathematician to know the proof of a statement:
Claim 1. A mathematician knows how to prove a statement rigorously if they have confidence in the following: given access to a corpus of references, they would be able, if pressed and given enough time, to give details at an arbitrarily fine level in the proof of each statement, down to a computer-checkable formal level if needed.

The working definition of 'a proof' that a modern mathematical text uses can be phrased as follows:
Claim 2. A written proof consists of:
• A proof sketch P = (D, S_1, ..., S_k), consisting of a collection of definitions and references D and a list of statements (lemmas, propositions, theorems, remarks) S_1, ..., S_k using symbols in D, where each S_j is allowed to assume that S_i holds true for i < j.
• A text in free format F (including proof arguments, drawings, informal explanations, etc.),
such that it is claimed that the proof is complete and valid in the eyes of mathematicians (of the given audience), in the following sense:

Claim 3. A written proof (P, F) of a statement S, with P = (D, S_1, ..., S_k), is considered complete and valid in the eyes of a mathematician if, by using the text from F and standard mathematical knowledge if needed, they know how to prove:
• for each j = 1, ..., k, the statement S_j, assuming (if needed) S_1, ..., S_{j-1};
• the statement S from the statements S_j, where j = 1, ..., k.

Remark 1. Another way to phrase the structure of the proof sketch P is to say that the statements S_1, ..., S_k, S form a directed acyclic graph of dependence, with root S (where a statement points to the statements it assumes). The order in which the parts of P are presented in a paper may not follow the order here, but the vertices of any directed acyclic graph can be ordered so that an edge i → j implies i > j.

Remark 2. In mathematical papers, definition-statements may appear (for instance, defining Riemann's ζ function may require a proof of convergence); see Section 2.2.1 for a discussion of how such definition-statements can be recast in the format of proof sketches above.

Remark 3. Proofs by contradiction can be written in the proof sketch format as above (see Section 2.2.1).

The free part F of a proof is what is sometimes called a Social Proof [Buss98], and the proof sketch part P should be directly translatable into a collection of formal statements, sometimes called a Formal Proof Sketch [Wied03].
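The observation that the statements of a proof sketch form a directed acyclic graph of dependence, whose vertices can be ordered so that every statement appears after those it assumes, is exactly a topological sort. A minimal sketch in Python (the statement names and dependence relation are invented for illustration):

```python
from graphlib import TopologicalSorter

# Hypothetical dependence DAG of a proof sketch: each statement maps to
# the set of statements it assumes.
dependencies = {
    "S1": set(),
    "S2": {"S1"},
    "S3": {"S1"},
    "S": {"S2", "S3"},   # the root statement S assumes S2 and S3
}

# static_order() yields an order in which every statement appears after
# all statements it depends on, as required for a written proof sketch.
order = list(TopologicalSorter(dependencies).static_order())
print(order)  # e.g. ['S1', 'S2', 'S3', 'S']
```

Any such ordering gives a valid linear presentation of the sketch, even if the paper's own presentation order differs.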
In this article, we will consider that the proof sketches are always formal. It should be noted that in mathematical articles, these two parts are sometimes not clearly separated; also, in some cases, the entire proof is a social proof.

1.2.4. Amount of Detail in Proofs.
As explained in Section 1.2.3 above, 'being convinced by a proof given as a mathematical text' means: knowing enough to be confident that one would be able to produce all the details if needed, while at the same time knowing that this most likely will not be needed. Indeed, it would be so energy-consuming and uninformative that there would be no point in doing it. In the language of structured proofs above, every mathematician must build their own structured proof at a level of detail that they deem satisfying.

Being a mathematician hence crucially requires a great deal of self-discipline, in order not to delude oneself into thinking that one knows how to prove a statement. How can one be confident in one's ability to perform a task (providing a machine-language proof of new results) that one will (in all likelihood) never perform?

Moreover, what constitutes a complete proof (in the sense of Section 1.2.3) becomes as a result somewhat subjective, depending on the reader's standards: the amount of detail required from a student at an exam will typically be very different from the level of detail in a research paper. For research papers, it will depend on the subfield and on journal and editorial standards: a debate between the authors and referees can arise, in which the editor is the arbiter.

In any case, determining the relevant amount of detail to provide in a mathematical paper is a difficult task, which requires a delicate balance between the needs to guarantee the validity of the reasoning, to limit the prerequisites and the work on the reader's side, to stay within page limits, to avoid unnecessary clutter, and to keep only the essential arguments. Within these constraints, there is a lot of room for subjective choices, and writing modern mathematical proofs is largely an art, as discussed in e.g. [Lamp95, Lamp12]. With the increasing complexity of proofs involved in contemporary mathematics, this has led to a number of challenges, as discussed in Section 1.4 below.
At the same time, the development of computer-based proof systems offers great promise to help tackle such challenges. The goal of the SPRIG protocol is to unify these two visions, of formal proofs as verified by computers and of informal proofs as done by mathematicians, to leverage the advantages of both (trust and insight, respectively): SPRIG will allow the agents to inject various levels of information to reveal a subtree of the proof tree as in Section 2.2.1 below.

1.2.5. Structured Proofs.
An interactive way of viewing proofs as in Claim 3 is the following: instead of asking to be able to write down a proof of the formal statements in the machine-level language directly, one can think of being able to answer requests for formal details for each of the statements (assuming perhaps an audience that is more and more curious about the details); and in one's answer, if any request for more details arises, one should again be able to provide them. This way, one should eventually be able to reach (if needed) the machine level after a reasonable number of steps, using a reasonable amount of space, keeping an informative structure (and abstracting away the free part F of the proof).

This leads one to the following definition of structured proof (written in a formal language), upon which the proof format of our protocol is based (see Section 2.2.1). A structured proof of a statement S* of level L consists of a tree with the following structure:
• The root ('top-level') is the statement S*: A* ⇒ C* itself, where A* includes axioms and accepted statements used to derive the conclusion C*.
• For each non-leaf ('high-level') statement S: A ⇒ C, its children (S_j)_{j=1,...,k} are statements A_j ⇒ C_j, where A_j is of the form A_j = A ∪ {C_i for i ∈ I_j} for some I_j ⊂ {1, 2, ..., j−1}, and where C_k = C.
• For each leaf ('low-level') statement S: A ⇒ C, a machine-verifiable proof is provided.
• The tree height (distance between the root and the leaves) is at most L.

Remark 4. The idea of structured proof in our paper is very close to the structured proofs suggested by Lamport [Lamp95, Lamp12]; the difference is that we make the level more explicit.
Ideally, a proof tree should be well balanced (not too deep, and at the same time with moderately large degree, with fairly short statements), and the highest levels should be the most interesting to experienced mathematicians.

The SPRIG protocol is based on the assumption that valid known mathematical derivations can be structured with well-balanced trees (in principle, as the complete tree is as large as a machine-level proof), and that the existence (or non-existence) of a complete structured proof tree can be determined with high confidence by a mathematician knowing only a small subset of the tree (which may depend on the level of information of the mathematician).

The verification of a proof in a debate (between a teacher and a student, or a reviewer and an author; see Section 1.4 below) works largely with the idea of a balanced tree: a teacher may ask a student to produce the highest levels of the tree, to reach the confidence that the student would know how to provide the rest of the tree (if given enough time).

As discussed in Section 1.2.4 above, determining the relevant amount of detail to provide in a published proof is delicate and somewhat subjective. This paper aims at explicitly taking this subjectivity into account, by considering a system of agents with various levels of information and confidence in their ability to fill in the details of a proof, and by proposing a protocol by which such agents can exchange information.
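The tree structure of Section 1.2.5 can be sketched as a small data type. The following Python rendering is ours: field names and the opaque string representation of machine-verifiable leaf proofs are illustrative assumptions, not part of the protocol specification.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ProofNode:
    """A statement A => C in a structured proof tree."""
    statement: str
    children: list["ProofNode"] = field(default_factory=list)
    machine_proof: Optional[str] = None  # required at the leaves

    def height(self) -> int:
        """Distance from this node to its deepest leaf."""
        return 0 if not self.children else 1 + max(c.height() for c in self.children)

    def is_complete(self) -> bool:
        """Every leaf must carry a machine-verifiable proof."""
        if not self.children:
            return self.machine_proof is not None
        return all(c.is_complete() for c in self.children)

# A toy structured proof of level 1: the root delegates to two children,
# each closed by an (opaque) machine-level proof object.
root = ProofNode("A* => C*", children=[
    ProofNode("A* => C1", machine_proof="<proof object>"),
    ProofNode("A* + C1 => C*", machine_proof="<proof object>"),
])
print(root.height(), root.is_complete())
```

A well-balanced tree in the paper's sense corresponds to a `ProofNode` tree of small height whose nodes have moderate numbers of short children.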
1.3. Recent Developments in Computer-Based Proofs.
The second half of the 20th century saw the explosion, both theoretical and practical, of computer science. As discussed in Section 1.2.2, the modern notion of formal proof leads to the idea that such proofs are machine-verifiable, and as a result, we use formal proofs and machine-level proofs as synonyms. This has led to a desire to see a formalization of mathematics in computer-checkable terms (see e.g. the QED Manifesto [Anon94]). At this point, this wish remains largely unfulfilled, and most modern mathematics has not benefited from the progress in computer-based proofs. In this subsection, we discuss key features of such systems, and the associated challenges.

1.3.1. Computer-Assisted Proofs.
The idea that computers could (and perhaps should) verify the validity of proofs goes back at least to Gödel's Lost Letter to Von Neumann [Lipt10]. It is extremely natural: in principle, any modern mathematical derivation can be translated into a sequence of formulae, which are progressively derived by applying specific rules (which we will call a logic system); specifying the formulae and the applied derivation rules thus constitutes a computer-checkable proof, sometimes called a proof object (see Section 1.3.2 below).

Computer-assisted derivations have yielded a number of successes in mathematics, and have been instrumental for the proofs of celebrated conjectures which involve dealing with a large number of cases separately:
• Famously, computer-assisted proofs were instrumental to the first proof of the four-color theorem [ApHa77, AHK77]; a fully machine-verified proof was given later [Gont08].
• The Kepler conjecture about sphere packings was established by a machine-checked proof [Hal+17].
In addition to enabling proofs that are too hard for humans to write and check, computer-assisted proofs are also important in the field of software verification, where they allow one to guarantee that functions of a program will behave according to specification.

As a result of their appeal (as trusted elements of knowledge, as objects that computers can sometimes produce better than humans), a desire to formalize mathematics has grown over the years. SPRIG is based on the idea that to convey succinct, informative, and trustable proofs, entire proofs are not necessary: only a subset that is relevant to the agents exchanging information is needed. As discussed in Section 1.3.2, many proof systems exist; the rest of this paper is based, for concreteness, on one of them, which is particularly readable by mathematicians.

1.3.2. Computer Proof Systems.
A number of computer-based systems, such as Mizar, TLA+, Isabelle/Isar, Coq, Metamath, HOL, or Lean, enable and facilitate the writing of formal proofs. These systems rely on some basic low-level language: proofs written in this language are called proof objects, and they are what the computer ultimately checks the validity of. At the same time, the users of such systems usually work with a higher-level language, called user-level proof, which is usually much shorter (yet still very long compared to proofs used by mathematicians [Wied12]). This article is largely agnostic on the specific choice of computer system, but could be implemented easily in so-called declarative systems such as Mizar or Isabelle/Isar.

A user-level proof consists of a sequence of statements (which include equivalents of high-level mathematical proofs, with definitions, proof steps, justifications, sub-statements, cases, etc.) written in a formal language. The proof is called complete when the system is able to validate each of the justifications for the steps. While somewhat tedious to write and to read, formal proof sketches, which are valid proofs in which a number of steps have been removed, are particularly easy to understand by mathematicians, as pointed out in [Wied03] (see also Section 1.2.3 above).

In languages such as Isar, reading and writing simple statements or definitions (as opposed to writing complete justifications) is relatively easy. As a result, a mathematician can be expected to be able to determine if a simple statement deemed to have a short proof does indeed have one, without having to try to write it; this idea is at the heart of SPRIG.
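To illustrate the notion of formal proof sketch, a proof in which intermediate statements are written formally but their justifications are omitted can be rendered, for instance, in Lean 4 (a toy example of ours; declarative systems such as Mizar or Isar offer analogous mechanisms):

```lean
-- A formal proof sketch: the intermediate statements are stated in the
-- formal language, but their justifications are left out (`sorry`), to be
-- filled in only if a reader requests more detail.
example (n : Nat) : n + 0 + 0 = n := by
  have h1 : n + 0 = n := by sorry
  have h2 : n + 0 + 0 = n + 0 := by sorry
  rw [h2, h1]
```

A mathematician can read the stated steps `h1` and `h2` and judge whether each plausibly has a short proof, without writing the proofs themselves.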
1.3.3. Interactive Proofs.
We conclude this subsection by discussing the development of a completely different take on proofs, induced by the algorithmic reduction of proofs and motivated in particular by cryptographic applications: that of interactive proof protocols. An interactive proof protocol allows a prover to demonstrate her knowledge of a proof to a verifier, through her ability to answer the verifier's questions.

The cornerstone of interactive proof checking relies on computational complexity theory and NP-completeness (see e.g. [Wigd19] for a modern account): for any statement S in a formal logic system Λ, the statement 'S has a proof of length ≤ L in Λ' can be reduced to the statement 'G is 3-colorable' for a certain graph G computable in polynomial time in terms of S, L, and Λ; proofs of length ≤ L of S in Λ are then in one-to-one correspondence with 3-colorings of G (i.e. assignments of colors to the vertices of G such that any pair of adjacent vertices has different colors). Note that we focus on 3-colorings for concreteness, but any NP-complete problem would do the job as well. Through the prism of computational complexity, a prover can demonstrate her knowledge of a proof of length ≤ L of a statement S by demonstrating her ability to color the corresponding graph G.

This view of proofs allows in particular for a number of interesting applications (see e.g. [Wigd19]):
• Probabilistically checkable proofs: the prover may be able to map her 3-coloring problem to a 3-coloring problem with amplified gap, and to use it to convince a skeptic of her knowledge of a proof (of length ≤ L) via a limited number of interactions (independent of the proof length); in essence, if the prover bluffs in her answers to questions, she will be caught with high probability.
• Zero-knowledge interactive proofs: by relying on cryptographic primitives, the prover may be able to demonstrate her knowledge of a proof (of length ≤ L) without divulging anything about the proof itself.

The view of proofs underlying the field of interactive proofs is at odds with that of theorem proving in mathematics: the proofs in that world yield little, if any, insight to mathematicians (as discussed in Sections 1.1.2 and 1.2.2 above) into why proven theorems are true (and for instance, in the case of zero-knowledge proofs, the goal is to give zero information about the proof). Still, the idea of establishing trust via a few interactions is very appealing for proof verification in mathematics. In light of this question, SPRIG aims at allowing agents to communicate both trust and insight to each other, via a limited number of interactions.

1.4. Challenges in Modern Mathematical Derivations.
In mathematical practice, the amount of detail needed to assess a proof's validity is usually decided by a peer-review refereeing process, whose goal is also to determine how interesting the results and insights are. Usually, this happens within the context of a publication by a journal, in which a small number of independent experts assess the validity of a proof (i.e. whether enough details are provided to transmit to them the confidence that the proof is correct, in the sense of Claim 2 above). In the case of higher-profile results, validation also comes from the larger community, where all experts of the field may discuss the results, identify weaknesses, and exchange comments with the authors.

In this context of 'high-level' proofs, the last decades have seen a number of developments that have created additional challenges, in particular:
• The inflation in the complexity of (published) mathematical proofs makes their validation more difficult (see Section 1.4.1 below).
• The boundary conditions of the process (see Section 1.4.2).
• The alignment of various external incentives with those of the validation process (see Section 1.4.3).
In principle, as suggested in Section 1.2 above, all of these challenges are of a purely practical nature: given enough competent, reliable, properly incentivized experts, these challenges would not exist; or, alternatively, if all proofs were easy to write down in a format verifiable by a computer (as discussed in Section 1.3), there would be no need for expert verification. As discussed in the following subsections, however, these challenges are very real today; the SPRIG protocol aims to help tackle them.

1.4.1. Complexity of Proof Validation.
Checking the validity of mathematical derivations is a time-intensive task, which naturally depends on the length of the result as published, and on the time it takes a referee to check the details (i.e. explicitly or implicitly filling in the blanks to convince oneself of its validity). Over the last 100 years, the complexity of published mathematical proofs has grown significantly:
• The average length of mathematical proofs has grown: for instance, the average length of a paper in the Annals of Mathematics in 1950 was less than 17 pages, but in 2020 it was more than 58.
• The papers, in turn, usually rely on larger and larger bodies of work, and proofs are now rarely self-contained.
• Some proofs are split into many mathematical articles: for instance, the classification of the finite simple groups consists of tens of thousands of pages in several hundred journal articles, published over a 50-year period.
As a result of this inflation in complexity, the process of validating a proof has increased in difficulty. Some documented, high-profile examples are:
• The Jacobian Conjecture: it has seen a large number of claimed proofs in the 20th century, which survived for a number of years before being invalidated, and it stands as an open problem [Wiki21].
• Poincaré's Conjecture: after a number of incorrect claimed proofs were proposed throughout the 20th century, a collection of papers was published by Perelman in 2002–2003, which led to a validation by the Clay Institute in 2006 following many debates, including the publication of a number of papers, some of which filled in details, and some of which claimed to be the first complete proof of the conjecture [NaGr06, Szpiro08].
• Hilbert's 16th Problem: currently an open problem, for which many attempted solutions have been proposed, some of which took decades to be invalidated [Ilya02].
• The ABC Conjecture: a solution has been proposed, which is considered wrong or incomplete by a significant number of experts, but at the same time considered correct by a significant number of experts [Cast20].
• The classification of finite simple groups was announced as completed in 1983. Yet a number of gaps have been found over the years, which were filled over the following decades [Solo01].
In a number of high-profile cases, the underlying debates have taken years to settle. In principle, such debates should not last: it should be enough to provide a computer-verifiable proof (Section 1.3); in practice, the sheer length of the relevant proofs in machine-level language makes such a task daunting. As discussed in Section 1.4.2, this poses a number of problems in terms of the boundary conditions of the process.
SPRIG aims at enabling various agents (including computers) with diverse areas of expertise to collaborate in the reviewing process and in the writing of the proofs' details.
Boundary Conditions of the Reviewing Process.
The reviewing process, as performed by a journal or a community as a whole, involves a number of delicate boundary conditions, which are usually not formalized:
• The reviewing process involves matching authors and expert reviewers. Picking experts may be a difficult task for a journal editor; in the case of public debates, for the community to decide whom to listen to requires the build-up of a consensus.
• The interaction protocol between authors and reviewers is not formalized. Should the authors or reviewers not act in good faith, or have drastically different views of what a complete proof means, the process will stall:
– In principle, there is nothing that prevents reviewers from nitpicking or claiming they don't understand some parts, and hence from deeming a correct proof incomplete: in some sense, a proof is indeed incomplete until it is completely written down in a machine-verifiable format.
– Dually, an author whose proof is too vague or incomplete (or possibly void) may keep adding irrelevant details that do not address the heart of the issue, or claim that there is nothing important to add and that the reviewers are nitpicking on trivialities.
– Since going down to the machine level is not a feasible option, for a journal, the editor ends up being the ultimate arbiter; when the whole community discusses the issue, a consensus forms (or doesn't form).
• The dual role of reviewers: they are expected to emit, at the same time, a judgement on the validity and on the interest of the result.
• Ultimately, there is no specification as to which of the 'wrong until proven correct' (the machine-level proof standard) or 'correct until proven wrong' principles prevails in case of disagreement, and in which timeframe, i.e. where the burden of proof lies.
While the above are theoretical weaknesses of the protocol of the reviewing process, they are not necessarily problematic in practice if the various agents work constructively towards aligned goals (such as uncovering mathematical truths, acting in good faith, etc.). But this is no longer the case as soon as conflicts of interest exist: see Section 1.4.3.
In light of the boundary conditions problem, SPRIG allows one to rely on computer-based proof verification algorithms (as in Section 1.3) as the ultimate arbiters, and to enforce explicit and transparent time constraints.
Alignment of Incentives.
The functioning of the reviewing process involves a number of agents, whose identities may or may not be known; for high-profile proofs, the whole community may end up being involved. As discussed in Section 1.4.2, the boundary conditions can in principle be the source of problems; this can in particular be the case if there is misalignment between the goal of thorough and quick validation and the agents' objectives.
Arguably, a large part of the incentives underlying the reviewing process are implicit, rather than explicit (namely: the desire to discover the truth, to be intellectually honest, to participate in the good functioning of the community, to be respected as an expert, etc.). However, the agents' strategies may also be influenced by the presence of various external incentives whose alignment with the reviewing goals is unclear (e.g. funding, jobs, prizes, recognition).
In terms of explicit incentives, a number of problems may arise:
• The reviewing process is rarely explicitly incentivized (in the case of journals, reviewers are usually anonymous and not compensated), in particular with respect to their ability to spot mistakes; and there is an asymmetry of incentives: there are usually only negative consequences for not finding errors, and no significant downside to rejecting a valid proof.
• For high-profile problems, there is a problematic asymmetry: numerous amateurs may see a lot of upside in submitting (mostly incorrect) proofs of famous conjectures, while at the same time fewer experts are available to spend their precious time on finding errors in these proofs (with no upside).
• Authors may be incentivized to publish vague, incomplete proofs to claim precedence over other authors.
• Independent experts are hard to find in very specialized fields, and the incentives to disclose conflicts of interest are limited. If the reviewers are competing with the authors (or, conversely, are interested in seeing them succeed), they may stall the reviewing process (or, conversely, be too lenient), as discussed in Section 1.4.2 above.
• In the case when proofs involve security issues (such as in cryptographic contexts), there may be additional problems, with experts possibly having incentives to keep discovered mistakes to themselves, as exploiting them may be worth money.
The above problems are illustrated by a number of high-profile examples [NaGr06, Cast20], and even when aware of the existence of alignment problems, it is hard for external agents to identify in which instance of the above problems a situation falls [Cast20, NaGr06].
In light of the alignment problem, the goal of SPRIG is to mediate multi-agent interactions with explicit incentives, designed in such a way as to align the agents' objectives with the ones of the reviewing process.
Markets, Information, and Games.
As discussed in Section 1.4, the validation process of proofs as performed by mathematicians involves a number of agents, with different levels of information, interacting in a variety of manners. These include publishing proofs, detecting issues in papers, asking for more details, and providing them.
Similarly, SPRIG invites repeated interactions between members of the mathematical community with potentially variable levels of information and degrees of involvement.
In both the current validation process and the SPRIG protocol, understanding the set of involved agents and their interactions as an economic system is of paramount importance. Indeed, the raw data of the protocol outcome (say: number of refereeing rounds, final status: accepted/rejected) is in general insufficient to determine precisely what credence the community should have in the validity/invalidity of a proof.
Understanding the motives, incentives, and beliefs of the relevant agents allows one to assess what information we can actually extract from a protocol outcome. The simplest example is perhaps the case of an accepted paper whose author sits on the editorial board of the publishing journal. All other things being equal, and because of an obvious incentive problem, it seems rational to (at least slightly) decrease the credence in the validity of the published results.
Section 1.4.3 lists a number of other incentive issues. For completeness, we now briefly review the core concepts of the modern microeconomics toolbox and the main economic theories pertaining to the discussion above. A deeper game-theoretic treatment of the validation process is provided in Section 4.
Agents, Bayesian Views, and Markets.
The raison d'être of incentive problems is that each agent follows their own agenda, being driven by specific motives or preferences. These preferences can be represented by a utility function, a notion that traces back at least to Bentham and J. S. Mill and has become commonplace since the emergence of neoclassical economics. Von Neumann and Morgenstern [VNMo44] established conditions on agents' preferences such that those can be ordered by an expected utility calculation, providing the economics profession with a key tool for dealing with decision-making under uncertainty. However, other concepts were needed in order to think and make predictions about the way economic agents (inter)act. In particular, a proper microeconomic treatment of incentive problems and their consequences was virtually impossible before the advent of two intellectual revolutions.
• The first one, largely initiated by Nash [Nash50], is game theory: it provided economics with a much-needed tool to model situations where uncertainty arises from one's imperfect knowledge, not about the state of Nature, but rather about other agents' actions.
• The second one, initiated by Muth [Muth61] and later supported, consolidated, and popularized by Lucas, is rational expectations: rational agents not only maximize their own utility, but have the same knowledge as the economic modeler and are able to correctly compute the model's outcome. This requires making assumptions about other agents' behavior, but at the same time allows them to predict this behavior (given that agents will take utility-maximizing decisions). Rationality requires that the predictions coincide with the assumptions. Key to the rational anticipation process is the ability to correctly compute expectations; in particular, rational agents are Bayesian updaters.
Several contexts, including the issues we investigate in the current paper, called for an extension of these tools to the case of asymmetric information: situations in which agents operate under different information sets and can extract some information from others' actions. Akerlof [Aker70], Spence [Spen73], and the various works of Stiglitz and his co-authors in the seventies closed this gap. Contributions such as [ChKr87] and [FuTi91] provided refinements of the equilibrium concept for strategic interactions under asymmetric information. In this literature, agents have a 'type', i.e. a characteristic that is not directly observable, but partially or fully inferred given a history of actions.
As we shall see, in SPRIG, this type is the subjective probability that a proposed proof can be unrolled down to the machine-language level (itself a function of variables such as personal skill, amount of work, and carefulness, which are not fully observable by outsiders).
Townsend's model [Town79] with costly state verification initiated a large literature on optimal contracting under asymmetric information. In SPRIG, the goal is, indeed, to estimate the status (i.e. valid/invalid) of a proof. But the very design of our validation process implies that verification (while potentially costly in terms of time and intellectual energy) might in fact be beneficial to the verifiers, who can collect bounties.
The system formed by SPRIG and its users can be seen as a market for proofs, although not quite in the sense of a stock market. However, just as the (semi-strong) efficient market hypothesis [Fama70] stipulates that the utility-maximizing behavior of agents will lead stock prices to reflect any available public information, we expect that with properly chosen parameters, SPRIG will aggregate information and eventually disclose the actual status of a proof.
The dark markets reviewed by Duffie [Duff12], while different from our market in a variety of regards, also share similarities with SPRIG. These markets are dealer networks in which connected agents conduct bilateral negotiations to trade financial assets (there is no 'market price'). Each transaction reveals part of the private information that dealers have, and therefore one can expect information to 'percolate' through the network.
While scoring rules can be used to aggregate the credences of various agents on statements (see e.g. [Hans03a, Hans03b]), SPRIG features two additional key characteristics. First, it provides a built-in termination date at which the validity of the proof/question is decided and thus "bets" can be settled. Second, the dynamic verification process generates explicit information about the strengths and weaknesses of a claim. In fact, SPRIG can be viewed as a multi-round security game (see e.g. [BCDPS13]), in which agents 'debate' the validity of claims ([ICA18]), with automatically set boundary conditions.
Economic Markets and Mathematical Truth.
The idea of agents with a Bayesian view on the truth of mathematical questions dates back at least to the works of Solomonoff [Solo64]. Recent works on systems of such agents interacting through a market [GBCST16] have shed light on how such systems may be viewed as (decentralized) algorithms that estimate and refine probabilities of truth for mathematical statements. [GBCST16]'s algorithm, a logical inductor, dynamically assigns probabilities to mathematical statements, and the belief system thus produced is shown to be asymptotically consistent. This consistency, as well as other desirable properties of their algorithm, derives from a logical induction criterion, which is essentially a no-arbitrage condition on a market defined as follows. Each mathematical statement φ is associated with a derivative that pays $1 if φ is true and zero otherwise; the market price of this derivative can then be seen as the current belief about the truth of φ. A trader can observe the history of prices up to time t − 1, make some computations of their own, and post market orders at time t, adjusting their portfolio of derivatives written on statements φ_1, ..., φ_{n(t)}. Their trading strategy is adapted, i.e. a function of past prices. Importantly, this "past" includes time t: naturally, the demand function depends on the price that will eventually be set. A market maker (a subroutine of the logical inductor) then sets prices in such a way that the demand of the trader is (approximately) zero for all derivatives. By construction then, a "fair price" obtains, which captures the probability of the statements underlying the derivatives.
While appealing in many aspects, these ideas unfortunately remain extremely theoretical. In the words of [GBCST16], 'logical inductors are not intended for practical use.'
• The required computation times and spaces are unreasonably large.
• There is an important distinction between being true and being provable: the latter is the focus of mathematics research, and accumulating evidence for the veracity of a result may not result in any progress towards proving it (for instance, verifying Goldbach's conjecture up to a large N may increase one's trust in the truth of the statement, but not bring any insight into how to prove it).
• There is no focus on the amount of insight associated with the agents' discoveries.
• There is no obvious way to incorporate agents seeking information about specific statements, i.e. to shift the attention of the market towards a set of problems currently of interest to these agents.
SPRIG leverages the view that the combination of Bayesian updating and individual profit-seeking behaviour leads the market to reveal information about fundamentals (in our context: the validity of mathematical claims). However, rather than aiming at constructing an "exhaustive encyclopedia" of mathematical propositions, our framework incentivizes agents to inject (or induce the injection of) information about specific statements, which are relevant for the community either because they are mathematically interesting or because they correspond to critical points in a proof. Furthermore, our focus is on the effective provability of statements rather than on the credence that they are, in an abstract way, true. That is, SPRIG not only invites its users to focus on important mathematical statements, but also induces them to discuss/prove/refute those in a way that provides intuition and insights about why they are true or false.
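The market-maker step described above — setting a price at which a trader's demand for the $1-if-true derivative is approximately zero — can be sketched in a toy form. This is only an illustration of the clearing idea, not the logical-inductor algorithm of [GBCST16]; the `demand` function and its monotonicity are assumptions of the sketch.

```python
def clearing_price(demand, lo=0.0, hi=1.0, tol=1e-6):
    """Find a price in [0, 1] at which the trader's demand for the
    $1-if-true derivative is (approximately) zero, by bisection.
    Assumes `demand` is decreasing in the price."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if demand(mid) > 0:   # trader still wants to buy: price too low
            lo = mid
        else:                 # trader wants to sell: price too high
            hi = mid
    return (lo + hi) / 2

# A toy trader who believes the statement holds with probability 0.7:
# they buy below that credence and sell above it.
trader = lambda price: 0.7 - price
print(round(clearing_price(trader), 3))  # → 0.7
```

The clearing price recovers the trader's credence, which is the sense in which the market price "captures the probability" of the underlying statement.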
Blockchain and Related Technologies.
The last decade saw the emergence of fully decentralized computing systems, in particular in the context of public databases made of public, immutable records, called blockchains. These systems, running on a decentralized network of computers (typically connected to the internet), have grown out of the desire to build transparent, trustless consensus systems, relying on no central authority or specific machine, and following a secure, time-stamped, and easily auditable behavior.
While initially restricted to the context of the storage of digital assets into accounts (such as for Bitcoin), blockchains have grown in terms of applications and features. The introduction of smart contracts, allowing the blockchain to perform complex operations and transactions conditioned on various inputs, has opened new possibilities. A particularly prominent development is the creation of inexpensive, open, efficient, and trusted general-purpose markets.
SPRIG provides a way to construct such markets aimed at decentralized, public proof verification: its very design makes it perfectly suitable for running on the blockchain.
In this subsection, we review a number of key principles and mechanisms of blockchain technologies, upon which SPRIG relies.
Distributed Systems.
A blockchain is a certain type of distributed system. A distributed system instance consists of execution instances of programs (often called clients) running on a network of computers (often called nodes), which communicate via a specified protocol. The protocol prescribes the communications that the nodes should emit and receive. Early examples of distributed systems across the internet include peer-to-peer file sharing networks, in which a node has a number of files available for other nodes to request. As a whole, a distributed system can be viewed as an execution instance of a program: a peer-to-peer file network can for instance be viewed as a single database, emerging from the various nodes.
Unlike regular program instances, distributed systems cannot be viewed as running on any particular node, while emerging from the nodes; this makes distributed systems more tolerant to localized failures in the system.
A distributed system is called (fully) decentralized when there is no principal node coordinating the network. A fully decentralized system instance is hence a form of consensus, emerging from the execution of the clients on the nodes: the choice of the nodes to adhere to the protocol defining it, by running client programs which respect the protocol, is what gives life to the instance.
Remark. In some regards, the mathematical activity (on planet Earth) can be viewed as a decentralized system: mathematicians are agents who choose to adhere to a protocol of logic rules, and whose work should be accepted by other agents as long as it follows the protocol; there is no central authority entitled to deciding what is correct mathematics or who should be able to publish mathematics. Of course, an important difference is that the protocol's rules are not completely specified in practice, and that most nodes are not computers, but humans.
Distributed systems have grown in importance over the last decades, due in large part to the development of the internet. As discussed in Section 1.6.2 below, a special class of distributed systems has risen to prominence in the last decade: blockchains. SPRIG is designed to run on such systems.
Blockchain and Cryptocurrency Basics.
A blockchain is a specific type of distributed system, where the underlying database is made of a chain of immutable pieces of data called blocks, which grows over time (new blocks are appended as the instance evolves). This feature of blockchains allows a consensus about time-stamped data to develop.
For instance, Bitcoin is a blockchain instance consisting of blocks describing transactions between accounts (represented by cryptographic public keys), where each node possesses an entire copy of the blockchain; about every 10 minutes a new block is added, which contains the transactions that have been validated in that time interval. The Bitcoin software is designed in such a way that the Bitcoin blockchain can be viewed as a public ledger of amounts of currency units (called bitcoins) owned by each account, where validated transactions move bitcoins between accounts. More generally, blockchains can be used to implement cryptocurrencies, and allow users to trade virtual assets, called tokens. In such systems, accounts are also represented by a cryptographic public key, and transactions from an account submitted to the network are accepted if they are signed by the private key associated with the account and the funds are still available.
A key feature of blockchains such as Bitcoin is that they are based on a publicly available protocol (typically with an open-source reference client implementation); as a result, the laws governing the system are transparent (the 'code is law' motto is sometimes used to describe such systems), which informs the adherence of various nodes and stakeholders to the system.
The implementation of most blockchain systems usually relies on the Internet's infrastructure, and on cryptographic primitives to ensure both the integrity of the data and the authenticity of the transactions. The guarantee of data integrity provided by blockchains is at the heart of such protocols and of the interactions of the agents, who can thus use blockchains as a medium for information exchange and a trusted hub. As a result, a number of blockchains (such as Bitcoin, Ethereum, etc.) have emerged as focal points (or Schelling points) for a growing population of users [Brei17]. Blockchains thus play the role of trusted platforms for agents interested in exchanging information and digital assets.
The fact that the functions of blockchains are executed by code running on nodes has led to many extensions beyond the original application of token ledgers. In particular, the advent of smart contracts, as discussed in Section 1.6.3, has opened many new possibilities, including the system proposed in this paper.
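The chaining of immutable blocks can be illustrated with a minimal sketch: each block commits to the hash of its predecessor, so tampering with any past block invalidates every later one. This toy omits signatures, consensus, and networking; the dictionary layout and field names are illustrative choices, not any real blockchain's format.

```python
import hashlib
import json

def block_hash(block):
    # Hash of the block's canonical (sorted-key) JSON encoding.
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def append_block(chain, transactions):
    # Each new block records the hash of the previous block.
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev": prev, "txs": transactions})

def verify_chain(chain):
    # The chain is valid only if every block commits to its predecessor's
    # hash, so editing any past block breaks all subsequent links.
    return all(chain[i]["prev"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

chain = []
append_block(chain, [{"from": "alice", "to": "bob", "amount": 5}])
append_block(chain, [{"from": "bob", "to": "carol", "amount": 2}])
assert verify_chain(chain)
chain[0]["txs"][0]["amount"] = 500   # tamper with history...
assert not verify_chain(chain)       # ...and the chain no longer verifies
```

This is the mechanism behind the immutability and auditability properties described above: agreement on the latest block hash implies agreement on the entire history.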
Smart Contracts.
Smart contracts emerged naturally from the desire to leverage the power of blockchains as decentralized computing platforms to implement programs that perform automated transactions: for e.g. a cryptocurrency, one would like to be able to run a program that automatically moves some asset from one account to another at a time when certain conditions are met. Such programs run on the blockchain (i.e. are executed by the nodes of the blockchain) and update its state (by contributing new blocks); once running on the blockchain, such a program can be viewed as a contract, guaranteeing the execution of certain transactions if pre-specified conditions are met, hence the name smart contract.
The behavior of a smart contract is specified by its code, together with inputs from the blockchain: for instance, a smart contract may be the recipient of a transaction from another agent on the blockchain, or it may act according to a signed information source (for instance a trusted information feed from the physical world, known as an 'oracle'). As a simple example, one can imagine a smart contract implementing a chess competition with automated rewards distribution: players submit their (cryptographically signed) moves to the blockchain, and the outcome is either determined by one party resigning, both parties agreeing to a draw, or by reaching a position computed as terminal by the smart contract.
A number of blockchains have developed infrastructures for smart contracts, including Ethereum, Tezos, Algorand, Avalanche, etc. Each of these platforms allows for the writing of smart contracts in fairly rich (sometimes Turing-complete, as for Ethereum) high-level languages, and their execution on the blockchain against a fee (depending on the complexity of the operations, and payable in the blockchain's cryptocurrency token).
The smart contract infrastructure has enabled the construction of numerous decentralized platforms, in particular in decentralized finance (as discussed in Section 1.6.4 below): exchanges, betting markets (relying on information feeds), automated market makers, stable coins, etc. can now be run as smart contracts. Such platforms allow for applications that previously needed to rely on the good behavior of expensive (and corruptible) trusted third parties to enforce the execution of the contracts. Their transparency allows for a detailed risk analysis (see e.g. [AnCh20, AKCNC20]).
Smart contract platforms can be used to build trusted interactions and consensus, and to establish transparent, reliable, and efficient institutions. SPRIG is designed to run on smart contracts (without relying on external oracles), allowing it to aim for such goals, and to be a building block for further decentralized applications (see Section 5.5 below).
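The escrow logic underlying such contracts can be sketched as a small state machine (a toy model in ordinary Python; real smart contracts run on-chain, and here the class name, fields, and the externally supplied `winner` all stand in for on-chain state and an outcome source such as an oracle or a terminal game position):

```python
class EscrowBet:
    """Toy escrow: both parties lock a stake; the contract pays the
    whole pot to the designated winner, with no trusted third party."""

    def __init__(self, parties, stake):
        self.stake = stake                           # amount each party must lock
        self.pot = 0                                 # funds held in escrow
        self.deposited = {p: False for p in parties}
        self.payouts = {p: 0 for p in parties}

    def deposit(self, party):
        # A party joins by locking exactly the agreed stake, once.
        assert party in self.deposited and not self.deposited[party]
        self.deposited[party] = True
        self.pot += self.stake

    def settle(self, winner):
        # Once both stakes are in and the outcome is known,
        # the entire pot goes to the winner.
        assert all(self.deposited.values()) and winner in self.payouts
        self.payouts[winner], self.pot = self.pot, 0

bet = EscrowBet(["alice", "bob"], stake=10)
bet.deposit("alice")
bet.deposit("bob")
bet.settle("alice")
print(bet.payouts["alice"])  # → 20
```

The point of running such logic as a smart contract is that the `settle` rule is enforced by the platform itself rather than by a corruptible escrow agent.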
Decentralized Markets.
We now briefly discuss advances in the applications of smart contracts to decentralized markets, which have experienced a surge of interest in recent years; SPRIG can be viewed as a form of decentralized market for mathematical derivations.
One of the first interesting applications of smart contracts is that of decentralized betting markets (e.g. [Augur]). In the simplest form of decentralized betting smart contracts, two parties decide to bet at given odds on the outcome of some future event. To do so, they create a smart contract to which they send their bets (i.e. the contract acts as an escrow) and that can look up a pre-specified, commonly trusted data feed, aka the 'oracle'; when the event has happened, the smart contract determines the outcome from the data feed and sends the wagered funds to the winner.
For certain betting markets, no external oracle is even needed, since the relevant event occurs directly on the blockchain. The power of the blockchain to move assets based on the results of computations has attracted some attention in the mathematical community. Indeed, in principle, checking a proof (in a machine-verifiable format) can be performed by a smart contract (provided that the platform's language is expressive enough, and enough computing resources are available): in particular, the projects Qeditas [Whit16] and Mathcoin [Su18] are based on such ideas. Both aim at constructing a ledger of agreed-upon mathematical statements where the prospect of financial rewards induces agents to contribute their knowledge.
Mathcoin:
In the Mathcoin project proposed by [Su18], the ledger is constructed using a bottom-up approach: agents successively append statements that are logical consequences of the previous ones, starting from the Zermelo-Fraenkel axioms. These statements are appended if the agents provide a valid proof at the machine level. Connected to this growing ledger, a market allows the agents to bet on yet unproven propositions. A user in possession of a result potentially relevant to the proof of such a proposition can buy a derivative paying conditional on the validity of the proposition, then post their result on the ledger. They should subsequently benefit from an appreciation of the price of the derivative. Hence agents are incentivized to contribute their knowledge. However, the Mathcoin protocol and SPRIG differ in several important respects. First, the former produces a ledger of machine-level claims only, which is likely to be impractical for the scientific community as a whole; the machine-language requirement also presumably implies that growing the ledger will be a slow and cumbersome task. By contrast, we use machine-language expansion only as a boundary condition and expect SPRIG to produce concise, human-tailored proofs. Second, its pricing function exposes Mathcoin to an attack where agents are incentivized to post trivial claims. The associated token is initially priced below its payout; the claimer can then immediately post a proof and collect the payout, as their claim was proven. This attack pollutes the blockchain and, more importantly, drains the public fund, which is intended to reward agents who successfully bet on substantive propositions, effectively rendering the system unusable. While [Su18] mentions this attack, no satisfactory fix is provided.
Qeditas:
The Qeditas system of [Whit16] also suggests the construction of a ledger of propositions. There, agents are incentivized to append a result (written in machine language) either to collect bounties from a foundation or an individual interested in the result, or because they expect other agents to need it to prove something else in the future and hence to buy its 'rights'. As for Mathcoin, the bottom-up approach combined with the requirement for complete machine-level proofs implies usability and practicability issues.
To sum up, while projects such as Qeditas and Mathcoin are promising endeavours, their functioning seems at odds with the way mathematicians work, as discussed in Sections 1.2.2 and 1.2.3. SPRIG creates a decentralized market for mathematical derivations, which allows one to avoid evaluating unneeded regions of the proof, while still relying on the ability of smart contracts to arbitrate, in case of disagreement, mathematical truths.
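The trivial-claim attack on the public fund can be made concrete with a stylized simulation (the cost and reward figures are illustrative assumptions, not Mathcoin's actual pricing; the point is only that whenever the reward per trivial claim exceeds its cost, repetition drains the fund):

```python
def trivial_claim_attack(fund, cost, reward):
    """Each round: the attacker posts a trivial claim (paying `cost`),
    immediately proves it, and collects `reward` from the public fund.
    With reward > cost, the attack is profitable and repeats until
    the fund is exhausted."""
    profit, rounds = 0, 0
    while fund >= reward:
        fund -= reward
        profit += reward - cost
        rounds += 1
    return fund, profit, rounds

print(trivial_claim_attack(fund=100, cost=1, reward=2))  # → (0, 50, 50)
```

The fund intended to reward substantive bets is emptied by claims that carry no mathematical content, which is the sense in which the attack renders the system unusable.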
Outline.
As discussed in the previous subsections, mathematical proofs aim at eliciting trust in the validity of statements and transmitting insight into their justifications. While trust relies on the confidence that a machine-verifiable proof could be produced if needed, machine-verifiable proofs are difficult to produce and convey little insight on a per-line basis. As a result, proofs are made at a high, informal level in practice, and are much more concise than their machine-level counterparts, omitting many details. The production and verification of such high-level proofs thus represent a challenge: various agents with various levels of information may disagree on what constitutes a complete and valid proof. As a result, while a proof's validity is an objective question in principle, in practice it becomes a complex multi-agent problem, where incentive alignment problems may arise.
In this paper, we introduce the SPRIG protocol, which aims at enabling proof submission and verification in a decentralized manner, allowing agents to participate with their various degrees of information, and to be incentivized to behave honestly. It is designed to run on a blockchain, allowing a smart contract to handle the distribution of stakes and bounties, and to serve as an arbiter of debates without relying on any trusted institution.
More precisely, the structure of the following sections is as follows.
• In Section 2, the SPRIG protocol is presented.
– In Section 2.1, the ideas leading to SPRIG are introduced, starting with a simple game between a claimer and a skeptic (Section 2.1.1), and introducing variants one by one (Sections 2.1.2, 2.1.3, and 2.1.4).
– In Section 2.2, the basic version of SPRIG is described in detail: it is based on a hierarchical proof format, called the Claim of Proof format (Section 2.2.1), and a recursive structure of nested claims and questions (Sections 2.2.2 and 2.2.3).
– In Section 2.3, SPRIG is illustrated, through interactions mediated by it, in a number of cases.
– In Section 2.4, a number of variants of the basic version of SPRIG are presented, and their merits discussed.
– In Section 2.5, various aspects pertaining to the blockchain implementation of SPRIG are discussed.
• In Section 3, a game-theoretic perspective on SPRIG is introduced, presenting informally the effects of the incentives structure on the agents' interactions, and the protocol's resilience to attacks.
– In Section 3.1, the strategic interaction between claimers and skeptics, and the results on the proofs constructed in this interaction, are discussed.
– In Section 3.2, the robustness properties of SPRIG against various types of attackers are discussed.
• In Section 4, an in-depth quantitative analysis of a simplified model of SPRIG is presented.
– In Section 4.1, the simplified model is introduced, which consists of a two-player game of fixed depth.
– In Section 4.2, the solution of the model is presented, with a description of the equilibria.
– In Section 4.4, key questions about the reliability of SPRIG are answered in terms of the model's solution.
– In Section 4.5, the dynamics and robustness of SPRIG are discussed in light of the analysis of the simplified model.
• In Section 5, a number of applications of SPRIG and the outlook for future research are discussed:
– In Sections 5.1, 5.2, and 5.3, a number of direct applications of SPRIG to concrete verification situations are outlined: for theorem proof verification, for the creation of mathematical challenges with bounties, and for the elicitation of decentralized security audits.
– In Sections 5.4 and 5.5, possible uses of SPRIG as a platform for new applications are discussed, in particular for the development of automated theorem proving and derivatives markets, allowing agents to inject various types of information.
– In Section 5.6, a number of other applications, relying on external oracles, are proposed.
The SPRIG Protocol
In this section, we describe the protocol at the heart of the present paper, which allows one to construct and incentivize a debate between claimers (provers) and skeptics to determine the validity of a high-level, declarative mathematical derivation: SPRIG, short for Smart Proofs via Recursive Information Gathering.
• In Section 2.1, the key ideas of the protocol are progressively introduced.
• In Section 2.2.1, the claim of proof format upon which SPRIG is based is introduced.
• In Sections 2.2.2, 2.2.3 and 2.3, SPRIG is presented in detail: first, via an informal top-down view, then via a formal bottom-up definition, and finally via illustrative examples.
• In Section 2.4, a number of natural variants and extensions of the basic SPRIG protocol are proposed.
• In Section 2.5, a number of aspects of the blockchain implementation of SPRIG are discussed.
2.1.
Prologue.
In this prologue, we proceed step by step to introduce SPRIG: we start by introducing a proof-checking protocol with two agents, a claimer and a skeptic, debating the provability of a statement, and using machine-level verification as the ultimate arbiter.
2.1.1.
Claimer and Skeptic Debate.
We first introduce the Claimer-Skeptic debate as a simple process between two agents, called Claimer (pronoun: she) and Skeptic (pronoun: he):
• Claimer claims to have proven a theorem in the sense of modern mathematical proofs (1.2.3): she has a high-level proof sketch of the theorem (a collection of statements claimed to break down the difficulty of the theorem into smaller pieces), and feels confident she could fill in the details, down to machine level if needed (i.e. she can provide a sequence of statements which follow from each other by the application of elementary rules of logic); at the same time, she cannot or does not want to provide all the details down to the machine level, because the proof would be impractically long.
• Skeptic sees what Claimer shows him. For any statement shown by Claimer, Skeptic has beliefs about the probability that Claimer could indeed, if pressed, provide the details.
For the game to allow for a verification of the proof, Claimer and Skeptic use the following protocol:
• Skeptic may ask for more detail on any proof statement shown by Claimer that is of higher level than machine-level detail.
• Skeptic may invalidate the proof by revealing a mistake in the machine-level proof details.
• Claimer cannot indefinitely propose high-level proofs: if pressed to give details down a certain number of levels (say 9), she must reach machine-level details, or her proof is considered invalid.
• Claimer has explicit bounds on the size of the proof sketches and of the machine-level details.
• After Claimer has published a proof sketch, Skeptic has a limited (fixed in advance) amount of time to request details, and after Skeptic has asked for details, Claimer has a limited (fixed in advance) amount of time to provide details on a proof statement.
Claimer and Skeptic play roles that are somewhat akin to those of an author and a reviewer; in practice, Skeptic can just be Claimer's critical thinking, which probes for possible weaknesses of Claimer's proof, to assess Claimer's confidence in her proof being indeed complete.
The assessment made by this protocol is whether Claimer can provide details in the proofs quickly enough; there is also a limit on how deep into the details she can go (to avoid a case where Claimer would just state tautologies, instead of giving proofs).
A central weakness of the above protocol arises if there is no alignment between Claimer and Skeptic: Skeptic may start bombarding Claimer with useless questions, or conversely not ask any question at all. In order to prevent this, incentives can be put in place, as explained in Section 2.1.2 below.
2.1.2.
Incentives: Claimer’s Stakes and Skeptic’s Bounties.
In order to avoid the alignment problem in the debate between Claimer and Skeptic introduced in Section 2.1.1, incentives can be added:
• Claimer must put a stake with the claim: this stake encourages Skeptic to ask questions.
• Skeptic must pay a bounty to ask a question: this prevents Skeptic from bombarding Claimer with useless questions.
• In case Claimer can answer a question from Skeptic, she gets Skeptic's bounty: this compensates her for the work required to answer.
• In case Claimer cannot correctly answer a question from Skeptic, Skeptic gets Claimer's stake.
Setting the parameters correctly will incentivize Claimer and Skeptic to do their work: Skeptic will only ask questions about the places where he feels there is a reasonably good chance Claimer cannot fill in the details; conversely, he will not ask about obvious points in the proof. This selective revealing of the proof may be useful to an external observer as well: the details that are revealed are only the interesting, non-trivial ones. At each stage, one side may disagree with the other; the first one to stop debating loses, unless we reach the machine level, where it is Claimer's burden to prove her statement in machine-level language (otherwise she loses).
In this sense, the machine level serves as the ultimate arbiter of who is right; somewhat interestingly, if the incentives are set correctly, a debate between a rational claimer and a rational skeptic would probably end before reaching the machine level (as in a game between chess masters, where a checkmate position is almost never reached: the losing side will resign beforehand).
2.1.3.
Many Claimers and Skeptics.
In Sections 2.1.1 and 2.1.2, we had a debate between only two agents (Claimer and Skeptic). With good incentives, the roles of Claimer and Skeptic can in fact be completely decentralized: we will have a market of agents, claimers and skeptics (where an agent can play both roles at various levels). The skeptics can ask for details about any published proof sketch (by paying an upfront bounty, being the first to ask, and doing so within time limits); conversely, the claimers can propose proof sketches for any unanswered question (by paying an upfront stake, and doing so within time limits). The claimers' stakes are actually split in two: an 'up' stake and a 'down' stake: if the claim of proof ends up being invalidated, the 'up' stake goes to the question that the claim of proof was trying to answer, and the 'down' stake to the question that first invalidated the claim.
This structure allows various agents to perform in various capacities: for instance, agents with a good high-level vision can propose high-level proof sketches, while agents who are more comfortable with low-level details will provide proofs of sub-sub-claims, etc. Again, if the bounties, stakes, and times are set well, each agent will inject their own information into the system by proposing claims of proof and questions reflecting their beliefs, in a fully decentralized way.
2.1.4.
Question as Root of the Process.
A small variant of the protocol can be introduced in the case where 'we start with Skeptic': Skeptic may start by putting up a bounty (for instance, because he is interested in sponsoring research about a question he cares about), to which claimers may propose proof sketches (paying an 'up' and a 'down' stake upfront).
With this scheme, claimers should be able to submit several proofs for a statement, while compensating skeptics who may debunk wrong proofs. The rest of the process is completely symmetric.
This concludes the prologue part of the SPRIG protocol description. In Section 2.2, a precise formalization of SPRIG is detailed. In Section 2.4, a number of variants and extensions are presented. In Section 2.5, questions associated with the blockchain implementation are discussed.
2.2.
SPRIG Protocol Description.
Building upon the insights of the previous sections, we now formalize the Smart Proofs via Recursive Information Gathering (SPRIG) protocol. At the root of SPRIG is either a question or a claim of proof; the protocol then builds a tree starting from the root, with questions following claims of proof and vice versa. All the questions and claims of proof consist of statements written in a formal mathematical language, leaving no room for ambiguity (see Section 2.2.1).
2.2.1.
Claim of Proof Format.
SPRIG is based on the communication of unambiguous mathematical statements, written in a formal proof language. The description we give here is agnostic of the specific formal system; our format description can be implemented using a declarative proof language such as Mizar, Isar or Lean.
The format we describe is based on hierarchical proofs. A complete machine-verifiable proof of a statement is a proof that can be structured as a tree with nodes corresponding to statements, and with leaves corresponding to machine-verifiable statements. SPRIG allows agents to query and provide a subtree of the complete proof tree that is large enough to reach a consensus about whether or not the tree can be completed into a complete tree (with given size and time constraints), as discussed in Sections 2.2.2 and 2.2.3 below.
Recalling the definition of Section 1.2.5, and setting aside the question of definitions for a moment (this will be discussed in Definition 17 below), the structured proof format of level L ≥ 1 is that of a tree with the following structure:
• The root ('top-level') is the statement S∗ : A∗ =⇒ C∗ itself, where A∗ includes axioms and accepted statements used to derive the conclusion C∗.
• For each non-leaf ('high-level') statement S : A =⇒ C, its children (S_j)_{j=1,...,k} are statements A_j =⇒ C_j, where A_j is of the form A_j = A ∪ {C_i for i ∈ I_j} for some I_j ⊂ {1, . . . , j − 1}, and where C_k = C.
• For each leaf ('low-level') statement S : A =⇒ C, a machine-verifiable proof is provided.
• The tree height (distance between the root and the leaves) is at most L.
Remark. In our framework, an assumption A may include the introduction of notation (e.g. 'let x be such that ...'); some mechanism for the resolution of overloaded symbols is naturally needed (but not discussed here, being an implementation detail).
Remark. Theorems are often explicitly of the form T : α =⇒ γ (e.g.
we could have α corresponding to 'f is a holomorphic function on C' and γ corresponding to 'f has a convergent power series expansion on C'). In such a case, we could write the statement with γ = C∗, and A∗ would include α, as well as the list of axioms and assumed results used to derive γ.
Remark. Various formats of proof fit in this framework, including proofs by contradiction, etc. See Section 7 in the Appendix for examples.
To make the writing of statements effective, definitions are needed, which allow one to reserve notation to refer to properties of objects.
Definition 13.
A collection of definitions D introduces symbols specifying properties of objects, written in formal language, and specifies references from which other definitions can be imported.
Remark. For instance, a definition could be 'is-group(G, op)', which would imply that G is indeed a set, op is indeed a function G × G → G, and that the various properties of the op operation are satisfied.
Remark. In the format as we specify it, definitions need to be syntactically correct, but not necessarily consistent; they are to be thought of as mere shortcuts enabling more concise and clearer statement formulations.
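To fix ideas, such a definition shortcut can be written down in a formal proof language. The following is a hypothetical sketch (in Lean 4 syntax, with illustrative names of our choosing; the paper's format is agnostic of the specific system) of the 'is-group(G, op)' property from the remark above, packaged as a predicate:

```lean
-- Hypothetical sketch: the 'is-group(G, op)' shortcut of the remark above,
-- written as a predicate in Lean 4 syntax (names are illustrative).
-- The definition merely introduces a symbol; it carries no existence claim,
-- in line with the remark that definitions need not be consistent.
structure IsGroup (G : Type) (op : G → G → G) (e : G) (inv : G → G) : Prop where
  assoc    : ∀ a b c : G, op (op a b) c = op a (op b c)
  id_left  : ∀ a : G, op e a = a
  inv_left : ∀ a : G, op (inv a) a = e
```

A statement could then assume `IsGroup G op e inv` in its context rather than restating the group axioms in full.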
Remark. In mathematics, definitions of objects such as Riemann's zeta function ζ by a series such as ∑_{n=1}^∞ n^{−s} on H := {s ∈ C : ℜe(s) > 1} involve a lemma (saying the series converges on H); in our framework, we would instead define a property 'is-zeta-on-H1' for a function f : H → C, which would mean that the series ∑_{n=1}^∞ n^{−s} converges for any s ∈ H and its value is f(s); a statement (needed to e.g. prove the prime number theorem) would then assert that there exists a unique function H → C that satisfies the 'is-zeta-on-H1' property; a lemma such as ζ(s) = ∏_p (1 − p^{−s})^{−1} would then read "fix a function ζ : H → C; assume that ζ satisfies the 'is-zeta-on-H1' property; then for any s ∈ H, we have ζ(s) = ∏_p (1 − p^{−s})^{−1}". A specific language may include shorthands to make the notation shorter, of course.
Adding definitions to the proof construction, we obtain the following format for statements: Definition 17.
The Claim of Proof Format (see Figure 2.1) consists of statements, high-level claims of proof, and machine-level claims of proof, associated with a fixed logic system Λ (as in Section 1.3.2):
• A statement S consists of a context Γ of definitions and an implication A =⇒ C.
• A claim of proof P_L of level L ≥ 1 of a statement S with context Γ and implication A =⇒ C consists of a chain of reasoning D, S_1, . . . , S_k, where
– D is a collection of definitions (as in Definition 13).
– (S_j)_{j=1,...,k} are statements with contexts Γ ∪ D, where S_j is the implication A_j =⇒ C_j, such that
∗ A_j is of the form A ∪ {C_i for i ∈ I_j} for some I_j ⊂ {1, . . . , j − 1};
∗ C_k = C.
– It is claimed that the statements S_1, . . . , S_k have claims of proof of level ≤ L − 1.
• A claim of proof P_0 of level 0 of a statement S is a sequence of statements which can be validated by a computer, which follow the rules of the logic system Λ, and which allow one to deduce S.
Figure 2.1.
Claim of Proof Format. The boxes' left sides correspond to assumptions, while the boxes' right sides correspond to conclusions. The light dashed lines represent possible assumption dependencies.
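To make the format concrete, the chain-of-reasoning conditions of Definition 17 can be sketched as a small well-formedness check. This is illustrative code of ours, not part of the protocol specification; all names (`Statement`, `is_well_formed_chain`) are ours, and definitions and contexts are omitted:

```python
# Sketch of the structural conditions on a chain of reasoning S_1, ..., S_k
# for a statement A ==> C (Definition 17): each A_j must consist of A
# together with some earlier conclusions {C_i : i in I_j}, and C_k = C.
from dataclasses import dataclass

@dataclass(frozen=True)
class Statement:
    assumptions: frozenset  # the assumption set A
    conclusion: str         # the conclusion C

def is_well_formed_chain(root: Statement, chain: list) -> bool:
    earlier = []  # conclusions C_1, ..., C_{j-1} derived so far
    for step in chain:
        if not root.assumptions <= step.assumptions:
            return False  # A_j must contain A
        # any extra assumption must be a previously derived conclusion
        if not step.assumptions - root.assumptions <= set(earlier):
            return False
        earlier.append(step.conclusion)
    return bool(chain) and chain[-1].conclusion == root.conclusion

# Example: prove A ==> C through an intermediate lemma B.
root = Statement(frozenset({"A"}), "C")
chain = [Statement(frozenset({"A"}), "B"),
         Statement(frozenset({"A", "B"}), "C")]
assert is_well_formed_chain(root, chain)
```

The check only covers the bookkeeping of assumptions and conclusions; whether each step S_j actually admits a lower-level claim of proof is precisely what the SPRIG interaction is designed to probe.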
Remark. A high-level claim of proof comes with the following implicit claim: for each j = 1, . . . , k, deriving S_j is significantly easier than deriving S∗.
The length of a claim of proof is measured by its aggregated length measure µ, which is an increasing function of the length of its chain of reasoning (measured in number of symbols in the language in which it is expressed, possibly with a weight associated with different symbols); for instance, a natural choice for µ is simply the total length of the statements measured in number of symbols.
2.2.2. Top-Down Informal View of SPRIG.
Generalizing the examples, we now give an informal description of SPRIG, in the order in which the interactions between agents take place.
Given a context Γ and an aggregated length measure µ, the validation mechanism goes as follows:
(1) Given a parameter L ≥ 1, the root of the process consists of
(a) either a claim C_L of level L (a statement S with context Γ, together with a claim of proof P_L of level L, as in Definition 17) with a pre-specified stake σ_L = σ↓_L;
(b) or a question Q_L (i.e. a statement S with context Γ) with status 'unanswered' and a pre-specified bounty β_L.
The root specifies maximum proof lengths λ_{L−1}, . . . , λ_0, stakes σ↑_{L−1}, σ↓_{L−1}, . . . , σ↑_1, σ↓_1, σ↑_0, and bounties β_{L−1}, . . . , β_0 to be used at each of the lower levels.
(2) For ℓ ≥ 1, a claimer might attempt to answer a question Q_ℓ = (S) by producing a claim C = (S, P) of level ℓ′ ∈ {ℓ, 0}, i.e. by providing:
• A claim of proof P of level ℓ′ of the statement S.
• If ℓ′ = ℓ:
– The claim of proof P must be of total length at most µ(P) ≤ λ_ℓ.
– The claimer must lock a stake pair (σ↑_ℓ, σ↓_ℓ) (with σ↑_ℓ = 0 if ℓ = L).
• If ℓ′ = 0:
– The claim of proof P must be of length at most λ_0.
– The claimer must lock a stake σ↑_ℓ and pay a computation cost c.
– In this case, if the claim of proof is (automatically) validated, it gets the status 'validated'; otherwise, it gets the status 'invalidated'.
• In all cases:
– All claims of proof addressing Q_ℓ must be proposed within a response time τ_ℓ of Q_ℓ's publication.
If a claim of proof addressing Q_ℓ gets the status 'validated', then this claim is said to be answering Q_ℓ, and Q_ℓ gets the status 'answered'; if no such claim exists, then Q_ℓ gets the status 'unanswered'.
(3) For ℓ ≥ 1, a skeptic might dispute a level-ℓ claim C_ℓ = (S, P_ℓ) by asking a question Q_{ℓ−1} = (S′), where S′ is one of the statements appearing in the claim of proof P_ℓ.
(a) The skeptic must lock a bounty β_{ℓ−1} associated with the question.
(b) All questions about C_ℓ must be asked within the verification time θ_ℓ of C_ℓ's publication.
(c) If a question originating from the claim gets the status 'unanswered', then this question is said to be a defeating question, and the claim gets the status 'invalidated'; if no such question exists, the claim gets the status 'validated'.
The incentivization mechanism for the proposal is based on bounties (β_ℓ)_ℓ and stakes (σ↑_ℓ, σ↓_ℓ)_ℓ as follows:
(1) If a claim C_ℓ addressing a question Q_ℓ is the first one to get the status 'validated', then C_ℓ receives the bounty β_ℓ from Q_ℓ.
(2) If a claim C_ℓ addressing a question Q_ℓ gets the status 'invalidated', then Q_ℓ receives the stake σ↑_ℓ from C_ℓ.
(3) If a question Q_ℓ disputing a claim C_{ℓ+1} is the first one to get the status 'unanswered', then Q_ℓ receives the stake σ↓_{ℓ+1} from C_{ℓ+1}.
In a nutshell, there are debates between claimers (agents providing claims of proof for statements) and skeptics (agents asking questions about proofs), where each side debates while having some 'skin in the game': claimers and skeptics must pay upfront to play, and they will be paid back if their point is valid (i.e. the claim is validated, or the question remains unanswered) and possibly further compensated (for a question, if it is the
first to defeat the claim it originates from; for a claim, if it is the first to answer the question it originates from). Winning occurs when one of the sides concedes; in case no side wants to concede, after at most D steps one reaches the point where only machine-level proofs are accepted; hence the ultimate judge is an algorithm that runs the checking of the machine-level proof.
Claimers and skeptics have dual roles. Let us simply point out the following differences:
• The skeptics only have a limited number of possible moves (limited by the number of steps in the claims of proof that have been published), while the provers have a virtually infinite number of possible moves (they can provide any purported claim of proof).
• While invalidated claims must split their stake between the question they address (if it exists) and the first defeating question, the answered questions must only pay their bounties to the first validated claim of proof closing them.
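As an illustration of these rules, the status propagation and the resulting transfers can be sketched on a finished interaction tree. This is a toy model of ours, not a reference implementation; all names are illustrative, lists are assumed chronological, and time limits, proof lengths, and the computation cost are omitted:

```python
# Toy sketch of the status and settlement rules of Section 2.2.2 on a
# finished tree: a claim is invalidated iff some question about it stays
# unanswered; a question is answered iff some claim answering it is
# validated. The bounty goes to the first validated answer; an invalidated
# claim pays sigma-up to its origin question and sigma-down to its first
# (chronologically) defeating question.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Question:
    bounty: float
    claims: List["Claim"] = field(default_factory=list)

@dataclass
class Claim:
    stake_up: float
    stake_down: float
    machine_valid: Optional[bool] = None  # set only for machine-level claims
    questions: List[Question] = field(default_factory=list)

def claim_validated(c: Claim) -> bool:
    if c.machine_valid is not None:  # level 0: the computer decides
        return c.machine_valid
    return all(question_answered(q) for q in c.questions)

def question_answered(q: Question) -> bool:
    return any(claim_validated(c) for c in q.claims)

def settle(q: Question, transfers: list) -> None:
    """Append (label, amount, recipient) transfers for q's subtree."""
    bounty_paid = False
    for c in q.claims:
        if claim_validated(c):
            if not bounty_paid:  # first validated claim receives the bounty
                transfers.append(("bounty", q.bounty, c))
                bounty_paid = True
        else:
            transfers.append(("stake_up", c.stake_up, q))
            defeat = next((x for x in c.questions
                           if not question_answered(x)), None)
            if defeat is not None:  # sigma-down to the first defeater
                transfers.append(("stake_down", c.stake_down, defeat))
        for x in c.questions:
            settle(x, transfers)

# Example: a question answered on the second attempt; the failed
# machine-level claim forfeits its up-stake, the second claim wins the bounty.
q = Question(bounty=10.0,
             claims=[Claim(1.0, 1.0, machine_valid=False),
                     Claim(1.0, 1.0, machine_valid=True)])
transfers = []
settle(q, transfers)
```

In the actual protocol, statuses are of course decided incrementally as time limits expire, rather than computed on a completed tree as in this sketch.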
Remark. The protocol interaction does not necessarily stop immediately after the root status has been set. For instance, a claim may be invalidated by a first unanswered question, but the status of questions that were raised after that first question may still be undecided; the protocol interaction must run until all questions and claims get a status.
2.2.3.
Formal Description.
We now give the formal description of the SPRIG protocol introduced in Section 2.2.2. A context Γ is fixed by the root, as well as a high-level proof aggregated length measure µ.
We work with claims C_ℓ and questions Q_ℓ of levels ℓ = 0, 1, . . . . We denote by 𝒞_ℓ, 𝒬_ℓ the corresponding types (and we write e.g. C_ℓ ∈ 𝒞_ℓ to indicate that C_ℓ is a claim of level ℓ).
For simplicity, the protocol assumes that questions and claims are submitted in continuous time and cannot appear simultaneously. Similarly, we assume that the claims and questions are published at the moment of their creation. See Section 2.5 for a discussion of this issue in the context of smart contracts.
Type 𝒞_ℓ for ℓ ≥ 1.
• Data:
– An origin question Q_ℓ ∈ 𝒬_ℓ ∪ {none} (we say that the claim originates from Q_ℓ).
– A mathematical statement S:
∗ The statement S of the origin Q_ℓ if Q_ℓ ≠ none.
∗ An independent mathematical statement if Q_ℓ = none.
– A claim of proof P_ℓ = D, S_1, . . . , S_k of level ℓ of S, of aggregated length µ(P_ℓ) ≤ λ_ℓ.
– Parameters π_{𝒞_ℓ}: max-length λ_ℓ, stake pair (σ↑_ℓ, σ↓_ℓ) (with σ↑_ℓ = 0 if Q_ℓ = none), verification time θ_ℓ, 𝒬_{ℓ−1} parameters π_{𝒬_{ℓ−1}} if ℓ ≥ 1 (bounty β_{ℓ−1}, response time τ_{ℓ−1}, π_{ℓ−1} parameters).
• Necessary creation of initial funds: σ↑_ℓ + σ↓_ℓ.
• Status outcome:
– The claim gets the status 'invalidated' if there exists a defeating question, i.e. a question Q_{ℓ−1} ∈ 𝒬_{ℓ−1} that
(1) originates from the claim and disputes one of the statements S_1, . . . , S_k of its claim of proof;
(2) respects the parameters π_{𝒬_{ℓ−1}} (i.e.
whose parameters are π_{𝒬_{ℓ−1}});
(3) has the appropriate creation funds;
(4) appears less than θ_ℓ units of time after the publication of the claim;
(5) gets the status 'unanswered'.
– Otherwise (if no defeating question exists): the claim gets the status 'validated'.
• Stakes/bounties outcomes:
– If the claim gets the status 'validated':
∗ the stakes σ↑_ℓ, σ↓_ℓ are reimbursed to the claim owner;
∗ if Q_ℓ ≠ none, and it is the first descendant of Q_ℓ to get the 'validated' status, the bounty β_ℓ of Q_ℓ is paid to the claim.
– If the claim gets the status 'invalidated':
∗ the stake σ↑_ℓ is paid to the origin Q_ℓ, if there is one;
∗ the stake σ↓_ℓ is paid to the first defeating question owner.
Type 𝒬_ℓ for ℓ ≥ 0.
• Parameters π_{𝒬_ℓ}: bounty β_ℓ, response time τ_ℓ, 𝒞_ℓ parameters π_{𝒞_ℓ} (max-length λ_ℓ, stake pair (σ↑_ℓ, σ↓_ℓ), verification time θ_ℓ, 𝒬_{ℓ−1} parameters π_{𝒬_{ℓ−1}} if ℓ ≥ 1).
• Necessary creation of initial funds: β_ℓ.
• Data:
– An origin claim C_{ℓ+1} ∈ 𝒞_{ℓ+1} ∪ {none}; we say that the question originates from C_{ℓ+1}.
– A mathematical statement S with context Γ:
∗ One of the statements S in the claim of proof P_{ℓ+1} = D, S_1, . . . , S_k of C_{ℓ+1} if C_{ℓ+1} ≠ none; in such a case, we say that the question disputes S.
∗ An independent mathematical statement if C_{ℓ+1} = none.
• Outcome:
– The stake σ↑_ℓ is paid to the question owner by any claim C_ℓ ∈ 𝒞_ℓ that
(1) originates from the question;
(2) respects the π_{𝒞_ℓ} parameters (i.e. whose parameters are π_{𝒞_ℓ});
(3) has the appropriate creation funds;
(4) appears less than τ_ℓ units of time after the publication of the question;
(5) gets the status 'invalidated'.
– If there is a claim C ∈ 𝒞_ℓ ∪ 𝒞_0 that
(1) originates from the question;
(2) respects the π_{𝒞_ℓ} parameters;
(3) has the appropriate creation funds;
(4) appears less than τ_ℓ units of time after the publication of the question;
(5) gets the status 'validated';
then the question gets the status 'answered'.
Otherwise (i.e. if no such claim has appeared), the question is marked as 'unanswered'.
• Stakes/bounties outcomes:
(1) If the question gets the status 'answered':
– The owner of the first validated claim C ∈ 𝒞_ℓ ∪ 𝒞_0 originating from the question gets the bounty β_ℓ;
– The next such claims get nothing (but lose nothing).
(2) If the question gets the status 'unanswered':
Figure 2.2.
A basic validated claim of proof (top horizontal segment): one question was raised (vertical segment on the left). In answer to this question, a first claim of proof was proposed (middle horizontal segment) and invalidated by two unanswered questions (two short vertical segments), but then a second claim of proof was proposed, which was validated as no question was raised about it.
– The bounty β_ℓ is reimbursed to the question owner.
– If the question is the first question originating from C_{ℓ+1} to get the 'unanswered' status, the stake σ↓_{ℓ+1} is paid by the claim to the question owner;
– The next such questions get nothing (but lose nothing).
Type 𝒞_0.
• Data:
– An origin question Q_ℓ ∈ 𝒬_ℓ (we say that the claim originates from Q_ℓ), for ℓ ≥ 0.
– The statement S of the origin Q_ℓ.
– A machine-verifiable claim of proof P_0 of length ≤ λ_0.
– Parameters: max-length λ_0, stake σ↑_0, computation cost c.
• Necessary creation of initial funds: σ↑_0 + c.
• Status outcome:
– If P_0 is validated by the computer verification system, the claim gets the status 'validated'.
– Otherwise, the claim gets the status 'invalidated'.
• Stakes/bounties outcomes:
– If the claim gets the status 'validated': the stake σ↑_0 is reimbursed to the claim owner.
– If the claim gets the status 'invalidated': the stake σ↑_0 is paid to the origin.
– The computation cost c is burnt.
Remark. As each claim and question contains the parameters that the claims and questions originating from it must respect, the parameters of the entire interaction defined by the SPRIG protocol are specified by the parameters of the root.
2.3.
Illustrations Of SPRIG.
In this subsection, we present illustrations of SPRIG-based interactions.
• Claims of proof are depicted as horizontal segments, with dashed segments representing machine-level proofs.
• Questions on parts of the proofs are represented as vertical segments.
• Small teal segments on the claims of proof/questions segments represent the end of the allotted response time.
• The validation status of a claim of proof at the end of the interaction is represented at the right end of the line (validated: ✓, invalidated: ✗). Claims of proof that consist of a machine-level proof are marked with a green diamond.
• Questions are represented as vertical lines (emanating from a statement in a claim of proof), and their eventual status is represented at the bottom end of the line (answered: ✓, unanswered: ✗).
Four basic examples are provided (Figures 2.2–2.5), which are subparts of two examples of SPRIG runs, one with a claim of proof as root (Figure 2.6) and another one with a question as root (Figure 2.7).
Figure 2.3.
A basic invalidated claim of proof (top horizontal segment): one question was raised (vertical segment on the left); as an answer to this question, a claim of proof was proposed (second horizontal segment from the top); the claim of proof was itself questioned, and an additional claim of proof was proposed as an answer (second horizontal segment from the bottom); two questions were raised about that additional claim of proof (the two bottom-most vertical segments), the first of which was unanswered, and the second of which was answered by a validated claim of proof (bottom-most horizontal segment). As a result, the claim was invalidated.
Figure 2.4.
A basic answered question (left-most vertical segment): a first claim of proof was proposed (top-most horizontal segment), which was then invalidated by an unanswered question, but then a second claim of proof was proposed, which was validated after a question was raised (bottom-most vertical segment), and that question was answered by a claim (bottom-most horizontal segment) which itself was not questioned.
Figure 2.5.
A basic unanswered question (left-most vertical segment). A claim of proof was proposed (top-most horizontal segment), which resisted a first question (second left-most vertical segment), as that question was answered by an unquestioned claim of proof, but which did not resist the second question, as the only claim of proof answering it (right-most horizontal segment) was invalidated by an unanswered question.
2.4.
Variants and Extensions.
In this section, we present a number of variants and extensions of the SPRIG protocol, which can be enabled to optimize for various goals under certain environments. Many more variants and extensions can in principle be considered, but we focus here on the ones that appear to be the most naturally motivated and that live directly on the protocol itself. A number of further interesting
Figure 2.6.
In this case, the claim was eventually validated. Time goes horizontally in the lifetime of claims and vertically in the lifetime of questions, but it is not represented at scale. In this figure, four questions were asked and successfully answered; for the first question, a first claim of proof was proposed, which was then invalidated, followed by one which was later validated; the same is true for the fourth question that was asked. The second and third questions were answered by claims of proof which were later validated. The second claim of proof proposed for the first question was itself only validated after the first question about it saw two claims of proof: a first one, which was invalidated after going down 3 more levels, and a second one, which was validated after questions were asked and answered at the machine level.
extensions can then be built upon protocol instances; in particular, decentralized markets for derivatives can rely on using protocol instances as oracles, as discussed in Section 5.5. While these variants and extensions appear to be promising, their detailed analysis is significantly more complex and goes beyond the scope of this article.
2.4.1.
Time-Varying Stakes and Bounties.
Intuitively (and as discussed in Sections 3 and 4 below), the trust in the fact that a claim of proof is correct depends on how favorable the incentives are to those asking questions: if a skeptic has little to gain, and too much to risk, in asking questions (in terms of explicit incentives), he may not ask a question about a claim unless he is very confident that the question cannot be answered (and he may not want to invest time and energy to find questions about claims).
An agent publishing a claim of proof as a means to validate it may wish to establish a high level of trust in it (by offering a high stake, to be paid to a skeptic successfully challenging her claim), but may herself not be very confident in its ultimate validity: there may be a fairly obvious mistake (for instance of notation), and she would not want to pay a high price for an obvious mistake. Similarly, an organization may want to incentivize the solution to a given open problem, but may not want to pay too much for it, if the question turns out to be obvious.
A solution to this is to rely on time-dependent stakes and bounties, in a way similar to Dutch auctions: start with conditions that are very favorable to the defending side (the side at the root of the interaction), and make them more and more favorable for the challenging side. If there is an obvious challenge (i.e. a question if the root is a claim of proof, a claim of proof if the root is a question), challenging agents will still be incentivized to pose it as soon as possible (rather than to wait to increase their reward), as they are in competition with other challenging agents.
Figure 2.7.
Illustration of the SPRIG protocol when the root is a question. The same notation convention is used as in Figure 2.6. In this case, the question was answered by the third claim of proof (and the interaction was ended as a result); a fourth claim of proof was submitted, but its status was not yet decided at the time the interaction ended; all question marks indicate statuses that are not yet assigned.

At the same time, the owner of a claim may want to recover, after a while, some of the liquidity that she locked into the smart contract, while still incentivizing the search for mistakes in the proof; in this case, having the stakes decrease over time could prove useful.

In any case, the bounty and stake parameters at all levels of SPRIG can be replaced by time-evolving functions, which must be specified when the protocol instance is created.

2.4.2.
Generalized Bounties and Stakes.
In the basic version of SPRIG, a bounty β must be locked to ask a question, which will be paid to the first validated claim of proof answering it; dually, a stake pair (σ↑, σ↓) must be locked to propose a claim of proof, where, in case of invalidation, the part σ↑ is paid to the question the claim of proof was trying to answer, and the part σ↓ is paid to the first question that invalidated the claim of proof. A number of variants can be introduced in terms of distributions for stakes and bounties.

It may seem natural to propose an 'upwards' bounty β↑ to be paid to the claim of proof a question derives from; however, as discussed in Section 3, such upwards bounties may open the possibility for certain attacks (called the 'Plagiarist's attack' below).

Another possible extension, incentivizing agents to disclose more information when challenging a claim, consists in splitting the stake σ↓ into p shares σ↓1, . . . , σ↓p for the first p unanswered questions about the claim (this may require forbidding asking the same question more than once). While a single unanswered question suffices to invalidate a claim, such a stake may incentivize agents to look more closely at a claim of proof even after a first question has been raised. On the other hand, rewarding multiple claims of proof that successfully answer a question appears to be much more delicate: this opens the possibility of Plagiarist's attacks that are difficult to counter.

2.4.3. Claim-of-Proof-Dependent Parameters.
In the basic version of SPRIG, a fixed stake is associated with a claim of proof, together with a fixed amount of time to raise questions and a fixed limit on the claim of proof length. While this setup incentivizes the publishing of correct claims of proof and incentivizes agents who spot a problem in a claim of proof to question it, it does not account for the fact that claims of proof may be more or less hard to read. A way to account for this in the protocol is to allow the stakes/bounties and verification times to depend on the complexity of the claims of proof submitted: in particular, longer claims of proof (as measured in the lengths of their claims, or the number of them) may warrant longer verification times (or the verification could be encouraged by requiring higher stakes); for claims of proof consisting of many statements, one may want to reduce the bounty required to ask a question.

For such variants, the parameters set by the root owner should be replaced by functions (also set by the root owner) of the proof complexity and number of claims.

2.4.4.
Synchronous SPRIG.
The original version of SPRIG is intrinsically asynchronous in nature: each question and each claim of proof runs on an independent clock, and new questions or claims of proof may come at arbitrary times, starting their own clocks. While this process incentivizes agents to disclose information as soon as they have it, and is more efficient at closing obvious cases sooner rather than later, a number of situations may suggest using a synchronous variant of the protocol: after a question is posed (say), a fixed response time is given to post claims of proof. Claims of proof are hidden until the response time has elapsed, at which point they are revealed. Then, a round of questioning starts, giving a fixed amount of time to ask questions about the various claims of proof; the questions are also hidden and revealed when the questioning round ends. Then a third round starts, in which claims of proof proposed in response to the questions can be posted, to be revealed at the end of the third round.

This variant of the protocol may be useful for revealing information at specific times (e.g. a yearly contest) or to make the best use of a community's resources (perhaps the best critics of a theorem's claim of proof are other agents trying to submit their own claims of proof at the same time, and they can focus on criticizing others' proofs some of the time, while focusing on answering questions the rest of the time).

2.4.5.
Exclusive Disclosure.
In the basic version of SPRIG, agents ask questions when they doubt the validity of a step in a claim of proof. However, an intrinsic motivation may come from a question-raiser: they may be curious, for independent reasons, about the answer to the question. In such a setup, it may be useful to guarantee to the first question-raiser exclusive access to the answers to a question for a brief amount of time, before the answers are made public (the poster of answers would be incentivized not to disclose their answers to other parties in the exclusive time period, as it limits their attack surface).

2.4.6.
Expedited Validation or Invalidation.
In a number of cases, it may be desirable to expedite a validation process: a variant can be introduced that allows a claimer to reduce the validation time in exchange for higher stakes and/or lower response times for the lower levels. This feature may prove desirable in certain situations, but it must be dealt with carefully in order not to introduce flaws (leading to the validation of a proof that should not have been validated).

2.4.7.
Open Questions and Multi-Question Bounties.
One may want to put at the root of a SPRIG instance both a question and its negation: for instance, the Clay Institute offers a prize for the first proof of P = NP or of P ≠ NP. In this case, a single bounty (1M USD) is put at the root of the two questions, and it should go to the first claim of proof for either question that gets validated (as a result, there is no bounty left for the other question; this should not be a problem, as a statement and its negation should not both have validated proofs!).

More generally, an institute may want to put a single bounty for the first agent answering one of a list of questions; as soon as one of the questions is answered, there is no bounty left for answering the other questions. Even more generally, a limited number K of bounties could be made available for the first K questions answered, after which there are no bounties left for answering the other questions.

2.4.8. Stake-Sharing and Bounty-Sharing.
A possible downside of the system is the barrier to entry for a participating agent who may not have enough funds while possessing useful information.

For questions, such an agent could put up a partial bounty and wait until this bounty is completed by other agents: at that moment, the question is formally asked, and should the question be the first to invalidate the claim of proof, the stake will be shared among the agents who put up the bounty, pro rata of their bounty shares.

For claims of proof, such an agent could post an encrypted claim of proof with a partial stake, try to find other agents who also believe in it to complete the stake (for instance, by proving her identity and using her reputation), and decrypt it when the stake is complete (thus preventing a plagiarist from copying her claim of proof); again, in this case, if there is a bounty to be won, it will be shared among the stakeholders pro rata of their stakes (or according to some other pre-determined rule).

2.5.
Blockchain Implementation.
The SPRIG protocol presented in Section 2.2, together with its variants and extensions presented in Section 2.4, is designed so that it can be implemented on a blockchain, in a fully decentralized manner, without reliance on an external oracle. In this subsection, we discuss a number of design questions related to the implementation of the protocol on a blockchain infrastructure.

2.5.1.
Automated Proof Settlement.
As emphasized in the top-down view of SPRIG (Section 2.2.2), what settles the boundary conditions of the protocol (and hence ensures its good functioning) is the presence of an ultimate arbiter, in the form of a computer-based system to verify machine-level claims. Implementing SPRIG on a blockchain thus requires the ability to perform the necessary computations on the blockchain to ensure transparency of the result of the computation (or to offload the computation to another blockchain, or to find a verifiable way to ensure the relevant computations were done off-chain).

While a large number of powerful proof verification programs are available (see Section 1.3.2), their emphasis is usually on helping users to write proofs. The most desirable features for a blockchain-based proof verification system are somewhat different.

• Low memory usage: ultimately, a smart contract needs to be able to verify any step of the computation. The sequence of computations may not need to be performed entirely on-chain, as long as it is auditable (see e.g. [EbHe18]).

• A syntax making the writing of definitions and statements (as specified in Section 2.2.1) transparent to the agents. This may be helped by the development of open off-chain statement translators, assisting the users in the formalization of definitions and statements.

On the other hand, the system living on the blockchain can be very primitive in its ability to assist users in writing down proofs; should debates ever go down to the machine level, proof assistants can in principle be used off-chain to propose machine-level claims of proof.
Still, if incentives are set well and agents are rational, the presence of a well-functioning proof system will only serve as a deterrent: close enough to the machine level, rational skeptics and claimers should already agree on the existence of a machine-level proof, and the side that is wrong is incentivized to concede early.

As a result of the above design goals and considerations, the development of a proof verification system tailored to them seems desirable.

2.5.2.
Timing and Concurrency Issues.
The block-based structure of blockchains serves crucially as a timestamp mechanism to validate the transactions: the consensus on the order of the blocks serves as the measure of the passage of time (and crucially determines the anteriority of modifications submitted to the blockchain: this is in particular what prevents double-spending with Bitcoin). As a result, the natural time unit of a blockchain is the number of blocks emitted so far.

In the description of SPRIG, time is treated as a continuous resource, and time is asynchronous (except in the synchronous variant discussed in Section 2.4.4). For blockchains with a sufficiently short validation time, and non-trivial enough problems discussed with the protocol, it is unlikely that two questions are asked simultaneously (i.e. in the same block); in such a case, a rule should be specified. Still, it is important to keep this granularity in mind to avoid attacks by e.g. a quick plagiarist who could copy a claim of proof and try to push it into the same block; the solution in such a case is simply to make claimers first commit a signed and encrypted version of their proof at least one block before disclosing its content.

2.5.3.
Stakes and Bounties Lock.
For the protocol implementation, the agents need to lock their bounties and stakes in the smart contract for a long time. In case they need liquidity, it is possible for them to resell (i.e. transfer ownership of) their stake in the contract to a third party.

For the variant with time-varying stakes and bounties (Section 2.4.1), a number of challenges also arise: either the staker should put the maximum amount of capital upfront, or they could be mandated to inject additional capital as time passes (at the risk of losing their stake if they do not do so).

3.
Informal Game Theoretic Discussion
3.1. Strategic Interactions and Protocol Outcome.
As mentioned in Section 1.5, understanding how agents interact through the SPRIG protocol, as well as interpreting the validation process outcome, requires taking an economic perspective. Indeed, while SPRIG is a set of rules that, given the decisions of various users, deterministically defines a tree, allocates rewards and eventually settles the status of the claims and questions, these very decisions are in essence strategic.

A complete characterization of the strategic interaction between users is out of reach, as it depends on a variety of elusive elements. First, we do not have access to the real world's information structure (the information set of each user and their beliefs about others' information sets), which at any rate would be highly intricate. Second, this information structure is endogenous, since the incentive scheme can lead agents to work and gather additional information in a way that is hard to capture (it depends e.g. on the mathematical background of the agent and the difficulty of the problem). Third, the incentive scheme itself is not fully characterized by the protocol bounties and stakes, as e.g. (i) a claimer presumably enjoys an (unobservable) intrinsic reward when their proof is accepted and (ii) there might be external incentives too, as rewards related to SPRIG's outcome might conceivably also be collectable in a secondary/derivatives market. However, we can identify for each category of agents a number of high-level features independent of the details discussed above:
Provers:
The decision to enter (i.e. start interacting) in a SPRIG instance depends on one's confidence about the validity of one's claim of proof, the explicit incentives (stakes and bounties), the private incentives (intrinsic reward of having one's claim of proof accepted), beliefs about the skeptics' ability to identify a flaw in the claim, and beliefs about their incentives for attempting to do so. Given that the skeptics' incentives are also partly shaped by expectations about the incentives of subsequent claimers, the validation game is dynamic; the final, machine-level step provides the boundary condition. One important aspect of this dynamic process is that the initial claimer need not be the one to address all (or indeed any) subsequent questions from the skeptics. If the blockchain's users get a sufficiently good grasp of the claimer's argument, competition fostered by the incentive scheme makes it likely that ungrounded skeptics' challenges are answered by third parties. This should deter 'spamming' by the skeptics. Hence, the initial claim must be sufficiently clear for the baton to be passed; but being too explicit and detailed does not seem optimal for the claimers either. Indeed, in that case they perform upfront a task that would only have been needed in case of a question; so, if the proof is correct, there is no need to do it immediately, and if it is incorrect, the excessive level of detail makes it easier to detect.
Skeptics:
The decision-making process of skeptics is similar in the sense that it responds to the same incentive scheme and relies on the formation of beliefs over the same objects. First, and obviously, skeptics' incentives to challenge increase with their subjective probability of the claim of proof being wrong. Second, they decrease with the probability that any part of the initial claim can be converted into machine language in due time if necessary. A skeptic can deem this unlikely if they observe that the claim of proof or parts of it are somewhat obscure, or even if they have a sense that the claim should be correct but the proof is too convoluted to be transformed into machine language before the deadline. But there are other incentives for a skeptic to challenge: they might want to obtain information that is also relevant to another ongoing validation process, or challenge purely out of scientific interest.

(Claims of) proof shapes are endogenous in SPRIG, emerging from the interaction between claimers and skeptics. Indeed, it appears from the discussion above that with properly designed incentives, claimers would benefit from writing (claims of) proofs that are concise and elegant (and easier to convert into machine language if necessary), without being excessively terse (e.g. because no third party would have enough information to defend the claim if needed). Hence, beyond providing a decentralized way to produce a consensus about mathematical claims, SPRIG also naturally delivers balanced, 'agent-tailored' proofs: sufficiently detailed to be convincing but sufficiently concise to give intuition and be remembered. In particular, we expect that one would rarely, if ever, need to reach the final, machine-language step. As soon as the convertibility to machine level is credible, no skeptic would have an incentive to push the process to that step.
(The classic analogy is with a government guaranteeing to intervene in case of a banking panic; if such a guarantee is credible, then the panic does not occur, and the intervention is never needed.)

The economic approach is also key for dealing with a crucial point: how should one interpret the fact that a claim of proof has been accepted by the protocol? Understanding incentives is necessary for answering this question. The simplest example is one where a claim has been accepted without any challenge: is it because all users were fully convinced, or because the incentive scheme makes it prohibitively costly (in expectation) to ask questions? In principle, given the correct economic model (data of all incentives and information structure), any Bayesian observer can use the protocol's outcome to compute the probability that the proof is known by the market participants.

In Section 4, we investigate some of the game-theoretic aspects presented above in a stylized setting. In particular, we explain how to make a Bayesian estimate of the probability that a claim of proof is correct given that it has been accepted in a (highly) simplified version of the protocol.

3.2.
Robustness Properties.
As in many blockchain systems, strategies in SPRIG can, broadly speaking, be divided into 'honest strategies' and 'attacks'. The former refers to actions whose motivations are aligned with the purpose of the protocol. The latter refers to attempts to game the system, i.e. to take advantage of the incentive scheme without contributing to the end goal of the blockchain. We expect SPRIG to be robust, as its array of parameters is sufficiently rich to shape incentives that deter attacks. To guide the specific choice of these parameters, we now list several potential attacks, together with the parameters that are to be tuned to thwart them.

3.2.1.
The Carpet-Bomber.
A skeptic may decide to question all parts of a claim in the hope of stalling the process. The idea would be to induce the claimer to concede by lack of resources and because there are not enough third-party claimers available to help them defend. This is similar to a DDoS attack on the protocol. The skeptic's goal is to collect the stake of the claim.

This attack can be thwarted by appropriately choosing the question bounties and the time allotted for the subsequent claimers' replies. The former should not be too small relative to the stake, and the latter should be sufficiently long.

3.2.2.
The Nitpicker.
A skeptic may decide to ask for more and more details about a claim of proof and refuse to concede until the machine level is reached. Such an attack is not only based on the hope that a flaw will be identified at some point, but more importantly on the skeptic's desire to delay the acceptance of the claim as much as possible. One reason could be that the skeptics are themselves claimers of an identical or similar result, which they want to be accepted first.

This attack induces the claimer to present their proof with lemmas of similar complexity. This mitigates incentives to nitpick, as it reduces the depth needed to expand the proof up to machine level. Well-balanced bounties (i.e. not too low) and deadlines that take into account the possibility of nitpicking (i.e. the maximal time allotted for expansion up to machine level might indeed be reached) contribute further to thwarting nitpicking attacks.

3.2.3.
The Evasive Prover.
A claimer may decide to be evasive, i.e. to stuff their claim with a combination of irrelevant lemmas (purposely looking intricate, but for which they actually hold a machine-level proof) and one lemma of complexity similar to the initial theorem, such that it is not clear to outsiders which lemma to question. The goal is to deflect questions towards the irrelevant lemmas and hence get the claim accepted and collect the questions' bounties.

This attack can be thwarted by choosing the following parameters appropriately: the stakes, the time allotted for the subsequent skeptics' questions, the maximal level of a proof, and the maximal length of a claim. The first two should be sufficiently large to incentivize skeptics to work and identify the weak link. The last two should be sufficiently small in order to cap the number of deflection targets and force the claimer to 'show their hand' quickly enough.

3.2.4.
The Sandbagger.
This attack mirrors the Carpet-Bombing one: a claimer may decide to answer a question with a multitude of claims in the hope of stalling the process. The idea would be to induce the skeptic to concede by lack of resources and because there are not enough third-party skeptics available to help them continue challenging. The claimer's goal is to collect the bounty of the question.

This attack can be thwarted by appropriately choosing the claim stakes and the time allotted for the subsequent skeptics' questions. The former should not be too small relative to the bounty, and the latter should be sufficiently long.

3.2.5.
The Misleader.
A claimer may decide to stuff their claim with dubious lemmas and pursue one of the following two strategies:

• they attack the dubious lemmas and provide answers themselves in order to 'intimidate' the skeptics (improving their general credence in the initial claim);

• they attack the dubious lemmas and postpone answers to the very last moment in order to mislead the skeptics into believing that other users are already challenging (so there is no point in joining the fray, as the stakes no longer seem earnable).

This attack can be handled similarly to the Sandbagger attack.

3.2.6. The Plagiarist.
A mathematically illiterate agent can have a firm belief that a claimer is able to answer a given question correctly. This may occur, for instance, when an eavesdropper obtains information that a researcher has a proof for a question submitted by an institution, or, more frequently, when the question concerns the claim of another claimer. The agent may then attempt to appropriate the proof of the claimer in order to collect the question's bounty, as illustrated below.

Consider a mathematician Alice who has found a correct proof of a theorem. She posts a corresponding claim on the blockchain. Then, Bob asks a question about the claim, targeting statement S. At this stage, there could be an incentive for Charlie, the mathematically illiterate agent, to immediately reply to Bob's question with a tautological answer: 'the proof of S is S'; and to stick to this strategy when questions are asked/repeated, until Alice decides to provide an answer herself to ensure that her claim is not rejected. From that point onwards, Charlie replicates any of Alice's replies and challenges her using the questions skeptics ask him.

In this situation, skeptics may prefer to ask their questions directly to Alice, to know if she can provide a satisfying answer; they only challenge Charlie with the same question if she does not. This may lead to a faster approval of Charlie's claim.

Fortunately, Alice can defend herself: as soon as she is challenged, she answers the question, then asks Charlie the same question (if it was not already asked by another skeptic) and, instantaneously, provides the same answer. This thwarts the Plagiarist's attack, since it provides a zero-cost defense mechanism ensuring that Charlie's claim cannot be validated before her own claim.

4. A Simplified Equilibrium Analysis
In this section, we analyze a tractable sequential game that captures several key features of the strategic interaction between claimers and skeptics through SPRIG. The adequate equilibrium concept for such dynamic games with incomplete information is that of Perfect Bayesian Equilibrium (PBE) [FuTi91]. Our setup consists of a game involving two players: Claimer (pronoun: she) and Skeptic (pronoun: he). In our setup, Skeptic does not observe the initial confidence of Claimer (a correctly estimated probability that her proof is validatable, i.e. that it is possible to unroll it down to machine level) and must therefore form beliefs about it to proceed. A PBE is a collection of actions and beliefs such that:

(1) Given beliefs, the action taken by any agent at any node of the game tree maximizes her or his expected utility.

(2) Beliefs at each node of the game tree are consistent with the history of actions, i.e. computed from Bayes' rule.

Hence, by constructing PBEs, one recognizes that the mere fact of initiating a process in the protocol (or, in general, of pushing it further) has informational content: intuitively, if a claimer posts a proof, this should reflect the fact that she is relatively confident about her proof, and it should lead to an upwards update of the outsiders' beliefs. For simplicity, we focus on the signalling content of the entry decision, not of the parameters (deadlines, stakes and bounties) chosen at initiation. One could assume there is a set of 'default' parameters suggested by the protocol, so that using them conveys limited (although non-empty) signalling content. There is no fundamental obstacle to extending the solution of our game to the case where the choice of parameters is endogenous; but this would lead to a dramatic increase in complexity without altering the key messages that we want to convey in this section.

Section 4.5 discusses the strengths and limitations of our simplified protocol model and highlights directions in which it can be enriched.

4.1.
Model Setup.
We consider a highly stylized version of SPRIG. The maximal level is two and there are only two agents: Claimer and Skeptic. Both are risk-neutral and do not discount the future. Claimer is endowed with a claim of proof C (of some statement).

We say that the claim is validatable if it is possible to unroll it down to machine level, and that it is accepted if either Skeptic renounces challenging or the claim is indeed unrolled down to machine level.

Claimer initially receives a (random) signal P ∈ [0, 1], uniform on [0, 1], such that

(4.1)  E[X | P] = P,  where X = 1{C is validatable}

(where 1_A(x) = 1 if x ∈ A and 1_A(x) = 0 if x ∉ A). We use the economic term signal to refer to a random variable whose realization can be informative about the variable X of interest. Here, we could define U to be an independent copy of P and assume that C is validatable exactly on the event {U ≤ P}. In words, Claimer has more information than Skeptic, as she knows an updated probability, the realization of P, that her claim of proof is validatable. By contrast, Skeptic initially only has the knowledge that P is uniformly distributed over [0, 1].

On top of the potential collection of bounties, Claimer derives private benefits from having her claim of proof accepted. We denote by B1, B2, B3 ≥ 0 the benefits of being accepted at levels 1, 2, 3, respectively.

If Claimer decides not to post C, the game ends immediately, and both players receive a payoff of 0. If she posts C, the protocol specifies a stake σ1↓ to be collected by Skeptic in case of a successful challenge. If C remains unchallenged (no questions are asked about it within time θ1 after publication), then Claimer gets B1, and Skeptic gets 0. If Skeptic challenges the claim of proof within time θ1, staking a bounty β1, we make the assumption that Claimer gets to know the realization of X and will be able to provide a machine-level proof of her claim, if valid, at level 3 of the protocol.
Given this realization, she decides whether or not to post a claim at level 2 within time τ2 after publication of Skeptic's question. If she does not, her final payoff is −σ1↓ and Skeptic's is σ1↓. If she does, the protocol specifies two stakes σ2↑, σ2↓. Skeptic has a last chance to challenge the claim within time θ2 after its publication: if he does not, Claimer's payoff is B2 + β1 and Skeptic's is −β1. If he does, he stakes a bounty β2 and then Claimer posts the machine-language proof, if available, within time τ3 after publication of Skeptic's question. Claimer's final payoff is −σ1↓ − σ2↑ − σ2↓ if she cannot provide a machine-language proof, and B3 + β1 + β2 if she can. The corresponding Skeptic's payoffs are σ1↓ + σ2↑ + σ2↓ and −β1 − β2. Since we consider a single skeptic, the recipient of the up and down stakes σ2↑, σ2↓ is the same, and hence we can aggregate those in σ2 = σ2↑ + σ2↓. From now on, we also denote σ1 = σ1↓. The game is represented in Figure 4.1.

The time lengths θ1, θ2, τ2, τ3 are key to make sure that the status ('challenged or not') of a claim/question is eventually settled, but their values do not play a role in our stylized model. Hence, our game is fully characterized by the parameter set Θ = {B1, B2, B3, σ1, σ2, β1, β2}.
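The payoff description above can be summarized in a short sketch tabulating the terminal payoffs (Claimer, Skeptic) by action history (the function and its naming are ours, for illustration only):

```python
# Terminal payoffs (Claimer, Skeptic) of the simplified game, keyed by the
# action history; variable names mirror the text (B1, B2, B3, sigma_1,
# sigma_2, beta_1, beta_2).

def terminal_payoffs(B1, B2, B3, s1, s2, b1, b2):
    return {
        ("no post",): (0, 0),
        ("post", "no challenge"): (B1, 0),
        ("post", "challenge", "no reply"): (-s1, s1),
        ("post", "challenge", "reply", "no challenge"): (B2 + b1, -b1),
        ("post", "challenge", "reply", "challenge", "validatable"):
            (B3 + b1 + b2, -b1 - b2),
        ("post", "challenge", "reply", "challenge", "not validatable"):
            (-s1 - s2, s1 + s2),
    }
```

Note that all stake and bounty transfers between the two players are zero-sum; the B terms are Claimer's private benefits of acceptance.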
[Game tree: Claimer chooses to post or not; Skeptic chooses to challenge or not; Claimer discovers whether the proof is validatable and chooses to reply or not; Skeptic chooses to challenge again or not. Terminal payoffs (Claimer, Skeptic): (0, 0), (B1, 0), (−σ1, σ1), (B2 + β1, −β1), (B3 + β1 + β2, −β1 − β2), (−σ1 − σ2, σ1 + σ2).]
Figure 4.1.
Simplified protocol game tree

4.2.
Model Solution.

Proposition 22.
The simplified protocol game possesses a unique Perfect Bayesian Equilibrium. Depending on the parameter set Θ, this PBE takes one of the three types detailed below. In all of them, there is a threshold π* := π*(Θ) ∈ [0, 1] such that Claimer posts if and only if P ≥ π*, and:

• Type 1: Skeptic always challenges; Claimer replies if X = 1 and replies with probability p0 := p0(Θ) if X = 0. Conditional on reply, Skeptic challenges with probability q2 := q2(Θ) (and Claimer successfully terminates the process if and only if X = 1).

• Type 2: Skeptic challenges the initial claim with probability q1 := q1(Θ) ∈ (0, 1). Then actions unfold as in Type 1.

• Type 3: π* = 0 and Skeptic never challenges.

The parameters q1, q2, p0 and π* are known explicitly and their values are provided in Appendix 8.1. The proof of the proposition can be found in the same Appendix. For illustrations of how the nature of the equilibrium (Type 1, 2 or 3) depends on the parameters, see Section 4.4.

4.3. Extracting Relevant Information from the Protocol's Outcome.
One key appeal of our model is that it allows us to compute various measures of protocol reliability, in particular the likelihood that type I or type II errors (in the statistical sense) occur. This is crucial because the outcome of the protocol's validation process for a claim (accepted/validated or rejected/invalidated) does not say, in isolation, what credence the agents' community should have in the claim. Section 4.4 discusses these issues further.

4.3.1.
Notation.
We shall need the following notation:

• A (resp. A^c) is the event 'the claim is accepted' (resp. rejected, i.e. not accepted).

• A1 (resp. A2) is the event 'the claim is accepted at level 1', i.e. no question was asked (resp. at level 2, i.e. one question was asked).

• Q2 is the event 'Skeptic challenges Claimer after Claimer's reply', i.e. Skeptic posts a question at level 2.

• R is the event 'Claimer replies to the first challenge' (i.e. posts a claim at level 2).

4.3.2. Results.
Proposition 23.
In a Type 1 equilibrium:
• The probabilities that a claim of proof is accepted (resp. accepted and valid) are
P(A) = (1/2)(1 − π*)(1 + π* + (1 − π*)p(1 − q))  (4.2)
P(A, X = 1) = (1/2)(1 + π*)(1 − π*).  (4.3)
• The probabilities that a claim of proof is accepted given that it is valid (resp. false) are
P(A | X = 1) = (1 + π*)(1 − π*)  (4.4)
P(A | X = 0) = (1 − π*)² p(1 − q).  (4.5)
• The probabilities that a claim of proof is valid given that it is accepted (resp. rejected) are
P(X = 1 | A) = (1 + π*) / (1 + π* + (1 − π*)p(1 − q))  (4.6)
P(X = 1 | A^c) = (π*)² / (1 + (π*)² − (1 − π*)²p(1 − q)).  (4.7)
• The probabilities that a claim of proof is accepted at level 2 (resp. 1), given that it is accepted and valid, are
P(A₂ | A, X = 1) = 0  (4.8)
P(A₁ | A, X = 1) = 1 − q.  (4.9)
Such expressions can also be derived in the case of Type 2 and Type 3 equilibria: see the proof.
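The expressions above are tied together by Bayes' rule, which provides a quick numerical consistency check. A minimal sketch (the values of π*, p and q below are illustrative placeholders, not the equilibrium values, which depend on Θ):

```python
# Closed-form probabilities of Proposition 23 (Type 1 equilibrium).
# pi_s, p, q are illustrative placeholders; in equilibrium they depend on Theta.
def prop23(pi_s, p, q):
    P_A_valid = (1 + pi_s) * (1 - pi_s)              # P(A | X=1), eq. (4.4)
    P_A_invalid = (1 - pi_s) ** 2 * p * (1 - q)      # P(A | X=0), eq. (4.5)
    P_A = 0.5 * (P_A_valid + P_A_invalid)            # eq. (4.2)
    P_A_and_valid = 0.5 * P_A_valid                  # eq. (4.3)
    P_valid_given_A = P_A_and_valid / P_A            # eq. (4.6)
    P_valid_given_Ac = (0.5 - P_A_and_valid) / (1 - P_A)  # eq. (4.7)
    return P_A, P_A_and_valid, P_valid_given_A, P_valid_given_Ac

P_A, P_Av, P_vA, P_vAc = prop23(pi_s=0.6, p=0.5, q=0.4)
# Bayes consistency: P(X=1 | A) * P(A) = P(A, X=1).
assert abs(P_vA * P_A - P_Av) < 1e-9
# Agreement with the printed form of (4.6).
assert abs(P_vA - (1 + 0.6) / (1 + 0.6 + (1 - 0.6) * 0.5 * (1 - 0.4))) < 1e-9
```

With these placeholder values, an accepted claim is valid with probability about 0.93, while a rejected one is valid with probability about 0.27.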
4.4. Results.
We now use our model solution to explore various trade-offs faced by protocol designers. Obviously, the fact that we consider both a stylized model and a simplified information structure does not allow us to produce general, quantitative positive or normative statements about SPRIG parameters. However, our sequential game with imperfect information is rich enough to exhibit several important forces that designers must take into account, and which we illustrate through examples. As baseline parameters, we consider B = 10, B = B = 40 and β = σ = β = 5, and let the stake σ vary. Of course, one could evidence similar trade-offs by varying another parameter, as well as the transitions between the different equilibrium types; we focus on varying σ merely for the sake of brevity.

4.4.1. Stakes and bounties, entry and reliability ratio.
The two following properties are desirable for the protocol:
• (i) Have as many correct claims of proof as possible pass through the protocol.
• (ii) Have a (very) high probability that an accepted claim of proof indeed corresponds to a proof (i.e. is correct).
The left panel of Figure 4.2 shows that the two objectives are, in general, in conflict with each other. In this plot, the solid line depicts the probability that a claim is produced and accepted by the protocol, while the dashed line depicts the probability that a correct claim is produced and accepted by the protocol. Define the reliability ratio RR as the ratio of the latter to the former ('dashed/solid').
RR is close to the desirable 100% as long as the equilibrium is of Type 1, and quickly deteriorates as we enter the Type 2 equilibrium region. While there is no ideal conciliation of Objectives (i) and (ii) above, the left panel of Figure 4.2 suggests that a good way to resolve the trade-off is to select a stake σ just barely sufficient to incentivize Skeptic to systematically challenge. This maximizes the probability of having a claim go successfully through the protocol among Type 1 equilibria. Of course, by reducing σ further, one could increase this probability further, but the 'price' to pay (the quick drop of RR) is likely to be prohibitive.
The right panel of Figure 4.2 indicates that π*, the equilibrium entry threshold, increases with the stake σ. This is consistent with intuition: if she must pay a large amount in case of a successful challenge, Claimer will only enter when she is very confident about her claim of proof. Hence, increasing σ reduces entry; but it also increases the likelihood that a claim of proof is true conditional on entry. Thus, the impact of σ on P(A) was a priori non-trivial. The left panel of Figure 4.2 indicates that there is a monotone decreasing relationship between the two variables.
In fact, this is always true, as one can easily deduce from the formulasof Proposition 23.
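Indeed, from (4.2), holding p and q fixed, P(A) is strictly decreasing in π*. A minimal numerical sketch (parameter values are illustrative):

```python
# P(A) from eq. (4.2) as a function of the entry threshold pi*.
# p and q are held fixed at illustrative values; in equilibrium they
# also depend on the protocol parameters.
def prob_accept(pi_s, p=0.5, q=0.4):
    return 0.5 * (1 - pi_s) * (1 + pi_s + (1 - pi_s) * p * (1 - q))

grid = [i / 100 for i in range(101)]
values = [prob_accept(x) for x in grid]
assert all(a > b for a, b in zip(values, values[1:]))  # strictly decreasing
assert values[-1] == 0.0  # at pi* = 1, no claim is ever posted
```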
Figure 4.2.
Bounties, reliability and entry.

4.4.2. Statistical Type I and Type II Errors.
In this section, we focus on the four quantities P(A^c | X = 1), P(A | X = 0), P(X = 0 | A) and P(X = 1 | A^c). All are measures of the likelihood that the protocol produces an undesirable outcome (at least from a scientific standpoint, as a claimer would presumably have no problem with having an incorrect claim accepted). The last two correspond to the standard statistical notions of Type I and Type II errors, respectively. The reliability ratio RR introduced above is simply the complement of the probability of a Type I error.
Being able to compute such measures is of paramount importance for protocol users and for the mathematical community at large. Without them, there is no clear link between the outcome of the validation process and the credence that humans should give to a claim (or its negation). In particular, humans may want to lower their confidence in claims accepted in some particular equilibrium type and state. As an illustration, consider the right panel of Figure 4.3. If the equilibrium is of Type 3, all proofs are accepted, but this is irrelevant from a scientific standpoint, as the probability of a Type I error is then 1/2. If stakes and bounties are designed in such a way that challenging is prohibitively expensive, one should not give too much credit to a claim simply because it has passed through the protocol: such a design would be severely flawed. More generally, given a model that allows one to predict the equilibrium type, one can and should observe the blockchain in order to refine the statement 'the claim has been accepted' into 'the claim has been accepted at level d after history h', and update the correctness probability accordingly.
While the right panel of Figure 4.3 illustrates that some protocol designs are entirely flawed (namely, those generating a Type 3 equilibrium), it also highlights that there is no perfect design: one cannot simultaneously decrease the likelihood of Type I and Type II errors.
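The comparative statics behind this trade-off can be sketched from the closed forms of Proposition 23: a higher entry threshold π* purifies the accepted pool (lower Type I error) but rejects more valid claims through non-entry (higher Type II error). A sketch with illustrative, non-equilibrium parameter values (in equilibrium, π*, p and q move jointly with the stakes and bounties):

```python
# Type I error P(X=0 | A) and Type II error P(X=1 | A^c) in a Type 1
# equilibrium, from eqs. (4.4)-(4.5). Parameter values are illustrative.
def error_rates(pi_s, p, q):
    acc_valid = (1 + pi_s) * (1 - pi_s)          # P(A | X=1)
    acc_invalid = (1 - pi_s) ** 2 * p * (1 - q)  # P(A | X=0)
    P_A = 0.5 * (acc_valid + acc_invalid)
    type_I = 0.5 * acc_invalid / P_A             # accepted but invalid
    type_II = 0.5 * (1 - acc_valid) / (1 - P_A)  # rejected but valid
    return type_I, type_II

low_threshold = error_rates(pi_s=0.3, p=0.5, q=0.4)
high_threshold = error_rates(pi_s=0.7, p=0.5, q=0.4)
# Raising pi* lowers Type I but raises Type II: no design removes both.
assert high_threshold[0] < low_threshold[0]
assert high_threshold[1] > low_threshold[1]
```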
Again, the juncture point between Type 1 and Type 2 equilibria seems to be a good candidate: for instance, any choice of a larger σ would only very marginally decrease the probability that an accepted claim is incorrect, but would significantly increase the probability that a rejected claim is correct.
Arguably, the aggregate costs of accepting invalid claims are much larger than the costs of rejecting correct ones. Indeed, once accepted, an invalid claim could be used repeatedly in subsequent research or applications, so that the mistake propagates and its consequences grow. Moreover, there might not be enough incentives or reasons to challenge the claim again in the future. By contrast, a wrongly rejected claimer could always rewrite her claim of proof, improve its communication, and post again at a later stage, getting another chance to be accepted.
The left panel of Figure 4.3 tells a similar story, with a perspective closer to the point of view of the claimer: reducing the risk of rejecting a correct claim of proof increases the risk of accepting a wrong claim of proof. Once again, thinking about the aggregate costs of both types of error should allow designers to select their preferred parameters.
As can be seen, producing graphs such as those of Figure 4.3 gives a lot of information about which parameter values are the most effective. This will be precious for future (more realistic and quantitatively accurate) model descriptions of the protocol.
Figure 4.3.
Probability of false positives and related reliability measures.

4.4.3. Terminations at Intermediate Level.
Fixing the likelihood of the statistical errors discussed above, a short acceptance process (termination after a few steps) has advantages and drawbacks. On the one hand, accepting correct claims of proof quickly saves significant time and intellectual energy that can be invested in tackling other problems. On the other hand, longer acceptance processes can have positive externalities, as they involve clarifying steps and new lemmas that could be useful in other contexts. In the current discussion, we wish to focus on the former point and consider that accepting a correct proof quickly is desirable, to the extent, of course, that it does not harm its credibility too much (see Section 4.4.2). But the latter requirement is key: in our model, claims accepted immediately after posting have little scientific relevance. Indeed, we saw that the reliability ratio quickly decreases away from 100% as the likelihood of immediate termination increases away from 0. Hence, a good proxy for 'having proofs that terminate before the final level without harming reliability' is the probability of termination after exactly one step, P(A₁ | A, X = 1). This quantity is depicted in Figure 4.4.
Again, the behaviour of this quantity as a function of σ is non-trivial. Indeed, a larger stake (i) increases the incentives to challenge (a direct 'greed' effect) but also (ii) decreases them, as the average quality of a proof is higher (as evidenced by the fact that π* is larger). As before, a good choice seems to be the lowest σ that implements a Type 1 equilibrium.

Figure 4.4.
Probability of correct termination at intermediary level.

4.5. Discussion of the Model's Analysis.
Our stylized model captures several important features. First, we take into account the likely information asymmetry between claimers and skeptics (at least at the time a claim is posted); the theory of signaling games then allows us to predict how the mere fact of posting a claim impacts the 'market's beliefs'. Second, our simplified game is dynamic. In particular, we can understand the impact of stakes and bounties at further levels on current decisions, and get estimates of the probability that the protocol stops before the final, machine-language step. Third, our model is rich enough to inform a Bayesian agent about the correctness of a claim, using its status in the protocol (accepted vs rejected).
The model can be enriched in several directions. First, Skeptic could be able to perform some work on his own: by paying some cost, he could access a private signal about the correctness of Claimer's proof. We have studied this possibility in a one-period version of the model. New insights appear, but the enlargement of Skeptic's strategy set also implies the emergence of multiple equilibria; those are potentially interesting but render predictions difficult. Second, the information structure could be enriched: there could be several skeptics (and several subsequent claimers replying to these skeptics), each potentially endowed with their own information. A challenging and very interesting question is to understand how information is dynamically incorporated in such a context. Third, our stylized model cannot answer questions such as whether we should expect SPRIG instances to generally terminate at the top level, at the machine level, or in between, or questions regarding the structure of the tree that an initial claim or question generates.

5. Applications and Outlook
In this section, we discuss how the SPRIG protocol provides a solution to the challenges of mathematical derivation raised in Section 1, which are centered around the communication of trustable, succinct, and informative proofs in a system of agents with various levels of information.

5.1. Theorem Verification.
The validation of a theorem's proof by its authors can be done through the protocol: they can place a stake (which may increase over time, as discussed in 2.4.1) and set up a SPRIG instance for a given amount of time, incentivizing anyone to find a gap in their proof. Compared to the classical publishing model, many more agents are incentivized to be skeptical of the proof (and no one is pressured to participate either), and their questions can be assumed to be made in good faith (since there is nothing to gain by asking trivial questions); also, the anonymity of the reviewers is guaranteed (unlike in the classical reviewing process, which consists of the redaction of a report and relies on an editorial board, both of which may leak information). The results of the validation can thus be made transparent and convey information about the validation of the theorem. At the same time, as they provide their claim of proof, the authors can also publish a paper written in an informal way, which may help the community participate in the process more rationally.

5.2. Bounty for Open Problem.
Research on open questions in mathematics can be incentivized by bounties, such as the celebrated Millennium Problems posted by the Clay Institute. In cases such as that of the Millennium Problems, an open two-sided question is at the root of the problem, as discussed in Section 2.4.7. SPRIG allows one to outsource the validation of claims of proof (which otherwise relies on a committee), to disincentivize bogus claims of proof (a stake must be posted to propose a claim of proof), and to limit conflicts of interest. Incentives for shorter answers can also be added, by creating extra challenges with tighter limits on proof length, or by using claim-of-proof-dependent parameters (as in Section 2.4.3).

5.3. Security Proof Certification.
An organization may want to elicit trust in its system. For instance, it may want to publish its source code and incentivize the public to find security flaws in any of N subsystems, while having only a limited number K of bounties available for rewarding such findings. This may be done using the multi-question bounty variant discussed in Section 2.4.7, eliciting trust in the system (e.g. that there is no flaw in any of the subsystems) that is as strong as if there were N bounties, while at the same time locking up and risking only K bounties' worth of capital.

5.4. Automated Theorem Proving.
A great deal of effort has been put in recent years into constructing intelligent automated provers, relying on e.g. reinforcement learning techniques or text prediction mechanisms, with encouraging successes [UrJa20]. SPRIG can serve as a playground for the development of such agents, allowing them to participate using some level of information (in particular by first developing an ability to write low-level proofs or to validate them), and to learn by playing.

5.5. Derivatives Markets.
A promising feature of SPRIG is that its outcomes can be used as oracles for other smart contracts. In particular, prediction markets can run on such outcomes. For instance, agents can inject information by betting that a certain question will or will not be answered before a certain time, or that a certain step in a proof will be unanswered, conditionally on there being an unanswered question (thereby injecting some information about where relevant questions may lie).
Securities markets relying on SPRIG may also prove useful for incentivizing different types of contributions by agents. For instance, an agent able to provide good formalizable heuristics, but not knowing how to formalize them, may participate in a market betting on whether a certain question will be answered before a certain time: she can buy (relatively cheaply) a security betting that it will be answered, and then publish her heuristics. If the heuristics look like they can be formalized by some agent in time, the odds for betting that the question will be answered will change, and she can net a profit by re-selling her security or letting it mature. Thus, she can inject interesting information into the market, i.e. information that changes the feasibility landscape of proof construction by the community.

5.6. Beyond Mathematical Reasoning.
Beyond mathematics, many fields rely on rigorous formal reasoning intertwined with external elements of reasoning. Adding support for external sources to SPRIG appears promising for a number of applications:
• Support for importing empirical knowledge into the protocol: this would allow one to submit and verify arguments pertaining to the experimental sciences.
• Support for numerically-justified heuristic arguments or recognized heuristics: this would allow for useful derivations in e.g. theoretical physics.
• Support for validated time-stamped predictions: this could help rational discourse in disciplines based on forecasting. An economic model could be presented and challenged similarly to claims of proof in SPRIG. In lieu of the machine-level terminal condition, the final validation step would be given by the publication of official numbers. A 'claim' (i.e. a model) would then be validated if it has correctly predicted an n-tuple of economic variables (e.g. the interest rate set by the Fed, GDP) up to a prespecified error margin.
• Support for oracles with zero-knowledge proofs: this would allow for auditable arguments in public debates in which certain sources must be protected.

6. Conclusion
In this paper, we introduced the Smart Proofs by Recursive Information Gathering (SPRIG) protocol, which allows agents to propose and verify succinct and informative proofs in a decentralized fashion. Claimers and skeptics 'debate' about statements and their proofs: consensus arises from the skeptics being able to request details on steps that they feel could be problematic, and from the claimers being able to provide details answering the skeptics' requests. Importantly, to participate in the process, claimers and skeptics must attach a bounty/stake to their moves: this gives subsequent users the proper incentive to verify them. As a result, agents with various types of information can participate and inject their knowledge into the proof construction and verification process; this allows one to strike a balance between the 'short collection of insightful statements' and the 'list of all the statements needed to establish perfect trust' in mathematical writing.
In our claim of proof format, mathematical proofs can be viewed as trees, in which claimers and skeptics can expand branches containing the relevant level of detail for the agents in the community: branches only grow in places where there is uncertainty, until either that uncertainty is cleared or a specific problem is isolated. The resulting subtree thus serves as a proof that is useful to the community, as it makes the consensus-building process transparent and can help agents build their own credence in the validity of the proof.
Our analysis of SPRIG and its robustness is based on game-theoretic considerations that take into account the various incentives of the agents, address possible attacks, and lead up to a detailed equilibrium analysis of a simplified protocol.
While the complete SPRIG protocol is very complex to study analytically, our results give clear insight into a number of qualitative aspects of its strategic features. We also present a number of variants and applications of SPRIG, allowing it to be useful in numerous contexts and demonstrating its versatility.

Acknowledgements
The authors would like to thank Tarun Chitra for enlightening and inspiring explanations about blockchainsand many other topics, Thibaut Horel for numerous useful suggestions about the present manuscript, as well asJuhan Aru, Dmitry Chelkak, Fedor Doval, Julien Fageot, Patrick Gabriel, Max Hongler, Kalle Kytölä, RedaMessikh, Christophe Nussbaumer, Daniele Ongari, Victor Panaretos, Stanislav Smirnov, Fredrik Viklund,Jérémie Wenger, and Matthieu Wyart for interesting conversations.C.H. acknowledges support from the Blavatnik Family Foundation and the Latsis Foundation.
E-mail Addresses
Sylvain Carré: [email protected]
Franck Gabriel: [email protected]
Clément Hongler: [email protected]
Gustavo Lacerda: [email protected]
Gloria Capano: [email protected]

References

7. Appendix: Claim of Proof Examples

In this appendix, we give a number of examples of proofs written in the claim of proof structure described in Section 1.2.5 (we naturally only give subtrees of the entire trees). We denote by A the standard background assumptions (including basic axioms and whatever statements are taken for granted).

7.1. A simple proof.
A simple proof concerns the existence of an infinite number of primes. In this case, we have γ = {there exists an infinite number of primes}, and the root of the claim of proof is
• S*: A* ⟹ C*, with A* = A and C* = γ.
The nodes at distance 1 from the root are:
• S₁: A₁ ⟹ C₁, where A₁ = A and C₁ corresponds to 'for any N ≥ 1, N! + 1 is not divisible by any k ∈ ℕ with 2 ≤ k ≤ N';
• S₂: A₂ ⟹ C₂, where A₂ = A ∪ {C₁} and C₂ corresponds to 'for any N ≥ 1, there exists a prime number p > N';
• S₃: A₃ ⟹ C₃, where A₃ = A ∪ {C₂} and C₃ = C*.
If we expand the proof of S₂ (at distance 2 from the root), we find:
• S₂,₁: A₂,₁ ⟹ C₂,₁, where A₂,₁ = A₂ and C₂,₁ corresponds to 'for any N ≥ 1, any prime factor of N! + 1 is larger than N';
• S₂,₂: A₂,₂ ⟹ C₂,₂, where A₂,₂ = A₂ ∪ {C₂,₁} and C₂,₂ = C₂.

7.2. A proof by contradiction.
Proofs by contradiction can be formulated naturally in our framework. A classical proof by contradiction is that of the fundamental theorem of algebra, α ⟹ γ, where
• α corresponds to 'P is a complex polynomial of degree ≥ 1';
• γ corresponds to 'there exists z ∈ ℂ such that P(z) = 0'.
In this case, the root of the claim of proof is
• S*: A* ⟹ C*, where A* = A ∪ {α} and C* = γ.
The nodes at distance 1 from the root are:
• S₁: A₁ ⟹ C₁, where A₁ = A* and C₁ corresponds to 'there exist M, R > 0 such that |P(z)| ≥ M for all |z| ≥ R';
• S₂: A₂ ⟹ C₂, where A₂ = A* and C₂ corresponds to '|P| cannot have a nonzero minimum on ℂ';
• S₃: A₃ ⟹ C₃, where A₃ = A* ∪ {C₁, C₂} and C₃ = C*.
If we go further into the details of the proof of S₂: A₂ ⟹ C₂ (the heart of the proof by contradiction), we have (at distance 2 from the root):
• S₂,₁: A₂,₁ ⟹ C₂,₁, where A₂,₁ = A₂ and C₂,₁ corresponds to 'if P(z) ≠ 0, then there exists z′ ∈ ℂ such that |P(z′)| < |P(z)|';
• S₂,₂: A₂,₂ ⟹ C₂,₂, where A₂,₂ = A₂,₁ ∪ {C₂,₁} and C₂,₂ corresponds to 'if |P| has a nonzero minimum on ℂ, then we have a contradiction';
• S₂,₃: A₂,₃ ⟹ C₂,₃, where A₂,₃ = A₂ ∪ {C₂,₂} and C₂,₃ = C₂.
Ultimately, in the above format, proofs by contradiction must explicitly carry their 'wrong assumption' (i.e. the negation of the conclusion) in the conclusion part of the statements: if we wish to assume the negation ¬C* of the conclusion C* to arrive at a contradiction, this will involve substatements S: A ⟹ C, where C will be of the form 'if ¬C* then ...'. While this makes proofs by contradiction heavier in notation, it makes individual statements easier to verify: if the proof is correct, the conclusion of each statement is correct (and not contingent on an assumption which is itself wrong).

7.3. Inverse Function Theorem Proof.
A richer example of proof is that of the inverse function theorem. In this case, we have:
• α: Let U ⊂ ℝⁿ be an open set and let f: U → ℝⁿ be a function that is C¹ with derivative x ↦ Df|_x. Let x* ∈ U be a point such that Df|_{x*} is an invertible matrix.
• γ: There exist an open neighborhood V ⊂ U of x* and an open neighborhood W of f(x*) such that f|_V: V → W is a bijection from V to W, with an inverse (f|_V)⁻¹: W → U that is differentiable at f(x*) with derivative (Df|_{x*})⁻¹.

Figure 7.1.
Proof by Contradiction in the Fundamental Theorem of Algebra. Straight arrows denote importation of assumptions, while curved arrows denote importation of conclusions.

In this case, the root of the claim of proof is:
• S*: A* ⟹ C*, where A* = A ∪ {α} and C* = γ.
Informally, we first argue that, without loss of generality, we may assume that x* = 0, f(x*) = 0 and Df|_{x*} = Idₙ. In this case, the nodes at distance 1 from the root are:
• S₁: A₁ ⟹ C₁, where A₁ = A* and C₁ = (α₀ ⟹ γ), with α₀ = {x* = 0, f(x*) = 0, Df|_{x*} = Idₙ};
• S₂: A₂ ⟹ C₂, where A₂ = A ∪ {C₁} and C₂ = C*.
If we go further into the details of why S₁ holds true, we find (at distance 2 from the root):
• S₁,₁: A₁,₁ ⟹ C₁,₁, where A₁,₁ = A₁ and, denoting by ‖·‖ the operator norm on n × n matrices, C₁,₁ = (α₀ ⟹ γ₁,₁), with γ₁,₁ = {∃ r > 0 with ‖Df|_x − Idₙ‖ ≤ 1/2 for all x ∈ B(0, r)};
• S₁,₂: A₁,₂ ⟹ C₁,₂, where A₁,₂ = A ∪ {C₁,₁} and C₁,₂ = (α₀ ⟹ γ₁,₂), with γ₁,₂ = {∃ r > 0 such that ∀ y ∈ ℝⁿ, the function x ↦ x + y − f(x) is 1/2-Lipschitz on B(0, r)};
• S₁,₃: A₁,₃ ⟹ C₁,₃, where A₁,₃ = A ∪ {C₁,₂} and C₁,₃ = (α₀ ⟹ γ₁,₃), with γ₁,₃ = {∃ r > 0, ∀ y ∈ B(0, r/2): ∃! x ∈ B(0, r) such that f(x) = y};
• S₁,₄: A₁,₄ ⟹ C₁,₄, where A₁,₄ = A ∪ {C₁,₃} and C₁,₄ = (α₀ ⟹ γ₁,₄), with γ₁,₄ = {∃ U, V open neighborhoods of 0 such that f is a bijection U → V};
• S₁,₅: A₁,₅ ⟹ C₁,₅, where A₁,₅ = A ∪ {C₁,₄} and C₁,₅ = (α₀ ⟹ γ₁,₅), with γ₁,₅ = γ₁,₄ ∩ {∃ f⁻¹: V → U, f⁻¹ is the inverse of f and f⁻¹ is differentiable at 0 with differential Idₙ};
• S₁,₆: A₁,₆ ⟹ C₁,₆, where A₁,₆ = A ∪ {C₁,₅} and C₁,₆ = C₁.
If we go further into the details of why S₁,₅ holds true, we find (at distance 3 from the root):
• S₁,₅,₁: A₁,₅,₁ ⟹ C₁,₅,₁, where A₁,₅,₁ = A₁,₅ and C₁,₅,₁ = (α₀ ⟹ γ₁,₅,₁), with γ₁,₅,₁ = γ₁,₄ ∩ {if hₙ is a seq. in V \ {0} with hₙ → 0, we have ‖hₙ‖/‖f(hₙ)‖ → 1};
• S₁,₅,₂: A₁,₅,₂ ⟹ C₁,₅,₂, where A₁,₅,₂ = A₁,₅ ∪ {C₁,₅,₁} and C₁,₅,₂ = (α₀ ⟹ γ₁,₅,₂), with γ₁,₅,₂ = γ₁,₅,₁ ∩ {if hₙ is a seq. in V \ {0} with hₙ → 0, we have ‖f(hₙ) − hₙ‖/‖f(hₙ)‖ → 0};
• S₁,₅,₃: A₁,₅,₃ ⟹ C₁,₅,₃, where A₁,₅,₃ = A₁,₅ ∪ {C₁,₅,₂} and C₁,₅,₃ = (α₀ ⟹ γ₁,₅,₃), with γ₁,₅,₃ = γ₁,₅,₂ ∩ {∃ f⁻¹: V → U, f⁻¹ is the inverse of f|_U} ∩ {if kₙ is a seq. in V \ {0} with kₙ → 0, we have ‖kₙ − f⁻¹(kₙ)‖/‖kₙ‖ → 0};
• S₁,₅,₄: A₁,₅,₄ ⟹ C₁,₅,₄, where A₁,₅,₄ = A₁,₅ ∪ {C₁,₅,₃} and C₁,₅,₄ = C₁,₅.

Remark. As the above example reveals, the context of a proof needs to be explicitly carried from statement to statement; a good concrete implementation of the claim of proof format should facilitate this operation in the writing of proofs.

8. Appendix: Game-Theoretic Analysis

In this appendix, we give the proofs of the statements of Section 4.

8.1. Proof of Proposition 22.
First, note that the expected payoff of Claimer is increasing in P. Hence, for any given anticipated actions of Skeptic, if initially posting is optimal at some P, then it is also optimal at any P′ > P. The entry decision of Claimer must therefore be of the threshold form given in the Proposition.
Let hₑ and h₁ be the histories (Post, Challenge) and (Post, Challenge, Reply) respectively, and let πₑ = P_hₑ(X = 1) and π₁ = P_h₁(X = 1) denote Skeptic's beliefs at these histories. Note that πₑ = (1/2)(1 + π*).

8.1.1. Subgame equilibria at hₑ. Let us check when (Reply, No Challenge) is an equilibrium of the subgame at hₑ. In this scenario, Claimer always replies, so observing Reply has no informational content: π₁ = πₑ. For No Challenge to be the best response, we need
−β₂ ≥ π₁(−β₂ − β₁) + (1 − π₁)(σ₁ + σ₂), i.e. π₁ ≥ π̄ ≡ (σ₁ + σ₂ + β₂)/(σ₁ + σ₂ + β₁ + β₂).  (8.1)
Because Reply is trivially the best response to No Challenge, we have constructed an equilibrium of the subgame at hₑ as soon as π₁ = πₑ = (1/2)(1 + π*) satisfies (8.1).
Now let us check when (Reply if X = 1, Reply with probability p if X = 0; Challenge with probability q) is an equilibrium of the subgame at hₑ. In this scenario, Bayes' rule indicates that
(8.2) π₁ = πₑ / (πₑ + (1 − πₑ)p).
For Skeptic to be indifferent between challenging or not, we must have equality in (8.1). Using (8.2), we see that this implies:
(8.3) p = p(πₑ) ≡ πₑ(1 − π̄) / (π̄(1 − πₑ)).
Since p < 1, this requires πₑ < π̄. When X = 1, it is trivially optimal for Claimer to reply. When X = 0, she must be indifferent: not replying gives payoff −σ₂, while replying gives the expected payoff (1 − q)(B₁ + β₁) − q(σ₁ + σ₂). This pins down the equilibrium value of q:
(8.4) q = (B₁ + σ₂ + β₁)/(B₁ + σ₂ + β₁ + σ₁).
This completes the description of the equilibria of the subgame at hₑ. Indeed, Skeptic cannot be expected to challenge with certainty, for the best response of Claimer would then be to never reply when X = 0, which in turn would make systematic challenge suboptimal.
Hence, we have characterized the equilibrium expected profits (Claimer's, Skeptic's) at hₑ:
- if πₑ ≥ π̄: (B₁ + β₁, −β₂);
- if πₑ < π̄:
  - X = 1: (q(B₁ + β₁ + β₂) + (1 − q)(B₁ + β₁), −q(β₁ + β₂) − (1 − q)β₂);
  - X = 0: (−σ₂, (1 − p)σ₂ + p(−β₂)).

8.1.2. Type 1 Equilibria. For such an equilibrium to exist, we must have
πₑ ≡ (1/2)(1 + π*) < π̄  (8.5)
0 < φ(πₑ) ≡ πₑ(−q(β₁ + β₂) − (1 − q)β₂) + (1 − πₑ)((1 − p(πₑ))σ₂ − p(πₑ)β₂).  (8.6)
These conditions, obtained from the results of Section 8.1.1, ensure that Skeptic has a positive continuation value after Claimer posts C: his expected payoff at node hₑ is positive. They are sufficient to guarantee that a Type 1 equilibrium exists, as soon as Claimer is indeed willing to post if and only if P ≥ π*. That is, we have an indifference condition at P = π*, where the expected profit of posting, π*(q(B₁ + β₁ + β₂) + (1 − q)(B₁ + β₁)) − (1 − π*)σ₂, must equal 0, the profit of not posting. Hence
(8.7) π* = σ₂ / (q(B₁ + β₁ + β₂) + (1 − q)(B₁ + β₁) + σ₂).
The '1 − πₑ' factor in the denominator of p(πₑ) cancels out with the '1 − πₑ' factor in (8.6), so that the function φ is in fact linear. Moreover, φ(π̄) < 0, as its only positive term, (1 − p(πₑ))σ₂, vanishes at πₑ = π̄. In particular, if φ(0) < 0, a Type 1 equilibrium cannot exist. The properties of φ will also be important to characterize Type 2 equilibria.
A Type 1 equilibrium exists if and only if conditions (8.5), (8.6) and (8.7) are simultaneously satisfied.

8.1.3. Type 2 Equilibria. For such an equilibrium to exist, Skeptic must be indifferent between challenging the initial claim or not. Hence, we must have
πₑ ≡ (1/2)(1 + π*) < π̄  (8.8)
0 = φ(πₑ).  (8.9)
If (8.8) is not satisfied, Skeptic makes a negative profit by continuing, because he cannot profitably challenge back upon reply of Claimer; hence, he cannot be indifferent between challenging the initial claim or not. Equation (8.9) writes down explicitly that the payoff of challenging is zero when (8.8) holds. (8.9) has a valid root if and only if
(8.10) φ(0) ≥ 0.
Claimer should also be indifferent between posting C and not posting when P = π*. That is, her expected payoff of posting, (1 − q₀)B₂ + q₀(π*(q(B₁ + β₁ + β₂) + (1 − q)(B₁ + β₁)) − (1 − π*)σ₂), should be 0. This gives:
(8.11) q₀ = B₂ / (B₂ − π*(q(B₁ + β₁ + β₂) + (1 − q)(B₁ + β₁)) + (1 − π*)σ₂).
A Type 2 equilibrium exists if and only if conditions (8.8), (8.10) and (8.11) are simultaneously satisfied, with 0 < q₀ < 1.

8.1.4. Type 3 Equilibria. From the previous analysis, it is now clear that if 1/2 = πₑ(π* = 0) ≥ π̄, or if 1/2 < π̄ but φ(0) < 0, then we have a Type 3 equilibrium: Claimer always enters the game and Skeptic never challenges.

8.1.5. Existence and Uniqueness. If 1/2 ≥ π̄, a Type 3 equilibrium exists and no equilibrium of Type 1 or Type 2 can exist. From now on, assume 1/2 < π̄, and consider successively the cases (i) φ(0) < 0 and (ii) φ(0) ≥ 0.
Case (i): we know that a Type 3 equilibrium exists, and we have seen that no equilibrium of Type 1 or Type 2 can exist. (That φ(0) < 0 indicates that Skeptic does not want to challenge even under the worst possible belief about the correctness of C; hence, for any belief π* about the posting threshold, Skeptic would also find it optimal not to challenge.)
Case (ii): we know that no equilibrium of Type 3 exists. Assume a Type 2 equilibrium exists, characterized by, say, π*_T2 and q₀,T2. Recall that we have
(8.12) 0 = (1 − q₀,T2)B₂ + q₀,T2 · [π*_T2(q(B₁ + β₁ + β₂) + (1 − q)(B₁ + β₁)) − (1 − π*_T2)σ₂],
where the first term is positive and the bracket is the payoff of Claimer if she is challenged, with 0 < q₀,T2 < 1. This implies that Claimer expects a negative profit conditional on being challenged at π*_T2. In particular, if a Type 1 equilibrium were to exist, it would need to feature a threshold π*_T1 > π*_T2. But φ decreases (the incentives of Skeptic to challenge decrease with the probability that Claimer is right), so φ(πₑ(π*_T1)) < φ(πₑ(π*_T2)) = 0 and Skeptic has no incentive to challenge, so that one cannot construct a Type 1 equilibrium.
At this stage, we have seen that the different types of equilibria are mutually exclusive. To show that there is always one, remark that if Type 2 and Type 3 equilibria do not exist, then 1/2 < π̄ and φ(0) ≥ 0, but there is no value of q₀ ∈ (0, 1) such that (8.11) holds. This means that at the unique root π of φ over [0, π̄], for all q₀ ∈ [0, 1], the expected payoff of posting is non-negative.
In particular, this holds at q₀ = 1: at π (meaning: when P = π and under the belief that Claimer posts if and only if P ≥ π), Claimer is willing to post even conditional on Skeptic always challenging, and Skeptic is indifferent between challenging or not. As the candidate π∗ decreases away from π, the incentives to challenge increase, and the expected payoff of Claimer at π∗ decreases. Hence, if we define π∗ as the infimum of the π such that Claimer is willing to post even conditional on Skeptic always challenging (a bounded, non-empty set by what we saw above), we have at π∗ that Claimer is indifferent between posting or not, and that Skeptic always challenges: we have constructed a Type 1 equilibrium.

8.2. Proof of Proposition 23.
We will need the following:
Lemma 25.
The probabilities that Claimer enters the game conditional on the claim of proof being correct (resp. incorrect) are

(8.13) P(P ≥ π∗ | X = 1) = (1 + π∗)(1 − π∗),

(8.14) P(P ≥ π∗ | X = 0) = (1 − π∗)².

Proof.
From Bayes' formula,

(8.15) P(P ≥ π∗ | X = 1) = E[X | P ≥ π∗] P(P ≥ π∗) / P(X = 1).

Since E[X | P] = P and P is uniformly distributed,

(8.16) E[X | P ≥ π∗] = E[E[X | P] | P ≥ π∗] = E[P | P ≥ π∗] = (1 + π∗)/2,

and P(P ≥ π∗) = 1 − π∗, P(X = 1) = 1/2, which yields the first equality. The second one is obtained using similar arguments. □

We are now in a position to prove the results relative to Type 1 equilibria. We first compute the probabilities that a claim of proof is accepted given that it is correct/incorrect. Since a correct claim of proof is accepted if and only if Claimer posts, P(A | X = 1) = P(P ≥ π∗ | X = 1), the value of which is given in Lemma 25. An incorrect claim of proof is accepted if and only if Claimer posts, then replies, and Skeptic does not challenge at the last step. Hence this has probability P(A | X = 0) = P(P ≥ π∗, R, Qᶜ | X = 0). Using the fact that P(P ≥ π∗, R, Qᶜ | X = 0) = P(P ≥ π∗ | X = 0) P(R) P(Qᶜ) and using Lemma 25, we obtain the formula for P(A | X = 0).

Note that the probability that a proof is true is 1/2. Hence

(8.17) P(A, X = 1) = ½ P(A | X = 1),

(8.18) P(A) = ½ (P(A | X = 1) + P(A | X = 0)).

This yields the result for the probability that a claim of proof is accepted and true, as well as accepted. The probabilities that a proof is true given that it is accepted/rejected are obtained by applying

(8.19) P(X = 1 | A) = P(A, X = 1) / P(A),

(8.20) P(X = 1 | Aᶜ) = (1 − P(A | X = 1)) P(X = 1) / (1 − P(A)).
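The closed forms of Lemma 25 can also be checked by direct simulation of the model's primitives. The Python sketch below (purely illustrative; the threshold value π∗ = 0.4 is arbitrary) draws P uniformly on [0, 1] and X | P as a Bernoulli(P) variable, then estimates the two conditional posting probabilities.

```python
# Monte Carlo sanity check of Lemma 25: with P ~ U[0, 1] and
# X | P ~ Bernoulli(P), the posting event {P >= pi_star} should have
# conditional probability (1 + pi_star)(1 - pi_star) given X = 1, and
# (1 - pi_star)^2 given X = 0. The threshold below is arbitrary.
import random

random.seed(0)
pi_star, n = 0.4, 200_000
post_and_true = post_and_false = n_true = n_false = 0
for _ in range(n):
    p = random.random()                  # P ~ U[0, 1]
    x = 1 if random.random() < p else 0  # X | P ~ Bernoulli(P)
    if x:
        n_true += 1
        post_and_true += p >= pi_star
    else:
        n_false += 1
        post_and_false += p >= pi_star

est_given_true = post_and_true / n_true
est_given_false = post_and_false / n_false
exact_true = (1 + pi_star) * (1 - pi_star)  # (8.13)
exact_false = (1 - pi_star) ** 2            # (8.14)
```

With π∗ = 0.4 the exact values are 0.84 and 0.36, and the Monte Carlo estimates agree to within sampling error.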
Finally, in a Type 1 equilibrium, Skeptic always challenges at the first step, so that P(A₀ | A, X = 1) = 0. Moreover, if X = 1, the game finishes after the Reply of Claimer if and only if Skeptic renounces challenging again. This occurs with probability 1 − q₁, hence P(A₁ | A, X = 1) = 1 − q₁.

Using similar computations, one can obtain these event probabilities in the case of a Type 2 equilibrium. Specifically, in a Type 2 equilibrium:

• The probabilities that a claim of proof is accepted (resp. accepted and true) are

(8.21) P(A) = ½(1 + π∗)(1 − π∗) + ½(1 − π∗)²(1 − q₀ + q₀p(1 − q₁)),

(8.22) P(A, X = 1) = ½(1 + π∗)(1 − π∗).

• The probabilities that a claim of proof is accepted given that it is true (resp. false) are

(8.23) P(A | X = 1) = (1 + π∗)(1 − π∗),

(8.24) P(A | X = 0) = (1 − π∗)²(1 − q₀ + q₀p(1 − q₁)).

• The probabilities that a claim of proof is true given that it is accepted (resp. rejected) are

(8.25) P(X = 1 | A) = (1 + π∗) / (1 + π∗ + (1 − π∗)(1 − q₀ + q₀p(1 − q₁))),

(8.26) P(X = 1 | Aᶜ) = π∗² / (1 + π∗² − (1 − π∗)²(1 − q₀ + q₀p(1 − q₁))).

• The probabilities that a claim of proof is accepted at level 0 (resp. 1) given that it is accepted and true are

(8.27) P(A₀ | A, X = 1) = 1 − q₀,

(8.28) P(A₁ | A, X = 1) = q₀(1 − q₁).
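The Type 2 quantities lend themselves to a mechanical sanity check: the indifference condition (8.11) pins down q₀, and the acceptance probabilities must be mutually consistent under Bayes' rule. The Python sketch below verifies both. All parameter values are hypothetical illustrations, not values from the text: B denotes the bounty, β₀ and β₁ Skeptic's down payments, σ Claimer's stake, q₁ the probability of a challenge at the last step, and p the probability that an incorrect Claimer replies.

```python
# Illustrative numerical check of the Type 2 equilibrium quantities.
# All parameter values below are hypothetical.
B, beta0, beta1, sigma = 10.0, 2.0, 2.0, 15.0
pi_star, q1, p = 0.3, 0.5, 0.7

# Claimer's payoff conditional on being challenged at P = pi_star.
challenged = (pi_star * (q1 * (B + beta0 + beta1) + (1 - q1) * (B + beta0))
              - (1 - pi_star) * sigma)

# (8.11): the challenge probability q0 making Claimer indifferent; her
# expected payoff of posting must then vanish.
q0 = B / (B - challenged)
assert abs((1 - q0) * B + q0 * challenged) < 1e-9

# Acceptance probabilities, checked against Bayes' rule.
c = 1 - q0 + q0 * p * (1 - q1)            # incorrect post escapes rejection
pA_true = (1 + pi_star) * (1 - pi_star)   # (8.23): P(A | X = 1)
pA_false = (1 - pi_star) ** 2 * c         # (8.24): P(A | X = 0)
pA = 0.5 * (pA_true + pA_false)           # (8.21): P(A)
pA_and_true = 0.5 * pA_true               # (8.22): P(A, X = 1)

pX1_given_A = pA_and_true / pA                 # Bayes' rule for (8.25)
closed_A = (1 + pi_star) / (1 + pi_star + (1 - pi_star) * c)

pX1_given_Ac = 0.5 * (1 - pA_true) / (1 - pA)  # Bayes' rule for (8.26)
closed_Ac = pi_star ** 2 / (1 + pi_star ** 2 - (1 - pi_star) ** 2 * c)
```

The displayed closed forms (8.25) and (8.26) coincide with the same posteriors recomputed from (8.21)–(8.24) via Bayes' rule, as the derivation requires.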