Explosive Proofs of Mathematical Truths
Scott Viteri
Department of Computer Science, Stanford University, Stanford, CA 94305 USA
Social & Decision Sciences, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213 USA
Simon DeDeo
Social & Decision Sciences, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213 USA
Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501 USA
[email protected]
April 2, 2020
Abstract
Mathematical proofs are both paradigms of certainty and some of the most explicitly-justified arguments that we have in the cultural record. Their very explicitness, however, leads to a paradox, because their probability of error grows exponentially as the argument expands. Here we show that under a cognitively-plausible belief formation mechanism that combines deductive and abductive reasoning, mathematical arguments can undergo what we call an epistemic phase transition: a dramatic and rapidly-propagating jump from uncertainty to near-complete confidence at reasonable levels of claim-to-claim error rates. To show this, we analyze an unusual dataset of forty-eight machine-aided proofs from the formalized reasoning system Coq, including major theorems ranging from ancient to 21st Century mathematics, along with four hand-constructed cases from Euclid, Apollonius, Spinoza, and Andrew Wiles. Our results bear both on recent work in the history and philosophy of mathematics, and on a question, basic to cognitive science, of how we form beliefs, and justify them to others.

Mathematical proofs are commonly taken as a gold standard for certainty. How we come to discover, justify, and believe those proofs, however, is a different matter. In part because the things mathematicians believe are often remote from direct experience and what can be tested experimentally, the study of mathematical cognition has the potential to give new insights into the workings of the mind, and is a growing domain of research in cognitive science, psychology, and neuroscience (1–3). At the same time, the explicit and idealized nature of mathematical proof can also give us insight into a central cultural practice of the modern world, that of justification via abstract reason-giving (4).

One view of mathematical proofs is that they are at heart specifications for, or summaries of, rule-based symbol manipulation.
In this picture, formalized most famously by Alan Turing (5), proofs specify a sequential, mechanical process of logical deductions that, operating according to purely syntactic rules, terminates with a formula corresponding to the thing to be proved.

Taken seriously as an account of mathematical practice, however, this model has many problems. Most relevant for our work, it implies that mathematical knowledge may be less justified than ordinary beliefs. This is because, in an argument that goes back to Hume (6), the possibility of errors in a sequential deduction compounds step by step (7). At an error probability of 10⁻³, for example, a proof with more than seven hundred steps is more likely to have failed than not.

Worse yet, mathematicians themselves often go out of their way to disclaim the precision required by the Turing picture. Henri Poincaré writes, for example, that he is “absolutely incapable of adding without mistakes” (8), a claim that under standard accounts of mathematical belief bears comparison to a neurosurgeon boasting of an unsteady hand. Even allowing for rhetorical exaggeration, the commonness of sentiments like Poincaré’s among practicing mathematicians points to additional epistemic processes for both discovery and justification. Mathematicians describe, for example, posing questions to nature by the checking of cases (9), drawing on abductive or analogical principles (10, 11), and using intuition and idiom (12).

This paper uses an unusual dataset—machine-aided proof networks—to provide new empirical support for long-standing qualitative accounts of how proofs lead to belief. We show that mathematical proofs have a network structure that enables what we refer to as an epistemic phase transition (EPT): informally, while the truth of any particular path of argument connecting two points decays exponentially in force, the number of distinct paths increases.
Depending on the network structure, the number of distinct paths may itself increase exponentially, leading to a balance point where influence can propagate at arbitrary distance (13). In the presence of bidirectional inference—i.e., both deductive and abductive reasoning—an EPT enables mathematical arguments to achieve near-unity levels of certainty even in the presence of skepticism about the validity of any particular step. Deductive and abductive reasoning, as we show, must be well-balanced for this to happen. A relative overconfidence in one over the other can frustrate the effect, a phenomenon we refer to as the abductive paradox.

We present these results in three parts. We first introduce, and justify, a simple model of belief formation drawn from the cognitive science literature. We then describe the data sets we apply this model to, and then present the results of the analysis. Technical details of the implementation can be found in the Materials and Methods.
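Both halves of this argument, the per-step compounding of error and the per-step proliferation of paths, reduce to one-line computations. The sketch below is our own illustration, not part of the paper's materials; the function names are ours, and the (1 − 2ε) per-step correlation decay is one standard parameterization of a symmetric error model.

```python
# Two back-of-the-envelope computations behind the argument above.

def p_proof_valid(k: int, eps: float = 1e-3) -> float:
    """Chance a purely sequential k-step proof is error-free: Hume's worry."""
    return (1.0 - eps) ** k

# Break-even length at a 1e-3 per-step error rate: the proof is more likely
# wrong than right once (1 - eps)^k falls below one half.
k = 1
while p_proof_valid(k) >= 0.5:
    k += 1

def net_influence(b: float, eps: float, d: int) -> float:
    """Decay (1 - 2*eps)^d along one path, times b^d paths of length d for
    branching factor b: influence survives at distance when b*(1-2*eps) >= 1."""
    return (b * (1.0 - 2.0 * eps)) ** d

print(k)                                  # just under 700 steps
print(net_influence(3, 0.2, 30) > 1.0)    # above the balance point: True
print(net_influence(3, 0.4, 30) > 1.0)    # below it, influence dies: False
```

The balance point b(1 − 2ε) = 1 is the informal version of the path-counting criterion the text attributes to Ref. (13).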
Our model of belief formation is based on two core features of proofs. First, that while proofs are usually presented in a linear narrative, most will refer back to the same claims at different places. This turns a linear deduction into a network of interacting claims. A proof that combines two independent lines of reasoning at a particular point may be robust to counterexamples that invalidate one of those paths (14): in this way, a 1 − (1 − ε)^k failure rate may be improved, by orders of magnitude, to (1 − (1 − ε)^k)². As von Neumann established in the case of faulty computer logic gates (15), multiple paths in a modular organization can overcome noise.

Second, that reasoning on the basis of coherence, intuition, or analogy can also support evidentiary flow “down” from conclusions as well as “up”, deductively, from axioms (16, 17). A proof that 1 + 1 = 2 (18) helps establish the validity of the axioms and propositions that precede it, rather than resolving lingering doubts about elementary school arithmetic. A wide consensus on the truth of the Clay Institute’s Millennium Prize Problems guides the mathematician’s attempts to solve them, providing at least provisional support to supporting claims (19).

Pólya’s Patterns of Plausible Inference is a famous argument for the importance of this downward direction. It corresponds to Peircean abduction (20, 21), an epistemic process that now plays a central role in the study of contemporary mathematical practice (22).
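The redundancy arithmetic behind the first feature above can be checked directly; this sketch, with parameter values of our own choosing, compares the failure probability of one k-step chain with that of two independent chains.

```python
# Failure probability of one k-step chain versus two independent chains:
# the 1 - (1-eps)^k  ->  (1 - (1-eps)^k)^2 improvement discussed in the text.

def chain_failure(k: int, eps: float) -> float:
    """Probability that at least one of k sequential steps goes wrong."""
    return 1.0 - (1.0 - eps) ** k

eps, k = 1e-3, 100
one_path = chain_failure(k, eps)   # a single 100-step line of reasoning
two_paths = one_path ** 2          # both independent lines must fail
print(one_path, two_paths)
```

At these values the failure rate drops from roughly one in ten to roughly one in a hundred; with smaller per-step error rates the improvement is correspondingly larger.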
Abduction can, of course, be just as fallible as deduction, as demonstrated by long-standing gaps in proofs of famous theorems such as the Euler characteristic (11), where the intuitive truth of steps in the proof leads one to neglect flaws in more basic claims, and the historically unexpected conclusion of Gödel’s Incompleteness Theorem.

On this basis, we model mathematical belief-formation as the navigation of a network of claims where evidentiary support includes multiple, potentially bi-directional, pathways. Such networks are also the basic structures for coherence theories of belief formation in cognitive science (23–25). We emphasise that models of this form capture the fallible, real-world process of reasoning, not the parallel normative process; the same network coherence perspective can model both (objectively) true and false beliefs; see, e.g., Ref. (26).

To specify a general model of belief formation on these networks, we invoke three requirements. First, we require that the degree of belief in any particular claim is (all other things being equal) conditional solely on its dependencies (the claims that support it in a deductive fashion), and those that depend on it in turn; we allow for differing strengths of these respectively deductive and abductive pathways. Second, following standard models in Bayesian cognition, we require this dependency be additive in log-space. Finally, we require that the model is otherwise unconstrained in the patterns of beliefs that can be formed. Taken together, these requirements lead to a minimal (27) and unique “maximum entropy” model (see Materials and Methods), corresponding to a probabilistic version of constraint satisfaction (28, 29). (Such a model also provides a framework for extensions to include more complex, synergistic effects; for brevity, we do not consider them here.)

Despite its simplicity, this model makes three predictions directly relevant to the question at hand.
First, that under certain conditions the very things that lead to the reliability problem can become a virtue. The physical analogues of our model are known to undergo phase transitions, where small changes in a control parameter can lead to (approximately) discontinuous shifts in global properties. Here, in the cognitive domain, the control parameter corresponds to the local degree of dependency (i.e., how good the reasoner is, or thinks he is, about drawing the correct conclusion), while the global property is the average degree of belief in a claim.

The existence of such transitions is sensitive to network structure: they cannot happen, for example, for a linear network, nor indeed for any tree-like network of finite ramification (30). However, when the topological demands are met, justification becomes easier, not harder, as the network size, N, increases, with—depending on the structure of the network—a sharp transition to total deductive certainty in the limit of large N. As mentioned above, this is due to the emergence of multiple paths between any two claims. On the one hand, error compounds exponentially along any particular path, as it does in any linear chain. On the other, the number of distinct paths between points can, depending on network structure, grow exponentially (31). At a critical point, the exponential decay is balanced by the exponential growth, and influence can propagate undiminished across the entire network (13).

EPTs are a double-edged sword, however, because disbelief can propagate just as easily as truth. A second prediction of the model is that this difficulty—the explosive spread of skepticism—can be ameliorated when the proof is made of modules: groups of claims that are significantly more tightly linked to each other than to the rest of the network. When modular structure is present, the certainty of any claim within a cluster is reasonably isolated from the failure of nodes outside that cluster.
This means that a reasoner can come to believe sub-parts of the overall network without needing to resolve the entire system at once. Such a mechanism has been hypothesised to exist in the case of mathematical proofs (32). Modules can be identified by standard clustering algorithms such as Girvan-Newman (33). The subsequent robustness can be tested by comparing the relative difficulty of forming a belief within a module compared to that of forming a belief in an arbitrary collection of nodes (see Materials and Methods).

A third prediction of the model concerns the balance of deductive and abductive reasoning. Informally, one would imagine that increasing confidence in either process would aid the overall confidence in the proof: at a given level of deductive confidence (say), I can only become more certain by increasing confidence in my abductive intuitions. This turns out not to be the case: for a fixed level of deductive confidence, increasing abductive confidence can lead to lower levels of certainty, and similarly in reverse. This is because, at a critical point, abduction can come to dominate deduction completely, leading to solely downward propagation and cutting off the proliferation of paths necessary for the EPT. This destroys a key topological feature necessary for a phase transition: there are now fewer paths of influence between nodes and, for example, theorem “siblings” can no longer reinforce each other.

We refer to this as the abductive paradox. Informally, the proof becomes dependent solely on the mathematician’s belief in the conclusion: doubts propagate downwards, and even the best axioms are powerless to overcome them.

In order to determine if real-world mathematical theorems have the necessary properties to trigger the epistemic effects described in the previous section, we analyze two datasets.
First, forty-eight machine-assisted proofs constructed by mathematicians with the aid of the formal verification system Coq (34), ranging from the Pythagorean Theorem to the Four-Color Theorem; see Table 1. Proofs in a formal verification system are constructed by a mathematician who then invokes machine-implemented heuristics to fill in the gaps. The proofs themselves are interpretable, if exceedingly pedantic; see Materials and Methods for a proof that the number four is even. We extract the abstract syntax trees representing the underlying deductions, and then identify equivalent claims, which turns these trees into directed acyclic graphs.

This dataset is supplemented with four “human” proofs: the original texts of Euclid’s Geometry, Apollonius’ Conics, and Spinoza’s Ethics (all of which mark explicit dependencies), and a hand-coded network based on a close-reading of Andrew Wiles’ 1995 proof of Fermat’s Last Theorem (35). Although human networks are orders of magnitude smaller than those supplemented with the fine-grained deductions of a machine, the comparison allows us to find, and confirm, the similarities between the two.

The simultaneous examination of both machine- and human-proofs provides an important check on our claims on the epistemic status of mathematical knowledge. Human mathematicians may plausibly have introspective access to the ways in which they come to believe something, but there is no guarantee that they match the actual reasoning process itself. Machines, through expansions that are orders of magnitude larger than the self-reported steps in the corresponding human case, make visible what is idealised and implicit in human communication. Despite the vast technological gulf between them, the machine-aided proofs of the twenty-first century share, as we shall see, the same basic epistemic properties as Euclid’s.
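The tree-to-DAG step mentioned above (collapsing equivalent claims into shared nodes) can be sketched by hash-consing. The nested-tuple term representation and the function names below are ours, not Coq's; the example reuses the evenness proof discussed in Materials and Methods.

```python
# Hash-consing sketch: structurally identical subterms of an abstract syntax
# tree are interned once, turning the tree into a directed acyclic graph.

def intern(term, nodes, edges):
    """term is a string leaf or a ('head', *children) tuple; returns a node id.
    Identical subterms receive the same id, so shared claims become one node."""
    if isinstance(term, str):
        key = term
        child_ids = ()
    else:
        child_ids = tuple(intern(c, nodes, edges) for c in term[1:])
        key = (term[0],) + child_ids
    if key not in nodes:
        nodes[key] = len(nodes)
        edges.extend((nodes[key], c) for c in child_ids)
    return nodes[key]

# The proof that 4 is even uses the proof that 2 is even twice; as a DAG
# the shared subproof is stored only once (4 nodes rather than 7).
ev_2 = ("ev_SS", "0", "ev_0")
ev_4 = ("add_even_even", ev_2, ev_2)
nodes, edges = {}, []
root = intern(ev_4, nodes, edges)
print(len(nodes), len(edges))
```

The out-degree of a node in the resulting graph counts how many later claims reuse it, which is the quantity whose heavy-tailed distribution is analyzed below.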
We present our results in three sections. First, the topological properties of the proof networks; second, the emergence of epistemic phase transitions and the existence of modular firewalls; and finally, the abductive paradox.
Fig. 1 presents four examples of the proof networks we use in this analysis, with modules identified by the Girvan-Newman algorithm coloured, and with node size indicating out-degree. Fig. 2 shows the in- and out-degree distributions for the machine proof of the Four Color Theorem and Gödel’s First Incompleteness Theorem. While in-degree (i.e., the number of prior nodes a particular claim depends on) is exponentially distributed, the out-degree (i.e., the number of nodes that use that claim) has a heavy-tailed distribution, with a fraction of nodes having influence hundreds of times larger than average. This heavy-tailed distribution follows a power-law, with the probability of a node having degree d given by P(d) ∝ d^(−α).

Across our sample of both machine and human proofs, these α values cluster tightly around two (Table 1; fit using the methods of Ref. (36)). This pattern, of both an exponential distribution for in-degree, and the particular value of α for the out-degree power-law tail, is a characteristic sign of the generative assembly model of Refs. (37, 38). This construction process has two steps: first, a new node chooses some number of nodes to depend upon; second, from that set of chosen nodes, it chooses to link to some of their dependencies in a probabilistic fashion. It is found in both cultural and biological systems governed by successive accretion of links in a distinctive pattern associated with opportunistic tinkering and reuse (39).

As described in Materials and Methods, we look for epistemic phase transitions as a function of both deductive and abductive implication strength: the degree to which the truth of a claim is coupled to the truth of either a claim it depends on, or a claim that it implies.
We parameterize these by two terms, β_dep and β_imp, for the two pairwise effects of truth (or falsehood). On the abductive side, exp(2β_imp) is the multiplicative factor by which a correct implication makes the node more likely to be true; on the deductive side, exp(2β_dep) is the multiplicative factor by which a correct deduction makes the node more likely to be true (see Materials and Methods for discussion of alternative choices). Taking (for simplicity) a symmetric error-making model, where the probability of incorrectly drawing a false conclusion from a true premise is the same as drawing a true conclusion from a false premise (and respectively for the abductive case), β implies an error rate ε of ε = 1/(1 + e^(2β)), which corresponds to the error rate of Hume’s original paradox.

Figure 1: Implication structure for four proof networks in our database. Clockwise from top left: the Four Color Theorem (Coq), the uncountability of the Reals (Coq), Gödel’s First Incompleteness Theorem (Coq), and Euclid’s Geometry (original Greek text). We color the top clusters by membership, and size nodes according to out-degree (i.e., the number of nodes that have that node as a deductive pre-requisite). Both human and machine-aided proofs are characterized by high levels of modularity, and a heavy-tailed distribution of out-degree.
Theorem | Nodes | α | f | ΔL

Machine-Aided
Euclid's Geometry | — | —.— ± 0.02 | 0.985 | +11
1st Gödel Incompleteness | 28,984 | 1.— ± — | ? | +13
Bertrand's Ballot | 24,137 | 2.— ± 0.05 | 0.998 | +13
Polyhedron Formula | 23,750 | 2.— ± 0.04 | 0.988 | +13
Euler's FLiT | 22,444 | 2.— ± 0.05 | 0.998 | +11
Bertrand's Postulate | 22,434 | 2.— ± — | ? | +15
F.T. Algebra | 20,431 | 2.— ± — | ? | +14
Subsets of a Set | 20,205 | 2.— ± — | ? | +9
Pythagorean Theorem | 18,230 | 1.— ± — | ? | +19
Desargues's Theorem | 18,213 | 1.— ± 0.04 | 0.987 | +18
Taylor's Theorem | 17,809 | 2.— ± — | ? | +13
Heron's Formula | 17,487 | 2.— ± — | ? | +17
F.T. Calculus | 16,845 | 2.— ± — | ? | +13
Binomial Theorem | 16,314 | 2.— ± 0.05 | 0.998 | +16
Geometric Series | 16,160 | 2.— ± — | ? | +14
Wilson's Theorem | 16,120 | 2.— ± — | ? | +12
Sylow's Theorem | 15,942 | 2.— ± — | ? | +12
Ceva's Theorem | 15,279 | 2.— ± 0.05 | 0.99 | +17
Bezout's Theorem | 14,909 | 2.— ± 0.06 | 0.998 | +15
Reals Uncountable | 14,574 | 2.— ± 0.05 | 0.996 | +14
Int. Value Theorem | 14,467 | 2.— ± 0.06 | 0.998 | +12
Quadratic Reciprocity | 14,397 | 2.— ± 0.06 | 0.998 | +13
Triangle Inequality | 13,657 | 2.— ± 0.06 | 0.998 | +15
Leibniz π | — | —.— ± 0.05 | 0.986 | +16
Pythagorean Triples | 13,254 | 2.— ± — | ? | +14
Rationals Denumerable | 13,108 | 2.— ± 0.06 | 0.998 | +13
Isosceles Triangle | 13,055 | 2.— ± 0.05 | 0.995 | +15
Div 3 Rule | 13,037 | 2.— ± 0.06 | 0.998 | +14
Inclusion-Exclusion | 12,886 | 2.— ± — | ? | +16
Cauchy-Schwarz | 12,647 | 2.— ± — | ? | +14
Four Color Theorem | 12,407 | 1.— ± 0.06 | 0.989 | +10
Factor & Remainder | 11,815 | 2.— ± 0.05 | 0.997 | +13
Birthday Paradox | 11,692 | 2.— ± — | ? | +14
Liouville's Theorem | 11,645 | 2.— ± — | ? | +13
Cayley-Hamilton | 11,407 | 2.— ± — | ? | +15
F.T. Arithmetic | 11,362 | 2.— ± — | ? | +12
Cubic Solution | 11,271 | 1.— ± 0.05 | 0.986 | +17
GCD Algorithm | 10,792 | 2.— ± 0.07 | 0.932 | +14
Cramer's Rule | 10,613 | 2.— ± — | ? | +15
Subgroup Order | 10,583 | 2.— ± — | ? | +14
Mean Value Theorem | 10,168 | 2.— ± — | ? | +13
Ramsey's Theorem | 7,747 | 2.— ± 0.09 | 0.997 | +13
Schroeder-Bernstein | 1,331 | 2.— ± 0.19 | 0.987 | +14
Triangle Angles | 739 | 2.— ± 0.19 | 0.99 | +11
Powerset Theorem | 282 | 2.— ± 0.32 | 0.992 | +10
Prime Squares | 250 | 2.— ± 0.29 | 0.983 | +10
Pascal's Hexagon | 150 | 2.— ± 0.34 | 0.992 | +9
Induction Principle | 40 | — | 0.955 | n.d.

Human
Euclid's Geometry | 475 | 1.— ± — | ? | +21
Apollonius's Conics | 446 | 2.— ± — | ? | +12
Spinoza's Ethics | 572 | 2.— ± 0.10 | 0.906 | +11
Wiles's FLT | 142 | 3.— ± 0.72 | 0.941 | +5

Table 1: The statistics of dependence and implication in machine- and human-proved theorems (F.T.: “Fundamental Theorem”; FLiT: “Fermat’s Little Theorem”). Both are characterized by high levels of modularity, and a long, power-law tail associated with assembly-and-tinkering construction. Over the entire dataset, machine proofs have an α equal to 2.— ± —. f: average degree of belief in the final theorem at a one-step error rate of 10⁻³; ? indicates near-unity f; — marks a value unavailable here. ΔL: log-likelihood penalty to within-module flip at β equal to unity. Networks are truncated at the first depth expansion that produces more than 10,000 nodes where possible, and otherwise to maximum depth.

Figure 2: Distribution of in- (+/dashed fit) and out-degrees (·/solid fit) for nodes in the automated proof of the Four Color Theorem and for Gödel’s First Incompleteness Theorem. While any node depends on a small number of others, following an exponential distribution, the usage of a node in further claims follows a heavy-tailed distribution, with power-law index α around two.

Fig. 3a shows epistemic phase transitions in action for three proofs: Cantor’s theorem on the uncountability of the Reals (Coq, N = 14,574), the Four Color Theorem (Coq, N = 12,407), and Euclid’s Geometry with dependencies taken from the original Greek text (N = 475). For simplicity in this case, we have set β_dep equal to β_imp, and p_prior equal to 0.75. We plot three quantities: the average degree of belief over all steps of the proof, the average degree of belief in the final theorem, and the average degree of belief in the axioms. The three proofs in question show different certainty structures (for example, belief in the full proof lags that of both the theorem and axioms in the Euclidean case, while the reverse is true for the Four Color Theorem and the uncountability of the Reals), but share an overall pattern. At cognitively-plausible error rates, the graph structure leads to a sharp transition where high levels of certainty emerge even when error rates are at levels that would invalidate proofs made by deductive reasoning alone.

Table 1 lists f, the average degree of belief in the theorem (i.e., the terminal node) at a one-step error rate ε of 10⁻³. The majority reach near-unity levels of f that exceed the one-step confidence. There are a few cases where this does not happen (e.g., Desargues’s Theorem); this appears to be due, in part, to the presence of nodes just below the final theorem that have both few dependents and no other implications—these dangling assumptions participate only weakly in the larger network of justification.

Fig. 3b shows the effect of shifting p_prior. Past the critical point, even weak priors can lead to the transition to deductive certainty; in the physics-style language of phase transitions, this is the finite-size analogue of a divergence in the susceptibility. Failure in the case of weak priors is due to the emergence of domain walls, i.e., localized parts of the network that freeze into all-false or all-true states. This can lead either to (1) a cascade into the all-false condition driven by the small-number statistics of the fluctuations, or (2) a long-lived metastable state because interconnections are insufficiently strong to generate global consensus.
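The qualitative behavior of these transition curves can be reproduced in miniature by exact enumeration. This sketch is ours: the ten-node ring-with-chords network, the prior field, and the parameter values are chosen only for illustration, and the β↔ε conversion follows the symmetric error model described above.

```python
import math
from itertools import product

# Exact average belief on a small claim network as a function of the one-step
# error rate eps. The coupling is beta = (1/2) ln((1-eps)/eps), i.e.,
# eps = 1/(1 + e^(2*beta)) in the symmetric error model; a weak prior field
# tilts every claim toward "true" (+1).

def avg_belief(eps: float, n: int = 10, prior: float = 0.2) -> float:
    beta = 0.5 * math.log((1.0 - eps) / eps)
    # Ring plus next-nearest-neighbour chords: many short loops between claims.
    edges = [(i, (i + 1) % n) for i in range(n)] + \
            [(i, (i + 2) % n) for i in range(n)]
    Z = mean_true = 0.0
    for s in product([-1, 1], repeat=n):          # enumerate all 2^n states
        w = math.exp(beta * sum(s[i] * s[j] for i, j in edges)
                     + prior * sum(s))
        Z += w
        mean_true += w * sum(1 for x in s if x == 1) / n
    return mean_true / Z

low_eps, high_eps = avg_belief(0.01), avg_belief(0.45)
print(round(low_eps, 3), round(high_eps, 3))
```

Below the transition (ε = 0.45) the weak prior produces only modest belief; past it (ε = 0.01) the same prior is amplified to near-certainty, the finite-size analogue of the susceptibility divergence noted above.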
Modular structure, which we discuss now, allows a practicing reasoner to escape the metastable state. In particular, the resolution of our data allows us to characterize how modular structure creates topological “firewalls” that allow different parts of a proof to decouple from each other. We compute the relative log-likelihood penalty to flip all the nodes in a module to the opposite truth value, versus, on the other hand, flipping the same number of nodes randomly chosen across the whole graph. We characterize this using ΔL, the log-likelihood penalty per node flipped, with the number of nodes set to ten and β set to unity.

At high error rates, firewalls are fragile because there is little opportunity for order to propagate at any distance. However, as error decreases and order emerges, the distinct effects of within-module versus cross-module flips become apparent: the tighter connections between nodes within a module make them easier to shift to the opposite state. As mathematicians increase their confidence in a proof, they find they can first achieve confidence in a particular module of the derivation even in the absence of strong beliefs about the truth in other places. This means that a modular proof strategy is easier than one that involves different parts of the system. In Table 1 we list the ΔL values, where a positive value indicates that within-module flips are more likely than cross-module ones. Values are around +10, corresponding to an overwhelming preference (at the e^(+100) level) for within-, rather than cross-, module flips.

Figure 3: Top: Epistemic phase transitions in three theorems: Cantor’s theorem on the uncountability of the Reals, the Four Color Theorem, and Theorem IX.36 (the form of perfect numbers) in Euclid’s Geometry. Solid lines indicate average degree of belief over all steps of the proof; dashed lines, in the theorem itself; dotted lines, in the axioms. Bottom: the relationship between prior and posterior, after equilibrating to the heuristic model, at an inference error rate of 0.01. This error rate puts all three proofs past the phase transition point, and this means that even weak priors lead to near-unity degrees of belief. Remnant uncertainty at priors close to 0.5 corresponds to domain walls, i.e., where modules fall separately into all-true or all-false states.

Figure 4: Average degree of belief in four theorems and their preconditions, as a function of abductive and deductive error rates. Network structure leads to levels of confidence far in excess of what can be expected on the basis of a linear derivation chain. For fixed deductive (abductive), but rising abductive (deductive) confidence, contours turn over, leading to an abductive paradox driven by an imbalance in the two modes of reasoning.

Fig.
4 shows four examples of our final result: that, at a fixed level of deductive power, increasing abductive power can eventually lead to a degradation in the final degree of belief. In each case, for deductive (or abductive) certainty beyond the transition point, we see the contours of constant belief turn over. A vertical (or horizontal) line drawn past the (rough) EPT point of ε equal to 0.…

Our account of the emergence of mathematical belief depends on the use of paths that go both against and with the deductive grain to generate an epistemic phase transition. While this poses a challenge to the purely deductive model, the heuristics that underlie the EPT fit naturally with accounts that balance abduction and deduction, allow intuition to play a role in the status of a claim without coming to dominate deduction, and allow those intuitions to develop over time and in the course of examining a proof.

Consider, as an analogy, the use of computer code in a research project. Suppose I mostly believe some fact A, and I write a complex computer program to check and increase my confidence in that belief. If the program produces some output B that contradicts A, then I will likely first check the program itself for errors, a move that corresponds to doubting the axioms, or earlier stages of reasoning, in abductive fashion. Later, on reflection, I might realize that the output B is actually more intuitive than A; this will now have the opposite effect, and act to increase my confidence in the earlier stages of the code even if I do no further checks. Few theories of belief formation would rule out the analogous process in mathematical reasoning, which is also found in informal accounts by practitioners (19).
A more elaborate reflection on the relationship between deduction and abduction, in the context of communication and social justification, is provided by the mathematician and computer scientist Scott Aaronson.

[A] step-by-step logical deduction tends to be seen as merely the vehicle for dragging the reader or listener, kicking and screaming, toward a gestalt switch in perspective—a switch that, once you’ve succeeded in making it, makes the statement in question (and more besides) totally obvious and natural and it couldn’t ever have been otherwise. The logical deduction is necessary, but the gestalt switch is the point. This, I think, explains the feeling of certainty that mathematicians express once they’ve deeply internalized something—they’re not multiplying out probabilities of an oversight in each step, they’re describing the scenery from a new vantage point that the steps helped them reach (after which the specific steps could be modified or discarded). (40)

We emphasize that our results bear on real-world practice, rather than any underlying normative justification. A theorem that contains an error in its logic may have little trouble deriving all sorts of (abductively) reasonable conclusions, and thereby lead mathematicians to believe, incorrectly but explosively, in its truth. (It may be the case that the kinds of errors that invalidate proofs, in ways that cannot be fixed, have distinct topological structures that prevent the emergence of an EPT. Such a determination requires a parallel database of invalid proofs.)

That the path scaling required for an EPT can be achieved with a network structure associated with tinkering and reuse suggests that the method of construction may itself aid in the method of justification. Such a process fits qualitative accounts of how proofs are made. Lakatos’ dialectic model, for example, presented in Proofs and Refutations, emphasises tinkering by making proof an embedding of the truth of a conjecture into other areas of mathematics. The proof provides avenues for directed criticism, and both conjecture and proof are co-modified in response to the incremental introduction of local and global counterexamples. The structure of the final theorem is a product of this dynamic interplay.

Finally, our results allow us to draw some conclusions about the nature of proofs produced without machine aid. In the case of Wiles’ proof of Fermat’s Last Theorem, for example, we see significant deviations from the α equal to two power-law tail. This is due, in part, to a thinner network structure in which a text designed for human communication neglects to explicitly mention its reliance on common axioms or lemmas at every point they occur. This leads to a deficit of high-degree nodes whose absence frustrates an epistemic phase transition because they are no longer available as “Grand Central Stations” that increase the number of paths between points. A second example is Spinoza’s Ethics, the one non-mathematical text in our set, which achieves slightly lower levels of overall certainty than others. The Ethics is devoted to a philosophical account of the nature of matter and knowledge, albeit “in geometric order”, i.e., in an attempt to parallel the deductive certainty of the arguments aimed at establishing mathematical facts.

We emphasize, however, that there are more commonalities than differences between the purely human case and the machine-aided ones, even at the level of quantitative comparison. This suggests that systems like Coq may be of use not just for verification and validation of mathematical claims, but also for their insight into the nature of mathematical practice and cognition itself.
Ever since their invention in a cultural context associated with new forms of justification (41), mathematical proofs have provided us with some of the most explicit examples of human reasoning we have available. They are an account, intended for use by other members of the community, of why we ought to believe something that attempts to be immune, by its very explicitness, to every objection.

Seen in this way, proofs are a test-bed for justification practices more generally. Our results suggest that the confidence provided by an epistemic phase transition may also be present, in latent form, in many other kinds of claims we make in day-to-day life. As noted by Ref. (4), no piece of evidence—even one concerning the evidence of the senses—is transparent, and we can always be called upon to situate it in the context of a larger argument. This means that people expand claims about the physical world, or normative claims about how things ought to be, when they are asked for further justification. Those expansions, if constructed via a process similar to tinkering and reuse, ought to be able to support the same kinds of phenomena we establish here.

The combination of modularity, abduction, and deduction, as well as the underlying assembly mechanism of tinkering and reuse, appears to have the power to generate significant levels of certainty even for complex and apparently fragile arguments. The networks we have studied here may be something that we create when called upon to justify our beliefs in non-mathematical claims as well.

The justification of beliefs through reason is a basic task of the human species. Mathematical proofs provide an unusual example of what many cultures consider an ultimate standard. Our findings here suggest that underlying features of that justification can lead to firm beliefs, even when we understand ourselves to be fallible and limited beings.
Networks for machine proofs are drawn from https://madiot.fr/coq100/, which lists and locates Coq formalizations of famous theorems; networks for the human proofs are built from standard editions.
Dependency networks for machine proofs are constructed from the abstract syntax trees of Gallina terms, where Gallina is Coq’s specification language. Terms in Gallina correspond to proofs of the specification given by the term’s type. Since our dependency graphs are derived from such terms, the existence of a node can be interpreted as asserting the existence of an inhabitant of a particular type; e.g., “X is an object of type ‘Two is Even’.” Constructing X in this case corresponds to having a proof of the proposition that two is even, via the Curry-Howard Correspondence. Types in Coq are based on the Calculus of Inductive Constructions, a dependently typed lambda calculus. Hence types themselves can be parameterized by terms. For example, a node in a Coq network might be a function F that takes a natural number X and a proof that X is even, and returns a proof that X+X is even.

Here we take such a proof that 4 is even and show an example transformation from Coq syntax, to a reified tree, to a directed acyclic graph. First we define evenness inductively: every even number is either zero or two greater than an even number.

```coq
Inductive ev : nat -> Prop :=
| ev_0 : ev 0
| ev_SS : forall n : nat, ev n -> ev (S (S n)).

Theorem ev_2 : ev 2.
Proof. apply (ev_SS 0 ev_0). Qed.
```
Then we prove that two even numbers sum to an even number, by induction on the proof that the first argument is even. This is combined with the previous proof that two is even into a proof that four is even.
```coq
Theorem add_even_even :
  forall {n m : nat}, ev m -> ev n -> ev (m + n).
Proof.
  intros n m Hm Hn.
  induction Hm.
  { simpl. apply Hn. }
  { simpl. apply ev_SS. apply IHHm. }
Qed.

Theorem ev_4 : ev 4.
Proof.
  apply (add_even_even ev_2 ev_2).
Qed.

PrintAST ev_4 with depth 1.
```

In the last line we call our fork of the University of Washington’s CoqAST plugin (https://github.com/scottviteri/CoqAST). We substitute type constructor names for indexes into constructors, such as “S” for “(Constr nat 1)”. To prevent blow-up in output size, we do not expand axioms or definitions of inductive types.

```lisp
(Definition Top.ev_4
  (App Top.add_even_even
    (App S (App S O)) ; 2
    (App S (App S O))
    Top.ev_2
    Top.ev_2))
```
If we print the AST to depth 2, we get the following:

```lisp
(Definition Top.ev_4
  (App Top.add_even_even
    (App S (App S O))
    (App S (App S O))
    Top.ev_2
    Top.ev_2))

(Definition Top.add_even_even
  (Lambda n_2 nat
    (Lambda m_22 nat
      (Lambda Hm_222 (App ev m_22)
        (Lambda Hn_2222 (App ev n_2)
          (App Top.ev_ind ...))))))

(Definition Top.ev_2 (App ev_SS O ev_0))
```
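The depth-limited expansion illustrated by these two printouts can be sketched as follows. This is a minimal Python illustration; the nested-tuple encoding and the `defs` table are assumptions for the sketch, not the CoqAST plugin's actual data structures (and `Top.add_even_even` is deliberately left out of `defs`, standing in for a definition that stays opaque).

```python
# Sketch of depth-limited elaboration: a definition reachable from the root
# is expanded only if it lies within `depth` reference-following steps;
# anything deeper is left as an opaque name.

defs = {
    "Top.ev_2": ("App", "ev_SS", "O", "ev_0"),
    "Top.ev_4": ("App", "Top.add_even_even",
                 ("App", "S", ("App", "S", "O")),
                 ("App", "S", ("App", "S", "O")),
                 "Top.ev_2", "Top.ev_2"),
}

def refs(tree):
    """Collect the Top.* names referenced anywhere in a tree."""
    if isinstance(tree, tuple):
        return {r for child in tree for r in refs(child)}
    return {tree} if tree.startswith("Top.") else set()

def print_ast(name, depth):
    """Return the definitions elaborated when printing `name` to `depth`."""
    out, frontier = {}, {name}
    for _ in range(depth):
        next_frontier = set()
        for d in frontier:
            if d in defs and d not in out:
                out[d] = defs[d]               # elaborate this definition
                next_frontier |= refs(defs[d])  # queue what it references
        frontier = next_frontier
    return out
```

At depth 1 only `Top.ev_4` is elaborated; at depth 2 the walk also pulls in `Top.ev_2` as a new top-level tree, mirroring the two printouts above.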
When we print to a greater depth, we look for each definition that has not yet been elaborated and add its definition as a top-level tree, as demonstrated by Top.ev_4 and Top.add_even_even above. We assemble the acyclic graph iteratively from these trees. We take every depth-two subtree, starting from the leaves, and check whether it matches a subtree in the graph built up so far. If there is no match, then the subtree is added to the graph as a node with children. Inductively, no identical pieces of the tree are added to the graph twice. We append numbers to the names of new variable bindings to ensure that we do not have two identically-named variables in the same scope, which might otherwise be falsely unified during graph creation. This process turns the tree into a directed acyclic graph that flows from axioms and definitions to increasingly high-level theorems. The source code for this process is hosted at https://github.com/scottviteri/ManipulateProofTrees. Fig. 5 shows the graph of the proof that four is even.

Networks for the human proofs are constructed by hand, using the references given by the author (i.e., we include a dependency only when it is explicitly named). For example, Proposition 9 of Book I (“I.9”) in Euclid’s
Figure 5: The proof that 4 is even, represented as a directed acyclic graph
Geometry depends on Propositions I.1, I.3, and I.8. Similarly, our coding of Wiles’ proof of Fermat’s Last Theorem uses only Wiles’ explicit remarks, e.g., in phrases such as “the first two conditions [of Theorem 3.1] can be achieved using Lemma 1.12”.
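The subtree-matching pass used to build the machine-proof graphs (checking each depth-two subtree against the graph built so far, so no identical piece of the tree enters twice) can be sketched as hash-consing over the reified tree. The nested-tuple encoding and names below are illustrative, not the plugin's actual output; we assume, as in the text, that variable bindings have already been renamed apart.

```python
# Turn a proof tree into a DAG by sharing identical subtrees (hash-consing).
# Trees are nested tuples (label, child, child, ...); leaves are strings.

def tree_to_dag(tree):
    table = {}  # canonical (label, child_ids) -> node id

    def intern(t):
        if isinstance(t, tuple):
            label, *children = t
            key = (label, tuple(intern(c) for c in children))
        else:
            key = (t, ())
        if key not in table:
            table[key] = len(table)  # fresh node for a never-seen subtree
        return table[key]

    root = intern(tree)
    return root, table

# The proof term for ev_4, roughly (add_even_even 2 2 ev_2 ev_2),
# where 2 = (S (S O)) and ev_2 = (ev_SS O ev_0):
two = ("App", "S", ("App", "S", "O"))
ev_2 = ("App", "ev_SS", "O", "ev_0")
root, table = tree_to_dag(("App", "Top.add_even_even", two, ev_2, ev_2))
```

In this encoding the repeated subterms for 2 and for ev_2 each become a single shared node, so the 20-node tree collapses to a 9-node acyclic graph that flows from constructors up to the top-level theorem.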
Each node in the proof tree is given a binary truth value. We model the (fallible) steps between claims in a maximum-entropy fashion that fixes the average error rate of a deductive step but leaves the system otherwise unconstrained (42). This corresponds to the Ising model,

$$P(\{t_i\}) \propto \exp\left(\beta \sum_{i,j} J_{ij}\, t_i t_j\right), \qquad (1)$$

where t_i is the truth value of claim i (zero or one), the matrix J_ij is non-zero when there is an evidentiary link between claims i and j (i.e., when i invokes the truth of j as part of its justification), and β governs the reliability of the connection between i and j, i.e., the extent to which the truth of j given i is believed to be correctly inferred. We measure the perceived truth as the time-average of the node; i.e., if the heuristic observer perceives the node to be true 70% of the time, the overall degree of belief is 0.7. In the standard Ising model, a node's state is determined by the states of its neighbours, and the strength of that influence depends upon β. We change this rule to account for the differential impact of dependencies and implications, i.e., a different value of β depending on whether the coupling is from i to j or from j to i. This leads to the asymmetric Ising model, where the effect of A on B (all other things being equal) may not equal the effect of B on A, used in studies of updating in game-theoretic contexts (43).

We write the strength of a dependence as β_dep, and the strength of an implication as β_imp (i.e., the extent to which belief in a claim derivable from A abductively increases confidence in A). We begin our simulations with a (weak) bias in favor of truth: a weakly charitable predisposition for the reader to consider the proof more likely to be true than false, at the level p_prior, before considering evidentiary links.
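These dynamics can be sketched with a simple heat-bath simulation of the asymmetric model. The three-claim chain, the coupling strengths, and the prior below are hypothetical choices for illustration, not parameters fitted to any of the paper's proof networks.

```python
import math
import random

# Asymmetric belief dynamics: dependencies push belief forward with
# strength beta_dep; implications push it back abductively with beta_imp.
random.seed(0)
edges = [(0, 1), (1, 2)]   # (i, j): claim j cites claim i as a dependency
n, beta_dep, beta_imp, p_prior = 3, 2.0, 1.0, 0.6

def field(t, k):
    """Log-odds influence on claim k from its neighbours plus the weak prior."""
    h = math.log(p_prior / (1 - p_prior))
    for i, j in edges:
        if j == k:          # k depends on i: deductive support
            h += beta_dep * (2 * t[i] - 1)
        if i == k:          # j is derived from k: abductive support
            h += beta_imp * (2 * t[j] - 1)
    return h

t = [random.random() < p_prior for _ in range(n)]
steps, counts = 20000, [0] * n
for _ in range(steps):
    k = random.randrange(n)
    t[k] = random.random() < 1 / (1 + math.exp(-field(t, k)))
    for i in range(n):
        counts[i] += t[i]

belief = [c / steps for c in counts]  # time-averaged degree of belief per claim
```

With positive couplings and a weakly charitable prior, the time-averaged beliefs settle well above 0.5; weakening β_dep or the prior reduces this effect.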
(It is possible to consider scaling β_dep with n, the number of dependencies, to capture the idea that a claim will cite only those things necessary for the proof; because the distribution of dependencies is not very wide, however, this amounts in practice to a simple rescaling.)

Finally, firewall strength, ΔL, is defined as

$$\Delta L = \frac{1}{|M|}\left(\sum_{i \in M} \Delta E_i - \langle \Delta E_r \rangle\right), \qquad (2)$$

where M is the set of all nodes with assigned modules, ΔE_i is the change in energy (log-likelihood) when all nodes in module i are flipped to the opposite state, and ⟨ΔE_r⟩ is the expectation value of the change in energy when the same number of nodes is chosen randomly from all nodes with module assignments. These are computed from simulations with the prior set to 0.5, which leads to modules freezing into opposite states; however, the qualitative results are invariant to tilting the prior in favor of truth. We normalize this quantity by the total number of nodes in the module to get the per-node penalty; this enables us to compare firewall strengths across graphs with different numbers of nodes.
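Eq. 2 can be illustrated on a toy two-module graph by comparing the energy cost of flipping an entire module against flipping the same number of randomly chosen nodes. The graph, couplings, and module labels here are illustrative assumptions, not derived from any proof network in the paper.

```python
import random

# Firewall-strength sketch: flip a whole module and compare the energy
# change with flipping the same number of randomly chosen nodes.
random.seed(1)
edges = [(0, 1), (1, 2), (3, 4), (4, 5), (2, 3)]  # two modules, one bridge
modules = [[0, 1, 2], [3, 4, 5]]

def energy(t):
    # E = -sum over evidentiary links of t_i * t_j (lower = more coherent)
    return -sum(t[i] * t[j] for i, j in edges)

def delta_E(t, nodes):
    """Energy change when the given nodes are flipped to the opposite state."""
    flipped = list(t)
    for k in nodes:
        flipped[k] = 1 - flipped[k]
    return energy(flipped) - energy(t)

t = [1] * 6  # a frozen, all-true state
module_cost = sum(delta_E(t, m) for m in modules) / len(modules)
random_cost = sum(delta_E(t, random.sample(range(6), 3))
                  for _ in range(2000)) / 2000
firewall = (module_cost - random_cost) / 3  # per-node, as in the text
```

Because the module boundaries cut only the single bridge edge, a module is cheaper to flip than a random set of the same size, and the per-node difference quantifies the firewall: errors can flip a whole module without paying the cost of severing links at random.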
We thank Jeremy Avigad, Kevin Zollman, Scott Aaronson, and Cait Lamberton for helpful discussions, andKent Chang for assistance with data entry.
References

[1] George Lakoff and Rafael E. Núñez. Where Mathematics Comes From. Basic Books, NY, NY, USA, 2000.
[2] Robert L. Goldstone, Tyler Marghetis, Erik Weitnauer, Erin R. Ottmar, and David Landy. Adapting perception, action, and technology for mathematical reasoning. Current Directions in Psychological Science, 26(5):434–441, 2017.
[3] Marie Amalric and Stanislas Dehaene. Origins of the brain networks for advanced mathematics in expert mathematicians. Proceedings of the National Academy of Sciences, 113(18):4909–4917, 2016.
[4] Hugo Mercier and Dan Sperber. The Enigma of Reason. Harvard University Press, Cambridge, MA, 2017.
[5] Alan Mathison Turing. On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, 2(1):230–265, 1937.
[6] David Hume. A Treatise of Human Nature. John Noon, Cheapside, London, UK, 1738. Book 1, Part 4, Section 1, “Of Skepticism with Regard to Reason”.
[7] Jeremy Avigad. Reliability of mathematical inference. Synthese, pages 1–23, 2020.
[8] Henri Poincaré. Mathematical creation. The Monist, 20(3):321–335, 1910.
[9] G. Polya. Mathematics and Plausible Reasoning: Induction and Analogy in Mathematics, volume 2. Princeton University Press, Princeton, NJ, 1954.
[10] Atocha Aliseda Llera. Seeking Explanations: Abduction in Logic, Philosophy of Science and Artificial Intelligence. PhD thesis, Stanford University, 1997.
[11] I. Lakatos. Proofs and Refutations: The Logic of Mathematical Discovery. Cambridge University Press, Cambridge, UK, 2015. Edited by J. Worrall and E. Zahar.
[12] W. P. Thurston. On proof and progress in mathematics. In T. Tymoczko, editor, New Directions in the Philosophy of Mathematics: An Anthology, pages 337–355. Princeton University Press, Princeton, NJ, USA, 1998.
[13] H. Eugene Stanley. Phase Transitions and Critical Phenomena. Oxford University Press, Oxford, UK, 1971.
[14] Terence Tao. On “local” and “global” errors in mathematical papers, and how to detect them, 2012. Available at http://bit.ly/robust_dag; last accessed 21 August 2019.
[15] John von Neumann. Probabilistic logics and the synthesis of reliable organisms from unreliable components. Automata Studies, 34:43–98, 1956.
[16] Penelope Maddy. Believing the axioms. I. The Journal of Symbolic Logic, 53(2):481–511, 1988.
[17] David Corfield. Towards a Philosophy of Real Mathematics. Cambridge University Press, Cambridge, UK, 2003. Chapter 4.
[18] Alfred North Whitehead and Bertrand Russell. Principia Mathematica, volume 1. Cambridge University Press, Cambridge, UK, 1910.
[19] C. Villani. Birth of a Theorem: A Mathematical Adventure. Farrar, Straus and Giroux, New York, NY, 2012.
[20] C. S. Peirce. Collected Papers of Charles Sanders Peirce, volume 7. Harvard University Press, Cambridge, MA, 1935. Edited by C. Hartshorne, P. Weiss, and A. W. Burks. Paragraph 218.
[21] T. A. Sebeok. One, two, three spells U B E R T Y. In U. Eco and T. A. Sebeok, editors, The Sign of Three: Dupin, Holmes, Peirce, pages 1–10. Indiana University Press, Bloomington, IN, USA, 1988.
[22] Fernando Zalamea. Synthetic Philosophy of Contemporary Mathematics. Urbanomic, Falmouth, UK, 2012. Translated by Lucca Fraiser.
[23] Paul Thagard. Explanatory coherence. Behavioral and Brain Sciences, 12(3):435–467, 1989.
[24] Frank C. Keil. Explanation and understanding. Annual Review of Psychology, 57(1):227–254, 2006.
[25] A. Glöckner and T. Betsch. Modeling option and strategy choices with connectionist networks: Towards an integrative model of automatic and deliberate decision making. Judgment and Decision Making, 3:215–228, 2008.
[26] Seth Rosenberg, David Hammer, and Jessica Phelan. Multiple epistemological coherences in an eighth-grade discussion of the rock cycle. The Journal of the Learning Sciences, 15(2):261–292, 2006.
[27] William Bialek, Andrea Cavagna, Irene Giardina, Thierry Mora, Edmondo Silvestri, Massimiliano Viale, and Aleksandra M. Walczak. Statistical mechanics for natural flocks of birds. Proceedings of the National Academy of Sciences, 109(13):4786–4791, 2012.
[28] Gašper Tkačik, Olivier Marre, Thierry Mora, Dario Amodei, Michael J. Berry II, and William Bialek. The simplest maximum entropy model for collective behavior in a neural network. Journal of Statistical Mechanics: Theory and Experiment, 2013(03):P03011, 2013.
[29] Paul Thagard and Karsten Verbeurgt. Coherence as constraint satisfaction. Cognitive Science, 22(1):1–24, 1998.
[30] Yuval Gefen, Amnon Aharony, Yonathan Shapir, and Benoit B. Mandelbrot. Phase transitions on fractals. II. Sierpinski gaskets. Journal of Physics A: Mathematical and General, 17(2):435, 1984.
[31] Simon DeDeo and David C. Krakauer. Dynamics and processing in finite self-similar networks. Journal of the Royal Society Interface, 9(74):2131–2144, 2012.
[32] Jeremy Avigad. Modularity in mathematics. The Review of Symbolic Logic, pages 1–33, 2018.
[33] Mark E. J. Newman and Michelle Girvan. Finding and evaluating community structure in networks. Physical Review E, 69(2):026113, 2004.
[34] Abhishek Anand, Simon Boulier, Cyril Cohen, Matthieu Sozeau, and Nicolas Tabareau. Towards certified meta-programming with typed Template-Coq. In Jeremy Avigad and Assia Mahboubi, editors, Interactive Theorem Proving, pages 20–39, Cham, 2018. Springer International Publishing.
[35] Andrew Wiles. Modular elliptic curves and Fermat’s Last Theorem. Annals of Mathematics, 141(3):443–551, 1995.
[36] Aaron Clauset, Cosma Rohilla Shalizi, and Mark E. J. Newman. Power-law distributions in empirical data. SIAM Review, 51(4):661–703, 2009.
[37] Pavel L. Krapivsky and Sidney Redner. Network growth by copying. Physical Review E, 71(3):036118, 2005.
[38] P. L. Krapivsky, S. Redner, and F. Leyvraz. Connectivity of growing random networks. Phys. Rev. Lett., 85:4629–4632, Nov 2000.
[39] R. Solé and S. Valverde. Evolving complexity: how tinkering shapes cells, software and ecological networks. arXiv, 1907.05528, 2019.
[40] Scott Aaronson, 2020. Personal communication by e-mail, 19 January.
[41] R. N. Bellah. Religion in Human Evolution: From the Paleolithic to the Axial Age. Harvard University Press, Cambridge, MA, 2011.
[42] Edwin T. Jaynes. Information theory and statistical mechanics. Physical Review, 106(4):620, 1957.
[43] Serge Galam and Bernard Walliser. Ising model versus normal form game.