1 Introduction

It is commonplace to observe that mathematics is the realm of knowledge distinguished by a clear agreement on right and wrong answers. Historians and philosophers know, of course, that this image should be qualified: right or wrong answers to mathematical questions may change over time, and one’s philosophical stance may affect what one considers mathematically right or wrong. Nevertheless, there is something about mathematics that is more amenable to consensus than other realms of knowledge. The purpose of this paper is to single out what it is that is so consensual in contemporary academic mathematics, relativize this consensus historically, and try to explain how it came about.

I find this task important, because the unqualified and monolithic view of mathematics as a realm of absolute consensus supports two very problematic public images of mathematics. Inspired by Paul Ernest’s ongoing discussion of overvaluation of mathematics (e.g. 2020), I note two opposite trends. One is the overvaluation of mathematics, which portrays it as the ideal form of knowledge, as a model for all other forms of knowledge, and as something that improves all kinds of knowledge that it touches. According to this view, ‘mathematized’ is always better. The opposite, undervaluation of mathematics, views it as meaningless and inhuman drudgery, an empty mechanical game with symbols, which sucks the soul out of everything it interferes with. Both these images are supported by the unqualified view that mathematics is exceptionally consensual. In the context of overvaluation, consensus proves that it is the best kind of knowledge. In the context of undervaluation, it proves that it is a mindless following of rules.

In order to help counter both these trends, a realistic, sensitive and qualified story needs to be told about mathematical consensus. However, I warn the reader in advance that the story that follows is not a rigorous analysis, but rather an outline of a research program. Every aspect of the story that I tell below can and should be challenged until it is verified, refuted or refined by more careful anthropological, historical, sociological and philosophical work. I hope to draw the attention of the research community to this research program in order to encourage collaborative work around it.

The paper is constructed as follows. In Sect. 2, I try to pinpoint what it is exactly that mathematicians agree on: the validity of mathematical proofs. In Sect. 3, I provide a model for how this consensus is achieved, and, by means of some examples, I outline the limitations of this consensus. In Sects. 4 and 5, I try to relativize this consensus historically, and show that it in fact consolidated only around the turn of the 20th century. In Sect. 6, I try to explain the emergence of consensus by means of the formalization of mathematics that took place around the same time, while acknowledging the difficulties of such an explanation. Finally, in the last section, I conclude the argument, evaluate the weaknesses of my evidence, and call for a more rigorous inter-disciplinary analysis.

2 What do mathematicians agree on?

There are many things that mathematicians do not agree on, or at least do not agree on more than their colleagues in other disciplines, even if we restrict our attention, as we do here, to professional research mathematicians. Beauty, originality, importance and other measures of quality are not, I believe, more consensual among mathematicians than among other scientists. In fact, and contrary to common perceptions, even truth is not subject to consensus among mathematicians. Indeed, mathematicians have different conceptions of mathematical truth, even if these conceptions are not always as well-articulated as those developed by logicians and philosophers. A common example is disagreement about the truth of the axiom of choice. Mathematicians may even disagree whether the very question of the truth of an axiom makes any sense.

However, there is something that mathematicians do agree on in an exceptional manner – and from this point on I am focusing on pure mathematics, as the situation of applied mathematics may be somewhat different. They tend to agree much more than experts in other domains on whether a given argument or proof is valid. As we will shortly demonstrate, this consensus is not absolute and is not to be taken for granted. There are conditions of possibility that must be met for it to hold. This consensus is, nonetheless, exceptional, when compared to the agreement over the validity of arguments in other disciplines. Wiles’ proof of Fermat’s last theorem is a case in point. Within the space of two years (1993–1995) the proof was presented, challenged and fixed. In any other area of science, such a deep and broad theoretical intervention would require many years of processing and deliberation to become established, and even then would be likely to be continually challenged and revised.

This kind of consensus holds even among mathematicians with different philosophical or foundational stances. For example, mathematicians with sufficient training in logic and foundations would be very likely to agree on the validity of proofs within different mathematical frameworks (e.g. classical and intuitionistic), regardless of their beliefs concerning these frameworks. An intuitionist may find classical mathematics pointless or meaningless, but would still be able to agree with his colleagues whether, within a classical perspective (e.g. first order ZFC), a given proof is valid or a given exam answer is correct – at least within the limits discussed below. While mainstream mathematicians are usually not proficient in identifying whether a proof is valid in a given formalized intuitionist framework, they would know how to pick up such a skill, in the same way as they would if they wanted to venture into a new area of mathematics. It is this specific kind of consensus that I wish to discuss in this paper.

Two further clarifications are in order for the kind of consensus or disagreement that I am concerned with here. First, I am not interested in cases where a single mathematician endorses a proof which is rejected by the entire community. The supporters or detractors of an argument may be few, but not completely isolated or excluded from the wider community, for the disagreement to be of interest for this paper.

Second, consensus about the validity of a proof is not considered here as eternal. There are many cases (including contemporary ones) where a longstanding proof was challenged after many years of acceptance. However, once such a proof is actively and publicly challenged, mathematicians tend to reach a new consensus about the validity of the proof faster and more robustly than in other disciplines.

An important empirical evaluation of mathematical consensus about the validity of proofs was discussed in Weber (2008), Inglis and Alcock (2012) and Inglis et al. (2013). In the experiments reported in these papers, where mathematicians were asked to evaluate the validity of a proof, the level of consensus was less than 75%. These results and their analysis raise, explicitly or implicitly, some of the conditions required for consensus regarding the validity of proofs. First, to reach consensus, the validators of the proofs need to be experts in the relevant field. Second, the consensus depends on the context (whether the argument belongs in a journal, textbook, classroom discussion…). Finally, and most importantly, consensus is not a spontaneous occurrence, but depends on exchange between the various validators, which, in turn, requires time, patience, trust and good will. We turn now to an analysis of this process.

3 How do mathematicians reach their consensus?

It is only via an intersubjective exchange that mathematicians can reach high levels of consensus regarding the validity of proofs. But we still need to figure out how they reach it. To answer this question, I propose the following model.

Mathematician P (prover) suggests a proof. Mathematician C (critic) challenges it (for an elaborate review of dialogical approaches to logical deduction see Dutilh Novaes 2021). The challenge is of the form “I don’t quite see how you justify this statement in the proof”. The answer may draw on a large variety of tools: known theorems, analogies to known similar cases, explication of implicit steps, reference to a diagram, etc. (see Avigad 2020, Sect. 4, for a more detailed discussion). After a while, C may be persuaded that the proof is valid, P may be convinced that it is not, or C will require a re-write of some aspects of the proof. Several iterations may follow, at the end of which, usually (but not always), the prover and critic will reach an agreement concerning the validity of the proof.

The story, however, need not end there. Suppose P and C agree that the proof is valid. P will triumphantly exclaim: “finally, you agree that my proof is valid”, whereas C may disdainfully retort: “no, your proof is still wrong; it is the new proof, which was devised through this exchange, that is valid”. What we conclude from this mock exchange is that the questions of the validity of a proof and of the individuation of a proof may intertwine. Agreeing on whether a proof is valid may depend on whether the proof is perceived as the same as or as different from another proof.

The above situation is not just idle armchair speculation. The limits of consensus concerning the validity of proofs and its underlying conditions can be gleaned from real-life examples. I will provide below a few recent examples of the problem of individuation of an argument, the problem of disagreement about the validity of new arguments, and examples that combine both aspects.

First, the case of category theorist Olivia Caramello demonstrates the problem of individuating a new proof with respect to previous arguments and knowledge, without raising the issue of validity. Caramello was criticized by some of her colleagues for taking credit for theorems that were, according to the critics, essentially well-known in the community. She challenged her critics by asking “can you give a precise reference of a theorem … that I attributed to myself when … it was proved by someone else…?”. Peter Johnstone, her former PhD supervisor, answered: “originality is almost never a simple matter. The point at which something emerges from the fog of ‘folk-knowledge’ and becomes a clear-cut theorem is very hard to pin down” (Caramello, n.d. 2010; for a normative discussion of the case, see Rittberg et al. 2020).

Next, for a case of an unsettled disagreement about the validity of a proof, which is not entangled with the question of its individuation, the following quotation is exemplary: “It had been anticipated that O. Gabber would be a co-author of the present paper. He preferred to withdraw, so as not to be co-responsible for the errors and inaccuracies in it. He is no less responsible for many of the ideas that we exploit…” (Beilinson et al. 1982, 7, my translation; the second edition, from 2018, with errata and addenda, does include Gabber as a co-author). In terms of our negotiation model above, this is an example of a curtailed exchange, which left a co-author unable to endorse the book in its entirety at the time of its first publication.

Perhaps the most famous contemporary disagreement about the validity of a proof (again, not entangled with the problem of individuation), is Shinichi Mochizuki’s purported proof from 2012 of the abc conjecture. The majority of the mathematical community finds the proof impenetrable or outright invalid, but a circle of supporters is still convinced. In 2018, the efforts of two critics, Peter Scholze and Jakob Stix, culminated in pointing out a corollary whose proof doesn’t work. Mochizuki, however, claims that they simply misunderstand the corollary (Klarreich 2018). As one commentator put it, Scholze and Stix “appeal to ‘certain radical simplifications’ that seem to get the heart of the matter, but they are also aware that ‘such simplifications [might] strip away all the interesting mathematics that forms the core of Mochizuki’s proof’. … It is this that Mochizuki condemns as illicit, and in his own support, he offers a number of examples that, he claims, lead to incorrect results if so treated. But Mochizuki, in defending himself, again uses some idiosyncratic definitions for common constructions in category theory, while still using standard terminology.” (Roberts 2019). This case is an example of the negotiation breaking down at the point where one would usually expect the author, or one of his supporters, to re-write the proof in a more accessible language, rather than just support it with auxiliary explanations, as Mochizuki did. It also shows how tentative and fragile the relevant discursive norms are. Mochizuki’s purpose is to introduce a new and superior mathematical language. Would a translation to the old language not defeat Mochizuki’s purpose? And are critics even entitled to expect the author to adapt to them, rather than adapt themselves to an acknowledged author with such an acclaimed track record as Mochizuki’s?

Finally, in several examples, the problems of consensus and of the individuation of proofs are intertwined. One example is the proof of Arnold’s conjecture in symplectic geometry, published by Kenji Fukaya and Kaoru Ono in 1996. The difficult proof was challenged in a lengthy online discussion in 2012. At the end of this discussion, Dusa McDuff and Katrin Wehrheim published a revised proof. For Fukaya, “the papers [that McDuff and Wehrheim] wrote do not contain new and significant ideas. There is of course some difference … However, the difference is only on a minor technicality”. Helmut Hofer, a veteran expert in the area, had a different opinion: “Overall, [Fukaya’s approach] worked, but … as a result of this discussion … there were a few hundred pages produced explaining the original results. So there was definitely a need for the explanation” (Hartnett 2017). Here we see how the question of whether the original proof was valid depends on whether the original and new proofs are different or the same.

A similar, more famous example, where opinions span a similar divide, is Perelman’s solution of the Poincaré conjecture from 2002, challenged by Xi-Ping Zhu and Huai-Dong Cao in 2006. Another example is Givental’s proof of the mirror conjecture in 1996, challenged by Bong Lian, Kefeng Liu and Shing-Tung Yau in 1997 (both examples are covered in Nasar and Gruber 2006).

Note that in almost all the examples I mentioned above, the dispute is not between some faceless P and C, but between Euro-American mathematicians and Asian mathematicians, or between men and women. I cannot say if this anecdotal evidence captures a statistical trend (i.e. high profile disagreements are more likely to emerge across social divides than between members of the same social group), but the sociological dimension should not be overlooked when discussing disagreements in mathematics. Indeed, when disputes are overdetermined by social divides and disputants insist on challenging power differentials, the ensuing discussions may attract more community engagement and visibility, rather than end up successfully suppressed.

4 A historical view – before early modern European mathematics

If we were to look for disputes about the validity of proofs in other disciplines, from physics to the humanities, our sample would easily be much larger and much more dramatic. This testifies to the fact that even within the limits of contemporary mathematical consensus, mathematics is still much more consensual than other sciences where the validity of arguments is concerned. The next question, then, is whether this consensus is a trans-historical phenomenon, or something that should be considered a modern innovation.

The first thing to note is that the questions cannot even be formulated in the terms stated above trans-historically. Indeed, some pre-modern mathematical cultures, such as the Chinese and Indian cultures, were not centered around proofs. This does not mean that these cultures did not have proofs – that would be blatantly false (see e.g. Chemla 2010, Srinivas 2005). However, these mathematical cultures did not focus on proofs, and therefore framing the question of consensus in these cultures around the validity of proofs would be anachronistic.

So let us turn to the first historical mathematical culture that is centered around proofs: classical Greek geometry. Let me note first that I restrict the discussion only to agreement among practicing mathematical writers, ignoring critiques from philosophers (although the line between the two is not quite clear outside the Hellenistic mathematical peak in the generation following Archimedes, as explained in Netz 2022). Second, we focus only on geometry. Indeed, number theory has a somewhat different character, and is often less proof-centric, and practical mathematics (or logistics) is hardly concerned with proofs at all, except when discussed by a handful of syncretic authors, such as Hero of Alexandria (see, e.g., Asper 2008).

The situation concerning the validity of arguments among mathematical writers in Greek geometry is not straightforward. The following two quotations from Reviel Netz appear to be contradictory. First, he writes that “There are considerable areas where mathematics simply does not allow this kind of polemical exchange [typical of philosophers]” (Netz 1999, 310). But he also writes, a few years later, that “the space of communication [between mathematicians] is an arena for confrontation, rather than for solidarity. The relation envisaged between works is that of polemic” (Netz 2004, 62).

How can we square this apparent contradiction? There are two ways around it, which depend on two different interpretations developed over the course of Netz’ career. First, following the argument of Netz (1999), we find that the practice of classical Greek geometry is highly regulated. It depends on a very rigid language with limited vocabulary combined into carefully repeated phrases (‘formulas’ in Netz’ terms, but not to be confused with symbolic formulas). In addition, Greek geometric practice is further regulated by a highly disciplined use of diagrams. This is enough to guarantee that around a certain core (more or less co-extensive with Euclid’s Elements and perhaps the theory of conic sections) no internal disagreements arise concerning the validity of proofs. However, when this core is exceeded in favor of innovative mathematical objects and methods, which go beyond the existing rigid language and diagrammatic practices, disagreements readily emerge.

Second, following the interpretation of Netz (2020), we observe that the Greek cultural trend of literary canonization turned Euclid’s Elements into a treatise that defines geometry to such an extent that the very notion of mathematical validity was reconstructed according to Euclid’s standards. Within mathematical circles, Euclid’s proofs could not be refuted almost ‘by definition’, as Euclid’s work was the very model of valid proofs. But outside the scope of arguments that could be validated by extrapolation from Euclidean practices, polemic and disagreement reigned.

Regardless of how we interpret Netz’ characterizations of the two strata of Greek mathematics, we should note that singling out validity as a bone of contention among Greek mathematicians is a difficult task. Mathematical polemics revolve around mathematical virtues that span anything from what we would call ‘style’ to failed deductive reasoning (as the examples in the first part of Netz 2004 show). Focusing on the validity of arguments in a stricter sense, then, might be somewhat anachronistic, although sometimes analytically extractable from Greek debates. I note that the situation in Renaissance and early modern European mathematics – sometimes also in later European mathematics – is similar in that respect (e.g. the Tartaglia-Cardano debate, as documented in Acampora 2000, which involved many disputes about mathematics and not just precedence; the Wallis-Hobbes debate, presented in Jesseph 1999; and debates in the 19th century French Academy of Sciences, analyzed by Ehrhardt 2010; 2011).

In order to understand how disagreement works outside the core of Euclidean geometry, consider the attempts to prove the parallel postulate, which start in Hellenistic times, continue through the Arabic middle ages, and persist in early modern and modern Europe (see Rosenfeld 1988, Ch. 2, for a survey). The standard historiography is that all these proofs are wrong, because they all rely on implicit axioms or spatial intuitions. This historiography, however, is highly problematic. Indeed, all Euclidean proofs depend on implicit axioms and spatial intuitions!

The Euclidean axiomatization is not, and never pretended to be, exhaustive in the sense of modern formal-axiomatic systems. As mathematical authors observed since antiquity, one could, and perhaps should, add many axioms to Euclid’s list (De Risi 2016). Moreover, Euclidean reasoning depends on some well-regimented observations of diagrams. So declaring a proof invalid simply because it involves implicit assumptions or spatial intuitions is anachronistic. It would be more reasonable to argue that the assumptions or intuitions required for proving the fifth postulate go beyond those used in Euclid’s Elements (although this too is problematic, as some similar moves are arguably implicit in Euclid’s work).

Now, drawing the line of validity around the assumptions and intuitions reconstructed as implicitly used in the Elements is rather arbitrary – and yet this arbitrary course was the course taken by many mathematicians, and one which continued to dominate up to the revision of the meaning of axioms culminating in Hilbert’s program. However, being arbitrary, this line left enough room for discord. Again, the problem of individuation arises: is this implicit assumption the same as or different from that implicit assumption in Euclid’s work? Or should we allow it based on some analogy? Or perhaps anyone who acknowledges this Euclidean assumption must accept that assumption as well? And can we even agree on what the assumptions implicit in Euclid’s work are? Or should we justify our assumptions independently of Euclid? Given these ambiguities, it is easy to understand how post-Euclidean disputes can emerge.

I have not discussed medieval Arabic mathematics here, but Berggren’s account of Arabic mathematics provides us with good guidelines for how to think more generally about proof, rigor and consensus in pre-modern mathematical cultures. In fact, this characterization should also be kept in mind when we turn, next, to the question of consensus in early modern and modern Europe.

“In the instances of root extraction, the binomial theorem, and in the numerical methods of the astronomical handbooks, we have seen mathematics flourishing in the form of methods and techniques rather than that of theorems and proofs. We have seen in the search for a proper construction of the heptagon, in the attempts to recast the theory of proportion, and in the controversy over the volume of a paraboloid evidence that satisfactory proofs arise from an historical process, where rigour is achieved not by a stroke of genius but by a dialogue of mathematicians with each other and with their forebears. Finally we have seen in the example of the value of π a controversy not between rigour and intuition but a conflict in which both sides relied on arguments neither of which were rigorous but both of which were, in different ways, convincing” (Berggren 1990, 47).

5 A historical view – early modern and modern European mathematics

Early modern and modern European mathematics, well into the 19th century and even the early 20th century, is still an arena for disputes concerning the validity of arguments. The best known areas of debate are, of course, infinitesimal methods (see, for example, the classical Boyer 1959, as well as Schubring 2005, which integrates the story of infinitesimals with that of negative numbers) and infinite series (e.g. Ferraro 2008; his narrative ends in 1820, but the disputes live well into the 20th century with people like Heaviside). These disputes are very well known, so I will not get into details, only qualify them for the context of this discussion.

First, I believe it is wrong to equate these historical disputes with the foundational disputes between, say, classicists and intuitionists or constructivists today. Indeed, as we observed above, mathematicians trained in foundations are very likely to agree on the validity of arguments within each given framework, even if they think that the given framework is pointless or even false in some sense. This is not quite the situation in infinitesimal calculus, even in the 19th century. While each of the different approaches to calculus had its ‘danger zones’, where it was known to be at risk of invoking contradictions, it was never consensually clear which inferences should or should not be allowed, even within a specific approach. The doubts and ambiguities in the formulation of some of the main approaches, such as Cavalieri’s indivisibles, Newton’s fluxions, Leibniz’s differentials, and the multifarious tradition of limits, testify to these tensions (see e.g. Mancosu 1999; Schubring 2008).

Where arguments could be translated to the mathematical gold standard, namely the Eudoxian-Euclidean method of exhaustion and contradiction, mathematicians had an external control on the validity of results. However, it was never perfectly clear how to translate this gold standard into clear articulations of the validity of arguments within any of the approaches to the new calculus, without substantially reducing their productivity. Indeed, while infinitesimal approximations of finite order could be translated to classical geometric arguments, once infinite series are factored in, the translation tends to break down, especially as questions of convergence can often be set aside without necessarily violating the truth of the results (where truth is evaluated either in terms of calculations or of alternative Euclidean arguments).

It would also be wrong to qualify the approach to infinitesimal methods as heuristics or shortcuts that should be validated by other means. While some authors explicitly framed the new infinitesimal methods in precisely this way, others were quite explicit in claiming that infinitesimal arguments are valid in themselves, independently of Euclidean validation (e.g. Bella 2018, 156–158 for Wallis, 164–165 for Barrow). In any case, as mentioned above, once infinite series become an inseparable part of the arsenal of the new calculus, the ‘heuristic’ or ‘shortcut’ justifications – which do not quite disappear – degenerate into an untenable alibi.

We note further that a consensus regarding the validity of arguments was not only unattainable in ‘real time’, by the indigenous practitioners of these mathematical cultures, but sometimes remains unattainable even in hindsight. A case in point is Cauchy’s proof that a convergent sum of a series of continuous functions is itself continuous. Scholars still argue whether, in Cauchy’s own terms (rather than contemporary, anachronistic terms), this proof should be considered valid or not (see Schubring 2008, sections VI.6.5-6). There isn’t now, as there was not at the time, a clear consensus on what Cauchy’s own terms are (I endorse Schubring’s interpretation that they were a sort of compromise), and which inferences should be considered as valid according to these terms.
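The difficulty can be made concrete with a standard illustration. What follows is only a sketch in modern notation, which is itself anachronistic with respect to Cauchy’s own terms; the symbols s and f_n are chosen here for illustration and do not come from the sources cited above. The second formula is the trigonometric series that Abel pointed to in 1826 as an “exception” to the theorem.

```latex
% The theorem at issue, paraphrased in modern (anachronistic) notation:
% if every f_n is continuous and the series converges, then s is continuous.
\[ s(x) = \sum_{n=1}^{\infty} f_n(x) \]

% Abel's "exception": every term is continuous, and for -\pi < x < \pi
\[ \sin x - \tfrac{1}{2}\sin 2x + \tfrac{1}{3}\sin 3x - \cdots = \tfrac{x}{2}, \]
% while every term vanishes at x = \pi, so the sum is 0 there
% and the limit function has a jump at that point.
```

Whether this series counts against Cauchy’s proof depends on how his notion of convergence is read, which is precisely the interpretive question that remains open.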

But infinities and infinitesimals were not the only sources of disagreement about the validity of proofs. Another area of dispute is that which has become known in the literature under the title purity of method. Here the question is, for example, whether a proof of an arithmetical statement can legitimately involve geometric arguments and vice versa (this question extends, of course, to other mathematical domains as well). Perhaps the most famous example here is Bolzano’s proof of the intermediate value theorem without relying on geometric intuition, but he was far from unique in this respect (see further examples in Detlefsen 2008, Sect. 7.2; Detlefsen and Arana 2011, Sect. 2; Ferraro and Panza 2012 for the case of Lagrange; Wagner 2016 for the case of Wronski). Here, again, one may suggest that such disputes are akin to disputes between different philosophical approaches to mathematics, whereas within each approach the validity of an argument would be consensual. This, however, assumes that there is a consensus regarding the limits and scope of each mathematical domain and which concepts or methods lie ‘purely’ within each domain – a consensus that was lacking then as it is now.

Another area of dispute revolved around algebraic generality. First, the meaning of ‘general’ is not as straightforward as it might seem today. For example, right after arguing for the general expressibility of functions in terms of power series, Lagrange writes: “This demonstration is general and rigorous as long as x and i remain variable [indéterminées], but it ceases to be so, if one assigns x determinate values [valeurs déterminées]” (Lagrange 1797, 8). There are two ways to understand this statement. The more straightforward one is to read ‘general’ as what we would call today generic (see, for example, Hawkins 1977, 121–126). The second is more oblique, but, I believe, no less relevant. Following Ferraro (2004, 36), we observe that the notion of a variable in the 18th century was not strictly reducible to the values it could take. In a certain sense, it was an entity that was different from its values, and what applies to the variable, therefore, need not apply to all its values (the interpretation of Viète in Hawkins 1977, 122 also supports this view). This gray area between algebraic terms and their values made consensus difficult to attain (for a contemporary take on the tension between formal and set-theoretic variables, see Wagner 2009).
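To see what is at stake in the roles of x and i, the expansion that Lagrange has in mind can be sketched as follows. This is a hedged illustration in modernized notation; the coefficient names p, q, r follow common presentations of Lagrange’s text, and the square root example is chosen here for illustration rather than taken from the passage quoted above.

```latex
% Lagrange's general expansion of a function in powers of the increment i:
\[ f(x+i) = f(x) + p\,i + q\,i^{2} + r\,i^{3} + \cdots \]

% For f(x) = \sqrt{x} and x left indeterminate, the expansion has this form:
\[ \sqrt{x+i} = \sqrt{x} + \frac{i}{2\sqrt{x}} - \frac{i^{2}}{8\,x^{3/2}} + \cdots \]

% At the determinate value x = 0 the coefficients are undefined, and
\[ \sqrt{0+i} = i^{1/2} \]
% is not a series in whole powers of i.
```

On the ‘generic’ reading, the demonstration covers the first two lines and simply excludes the third; on the second reading, the variable x in the first two lines is not to be identified with any of its values in the first place.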

This tension manifested itself also in what we would consider today as purely finitary-algebraic contexts (rather than infinitary-analytic contexts, as in the case of Lagrange). For example, the problem of algebraic general representation led to difficulties around proofs of the solubility or insolubility of quintic equations. Indeed, Peacock writes about Abel’s proof of the insolubility of the quintic that “though some parts of it are obscure and not perfectly conclusive, yet it is, perhaps, as satisfactory, upon the whole, as the nature of the subject will allow us to expect” (Peacock 1834, 311, quoted in Suhr 2021, 16; note that Peacock is concerned here with a part of the proof which is more or less generally accepted today, as opposed to another part of Abel’s argument, which was criticized by Hamilton; see Suhr 2021, Sect. 3, for details). The discussions around proposed solutions of the quintic by means of radicals and proofs of the impossibility of such solutions testify to the difficulties of reaching consensus regarding the generality of algebraic expressions and their relation to singular cases (Suhr 2021, especially 30–31).

Finally, and in a more familiar vein, another source of disagreement concerning the validity of arguments was the issue of conceptual scope. I will note here only two classical examples. The first is the vibrating string (Youschkevitch 1976, Sect. 9; Bottazzini 1986, Ch. 1), where the scope of what would constitute a relevant description of the initial position of a string (a single analytic expression, any continuous – though not necessarily differentiable – shape, or an infinite sum of trigonometric functions) instigated a debate around the validity of proposed solutions. Note that the dispute is not (only) about empirical adequacy, but also about what constitutes a general description of a mathematical curve and whether a non-differentiable function can serve as an initial condition to a differential equation.

The second example is the famous discussion of the Euler characteristic theorem for polyhedra, and the questions of which objects should count as polyhedra and how one should count a polyhedron’s faces and sides (Lakatos 1976). The validity of arguments depends, of course, on such articulations. Note that a simple pluralist position (all definitions are equally fine, one should evaluate each argument for each definition separately) is anachronistic. Indeed, which objects legitimately constitute a polyhedron, face and side was a geometric, topological and algebraic question that required internal justification, not simply unequivocal definitions (I refer especially to Lakatos’ discussion on pp. 16–18 and the footnotes therein). The definition of polyhedra was viewed as an expression of an underlying essence, and therefore the validity of the argument could not be decoupled from the quest for the ‘right’ definition.
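A short worked instance may help show why the count depends on what is admitted as a polyhedron. Writing V, E and F for the numbers of vertices, edges (the ‘sides’ above) and faces, the cube satisfies the formula, whereas the ‘picture frame’ solid discussed by Lakatos does not. The layout below is only an illustrative sketch.

```latex
\begin{align*}
% Euler's formula for the polyhedra it was originally meant to cover
V - E + F &= 2 \\
% Cube: 8 vertices, 12 edges, 6 faces
8 - 12 + 6 &= 2 \\
% Picture frame (a solid with a tunnel through it): 16 vertices, 32 edges, 16 faces
16 - 32 + 16 &= 0
\end{align*}
```

Whether the third line refutes the theorem or merely shows that the picture frame is not a ‘genuine’ polyhedron is exactly the kind of question that could not be settled by stipulating a definition.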

This, of course, is not an exhaustive view of the grounds for and kinds of disagreements about validity in early modern and modern mathematics. My purpose is only to convince the reader that disagreement concerning the validity of proofs was more deeply entrenched in that mathematical culture than we would expect from our acquaintance with contemporary mathematics. Goldstein (2013) argues that, at least in the context of Mersenne’s 17th century correspondence circle, such controversies were part and parcel of mathematics as a “normal science”, rather than something that nibbled at it from the margins. In fact, Berggren’s quotation from the end of the previous section is not out of place for early modern and modern European mathematical culture, and the separation between this section and the preceding one should now feel at least a little artificial.

6 Explaining contemporary consensus

If the contemporary consensus about the validity of mathematical proofs is indeed an innovation compared to previous mathematical cultures, the following question necessarily arises: how did mathematicians transition from the earlier, more dissensual mathematical culture, to the contemporary culture, where consensus about the validity of proofs is so much more prevalent?

If we date a qualitative increase in consensus concerning the validity of mathematical proofs to somewhere around the turn of the twentieth century (give or take three or four decades), we should find what it is that changed in mathematical practice, which could account for the emergence of consensus. The obvious suspect is mathematical formalization. However, this easy answer has quite a few problems.

The first problem is that formalization is a rather vague term. If we consider the developments leading from the 19th to the 20th century, we can observe an evolving functional analytic formalization, encompassing a refined expression of complicated functions by indices and iterative formulations; an algebraic formalization of abstract structures with their axiomatic definitions; the formalization of logic (propositional and predicate calculus); a non-axiomatic, naïve set theoretical formalization; and, finally, a Hilbertian formalization, which, like the algebraic formalization, is axiomatic, but where axioms do not define structures within mathematics, but mathematical systems as such. It is hard to tell how to divide the labor of consensus building between these often interconnected kinds of formalization. For example, it is clear that a full axiomatic set theory was required for resolving the foundational antinomies while allowing the preservation of Cantorian mathematics, but it’s not at all clear whether this heavy machinery was necessary to guarantee the mundane sort of consensus about the validity of proofs in the everyday work of mathematicians.

The second problem is that formalization is not a straightforward process. There is not just one way to formalize an argument (see Tanswell 2018), and an informal argument may be interpreted in formally disparate, sometimes even inconsistent ways. How can a practice, which is in itself not consensual, generate consensus about the validity of mathematical proofs?

Finally, a major concern for explaining consensus in terms of formalization is that we hardly ever formalize completely. Even in research publications, the level of formalization is rather limited (we hardly ever go all the way down to axioms, hardly ever reconstruct everything as sets, hardly ever even properly explicate all functional expressions). A full-fledged formalization in terms of some Hilbertian formal system is out of the question, except anecdotally (with automated proof checkers, it’s becoming less anecdotal, but the consensus about mathematical proofs does not depend on this recent and still marginal development). Moreover, formalization, when performed by humans, is so tedious and complex that it is more likely to introduce errors than produce insight (see Rav 2007; 2008). The following 1949 quotation from Francesco Severi, a proponent of the Italian school of algebraic geometry, infamous for its informal methods, is exemplary in this context:

I remember in Pisa something which happened with one of my colleagues from Turin, about 1900, when Lindemann published a false proof of Fermat’s last theorem. My friend, always lively and considered very learned, assured me that Lindemann’s proof was completely correct because he had been able to reduce it to symbols of mathematical logic. Fifteen years later, Lindemann himself published a second note to say that his proof was false. My friend had therefore made a mistake of logic, that is to say, he was deceived even while using one of the most formidable instruments of rigor. Another colleague, an eminent foreign geometer, wrote me recently that he believed in Enriques’ proof up to the time when it was recognized not to work and that from that time on, while still thinking like an Italian geometer, he distrusted all the results until he had been able to put them in algebraic terms. Personally, I believe that our methods, when they are well analyzed, give a confidence as perfect as purely algebraic methods.... In any case, I reply to my colleague with a question: Was the gap in the proof discovered by the algebraists, or rather by us, with our methods? (quoted in Parikh 1991, 24–25).

In order to square the really existing mathematical consensus with such a diffuse, multiple and, most importantly, hardly ever existing practice of formalization, some additional explanations are in order. The first component of the explanation will be based on a juridical metaphor, which, however, should not be taken too seriously or extended beyond its current scope. I do not think of formalization as a juridical process, and do not recommend such thinking. This metaphor, however, may still be of use for a few initial steps in our thinking about formalization (I acknowledge that this metaphor has already been briefly proposed in a similar context in Maddy 1996, 512).

I suggest thinking about formalization as vaguely analogous to a hierarchy of courts of appeal. Here are some components of the analogy. First, most disputes are settled out of court or in the lowest juridical instance. Considering the bulk of legal disputes, courts of appeal are rarely involved. Analogously, formalization too is a procedure that is rarely used. We usually resolve mathematical disagreements about the validity of arguments by means of informal or semi-formal explanations (as described in the model from Sect. 3 of this paper). A formal re-writing is rarely invoked.

Second, like courts of appeal, formalization is not made of a single instance. There are different kinds of courts (family, labor, criminal…), and they are sometimes associated with distinct courts of appeal. Moreover, the organization of courts of appeal is often hierarchical, where issues that are not resolved in the first instance of appeal can be further appealed in higher instances. Similarly, there is not just one kind of formalization, and we never formalize an argument all at once. In those rare cases where formalization is involved in settling a dispute concerning the validity of arguments, we proceed along a hierarchy of levels of formalization that mediate the ‘everyday’ mathematical language with the ‘supreme court’ of a fully formalized proof aligned with a fully regimented formal system.

Third, a court of appeal does not look at the entire case discussed in the lower instance. The appellant presents the court with specific components of the dispute that they wish to re-open for debate. Similarly, the kind of formalization that is involved in settling disputes concerning the validity of mathematical proofs does not encompass an entire proof, but restricts itself to specific components that merit this kind of treatment.

Fourth, in a hierarchy of courts of appeal, the higher instances are more authoritative than the lower ones. Similarly, when a proof is partly formalized (that is, part of the proof is re-written in a somewhat more formal dialect), this formalization is more authoritative than the less formal methods used so far. If we have a good explanation for why something should be the case in terms of an analogy, diagram or a reliable rule of thumb, but formalization counters this explanation, it’s the former that should yield to the latter, not the other way around (although the very correctness of the formalization can also be challenged).

Finally, even the supreme court does not have the last word. When the final verdict contradicts the dominant politics of justice, the law may be changed so as to justify what the legislator considers to be right. Analogously, what is or isn’t allowed in a formal system may be revised so as to enable what the relevant mathematical community believes to constitute good arguments. Formal systems are evolving products of social negotiation, not something that is instated once and for all.

This narrative should allow us to explain how a hardly-existing practice, implemented only in restricted and partial ways (that is, formalization), can account for a really-existing social fact, which is highly prevalent in our mathematical culture (consensus about the validity of proofs). Of course, if automatic proof-checking ever became mainstream (in terms of our metaphor: developed from a highly specialized and technical practice into a staple of the “legal hierarchy” of mathematical verification), this narrative might change in rather dramatic ways. But, either way, this would not resolve the issue of the plurality of formalizations, which depend on our interpretive and contestable translation of informal (or less formal) statements into formal (or more formal) ones.

This last objection, however, is easier to settle. We can argue whether an informal argument is properly captured by one of several – potentially inconsistent – formal reconstructions. We may continue to argue, as Wittgenstein would say, “until you are black in the face”. But once we have our competing formal interpretations, we are likely to reach consensus about the validity of each of the respective formalized arguments, even if we cannot agree which of them captures the intended meaning of the underlying informal argument. This kind of disagreement, I believe, is a feature of mathematical practice, which needn’t be denied, as it complements the uncertainty about the individuation of proofs discussed in Sect. 3 above. Nevertheless, it does not contradict the capacity of mathematicians to disambiguate arguments into consensus by means of partial formalizations – a capacity that other disciplines and earlier mathematical cultures share only to a very limited extent.

It should be clear by now that what I am suggesting is not a derivation-indicator theory of mathematical proof (Azzouni 2004). Formalization, in my narrative, is a tool to resolve disputes. I do not believe, however, that a mathematical proof is a proof by virtue of indicating a formal derivation. Wittgenstein’s shadow looms too heavily on my thinking to believe such a thing (Wagner 2017, Chapter 3).

But this is still not the whole story about how formalization contributes to consensus in mathematics. I believe that even outside the practice of partial formalization for the resolution of disputes concerning the validity of mathematical proofs, formalization plays other constitutive roles in the formation of mathematical practices that are conducive to consensus.

The first impact of formalization on mathematical practice is that it allows mathematicians to agree to disagree on axioms and inference rules. The greatest victory of Hilbert’s program, I believe, is precisely that. Even if we can’t prove the consistency of formal systems, we can relativize our arguments to specific formal systems, and reach agreements on their relative validity, while suspending the question of which system is ‘true’ (in whatever sense of truth). This is precisely the kind of consensus between, for example, constructivists or intuitionists on the one hand, and full-fledged classicists on the other, about the validity of a given proof relative to a given framework as discussed above.
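The idea of validity relative to a framework can be illustrated with a standard textbook case, sketched below. The subscripts C and I, standing for derivability in classical and in intuitionistic propositional logic, are labels introduced here for illustration only, and the symbol for non-derivability assumes the amssymb package.

```latex
\begin{align*}
% Double-negation elimination: derivable classically, not intuitionistically
&\vdash_{\mathrm{C}} \neg\neg P \rightarrow P, &
&\nvdash_{\mathrm{I}} \neg\neg P \rightarrow P, \\
% Double-negation introduction: derivable in both systems
&\vdash_{\mathrm{C}} P \rightarrow \neg\neg P, &
&\vdash_{\mathrm{I}} P \rightarrow \neg\neg P.
\end{align*}
```

A classicist and an intuitionist can agree on all four of these statements while continuing to disagree about which system licenses ‘true’ mathematics.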

This observation is just a starting point of an overhaul of mathematics initiated by formalization, which is not obstructed by the fact that we hardly ever formalize. The next observation is that many mathematical concepts were redefined and broken down by formalization. Take the continuum, for example. Well into the 19th century, it was clear to many mathematicians (following a tradition going back to Aristotle’s Physics VI) that the line is not the collection of its points. There may be points on a line, but a line did not consist of its points. It was more than that. However, with the set theoretical reconfiguration of mathematics (in its naïve form in this case, following developments in Fourier analysis and the arithmetization of mathematics) the line was reduced to a collection of points. This enabled the transformation of the notion of function from that of motion, natural relation, or formula, into an arbitrary mapping of points to points. In turn, this opened the way to breaking down the notion of smoothness into a plethora of distinct notions, and enabled the industry of monster-functions toward the end of the 19th century (see Bottazzini 1986).

I claim that such a revamping of mathematical objects according to the standards of emergent kinds of formalization rendered mathematical disputes that involved these objects amenable to resolution by formalization. It is as if mathematical objects were adapted to the emergent framework in order to guarantee mathematical consensus. Of course, in reality, this was a circular feedback process, rather than a unidirectional flow.

Another way in which formalization reorganized mathematics is by removing from mathematics everything that could not be formalized. Many questions, which used to be mathematical, but could not be formalized in the emergent frameworks, were exiled to philosophy (e.g. What is a number? Which claims could/should serve as axioms?) or natural science (e.g. Is the world continuous or atomic? Euclidean or non-Euclidean?). Mathematics kept only formalizable echoes of these questions (Heintz 2000, 271, hints at this point in the context of an interesting study on the social functions of proofs in mathematics). This rid mathematics of pesky arguments that formalization could not arbitrate. So mathematics was reduced to a realm of arguments that could, in principle, be settled by formalization, becoming a highly consensual discipline – at least as far as the validity of proofs is concerned. This is perhaps why mathematicians sometimes ‘retreat’ to formalist positions when challenged by philosophical problems they do not wish to engage (see Hersh 1997, 39–40).

Finally, mathematics also annexed to its realm new domains that would appear to belong to other sciences. In a sense, throughout the 20th century, mathematics imported almost all aspects of other sciences that could be formalized. For example, mathematics took over the formalizable part of game theory, which is native to psychology and economics, and the formalizable part of computational complexity, which, in practical terms, is a problem of engineering. In this way, mathematics did not only restrict itself to what could be formalized, and hence amenable to consensus by formalization, it also extended itself to more or less everything that could be formalized, and hence consensually arbitrated by formalization, rendering itself practically co-extensive with this specific kind of consensus about the validity of arguments.

The two sides of this revision of the boundaries of mathematics around what could be formalized not only made mathematics exceptionally consensual about the validity of proofs, but also made it the only realm of knowledge that could obtain consensus via formalization, by taking over the formalizable as its rightful domain. If we think about the impact of formalization in this way, we see how formalization, even if it hardly ever takes place in real mathematical life, can explain something as real and commonplace as mathematical consensus.

7 Conclusion and open questions

This paper argues that contemporary research mathematics is unique in its consensus about the validity of its proofs. I proposed a model for how this consensus is reached, and pointed out the limitations of this consensus. Then I argued that this consensus is not only exceptional compared to other fields of knowledge, but also compared to the history of mathematics. Since consensus seems to emerge around the turn of the 20th century, I conjectured that it may be due to formalization. I discussed the problems of this explanation, and suggested ways around them.

However, the evidence brought in this paper is not sufficient to validate its argument. This paper is meant to open a discussion and trace out a research program, which will hopefully attract the attention of scholars in the fields of history of mathematics, its anthropology and sociology, and the philosophy of mathematical practice. In order to clarify the work that I believe should be done, I present the following critiques of my argument and what might be done to address them.

First, my observations about the contemporary consensus about the validity of mathematical arguments depend on experience, ad-hoc observations, and a handful of relatively high-profile case studies. This is no substitute for proper ethnographic work. To substantiate, refute or refine my observation and model, we need methodologically rigorous interviews with mathematicians or participant observations. We also need a philosophically careful articulation of the kinds of disagreements in mathematics in order to disentangle the question of validity of arguments from other dimensions of disagreement and to enable comparisons with other disciplines (including hybrids, such as applied mathematics, which may foster different standards and dynamics, as Bloor’s 2011 study of aerodynamics suggests).

Second, my historical overview depends, again, on some high-profile case studies. Here, I may be charged with cherry picking. In order to evaluate the scope and kinds of disagreements in the history of mathematics, we need the careful philosophical articulation mentioned above applied to a rigorous empirical review of mathematical journals or correspondences between mathematicians (for the Greek context, Michalis Sialaros is working on a synthetic review of all surviving Greek mathematical disputes, but for other mathematical cultures a sampling scheme or a digital corpus analysis will be necessary).

Third, in order to evaluate my conjecture that formalization explains the kinds of mathematical consensus available today, we need, again, a rigorous empirical corpus analysis or sampling, which would be sensitive enough to distinguish between different kinds of formalization and their impact on different fields of mathematics in the decades around the turn of the 20th century. Based on this analysis, philosophers of mathematical practice may be able to elucidate the impact of formalization on consensus.

Finally, while formalization is presented here as an explanans for consensus, we should dig deeper, and challenge formalization as an explanandum as well. Indeed, in early modernity, when mathematical innovation was often initiated by independent scholars and amateurs, public dispute served as a means to attract attention and promote one’s work. But when pure mathematics became established as a university discipline in the 19th century, it needed a new strategy to justify its position, falling back on its old and distinct reputation as a realm of Euclidean-like epistemological certainty (some background for this process is available in Schneider 1981 and Schubring 1981). Since it was not, in fact, so epistemologically robust, it had to reinvent itself in a more robust manner. To the extent that this narrative holds, we need to understand the institutional and social constraints that brought it about.

Eventually, the resulting picture may be compared and contrasted with less academic forms of mathematical knowledge (e.g. in industry, policy making and education) and with neighboring disciplines. This will allow us to draw a faithful picture of one of the aspects that make mathematics appear to be so special: its consensus.

Once we have a sound evaluation and explanation of contemporary mathematical consensus, we could de-mystify the epistemological success of mathematics, and contribute to a more realistic image of mathematics in the contemporary realm of knowledge. A better understanding of how mathematics achieves this success will help us renegotiate the social image of mathematics, counter prevalent forms of its over- and under-valuation, and promote a more responsible use of mathematics in our social-scientific-political form of life.