1 From tacking by conjunction to genuine confirmation

Tacking by conjunction is a deep problem of orthodox Bayesian confirmation theory. It is based on the insight that to each hypothesis H that is confirmed by a piece of evidence E one can ‘tack’ an irrelevant hypothesis X so that H∧X is also confirmed by E, in the orthodox Bayesian sense of “confirmation” as probability-raising. In what follows we let Hi range over hypotheses and Ei over pieces of evidence. We always assume that Hi and Ei have non-extreme prior probabilities (P(Hi), P(Ei) ∉ {0, 1}; “P” for probability). By definition, E confirms H iff P(H|E) > P(H), and a proposition A is probabilistically independent from B iff P(A∧B) = P(A)·P(B). Now, tacking by conjunction rests on the following facts that are well-known and easily provable:

The problem of tacking by conjunction (Lakatos, 1970, 46; Glymour, 1981, 67):

(i). General case: If E confirms H, then E confirms H∧X, for any conjunct X (‘tacked on’ to H) that is probabilistically independent from H and E∧H.

(ii). General strict case: If H entails E, then E confirms H∧X, provided P(H∧X) > 0.
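The general case (i) can be checked numerically on a toy model. The following is a minimal sketch, not part of the original argument: the prior of H, the two likelihoods and the prior of X are illustrative assumptions, and X is made independent of H and E by construction.

```python
from fractions import Fraction as F
from itertools import product

# Toy joint distribution over (H, E, X); all numbers are illustrative assumptions.
P_H = F(1, 5)                                   # prior of H
P_E_GIVEN = {True: F(9, 10), False: F(3, 10)}   # P(E|H) and P(E|not-H)
P_X = F(1, 2)                                   # X is independent of H and E by construction

def weight(h, e, x):
    """Probability of one possible world (a truth-value assignment to H, E, X)."""
    ph = P_H if h else 1 - P_H
    pe = P_E_GIVEN[h] if e else 1 - P_E_GIVEN[h]
    px = P_X if x else 1 - P_X
    return ph * pe * px

WORLDS = list(product([True, False], repeat=3))

def prob(event, given=lambda h, e, x: True):
    """P(event | given), computed by summing world weights."""
    num = sum(weight(*w) for w in WORLDS if event(*w) and given(*w))
    den = sum(weight(*w) for w in WORLDS if given(*w))
    return num / den

is_E = lambda h, e, x: e
is_H = lambda h, e, x: h
is_HX = lambda h, e, x: h and x

# E confirms H, and E also confirms the tacked conjunction H∧X.
print(prob(is_H), prob(is_H, is_E))
print(prob(is_HX), prob(is_HX, is_E))
```

In this model E raises the probability of H and, by the same factor, the probability of H∧X, although X plays no role in making E probable.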

To illustrate, according to the general tacking by conjunction problem, Kepler’s empirical laws of elliptic planetary orbits (E) confirm Newtonian gravitation mechanics (H) in conjunction with creationism (X), since Kepler’s laws are entailed by Newtonian gravitation mechanics. Arguably, these sorts of ‘confirmation’ do not accord with the intuitive notion of scientific confirmation. But intuitions are divided: many Bayesian philosophers accept this consequence of their confirmation account.

There is, however, a special case of tacking by conjunction that is particularly counterintuitive. Here the irrelevant hypothesis is directly tacked to the evidence.

Tacking by conjunction - special case:

E confirms E∧X, for every arbitrary X, provided P(E∧X) > 0.

Example 1: a version of Goodman’s paradox: “All observed emeralds are green” confirms “All emeralds are green if observed and blue if not observed”.

Example 2: “I feel good” confirms “I feel good and you feel bad”.

Most philosophers of science would agree that a Goodman-hypothesis that projects the opposite of the evidence to the future should not be regarded as ‘confirmed’ by the evidence, and the same applies to the other example. Yet the probability of the hypotheses in these two examples is raised by the evidence. Schurz (2014a) calls this type of probability-raising pseudo-confirmation.

Existing Bayesian solution proposals try to soften the negative impact of this result by showing that although H∧X is confirmed by E, it is so only to a lower degree (cf. Crupi & Tentori, 2010; Fitelson, 2002; Hawthorne & Fitelson, 2004). Although this solution proposal provides an important insight, it suffers from three drawbacks (footnote 1):

(1). In regard to the general tacking problem, the proposal does not distinguish the diminished confirmation of a partially defective hypothesis (one with an irrelevant conjunct) from the weak confirmation of an impeccable hypothesis. For illustration, compare the confirmation of “The dice showed 6” with that of “The dice showed 4 or 6 and tomorrow it will rain” by the evidence that the dice showed an even number.

(2). In application to the special case of the tacking problem in which X is directly tacked to E, one would intuitively expect that “E∧X” is not confirmed at all, but it counts as confirmed according to the ‘diminished confirmation’ proposal.

(3). The diminished-confirmation proposal is measure-sensitive in the sense that it holds only for some of the prominent Bayesian confirmation measures, but is violated for others (cf. Schippers & Schurz, 2020, see observations 5, 8 and 10).
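The dice example in drawback (1) can be made concrete. The sketch below assumes a fair die, a rain probability of 1/2 that is independent of the throw, and the difference measure d(E,H) = P(H|E) − P(H) as the confirmation measure; these choices are illustrative assumptions, picked so that the two degrees of confirmation coincide exactly.

```python
from fractions import Fraction as F

FACES = range(1, 7)
P_RAIN = F(1, 2)   # assumption: rain tomorrow is independent of the throw

# Worlds are (face, rain) pairs for a fair die.
worlds = {(f, r): F(1, 6) * (P_RAIN if r else 1 - P_RAIN)
          for f in FACES for r in (True, False)}

def prob(event, given=lambda f, r: True):
    """P(event | given), computed by summing world weights."""
    num = sum(p for w, p in worlds.items() if event(*w) and given(*w))
    den = sum(p for w, p in worlds.items() if given(*w))
    return num / den

even = lambda f, r: f % 2 == 0
six = lambda f, r: f == 6
four_or_six_and_rain = lambda f, r: f in (4, 6) and r

def difference_measure(h, e):
    """d(E,H) = P(H|E) - P(H), one standard Bayesian confirmation measure."""
    return prob(h, e) - prob(h)

d_six = difference_measure(six, even)                       # impeccable hypothesis
d_tacked = difference_measure(four_or_six_and_rain, even)   # hypothesis with irrelevant conjunct
print(d_six, d_tacked)
```

Under these assumptions both hypotheses receive the same degree of confirmation, so the degree alone does not reveal that the second hypothesis carries an irrelevant conjunct.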

The idea of genuine confirmation is based on the following observation: E increases the probability of E∧X only because E is a content element of E∧X and increases its own probability to 1 (P(E|E) = 1), but E does not increase the probability of the content element X that logically transcends E, in the sense that X is not entailed by E. Gemes and Earman (Earman, 1992, 98n5) have called this type of pseudo-confirmation “confirmation by (mere) content-cutting”. To avoid this problem one has to require that for E to count as genuine confirmation of a hypothesis, E has to confirm those content parts of the hypothesis that transcend the evidence (Schurz, 2014a).

2 Genuine confirmation: Definition and applications

The notion of genuine confirmation is based on the notions of a content element and content part. A definition of this notion for predicate languages has been given in Schurz (2014a, def. (5)) and Schippers and Schurz (2017, def. 4.2) as follows:

Definition 1

1.1 C is a content element of (hypothesis) H iff

(i). H logically entails C (H |== C),

(ii). C is elementary in the sense that C is not L(ogically) equivalent with a conjunction C1∧C2 of conjuncts both of which are shorter than C, and

(iii). no predicate or subformula in C is replaceable (on some of its occurrences) by an arbitrary new predicate or subformula, salva validitate of the entailment of C by H.

1.2 A content part of H is a non-redundant conjunction of content elements of H.

The following properties of content elements are important:

(1). The set of content elements CE(H) of a hypothesis H preserves H’s logical content: CE(H) ==| |== H.

(2). The shortness criterion in def. 1(ii) is related to the concept of minimal description length in machine learning (Grünwald, 2000). The criterion is relativized to a language with ¬, ∧, ∨, ∃ and ∀ as logical primitives; defined symbols are eliminated (footnote 2).

(3). Condition (iii) of def. 1 excludes irrelevant disjunctive weakenings (p |== p∨q) as content elements. For example, every hypothesis H confirmed by E has H∨¬E as an (irrelevant) logical consequence that is disconfirmed by E; but this consequence doesn’t count as a content part, because “¬E” is replaceable in “H |== H∨¬E” salva validitate by any other formula. Therefore, (H∨E)∧(H∨¬E) is not an admissible conjunctive decomposition of H. This avoids the Popper and Miller (1983) objection to inductive confirmation (Schurz, 2014b, 320).

Examples of content elements:

$$ \begin{aligned}
\mathrm{CE}(\mathrm{p}) &= \mathrm{CE}\left((\mathrm{p}\vee \mathrm{q})\wedge (\mathrm{p}\vee \neg \mathrm{q})\right)=\{\mathrm{p}\},\\
\mathrm{CE}(\mathrm{p}\wedge \mathrm{q}) &= \mathrm{CE}\left(\mathrm{p}\wedge (\mathrm{p}\to \mathrm{q})\right)=\{\mathrm{p},\mathrm{q}\},\\
\mathrm{CE}\left((\mathrm{p}\to \mathrm{q})\wedge (\mathrm{q}\to \mathrm{r})\right) &= \{\neg \mathrm{p}\vee \mathrm{q},\ \neg \mathrm{q}\vee \mathrm{r},\ \neg \mathrm{p}\vee \mathrm{r}\},\\
\mathrm{CE}(\mathrm{Fa}) &= \{\mathrm{Fa},\ \exists \mathrm{xFx}\},\\
\mathrm{CE}(\forall \mathrm{xFx}) &= \{\forall \mathrm{xFx}\}\cup \{\mathrm{Fa}_{\mathrm{i}}:\mathrm{a}_{\mathrm{i}}\ \text{an individual constant}\}\cup \{\exists \mathrm{xFx}\}.
\end{aligned} $$

Other technical definitions of content elements are possible - examples are Friedman's (1974) “independently acceptable elements”, Gemes' (1993) “content parts” and Fine's (2017) “verifiers”. The technical details don’t matter as long as the core idea is captured, namely the decomposition of a hypothesis into a set of smallest content elements.

The notion of genuine confirmation, henceforth abbreviated as GC, has been explicated in three versions: full (qualitative) GC, partial (qualitative) GC and quantitative GC (cf. Schippers & Schurz, 2020):

Definition 2

Assume E does not entail H (footnote 3). Then:

2.1 Full (qualitative) GC: E fully genuinely confirms H iff (i) P(C|E) > P(C) holds for all E-transcending content parts C of H.

2.2 Partial (qualitative) GC: E partially genuinely confirms H iff (i) E fully genuinely confirms at least some E-transcending content part of H, and (ii) E does not diminish the probability of any content part of H.

2.3 Quantitative GC: The degree of GC that E provides for H is the sum of the confirmation degrees, conf(E,C), over all pairwise non-equivalent E-transcending content parts C of H, divided by their number (where “conf(E,H)” is one of the standard Bayesian confirmation measures).

Note that the notion of genuine confirmation has to be formulated not merely in terms of content elements, but in terms of content parts, as it may be that H = H1∧H2 and the evidence E confirms both H1 and H2, but disconfirms the conjunction H1∧H2.
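That evidence can confirm each of two conjuncts while disconfirming their conjunction is easy to verify on a small example. The joint weights below are purely illustrative assumptions, chosen so that E makes “exactly one of H1, H2” more likely at the expense of “both”.

```python
from fractions import Fraction as F

# Illustrative joint weights (in hundredths) over worlds (H1, H2, E).
WEIGHTS = {
    (True, True, True): 4,    (True, False, True): 6,
    (False, True, True): 6,   (False, False, True): 4,
    (True, True, False): 26,  (True, False, False): 4,
    (False, True, False): 4,  (False, False, False): 46,
}

def prob(event, given=lambda h1, h2, e: True):
    """P(event | given), computed from the integer world weights."""
    num = sum(w for k, w in WEIGHTS.items() if event(*k) and given(*k))
    den = sum(w for k, w in WEIGHTS.items() if given(*k))
    return F(num, den)

E = lambda h1, h2, e: e
H1 = lambda h1, h2, e: h1
H2 = lambda h1, h2, e: h2
H1_and_H2 = lambda h1, h2, e: h1 and h2

for name, hyp in [("H1", H1), ("H2", H2), ("H1∧H2", H1_and_H2)]:
    print(name, prob(hyp), prob(hyp, E))
```

Here E raises the probability of H1 and of H2 from 2/5 to 1/2, but lowers that of H1∧H2 from 3/10 to 1/5, which is why the definition quantifies over content parts and not only over content elements.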

In Schippers and Schurz (2020) it is shown that genuine confirmation has a number of attractive features that will be briefly summarized. Two further important applications of this notion, the handling of post facto speculations and in particular the enabling of Bayesian convergence to certainty, are elaborated in the remainder of this paper.

Applications of quantitative GC

Quantitative GC solves the problem of measure sensitivity mentioned above. Schippers and Schurz (2020) demonstrate that the diminishing effect on the genuine confirmation of hypotheses containing irrelevant conjuncts holds for the quantitative GC-measures obtained with all pertinent quantitative Bayesian confirmation measures (ibid., see observations 6, 9 and 11). Also, note that partial GC implies positive quantitative GC; thus the qualitative and quantitative notions of GC cohere.

Applications of partial (qualitative) GC

Partial (qualitative) genuine confirmation rules out the special case of tacking by conjunction in which the irrelevant hypothesis X is directly tacked onto the evidence. This includes the important subcase of Goodman-type counter-inductive generalizations. To make this precise, let “Ox” stand for “x is observed”, “Ex” for “x is an emerald”, “Gx” for “x is green” and “Bx” for “x is blue”. So the evidence E is L(ogically) equivalent with ∀x(Ex∧Ox→Gx). Now, let “I” stand for the inductive projection of E, “if x is an unobserved emerald, it is green” (Ex∧¬Ox→Gx) and “CI” for the respective counter-inductive projection, “if x is an unobserved emerald, then it is blue” (Ex∧¬Ox→Bx). So the inductive generalization of the evidence E is L-equivalent with E∧I and the counter-inductive (Goodman-type) generalization of E is L-equivalent with E∧CI. This representation of the inductive and counter-inductive generalization of E makes it clear why they can be subsumed under the special case of tacking by conjunction. Although the probability of both hypotheses is raised by E, neither of the two is genuinely confirmed by E, not even partially, because E confirms neither I nor CI, as long as the probability measure P does not satisfy special inductive principles that go beyond the basic probability axioms. On the other hand, if P satisfies additional inductive principles, such as de Finetti’s principle of exchangeability (invariance of P under permutation of individual constants), then P(I|E) > P(I) and P(CI|E) < P(CI) hold; thus E genuinely confirms E∧I (since E confirms I) and genuinely disconfirms E∧CI (since E disconfirms CI).

For partial GC it is not only required that the hypothesis H have an E-transcending content part C whose probability is raised by E, but, more strongly, that C be fully confirmed by E (see definition 2.2; footnote 4). The idea underlying this requirement is that H must have at least one E-transcending content part that can be sustainably confirmed in the sense of enabling convergence to certainty (section 4), and this requires the full GC of this content part. An important application of partial GC that would not be possible without this requirement is the treatment of so-called ‘pure post-facto speculations’ explained in section 3.

Applications of full (qualitative) GC

Full GC seems to be a rather strong condition. Nevertheless it turns out that this condition has important applications. Full GC guarantees that the probability increase that E conveys to H spreads to all content parts of H. This is neither true for ordinary Bayesian confirmation nor for merely partial GC. For practical applications to scientific knowledge this feature is of obvious importance. Scientists base their predictions on those hypotheses that are well confirmed, which means that they infer relevant consequences, i.e. content parts, from them, thereby assuming that these consequences are themselves well confirmed. This assumption presupposes that confirmation is closed under content parts.

The most important application of full GC is its function as a necessary condition for Bayesian convergence to certainty. This application is worked out in sections 4–5.

3 Partial GC and pure post-facto speculations

In this section we explain why pure post-facto speculations are not genuinely confirmed by the evidence towards which they are fitted. In the simplest case, a pure post-facto speculation explains an evidence E in hindsight by a postulated unobservable ‘power’ (ψ) that produced it. More generally speaking, a hypothesis H is a pure post-facto speculation in regard to a given body of evidence E, iff (i) H contains theoretical concepts ψ that are not present in the evidence E, and (ii) whose values have been obtained by fitting the values of a theoretical variable X (of an underlying unfitted background hypothesis Hunfit) towards E so that H implies E, but (iii) Hunfit could have been equally fitted to any other possible evidence of the same type. Condition (iii) is necessary for calling H a pure post-facto speculation; if Hunfit cannot be fitted to some possible alternative evidence E*, then it contains at least some independent (use-novel) empirical content. In this case H is a post-facto speculation that is not a ‘pure’ one, but can be partially genuinely confirmed.

To illustrate how a pure post-facto speculation works, consider the pseudo-explanation of the following fact.

(E): there is an economic recession

by the hypothesis

(H): God wants that there is an economic recession and whatever God wants, happens, formalized as W(E) ∧ ∀X(W(X) → X),

where “W(X)” stands for “God wants that X” and X is a (logical) second-order variable ranging over propositions. Since H entails E and P(H∧E) > 0, E confirms H in the orthodox Bayesian sense. Of course, there are not only religious kinds but many other kinds of pure post-facto speculations, for example ‘conspiracy’ theories based on various sorts of hidden super-powerful intentional agents.

In the hypothesis H, God’s wish-that-E is the theoretical (latent) parameter that has been obtained by fitting the theoretical variable X (God’s wishes) post-facto towards the evidence E, the economic recession. The unfitted background hypothesis in this example is the hypothesis

(Hunfit): There is some X that God wants, and whatever God wants, happens, formalized as ∃XW(X)∧∀X(W(X) → X).

Thus we propose that Hunfit is generated from H by existentially quantifying over H’s theoretical variable. H is the result of fitting Hunfit to E; so we may also write H = Hfit. The fitting operation consists in omitting the existential quantifier and replacing the free variable X by the proposition that one wants to explain. Hunfit is a content part of H, since H entails Hunfit. Observe that Hunfit is not a simple conjunct of H; the considered case is more intricate than a tacking-by-conjunction.

What is important: the unfitted background hypothesis Hunfit could be fitted equally well to any other possible alternative evidence E’ whatsoever. This implies that Hunfit cannot increase the prior probability of any possible alternative evidence; thus P(E|Hunfit) = P(E) and P(E’|Hunfit) = P(E’) (for any E’) must hold (Schurz, 2014a, sec. 3.3). Since probabilistic independence is symmetric, this in turn entails P(Hunfit|E) = P(Hunfit), i.e., Hunfit’s probability cannot be raised by E. Hunfit is a conjunction of two content elements, C1: ∃XW(X) and C2: ∀X(W(X) → X). Since C1 can be fitted equally well to any other possible evidence E’, unconditionally as well as conditionally on C2, it follows that neither the probability of C1 nor that of C2 is raised by E (footnote 5).

In conclusion, neither Hunfit nor its conjuncts are confirmed by E in the ordinary Bayesian sense. In contrast, H is ordinarily confirmed by E. But it is not genuinely confirmed, not even partially. To see this, note that H has the form W(E)∧C2. H’s E-transcending content parts are W(E), C2 and W(E)∧C2. C2 is not even ordinarily confirmed by E. W(E) is not fully genuinely confirmed by E, since its content element C1 is not ordinarily confirmed. For the same reason, W(E)∧C2 is not fully genuinely confirmed by E. So H is not partially genuinely confirmed by E. This means, for example, that creationism is not genuinely confirmed by a list of post-facto ‘explanations’ of the form “Ei happened because God wanted it and whatever God wants, happens” (for i = 1, 2, …), not even in a partial sense, and not even if this list is very long. This is the intended result.

Only if the fitted hypothesis H is confirmed by a second piece of evidence E* to which Hunfit has not been fitted and which H could have predicted (in the epistemic sense of ‘prediction’) can Hunfit be said to be confirmed via the confirmation of H by E and E*. For obviously it is not possible to fit Hunfit to a particular evidence E and then to confirm the so-obtained H by any other evidence E* whatsoever. The evidence E* is use-novel in the sense of Worrall (2006). Therefore this consideration provides a Bayesian justification of a special version of Worrall’s idea of independent confirmation by use-novel evidence, namely a version for which it is essential that the hypothesis contains a theoretical concept that is not part of the evidence (footnote 6).

In contrast to post-facto speculations, accepted scientific hypotheses involving theoretical concepts are highly confirmed by use-novel evidence. An example is the chemical explanation of the combustion of inflammable materials such as wood in air by the hypothesis HOx = “wood has carbon which, when ignited, reacts with oxygen in the air, and this reaction underlies combustion” (with HOx,unfit arising from HOx by replacing the theoretical concepts “carbon” and “oxygen” by existentially quantified variables). A variety of use-novel empirical predictions are derivable from HOx in the given chemical background theory, for example, P* = “combustion of wood produces carbon dioxide and consumes oxygen, which impedes breathing and increases the greenhouse effect”. If P* is confirmed by new evidence E*, then HOx is genuinely confirmed by {E,E*}. However, outdated scientific theories that are false in the light of contemporary evidence were also often successful in their time, i.e., they entailed use-novel empirical consequences and, thus, were not post-hoc. An example is the phlogiston hypothesis of combustion, HPhlog, according to which inflammable materials contain a specific substance named phlogiston which when ignited leaves the wood in the form of a hot flame. Phlogiston theory successfully predicted the processes of calcination (roasting) of metals and of salt-formation of metals in acids, as well as the inversion of these reactions (Ladyman, 2011; Schurz, 2009). Nevertheless the existence of phlogiston was later rejected because of various conflicts with observations, e.g., the fact that some substances gain weight after losing their phlogiston.

The above argument that the probability of Hunfit is raised by use-novel probability-increasing evidence was based on intuition. Probability theory itself does not tell us how the probability increase of a hypothesis by an evidence spreads to its content parts. Based on the above considerations we suggest the following rationality criteria for the spread of the evidence-induced probability increase from H to its E-transcending content elements:

Necessary criteria for spread of probability increase

If H increases E’s probability, then the resulting probability increase of H by E spreads to an E-transcending content element C of H (P(C|E) > P(C)) only if:

(1). C is necessary within H to make E probable, i.e., there exists no conjunction H* of content elements of H that makes E at least equally probable (P(E|H*) ≥ P(E|H)) but does not entail C, and

(2). it is not the case that C is an existential quantification, C = ∃XH(X), and H results from fitting the value of X in H(X) towards E, such that an equally good fitting of H(X) would have been possible for every possible alternative evidence E’.

We finally note that the use-novelty criterion is not a purely philosophical invention. A statistical method corresponding to the use-novelty criterion is cross-validation (Shalev-Shwartz & Ben-David, 2014, sec. 11.2). Here one starts with one (big) data set E, splits E randomly into two disjoint data sets E1 and E2, fits the unfitted hypothesis to E1 and tests the resulting fitted hypothesis with E2. By repeating this procedure and calculating the average probability of E2 conditional on H-fitted-to-E1, one obtains a highly reliable confirmation score. An important domain of this method is curve fitting (applied to statistically correlated variables X, Y with values xi, yi). Here, one approximates a finite set of data points E = {<xi,yi>: 1 ≤ i ≤ n} by an optimal curve Y = f(X) with a remainder dispersion around it as small as possible. It is a well-known fact that any set of data points can be approximated by fitting the parameters ci of a polynomial function H: Y = c0 + c1·X + … + cn·X^n of a sufficiently high degree n. Here, the existential quantification over this function, ∃c0…cnH(c0,…,cn), plays the role of Hunfit. Merely fitting the parameters of Hunfit to the data set E is not enough for confirming it. The approximation success of a high-degree polynomial may also be due to overfitting the data, i.e., Hunfit may have been fitted on random accidentalities of the sample (cf. Hitchcock & Sober, 2004). Only if the curve H with its parameters fitted towards E successfully approximates a use-novel data set E*, one to which its parameters have not been adjusted, is it genuinely confirmed by E and E*.
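The contrast between fitting and use-novel testing can be sketched in a few lines. The example is a simplified stand-in for cross-validation: it uses one fixed split instead of repeated random splits, hand-picked deterministic ‘noise’ instead of sampled noise, and mean squared error instead of the probability of E2 given the fitted hypothesis. The data, the function names and the true curve Y = 2X + 1 are all illustrative assumptions.

```python
def fit_line(points):
    """Least-squares line y = a + b*x (a low-degree hypothesis fitted to the points)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    b = (sum((x - mx) * (y - my) for x, y in points)
         / sum((x - mx) ** 2 for x, _ in points))
    a = my - b * mx
    return lambda x: a + b * x

def fit_interpolant(points):
    """Polynomial of degree n-1 through all n points (Lagrange form): a perfect fit."""
    def f(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if i != j:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return f

def mse(f, points):
    """Mean squared deviation of the curve f from the data points."""
    return sum((f(x) - y) ** 2 for x, y in points) / len(points)

truth = lambda x: 2 * x + 1                      # assumed data-generating curve
noise_1 = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]      # deterministic 'noise' on E1
noise_2 = [-0.3, 0.2, 0.4, -0.2, 0.3]            # deterministic 'noise' on E2
E1 = [(float(k), truth(k) + noise_1[k]) for k in range(6)]         # fitting data
E2 = [(k + 0.5, truth(k + 0.5) + noise_2[k]) for k in range(5)]    # use-novel data

line, interpolant = fit_line(E1), fit_interpolant(E1)
print("error on E1:", mse(line, E1), mse(interpolant, E1))
print("error on E2:", mse(line, E2), mse(interpolant, E2))
```

The high-degree polynomial reproduces E1 without error, yet it approximates the use-novel set E2 worse than the straight line, which is the overfitting pattern described above.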

4 Full GC and Bayesian convergence to certainty

Convergence theorems are an important part of Bayesian epistemology. According to them, the conditional probability of a hypothesis can be driven to near certainty if many confirming pieces of evidence for this hypothesis are accumulated (Earman, 1992, 141ff.). Most versions of Bayesian convergence theorems have been formulated for hypotheses not containing theoretical concepts (or latent variables), typically hypotheses that are obtainable from the evidence by enumerative induction. For example, if P is countably additive, then limn→∞P(p(Fx) = r | (E1,…,En)) = 1, where each Ei is Fai or ¬Fai and F’s frequency limit in the Ei-sequence is r (which is a consequence of the theorem of Gaifman & Snir, 1982). More important, however, are convergence theorems for hypotheses containing theoretical concepts. A well-known convergence result for this case is based on the confirmation by an increasing number of conditionally independent pieces of evidence, as follows:

Theorem 1 - convergence to certainty

Assume an infinite sequence of pieces of evidence E1,E2,… and a hypothesis H satisfying the following conditions (recall that P(H), P(Ei) ∉ {0,1}):

(a). H makes each piece of evidence more probable than ¬H by an amount of at least δ, i.e. for each i ∈ω: P(Ei|H) ≥ P(Ei|¬H) + δ,

(b). each piece of evidence, En, is predecessor-independent conditional on H, in the sense that for every n∈ω, P(En|H∧E1∧…∧En-1) = P(En|H),

(c). and likewise En is predecessor-independent conditional on ¬H.

Then:

(i). For all n∈ω: P(H|E1∧…∧En) ≥ \( \frac{\mathrm{P}(\mathrm{H})}{\mathrm{P}(\mathrm{H})+\mathrm{P}(\neg \mathrm{H})\cdot (1-\delta )^{\mathrm{n}}} \).

(ii). limn→∞P(H|E1∧…∧En) = 1.

Corollary 1

In the special case where for each i∈ω, P(Ei|H) = p and P(Ei|¬H) = q < p, condition (i) turns into the equality: P(H|E1∧…∧En) = \( \frac{\mathrm{P}(\mathrm{H})}{\mathrm{P}(\mathrm{H})+\mathrm{P}(\neg \mathrm{H})\cdot (\mathrm{q}/\mathrm{p})^{\mathrm{n}}} \).

Corollary 2

If the pieces of evidence are mutually independent conditional on H and ¬H, in the sense that P(En|±H∧E) = P(En|±H) holds for every ±H ∈ {H,¬H}, n∈ω, En and conjunction E of n-1 Ei’s, then results (i) and (ii) of theorem 1 hold for every sequential ordering of the pieces of evidence.

Proof see appendix.
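Theorem 1(i) and corollary 1 can be explored numerically. The sketch below assumes conditions (b) and (c), so that the posterior can be computed by sequential updating with the likelihood pairs (P(Ei|H), P(Ei|¬H)); the prior, the value of δ and the likelihood pairs are illustrative assumptions, and the function names are not taken from the paper.

```python
from fractions import Fraction as F

def posterior_sequence(prior, likelihoods):
    """P(H|E1..En) for n = 1, 2, ..., assuming predecessor-independence
    conditional on H and on not-H (conditions (b) and (c)).
    likelihoods is a list of pairs (P(Ei|H), P(Ei|not-H))."""
    post, out = prior, []
    for p_h, p_not_h in likelihoods:
        post = post * p_h / (post * p_h + (1 - post) * p_not_h)
        out.append(post)
    return out

def lower_bound(prior, delta, n):
    """Right-hand side of theorem 1(i)."""
    return prior / (prior + (1 - prior) * (1 - delta) ** n)

def corollary_1(prior, p, q, n):
    """Closed form for equal likelihoods P(Ei|H) = p and P(Ei|not-H) = q."""
    return prior / (prior + (1 - prior) * (q / p) ** n)

prior, delta = F(1, 100), F(1, 5)
# Likelihood pairs of varying size; each satisfies condition (a) with delta = 1/5.
mixed = [(F(3, 10), F(1, 10)), (F(1, 2), F(3, 10)), (F(9, 10), F(7, 10))] * 5
posts = posterior_sequence(prior, mixed)
bounds = [lower_bound(prior, delta, n) for n in range(1, len(mixed) + 1)]

for n, (post, bound) in enumerate(zip(posts, bounds), start=1):
    print(n, float(post), float(bound))
```

With equal likelihoods p = 4/5 and q = 2/5 and a prior of 1/100, ten pieces of evidence already lift the posterior to 1024/1123, in agreement with corollary 1.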

Convergence to certainty in spite of a small prior probability is the ideal case of scientific confirmation. The confirmation of Darwinian evolution theory by multiple pieces of evidence constitutes an example. In theorem 1, the likelihoods P(Ei|H) may be different for the different pieces of evidence and may even be smaller than 0.5; all that is required of them is condition (a). Corollary 1 expresses the special case where the likelihoods are equal and given by p and q; this result is found in Bovens and Hartmann (2003, 62, (3.17) and (3.19); see footnote 7). Theorem 1 is related to Condorcet’s jury theorem for conditionally independent witnesses, where Ei stands for “witness i has reported that H” (Bovens & Hartmann, 2003, present corollary 1 in this context). While theorem 1 deals with the special case in which all informants (pieces of evidence Ei) are speaking in favor of H, Condorcet’s theorem considers a majority of ‘correct’ informants (Ei) and a minority of incorrect ones (¬Ei). To handle this case Condorcet’s theorem assumes that P(Ei|H) = P(¬Ei|¬H) > 1/2 > P(¬Ei|H) = P(Ei|¬H), which entails a version of corollary 1 where n stands for the surplus of correct over incorrect informants (cf. the derivation in List, 2004, and in Bovens & Hartmann, 2003, 82).

Surprisingly it turns out that a necessary condition for convergence to certainty is full GC. The existence of only one E-transcending content element of H, call it C, that is not confirmed by any elements of the evidence sequence is sufficient to prevent convergence to certainty. Since C’s probability is not raised by any conjunction of Ei’s, it holds that P(C|E1∧…∧En) = P(C). But P(C|E1∧…∧En) = P(C) is an upper bound of P(H|E1∧…∧En), since H entails C. Therefore, P(H|E1∧…∧En) cannot approach certainty but is forced to stay below P(C), which is small.

Theorem 2 - failure of convergence to certainty (footnote 8)

Assume an infinite sequence of pieces of evidence E1,E2,… and a hypothesis H that satisfies condition (a) of theorem 1, but contains an irrelevant evidence-transcending content part C, i.e., one that is not confirmed by any (consecutive) conjunction of pieces of evidence: P(C|E1∧…∧En) = P(C) for all k∈ω and n∈ω. Then:

(i). limn→∞P(H|E1∧…∧En) ≤ P(C), and

(ii). a failure of condition (b) or of condition (c) occurs for some piece of evidence Em+1 with m ≤ |log([P(H)·P(¬C)]/[P(¬H)·P(C)])|/|log(1-δ)|, and occurs infinitely many times thereafter.

Corollary 3

If a hypothesis H satisfies conditions (a), (b) and (c) of theorem 1, then H is fully genuinely confirmed by the evidence sequence, in the sense that each of its content parts is confirmed by at least some conjunction of pieces in the sequence.

Proof see appendix.

Theorem 2 is not in conflict with continuous probability increase in the sense that P(H|En+1∧En∧…∧E1) > P(H|En∧…∧E1). Assume the hypothesis contains an irrelevant E-transcending content part C (in the sense of theorem 2); in the simplest case, it has the form “H∧C”, where H is fully genuinely confirmed by the Ei and the priors of H and C are low. Then the probability of H∧C is continuously increasing conditional on accumulating evidence; however, it does not converge to 1, but to P(C). This is illustrated in Fig. 1.

Fig. 1: Continuous probability increase of a hypothesis H that is fully genuinely confirmed by an evidence sequence, and of H tacked-on with an irrelevant conjunct C
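The qualitative shape of the two curves can be reproduced with a short calculation. This is a sketch under simplifying assumptions, not a reconstruction of the figure’s actual data: H has equal likelihoods p and q as in corollary 1, and C is assumed to be probabilistically independent of H and of every piece of evidence, so that P(H∧C|E1∧…∧En) = P(H|E1∧…∧En)·P(C). All numerical values are illustrative.

```python
from fractions import Fraction as F

def posterior_h(prior, p, q, n):
    """Corollary 1: P(H|E1..En) for P(Ei|H) = p and P(Ei|not-H) = q."""
    return prior / (prior + (1 - prior) * (q / p) ** n)

def posterior_tacked(prior, p_c, p, q, n):
    """P(H∧C|E1..En), assuming C is independent of H and of every Ei."""
    return posterior_h(prior, p, q, n) * p_c

PRIOR_H, P_C = F(1, 10), F(1, 5)   # illustrative low priors
P, Q = F(4, 5), F(2, 5)            # illustrative likelihoods

h_curve = [posterior_h(PRIOR_H, P, Q, n) for n in range(41)]
tacked_curve = [posterior_tacked(PRIOR_H, P_C, P, Q, n) for n in range(41)]

for n in (0, 5, 10, 20, 40):
    print(n, float(h_curve[n]), float(tacked_curve[n]))
```

Both curves rise with every new piece of evidence, but the curve for H approaches 1 while the curve for H∧C levels off just below P(C).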

In conclusion, full GC is a precondition for sustainable confirmation, in the sense of convergence to certainty conditional on accumulating independent evidence. The proof of theorem 2(i) is obvious, but that of 2(ii) is not trivial. All that can be inferred from theorem 2(i) and theorem 1 is that after some finite number of pieces of evidence, whose upper bound can be calculated from theorem 1(i), condition (b) or condition (c) fails, and thereafter the conjunction of (b) and (c) must fail, not necessarily always, but infinitely many times (since otherwise theorem 1(i) would still hold; see footnote 9). It would be good to know which of the two conditions, (b) or (c), typically fails. Moreover, we would like to know under which strengthened assumptions this condition must fail for each piece of evidence.

Both questions are answered by the following observation. Whenever the negation of the hypothesis, ¬H, can be decomposed into a partition of finer hypotheses that convey different probabilities to the evidence, and conditional on which the pieces of evidence are independent, then independence of the pieces of evidence conditional on ¬H must fail. This observation is formally proved in theorem 3 for an arbitrary hypothesis; thus the observation applies likewise to the positive hypothesis H. However, in many typical cases the positive hypothesis specifies an approximately causally complete scenario for the evidence, in which case such a partition does not exist. The scenario described by the negated hypothesis, however, will typically be causally incomplete, leaving open several different alternative causes conveying different probabilities to the evidence, in which case conditional independence must fail for ¬H.

Here is an informal explanation of this result: Let ¬H split into two disjoint hypotheses N1, N2, i.e. P(E1|¬H) = P(E1|N1\( \overset{\cdotp }{\vee } \)N2) < P(E1) (since P(E1|H) > P(E1)). Assume that P(Ei|N1) is much larger than P(Ei|N2) (for all i). Then P(E2|¬H∧E1) > P(E2|¬H) will hold, because the fact that E1 obtains makes it more probable that N1 and not N2 obtains, which in turn makes E2 more probable. For illustration, assume that H is the above-mentioned hypothesis “God exists and wants a recession”. Then ¬H is equivalent with the exclusive disjunction N1\( \overset{\cdotp }{\vee } \)N2, where N1 stands for “God does not exist” and N2 for “God does not want a recession”. Let Ei stand for “there is a recession in country i”. Then the condition Ei makes it more probable that N2 is false but N1 is true, and this diminishes the lowering effect of ¬H (that it is false that God wants a recession) on the probability of a recession in some other country (Ej), i.e. P(E2|¬H∧E1) > P(E2|¬H) holds. This insight is elaborated in the next two theorems.

Theorem 3

Assume an evidence sequence (E1, E2,…) and a hypothesis H (with P(H), P(Ei) ∉ {0,1}) that decomposes analytically into a disjunction of fine-grained hypotheses H1, H2 conveying different probabilities to the pieces of evidence, which in turn are predecessor-independent conditional on H1 and on H2. Or formally: H ↔ H1\( \overset{\cdotp }{\vee } \)H2, P(Ei|H1) > P(Ei|H2) (for all i∈ω), and P(En|Hk∧E1∧…∧En-1) = P(En|Hk) for all k ∈{1,2} and n ∈ ω.

Then for all n ≥ 2, P(En|H∧E1∧…∧En-1) > P(En|H) holds, i.e., conditional independence fails.

Proof see appendix.
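The mechanism behind theorem 3 can be traced in a small calculation. The sketch below makes the additional simplifying assumption that the likelihoods are the same for every piece of evidence, P(Ei|H1) = p1 and P(Ei|H2) = p2, so that P(E1∧…∧En|H) = w1·p1^n + w2·p2^n with w1 = P(H1|H) and w2 = P(H2|H). The weights and likelihoods are illustrative, and the function name is not taken from the paper.

```python
from fractions import Fraction as F

def predictive_given_h(w1, p1, p2, n):
    """P(En | H ∧ E1..En-1) for H = H1 xor H2, where w1 = P(H1|H),
    P(Ei|H1) = p1, P(Ei|H2) = p2, and the Ei are independent given each Hk.
    Computed as P(E1..En|H) / P(E1..En-1|H)."""
    w2 = 1 - w1
    num = w1 * p1 ** n + w2 * p2 ** n
    den = w1 * p1 ** (n - 1) + w2 * p2 ** (n - 1)
    return num / den

W1, P1, P2 = F(1, 2), F(4, 5), F(1, 5)   # illustrative values

# n = 1 gives the unconditional likelihood P(E1|H); later values are conditional
# on all predecessors and would equal it if independence held conditional on H.
sequence = [predictive_given_h(W1, P1, P2, n) for n in range(1, 11)]
for n, value in enumerate(sequence, start=1):
    print(n, value, float(value))
```

Each observed piece of evidence shifts weight from H2 to H1 within H, so the conditional likelihood of the next piece climbs from 1/2 towards p1 = 4/5 instead of staying constant, which is exactly the failure of predecessor-independence conditional on H.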

We finally apply theorem 3 to the case of a hypothesis H containing a (strongly) irrelevant content element C: H = H1∧C, where H1 satisfies the conditions of theorem 1 but both C and ¬C are irrelevant to the evidence, conditional on H1 and ¬H1. In this case the negation of the hypothesis, ¬(H1∧C) splits into the finer partition H1∧¬C and ¬H1. While P(Ei|¬H1) < P(Ei) holds, H1∧¬C increases Ei’s probability, and this destroys the independence of the pieces of evidence conditional on ¬(H1∧C).

Theorem 4

Assume an infinite sequence of pieces of evidence E1,E2,… and a hypothesis H1 satisfying the conditions of theorem 1. Consider the hypothesis H = H1∧C, where P(C|H1) ∉ {0,1} and C and ¬C are irrelevant to (consecutive) evidence conjunctions conditionally on H1 and on ¬H1, i.e., P(E1∧…∧En|±H1∧±C) = P(E1∧…∧En|±H1) holds for all n∈ω, ±H1 ∈ {H1,¬H1} and ±C ∈ {C,¬C}.

Then:

(i). ¬H decomposes into the partition ¬H1\( \overset{\cdotp }{\vee } \)(H1∧¬C) that satisfies the assumption of theorem 3, and

(ii). ¬H violates conditional independence (condition (c) of theorem 1) for every piece of evidence En with n ≥ 2.

Proof see appendix.

Theorem 4 demands the irrelevance of C conditional on ±H1, which is stronger than the unconditional irrelevance of C that is entailed by the assumption of theorem 2. If C is unconditionally irrelevant but conditionally relevant to the evidence, the conclusion of theorem 4 may fail. In addition, theorem 4 requires the irrelevance of ¬C conditional on ±H1 (which given the assumptions is equivalent with the unconditional irrelevance of C).
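Theorem 4 can be illustrated on a small finite model. The sketch below is only an illustration under simplifying assumptions that go beyond the theorem: two pieces of evidence, constant likelihoods p = P(Ei|H1) and q = P(Ei|¬H1), and C probabilistically independent of H1 and of the evidence. The parameter names a, c, p, q and the helper functions are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

def joint(a, c, p, q):
    """Joint distribution over worlds (h1, c_, e1, e2).
    a = P(H1), c = P(C) with C independent of H1 (assumption),
    p = P(Ei|H1), q = P(Ei|not-H1); E1, E2 independent given +-H1,
    and C irrelevant to the evidence given +-H1."""
    dist = {}
    for h1, c_, e1, e2 in product([True, False], repeat=4):
        pr = a if h1 else 1 - a
        pr *= c if c_ else 1 - c
        like = p if h1 else q
        pr *= like if e1 else 1 - like
        pr *= like if e2 else 1 - like
        dist[(h1, c_, e1, e2)] = pr
    return dist

def prob(dist, event, given=lambda w: True):
    """Conditional probability P(event | given) by summing over worlds."""
    num = sum(pr for w, pr in dist.items() if event(w) and given(w))
    den = sum(pr for w, pr in dist.items() if given(w))
    return num / den

d = joint(F(1, 2), F(1, 2), F(4, 5), F(1, 5))
not_H = lambda w: not (w[0] and w[1])          # negation of H = H1 and C
E1 = lambda w: w[2]
E2 = lambda w: w[3]

print(prob(d, E2, not_H))                          # 2/5
print(prob(d, E2, lambda w: not_H(w) and E1(w)))   # 3/5
```

In this model P(E2|¬H∧E1) exceeds P(E2|¬H), so the evidence is not independent conditional on ¬H, whereas conditional on H itself the two values coincide.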

5 A generalized convergence-to-certainty theorem: Conclusion and outlook

Often it will be the case that the negation of a given hypothesis decomposes into a long disjunction of possible alternative hypotheses that convey different probabilities to the evidence. Since theorem 4 applies to these cases, this theorem seems to imply a severe restriction of the applicability of standard versions of the convergence-to-certainty theorem 1, and a similar diagnosis applies to Condorcet jury theorems. It would seem that these theorems could no longer be applied to realistic cases of hypotheses.

Fortunately one can devise a generalized version of theorem 1 that is relativized to a given possibly large partition of hypotheses that are assumed to be sufficiently fine-grained to guarantee mutual conditional independence of the pieces of evidence. What one needs to assume in this case is that the hypothesis under consideration conveys a higher likelihood to the pieces of evidence, not only compared to its negation, but compared to any one of the alternative hypotheses:

Theorem 5 - generalized convergence to certainty

Assume an infinite sequence of pieces of evidence E1, E2,… and a hypothesis H1 that belongs to a partition of hypotheses {H1,…,Hm} satisfying the following conditions (where P(Hk), P(Ei) ∉{0,1}):

(a). H1 makes each piece of evidence more probable than every other hypothesis by at least δ (for some δ > 0), i.e., P(Ei|H1) ≥ P(Ei|Hk) + δ for all k > 1 and i∈{1,…,n}, and

(b). the pieces of evidence are mutually independent conditional on every Hk, i.e., P(En|Hk∧E1∧…∧Εn-1) = P(En|Hk) holds for all k ∈ {1,…,m} and n∈ω.

Then conclusions (i) and (ii) of theorem 1 hold for H = H1.

Corollary 4

In the special case where for each i∈ω, P(Ei|H1) = p and P(Ei|Hk) = q < p for all k∈{2,…,m}, the conclusion of corollary 1 holds for H = H1.

Proof see appendix.
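For the special case of corollary 4, the posterior has the closed form h/(h + (1−h)·(q/p)^n) with h = P(H1), as derived in the appendix. A minimal sketch of how fast this converges, with a hypothetical function name and hypothetical numbers:

```python
from fractions import Fraction as F

def posterior_h1(h, p, q, n):
    """P(H1 | E1 and ... and En) in the special case of corollary 4:
    h = P(H1), p = P(Ei|H1), q = P(Ei|Hk) < p for every rival Hk,
    evidence independent conditional on every cell of the partition."""
    return h / (h + (1 - h) * (q / p) ** n)

# Hypothetical numbers: prior 1/10, p = 4/5, q = 2/5, so q/p = 1/2.
for n in (0, 1, 10):
    print(n, posterior_h1(F(1, 10), F(4, 5), F(2, 5), n))
# 0 1/10
# 1 2/11
# 10 1024/1033
```

The number of rival hypotheses m does not appear in the formula, because all rivals are assumed to convey the same likelihood q, so their priors simply add up to 1−h.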

If we apply theorem 5 to hypotheses that are conjunctions of several content elements, H = H1∧…∧Hk, then the smallest partition of competing hypotheses that has to be checked in regard to conditional independence of the evidence is the partition {±H1∧…∧±Hk: ±Hi ∈ {Hi,¬Hi}, 1 ≤ i ≤ k}, which contains \( {2}^{\mathrm{k}} \) elements. In conclusion, for complex hypotheses consisting of many content elements, the preconditions of Bayesian convergence to certainty are much stronger than usually presented.
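To make the growth of this partition concrete, the cells ±H1∧…∧±Hk can be enumerated mechanically. The sketch below is only an illustration; the function name and the string labels are hypothetical.

```python
from itertools import product

def partition_cells(elements):
    """Enumerate all sign assignments to the content elements, i.e. the
    cells (+-H1 and ... and +-Hk). True stands for Hi, False for not-Hi."""
    return [tuple(zip(elements, signs))
            for signs in product([True, False], repeat=len(elements))]

cells = partition_cells(["H1", "H2", "H3"])
print(len(cells))  # 8
print(cells[0])    # (('H1', True), ('H2', True), ('H3', True))
```

Each additional content element doubles the number of cells for which conditional independence of the evidence would have to be checked.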

6 Appendix: Proof of theorems

6.1 Proof of theorem 1 and corollaries 1 and 2

Theorem 1 and corollary 1 follow from theorem 5 and corollary 4 by assuming m = 2 and substituting {H,¬H} for {H1,…,Hm}. Then all conditions of theorem 1 hold. The conclusion of theorem 1 is identical with the conclusion of theorem 5 (for H = H1), and the conclusion of corollary 1 is identical with the conclusion of corollary 4. Corollary 2 is an obvious consequence of theorem 1. Q.E.D.

6.2 Proof of theorem 2 and corollary 3

For theorem 2

Since H entails C, P(H|E1∧…∧En) ≤ P(C|E1∧…∧En) must hold for any n∈ω. By irrelevance of C, P(C|E1∧…∧En) = P(C), which implies P(H|E1∧…∧En) ≤ P(C). So all members of the sequence (P(H|E1∧…∧Ei): i∈ω) are bounded above by P(C), which implies (i) \( {\lim}_{\mathrm{n}\to \infty } \)P(H|E1∧…∧En) ≤ P(C). Thus the consequence (ii) of theorem 1 fails (since by assumption P(C) < 1). Since condition (a) of theorem 1 holds, this entails, by simple Modus Tollens, that either condition (b) or condition (c) must fail for at least some members of the evidence sequence. By exploiting result (i) of theorem 1 we can prove an upper bound on the number m of pieces of evidence that may occur before condition (b) or (c) is violated for the first time. This upper bound is given by the inequality

  • \( \frac{\mathrm{P}\left(\mathrm{H}\right)}{\mathrm{P}\left(\mathrm{H}\right)+\mathrm{P}\left(\neg \mathrm{H}\right)\cdotp {\left(1-\updelta \right)}^{\mathrm{m}}} \) ≤ P(C), which gives by algebraic transformations:

  • P(H) + P(¬H)·\( {\left(1-\updelta \right)}^{\mathrm{m}} \) ≥ \( \frac{\mathrm{P}\left(\mathrm{H}\right)}{\mathrm{P}\left(\mathrm{C}\right)} \), thus P(¬H)·\( {\left(1-\updelta \right)}^{\mathrm{m}} \) ≥ \( \frac{\mathrm{P}\left(\mathrm{H}\right)-\mathrm{P}\left(\mathrm{H}\right)\cdotp \mathrm{P}\left(\mathrm{C}\right)}{\mathrm{P}\left(\mathrm{C}\right)} \), so

  • \( {\left(1-\updelta \right)}^{\mathrm{m}} \) ≥ \( \frac{\mathrm{P}\left(\mathrm{H}\right)\cdot \mathrm{P}\left(\neg \mathrm{C}\right)}{\mathrm{P}\left(\neg \mathrm{H}\right)\cdot \mathrm{P}\left(\mathrm{C}\right)} \); thus m·log(1-δ) ≥ log\( \left(\frac{\mathrm{P}\left(\mathrm{H}\right)\cdotp \mathrm{P}\left(\neg \mathrm{C}\right)}{\mathrm{P}\left(\neg \mathrm{H}\right)\cdotp \mathrm{P}\left(\mathrm{C}\right)}\right) \).

Since P(C) ≥ P(H), P(¬H) ≥ P(¬C) follows, so \( \frac{\mathrm{P}\left(\mathrm{H}\right)\cdotp \mathrm{P}\left(\neg \mathrm{C}\right)}{\mathrm{P}\left(\neg \mathrm{H}\right)\cdotp \mathrm{P}\left(\mathrm{C}\right)} \) ≤ 1 and the log on the right side is negative; likewise log(1-δ) is negative. Dividing by the negative log(1-δ) turns ≥ into ≤ and we obtain m ≤ |log \( \frac{\mathrm{P}\left(\mathrm{H}\right)\cdotp \mathrm{P}\left(\neg \mathrm{C}\right)}{\mathrm{P}\left(\neg \mathrm{H}\right)\cdotp \mathrm{P}\left(\mathrm{C}\right)} \) | / |log(1-δ)| (“|…|” for “absolute value”), as announced. After the first failure of condition (b) or (c) at position m+1, the conjunction of (b) and (c) need not fail always, but it must fail infinitely many times. For if it failed only finitely many times, the proof of theorems 1 and 5 would still go through. To see this, consider the following line in the proof of theorem 5 (annotated by “(by factoring out)”):

$$ \mathrm{P}\left({\mathrm{H}}_1|{\mathrm{E}}_1\wedge \cdots \wedge {\mathrm{E}}_{\mathrm{n}}\right)\ge \frac{\mathrm{h}\cdot \prod \left\{{\mathrm{p}}_{\mathrm{i}}:1\le \mathrm{i}\le \mathrm{n}\right\}}{\mathrm{h}\cdot \Pi \left\{{\mathrm{p}}_{\mathrm{i}}:1\le \mathrm{i}\le \mathrm{n}\right\}+\sum \left\{\mathrm{P}\left({\mathrm{H}}_{\mathrm{r}}\right):1\le \mathrm{r}\le \mathrm{m},\mathrm{r}>1\right\}\cdot \Pi \left\{\left({\mathrm{p}}_{\mathrm{i}}-\delta \right):1\le \mathrm{i}\le \mathrm{n}\right\}} $$

If there were a merely finite subsequence of pieces of evidence for which (b)∧(c) fails, we could re-index the evidence sequence by enumerating all members for which condition (b)∧(c) holds. In the equation for this modified situation, we would have to multiply the product Π in the numerator and in the left summand of the denominator of the above equation by a small number x, with 0 < x ≤ 1, where x is the product of the terms P(En|Hk∧E1∧…∧En-1) for the finitely many En’s for which (b) fails (note that each of these terms must be greater than zero, since otherwise the terms P(En*|Hk∧E1∧…∧En*-1) for later n* > n for which (b)∧(c) holds could not be greater than zero). Likewise the right summand of the denominator would have to be multiplied by a small number y, with 0 < y ≤ 1, being the product of the terms P(En|¬Hk∧E1∧…∧En-1) for the finitely many En’s for which (c) fails. This multiplication just amounts to a change of the priors; but since the theorem holds for all priors, it would still go through.
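The upper bound m ≤ |log(P(H)·P(¬C) / (P(¬H)·P(C)))| / |log(1-δ)| derived above is easy to evaluate. The following sketch uses hypothetical values for P(H), P(C) and δ and a hypothetical function name; it also cross-checks the result against the lower bound P(H) / (P(H) + P(¬H)·(1-δ)^m) from which the inequality was obtained.

```python
import math

def max_evidence_before_failure(prior_h, prior_c, delta):
    """Upper bound on the number m of pieces of evidence that can occur
    before condition (b) or (c) of theorem 1 is violated for the first time.
    Assumes 0 < prior_h <= prior_c < 1 and 0 < delta < 1."""
    ratio = (prior_h * (1 - prior_c)) / ((1 - prior_h) * prior_c)
    return abs(math.log(ratio)) / abs(math.log(1 - delta))

def lower_bound(prior_h, delta, m):
    """Lower bound on P(H|E1 and ... and Em) from result (i) of theorem 1."""
    return prior_h / (prior_h + (1 - prior_h) * (1 - delta) ** m)

# Hypothetical values: P(H) = 0.2, P(C) = 0.5, delta = 0.1.
bound = max_evidence_before_failure(0.2, 0.5, 0.1)
print(math.floor(bound))  # 13
```

With these values the lower bound on the posterior is still below P(C) = 0.5 at m = 13 but exceeds it at m = 14, so conditions (b) and (c) cannot both hold for the first 14 pieces of evidence.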

Corollary 3 is an obvious consequence of theorem 2 and the definition of “full genuine confirmation”.

6.3 Proof of theorem 3

Take some n∈ω and consider P(En|H∧E1∧…∧Εn-1). For simplicity we abbreviate En as E2 and E1∧…∧Εn-1 as Ε1. We have to derive P(E2|(H1\( \overset{\cdotp }{\vee } \)H2)∧E1) > P(E2|H1\( \overset{\cdotp }{\vee } \)H2) from our assumptions. By treating “P(−|E1)” as a probability function and simple computation, the left term of this inequality is identical with

  1. (a)

    [P(E2∧H1|E1) + P(E2∧H2|E1)] / [P(H1|E1) + P(H2|E1)];

and the right term is identical with

  1. (b)

    [P(E2∧Η1) + P(E2∧Η2)] / [P(H1) + P(H2)].

By the multiplication theorem and since the pieces of evidence are sequentially independent conditional on H1 and on H2, we may transform (a) and (b) equivalently into

(a’) [P(E2|H1)·P(H1|E1) + P(E2|H2)·P(H2|E1)] / [P(H1|E1) + P(H2|E1)], and

(b’) [P(E2|H1)·P(Η1) + P(E2|H2)·P(Η2)] / [P(H1) + P(H2)].

We abbreviate hi = P(Hi) and hi* = P(Hi|E1); w = h1/(h1+h2), w* = h1*/(h1*+h2*) (“w” for “weight”). Note that h2/(h1+h2) = 1–w and h2*/(h1*+h2*) = 1–w*. So we can reformulate the terms (a’) and (b’) as follows:

(a”) P(E2|H1)·w* + P(E2|H2)·(1–w*), and

(b”) P(E2|H1)·w + P(E2|H2)·(1–w).

Since (a”) = P(E2|(H1\( \overset{\cdotp }{\vee } \)H2)∧E1) and (b”) = P(E2|H1\( \overset{\cdotp }{\vee } \)H2), we must show that (a”) > (b”). By our assumptions it holds that P(E2|H1) > P(E2|H2) (and thus P(E2|H1) > 0). Therefore we can prove (a”) > (b”) by showing that w* > w. This latter claim follows from our assumption that P(E1|H1) > P(E1|H2), which implies P(H1|E1)/P(H1) > P(H2|E1)/P(H2), i.e. h1*/h1 > h2*/h2, and therefore h1*/h2* > h1/h2. This entails by simple algebra that w* = h1*/(h1*+h2*) > w = h1/(h1+h2). Q.E.D.
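The two algebraic steps used at the end of this proof can be spelled out as follows. This is a sketch using only the abbreviations introduced above, under the additional assumption that h2 and h2* are greater than zero, so that the ratios are defined. First, the weights are increasing functions of the corresponding ratios:

$$ \mathrm{w}=\frac{{\mathrm{h}}_1}{{\mathrm{h}}_1+{\mathrm{h}}_2}=\frac{{\mathrm{h}}_1/{\mathrm{h}}_2}{{\mathrm{h}}_1/{\mathrm{h}}_2+1},\kern1em {\mathrm{w}}^{\ast }=\frac{{\mathrm{h}}_1^{\ast }/{\mathrm{h}}_2^{\ast }}{{\mathrm{h}}_1^{\ast }/{\mathrm{h}}_2^{\ast }+1} $$

Since x/(x+1) is strictly increasing for x ≥ 0, h1*/h2* > h1/h2 yields w* > w. Second, the difference of the two mixture terms factorizes:

$$ \left[\mathrm{P}\left({\mathrm{E}}_2|{\mathrm{H}}_1\right)\cdot {\mathrm{w}}^{\ast }+\mathrm{P}\left({\mathrm{E}}_2|{\mathrm{H}}_2\right)\cdot \left(1-{\mathrm{w}}^{\ast}\right)\right]-\left[\mathrm{P}\left({\mathrm{E}}_2|{\mathrm{H}}_1\right)\cdot \mathrm{w}+\mathrm{P}\left({\mathrm{E}}_2|{\mathrm{H}}_2\right)\cdot \left(1-\mathrm{w}\right)\right]=\left({\mathrm{w}}^{\ast }-\mathrm{w}\right)\cdot \left(\mathrm{P}\left({\mathrm{E}}_2|{\mathrm{H}}_1\right)-\mathrm{P}\left({\mathrm{E}}_2|{\mathrm{H}}_2\right)\right) $$

Both factors on the right are positive, which gives the required strict inequality.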

6.4 Proof of theorem 4

We prove that the partition (H1∧¬C)\( \overset{\cdotp }{\vee } \)¬H1 of ¬H satisfies the conditions of theorem 3 (thus, the ¬H of theorem 4 instantiates the H of theorem 3). From this it follows that our ¬H violates conditional independence (condition (c) of theorem 1) for every piece of evidence En with n ≥ 2. To show that the conditions of theorem 3 are satisfied, we must prove:

  1. (a).

    P(Ei|H1∧¬C) > P(Ei|¬H1) holds for all i∈ω,

  2. (b).

    the pieces of evidence are predecessor-independent conditional on ¬H1,

  3. (c).

    they are predecessor-independent conditional on H1∧¬C.

For (a): For all i∈ω: P(Ei|H1∧¬C) = P(Ei|H1) holds by the irrelevance of ¬C conditionally on H1. (Since we required P(H1∧¬C) > 0, P(Ei|H1∧¬C) is defined.) Moreover P(Ei|H1) > P(Ei) > 0 since H1 satisfies the conditions of theorem 1. This implies that P(Ei|H1) > P(Ei|¬H1), and therefore that P(Ei|H1∧¬C) > P(Ei|¬H1).

For (b): This holds since H1 satisfies the conditions of theorem 1.

For (c): We consider P(En|H1∧¬C∧E1∧…∧Εn-1) (for some n∈ω) and abbreviate En as E2 and E1∧…∧Εn-1 as Ε1. We must show that P(E2|H1∧¬C∧E1) = P(E2|H1∧¬C). (Since P(H1∧¬C) > 0 and P(E1|H1∧¬C) = P(E1|H1) > 0 by (a) above, P(H1∧¬C∧E1) > 0 and P(E2|H1∧¬C∧E1) is defined.) We have

P(E2|H1∧¬C∧E1) = P(E2∧E1|H1∧¬C) / P(E1|H1∧¬C) by probability calculus (treating “P(−| H1∧¬C)” as a probability function), and

P(E2∧E1|H1∧¬C) = P(E2∧E1|H1) as well as P(E1|H1∧¬C) = P(E1|H1), by irrelevance of ¬C to evidence conjunctions conditional on H1. Therefore:

P(E2|H1∧¬C∧E1) = P(E2∧E1|H1) / P(E1|H1) = …

We continue as follows:

= P(E2|H1∧E1) (by probability calculus)

= P(E2|H1) (since H1 satisfies condition (b) of theorem 1)

= P(E2|H1∧¬C) (by conditional irrelevance of ¬C conditional on H1). Q.E.D.

6.5 Proof of theorem 5

We abbreviate P(Ei|H1) as pi and P(H1) as h, and write Σ{x1,…,xn} and Π{x1,…,xn} for the sum and the product of the numbers x1,…,xn, respectively. We calculate as follows. By Bayes’ theorem:

$$ \mathrm{P}\left({\mathrm{H}}_1|{\mathrm{E}}_1\wedge \cdots \wedge {\mathrm{E}}_{\mathrm{n}}\right)=\mathrm{P}\left({\mathrm{E}}_1\wedge \cdots \wedge {\mathrm{E}}_{\mathrm{n}}|{\mathrm{H}}_1\right)\cdot \mathrm{P}\left({\mathrm{H}}_1\right)/\Sigma \left\{\mathrm{P}\left({\mathrm{E}}_1\wedge \cdots \wedge {\mathrm{E}}_{\mathrm{n}}|{\mathrm{H}}_{\mathrm{r}}\right)\cdot \mathrm{P}\left({\mathrm{H}}_{\mathrm{r}}\right):1\le \mathrm{r}\le \mathrm{m}\right\}=\dots $$

Since P(E1∧…∧En|Hr) = Π{P(Ei|Hr∧E1∧…∧Ei-1): 1 ≤ i ≤ n}, and because of condition (b) of theorem 5, we continue as follows:

$$ {\displaystyle \begin{array}{c}\dots =\frac{\mathrm{h}\cdotp \Pi \left\{{\mathrm{p}}_{\mathrm{i}}:1\le \mathrm{i}\le \mathrm{n}\right\}}{\mathrm{h}\cdotp \Pi \left\{{\mathrm{p}}_{\mathrm{i}}:1\le \mathrm{i}\le \mathrm{n}\right\}+\Sigma \left\{\mathrm{P}\left({\mathrm{H}}_{\mathrm{r}}\right)\cdotp \Pi \left\{\mathrm{P}\left({\mathrm{E}}_{\mathrm{i}}|{\mathrm{H}}_{\mathrm{r}}\right):1\le \mathrm{i}\le \mathrm{n}\right\}:1\le \mathrm{r}\le \mathrm{m},\mathrm{r}>1\right\}}\\ {}\ge \frac{\mathrm{h}\cdotp \Pi \left\{{\mathrm{p}}_{\mathrm{i}}:1\le \mathrm{i}\le \mathrm{n}\right\}}{\mathrm{h}\cdotp \Pi \left\{{\mathrm{p}}_{\mathrm{i}}:1\le \mathrm{i}\le \mathrm{n}\right\}+\Sigma \left\{\mathrm{P}\left({\mathrm{H}}_{\mathrm{r}}\right)\cdotp \Pi \left\{\left({\mathrm{p}}_{\mathrm{i}}-\updelta \right):1\le \mathrm{i}\le \mathrm{n}\right\}:1\le \mathrm{r}\le \mathrm{m},\mathrm{r}>1\right\}}\;\left(\mathrm{from}\ \mathrm{condition}\ \left(\mathrm{a}\right)\right)\\ {}=\frac{\mathrm{h}\cdotp \Pi \left\{{\mathrm{p}}_{\mathrm{i}}:1\le \mathrm{i}\le \mathrm{n}\right\}}{\mathrm{h}\cdotp \Pi \left\{{\mathrm{p}}_{\mathrm{i}}:1\le \mathrm{i}\le \mathrm{n}\right\}+\Sigma \left\{\mathrm{P}\left({\mathrm{H}}_{\mathrm{r}}\right):1\le \mathrm{r}\le \mathrm{m},\mathrm{r}>1\right\}\cdotp \Pi \left\{\left({\mathrm{p}}_{\mathrm{i}}-\updelta \right):1\le \mathrm{i}\le \mathrm{n}\right\}}\;\left(\mathrm{by}\ \mathrm{factoring}\ \mathrm{out}\right)\\ {}=\frac{\mathrm{h}\cdotp \Pi \left\{{\mathrm{p}}_{\mathrm{i}}:1\le \mathrm{i}\le \mathrm{n}\right\}}{\mathrm{h}\cdotp \Pi \left\{{\mathrm{p}}_{\mathrm{i}}:1\le \mathrm{i}\le \mathrm{n}\right\}+\left(1-\mathrm{h}\right)\cdotp \Pi \left\{\left({\mathrm{p}}_{\mathrm{i}}-\updelta \right):1\le \mathrm{i}\le \mathrm{n}\right\}}\\ {}\left(\mathrm{since}\ \Sigma \left\{\mathrm{P}\left({\mathrm{H}}_{\mathrm{r}}\right):1\le \mathrm{r}\le \mathrm{m},\mathrm{r}>1\right\}=\mathrm{P}\left(\neg \mathrm{H}\right)=1-\mathrm{h}\right)\\ 
{}=\frac{\mathrm{h}}{\mathrm{h}+\left(1-\mathrm{h}\right)\cdotp \frac{\Pi \left\{\left({\mathrm{p}}_{\mathrm{i}}-\updelta \right):1\le \mathrm{i}\le \mathrm{n}\right\}}{\Pi \left\{{\mathrm{p}}_{\mathrm{i}}:1\le \mathrm{i}\le \mathrm{n}\right\}}}\;\left(\mathrm{by}\ \mathrm{dividing}\ \mathrm{through}\ \Pi \left\{{\mathrm{p}}_{\mathrm{i}}:1\le \mathrm{i}\le \mathrm{n}\right\}\right).\end{array}} $$

Now, \( \frac{\Pi \left\{\left({p}_i-\delta \right):1\le i\le n\right\}}{\Pi \left\{{p}_i:1\le i\le n\right\}} \) ≤ \( {\left(1-\delta \right)}^n \) holds, following from (pi-δ)/pi ≤ 1-δ (which holds because pi ≤ 1). This implies conclusion (i) of theorem 1 for H = H1 (= conclusion of theorem 5, first conjunct). Conclusion (ii) of theorem 1 for H = H1 (= conclusion of theorem 5, second conjunct) follows, since \( {\lim}_{n\to \infty }{\left(1-\delta \right)}^n=0 \).

That corollary 4 holds is seen by inserting the exact values P(Ei|H1) = p and P(Ei|Hr) = q < p for all r ∈{2,…,m} into the line of the preceding proof annotated by “(by factoring out)”. This insertion gives us

$$ {\displaystyle \begin{array}{c}\mathrm{P}\left({\mathrm{H}}_1|{\mathrm{E}}_1\wedge \dots \wedge {\mathrm{E}}_{\mathrm{n}}\right)=\frac{\mathrm{h}\cdotp {\mathrm{p}}^{\mathrm{n}}}{\mathrm{h}\cdotp {\mathrm{p}}^{\mathrm{n}}+\Sigma \left\{\mathrm{P}\left({\mathrm{H}}_{\mathrm{r}}\right):1\le \mathrm{r}\le \mathrm{m},\mathrm{r}>1\right\}\cdotp {\mathrm{q}}^{\mathrm{n}}}=\frac{\mathrm{h}}{\mathrm{h}+\left(1-\mathrm{h}\right)\cdotp {\left(\mathrm{q}/\mathrm{p}\right)}^{\mathrm{n}}}\\ {}\left(\mathrm{since}\ \Sigma\ \left\{\mathrm{P}\left({\mathrm{H}}_{\mathrm{r}}\right):1\le \mathrm{r}\le \mathrm{m},\mathrm{r}>1\right\}=\mathrm{P}\left(\neg \mathrm{H}\right)=1-\mathrm{h},\mathrm{and}\ \mathrm{by}\ \mathrm{division}\ \mathrm{through}\ {\mathrm{p}}^{\mathrm{n}}\right)\end{array}} $$

which is the conclusion of corollary 1. Q.E.D.