THE STRENGTH OF TRUTH-THEORIES

RICHARD G HECK JR.

1. MOTIVATIONAL REMARKS

Tarski's classic paper "The Concept of Truth in Formalized Languages" is nicely representative of the state of logic in the 1930s: It is as much about what one cannot do as it is about what one can do. On the negative (or 'limitative') side, we have Tarski's celebrated theorem on the indefinability of truth. On the positive (or 'constructive') side, we have Tarski's demonstration that, for a large range of theories T, it is possible to add a theory of truth to T in such a way that the resulting theory is not only consistent (if T is) but also fruitful: Within it, we can prove the sorts of meta-mathematical results for which the notion of truth was then already being used. In particular, if we add a theory of truth to Peano arithmetic, PA (if, that is, we add axioms like "A conjunction is true iff both its conjuncts are true", and so forth), then we will be able to prove that PA is consistent by the following sort of argument: The axioms are all true; the rules of inference preserve truth; hence every theorem of PA is true; but some sentences, such as '0 = 1', are false; so some sentences are not theorems of PA; so PA is consistent.

Since PA plus a truth-theory proves that PA is consistent, it follows from Gödel's second incompleteness theorem that the former is stronger than the latter. It is tempting, therefore, to want to use this fact to interpret Tarski's famous claim in "The Semantic Conception of Truth" that the metalanguage must be 'essentially richer' than the object language (Tarski, 1944, p. 354). As we shall see, however, that would be to confuse a question about expressive power with a question about logical strength. It is possible to formalize a materially adequate theory of truth for the language of set-theory in a meta-theory that is as weak as it is a priori possible for it to be: one interpretable in Robinson arithmetic.
If so, then Tarski's claim about essential richness cannot concern logical strength. Not, at least, if it is to have any hope of being true. One might yet wonder, though, if there is not some way of understanding what a truth-theory buys us in terms of logical strength. And here is where we meet the central motivation for the present paper.

It seems to be widely believed that a truth-theory by itself has no logical power at all. The proof of PA's consistency mentioned above depends not just upon the availability of a theory of truth but also upon our extending the induction axioms beyond those of PA to permit semantic vocabulary. If we do not allow 'semantic' induction, then the resulting theory is a conservative extension of PA. Of course, there are other ways of comparing the strength of theories. In particular, it is compatible with a theory T's being a conservative extension of another theory U that T should not be interpretable in U. But if we take PA and add a truth-theory, then the result is interpretable in PA so long as we do not extend induction. So that might seem to seal it: Truth-theories have no logical strength on their own.

Author's note: This manuscript dates from about 2009, with some significant updates having been made around 2011. Around then, however, I decided that the paper was becoming unmanageable and that I was trying to do too many things in it. I have therefore exploded the paper into several pieces, which will be published separately. I am putting this version on the web simply because it has been cited in a few different places and so should be publicly available. I should have done this a long time ago. I should have finished the paper a long time ago. But since 2010, my time has largely been devoted to finishing my two books on Frege, and even this draft remains a mess. Terminology and notation are inconsistent, and some of the proofs aren't quite right. So, caveat lector.
It will emerge below, however, that PA is, in several respects, a very special case. What does or doesn't happen when we add a truth-theory to PA is not uninteresting, but it is often very different from what happens when we add one to some other theory, in particular, to a finitely axiomatized theory. And it seems to me that, if we want to know how strong truth-theories are on their own, then the right question to ask is not "What happens when you add a truth-theory to PA?" but: What happens when we add a truth-theory to an arbitrary theory T? Once we have reframed the investigation in these terms, then several sorts of questions become natural:

(1) What is the weakest theory T such that the result of adding a truth-theory to T yields a materially adequate theory of truth for the language of T?
(2) What, in general, is the strength of such a theory, as compared to that of T, if we do not extend whatever induction axioms are present in T to permit semantic vocabulary?
(3) What happens if we do extend T's induction axioms? In particular, for which theories T does the result of adding the truth-theoretic axioms and extending T's induction scheme allow us to prove the consistency of T?
(4) What is the strength of the theory mentioned in (3), as compared to that of the 'base' theory T?

Much turns on precisely how we formulate the truth-theory and on what sorts of base theories are in question. In particular, we shall see that the usual way of 'adding a truth-theory', though it allows a nice answer to (2), gives us only dissatisfying answers to (3) and (4). But there is a different, and older, way to proceed (Tarski's original way) that allows answers to these questions that are about as elegant as one could hope they would be.

The plan for the paper is as follows. In an effort to make the discussion as accessible as possible I will quickly introduce in Section 2 some of the central concepts from logic that we shall be using.
In Section 3, we'll discuss the usual way of 'adding a truth-theory' and see that there is a materially adequate and fully compositional theory of truth for the language of arithmetic that is about as weak as it could be. Section 4 introduces some machinery from the study of sub-systems of PA which may be less familiar. This is applied in Section 5, where we will get our first characterization of the strength of truth-theories and see a first respect in which PA is a special case, as well as discover some annoying limitations of the approach we will have been pursuing to that point. Section 6 explores a different way of 'adding a truth-theory' and gives nice answers to the questions above. We'll also see another, more impressive respect in which PA is a special case. Finally, Section 7 briefly considers how our results bear upon some philosophical questions about the role truth-theories play in semantic consistency proofs.

2. LOGICAL PRELIMINARIES

2.1. Interpretability. The languages in which we'll be interested here are first-order languages, constructed from atomic expressions (terms, function-symbols, and predicates of one or more places) in the usual way. These languages will also be finite, in the sense that they have only finitely many atomic expressions. It is convenient to identify a language with the set of its atomic expressions, together with some indication of their logical type, that is, with what is sometimes called the 'signature' of the language.

A theory here is always a recursively axiomatized theory, unless otherwise stated, and, officially, we understand the notion in an intensional sense: A theory is not a set of axioms but a 'presentation' of a set of axioms. Formally, a theory can be understood as given by a formula in one free variable, where the axioms of the theory are the sentences of whose Gödel numbers that formula is true.
When a theory has only finitely many axioms, the distinction between intensional and extensional conceptions more or less lapses. But it does matter, in general, as Feferman (1960) made abundantly clear. A theory is 'stated in' a language.

There are a number of ways of comparing the logical strength of theories. If the theories are stated in the same language, then the obvious question is whether one proves all the results the other proves. Comparison is more difficult when the theories are stated in different languages. In that case, the theories will trivially prove different theorems: If A is in the language of the one but not the other, then ⌜A ∨ ¬A⌝ will be a theorem of the one but not the other; this is true even if the (non-logical) axioms of the two theories are the same. If the language of one theory contains that of the other, then one way to compare them is to ask if the first is a 'conservative extension' of the second, that is, whether the theory in the extended language proves any new theorems that can be stated in the original language. But even this fails if the theories are not so related. In that case, the usual method of comparison uses the notion of interpretation, which was first explored in a systematic way by Tarski, Mostowski, and Robinson (1953), although the basic idea is much older.

Let theories B (for 'base') and T (for 'target') be given, stated in languages LB and LT, respectively. A relative interpretation[1] of T in B consists of two parts: a translation of LT into LB, and proofs in B of the translations of the axioms of T. The translation is compositional, in the sense that the only thing we actually need to do is define the (non-logical) atomic expressions of LT in terms of those of LB[2] and specify a 'domain' for the interpretation in terms of a formula δ(x) of LB.
This can then be extended to a complete translation of LT into LB in the obvious way, where quantifiers are 'relativized' to δ(x): ∀xφ(x) is translated as ∀x(δ(x) → φ∗(x)), where φ∗(x) translates φ(x); ∃xφ(x), as ∃x(δ(x) ∧ φ∗(x)). As well as proofs of the translations of the axioms, we also need proofs of δ(t∗), for each atomic term t of LT, and of the closure condition

∀x1 ⋯ xn(δ(x1) ∧ ⋯ ∧ δ(xn) → δ(f∗(x1, ..., xn)))

for each primitive function-symbol f, of however many places. We also need (if this isn't already covered) a proof that the domain is non-empty: ∃xδ(x).

It follows that, if B is consistent, so is T. If a contradiction could be derived from the axioms of T, that proof could be mimicked in B: Just prove the translations of the axioms of T used in the proof of the contradiction, then append (a modified version of) the proof given in T. Indeed, quite generally, if Σ ⊢_T A, then Σ∗ ⊢_B A∗, where, again, the asterisk means: translation of. Moreover, if B and T are not too terribly weak,[3] then all of this will be provable in B and T themselves. So, in particular, T will prove Con(B) → Con(T) and so cannot prove Con(B), though B might well prove Con(T). Note that interpretability is transitive and reflexive.

[1] In fact, there are several different notions of interpretation. We shall only need this one.
[2] It is convenient to allow terms and function-symbols to be translated using descriptions, which can then be eliminated as Russell taught. In that case, we need B to prove that the descriptions are proper.
[3] Facts concerning interpretability can generally be verified in the theory known as I∆0 + ω1, for which see below.

One way to give content to the idea that B is at least as strong as T is therefore to take it to mean: T is relatively interpretable in B.
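The relativized translation described above is mechanical enough to sketch in code. The following Python fragment is only an illustration, not part of the formal development: the tuple encoding of formulas, the function name `translate`, and the predicate names `Lt` and `D` are all assumptions made for the example, and the sketch covers only relational languages (no function-symbols, so no closure conditions arise).

```python
# Formulas as nested tuples (a hypothetical encoding chosen for this sketch):
#   ("atom", name, args)              atomic formula
#   ("not", A), ("and", A, B), ("imp", A, B)
#   ("all", x, A), ("ex", x, A)       quantifiers binding the variable x

def translate(phi, atoms, delta):
    """Translate phi, relativizing every quantifier to the domain formula.

    atoms: maps each predicate name of the target language to a function
           that builds its defining base-language formula from the arguments.
    delta: maps a variable name x to the base-language formula delta(x).
    """
    tag = phi[0]
    if tag == "atom":
        _, name, args = phi
        return atoms[name](args)
    if tag == "not":
        return ("not", translate(phi[1], atoms, delta))
    if tag in ("and", "imp"):
        return (tag,
                translate(phi[1], atoms, delta),
                translate(phi[2], atoms, delta))
    if tag == "all":
        # forall x. phi  becomes  forall x (delta(x) -> phi*)
        _, x, body = phi
        return ("all", x, ("imp", delta(x), translate(body, atoms, delta)))
    if tag == "ex":
        # exists x. phi  becomes  exists x (delta(x) & phi*)
        _, x, body = phi
        return ("ex", x, ("and", delta(x), translate(body, atoms, delta)))
    raise ValueError(f"unknown formula tag: {tag!r}")


# A pure relativization: atoms are translated by the identity, and the only
# substantial change is the restriction to a new domain D.
identity_atoms = {"Lt": lambda args: ("atom", "Lt", args)}
domain = lambda x: ("atom", "D", (x,))

phi = ("all", "x", ("ex", "y", ("atom", "Lt", ("x", "y"))))
phi_star = translate(phi, identity_atoms, domain)
```

Here `phi_star` encodes ∀x(D(x) → ∃y(D(y) ∧ Lt(x, y))). An interpretation proper also requires proofs, in the base theory, of the translated axioms; the sketch covers only the translation half.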
That this is a useful way to give content to the intuitive idea of relative strength emerged only after a good deal of hard work, beginning with Tarski, Mostowski, and Robinson (1953) and continuing through work by Feferman (1960) to the present day (e.g., Visser, 2006).

Though the notion of interpretation is particularly useful when we are dealing with theories stated in different languages, we can still ask whether T can be interpreted in B even when LT and LB are the same: The interpretation of the atomic vocabulary does not have to be the identity function. But of course it can be, and in that case the interpretation can take a very simple form, which we might call a pure relativization: The only substantial part of the interpretation is the relativization to a new domain. Many of the interpretations in which we shall be interested are of this form.

Now, a couple of definitions that apply (sensibly) only to non-finitely axiomatized theories.

Definition. T is said to be locally interpretable in B if every finite subset of T is interpretable in B.

Local interpretability obviously follows from interpretability, which is also known as 'global' interpretability. The converse is not true. Local interpretability is also transitive and reflexive, and it relates to relative consistency just as global interpretability does: If T is locally interpretable in B, then T is consistent if B is. The reason is that any proof of a contradiction in T will use only finitely many of T's axioms.

As said above, Peano arithmetic is going to turn out to be something of a special case. This is because PA is reflexive (Mostowski, 1952), in the following sense.

Definition. T is reflexive if T proves the consistency of each of its finite sub-theories.

PA's reflexivity can cause all sorts of unexpected phenomena as regards interpretability in PA. What will matter most to us here is the fact that reflexive theories collapse the distinction between local and global interpretability.
Theorem (Orey's Theorem). Suppose that T is locally interpretable in B and that B is reflexive. Then T is (globally) interpretable in B.

The proof of this result was first published in Feferman's classic paper "The Arithmetization of Metamathematics in a General Setting", which was also, of course, where the 'unexpected phenomena' just mentioned first appeared.

2.2. Fragments of Arithmetic. As mentioned earlier, we are going to be interested in the general question what happens when we add a truth-theory to some arbitrary theory T. In practice, however, we shall mostly be concerned with PA and certain of its sub-theories. Let's meet them.

Robinson arithmetic, or Q, is the theory whose axioms are the universal closures of the following eight formulae:

Q1  Sx ≠ 0
Q2  Sx = Sy → x = y
Q3  x + 0 = x
Q4  x + Sy = S(x + y)
Q5  x × 0 = 0
Q6  x × Sy = (x × y) + x
Q7  x ≠ 0 → ∃y(x = Sy)
Q8  x < y ≡ ∃z(y = Sz + x)

The last is often considered a definition of <; it is convenient in the present context to regard < as just part of the language. The language of Q, {0, S, +, ×, <}, is what we shall call 'the language of arithmetic' and denote: A.

Q is in many ways extremely weak. It fails to prove such obvious facts as that x ≠ Sx. But it is in other ways strong. For our purposes, the crucial fact is that Q is strong enough to allow us to do Gödel numbering and therefore some very basic syntax. For example, Q will allow us to say and prove such things as that '0 = S0 ∧ 0 = SS0' is the conjunction of '0 = S0' and '0 = SS0'. Here's the short version: Q is terrible at proving generalizations, but it's very good at proving particular facts.

A formula is said to be ∆0 (a.k.a., Σ0) if all quantifiers contained in it are 'bounded', that is, if all of its quantified subformulae are of the form ∀x(x < t → ...) or ∃x(x < t ∧ ...), where t is a term. These are customarily abbreviated: ∀x < t(...) and ∃x < t(...).
A formula is Σ1 (resp., Π1) if it is of the form ∃xφ (resp., ∀xφ), where φ is ∆0. A formula is Σn (resp., Πn) if it is ∃xφ (resp., ∀xφ), where φ is Πn−1 (resp., Σn−1). We can now say precisely how good Q is at proving particular facts: Q proves all true Σ1 sentences of the language of arithmetic.

An important class of sub-theories of PA is characterized in terms of the induction axioms these theories permit. PA itself is Q plus the full induction scheme:

A(0) ∧ ∀x(A(x) → A(Sx)) → ∀x(A(x)),

where A(x) is any formula at all. The theory IΘ is Q plus induction for formulae in the set Θ: So A(x) has to be in Θ. Thus, I∆0 is Q plus induction for ∆0 formulae, and IΣ1 is Q plus induction for Σ1 formulae.

I∆0 is in one sense clearly stronger than Q: It proves lots of important generalizations about the natural numbers. But in another sense it is still a very weak theory: It is interpretable in Q.[4] Another respect in which I∆0 is weak is that, although one can define the relation y = 2^x by means of a ∆0 formula exp(x, y), we cannot prove in I∆0 that exponentiation is total; that is, we cannot prove: ∀x∃y(exp(x, y)). The obvious proof proceeds by induction on ∃y(exp(x, y)), which is Σ1. But for that very reason, the totality of exponentiation is provable in IΣ1, as is the totality of every other primitive recursive function.[5] So IΣ1 is much stronger than I∆0: Indeed, IΣ1 proves Con(I∆0).

The final theory we shall need is known as I∆0 + ω1. Here, ω1(x) is a certain function that, like 2^x, is ∆0-definable but not I∆0-provably total. The precise definition varies between authors, but one definition (Visser, 1991, p. 83) is:

ω1(x) = 2^(|x|^2),

where |x| is the least y such that 2^(Sy) > Sx. As said, the relation y = ω1(x) can be defined by a ∆0 formula Ω1(x, y), and I∆0 + ω1 is then I∆0 plus the formula asserting that this relation is total: ∀x∃y(Ω1(x, y)).
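To get a feel for how ω1 sits between polynomial growth and full exponentiation, here is a small Python sketch. It assumes the definition is read as ω1(x) = 2^(|x|^2), with |x| the least y such that 2^(y+1) > x + 1; the function names `length` and `omega1` are chosen for the example.

```python
def length(x: int) -> int:
    """|x|: the least y such that 2**(y + 1) > x + 1."""
    y = 0
    while 2 ** (y + 1) <= x + 1:
        y += 1
    return y


def omega1(x: int) -> int:
    """omega_1(x) = 2 ** (|x| ** 2), under the reading assumed above."""
    return 2 ** (length(x) ** 2)


# omega_1 outgrows a fixed polynomial such as x**3 on large inputs,
# yet stays far below 2**x.
x = 10 ** 6
grows_past_cube = omega1(x) > x ** 3
stays_below_exp = omega1(x) < 2 ** x
```

Since |x| is roughly log2(x), ω1(x) is roughly x^(log2 x): it eventually dominates every polynomial, but only barely, which is one informal way of seeing why adding its totality is a much milder assumption than adding the totality of exponentiation.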
The interest of this theory lies in the fact that it is, as Visser puts it, "just right for treating syntax".[6] And, like I∆0, it is interpretable in Q (Hájek and Pudlák, 1993, p. 367).

As we shall see later, it is sometimes extremely helpful if our language contains no terms other than variables. We shall therefore also want to use what we might call the language of relational arithmetic. This language contains predicate letters Z, P, A, and M in place of 0, S, +, and ×. We shall therefore want axioms asserting that there is a unique zero, and that P, A, and M are function-like:

Z  ∃x(Zx ∧ ∀y(Zy → x = y))
P  ∀x∃y(Pxy ∧ ∀z(Pxz → y = z))
A  ∀x∀y∃z(Axyz ∧ ∀w(Axyw → z = w))
M  ∀x∀y∃z(Mxyz ∧ ∀w(Mxyw → z = w))

It should be clear that theories in the usual language of arithmetic have natural correlates in the language of relational arithmetic. We can thus state a theory QR in this language, with much the same content as Q, by simply adapting the axioms of Q itself. The first four axioms, for example, would be:

QR1  ¬Px0
QR2  Pxz ∧ Pyz → x = y
QR3  Ax0x
QR4  (Pyz ∧ Axzu) ∧ (Axyw ∧ Pwv) → u = v

The first two conjuncts of QR4 say, in effect, that u = x + Sy; the next two, that v = S(x + y). It should be clear that Q and QR are interpretable in one another, in a very straightforward way. Similar things can be said about the other theories mentioned.

[4] That I∆0 is locally interpretable in Q was first proven by Edward Nelson (1986). That it is globally interpretable was proven by Alex Wilkie (Wilkie and Paris, 1987). The proof is discussed both by Hájek and Pudlák (1993, pp. 366–70) and by Burgess (2005, §2.2).
[5] Indeed, IΣ1 is proof-theoretically equivalent to primitive recursive arithmetic.
[6] Wilkie and Paris (1987) were the first to recognize the importance of I∆0 + ω1. One has to use a more "efficient" coding than is customary, however, to get things to work. Hájek and Pudlák (1993, pp. 303ff) give the details.

3. THEORIES OF TRUTH

3.1. Formalizing Compositional Truth-theories.
Since the semantic axioms for the quantifiers, as Tarski bequeathed them to us, make use of sequences of elements from the domain, we shall need a nice theory of sequences if we're to formalize theories of truth. Technically, we'll need our base theory to be sequential.

Definition. Let T be a theory that contains Q, either straightforwardly or by interpretation. T is said to be sequential if, in short, it can code finite sequences of its elements. More precisely, T is sequential if there are formulae lh(s, n) and val(s, n, x) for which T proves:[7]

∃s(lh(s, 0))
∀s∀n{lh(s, n) → ∀m < n ∃x(val(s, m, x))}
∀s∀n{lh(s, n) → ∀y∃t[lh(t, Sn) ∧ ∀z∀k < n(val(s, k, z) ≡ val(t, k, z)) ∧ val(t, n, y)]}

Here, lh(s, n) means: s is a sequence of length n; val(s, n, x) means: the (n + 1)-st element of s is x. So the second of the principles says that every sequence of length n has an element at each position below n; the third says that each sequence can be extended by appending an arbitrary element of the domain; the first assures us that there is a 'null' sequence with which we can begin. We shall use '<>' as a term denoting one of the null sequences whose existence is so guaranteed.[8]

Q is not sequential, but there are lots of sequential theories that are interpretable in Q. For example, I∆0 is sequential, and it is interpretable in Q. More importantly, for our purposes, we can simply add a theory of sequences to Q, by adding new predicates lh(s, n) and val(s, n, x), subject to the principles that characterize sequential theories. This new theory, which we might call Qseq, is interpretable in Q, since it is obviously interpretable in any sequential theory. This fact will allow us to extend our main results to Q, even though they do not apply to Q directly. Note, moreover, that every sequential theory interprets Q.[9]

It should be obvious that we can easily allow val(s, n, x) to have some fixed value, say, 0, if n is beyond the length of s. That is: A theory that contained an axiom to that effect would trivially be interpretable in one that did not. So we shall assume this principle, as well, since it allows us to pretend our sequences are infinite.

[7] We can take 's is a sequence' to be defined as: ∃n(lh(s, n)).
[8] In the cases in which we are interested, there generally will be such a term in the language. If not, then we can conservatively extend whatever theory we are employing by adding such a term, subject to the axiom: lh(<>, 0).

The theory of truth itself will consist of Tarski-style axioms for the logical and non-logical vocabulary. The axioms for the logical part of the language will always be the same:

v  Den_σ(vi, x) ≡ val(σ, i, x), where vi is the ith variable
=  Sat_σ(⌜t = u⌝) ≡ ∃x∃y[Den_σ(t, x) ∧ Den_σ(u, y) ∧ x = y]
¬  Sat_σ(⌜¬A⌝) ≡ ¬Sat_σ(A)
∧  Sat_σ(⌜A ∧ B⌝) ≡ Sat_σ(A) ∧ Sat_σ(B)
∀  Sat_σ(⌜∀vi A(vi)⌝) ≡ ∀τ[τ ∼_i σ → Sat_τ(⌜A(vi)⌝)]

And similarly for the other logical constants.[10] Here, 'Den_σ(t, x)' means: t denotes x with respect to the sequence σ; 'Sat_σ(A)' means: σ satisfies A; and 'τ ∼_i σ' means that τ and σ agree on what they assign to each variable, with the possible exception of vi, i.e.:[11]

∃n < σ[lh(σ, n) ∧ ∀k < n(k ≠ i → ∀x(val(σ, k, x) ≡ val(τ, k, x)))]

In the case of the language of arithmetic, we'll also have these axioms for the non-logical constants:

0  Den_σ('0', x) ≡ x = 0
S  Den_σ(⌜St⌝, x) ≡ ∃y(Den_σ(t, y) ∧ x = Sy)
+  Den_σ(⌜t + u⌝, x) ≡ ∃y∃z[Den_σ(t, y) ∧ Den_σ(u, z) ∧ x = y + z]
×  Den_σ(⌜t × u⌝, x) ≡ ∃y∃z[Den_σ(t, y) ∧ Den_σ(u, z) ∧ x = y × z]
<  Sat_σ(⌜t < u⌝) ≡ ∃y∃z[Den_σ(t, y) ∧ Den_σ(u, z) ∧ y < z]

The pattern should be clear.[12]

In the case of the language of arithmetic, there are at least two simplifications one often meets in practice. First, denotation is actually definable in arithmetic in such a way that the clauses involving it can be proven, so those clauses are often regarded as not really necessary.
Second, one can forego the use of sequences and instead treat quantification substitutionally: ∀viA(vi) is true iff, for each n, A(n), the result of substituting the numeral for n for vi, is true. We shall avoid these simplifications here, however. Both simplifications are specific to the language of arithmetic and are not available in general. We want our results to extend smoothly and naturally to other cases, such as the language of set theory.

Finally, then, we need to define the notion of truth itself.

[9] For lots of details on sequential theories, including the facts mentioned here, see Visser's "Pairs, Sets, and Sequences in First-order Theories" (Visser, 2008). Note that we shall also need to use such facts as that the code of a sequence is always greater than its length. We can always arrange for this sort of thing to be true.
[10] Of course, the other constants are definable in terms of the ones already mentioned, but, in the present context, this is not a particularly interesting or important fact.
[11] It will be important below that this can be made to be ∆1. One way to see that it can be is to note that lh(σ, ) and val(σ, i) are what Boolos (1993, pp. 24–7) calls 'Σ pterms', and ∆1 formulas are closed under substitution of Σ pterms.
[12] It appears to have been Hao Wang (1952) who first worked out the details of this sort of construction.
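The denotation and satisfaction clauses of Section 3.1 can be read as a recursive evaluator, and it may help to see them in that form. The Python sketch below is only an illustration under loudly stated assumptions: the tuple encoding of terms and formulae is invented for the example, and quantifiers range over the finite initial segment {0, ..., bound − 1} purely so that the code terminates. The theory in the text is a set of axioms, not an algorithm, and its quantifiers are unbounded.

```python
def val(sigma, i):
    """Value of sequence sigma at position i; 0 beyond its length."""
    return sigma[i] if i < len(sigma) else 0


def den(sigma, t):
    """Denotation of term t relative to sigma (clauses v, 0, S, +, x)."""
    tag = t[0]
    if tag == "var":
        return val(sigma, t[1])
    if tag == "0":
        return 0
    if tag == "S":
        return den(sigma, t[1]) + 1
    if tag == "+":
        return den(sigma, t[1]) + den(sigma, t[2])
    if tag == "*":
        return den(sigma, t[1]) * den(sigma, t[2])
    raise ValueError(f"unknown term tag: {tag!r}")


def variant(sigma, i, x):
    """A sequence agreeing with sigma except possibly at position i."""
    tau = list(sigma) + [0] * max(0, i + 1 - len(sigma))
    tau[i] = x
    return tuple(tau)


def sat(sigma, a, bound):
    """Does sigma satisfy formula a, with quantifiers ranging below bound?"""
    tag = a[0]
    if tag == "=":
        return den(sigma, a[1]) == den(sigma, a[2])
    if tag == "<":
        return den(sigma, a[1]) < den(sigma, a[2])
    if tag == "not":
        return not sat(sigma, a[1], bound)
    if tag == "and":
        return sat(sigma, a[1], bound) and sat(sigma, a[2], bound)
    if tag == "all":
        # Quantifier clause: every i-variant tau of sigma must satisfy the body.
        return all(sat(variant(sigma, a[1], x), a[2], bound)
                   for x in range(bound))
    raise ValueError(f"unknown formula tag: {tag!r}")


zero = ("0",)
v0 = ("var", 0)
three = ("S", ("S", ("S", zero)))
q1 = ("all", 0, ("not", ("=", ("S", v0), zero)))   # for all v0: S v0 != 0
below_three = ("all", 0, ("<", v0, three))         # for all v0: v0 < SSS0
```

Evaluating `sat((), q1, 10)` starts from the empty sequence, and the quantifier clause then builds the variants it needs. Note that the body of a universal formula is evaluated at the variant τ, not at the original σ.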
Tarski, as is familiar, defines truth in terms of satisfaction by every sequence, thus:

T: T(A) ≡ A is a sentence ∧ ∀σ Sat_σ(A)

Where we are discussing theories of truth over weak arithmetics, however, there is a worry about Tarski's definition, namely, that it 'hides' a quantifier in the definition of truth, so that elimination of that definition can make a formula in which T(x) occurs logically more complex after the elimination than it was before it.[13] For this reason, it is sometimes preferable to use an alternate definition:

T: T(A) ≡ A is a sentence ∧ Sat_<>(A)

So, on this definition, truth is satisfaction by the null sequence. As it happens, however, it is actually better, for our purposes, to use the original definition, so we shall stick with it.

[13] Thanks to Cezary Cieśliński for bringing this issue to my attention. As one example of the problem, if we use Tarski's definition, then T T (which will be defined shortly) will in many cases not prove: T(⌜¬A⌝) ≡ ¬T(⌜A⌝). The usual proof of this rests upon the fact that, if A is a sentence, then ∀σ Sat_σ(A) iff ∃σ Sat_σ(A), and that in turn is normally proven by an induction that may not be formalizable in T T.

So, that's what a theory of truth is. Here is some notation.

Definition. Let T be sequential. Then T T is the theory that extends T by adding truth-theoretic axioms for the logical and non-logical vocabulary of the language of T.

Note that T T does not extend any induction scheme that might be present in T. There is no real chance, then, that T T is going to prove the consistency of T. So one might suspect that T T would logically be no stronger than T. If so, then, as we shall see, one would suspect wrongly, at least in general. But we shall not be ready to prove that until Section 5. First, however, let us note that T T is by no means a trivial extension of T.

Lemma 3.1. T T is a materially adequate, fully compositional theory of truth for the language of T. In particular: For each sentence A in the language of T, T T proves: T(⌜A⌝) ≡ A.

Proof. A rigorous proof would be by induction on the complexity of sentences of L. But this should be fairly obvious. A little experimentation will reveal that proofs of 'T-sentences' need no more than is available in Qseq: We're not proving any general laws, just a bunch of particular facts, and Q is very good at proving particular facts, no matter how bad it may be at proving general laws. □

To put this differently: T T defines truth for sentences in the language of T. Since T is sequential, it interprets Q, so we know from Tarski's indefinability theorem that T itself cannot define truth for all sentences in the language of T. So T T is always expressively more powerful than T.

Before we continue to explore T T, let me state a couple of obvious corollaries of Lemma 3.1 that we shall need below.

Corollary 3.2. T T proves, of each axiom of T, that it is true.

Proof. Let A be an axiom of T. By Lemma 3.1, T T proves T(⌜A⌝) ≡ A. Since T T obviously proves A, it proves T(⌜A⌝), too. □

The same, of course, goes for the theorems of T, but we shall not need that fact.

Corollary 3.3. Let T be a finitely axiomatized sequential theory. Then T T proves the formalization of "all axioms of T are true".[14]

Note the contrast with Corollary 3.2: If T is infinitely axiomatized, there is no reason whatsoever to suspect that T T will prove that all axioms of T are true, although it does prove that each axiom of T is true. Indeed, we have the following.

Proposition 3.4. PAT does not prove that all axioms of PA are true.

This follows from Corollary 5.10, to be proven below, and it begins to illustrate one of the senses in which PA is a special case.

3.2. Object-language versus Object-theory.
I suspect that Lemma 3.1 will have surprised at least some people.[15] If it didn't, then maybe this will.[16]

Corollary 3.5. QseqT is a materially adequate, fully compositional theory of truth for the language of arithmetic.

[14] Since T is sequential, it interprets Q, which means that we can develop enough syntax in T to allow us to formalize "all axioms of T are true".
[15] I'm cheating. I've already seen it surprise a fair number of people.
[16] Something close to this result is present in Wang's paper "Truth Definitions and Consistency" (Wang, 1952), though Wang is interested in definitions of truth, whereas we are interested in theories of truth. What Wang shows (see his theorem 11) is that a theory he calls S2, which is basically adjunctive set theory (see below) plus predicative second-order quantification plus separation, defines truth for the language of set theory. As he notes, the result could be improved "if we could develop number theory in S2 without using" separation. At the time, it was not known that this could be done, but the interpretability of Q in adjunctive set theory would soon be proven by Tarski, Mostowski, and Robinson (1953). In any event, I do not claim originality for this result. As we shall see, it was to become fairly well known only a few years after Wang's work. But it has not been widely appreciated among philosophers.

As mentioned earlier, it is at least tempting to try to interpret Tarski's claim that the metalanguage must be 'essentially richer' than the object language (Tarski, 1944, p. 354) in terms of logical strength.[17] Some people seem to think that having a theory of truth for the language of arithmetic means being able to prove that PA is consistent. Better informed people know that you can have a theory of truth without extending induction. But even some of them seem to think that a theory of truth for the language of arithmetic must be at least as strong as PA, and that this is something we learned from Tarski.
What Corollary 3.5 shows is that this is just false. If that's surprising, it's probably because one is thinking of theories of truth as if their subject-matter were not languages but theories. But there is no such thing as 'a truth-theory for PA'. There is only a theory of truth for the language of PA, that is, for the language of arithmetic. The question whether such a theory is materially adequate is the question whether it allows us to prove the T-sentence for each sentence of the language it concerns. And QseqT is perfectly capable of proving all the T-sentences for the language of PA, that is, for the language of arithmetic.

The point is not specific to arithmetic. Consider, for example, the theory known as 'adjunctive set theory' or, following Visser, 'WS' (for 'weak' set theory). Its axioms are:

Null set: ∃x∀y(y ∉ x)
Adjunction: ∀x∀y∃z∀w(w ∈ z ≡ w ∈ x ∨ w = y)

WS is interpretable in Q, and it is, in a sense, the weakest sequential theory (Visser, 2008). But now consider WST. It is easy to see that WST proves a T-sentence for each sentence of the language of set theory. That is: WST is a materially adequate theory of truth for the language of set theory. So, if you were tempted to say that a theory of truth for ZF cannot be developed in any theory weaker than ZF itself, you might want to reconsider.

One might think I am being uncharitable. There is a perfectly natural interpretation of what people mean when they speak of 'a theory of truth for PA'. What they mean is: A materially adequate theory of truth for the language of arithmetic in which it is possible to prove that each of the axioms of PA is true.[18] And if that's what a theory of truth for PA is, then it's true that a materially adequate theory of truth for PA can only be developed in a theory at least as strong as PA. But this fact is completely trivial and isn't anything we needed Tarski to teach us. Each axiom of PA will follow from (i) its T-sentence and (ii) the statement that it is true.

T(A) ≡ A    T-sentence
T(A)        Each axiom is true
A           Logic

[17] See the exchange between DeVidi and Solomon (1999) and Ray (2005).
[18] This is what Wang (1952, p. 244) calls a 'normal' truth-theory.

The lesson I propose we should learn from Corollary 3.5 is thus this: Tarski's theorem on the indefinability of truth has nothing to do with logical strength. It has all and only to do with expressive power.[19] This should have been obvious already. The reason you cannot develop a truth-theory for the language of PA inside PA itself is not that PA isn't strong enough. Tarski's indefinability theorem applies even to 'true arithmetic' (that is, to the theory whose 'axioms' are all the arithmetical truths), and true arithmetic is not only stronger than ZFC but is stronger than any consistent formal theory.[20] But Corollary 3.5 reinforces this point, since it tells us that you can develop a theory of truth for the language of PA in QseqT, a theory whose consistency is provable not just in PA but already in IΣ1, and in weaker theories still.[21] What distinguishes QseqT from PA is the fact that the language of the former is more expressive than the language of the latter, in the precise sense that there are sets that are definable in QseqT that are not definable in PA. The set of (Gödel numbers of) true sentences of the language of arithmetic is the salient example.

If this point has not been widely appreciated, Tarski is partly to blame. The central task of section 4 of "The Concept of Truth in Formalized Languages" is to explain how to generalize the definition of truth that Tarski had given (in the previous section of that paper) for the language of the calculus of classes. And so Tarski first "undertake[s] . . . the construction of a corresponding meta-language and the establishment of a meta-theory which forms the proper field of investigation" (Tarski, 1958, p. 210). After explaining what the meta-language must contain (we'll discuss that below), he writes: . . .
[T]he full axiom system of the meta-theory includes three groups of sentences: (1) axioms of a general logical kind; (2) axioms which have the same meaning as the axioms of the science under investigation or are logically stronger than them, but which in any case suffice. . . for the establishment of all sentences having the same meaning as the theorems of the science being investigated; finally, (3) axioms which determine the fundamental properties of the primitive concepts of a structural-descriptive type [that is, of syntax]. (Tarski, 1958, p. 211)

19 A similar point is made by Ray (2005), but the results here make it clear just how great the gap is between these two ways of understanding Tarski's claim.

20 Proof: Let T be a formal theory. Then the statement that T is consistent can be formalized in arithmetic; if it is true, then true arithmetic of course proves it. So true arithmetic proves Con(T), for every consistent formal theory T. Moreover, by the arithmetized completeness theorem, true arithmetic interprets every consistent formal theory.

21 We'll see shortly that QseqT is mutually interpretable with Q + Con(Qseq), which is itself mutually interpretable with Q + Con(Q).

Note how Tarski speaks, at (2), of "the science under investigation". In a way, this is fine: If Tarski wants to investigate certain sciences (that is, theories) he's welcome to do so, and of course there are plenty of interesting results to be proved about theories using the techniques Tarski developed. (For example, using those techniques, we can prove that PA is consistent.) It is because Tarski thinks of 'sciences', rather than languages, as the object of investigation that, at (2), he includes (translations of) the axioms of the object-theory among those of the metatheory.
As applied to the sort of case we have been discussing, what this means is that the meta-theory in which Tarski proposes to develop a definition of truth suitable for use in investigations of PA will have to include either the axioms of PA themselves or else something sufficient for proving them. But what has not been clearly appreciated, it seems to me, is that the axioms of PA need to be included among the axioms of the meta-theory only in so far as we want to prove certain results about PA specifically. If our goal is simply a materially adequate theory of truth for the language of PA, that is, for the language of arithmetic, then we have no need of (most of) those axioms.

Whether Tarski himself appreciated this point in the 1930s, I do not know. But it was known to logicians no later than 1952, when it was used by Kleene (1952). We'll look at Kleene's argument in Section 6.1 or, rather, at a later reformulation of it.

4. CUTS AND CONSISTENCY

The question I now want to address is this: What does adding a truth-theory give us, as far as logical strength is concerned? It is obvious that T T is at least as strong as T, since it contains T. But is it any stronger than T? There is no perfectly general answer to this question. What we shall see, however, is that, if T is finitely axiomatized, then T T is stronger than T, in the sense that T T is not interpretable in T. The proof uses a method called 'shortening of cuts' which is due to Robert Solovay and which plays a major role in the study of models of arithmetic. Since this method is not widely known among philosophers, I shall spend some time introducing it.

4.1. The Method of Cuts. Let T be an arithmetical theory that does not have full induction, in the sense that there are formulae with the form of induction axioms that are not theorems of T.
Then there are almost sure to be formulae φ(x) for which T proves the hypotheses of the relevant induction axiom, φ(0) and ∀x(φ(x) → φ(Sx)), but for which T does not prove its conclusion: ∀xφ(x).22 Obviously, T will therefore prove φ(0), φ(1), φ(2), and so forth. So, from the point of view of T, φ(x) is a formula that is true of 0, 1, 2, and so on, but that is, for all T knows, false of some natural numbers. And, by the completeness theorem, there will be models of T in which φ(x) is not true of all of the 'natural numbers'.

For example, as is well-known, Q does not prove that no number is its own successor. But Q does prove both 0 ≠ S0, which follows immediately from the first axiom of Q, and x ≠ Sx → Sx ≠ SSx, which follows just as immediately from the second. So x ≠ Sx is the kind of formula Russell called 'inductive', and that terminology has been adapted to the present context.

Definition. A formula ι(x) is said to be inductive in T if
(1) T ⊢ ι(0)
(2) T ⊢ ∀x(ι(x) → ι(Sx)).

Inductive formulas can be used to establish results about interpretability. The crucial result is this one.

Theorem 4.1. Let ι(x) be a formula that is inductive in T ⊇ Q and that is no worse than Π1. Then T interprets Q + ∀x(ι(x)).

It's not essential for what follows that the reader understand the proof of this theorem, so I shall not present it in any detail. But the method used in its proof, the shortening of cuts, is one we shall need below, so it is worth having some sense for how it works. I shall therefore explain the ideas behind the proof of Theorem 4.1 by continuing to discuss the example already mentioned: We'll see how to prove that Q interprets Q + ∀x(x ≠ Sx).

The basic idea is simply to restrict the domain to the numbers that satisfy x ≠ Sx, which, one might say, might as well be the natural numbers, so far as Q is concerned. But that isn't quite right.
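As a brief aside, the claim that Q does not prove ∀x(x ≠ Sx) can be made concrete with the familiar countermodel that adds a single point 'at infinity' to the natural numbers. The Python sketch below is only an illustration: the sentinel INF, the function names, the numbering Q1 to Q7 of the usual axioms of Q, and the finite sample are all assumptions of the sketch, and checking a finite sample is of course not a proof that the axioms hold throughout the structure.

```python
# Hypothetical sketch: the natural numbers plus one extra point INF,
# interpreted so that S(INF) = INF. All names here are illustrative.
INF = "inf"  # sentinel for the single nonstandard element

def S(x):
    return INF if x == INF else x + 1

def add(x, y):
    return INF if INF in (x, y) else x + y

def mul(x, y):
    if y == 0:
        return 0          # x * 0 = 0 must hold even when x is INF
    if INF in (x, y):
        return INF        # every other product involving INF is INF
    return x * y

def q_axioms_hold(sample):
    """Check the seven axioms of Q on every element (or pair) in `sample`."""
    pairs = [(x, y) for x in sample for y in sample]
    return all([
        all(S(x) != 0 for x in sample),                                  # Q1
        all(x == y for x, y in pairs if S(x) == S(y)),                   # Q2
        all(x == 0 or any(S(y) == x for y in sample) for x in sample),   # Q3
        all(add(x, 0) == x for x in sample),                             # Q4
        all(add(x, S(y)) == S(add(x, y)) for x, y in pairs),             # Q5
        all(mul(x, 0) == 0 for x in sample),                             # Q6
        all(mul(x, S(y)) == add(mul(x, y), x) for x, y in pairs),        # Q7
    ])

def is_inductive_on(sample):
    """x != Sx holds at 0 and is preserved by S throughout `sample`."""
    base = 0 != S(0)
    step = all(x == S(x) or S(x) != S(S(x)) for x in sample)
    return base and step

SAMPLE = list(range(8)) + [INF]
print(q_axioms_hold(SAMPLE))    # the axioms of Q, checked on the sample
print(is_inductive_on(SAMPLE))  # both induction hypotheses for x != Sx
print(S(INF) == INF)            # yet x != Sx fails at INF
```

In this particular structure the elements satisfying x ≠ Sx are exactly the standard numbers, which is why restricting the domain to them looks so promising. Inductive formulas need not behave this neatly in general.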
The problem is that we do not, in general, know that the numbers satisfying an inductive formula constitute an initial segment of all the numbers there are. The 'real' natural numbers will all satisfy ι(x), but then there may be some that don't and then some more that do after the ones that don't. So if we want a formula that might play the role of a 'new domain', then we need a slightly different notion, the notion of a cut.

Definition. A formula ι(x) is a cut in a theory T if
(1) ι(x) is inductive in T
(2) T ⊢ ∀x[ι(x) → ∀y < x(ι(y))]
If T does not prove ∀x(ι(x)), then ι(x) is said to be a proper cut in T.

22 In the case of IΣn, one can actually exhibit such a formula (Hájek and Pudlák, 1993, p. 172). But note that, if there were no such cases, then the fact that the induction axiom was missing wouldn't cost us anything.

The numbers satisfying a formula that is a cut in T will constitute an initial segment of T's natural numbers, and if the cut is proper, there will be models in which they constitute a proper initial segment. The crucial result relating inductive formulas and cuts is this one.

Lemma 4.2 (Hájek and Pudlák 1993, p. 368). Let ι(x) be inductive in T ⊇ Q. Then there is a cut κ(x) in T for which T ⊢ ∀x(κ(x) → ι(x)). That is: Every inductive formula can be shortened to a cut.

Proof. The obvious idea is to consider ∀y ≤ x(ι(y)) and to show that it defines a cut. Unfortunately, this doesn't quite work. The problem is that the proof that the formula in question defines a cut needs the transitivity of ≤, and Q does not prove that ≤ is transitive. This obstacle can be overcome, however, and the way in which this is done is a nice illustration of how the shortening of cuts works: We can simply restrict our attention to numbers for which ≤ is transitive. In particular, we first consider the formula:

χ(x) ≡df ι(x) ∧ ∀y∀z(y ≤ x ∧ z ≤ y → z ≤ x)

χ(x) says roughly that x satisfies ι(x) and that ≤ is transitive below x.
It's easy to see that Q proves: χ(x) is inductive if ι(x) is. We can then pursue the original idea, but with χ(x) in place of ι(x):

κ(x) ≡df ∀w < x(χ(w))

The verification that this defines a cut is left to the reader. □

So, although Q can't prove that x ≠ Sx is a cut, there is a 'subcut' κ(x) of x ≠ Sx in Q. So we might now try simply restricting attention to κ(x), the thought being that this will give us an interpretation in which x ≠ Sx holds and in which the axioms of Q just keep right on holding. But this doesn't quite work, either, the reason being that we need to ensure that the domain of our interpretation is closed under the operations of succession, addition, and multiplication. That it is closed under S follows from the fact that κ(x) is inductive. But we have no reason at this point to think we can prove either of these:

∀x∀y(κ(x) ∧ κ(y) → κ(x + y))
∀x∀y(κ(x) ∧ κ(y) → κ(x × y))

What to do? The answer is to use the method of cuts to restrict attention to numbers that do have sums and products.23 Doing so allows us to prove the following.

23 There is an accessible treatment in Burgess's book Fixing Frege (Burgess, 2005, §2.2).

Lemma 4.3. If T ⊇ Q, then every formula ι(x) inductive in T can be shortened to a cut κ(x) on which T proves the relativizations of the axioms of Q.

We can now see how Theorem 4.1 follows from Lemma 4.3.

Proof of Theorem 4.1. Start with a very simple case: x ≠ Sx. We want to see that Q interprets Q + ∀x(x ≠ Sx). Since x ≠ Sx is inductive in Q, by Lemma 4.3, there is a subcut κ(x) of x ≠ Sx on which Q proves the relativizations of the axioms of Q. Our interpretation is thus a 'pure relativization' to κ(x). So we need only show that Q proves ∀x(κ(x) → x ≠ Sx). But of course it does, since that says, precisely, that κ(x) is a subcut of x ≠ Sx.

So now consider the case where ι(x) is ∆0. By Lemma 4.3, there is a subcut δ(x) of ι(x) on which T proves the relativizations of the axioms of Q.
So now we just have to check that T proves the relativization of ∀x(ι(x)). This is not as straightforward as in the previous case, because now there are quantifiers in ι(x) that themselves have to be relativized. But ι(x) is ∆0, which means that the quantifiers in ι(x) are all bounded, and that means that the relativization is redundant, in the sense that, if ι(x) is ∆0 and δ(x) is a cut in T, then ∀x(δ(x) → ι(x)) is going to be T-provably equivalent to ∀x(δ(x) → ι∗(x)), where ι∗(x) is the relativization of ι(x) to δ(x).24 The proof is by induction on the complexity of expressions, of course, but the basic idea is simple enough. Consider, for example, ∀y < t(φ(y)). This is relativized as: ∀y < t(δ(y) → φ(y)). Since δ(x) is T-provably closed under S, +, and ×, t, whatever it may be, is T-provably going to satisfy δ(x), whence, since δ(x) is a cut, we have that y < t → δ(y), and the new condition is redundant. Similarly for the existential case.

So suppose ι(x) is Π1; say it is ∀yφ(x, y), where φ(x, y) is ∆0. Then what we need to show is that T proves

∀x[δ(x) → ∀y(δ(y) → φ∗(x, y))]

As we just saw, T will prove ∀y(δ(y) → φ(x, y)) ≡ ∀y(δ(y) → φ∗(x, y)), so we need only show that T proves ∀x[δ(x) → ∀y(δ(y) → φ(x, y))]. But we already know that T proves the stronger: ∀x[δ(x) → ∀y(φ(x, y))], since δ(x) is a subcut of ∀y(φ(x, y)) in T. □

24 Burgess (2005, pp. 101–4) again has a nice discussion of this sort of point.

It is not in general true that, if ι(x) is a Σ1 cut in T ⊇ Q, then T interprets Q + ∀x(ι(x)). The standard counterexample is ∃y(exp(x, y)). This is inductive even in Q, but Q does not interpret Q + ∀x∃y(exp(x, y)). It does not follow, however, that the method of shortening cuts only works with Π1 formulae. Sometimes one can show, by other means, that T proves the relativization of some Σ1 formula.

As it happens, shortening of cuts can be used to prove stronger forms of Lemma 4.3 and so of Theorem 4.1.

Lemma 4.4.
If T ⊇ Q, then every formula ι(x) inductive in T can be shortened to a cut κ(x) on which T proves the relativizations of the axioms of I∆0, and even of I∆0 + ω1.

We'll need this stronger result below.

4.2. The Unprovability of 'Small' Consistency. We know from Gödel's second incompleteness theorem that no 'sufficiently strong' theory proves its own consistency. In the mid-1980s, Pavel Pudlák proved a beautiful version of Gödel's result, one that really ought to be better known. If we think of the numbers satisfying a cut as 'small' numbers,25 then what the theorem says is that no theory containing Q can prove that there are no 'small' proofs of contradictions from its axioms. More formally, then, what Pudlák's theorem says is that no theory containing Q proves its own consistency 'on a cut'.

Theorem 4.5 (Pudlák 1985, Theorem 2.1). Suppose T ⊇ Q is consistent, and let κ(x) be a cut in T. Then T does not prove:26

∀x(κ(x) → ¬BewT(x, ⌜0 = S0⌝))

Moreover, this continues to hold even if κ(x) is merely inductive, since it can always be shortened to a cut.

This is a substantial strengthening of Gödel's result, in three respects. First, the usual form of the second incompleteness theorem applies only to theories containing enough induction to prove the Hilbert-Bernays-Gödel-Löb derivability conditions. Pudlák's theorem, by contrast, applies to any theory containing Q. Second, Gödel's result tells us only that T cannot show that there are no proofs of contradictions, and this is compatible with T's being able to show that there are no 'small' proofs of contradictions. The third respect emerges from the following consequence of Theorem 4.5.

Theorem 4.6 (Pudlák 1985, Corollary 3.5). Suppose T is finitely axiomatized, sequential, and consistent. Then T does not interpret Q + Con(T).

25 If it sounds as if there are connections here with Wang's paradox, there are.

26 Here, BewT(x, y) is an appropriate formalization of 'x is a T-proof of y'.
Whereas Gödel tells us that T cannot prove Con(T), Pudlák tells us that, if T is finitely axiomatized, it cannot even interpret Q + Con(T), let alone T + Con(T).27 The proofs of these two results are (well) beyond the scope of the present discussion.28 Putting these together, we have:29

Corollary 4.7. Let S ⊇ Q be a consistent, finitely axiomatizable sequential theory that proves Con(T) on a cut. Then S is not interpretable in T.

Proof. If S proves the consistency of T on a cut, then by Theorem 4.1 it will interpret Q + Con(T). But if S were interpretable in T, then Q + Con(T) would be interpretable in T, which Theorem 4.6 rules out. □

It is Corollary 4.7 that will do much of the work below.

5. THE STRENGTH OF TRUTH-THEORIES

5.1. T T is Stronger than T. We are now ready to prove our first main result.30

Theorem 5.1. Let T ⊇ I∆0 + ω1 and suppose that T T proves that all axioms of T are true. Then T T proves the consistency of T on a cut and so is not interpretable in T.

The natural proof of this needs to use I∆0 + ω1 because, as I said earlier, it is only here that we can do syntax naturally. We'll see later that this assumption can be weakened.

The key to the proof is the realization that we can almost mimic the 'trivial' proof of the consistency of T that we learned from Tarski. That proof proceeds as follows: First, we show that all the axioms are true; then we show that the rules of inference preserve truth; then we conclude, by induction, that all theorems are true. Since '0 = S0' is not true, it isn't a theorem, so T is consistent.

27 Feferman (1960, p. 76, theorem 6.5) proved an antecedent of Pudlák's result: If PA ⊆ T, then T does not interpret T + Con(T), assuming that the axioms of T are represented by a Σ1 formula.

28 As well as the paper of Pudlák's already cited, the interested reader may consult Hájek and Pudlák (1993, pp. 173ff); see also Visser's recent paper on the second incompleteness theorem (Visser, 2009a).
29 Exercise: Show that we do not have to assume that T is consistent.

30 The results reported in this section were taught to me by Albert Visser, though the proofs are my own, and the complications we shall meet arose as I tried to work out the details. There is a result very similar in feel to Corollary 5.6 in a recent paper of his (Visser, 2009a, theorem 4.1). Note that Corollary 5.6 leads to an alternative statement of Theorem 4.4 of that paper, which is the characterization of consistency statements that is its central purpose. This version relies upon coding, however, which is part of what Visser is trying to avoid. We will prove related results below, Theorem 6.5 and Corollary 6.13, that do not have this flaw.

This won't work in the present case, of course, because we do not have 'semantic induction', that is, induction for formulae containing semantic vocabulary. But we could overcome that lack by the method of cuts if we could show that 'n line proofs have true conclusions' is inductive. Then we would have that, although T T does not prove Con(T), it does prove it on a cut.

If that were the only obstacle, the proof would be easy. Unfortunately, there is another. We're just assuming, at present, that T T can prove that all of T's non-logical axioms are true. But, to mimic Tarski's proof, we also need to prove that all the logical axioms are true and that the rules of inference are truth-preserving. This turns out to be more difficult than one might suppose.

It helps to assume that the logic in which we're working is formulated as an axiomatic system rather than a natural deduction system, with just two rules of inference: modus ponens and universal generalization. This allows us to speak simply in terms of the truth of the various lines of a proof, rather than in terms of whether the formula on a given line follows from the premises on which that line depends.31

The propositional axioms are easy enough.32 Consider, for example, p → (q → p).
Let A and B be formulae. Using the clause for → twice, Satσ(⌜A → (B → A)⌝) iff Satσ(A) → (Satσ(B) → Satσ(A)). But the latter is of course a logical truth. So, generalizing, for any A and B, and for all σ, Satσ(⌜A → (B → A)⌝), which is to say that all instances of p → (q → p) are true.

The propositional rule, modus ponens, is also easy. What we need to show is that, if both A and A → B are satisfied by all sequences, then so is B. If ∀σSatσ(⌜A → B⌝), then, by the clause for the conditional: ∀σ(Satσ(A) → Satσ(B)). But then, by logic: ∀σSatσ(A) → ∀σSatσ(B). So, if ∀σSatσ(A), then ∀σSatσ(B).

31 The difficulty presented by a natural deduction system is that the correctness of a line then involves the consequent's being satisfied by all sequences if all the premises are, and this introduces more logical complexity than we have with the axiomatic treatment.

32 Assuming we define truth as Tarski did, in terms of satisfaction by all sequences. If we use the alternate definition, and say that a line is true iff its universal closure is satisfied by <>, then we find ourselves needing to prove: ∀σ(Satσ(A)) ≡ Sat<>(ucl(A)). That only adds to our problems.

Unfortunately, we run into problems with quantification. (Don't we always.) Consider universal instantiation, the simplest formulation of which is: ∀vi(φvi) → φvj, subject to the usual restrictions. The argument for its truth proceeds as follows. Suppose some sequence σ does not satisfy some instance. Then, by the clause for →, we have Satσ(∀vi(φvi)) and ¬Satσ(φvj). Now consider a sequence τ that is just like σ, except that what it assigns to vi is whatever σ assigns to vj. So τ ∼ᵢ σ, and hence Satτ(φvi). But since (i) vi stands in φvi only where vj stands in φvj and (ii) τ assigns vi the same value that σ assigns vj, then we must have ¬Satτ(φvi), since ¬Satσ(φvj). Contradiction.
In making this last move, however, we are appealing to a general principle concerning 'variable-switching', one we might formulate as: If φvj results from replacing all free occurrences of vi in φvi by vj, and if τ is just like σ but sets τi = σj, then Satσ(φvj) iff Satτ(φvi).33 There is clearly no hope of proving this without 'semantic' induction. The problem is all the more serious if we allow instantiation not just by variables but by arbitrary terms and so formulate UI in the form:

∀vi(φvi) → φt

(And we will have to face this problem, one way or another, if our language does indeed contain terms that are not variables.) In that case, the proof also requires the claim that all terms denote.

There are similar problems concerning universal generalization: A → φ(vi) ⊢ A → ∀viφ(vi), where of course A must not contain vi free. Suppose that A → ∀viφ(vi) is not satisfied by all sequences. Then there is a sequence σ such that Satσ(A) and ¬Satσ(∀viφ(vi)). By the clause for ∀, then, we have a sequence τ ∼ᵢ σ such that ¬Satτ(φ(vi)). Since vi is not free in A, then, Satτ(A), as well. But how do we know that? Because whether a formula is satisfied by a sequence depends only upon what is assigned to variables that occur free in that formula, viz.:

∀i(free-in(A, vi) → ∀x(val(σ, i, x) ≡ val(τ, i, x))) → (Satσ(A) ≡ Satτ(A))

But nor will we be able to prove this without semantic induction.34

Careful examination of the proofs that the logical axioms are true and the rules are truth-preserving shows that those proofs need the following semantic claims.
(1) If φt is the result of replacing all free occurrences of vi in φvi with t, and if Denσ(t, a) and ∀k ≠ i(val(τ, k) = val(σ, k)) ∧ val(τ, i) = a, then Satσ(φt) iff Satτ(φvi).
(2) Suppose that σ and τ agree on all free variables contained in A. Then Satσ(A) iff Satτ(A).
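Principle (1), in its variable-for-variable form, and principle (2) are general claims whose usual proofs go by induction on the complexity of formulae. The Python sketch below does not prove anything of that kind; it only spot-checks the two principles by brute force, for one formula, in one small finite structure. The tuple encoding of formulae, the domain, the interpretation of the single predicate R, and all function names are assumptions made for the illustration.

```python
from itertools import product

# Formulae of a tiny relational language (an assumed encoding):
#   ("R", i, j)     atomic formula R(v_i, v_j)
#   ("not", A)      negation
#   ("imp", A, B)   conditional
#   ("all", i, A)   universal quantification over v_i
DOMAIN = (0, 1, 2)                # assumed finite domain
R = {(0, 1), (1, 2), (2, 2)}      # assumed interpretation of R

def sat(sigma, phi):
    """Tarski-style satisfaction of phi by the sequence sigma (a tuple)."""
    tag = phi[0]
    if tag == "R":
        return (sigma[phi[1]], sigma[phi[2]]) in R
    if tag == "not":
        return not sat(sigma, phi[1])
    if tag == "imp":
        return (not sat(sigma, phi[1])) or sat(sigma, phi[2])
    if tag == "all":
        i, body = phi[1], phi[2]
        # Quantify over all sequences differing from sigma at most at i.
        return all(sat(sigma[:i] + (d,) + sigma[i + 1:], body) for d in DOMAIN)
    raise ValueError(f"unknown formula tag: {tag}")

def free_vars(phi):
    """Indices of the variables occurring free in phi."""
    tag = phi[0]
    if tag == "R":
        return {phi[1], phi[2]}
    if tag == "not":
        return free_vars(phi[1])
    if tag == "imp":
        return free_vars(phi[1]) | free_vars(phi[2])
    if tag == "all":
        return free_vars(phi[2]) - {phi[1]}
    raise ValueError(f"unknown formula tag: {tag}")

def subst(phi, i, j):
    """Replace free occurrences of v_i by v_j.
    No capture check: the caller must ensure v_j is free for v_i in phi."""
    tag = phi[0]
    if tag == "R":
        return ("R", j if phi[1] == i else phi[1], j if phi[2] == i else phi[2])
    if tag == "not":
        return ("not", subst(phi[1], i, j))
    if tag == "imp":
        return ("imp", subst(phi[1], i, j), subst(phi[2], i, j))
    if tag == "all":
        if phi[1] == i:
            return phi            # v_i is bound here, so nothing is free below
        return ("all", phi[1], subst(phi[2], i, j))
    raise ValueError(f"unknown formula tag: {tag}")

I, J = 0, 1
# phi(v_0): for all v_2, R(v_0, v_2) -> not R(v_2, v_1); v_1 is free for v_0.
PHI_VI = ("all", 2, ("imp", ("R", 0, 2), ("not", ("R", 2, 1))))
PHI_VJ = subst(PHI_VI, I, J)
SEQUENCES = list(product(DOMAIN, repeat=3))

def switching_holds():
    """If tau is like sigma except tau_i = sigma_j, then
    sigma satisfies phi(v_j) iff tau satisfies phi(v_i)."""
    for sigma in SEQUENCES:
        tau = sigma[:I] + (sigma[J],) + sigma[I + 1:]
        if sat(sigma, PHI_VJ) != sat(tau, PHI_VI):
            return False
    return True

def agreement_holds():
    """Sequences agreeing on the free variables of phi agree on phi."""
    fv = free_vars(PHI_VI)
    for sigma, tau in product(SEQUENCES, repeat=2):
        agree = all(sigma[k] == tau[k] for k in fv)
        if agree and sat(sigma, PHI_VI) != sat(tau, PHI_VI):
            return False
    return True

print(switching_holds(), agreement_holds())
```

An exhaustive check like this is available only because the domain and the stock of sequences are finite. A theory without semantic induction has no such shortcut for arbitrary formulae, which is the difficulty at issue.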
The proofs of these depend upon the corresponding claims for terms:
(3) If u(t) is the result of replacing all occurrences of vi in u(vi) with t, and if ∀k ≠ i(τk = σk) ∧ τi = σj, then Denσ(u(t), a) iff Denτ(u(vi), a).
(4) Suppose that σ and τ agree on all free variables contained in t. Then Denσ(t, a) iff Denτ(t, a).
We also need:
(5) For every term t, ∃x(Denσ(t, x)),
though this will be trivial if there are no terms in the language other than variables.

33 Note that φvj does not contain vi free.

34 This particular issue can be avoided if we reformulate our truth-theory so that a sequence satisfies a formula only if it assigns values to all and only the variables free in that formula. This complicates the statement of the theory, however, and it does not help with our other problems.

We thus have no hope whatsoever of showing that T T proves that 'logic is true', i.e., that the logical axioms are all true and that the rules of inference are truth-preserving. All is not lost, however, because we can use the method of cuts. The idea is to show that, though T T does not prove the listed semantic principles, it does prove their relativizations to some cut. Then it will follow that any formula that is on the cut and is an instance of a logical axiom is true, and any rule of inference involving only formulae on the cut will be truth-preserving. And that will allow us to show that there can be no T-proof of a contradiction on that cut.

Consider, for example:
(1*) For all σ and τ, if φt is of complexity < n and is the result of replacing all occurrences of vi in φvi with t, and if Denσ(t, a) and ∀k ≠ i(τk = σk) ∧ τi = a, then Satσ(φt) iff Satτ(φvi).
The usual proof of (1) can be adapted to show that (1*) is inductive. There are similar formulae corresponding to (2)–(4), and the usual proofs of them can also be adapted to show that their 'starred versions' are inductive. The case of (5) is more complicated, however.
The corresponding inductive formula is:
(5*) If t is of complexity < n, then ∃x(Denσ(t, x)).
In the case of the language of arithmetic, this will certainly be inductive. But if we were to add expressions to the language for fast growing functions, then we might have difficulty keeping the value of the term in the cut, so to speak. The problem can be side-stepped, however, by considering, in the first instance, only purely relational languages, such as the language of relational arithmetic. Then, as mentioned earlier, (5) is trivial. We first prove Theorem 5.1, then, for the special case of relational languages.

Theorem 5.2. Let T ⊇ I∆0 + ω1, where LT is relational, and suppose that T T proves that all axioms of T are true. Then T T proves the consistency of T on a cut and so is not interpretable in T.

Proof. As noted, the usual proofs of (1)–(4) can be adapted to show that their starred versions are inductive, so, by Lemma 4.4, T T proves their relativizations to some cut on which it also proves the axioms of I∆0 + ω1. What we now do is 'work on this cut', as it is said: Relativizing everything to the cut, we can prove the relativization of 'logic is true' on the cut. On that cut, we will be able to prove that 'n line proofs have true conclusions' is inductive and will therefore be able to construct a cut on which the relativization of 'for all n, n line proofs have true conclusions' is true. The relativization of 'all theorems of T are true' to that cut will then be provable, and so we will be able to prove the consistency of T on that cut.

To fill in a little detail, consider a formula φ(n) that says: if n is the Gödel number of a proof such that (i) n lies in our cut and (ii) every formula in the proof also lies in this cut, then (iii) every formula occurring in the proof is true.
I.e., if we let λ(x) be a formula describing the cut on which logic is true, then φ(n) is:

BewT(n) ∧ λ(n) ∧ ∀m < len(n)(λ(nₘ)) → ∀m < len(n)(T(nₘ))

The second conjunct will often be redundant, given the usual sorts of Gödel numberings: If n lies in the cut, then the Gödel numbers of the formulae occurring in the proof it codes will be ≤ n. But of course it cannot hurt to include it.

Now consider:

∀k ≤ n(φ(k))

The usual argument can be used to show that this is inductive, since all the formulas involved here lie in our cut, and logic is true on that cut. We therefore have a cut κ(x) on which ∀k ≤ n(φ(k)) holds. I.e., we can prove:

∀n{κ(n) → ∀k ≤ n[λ(k) ∧ BewT(k) ∧ ∀m < len(k)(λ(kₘ)) → ∀m < len(k)(T(kₘ))]}

Taking k to be n, we have:

∀n{κ(n) ∧ λ(n) ∧ BewT(n) ∧ ∀m < len(n)(λ(nₘ)) → ∀m < len(n)(T(nₘ))}

What we want is:

(*) ∀n{κ(n) ∧ BewT(n) → ∀m < len(n)(T(nₘ))}

We thus need to eliminate the other conjuncts of the antecedent by showing that those two conjuncts, (i) λ(n) and (ii) ∀m < len(n)(λ(nₘ)), follow from the other two: κ(n) and BewT(n). For (i), by construction, κ(x) is a sub-cut of λ(x). For (ii), we need only make sure that λ(x) satisfies:

seq(n) → ∀m < len(n)(nₘ < n)

We can do this simply by making sure that the axioms of, say, I∆0 are true on λ(x), or we could build this condition directly into λ(x).

From (*), then, we easily derive

∀n∀m{κ(n) ∧ BewT(n, m) → T(m)}

and so:

∀n{κ(n) → ¬BewT(n, ⌜0 = 1⌝)}

So T is consistent on κ(x). □

With Theorem 5.2 in hand, we can extend the result to non-relational languages and so establish Theorem 5.1.

Proof of Theorem 5.1. Let TR be the relational version of T. What we are going to see is that TRT is interpretable in T T. It is easy enough to interpret T in TR, of course, via such translations as:

r(⌜Axyz⌝) = ⌜x + y = z⌝

But, of course, that is not all we need to do. We need to interpret the semantics of the relational language in that of the non-relational language. But this is fairly easy to do.
The idea is just to translate Satσ(A) as Satσ(r(A)), where r(x) is a formula of the language of T that expresses the translation from LTR to LT.35 And since r(x) commutes with the logical connectives, proving the translations of the semantic axioms for the connectives will be easy. For example, the translation of

Satσ(⌜A ∧ B⌝) ≡ Satσ(A) ∧ Satσ(B)

is

Satσ(r(⌜A ∧ B⌝)) ≡ Satσ(r(A)) ∧ Satσ(r(B))

But r(⌜A ∧ B⌝) just is ⌜r(A) ∧ r(B)⌝. And since we did not relativize the interpretation, the case of quantification is no harder.

The clauses for the non-logical constants are also easy. Consider, for example, that for Axyz, which is essentially:

Satσ(⌜Avivjvk⌝) ≡ Aσiσjσk

Its translation is:

Satσ(r(⌜Avivjvk⌝)) ≡ (σi + σj = σk)

But r(⌜Avivjvk⌝) is vi + vj = vk, so this becomes:

Satσ(⌜vi + vj = vk⌝) ≡ (σi + σj = σk)

which is easily provable.

So TRT is interpretable in T T. But TRT proves Con(TR) on a cut and so interprets Q + Con(TR) and, in fact, interprets I∆0 + ω1 + Con(TR). But I∆0 + ω1 is strong enough to verify the fact that T is interpretable in TR and so to prove that, if Con(TR), then Con(T). So I∆0 + ω1 + Con(TR) actually proves Con(T), whence I∆0 + ω1 + Con(T), and so of course Q + Con(T), is interpretable in TRT and so in T T. □

35 Since the translation is recursive, it will of course be representable in T. In general, of course, it will be represented by some formula R(x, y), not by a function like r(x). But this point affects nothing that follows and only complicates the exposition. (We probably do need to know that every formula has exactly one translation. But I∆0 + ω1 will prove such facts.)

Theorem 5.3. Let T ⊇ Q and suppose that T T proves that all axioms of T are true. Then T T proves the consistency of T on a cut.

Proof. Start the proofs of the preceding theorems by restricting everything to a cut on which the axioms of I∆0 + ω1 are available. □

Corollary 5.4. Suppose T ⊇ Q is finitely axiomatized. Then T T is not interpretable in T.

Proof.
From Theorem 5.3 and Corollary 3.3. □

Theorem 5.5. Let T be a finitely axiomatized theory in a finite language. Then Q + Con(T) interprets T T.

Proof. The proof of this theorem is similar to that of Theorem 6.12 below, but harder, since Q is so weak. See Visser's paper "The Predicative Frege Hierarchy" for the details (Visser, 2009b). □

Corollary 5.6. Let T be a finitely axiomatized theory in a finite language. Then Q + Con(T) is mutually interpretable with T T.

So there is a straightforward sense in which a 'full truth-theory' is a sort of functor that strengthens any finitely axiomatized theory you feed it. We'll see some similar, but much more general, results below.

There is one more result I want to mention before we continue.

Theorem 5.7. (I∆0 + ω1)T is interpretable in QT.

We will not actually need this result for anything that follows, but the proof seems to me to be of substantial interest. The technique involved will be familiar from the proof of Theorem 5.1, but it is applied more subtly. It will be clear, too, that it is not special to this particular case. What it shows, in effect, is that we can always relativize the semantic part of a theory like QT to a cut.

Proof. We know, of course, that we can interpret I∆0 + ω1 in Q. The problem is to do so while preserving the semantic part of QT. We cannot actually expect (I∆0 + ω1)T to prove the relativizations of the semantic axioms of QT. That would mean, in particular, proving the relativization of the clause for ∃, which would be:

κ(σ) → Satσ(⌜∃viφvi⌝) ≡ ∃τ[κ(τ) ∧ τ ∼ᵢ σ ∧ Satτ(⌜φvi⌝)],

This says, in effect, that ∃viφvi is true iff there is a number in the cut that satisfies φvi, which is, in general, false. But what we can do is reinterpret satisfaction itself so that Satσ(A) means: the relativization of A is satisfied by σ.
That is, we translate Satσ(A) as Satσ(tκ(A)), where tκ(x) is a syntactic function meaning: the relativization of x to κ.36 So what we need to prove is:

κ(σ) → Satσ(tκ(⌜∃viφvi⌝)) ≡ ∃τ[κ(τ) ∧ τ ∼ᵢ σ ∧ Satτ(tκ(⌜φvi⌝))]

Now, tκ(⌜∃viφvi⌝) is ⌜∃vi(κ(vi) ∧ tκ(φvi))⌝, so this becomes:

κ(σ) → Satσ(⌜∃vi(κ(vi) ∧ tκ(φvi))⌝) ≡ ∃τ[κ(τ) ∧ τ ∼ᵢ σ ∧ Satτ(tκ(⌜φvi⌝))]

This, however, is easily proven. Left to right: By the clauses for ∃ and ∧, Satσ(⌜∃vi(κ(vi) ∧ tκ(φvi))⌝) iff ∃τ[τ ∼ᵢ σ ∧ Satτ(⌜κ(vi)⌝) ∧ Satτ(tκ(⌜φvi⌝))]. But κ(vi) is a specific formula, and so we can prove a Sat-sentence for it. In particular, we have:

T T ⊢ Satτ(⌜κ(vi)⌝) ≡ κ(τi)

But if κ(τi), then, since κ(σ), also κ(τ). That is: If a sequence is in the cut, and some number is in the cut, then the sequence we get by replacing some member of the original sequence by the new number is also in the cut. Although this is not provable in Q, it is provable in I∆0 + ω1, so it will be true on the cut given by κ(x), so we are done. The converse is similar.

5.2. Peano Arithmetic Is a Special Case (I).

The results proven in the preceding section depend heavily upon the assumption that T is finitely axiomatized. This is because, as mentioned previously, if T is infinitely axiomatized, then there is no reason, in general, to suppose that T T will prove that all of T's axioms are true, though it will prove that each of them is. But we do have the following obvious corollary.

Corollary 5.8. Let T be an infinitely axiomatized theory. Then T T is mutually locally interpretable with Q + ⋃{Con(U)}, where U is a finite fragment of T.

Each finite fragment Q + Con(U1) + ⋯ + Con(Un) of Q + ⋃{Con(U)} is interpretable in U1T + ⋯ + UnT ⊆ T T. Each finite fragment UT of T T is interpretable in Q + Con(U) ⊆ Q + ⋃{Con(U)}.

Corollary 5.9. If T is reflexive, then T T is interpretable in T.

Proof. A reflexive theory, by definition, is one that proves the consistency of each of its finite sub-theories.
So, if T is reflexive, it contains Q + ⋃{Con(U)} and so itself locally interprets T T, and it then follows from Orey's Theorem that T globally interprets T T.

So, in particular, we have:37

Corollary 5.10. PAT is interpretable in PA.

36 Being primitive recursive, tκ(x) is of course representable in Q. As above, it will actually be represented by a formula, but this will make no difference to what follows.

37 Stronger versions of this result have been proven by Visser and Enayat. This version seems to be folklore.

5.3. Extending Induction.

As I have emphasized, what was shown in Section 5.1 is not that T T proves that T is consistent. If T is a finitely axiomatized (sequential) theory, then T T will prove that T's axioms are true and will prove that the rules preserve truth, but T T does not have the induction axiom needed to conclude that all proofs have true conclusions.38 The natural question to ask, then, is: What exactly do we need to get a proof of T's consistency? We need T to contain some induction axioms in the first place, and then we need to replace T T with a version that extends the induction axioms to permit semantic predicates (in particular, the truth-predicate) to occur therein.

It is not at all obvious, in general, what it means to 'extend the induction scheme'. The scheme might itself be stated in such a way as to exclude formulae containing semantic vocabulary. To take a trivial example, the scheme might require that its instances contain no predicates other than identity. In the cases in which we shall be interested, however, the right way to proceed is both clear and well established. Intuitively, the point is that we may simply regard such formulae as Satσ(t) as being among the atomic formulae from which the construction of more complex formulae begins. More precisely, we may make use of the so-called relativized arithmetical hierarchy (Hájek and Pudlák, 1993, pp. 81ff). Let X be any set of formulas.
A formula is said to be ∆0(X) if it belongs to the smallest class of formulae that (i) contains all atomic (arithmetical) formulae and all formulae in X and (ii) is closed under Boolean operations and bounded quantification. A formula is then Σ1(X) if it is of the form ∃yφ(y), where φ(y) is ∆0(X). And so forth. In our case, if we take Sem to be the set of atomic semantic formulae (Denσ(t, x), Satσ(x), and so forth), then what it means to 'extend induction' in the case of I∆0, say, is that we permit induction on ∆0(Sem) formulae. The resulting theory is thus I∆0(Sem). Similarly for IΣ1, etc.

38 Indeed, T T cannot even prove that all logically provable sentences (that is, sentences provable using none of the special axioms of T) are true. Suppose T proves the following: Let the Ti be the axioms of T. Then if A is a theorem of T, ⋀ᵢ Ti → A is logically provable. I do not know exactly how much is needed for the proof of the result. Not very much, to be sure, but it surely cannot be proven in Q. In any event, reason in T T. Suppose that A is provable in T. Then ⋀ᵢ Ti → A is logically provable. By the Tarski clauses, T(⋀ᵢ Ti → A) iff T(T1) ∧ ⋯ ∧ T(Tn) → T(A). Since each axiom is true, T(A) if T(⋀ᵢ Ti → A). So, if all logically provable sentences are true, every T-provable sentence is true. Cieśliński (2009) notes that the same result holds even in the case of PA, though of course the argument is more complicated, since PA is not finitely axiomatizable.

Definition. Suppose that T is among I∆0, IΣn, I∆0(X), and so forth. Then:

(1) T D+ is the result of (i) adding all T-sentences for the language of T and (ii) extending the induction scheme in the sense just explained.

(2) T S+ is the result of (i) adding not just the T-sentences for the language of T but also the 'Sat-sentences', such as

Satσ(⌜v0 = v1⌝) ≡ ∃x∃y[val(σ, 0, x) ∧ val(σ, 1, y) ∧ x = y],

and (ii) extending the induction scheme.
(3) T T+ is the result of (i) adding a fully compositional truth-theory and (ii) extending the induction scheme.

We'll begin by exploring T D+. It's well-known that PAD+ is a conservative extension of PA. Here's a similar result, but stated in terms of interpretability.

Theorem 5.11. PAD+ is interpretable in PA.

Proof. Let S be a finite subset of the axioms of PAD+. S will contain at most finitely many T-sentences, say for A1, . . . , An. We interpret T(x) in terms of a 'list-like' theory of truth, that is, as:

(x = ⌜A1⌝ ∧ A1) ∨ ⋯ ∨ (x = ⌜An⌝ ∧ An)

Clearly, with T(x) so defined, PA will prove the T-sentences for A1, . . . , An. Moreover, with T(x) so defined, any extended induction axioms that appear in S will simply become induction axioms of PA. So PAD+ is locally interpretable in PA. Now apply Orey's Theorem.

Note that this continues to hold for PAS+, by pretty much the same reasoning. The same argument shows that QD+ and QS+ are locally interpretable in Q.

The proof of 5.11 does not obviously extend, however, to sub-systems of PA such as IΣ1: We cannot show so simply that (IΣ1)D+ is locally interpretable in IΣ1. The reason is that the Ai may be of any complexity, and so, if we have an induction axiom for some Σ1(Sem) formula A(x), the result of replacing T(x) by its 'list-like' definition in A(x) may yield a formula that is not itself Σ1. But there is a slightly more complicated proof that does work.

Theorem 5.12. (IΣn)D+ is locally interpretable in IΣn.

Proof. Let S be a finite subset of the axioms of (IΣn)D+. Then S contains only finitely many T-sentences. For illustration, say these are A and B. As before, we interpret T(x) as:

(x = ⌜A⌝ ∧ A) ∨ (x = ⌜B⌝ ∧ B)

We can then easily prove the T-sentences for A and B. But, of course, S may also contain some extended induction axioms from (IΣn)D+. We need to see that these are also going to be provable.
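The 'list-like' reading of T(x) used in this proof and the previous one can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the formal argument: sentence codes are modelled as plain strings, the truth value of each listed sentence is supplied directly as a Boolean, and the names `make_list_like_T` and `listed` are hypothetical.

```python
def make_list_like_T(listed):
    """Build the 'list-like' truth predicate for finitely many sentences.

    `listed` maps a code (a string standing in for the Goedel number of A_i)
    to the truth value of A_i itself. Both choices are assumptions made for
    this sketch.
    """
    def T(x):
        # (x = code(A_1) and A_1) or ... or (x = code(A_n) and A_n)
        return any(x == code and holds for code, holds in listed.items())
    return T


# Two listed sentences: one true, one false.
listed = {"0 = 0": 0 == 0, "0 = S0": 0 == 1}
T = make_list_like_T(listed)
```

For each listed sentence, `T(code)` agrees with the sentence's own truth value, which is what the T-sentence for that sentence requires. Any unlisted code is simply counted as not true, which is harmless because only the finitely many T-sentences in S need to be verified.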
Suppose one of these induction axioms is the axiom for the formula φ(x) ∨ T(sb(⌜ψ(x)⌝, x)), where φ(x) is itself Σn but ψ(x) need not be.39 The induction axiom in question is thus:

φ(0) ∨ T(sb(⌜ψ(x)⌝, 0)) ∧
∀x[φ(x) ∨ T(sb(⌜ψ(x)⌝, x)) → φ(Sx) ∨ T(sb(⌜ψ(x)⌝, Sx))] →
∀x(φ(x) ∨ T(sb(⌜ψ(x)⌝, x)))

Under our interpretation of T(x), this becomes:

[φ(0) ∨ (sb(⌜ψ(x)⌝, 0) = ⌜A⌝ ∧ A) ∨ φ(0) ∨ (sb(⌜ψ(x)⌝, 0) = ⌜B⌝ ∧ B)] ∧
∀x[φ(x) ∨ (sb(⌜ψ(x)⌝, x) = ⌜A⌝ ∧ A) ∨ φ(x) ∨ (sb(⌜ψ(x)⌝, x) = ⌜B⌝ ∧ B) →
φ(Sx) ∨ (sb(⌜ψ(x)⌝, Sx) = ⌜A⌝ ∧ A) ∨ φ(Sx) ∨ (sb(⌜ψ(x)⌝, Sx) = ⌜B⌝ ∧ B)] →
∀x[φ(x) ∨ (sb(⌜ψ(x)⌝, x) = ⌜A⌝ ∧ A) ∨ φ(x) ∨ (sb(⌜ψ(x)⌝, x) = ⌜B⌝ ∧ B)]

(Sorry about that.) The crucial point is that A and B are sentences, so the quantifier ∀x cannot bind any variables in A or B. Hence, they can be "pulled out" in the following way. Abbreviate the formula just displayed as Φ(A, B). Then it is logically equivalent to:

[A ∧ B → Φ(⊤, ⊤)] ∧ [A ∧ ¬B → Φ(⊤, ⊥)] ∧
[¬A ∧ B → Φ(⊥, ⊤)] ∧ [¬A ∧ ¬B → Φ(⊥, ⊥)]

where ⊤ is 0 = 0 and ⊥ is 0 ≠ 0. By completeness, this equivalence is provable. Now Φ(⊤, ⊤) is:

φ(0) ∨ (sb(⌜ψ(x)⌝, 0) = ⌜A⌝ ∧ ⊤) ∨ φ(0) ∨ (sb(⌜ψ(x)⌝, 0) = ⌜B⌝ ∧ ⊤) ∧
∀x[φ(x) ∨ (sb(⌜ψ(x)⌝, x) = ⌜A⌝ ∧ ⊤) ∨ φ(x) ∨ (sb(⌜ψ(x)⌝, x) = ⌜B⌝ ∧ ⊤) →
φ(Sx) ∨ (sb(⌜ψ(x)⌝, Sx) = ⌜A⌝ ∧ ⊤) ∨ φ(Sx) ∨ (sb(⌜ψ(x)⌝, Sx) = ⌜B⌝ ∧ ⊤)] →
∀x[φ(x) ∨ (sb(⌜ψ(x)⌝, x) = ⌜A⌝ ∧ ⊤) ∨ φ(x) ∨ (sb(⌜ψ(x)⌝, x) = ⌜B⌝ ∧ ⊤)]

and that is itself a Σn induction axiom. It is therefore provable, and hence so is A ∧ B → Φ(⊤, ⊤). The same goes for the other cases. So the induction axiom in question is provable under our interpretation of T(x). Of course, nothing hinges on the details of this particular example.

As we shall see, corresponding results do not hold for (IΣn)S+.

39 Here, sb(y, x) means: The result of substituting the numeral for x for the sole free variable in y. I choose this example because the threat here is that the ability to substitute in this way will allow us to get the induction axiom for φ(x) ∨ ψ(x), which need not be Σn.

5.4. Semantic Consistency Proofs.

If T T+ is going to formalize Tarski's proof of Con(T), then it will need to be able to do two things: (i) Carry out the induction at the core of that proof, and (ii) Prove that all of the logical and non-logical axioms of T are true. The obvious sort of formula to use in the inductive part of the proof is something like:

ι(n) ≡df ∀z∀y∀m < n[(BewT(z, y) ∧ lh(z, m)) → ∀σ Satσ(y)]

This is Π1(Sem). Moreover, as a look back at (1)–(5) will show, the formulae involved in the various inductions needed to prove that logic is true are Π1(Sem), except for the one concerning denotation, which is Σ1(Sem). Since IΣ1 has induction for Π1 formulae (Hájek and Pudlák, 1993, p. 63, theorem 2.4), we thus have:40

Theorem 5.13. Suppose T ⊇ IΣ1 and suppose further that T T+ proves that all axioms of S (which may or may not be T) are true. Then T T+ proves Con(S).

Corollary 5.14. Suppose T ⊇ IΣ1 is finitely axiomatized. Then T T+ proves Con(T).

This might seem like a nice, neat result. Since IΣn is finitely axiomatizable, we'll get that (IΣ1)T+ proves the consistency of IΣ1, that (IΣ2)T+ proves Con(IΣ2), and so forth. Unfortunately, things do not work out nearly so nicely.

5.5. How PAT+ Proves Con(PA).

Everyone knows that PAT+ proves Con(PA). But it's a good deal less obvious how it does so than people often seem to suppose. What you usually hear people say, and what I myself usually say, is that the proof goes like this: First, you prove that the axioms are true; then you prove that the rules of inference preserve truth; and then you use the extended induction scheme to conclude that all the theorems are true. Since '0 = S0' is provably untrue, it isn't a theorem, so PA is consistent.

But this sketch fails to address a very important question, namely: How are we supposed to prove that all of the axioms of PA are true?41 We can easily enough prove, of each axiom, that it is true, since we can prove its T-sentence and we can prove it.
But that is an entirely different matter. There are truckloads of very important cases where PA can prove that each number blurgs without being able to prove that every number blurgs. So again: How do we prove that all of PA's axioms are true?

40 Henryk Kotlarski (1986) seems to claim that this result can be strengthened to T ⊇ I∆0. This seems doubtful, however. Kotlarski is simply not careful enough about the case of the logical axioms. Enayat and Visser have shown that Kotlarski's result can be salvaged in the semantic setting in which he works by strengthening the conditions on satisfaction classes. In the present axiomatic setting, one could, similarly, add an axiom to the truth-theory stipulating that 'variable switching' works as it should. But that does not seem very interesting.

41 Wang (1952, p. 260) credits J. Barkley Rosser with the observation that this question needs to be addressed.

The answer is that the truth of all the axioms falls out of a single instance of the extended induction scheme. Consider the formula:

φ(a, z, σ) ≡df ∃τ[τ ∼₀ σ ∧ val(τ, 0, z) ∧ Satτ(a)]

Here, a is meant to code a formula with v0 free, e.g., A(v0, y⃗), where y⃗ indicates additional free variables that might occur. So what φ(⌜A(v0, y⃗)⌝, z, σ) says is that A(v0, y⃗) is satisfied by the sequence that is just like σ except that it assigns z to v0. Note that A(v0, y⃗) is doing duty as a variable with which we are reasoning in PA. We have the induction axiom:

φ(⌜A(v0, y⃗)⌝, 0, σ) ∧
∀v0[φ(⌜A(v0, y⃗)⌝, v0, σ) → φ(⌜A(v0, y⃗)⌝, Sv0, σ)] →
∀v0[φ(⌜A(v0, y⃗)⌝, v0, σ)]

What we want to show is that

A(0, y⃗) ∧ ∀v0[A(v0, y⃗) → A(Sv0, y⃗)] → ∀v0(A(v0, y⃗))

is true. This will be true just in case the displayed formula is satisfied by every sequence σ. But then, by the clauses for the connectives, that holds just in case, for every sequence σ:

Satσ(⌜A(0, y⃗)⌝) ∧
Satσ(⌜∀v0[A(v0, y⃗) → A(Sv0, y⃗)]⌝) →
Satσ(⌜∀v0A(v0, y⃗)⌝)

This is what we want to prove.
What we need to show is:

(1) Satσ(⌜A(0, y⃗)⌝) implies φ(⌜A(v0, y⃗)⌝, 0, σ)
(2) Satσ(⌜∀v0[A(v0, y⃗) → A(Sv0, y⃗)]⌝) implies ∀v0[φ(⌜A(v0, y⃗)⌝, v0, σ) → φ(⌜A(v0, y⃗)⌝, Sv0, σ)]
(3) ∀v0[φ(⌜A(v0, y⃗)⌝, v0, σ)] implies Satσ(⌜∀v0A(v0, y⃗)⌝)

None of these are terribly difficult, given three important facts:

(i) If σ and τ agree on the free variables present in some formula ψ, then Satσ(ψ) iff Satτ(ψ).
(ii) If Satσ(⌜ψ(0)⌝) and val(σ, 0, 0), then Satσ(⌜ψ(v0)⌝).
(iii) If Satσ(⌜ψ(Sv0)⌝), and if τ is just like σ except that what it assigns to v0 is the successor of what σ assigns to v0, then Satτ(⌜ψ(v0)⌝).42

All of these are provable in PAT+ by the usual sorts of arguments.

42 We're assuming, of course, that all free occurrences of v0 have been replaced by occurrences of Sv0.

We get (1) immediately from (ii). For (2), Satσ(⌜∀v0[A(v0, y⃗) → A(Sv0, y⃗)]⌝) is equivalent to:

(5.1) ∀χ ∼₀ σ [Satχ(⌜A(v0, y⃗)⌝) → Satχ(⌜A(Sv0, y⃗)⌝)]

What we want to show is:

∀v0{∃τ[τ ∼₀ σ ∧ val(τ, 0, v0) ∧ Satτ(⌜A(v0, y⃗)⌝)] →
∃τ[τ ∼₀ σ ∧ val(τ, 0, Sv0) ∧ Satτ(⌜A(v0, y⃗)⌝)]}

So suppose ∃τ[τ ∼₀ σ ∧ val(τ, 0, v0) ∧ Satτ(⌜A(v0, y⃗)⌝)]. Then Satτ(⌜A(Sv0, y⃗)⌝), by (5.1). Now let υ be just like τ except that it assigns v0 the successor of what τ assigns v0. Then, by (iii), Satυ(⌜A(v0, y⃗)⌝). And so, generalizing, we have ∃τ[τ ∼₀ σ ∧ val(τ, 0, Sv0) ∧ Satτ(⌜A(v0, y⃗)⌝)], as wanted. The argument for (3) is similar.

What makes all of this go, then, is the fact that PA is schematically axiomatized: An extended instance of the induction scheme for PA can be made to yield all the unextended instances. But, by the same token, the argument works only because PA is schematically axiomatized. If T is an infinitely axiomatized theory that is not schematically axiomatized, such as primitive recursive arithmetic, then there is no reason whatsoever to expect that T T+ should prove that all of T's axioms are true. So, as Visser once put it, the fact that PAT+ proves Con(PA) is something of a happy accident.
Too happy, as we are about to see.

5.6. An Unfortunate Result.

Lemma 5.15. (IΣ1)T+ proves that all axioms of PA are true.

Proof. The argument given in the last section needed precisely one extended instance of induction, that for the formula φ(a, z, σ). This is Σ1.43 The other thing we need to check is that the general principles (i)–(iii) on which we relied can be proven in (IΣ1)T+. The proofs of these are all by induction, but, other than the semantic notions, there is nothing in these that isn't primitive recursive and so ∆1 in IΣ1; the universal quantifier over sequences makes the relevant claims Π1. So the reasoning in the last section can all be carried out in (IΣ1)T+.

Corollary 5.16. (IΣ1)T+ proves Con(PA).44

Proof. By Theorem 5.13.

43 As mentioned in note 11, τ ∼₀ σ can be made to be ∆1.

44 Kotlarski (1986) claims that (I∆0)T+ proves Con(PA). It can be shown that (I∆0)T+ proves that all axioms of PA are true, but, as mentioned in note 40, Kotlarski does not address the question how we are supposed to show that 'logic is true', and it does not appear that (I∆0)T+ can prove that logic is true.

Since Con(PA) is a single theorem of PAT+, the full power of PAT+ can't be needed for the proof; only finitely many axioms of PAT+ will be needed, so Con(PA) has to be provable in (IΣn)T+, for some n. In that sense, Corollary 5.16 is no surprise. Nonetheless, I take it to be a bad result in the context of the present investigation, in so far as it suggests that we do not yet have things properly formulated. It's a perfectly natural question what sort of truth-theory you need to formalize the obvious sort of semantic consistency proof for IΣ1. It's disappointing if the answer turns out to be, "One that proves Con(PA)".45

It's worth noting that we get a similar phenomenon in (I∆0)S+.

Theorem 5.17. (I∆0)S+ contains PA.

The argument is similar in spirit to the one for Lemma 5.15.
The difference is that, in this case, we are proving only that (I∆0)S+ proves each of the induction axioms for PA, rather than that it proves that they are all true. This makes things rather easier.

Proof. Let A(v0, v1) be a formula; extension to the case of extra free variables is straightforward. Now consider the formula:

φ(z) ≡df ∃τ < t(σ, z)[τ ∼₀ σ ∧ val(τ, 0, z) ∧ Satτ(⌜A(v0, v1)⌝)]

Here, t(σ, z) is a term I shall not attempt to describe that appropriately bounds the initial quantifier.46 So this is ∆0(Sem), so (I∆0)S+ has induction for it. The induction axiom is:

∃τ < t(σ, 0)[τ ∼₀ σ ∧ val(τ, 0, 0) ∧ Satτ(⌜A(v0, v1)⌝)] ∧
∀v0{∃τ < t(σ, v0)[τ ∼₀ σ ∧ val(τ, 0, v0) ∧ Satτ(⌜A(v0, v1)⌝)] →
∃τ < t(σ, Sv0)[τ ∼₀ σ ∧ val(τ, 0, Sv0) ∧ Satτ(⌜A(v0, v1)⌝)]} →
∀v0∃τ < t(σ, v0)[τ ∼₀ σ ∧ val(τ, 0, v0) ∧ Satτ(⌜A(v0, v1)⌝)]

But the Sat-sentence for A(x, y) will give us:

Satτ(⌜A(v0, v1)⌝) ≡ ∃v0∃v1[val(τ, 0, v0) ∧ val(τ, 1, v1) ∧ A(v0, v1)]

So, since τ agrees with σ everywhere except possibly at v0, φ(z) is equivalent to:

∃v1[val(σ, 1, v1) ∧ A(z, v1)]

45 Of course, PA itself proves Con(IΣ1), and the argument is semantic in character (it uses a partial truth-theory for the language of arithmetic), but it is very much not the sort of argument we are discussing.

46 Since τ can be taken to be σ with the first entry changed to z, we can actually calculate what τ is, given σ.

Then the induction axiom reduces to:

∃v1[val(σ, 1, v1) ∧ A(0, v1)] ∧
∀v0[∃v1[val(σ, 1, v1) ∧ A(v0, v1)] → ∃v1[val(σ, 1, v1) ∧ A(Sv0, v1)]] →
∀v0∃v1[val(σ, 1, v1) ∧ A(v0, v1)]

and this holds for any σ. Now suppose A(0, v1) and ∀v0(A(v0, v1) → A(Sv0, v1)). Then there is a sequence χ such that val(χ, 1, v1). Hence, for this χ, we have: A(v0, v1) ≡ ∃v1(val(χ, 1, v1) ∧ A(v0, v1)). So the induction axiom becomes:

A(0, v1) ∧ ∀v0(A(v0, v1) → A(Sv0, v1)) → ∀v0 A(v0, v1)

as wanted.

This last result is relevant to Volker Halbach's (2001a) claim that the 'uniform disquotation scheme' (our (*)S+) is plausibly analytic, since PAS+ is a conservative extension of PA.
What we have just seen, however, is that this result depends crucially upon the choice of PA as base theory. Whether one takes conservativity to be required for analyticity or regards it as merely indicative of it, the uniform disquotation scheme appears to be logically quite strong, transforming a theory interpretable in Q into one that contains PA. It is only in very special cases that it gives us nothing we did not already have.

6. DISENTANGLING SYNTAX FROM THE OBJECT-LANGUAGE

6.1. Reviving an Old Approach to Truth-theories.

What's responsible for the unfortunate Corollary 5.16? Semantic consistency proofs make use of two different sorts of theories, for two very different sorts of reasons. On the one hand, we have a 'base theory' that gives us the syntactic machinery we need to formulate our truth-theory and then to reason within it. Among other things, for example, the extended induction axioms allow us to formalize arguments by induction on the complexity of expressions, or the length of proofs, or what have you. On the other hand, there is the object-theory, which is the theory we mean to be reasoning about, the theory whose consistency we mean to be proving. We need to know that all the axioms of the object-theory are true, and the idea is to get their truth from them: We assume the axioms themselves and derive their truth from their T-sentences.

As we have seen, however, that is not at all how things work in the case of PA. The truth of all the axioms of PA is not derived from the axioms of PA in that way, and, on reflection, it's easy to see that it can't be: The truth of each axiom of PA can be derived from that axiom, but that's it. This is what leads to Corollary 5.16: The truth of all the axioms of PA is a consequence, not of those axioms, but of a single instance of induction.
Speaking more generally, the problem is that a single theory is playing both of the roles I just distinguished: In (IΣ1)T+, IΣ1 is both the underlying syntax and what provides us with the axioms of the theory we had meant to be reasoning about. So induction axioms that were introduced to allow us to formalize certain sorts of syntactic arguments have instances that entail the truth of principles in the object-language that go beyond what we'd meant to be assuming.

The solution to the problem is therefore obvious: We need to disentangle the syntax from the object-language. And, interestingly enough, this is how Tarski himself proceeds in "The Concept of Truth in Formalized Languages". I quoted Tarski's description of the meta-theory in which he proposes to define truth earlier. Here is his description of the meta-language:

A meta-language which meets our requirements must contain three groups of expressions: (1) expressions of a general logical kind; (2) expressions having the same meaning as all the constants of the language to be discussed. . . ; (3) expressions of the structural-descriptive type which denote single signs and expressions of the language considered, whole classes and sequences of such expressions or, finally, the relations existing between them. (Tarski, 1958, pp. 210–11)

The expressions mentioned under (3) belong, of course, to syntax. Tarski does not actually say that these expressions will be disjoint from those mentioned under (2), but it is natural to read him that way. That is plainly how he conceives the matter in his discussion, in section 3, of the calculus of classes (Tarski, 1958, pp. 172ff). Tarski was of course aware, at least by the time his paper was published, that the syntactic theory can be interpreted in arithmetic: His famous theorem on the indefinability of truth depends upon that fact.
But the positive part of Tarski's project (showing how it is possible to define truth in a consistent manner, suitable for the purposes of meta-mathematics) in no way depends upon this now familiar maneuver. So the basic idea of separating the syntax from the object-theory is old, even if the application I propose to make of it is somewhat new.

So let L be the (finite) language for which we want to give a truth-theory. We let S be a disjoint (and fixed) language in which we will formalize our syntax. The most natural choice for S, and the one that would be closest to Tarski's original intentions, would be a theory of concatenation (Quine, 1946; Tarski et al., 1953; Grzegorczyk, 2005); this would also have the advantage that what follows would be independent of issues about how we code expressions. To keep things familiar, however, we shall take S to be isomorphic to the language of arithmetic.47 (Think of S as the language of arithmetic written in boldface, or something of the sort.) Our theory of syntax can then be taken to be Q, or I∆0, or whatever we wish.

47 The fact that L is disjoint from S is no obstacle to our coding facts about L in S.

If we're going to do the semantics of L, then we're going to need to be able to talk about the things L talks about. In particular, if we're going to have the usual Tarski-style clauses for the primitive expressions of the object-language, then we are going to need to have the expressive resources of L available to us, as Tarski notes at (2). So the obvious choice for the language of our semantic theory would be S ∪ L.

There are, however, complications. Suppose that L is the language of set theory. Then the quantifiers in sentences of L would normally be understood as ranging over all and only the sets. But the quantifiers in sentences of S do not range over all sets, even if (perhaps) they range over some of them. So we need to keep the domains of S and L separate somehow. There are various ways to do this.
Perhaps the simplest is to let the semantic theory be many-sorted. So that's what we'll do. Variables ranging over the domain of S will be italic; those ranging over the domain of L will be upright.48

48 The two-sorted theory can of course be interpreted in a single sorted theory via the usual relativization to a pair of domains. This is more or less what Craig and Vaught (1958) do.

If we do go this way, then we're also going to need a separate theory of sequences or, better, of assignments of objects to variables: There will be no hope at all of coding sequences of objects from the domain of L as objects in S, at least not in general. So we shall take ourselves to have the following theory of assignments available:

∀v[var(v) → ∀α∀x∃β(val(β, v) = x ∧ β ∼ᵥ α)]

As before, β ∼ᵥ α abbreviates: ∀w[var(w) ∧ v ≠ w → val(β, w) = val(α, w)]. What this says is thus that, given any assignment, the value it assigns to a given variable can always be changed as one pleases. Assignments live in yet a third sort. Variables ranging over them will be Greek letters. That there is at least one assignment, and that every assignment assigns a unique object to each variable, are truths of logic, in this formulation.

Given this theory of assignments, we can then state a truth-theory for L. The theory will be more or less the familiar one, though with some adjustments to take account of the present framework. For example, these axioms will be common to all theories, independent of L:

v: var(v) → Denα(v, x) ≡ x = val(α, v)
∧: Satα(⌜A ∧ B⌝) ≡ Satα(A) ∧ Satα(B)
∀: Satα(⌜∀viA(vi)⌝) ≡ ∀β[β ∼ᵢ α → Satβ(⌜A(vi)⌝)]

The other axioms of the theory will depend upon L. If L is the language of set theory, then the only other axiom will be:

∈: Satα(⌜t ∈ u⌝) ≡ ∃x∃y[Denα(t, x) ∧ Denα(u, y) ∧ x ∈ y]

In the case of the language of arithmetic, we'll have axioms like:
0: Denα('0', x) ≡ x = 0
+: Denα(⌜t + u⌝, x) ≡ ∃y∃z[Denα(t, y) ∧ Denα(u, z) ∧ x = y + z]

Note that, in both these cases, the used expressions '0' and '+' are expressions of L, not of S.

So that is the theory in which I propose henceforth to work. As for notation:

Definition. Let T be an arithmetical theory. Then T TL is the semantics for L we have just described.

We can think of ξ Tη as a two-place functor: Given a theory and a language, it returns a new theory that constitutes a semantics for that language based upon the original theory as syntax. Our interest is in the properties of this functor.

Note that we are not (yet) extending any induction scheme that might be present in T. So T TL is not going to be formalizing semantic consistency proofs of the sort discussed in Section 5.4. More generally, induction in T TL does not apply to statements involving assignments, or semantics, or the object-language. The induction axioms must be 'purely syntactical'.

6.2. The Weakness of Compositional Truth-theories.

We now prove a strong generalization of Corollary 3.5, which told us that we can formalize a materially adequate theory of truth for the language of arithmetic in the weak theory we called QseqT. It turns out that we can do this for any finite language, and we can do it in a theory interpretable in Q.49

Lemma 6.1. Q TL is a materially adequate theory of truth for L. That is, Q TL proves T(⌜A⌝) ≡ A for each sentence A of L.

Proof. Essentially the same as that of Lemma 3.1.

The first lesson we learn here, then, is that a materially adequate theory of truth for L need make use of no information whatsoever about whatever it is that L talks about. As said, any theory of truth that is going to be materially adequate, in the sense that it proves all 'disquotational' T-sentences, is of course going to have to have the expressive resources of the object-language available to it. But that is all.
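The compositional clauses above, and the way they yield T-sentences without appeal to any theory stated in L, can be sketched as a small recursive evaluator. This is a toy illustration under loud assumptions, not the formal theory: the object language is arithmetic interpreted over the integers modulo `N` (so that quantifiers can be checked by brute force), formulas are nested tuples rather than coded expressions, and the clauses for identity and negation are added here for convenience. All names (`den`, `sat`, `is_true`, `N`) are hypothetical.

```python
from itertools import product

N = 5                # assumed finite domain: the integers modulo N
DOMAIN = range(N)

# Terms:    ("var", i) | ("zero",) | ("plus", t, u)
# Formulas: ("eq", t, u) | ("not", A) | ("and", A, B) | ("forall", i, A)


def den(alpha, term):
    """Den_alpha(t, x): return the object x that term t denotes under alpha."""
    tag = term[0]
    if tag == "var":        # clause v: a variable denotes its assigned value
        return alpha[term[1]]
    if tag == "zero":       # clause 0
        return 0
    if tag == "plus":       # clause +
        return (den(alpha, term[1]) + den(alpha, term[2])) % N
    raise ValueError(f"unknown term: {term!r}")


def sat(alpha, formula):
    """Sat_alpha(A): does the assignment alpha satisfy the formula A?"""
    tag = formula[0]
    if tag == "eq":         # identity clause (an assumption of this sketch)
        return den(alpha, formula[1]) == den(alpha, formula[2])
    if tag == "not":        # negation clause (an assumption of this sketch)
        return not sat(alpha, formula[1])
    if tag == "and":        # clause for conjunction
        return sat(alpha, formula[1]) and sat(alpha, formula[2])
    if tag == "forall":     # clause for the universal quantifier
        i, body = formula[1], formula[2]
        # every assignment beta that differs from alpha at most at v_i
        return all(sat({**alpha, i: x}, body) for x in DOMAIN)
    raise ValueError(f"unknown formula: {formula!r}")


def is_true(sentence, variables=(0, 1)):
    """T(A): A is satisfied by every assignment to the given variables."""
    return all(sat(dict(zip(variables, values)), sentence)
               for values in product(DOMAIN, repeat=len(variables)))


V0, ZERO = ("var", 0), ("zero",)
A = ("forall", 0, ("eq", ("plus", V0, ZERO), V0))   # for all v0: v0 + 0 = v0
B = ("forall", 0, ("eq", V0, ZERO))                 # for all v0: v0 = 0
```

In this toy setting the instance of the T-scheme for `A` amounts to the agreement between `is_true(A)` and the directly computed fact that `x + 0 = x` for every `x` in the domain. The evaluator only uses the expressive resources of the object language (its zero and addition); it assumes nothing about which arithmetical sentences are provable.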
We haven't even mentioned any theory formulated in L to this point, let alone made use of one.

49 I think we can do something similar even for non-finite languages. In this case, it'll be local interpretability that's of interest, rather than interpretability. But I'm not sure about this.

This result plays an important role in Craig and Vaught's (1958) proof that every axiomatizable theory that has no finite models has a finitely axiomatizable conservative extension. Their argument is an extension of one due to Kleene (1952). Consider some recursively axiomatizable theory T. We take a weak, finitely axiomatizable theory of syntax (Q, more or less), a weak theory of assignments, and the Tarski clauses for the language of T. That's enough to prove the T-sentence for each sentence of the language of T (Craig and Vaught, 1958, p. 296, Lemma 2.4). So now, since the set of T's axioms is recursive, it is representable in Q, and we need only add one more axiom: All of T's axioms are true. This theory clearly contains T, and the fact that it is a conservative extension of T can be proven by the usual sort of model-theoretic argument (Craig and Vaught, 1958, p. 298, Lemma 2.7).

Thus, Q TL is not a mere curiosity but is of actual mathematical utility. It is also as weak as it is possible for it to be.

Lemma 6.2. Q TL is interpretable in Q.

Proof. The basic idea here is very simple: Since no theory stated in L is so far in evidence, we can give L the completely trivial interpretation whose domain is {0}, that takes each term to denote 0, and that takes every predicate to have an empty extension. The theory of assignments is then completely trivial: val(v, x) will always be true, for each v and x. A semantic theory for the language, so interpreted, is then easily constructed.

Lemmas 6.1 and 6.2 give us a first indication of why it is worth disentangling syntax from the object-theory.
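The trivial interpretation used in the proof of Lemma 6.2 can be pictured with a very small evaluator. This is only a sketch of the one-element model, not of the formal interpretation in Q; the tuple encoding of formulas and the names `den_trivial` and `sat_trivial` are assumptions made for illustration.

```python
# Formulas: ("atom", name, terms) | ("not", A) | ("and", A, B) | ("forall", i, A)


def den_trivial(term):
    """Every term denotes 0, the sole object of the trivial domain {0}."""
    return 0


def sat_trivial(formula):
    """Satisfaction under the trivial interpretation.

    The domain has one element, so there is exactly one assignment and no
    assignment parameter is needed.
    """
    tag = formula[0]
    if tag == "atom":
        return False                  # every predicate has an empty extension
    if tag == "not":
        return not sat_trivial(formula[1])
    if tag == "and":
        return sat_trivial(formula[1]) and sat_trivial(formula[2])
    if tag == "forall":
        # Only one candidate value for the bound variable, so the
        # quantifier adds nothing to the body.
        return sat_trivial(formula[2])
    raise ValueError(f"unknown formula: {formula!r}")


membership = ("atom", "in", (("var", 0), ("var", 1)))   # v0 in v1
nothing_is_a_member = ("forall", 0, ("forall", 1, ("not", membership)))
```

Under this reading every atomic formula fails and every compositional clause still holds, which is why the interpretation says nothing about what L's primitives actually mean.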
Together, they imply that there is a materially adequate truth-theory for the language of arithmetic that is as weak as it could plausibly be: It is interpretable in Q. If, on the other hand, we develop our truth-theory in the usual way, where syntax and the object-theory are intertwined, then the weakest materially adequate truth-theory is QseqT. And it follows from Theorem 5.3 that QseqT is not interpretable in Q. So Lemma 6.1 is an improvement on Theorem 5.3.50

50 These results have another sort of significance. If, as I am inclined to believe (Heck, 2005, 2007), a speaker's semantic competence consists in her tacitly knowing a truth-theory for her language, one might worry, for reasons similar to those mentioned in connection with the problem of 'essential richness', that this would credit ordinary speakers with far too much tacit knowledge. But knowing such a theory need involve no more than knowing Q TL, and the logical strength of that theory derives entirely from its syntactic component.

6.3. The Strength of Compositional Truth-theories, and the Weakness of Disquotational Ones.

We know, then, from Lemma 6.2, that Q TL is very weak. Unfortunately, however, this does not really help us to characterize the strength of truth-theories. For one thing, the interpretation of Q TL in Q wreaks havoc on the meanings of the primitives of L: It all but treats L as uninterpreted. How, then, might we force the truth-theory to respect the meanings of L's primitives? One plausible answer is to require the interpretation to preserve some theory stated in L. Indeed, we might naturally interpret Tarski as taking the object-theory to play something like this role. (Though we do not need to suppose, as Tarski may have, that L must in any sense consist of 'meaning postulates'.)51

Moreover, the question how strong truth-theories are is best understood as the question: What does 'adding a truth-theory' give us, in terms of logical strength?
That is, if we have some theory T and we 'add a truth-theory' to it, how strong is the resulting theory, compared to T itself? In our terminology, the question is thus how Q TL + T compares, in logical strength, to T. From this point of view, Lemma 6.2 concerns the special case where T is the null theory.

As was explained in Section 2.1, there are different ways of comparing theories, so we can ask various sorts of questions about the relationship between Q TL + T and T. One question is whether Q TL + T is a conservative extension of T. And we have, in fact, already seen that it is: That is the result of Craig and Vaught's (1958) mentioned earlier, in a slightly different form. But there is a different, and ultimately more interesting, question we can also ask, namely, whether Q TL + T is interpretable in T. And to this question, the answer is "no", at least if T is finitely axiomatized.

Theorem 6.3. Let T be a consistent theory in L. Then Q TL plus 'all axioms of T are true' proves the consistency of T on a cut.

Corollary 6.4. Let T be a finitely axiomatized, consistent theory in L. Then Q TL + T proves the consistency of T on a cut and so is not interpretable in T.52

The proof of Theorem 6.3 is essentially the same as that of Theorem 5.3. Corollary 6.4 then follows from Corollary 4.7 and the obvious analogue of Corollary 3.3. It's worth emphasizing that the only role T plays in the proof is in allowing us to prove that all T's axioms are true. The work is all done in Q TL. We also get an analogue of Corollary 5.6.

Theorem 6.5. Let T be a finitely axiomatized, consistent theory in L. Then Q TL + T is mutually interpretable with Q + Con(T).

Proof. That Q TL + T interprets Q + Con(T) follows from Theorem 6.3. The other direction is just a minor modification of Theorem 5.5. □

51 These remarks are largely based upon observations due to John P. Burgess.
52 Note that it follows that the finitely axiomatizable theory Craig and Vaught show is a conservative extension of T is not interpretable in T.

We thus see again that compositional truth-theories have at least some logical power: If we start with a finitely axiomatized theory T and add an absolutely minimal but still compositional theory of truth for the language of T (and add it in a way that is guaranteed not to 'infect' T itself), the result is a theory that is logically stronger than T in the sense that it is not interpretable in T.53

Even from a purely technical point of view, then, Q TL is an interesting theory. It is as weak as it can be, yet Q TL 'upGödels' any finitely axiomatized theory T that you care to give it.54 We can think of Pudlák's form of the incompleteness theorem as defining a map on theories: Given a consistent, sequential theory T containing Q, it hands us Q + Con(T), which is guaranteed to be logically stronger than T in the sense that it is not interpretable in T. In effect, what we have found is that, for finitely axiomatized theories, Q TL can be used to define the same functor, modulo interpretability: Given a finitely axiomatized theory T in L, Q TL + T is mutually interpretable with Q + Con(T).55

By contrast, the T-sentences themselves have no logical power, even if we extend whatever induction scheme might happen to be available. Once again, getting a completely general version of this result is hard, because schemes can come in so many different forms. But if, as before, we focus just on the case of the usual hierarchy, then the claim can be stated precisely. And again, the results here significantly improve the corresponding results of Section 5.3, which is yet another reason to want to disentangle syntax from the object-theory.

Definition.
T D+A is the theory of truth for the language of arithmetic that is similar to T TA but, instead of containing a compositional theory of truth, contains the T-sentences for A and extends the induction scheme to permit the presence of the truth-predicate.

53 It would, I think, be well worth investigating such theories as ÎΣ1 TL. It's of course immediate that ÎΣ1 TL + T is not interpretable in T, since even Q TL + T isn't. But is there some nice characterization of exactly how strong ÎΣ1 TL + T is? In general, one would suppose it is stronger than Q TL + T; surely it isn't interpretable in Q + Con(T). On the other hand, one would suppose that ÎΣ1 TL + T is weaker than ÎΣ1 T + L + T (for which, see below) and, in particular, that it does not interpret IΣ1 + Con(T). So where precisely does it sit? And what of intermediate theories, like Î∆0 TL + T?

54 Thanks to Visser for the wonderful neologism.

55 Note that if we use a theory of concatenation as our base theory, then this result is coding-free and so gives us a co-ordinate-free account of what consistency statements are, like the central result of Visser's paper on the second incompleteness theorem (Visser, 2009a, theorem 4.1).

By essentially the same argument as for Theorem 5.12, we have:

Proposition 6.6. ÎΣn D+A is locally interpretable in IΣn.

It is clear that we thus also have:

Proposition 6.7. ÎΣn D+A + T is locally interpretable in IΣn + T.

And so, in particular:

Corollary 6.8. ÎΣn D+A + IΣm is locally interpretable in IΣk, where k = max(m, n).

Proof. ÎΣn D+A + IΣm is locally interpretable in IΣn + IΣm, where these two theories are formulated in disjoint copies of the language of arithmetic. But IΣn + IΣm will obviously be interpretable in IΣmax(m,n). □

So we get an analogue of Theorem 5.11.

Corollary 6.9. PAD+A + PA is interpretable in PA.

Proof. Any finite fragment of this theory is contained in one or another of the ÎΣn D+A + IΣm. So each finite fragment is interpretable in IΣn, for some n, and so in PA.
That establishes local interpretability, and now we invoke Orey's Theorem. □

We'll see shortly that something even stronger is true.56

56 What happens if we consider theories related to T D+A as T S+ is related to T D+?

6.4. Semantic Consistency Proofs, Again.

We have seen that Q TL + T is not interpretable in T, because it proves the consistency of T on a cut. It does so because it proves the basis case and the induction step of the usual semantic proof of the consistency of T. That leaves us more or less where we were at the end of Section 5.1. The next question to ask, then, is what we need to add if we are to get a proof of the consistency of T. As we saw in Section 5.4, the answer is going to be something along the lines of 'induction for Σ1 formulae'. In the framework in which we were then working, this answer turned out to be correct but disappointing. It's true that (IΣ1)T+ proves the consistency of IΣ1, but it also proves the consistency of PA.

The work we did in the last section puts us in a position to resolve this problem. What we need to add is, indeed, something along the lines of 'induction for Σ1 formulae'. But the problem that infected our earlier efforts has now been resolved: We can strengthen our theory of syntax without thereby strengthening the object-theory whose consistency we are trying to prove.

Let me emphasize what this says about the role induction plays in semantic consistency proofs: The induction we need for the proof is a syntactic principle, not a number-theoretic one. It's a principle that has to do, at least in the application we need to make of it, with inductions on proofs; it has nothing to do with whatever the object-language happens to be about. This is obvious once stated, but the usual way of formulating truth-theories obscures the point.

So now we need a definition paralleling that of T T+.

Definition. T T + L is T TL with the induction axioms in T extended to permit semantic vocabulary and reference to assignments.
As before, this definition isn't perfectly general. But we know how to apply it to the cases that matter here: ÎΣn T + L is really IΣn(Sem) TL, where Sem is the set of atomic formulae of the forms: Denα(t, x); Satα(x); T(x); and val(σ, x).

It's important to appreciate that, in extending induction in this way, we are not extending it nearly as far as we might extend it. The only new induction axioms we are allowing are ones that contain the semantic predicates mentioned. For example, suppose that L is the language of set-theory, and consider the following formulae (which are chosen more or less randomly):

∃x Denσ(t, x)

x ∈ y ∧ val(σ, v) = x ∧ Satσ(z)

We do not have induction for such formulae in ÎΣn T + L, as I am understanding it. The first is ruled out because it contains the quantifier '∃x', which ranges not over numbers but over sets. The second is ruled out because it contains the predicate ∈.

I am not terribly happy about this restriction. By imposing it, we force ourselves to operate with what might seem like an unnaturally weak theory, and the significance of the results we shall prove about what it can or, more importantly, cannot do might therefore be questioned. It's my hope that there will prove to be some natural way of loosening these restrictions and allowing induction over the formulae mentioned, and others of their general kind, without adding (significant?) strength to ÎΣn T + L. Part of the difficulty is that it is hard to know how to integrate quantifiers over sets (or, more generally, whatever the object-language talks about) into our measure of logical complexity: What sorts of formulae count as Σn, in the relevant sense, if these formulae may contain quantifiers over both numbers and sets? But let us leave such questions aside for now.

It is clear that we can now adapt the arguments given in Section 5.4 to our new framework.
In particular, we will be able to formalize a semantic proof of Con(T) in ÎΣ1 T + L + T, where L of course is the language of T. So we have:

Theorem 6.10. Let T be a theory in a finite relational language L, and suppose that ÎΣ1 T + L + T proves that all axioms of T are true. Then ÎΣ1 T + L + T ⊢ Con(T).

The restriction to relational languages is essential because we cannot prove that every term has a denotation. The obvious way to do so would be by induction on ∃xDenσ(t, x), but, for the reasons just mentioned, we do not have induction for this predicate in ÎΣ1 T + L. As earlier, however, the restriction can then be lifted, since IΣ1 will prove that a theory T stated in a non-relational language can always be interpreted in its relational counterpart TR and so that, if Con(TR), then Con(T). So we have:

Corollary 6.11. Let T be a finitely axiomatized theory in a finite language L. Then ÎΣ1 T + L + T ⊢ Con(T).

There is thus a sense in which Corollary 5.16, though surprising in a way, is in another way natural. The syntax needed to carry out semantic consistency proofs is no more than can be formalized in IΣ1. If Corollary 5.16 seems surprising, it is because one might have thought we needed to assume the axioms of PA in order to be able to prove that all the axioms of PA are true. Well, we don't. As Tarski himself put it, we need not assume "axioms which have the same meaning as the axioms of the science under investigation", but only ones that "suffice ... for the establishment of all sentences having the same meaning as the theorems of the science being investigated" (Tarski, 1958, p. 211). And it turns out that assuming extended Σ1 induction is assuming axioms that "suffice ... for the establishment of all" axioms of PA.

What we want to see now, then, is that our way of disentangling syntax from the object-language really does solve the problem Corollary 5.16 revealed.
What we would like to be able to show is that, although ÎΣ1 T + L + IΣ1 proves Con(IΣ1), it does not prove Con(PA) or even Con(IΣ2). To show this, we will establish a sort of converse of Corollary 6.11.

Theorem 6.12. Let T be a finitely axiomatized theory in a finite language. Then ÎΣ1 T + L + T is interpretable in IΣ1 + Con(T).

Before we begin the proof, let me note a couple of important corollaries.

Corollary 6.13. Let T be a finitely axiomatized theory in a finite language. Then ÎΣ1 T + L + T is mutually interpretable with IΣ1 + Con(T).

This follows immediately, since ÎΣ1 T + L + T contains IΣ1 + Con(T). We can generalize yet further.

Corollary 6.14. Let T be a finitely axiomatized theory in a finite language. Then, if n ≥ 1, ÎΣn T + L + T is mutually interpretable with IΣn + Con(T).

We'll prove Corollary 6.14 after we prove Theorem 6.12.

We are going to need a version of the so-called arithmetized completeness theorem (Hájek and Pudlák, 1993, pp. 104–5), which is provable in IΣ1. There are two different ways one often sees this theorem stated, and the proof of Theorem 6.12 rests upon the way these two statements of it relate to one another.

Theorem 6.15 (Arithmetized Completeness Theorem). Let T be a recursively axiomatized theory. Then:
(1) IΣ1 + Con(T) interprets T.
(2) IΣ1 + Con(T) proves that T has a model, one whose complexity is what Hájek and Pudlák call low Σ∗0(Σ1), or LL1.
By a 'model' here is meant precisely what one would think is meant: A certain sort of set, arithmetically coded, of course.57 The model is understood to come with a corresponding compositional truth-theory, that is, with notions of denotation, satisfaction, and truth for which the usual Tarskian clauses can be proved, and of course sequences will serve to code the theory of assignments.58 That the model is a model of T amounts to its being provable, in IΣ1 + Con(T), that the axioms of T are, in the sense of truth associated with the model, true, that is, that they are true in the model.

I am not going to attempt to explain what 'low Σ∗0(Σ1)' means. It doesn't really matter for our purposes, and, frankly, I don't really understand it very well.59 I will explain why the complexity of the model matters (note that it is independent of T) and why its being LL1 is enough for the proof of Theorem 6.12.

57 It is not widely appreciated among philosophers how much set theory can be coded even in very weak theories of arithmetic. Everyone knows that PA is capable of talking about finite sets of numbers, but PA can also talk about lots of infinite sets, too. This is because, even though PA cannot define truth for the whole of the language of arithmetic, it can define truth for ever larger fragments. In particular, there is a Σn formula Satn,σ(x) such that IΣ1 proves the Tarski clauses for Σn formulae and therefore proves, for each Σn formula A(x), the Sat-sentence: Satn,σ(⌜A(v0)⌝) ≡ A(val(σ, 0)). One can therefore use Σn formulae as codes for Σn-definable sets when working in IΣ1 (Hájek and Pudlák, 1993, §I.1(d), esp. p. 60, Remark 1.80).

58 Note that this works because the model we get is, obviously, one in the natural numbers (as IΣ1 understands them), and this is true even if T is, say, ZFC.

59 The definition is on p. 85 of Hájek and Pudlák's book, for those who would like to explore it. Thanks to Ali Enayat for making it a little clearer to me.

Proof of Theorem 6.12. If we are going to interpret ÎΣ1 T + L + T in IΣ1 + Con(T), we need to deal with three things: (i) T; (ii) the semantic theory for L, including the theory of assignments; and (iii) the underlying syntax, IΣ1. A significant part of the last will be no problem, since we already have IΣ1 available. But we will need to make sure that we can prove the extended induction axioms. We'll deal with that last.

The arithmetized completeness theorem tells us that IΣ1 + Con(T) can give us (i) and (ii): It interprets T, and it gives us a model for T, with which we get a semantics for L. But these aren't enough by themselves: We need to make sure that they fit together the right way. To see why, suppose T is Q. Then '0' is a term, and among the axioms of ÎΣ1 T + L + Q that we need to interpret are these two:

∀x(0 ≠ Sx)

Denα(⌜0⌝, 0)

The first comes from Q itself; the second, from the semantics. The point to note is that the term '0' occurs in both of these and so must be interpreted the same way both times, or at least in ways that are compatible. The mere fact that IΣ1 + Con(Q) both interprets Q and gives us a semantics for the language of Q doesn't guarantee that. For all we know so far, the former could interpret '0' as 'S0' while the latter told us that '0' denotes 0.

This needn't happen, however, because the two versions of the arithmetized completeness theorem are closely related. It is really the second that is more fundamental. The way you get an interpretation of T once you have a model of T is the same way you can always get an interpretation of T once you have a model of T: You just interpret it the way the model tells you to interpret it. So if the model tells you that some term t denotes u, you translate t as 'u'.
If the model tells you that some predicate R(x, y) has as its extension the set S, then you translate R(x, y) as meaning: <x, y> ∈ S.60 And, of course, you restrict the quantifiers to the domain of the model. The fact that T is provably true in the model will then imply that T's axioms, so translated, are provably true. Which means that we've successfully interpreted T.

60 Note that this is all intensional: In the theory in which we are working, we'll be given the extension of R(x, y) in a certain way, that is, by means of a certain formula; and we then use that very formula to construct the translation of R(x, y).

What this means in our case is that the interpretation and the model do 'fit together in the right way'. If the semantic theory says that '0' denotes S0, then the interpretation of '0' will be 'S0'. Some fiddling may be necessary here and there to get everything completely in sync, but this is merely tedious.

So that takes care of the interpretation of T and the interpretation of the semantics for L. What's left is (iii), the underlying syntax, IΣ1. As noted earlier, much of that is trivial, since we're working in IΣ1 + Con(T) and so have IΣ1 readily available. So if we were just trying to interpret ÎΣ1 TL + T, we'd be done. What we're actually trying to interpret, however, is ÎΣ1 T + L + T, and so what we lack at this point (all we lack) is a demonstration that the extended induction axioms can be proven in IΣ1 + Con(T), given the interpretation of T, and of the semantics for L, that we've already got.

It is here, then, that we need to make use of what we know about the complexity of the model and, in particular, of its associated notions of denotation, satisfaction, and truth. If the formula we were using to interpret Satα(x) were, say, Σ2, then we'd have no hope whatsoever of proving the translations of induction axioms containing Satα(x) in IΣ1.61 But we know that Satα(x) and its friends are LL1.
The induction axioms we're trying to prove are, therefore, of the form ∃xφ(x), where φ(x) is built from atomic arithmetical formulae and the translations of our atomic semantic formulae: Denα(t, x); Satα(x); T(x); and val(σ, x). Since these are at worst LL1, the induction axioms we're trying to prove are Σ1(LL1). And it just so happens that IΣ1 proves induction for Σ1(LL1) formulae (Hájek and Pudlák, 1993, p. 85, lemma 2.78). □

It's just beautiful the way this works out: LL1 is precisely what the model needs to be for that last step to work.

Proof of Corollary 6.14. There are various ways of proving this. One is to note that, in IΣ2, we get a better bound on the complexity of the model: It's low ∆2. So then the question is whether IΣn proves induction for Σn(∆2) formulae, when n ≥ 2. It does (Hájek and Pudlák, 1993, p. 82, theorem 2.67). □

It's a nice question whether this also extends to PA, that is, to the case where PA is our theory of syntax, rather than the object-theory. We clearly have this:

Corollary 6.16. Let T be a finitely axiomatized theory in a finite language. Then PA T + L + T is mutually locally interpretable with PA + Con(T).

Proof. Each finite fragment of PA + Con(T) is contained in one of the IΣn + Con(T), which is interpretable in ÎΣn T + L + T and so in PA T + L + T. The converse is similar. □

The reflexivity of PA entails that of PA + Con(T),62 so PA T + L + T is globally interpretable in PA + Con(T), by Orey's Theorem. It is not at all obvious, however (to me, anyway), that PA T + L + T must be reflexive. It would be nice if it was, though, since then we could remove "locally" from Corollary 6.16.

6.5. Peano Arithmetic Is a Special Case (II).

I've remarked several times now that PA is in certain respects unrepresentative. We're now in a position to see another way in which that is so.

Corollary 6.17. ÎΣm T + L + PA is interpretable in PA.
Proof. Any finite fragment of ÎΣm T + L + PA is contained in one of the theories: ÎΣm T + L + IΣn and so by Corollary 6.14 is interpretable in IΣm + Con(IΣn). But PA, being reflexive, contains every such theory. So every finite fragment of ÎΣm T + L + PA is interpretable in PA, which shows that ÎΣm T + L + PA is locally interpretable in PA. Now invoke Orey's Theorem. □

61 Note, though, that it's nonetheless clear that this is going to work at some level or other, given that the complexity of the model is independent of T. In the situation just mentioned, for example, we'd be perfectly fine at IΣ2.

62 Mostowski shows that every extension of PA that does not expand the language is reflexive.

Corollary 6.18. PA T + L + PA is interpretable in PA.

Proof. Any finite fragment of PA T + L + PA is contained in one of the theories: ÎΣm T + L + PA. So PA T + L + PA is locally interpretable in PA and so is interpretable in PA. □

On the other hand:

Corollary 6.19. PA T + L plus 'all axioms of PA are true' proves Con(PA). Indeed, ÎΣ1 T + L plus 'all axioms of PA are true' proves Con(PA).

Proof. From Theorem 6.10. □

It follows, obviously, that PA T + L + PA does not prove that all axioms of PA are true. What this means is that, once we have disentangled the syntax from the object-language, the 'happy accident' that permits PAT+ to prove Con(PA) is revealed as something more like a dirty trick. It is only because of the interaction between the extended induction principle and the theory whose consistency we are trying to prove that PAT+ proves Con(PA).

The combination of Corollary 6.18 and Corollary 6.19 is notable for another reason, as well. Deflationists about truth typically hold that the only legitimate use for the truth-predicate is as a 'device of generalization'. Precisely what that is supposed to mean has never been made terribly clear.
But one thing one might have thought it meant, or at least implied, was something like: Assuming we have the T-sentences for some language L, then a theory consisting of the sentences in some set S is in some natural sense equivalent to the theory containing the single statement "All sentences in S are true".63 Indeed, the one attempt known to me to explain what it might mean to "use the truth-predicate as a device of generalization" proceeds along precisely these lines (Halbach, 1999). Considered as additions to PA T + L, however, or even as additions to ÎΣ1 T + L, the two theories consisting of the axioms of PA, on the one hand, and the single statement "All axioms of PA are true", on the other, have very different logical properties: The latter is a lot stronger than the former. That's not to say, of course, that there's not some other way of explaining what it means to 'use the truth predicate merely as a device of generalization'. But I don't know what that would be.

63 It sometimes seems to be supposed that the fact that the truth-predicate is a 'device of generalization' is suitably explained by the fact that the truth-predicate allows us to form sentences like the one just mentioned, which is of course a generalization. But "All axioms of PA contain the symbol '→'" is a generalization, too. Every predicate allows us to form generalizations we could not form without it. In what sense, then, is the truth-predicate supposed just to be a 'device of generalization'?

7. CONCLUDING PHILOSOPHICAL REFLECTIONS

Taken together, the results proven in Section 6.4 show that, where T is a finitely axiomatized theory in L, Û T + L + T is mutually interpretable with U + Con(T), for a very wide range of choices for U. This holds, in particular, for Q, for IΣn, so long as n ≥ 1, and even for PA (though in this last case we have only mutual local interpretability). It would be nice to know if more cases could be added to the list.
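The cases just listed can be collected in a single display. What follows is only a summary sketch of results already stated, not a further claim. The placement of sub- and superscripts on the theory names is an assumption about notation, and the relation symbols ≡_int (mutually interpretable) and ≡_loc (mutually locally interpretable) are labels introduced here purely for illustration.

```latex
% Summary sketch (requires amsmath). T is a finitely axiomatized theory in a
% finite language L (and, for the first line, consistent).
% \equiv_{\mathrm{int}} and \equiv_{\mathrm{loc}} are illustrative labels,
% not notation taken from the results themselves.
\begin{align*}
  \mathrm{Q}_{TL} + T
    &\equiv_{\mathrm{int}} \mathrm{Q} + \mathrm{Con}(T)
    && \text{(Theorem 6.5)} \\
  \widehat{I\Sigma_n}_{T+L} + T
    &\equiv_{\mathrm{int}} I\Sigma_n + \mathrm{Con}(T), \quad n \geq 1
    && \text{(Corollary 6.14)} \\
  \mathrm{PA}_{T+L} + T
    &\equiv_{\mathrm{loc}} \mathrm{PA} + \mathrm{Con}(T)
    && \text{(Corollary 6.16)}
\end{align*}
```

Read this way, each line has the same shape: a syntax U, a compositional truth-theory built on it, and the object-theory T on the left; U plus the consistency statement for T on the right.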
I do not know whether Î∆0 T + L + T is mutually interpretable with I∆0 + Con(T). But even without an answer to that question, or even if the answer turns out to be negative, the fact that these theories are mutually interpretable in so many cases gives us reason to suppose that the connection between truth-theories and consistency-statements we have been exploring is robust. More precisely, what it shows is that a compositional truth-theory amounts to a kind of abstract consistency statement: If you build a truth-theory for L on top of an appropriate syntax S, and then hand it a finitely axiomatized theory T in the language it concerns, it hands you back S plus the consistency statement for T. This is despite the fact that, as we have seen, there is another sense in which a compositional truth-theory adds nothing at all to the underlying syntax: ÎΣ1 T + L is interpretable in IΣ1, by the same argument as given for Lemma 6.2.64 What that shows is that one needs to be very careful about how one measures the strength of truth-theories.

64 Indeed, it would appear that, so long as T ⊇ Q, T T + L is going to be interpretable in T.

This observation is relevant to another issue that comes up in the literature on deflationism. Facts about what happens when one adds various semantic assumptions to PA play a critical role in the discussion of the so-called conservativeness argument, championed by Stewart Shapiro (1998) and Jeffrey Ketland (1999). The argument emerges from the thought that a deflationary truth-predicate, being in some sense 'insubstantial', ought not to allow us to prove anything we cannot prove without it. That is: UT+ ought to be a conservative extension of U. But, of course, PAT+ proves Con(PA) and so isn't a conservative extension of PA.

In his response to this argument, Hartry Field places very heavy weight upon the fact that the non-conservativity result depends essentially upon the presence of the new induction axioms and is not due simply to the presence of a compositional truth-theory.65 In particular, if we add a compositional truth-theory to PA without extending induction, then the result is a conservative extension of PA (Parsons, 1981, pp. 213–15). And so Field writes:

Since truth can be added in ways that produce a conservative extension ..., there is no need to disagree with Shapiro when he says that "conservativeness is essential to deflationism". ... Shapiro's position, however, is that a deflationist must hold that adding 'true' to number theory in the full-blooded way that involves [extending the induction axioms also] produces a conservative extension. (1999, p. 536)

Field goes on to argue that a deflationist need hold no such thing. At most, the deflationist should hold that the principles that are "essential to truth" (that flow from its disquotational nature) are conservative over number theory. But, Field claims, the induction principles are not "essential to truth" in the relevant sense: Their truth flows not from the nature of truth, but from the nature of the natural numbers. They are not semantical but arithmetical in character.

I more or less agree with this last point, though what Field ought to have said is that the induction axioms are syntactic in character, not arithmetical. What our discussion here shows, however, is that Field's emphasis on induction is misplaced. In what seems to me the crucial passage, Field quotes Shapiro's (1998, p. 499) question: "How thin can the notion of arithmetical truth be if, by invoking it, we can learn more about the natural numbers?" and then replies:

... [T]he way in which we "learn more about the natural numbers by invoking truth" is that in having that notion we can rigorously formulate a more powerful arithmetical theory than we could rigorously formulate before.
There is nothing very special about truth here: using any other notion not expressible in the original language we can get new instances of induction, and in many cases these lead to nonconservative extensions. (Field, 1999, p. 536)

65 See also Halbach's (2001b) treatment of the argument.

There are two respects in which this is at best misleading. What does Field mean by "using [a] notion not expressible in the original language"? The natural way to read him would be as talking about definability, about what happens if we add a new predicate that allows us to define a set not definable in the original language. In that case, Field would be saying this:

If we add a new predicate that defines a set not definable in the original language, we can get new instances of induction, which may lead to new theorems in the original language.

That is of course right. We can get new instances that may lead to new theorems. But the case of the truth-predicate is precisely not one of those cases. Tarski's Theorem tells us that the set of truths of the language of PA is not definable in the language of arithmetic. This has nothing to do with whether we add a fully compositional truth-theory, as opposed to adding just the T-sentences. Either way, we will be able to define a set we could not previously define: It will be defined by T(x).

Base: PA                                  No New Induction                   Extend Induction
Add the T-sentences                       Conservative; Interpretable        PAD+: Conservative; Interpretable
Add a fully compositional truth-theory    PAT: Conservative; Interpretable   PAT+: Non-conservative; Not Interpretable

TABLE 1. Some Mathematical Facts

Let me say that again. If we add a truth-predicate T(x) to the language of PA and extend PA by adding the T-sentences, then that is enough to guarantee that T(x) defines the set of truths of the language of PA and so defines a set not definable in the original language. But even if we extend induction, the result is still a conservative extension of PA.
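The contrast just drawn can be put schematically. The display below is a sketch that only restates claims made above; the label for the language of PA is introduced here for illustration, and the second line adds the standard observation that, given the consistency of PA, a conservative extension of PA cannot prove Con(PA).

```latex
% Schematic restatement (requires amsmath and amssymb).
% PA^{D+}: PA plus the T-sentences, induction extended to formulae containing T(x).
% PA^{T+}: PA plus a compositional truth-theory, induction extended likewise.
% \mathcal{L}_{PA} is an illustrative label for the language of PA.
\begin{gather*}
  T(\ulcorner A \urcorner) \leftrightarrow A
    \qquad \text{(one T-sentence for each sentence $A$ of $\mathcal{L}_{PA}$)} \\
  \mathrm{PA}^{D+} \text{ is conservative over } \mathrm{PA},
    \quad \text{so } \mathrm{PA}^{D+} \nvdash \mathrm{Con}(\mathrm{PA})
    \text{ (given that PA is consistent)} \\
  \mathrm{PA}^{T+} \vdash \mathrm{Con}(\mathrm{PA})
\end{gather*}
```

In both theories T(x) defines a set not definable in the original language, and in both induction has been extended; only the second contains the compositional clauses.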
The moral, then, is supposed to be this: The non-conservativity result is not due just to the presence of "new instances of induction" formulated using a "notion not expressible in the original language". The presence of a fully compositional truth-theory is essential. Indeed, what I should like to say is that what is most responsible for the non-conservativity result is the compositional truth-theory, not the extension of induction.

When truth-theories are considered simply as additions to PA, it is essentially impossible to disentangle the contributions being made by the truth-theory, on the one hand, and the extension of induction, on the other. The mathematical facts are summarized in table 1. So long as we do not both add a compositional truth-theory and extend the induction scheme, the resulting theory is conservative over PA and interpretable in PA. How are we supposed to choose whom to blame, then? Does it even make sense to blame one rather than the other?

Base: U + T                               No New Induction                   Extend Induction
Add the T-sentences                       Locally Interpretable              Û D+A + T: Locally Interpretable
Add a fully compositional truth-theory    Û TL + T: Not interpretable        Û T + L + T: Not interpretable, and stronger still

TABLE 2. Some Other Mathematical Facts (U = IΣn, PA)

We have seen, however, that PA is a special case. We have also seen that the usual way of formalizing truth-theories can lead to peculiar phenomena.
And if we now look again, with these lessons in mind (if we focus not on conservativity but on interpretability;66 if we make sure the object-theory is finitely axiomatized; and if we disentangle the syntax from the object-theory), then the facts, summarized in table 2, look very different.67 What we see is that adding the compositional principles results in an increase in logical strength whether or not we extend the induction axioms.68 Extending the induction axioms, on the other hand, results in an increase in strength only in the presence of a fully compositional truth-theory. That suggests, to me, anyway, that it is the compositional truth-theory that is doing the work here.

By themselves, of course, these observations do not pose a serious problem for anyone, deflationists included. They do, however, make it clear that, as a matter of mathematical fact, the compositional principles have substantial logical strength. That makes it worth asking how deflationists intend to earn a right to them: The various principles comprising a compositional truth-theory cannot be regarded as a collection of trivialities on the order of the T-sentences.69 But that issue, I shall have to discuss on another occasion.70

66 Do the conservativity results still hold when we consider weaker theories? Is IΣ1T a conservative extension of IΣ1? The proof of this, in the case of PA, is far from trivial, and I have no idea whether it works for weaker theories.

67 I'm assuming here, of course, that T is not a theory whose consistency is independently provable in IΣ1, e.g., Q: The methods used above show that ÎΣ1 T + L + Q is interpretable in IΣ1, since it is interpretable in IΣ1 + Con(Q), and IΣ1 proves Con(Q).

68 Since these results hold even with U = PA, we do not need to restrict the syntax in any way to get these sorts of results. What matters is that the object-theory should be finitely axiomatized.
But that is simply because we can't hope to prove that all the theory's axioms are true otherwise.

69 In my view (Heck, 2004), the T-sentences themselves are not trivialities, either, but this is a separate issue.

70 Thanks to Volker Halbach and Jeff Ketland for conversations early in the history of this paper, and to Josh Schechter for conversations later on, that helped greatly. Comments on a draft of the paper from Cezary Cieśliński and Ali Enayat did much to improve it. A talk based upon the paper was given at a conference on philosophical logic, organized by Delia Graff Fara and held at Princeton University in April 2009. The paper was also discussed at a meeting of the New England Logic and Language Colloquium in May 2011, and it was presented at the Philosophy of Mathematics Seminar at Oxford University, also in May 2011, and at a meeting of the Logic Group at the University of Connecticut in April 2012. Thanks to everyone present for their questions and comments, especially J. C. Beall, John P. Burgess, Hartry Field, Volker Halbach, Daniel Isaacson, Charles Parsons, Agustín Rayo, and Lionel Shapiro. Special thanks to my commentator at Princeton, Josh Dever, whose comments did a lot to improve the presentation.

I owe the greatest debt, however, to Albert Visser. This paper simply could not have been written but for his generous assistance. Just as my ideas were starting to come together, Albert helped me to disentangle different threads in what I was trying to do. Once we'd managed that, he then made a series of observations based upon work he was doing at the time (Visser, 2009a,b) that transformed the direction of my project: These appear here as Theorem 5.3 and Corollary 6.4. And, finally, Albert has generously, and patiently, answered question after question after question as I've struggled to make sure all the details were right. So thanks, Albert! I've learned a ton.

Given Albert's extensive influence on this paper, it is perhaps worth my saying a word about where I take my own contribution to lie. A few of the key results are mine: Corollary 6.13 and Corollary 6.18, for example; so is the proof of Theorem 5.3, in particular, the way it resolves the problems posed by the logical axioms, which Albert and I independently discovered had been neglected in the existing literature. But I take my main contribution to lie in the general approach taken here: The idea that we should investigate truth-theories over weak base theories; the realization that, if we proceed in the usual way, then this investigation doesn't go as one might have hoped; the consequent suggestion that we ought to disentangle syntax from the object-theory; and the realization that this investigation goes a good deal better, indeed, almost as well as one could have hoped.

REFERENCES

Boolos, G. (1993). The Logic of Provability. New York, Cambridge University Press.

Burgess, J. P. (2005). Fixing Frege. Princeton NJ, Princeton University Press.

Cieśliński, C. (2009). 'Truth, conservativeness, and provability'. Forthcoming in Mind.

Craig, W. and Vaught, R. L. (1958). 'Finite axiomatizability using additional predicates', Journal of Symbolic Logic 23: 289–308.

DeVidi, D. and Solomon, G. (1999). 'Tarski on 'essentially richer' metalanguages', Journal of Philosophical Logic 28: 1–28.

Feferman, S. (1960). 'Arithmetization of metamathematics in a general setting', Fundamenta Mathematicae 49: 35–92.

Field, H. (1999). 'Deflating the conservativeness requirement', Journal of Philosophy 96: 533–40.

Grzegorczyk, A. (2005). 'Undecidability without arithmetization', Studia Logica 79: 163–230.

Hájek, P. and Pudlák, P. (1993). Metamathematics of First-order Arithmetic. New York, Springer-Verlag.

Halbach, V. (1999). 'Disquotationalism and infinite conjunction', Mind 108: 1–22.

Halbach, V. (2001a). 'Disquotational truth and analyticity', Journal of Symbolic Logic 66: 1959–1973.

Halbach, V. (2001b). 'How innocent is deflationism?', Synthese 126: 167–94.

Heck, R. G. (2004). 'Truth and disquotation', Synthese 142: 317–52.

Heck, R. G. (2005). 'Reason and language', in C. MacDonald and G. MacDonald (eds.), McDowell and His Critics. Oxford, Blackwells, 22–45.

Heck, R. G. (2007). 'Meaning and truth-conditions', in D. Greimann and G. Siegwart (eds.), Truth and Speech Acts: Studies in the Philosophy of Language. New York, Routledge, 349–76.

Ketland, J. (1999). 'Deflationism and Tarski's paradise', Mind 108: 69–94.

Kleene, S. (1952). 'Finite axiomatizability of theories in the predicate calculus using additional predicate symbols', Memoirs of the American Mathematical Society 10: 27–68.

Kotlarski, H. (1986). 'Bounded induction and satisfaction classes', Zeitschrift für Mathematische Logik 32: 531–544.

Mostowski, A. (1952). 'On models of axiomatic systems', Fundamenta Mathematicae 39: 133–58.

Nelson, E. (1986). Predicative Arithmetic. Mathematical Notes 32. Princeton NJ, Princeton University Press.

Parsons, C. (1981). 'Sets and classes', in Mathematics in Philosophy. Ithaca NY, Cornell University Press, 209–20.

Pudlák, P. (1985). 'Cuts, consistency statements and interpretations', Journal of Symbolic Logic 50: 423–41.

Quine, W. V. O. (1946). 'Concatenation as a basis for arithmetic', Journal of Symbolic Logic 11: 105–14.

Ray, G. (2005). 'On the matter of essential richness', Journal of Philosophical Logic 34: 433–57.

Shapiro, S. (1998). 'Proof and truth: Through thick and thin', Journal of Philosophy 95: 493–521.

Tarski, A. (1944). 'The semantic conception of truth and the foundations of semantics', Philosophy and Phenomenological Research 4: 341–75.

Tarski, A. (1958). 'The concept of truth in formalized languages', in J. Corcoran (ed.), Logic, Semantics, and Metamathematics. Indianapolis, Hackett, 152–278.

Tarski, A., Mostowski, A., and Robinson, A. (1953). Undecidable Theories. Amsterdam, North-Holland Publishing.

Visser, A. (1991). 'The formalization of interpretability', Studia Logica 50: 81–105.

Visser, A. (2006). 'Categories of theories and interpretations'. Wellesley MA, A. K. Peters, 77–136.

Visser, A. (2008). 'Pairs, sets and sequences in first-order theories', Archive for Mathematical Logic 47: 299–326.

Visser, A. (2009a). 'Can we make the second incompleteness theorem coordinate free?', Journal of Logic and Computation 21: 543–60.

Visser, A. (2009b). 'The predicative Frege hierarchy', Annals of Pure and Applied Logic 160: 129–53.

Wang, H. (1952). 'Truth definitions and consistency proofs', Transactions of the American Mathematical Society 73: 243–275.

Wilkie, A. J. and Paris, J. B. (1987). 'On the scheme of induction for bounded arithmetic formulas', Annals of Pure and Applied Logic 35: 261–302.

DEPARTMENT OF PHILOSOPHY, BROWN UNIVERSITY, PROVIDENCE RI 02912