Incompleteness and Computability
An Open Introduction to Gödel's Theorems
F19

The Open Logic Project

Instigator
Richard Zach, University of Calgary

Editorial Board
Aldo Antonelli,† University of California, Davis
Andrew Arana, Université Paris I Panthéon–Sorbonne
Jeremy Avigad, Carnegie Mellon University
Tim Button, University College London
Walter Dean, University of Warwick
Gillian Russell, University of North Carolina
Nicole Wyatt, University of Calgary
Audrey Yap, University of Victoria

Contributors
Samara Burns, University of Calgary
Dana Hägg, University of Calgary
Zesen Qian, Carnegie Mellon University

Remixed by Richard Zach
Fall 2019

The Open Logic Project would like to acknowledge the generous support of the Taylor Institute of Teaching and Learning of the University of Calgary, and the Alberta Open Educational Resources (ABOER) Initiative, which is made possible through an investment from the Alberta government.

Cover illustrations by Matthew Leadbeater, used under a Creative Commons Attribution-NonCommercial 4.0 International License.

Typeset in Baskervald X and Nimbus Sans by LaTeX.

This version of Incompleteness and Computability is revision fb07d66 (2019-11-11), with content generated from Open Logic Text revision 1cdcec1 (2019-11-09). Free download at: https://ic.openlogicproject.org/

Incompleteness and Computability by Richard Zach is licensed under a Creative Commons Attribution 4.0 International License. It is based on The Open Logic Text by the Open Logic Project, used under a Creative Commons Attribution 4.0 International License.

Contents

About this Book

1 Introduction to Incompleteness
  1.1 Historical Background
  1.2 Definitions
  1.3 Overview of Incompleteness Results
  1.4 Undecidability and Incompleteness
  Summary
  Problems

2 Recursive Functions
  2.1 Introduction
  2.2 Primitive Recursion
  2.3 Composition
  2.4 Primitive Recursion Functions
  2.5 Primitive Recursion Notations
  2.6 Primitive Recursive Functions are Computable
  2.7 Examples of Primitive Recursive Functions
  2.8 Primitive Recursive Relations
  2.9 Bounded Minimization
  2.10 Primes
  2.11 Sequences
  2.12 Trees
  2.13 Other Recursions
  2.14 Non-Primitive Recursive Functions
  2.15 Partial Recursive Functions
  2.16 The Normal Form Theorem
  2.17 The Halting Problem
  2.18 General Recursive Functions
  Summary
  Problems

3 Arithmetization of Syntax
  3.1 Introduction
  3.2 Coding Symbols
  3.3 Coding Terms
  3.4 Coding Formulas
  3.5 Substitution
  3.6 Derivations in Natural Deduction
  Summary
  Problems

4 Representability in Q
  4.1 Introduction
  4.2 Functions Representable in Q are Computable
  4.3 The Beta Function Lemma
  4.4 Simulating Primitive Recursion
  4.5 Basic Functions are Representable in Q
  4.6 Composition is Representable in Q
  4.7 Regular Minimization is Representable in Q
  4.8 Computable Functions are Representable in Q
  4.9 Representing Relations
  4.10 Undecidability
  Summary
  Problems

5 Incompleteness and Provability
  5.1 Introduction
  5.2 The Fixed-Point Lemma
  5.3 The First Incompleteness Theorem
  5.4 Rosser's Theorem
  5.5 Comparison with Gödel's Original Paper
  5.6 The Derivability Conditions for PA
  5.7 The Second Incompleteness Theorem
  5.8 Löb's Theorem
  5.9 The Undefinability of Truth
  Summary
  Problems

6 Models of Arithmetic
  6.1 Introduction
  6.2 Reducts and Expansions
  6.3 Isomorphic Structures
  6.4 The Theory of a Structure
  6.5 Standard Models of Arithmetic
  6.6 Non-Standard Models
  6.7 Models of Q
  6.8 Models of PA
  6.9 Computable Models of Arithmetic
  Summary
  Problems

7 Second-Order Logic
  7.1 Introduction
  7.2 Terms and Formulas
  7.3 Satisfaction
  7.4 Semantic Notions
  7.5 Expressive Power
  7.6 Describing Infinite and Countable Domains
  7.7 Second-order Arithmetic
  7.8 Second-order Logic is not Axiomatizable
  7.9 Second-order Logic is not Compact
  7.10 The Löwenheim-Skolem Theorem Fails for Second-order Logic
  7.11 Comparing Sets
  7.12 Cardinalities of Sets
  7.13 The Power of the Continuum
  Summary
  Problems

8 The Lambda Calculus
  8.1 Overview
  8.2 The Syntax of the Lambda Calculus
  8.3 Reduction of Lambda Terms
  8.4 The Church-Rosser Property
  8.5 Currying
  8.6 Lambda Definability
  8.7 λ-Definable Arithmetical Functions
  8.8 Pairs and Predecessor
  8.9 Truth Values and Relations
  8.10 Primitive Recursive Functions are λ-Definable
  8.11 Fixpoints
  8.12 Minimization
  8.13 Partial Recursive Functions are λ-Definable
  8.14 λ-Definable Functions are Recursive
  Problems

A Derivations in Arithmetic Theories

B First-order Logic
  B.1 First-Order Languages
  B.2 Terms and Formulas
  B.3 Free Variables and Sentences
  B.4 Substitution
  B.5 Structures for First-order Languages
  B.6 Satisfaction of a Formula in a Structure
  B.7 Variable Assignments
  B.8 Extensionality
  B.9 Semantic Notions
  B.10 Theories
  Summary
  Problems

C Natural Deduction
  C.1 Natural Deduction
  C.2 Rules and Derivations
  C.3 Propositional Rules
  C.4 Quantifier Rules
  C.5 Derivations
  C.6 Examples of Derivations
  C.7 Derivations with Quantifiers
  C.8 Derivations with Identity predicate
  C.9 Proof-Theoretic Notions
  Summary
  Problems

D Biographies
  D.1 Alonzo Church
  D.2 Kurt Gödel
  D.3 Rózsa Péter
  D.4 Julia Robinson
  D.5 Alfred Tarski

Photo Credits
Bibliography
About the Open Logic Project

About this Book

This is a textbook on Gödel's incompleteness theorems and recursive function theory. I use it as the main text when I teach Philosophy 479 (Logic III) at the University of Calgary. It is based on material from the Open Logic Project.

As its name suggests, the course is the third in a sequence, so students (and hence readers of this book) are expected to be familiar with first-order logic already. (Logic I uses the text forall x: Calgary, and Logic II another textbook based on the OLP, Sets, Logic, Computation.) The material assumed from Logic II, however, is included as appendices B and C.

Logic III is a thirteen-week course, meeting three hours per week. This is typically enough to cover the material in chapters 1 to 5 and either chapter 6 or chapter 8, depending on student interest. You may want to spend more time on the basics of first-order logic and especially on natural deduction, if students are not already familiar with it.

Note that when provability in arithmetical theories (such as Q and PA) is discussed in the main text, the proofs of provability claims are not given using a specific proof system. Rather, that certain claims follow from the axioms by first-order logic is justified intuitively. However, appendix A contains a number of examples of actual natural deduction derivations from the axioms of Q.

Acknowledgments

The material in the OLP used in chapters 1 to 5 and 8 was based originally on Jeremy Avigad's lecture notes on "Computability and Incompleteness," which he contributed to the OLP. I have heavily revised and expanded this material. The lecture notes, e.g., based theories of arithmetic on an axiomatic proof system.
Here, we use Gentzen's standard natural deduction system (described in appendix C), which requires dealing with trees primitive recursively (in section 2.12) and a more complicated approach to the arithmetization of derivations (in section 3.6). The material in chapter 8 was also expanded by Zesen Qian during his stay in Calgary as a Mitacs summer intern. The material in the OLP on model theory and models of arithmetic in chapter 6 was originally taken from Aldo Antonelli's lecture notes on "The Completeness of Classical Propositional and Predicate Logic," which he contributed to the OLP before his untimely death in 2015. The biographies of logicians in appendix D and much of the material in appendix C are originally due to Samara Burns. Dana Hägg originally worked on the material in appendix B.

CHAPTER 1
Introduction to Incompleteness

1.1 Historical Background

In this section, we will briefly discuss historical developments that will help put the incompleteness theorems in context. In particular, we will give a very sketchy overview of the history of mathematical logic, and then say a few words about the history of the foundations of mathematics.

The phrase "mathematical logic" is ambiguous. One can interpret the word "mathematical" as describing the subject matter, as in, "the logic of mathematics," denoting the principles of mathematical reasoning; or as describing the methods, as in "the mathematics of logic," denoting a mathematical study of the principles of reasoning. The account that follows involves mathematical logic in both senses, often at the same time.

The study of logic began, essentially, with Aristotle, who lived approximately 384–322 BCE. His Categories, Prior Analytics, and Posterior Analytics include systematic studies of the principles of scientific reasoning, including a thorough and systematic study of the syllogism.

Aristotle's logic dominated scholastic philosophy through the Middle Ages; indeed, as late as the eighteenth century, Kant maintained that Aristotle's logic was perfect and in no need of revision. But the theory of the syllogism is far too limited to model anything but the most superficial aspects of mathematical reasoning. A century earlier, Leibniz, a contemporary of Newton's, imagined a complete "calculus" for logical reasoning, and made some rudimentary steps towards designing such a calculus, essentially describing a version of propositional logic.

The nineteenth century was a watershed for logic. In 1854 George Boole wrote The Laws of Thought, with a thorough algebraic study of propositional logic that is not far from modern presentations.
In 1879 Gottlob Frege published his Begriffsschrift (Concept writing) which extends propositional logic with quantifiers and relations, and thus includes first-order logic. In fact, Frege's logical systems included higher-order logic as well, and more. In his Basic Laws of Arithmetic, Frege set out to show that all of arithmetic could be derived in his Begriffsschrift from purely logical assumptions. Unfortunately, these assumptions turned out to be inconsistent, as Russell showed in 1902. But setting aside the inconsistent axiom, Frege more or less invented modern logic singlehandedly, a startling achievement. Quantificational logic was also developed independently by algebraically-minded thinkers after Boole, including Peirce and Schröder.

Let us now turn to developments in the foundations of mathematics. Of course, since logic plays an important role in mathematics, there is a good deal of interaction with the developments just described. For example, Frege developed his logic with the explicit purpose of showing that all of mathematics could be based solely on his logical framework; in particular, he wished to show that mathematics consists of a priori analytic truths instead of, as Kant had maintained, a priori synthetic ones.

Many take the birth of mathematics proper to have occurred with the Greeks. Euclid's Elements, written around 300 BCE, is already a mature representative of Greek mathematics, with its emphasis on rigor and precision. The definitions and proofs in Euclid's Elements survive more or less intact in high school geometry textbooks today (to the extent that geometry is still taught in high schools). This model of mathematical reasoning has been held to be a paradigm for rigorous argumentation not only in mathematics but in branches of philosophy as well. (Spinoza even presented moral and religious arguments in the Euclidean style, which is strange to see!)
Calculus was invented by Newton and Leibniz in the seventeenth century. (A fierce priority dispute raged for centuries, but most scholars today hold that the two developments were for the most part independent.) Calculus involves reasoning about, for example, infinite sums of infinitely small quantities; these features fueled criticism by Bishop Berkeley, who argued that belief in God was no less rational than the mathematics of his time. The methods of calculus were widely used in the eighteenth century, for example by Leonhard Euler, who used calculations involving infinite sums with dramatic results.

In the nineteenth century, mathematicians tried to address Berkeley's criticisms by putting calculus on a firmer foundation. Efforts by Cauchy, Weierstrass, Bolzano, and others led to our contemporary definitions of limits, continuity, differentiation, and integration in terms of "epsilons and deltas," in other words, devoid of any reference to infinitesimals. Later in the century, mathematicians tried to push further, and explain all aspects of calculus, including the real numbers themselves, in terms of the natural numbers. (Kronecker: "God created the whole numbers, all else is the work of man.") In 1872, Dedekind wrote "Continuity and the irrational numbers," where he showed how to "construct" the real numbers as sets of rational numbers (which, as you know, can be viewed as pairs of natural numbers); in 1888 he wrote "Was sind und was sollen die Zahlen" (roughly, "What are the natural numbers, and what should they be?") which aimed to explain the natural numbers in purely "logical" terms. In 1887 Kronecker wrote "Über den Zahlbegriff" ("On the concept of number") where he spoke of representing all mathematical objects in terms of the integers; in 1889 Giuseppe Peano gave formal, symbolic axioms for the natural numbers.

The end of the nineteenth century also brought a new boldness in dealing with the infinite. Before then, infinitary objects and structures (like the set of natural numbers) were treated gingerly; "infinitely many" was understood as "as many as you want," and "approaches in the limit" was understood as "gets as close as you want." But Georg Cantor showed that it was possible to take the infinite at face value. Work by Cantor, Dedekind, and others helped to introduce the general set-theoretic understanding of mathematics that is now widely accepted.

This brings us to twentieth century developments in logic and foundations. In 1902 Russell discovered the paradox in Frege's logical system. In 1904 Zermelo proved Cantor's well-ordering principle, using the so-called "axiom of choice"; the legitimacy of this axiom prompted a good deal of debate. Between 1910 and 1913 the three volumes of Russell and Whitehead's Principia Mathematica appeared, extending the Fregean program of establishing mathematics on logical grounds. Unfortunately, Russell and Whitehead were forced to adopt two principles that seemed hard to justify as purely logical: an axiom of infinity and an axiom of "reducibility." In the 1900s Poincaré criticized the use of "impredicative definitions" in mathematics, and in the 1910s Brouwer began proposing to refound all of mathematics on an "intuitionistic" basis, which avoided the use of the law of the excluded middle (A ∨ ¬A).

Strange days indeed! The program of reducing all of mathematics to logic is now referred to as "logicism," and is commonly viewed as having failed, due to the difficulties mentioned above. The program of developing mathematics in terms of intuitionistic mental constructions is called "intuitionism," and is viewed as posing overly severe restrictions on everyday mathematics.
Around the turn of the century, David Hilbert, one of the most influential mathematicians of all time, was a strong supporter of the new, abstract methods introduced by Cantor and Dedekind: "no one will drive us from the paradise that Cantor has created for us." At the same time, he was sensitive to foundational criticisms of these new methods (oddly enough, now called "classical"). He proposed a way of having one's cake and eating it too:

1. Represent classical methods with formal axioms and rules; represent mathematical questions as formulas in an axiomatic system.

2. Use safe, "finitary" methods to prove that these formal deductive systems are consistent.

Hilbert's work went a long way toward accomplishing the first goal. In 1899, he had done this for geometry in his celebrated book Foundations of Geometry. In subsequent years, he and a number of his students and collaborators worked on other areas of mathematics to do what Hilbert had done for geometry. Hilbert himself gave axiom systems for arithmetic and analysis. Zermelo gave an axiomatization of set theory, which was expanded on by Fraenkel, Skolem, von Neumann, and others. By the mid-1920s, there were two approaches that laid claim to the title of an axiomatization of "all" of mathematics, the Principia Mathematica of Russell and Whitehead, and what came to be known as Zermelo-Fraenkel set theory.

In 1921, Hilbert set out on a research project whose goal was to prove these systems to be consistent. He was aided in this project by several of his students, in particular Bernays, Ackermann, and later Gentzen.
The basic idea for accomplishing this goal was to cast the question of the possibility of a derivation of an inconsistency in mathematics as a combinatorial problem about possible sequences of symbols, namely possible sequences of sentences which meet the criterion of being a correct derivation of, say, A ∧ ¬A from the axioms of an axiom system for arithmetic, analysis, or set theory. A proof of the impossibility of such a sequence of symbols would, since it is itself a mathematical proof, be formalizable in these axiomatic systems. In other words, there would be some sentence Con which states that, say, arithmetic is consistent. Moreover, this sentence should be provable in the systems in question, especially if its proof requires only very restricted, "finitary" means.

The second aim, that the axiom systems developed would settle every mathematical question, can be made precise in two ways. In one way, we can formulate it as follows: For any sentence A in the language of an axiom system for mathematics, either A or ¬A is provable from the axioms. If this were true, then there would be no sentences which can neither be proved nor refuted on the basis of the axioms, no questions which the axioms do not settle. An axiom system with this property is called complete. Of course, for any given sentence it might still be a difficult task to determine which of the two alternatives holds. But in principle there should be a method to do so. In fact, for the axiom and derivation systems considered by Hilbert, completeness would imply that such a method exists, although Hilbert did not realize this. The second way to interpret the question would be this stronger requirement: that there be a mechanical, computational method which would determine, for a given sentence A, whether it is derivable from the axioms or not.

In 1931, Gödel proved the two "incompleteness theorems," which showed that this program could not succeed.
There is no axiom system for mathematics which is complete; specifically, the sentence that expresses the consistency of the axioms is a sentence which can neither be proved nor refuted.

This struck a lethal blow to Hilbert's original program. However, as is so often the case in mathematics, it also opened up exciting new avenues for research. If there is no one, all-encompassing formal system of mathematics, it makes sense to develop more circumscribed systems and investigate what can be proved in them. It also makes sense to develop less restricted methods of proof for establishing the consistency of these systems, and to find ways to measure how hard it is to prove their consistency. Since Gödel showed that (almost) every formal system has questions it cannot settle, it makes sense to look for "interesting" questions a given formal system cannot settle, and to figure out how strong a formal system has to be to settle them. To the present day, logicians have been pursuing these questions in a new mathematical discipline, the theory of proofs.

1.2 Definitions

In order to carry out Hilbert's project of formalizing mathematics and showing that such a formalization is consistent and complete, the first order of business would be that of picking a language, logical framework, and a system of axioms. For our purposes, let us suppose that mathematics can be formalized in a first-order language, i.e., that there is some set of constant symbols, function symbols, and predicate symbols which, together with the connectives and quantifiers of first-order logic, allow us to express the claims of mathematics. Most people agree that such a language exists: the language of set theory, in which ∈ is the only non-logical symbol. That such a simple language is so expressive is of course a very implausible claim at first sight, and it took a lot of work to establish that practically all of mathematics can be expressed in this very austere vocabulary.
To keep things simple, for now, let's restrict our discussion to arithmetic, that is, the part of mathematics that just deals with the natural numbers ℕ. The natural language in which to express facts of arithmetic is LA. LA contains a single two-place predicate symbol <, a single constant symbol 0, one one-place function symbol ′, and two two-place function symbols + and ×.

Definition 1.1. A set of sentences Γ is a theory if it is closed under entailment, i.e., if Γ = {A : Γ ⊨ A}.

There are two easy ways to specify theories. One is as the set of sentences true in some structure. For instance, consider the structure for LA in which the domain is ℕ and all non-logical symbols are interpreted as you would expect.

Definition 1.2. The standard model of arithmetic is the structure N defined as follows:

1. |N| = ℕ
2. 0^N = 0
3. ′^N(n) = n + 1 for all n ∈ ℕ
4. +^N(n, m) = n + m for all n, m ∈ ℕ
5. ×^N(n, m) = n * m for all n, m ∈ ℕ
6. <^N = {⟨n, m⟩ : n ∈ ℕ, m ∈ ℕ, n < m}

Note the difference between × and *: × is a symbol in the language of arithmetic. Of course, we've chosen it to remind us of multiplication, but × is not the multiplication operation but a two-place function symbol (officially, f^2_1). By contrast, * is the ordinary multiplication function. When you see something like n * m, we mean the product of the numbers n and m; when you see something like x × y we are talking about a term in the language of arithmetic. In the standard model, the function symbol × is interpreted as the function * on the natural numbers. For addition, we use + as both the function symbol of the language of arithmetic, and the addition function on the natural numbers. Here you have to use the context to determine what is meant.

Definition 1.3. The theory of true arithmetic is the set of sentences satisfied in the standard model of arithmetic, i.e., TA = {A : N ⊨ A}.

TA is a theory, for whenever TA ⊨ A, A is satisfied in every structure which satisfies TA.
Since N ⊨ TA, N ⊨ A, and so A ∈ TA.

The other way to specify a theory Γ is as the set of sentences entailed by some set of sentences Γ0. In that case, Γ is the "closure" of Γ0 under entailment. Specifying a theory this way is only interesting if Γ0 is explicitly specified, e.g., if the elements of Γ0 are listed. At the very least, Γ0 has to be decidable, i.e., there has to be a computable test for when a sentence counts as an element of Γ0 or not. We call the sentences in Γ0 axioms for Γ, and say that Γ is axiomatized by Γ0.

Definition 1.4. A theory Γ is axiomatized by Γ0 iff Γ = {A : Γ0 ⊨ A}.

Definition 1.5. The theory Q axiomatized by the following sentences is known as "Robinson's Q" and is a very simple theory of arithmetic.

∀x ∀y (x′ = y′ → x = y)   (Q1)
∀x 0 ≠ x′   (Q2)
∀x (x = 0 ∨ ∃y x = y′)   (Q3)
∀x (x + 0) = x   (Q4)
∀x ∀y (x + y′) = (x + y)′   (Q5)
∀x (x × 0) = 0   (Q6)
∀x ∀y (x × y′) = ((x × y) + x)   (Q7)
∀x ∀y (x < y ↔ ∃z (z′ + x) = y)   (Q8)

The sentences Q1, . . . , Q8 are the axioms of Q, so Q consists of all sentences entailed by them: Q = {A : {Q1, . . . , Q8} ⊨ A}.

Definition 1.6. Suppose A(x) is a formula in LA with free variables x and y1, . . . , yn. Then any sentence of the form

∀y1 . . . ∀yn ((A(0) ∧ ∀x (A(x) → A(x′))) → ∀x A(x))

is an instance of the induction schema. Peano arithmetic PA is the theory axiomatized by the axioms of Q together with all instances of the induction schema.

Every instance of the induction schema is true in N. This is easiest to see if the formula A only has one free variable x. Then A(x) defines a subset XA of ℕ in N. XA is the set of all n ∈ ℕ such that N, s ⊨ A(x) when s(x) = n. The corresponding instance of the induction schema is

((A(0) ∧ ∀x (A(x) → A(x′))) → ∀x A(x)).

If its antecedent is true in N, then 0 ∈ XA and, whenever n ∈ XA, so is n + 1. Since 0 ∈ XA, we get 1 ∈ XA. With 1 ∈ XA we get 2 ∈ XA. And so on. So for every n ∈ ℕ, n ∈ XA.
But this means that ∀x A(x) is satisfied in N.

Both Q and PA are axiomatized theories. The big question is, how strong are they? For instance, can PA prove all the truths about N that can be expressed in LA? Specifically, do the axioms of PA settle all the questions that can be formulated in LA?

Another way to put this is to ask: Is PA = TA? TA obviously does prove (i.e., it includes) all the truths about N, and it settles all the questions that can be formulated in LA, since if A is a sentence in LA, then either N ⊨ A or N ⊨ ¬A, and so either TA ⊨ A or TA ⊨ ¬A. Call such a theory complete.

Definition 1.7. A theory Γ is complete iff for every sentence A in its language, either Γ ⊨ A or Γ ⊨ ¬A.

By the Completeness Theorem, Γ ⊨ A iff Γ ⊢ A, so Γ is complete iff for every sentence A in its language, either Γ ⊢ A or Γ ⊢ ¬A.

Another question we are led to ask is this: Is there a computational procedure we can use to test if a sentence is in TA, in PA, or even just in Q? We can make this more precise by defining when a set (e.g., a set of sentences) is decidable.

Definition 1.8. A set X is decidable iff there is a computational procedure which on input x returns 1 if x ∈ X and 0 otherwise.

So our question becomes: Is TA (PA, Q) decidable? The answer to all these questions will be: no. None of these theories are decidable. However, this phenomenon is not specific to these particular theories. In fact, any theory that satisfies certain conditions is subject to the same results. One of these conditions, which Q and PA satisfy, is that they are axiomatized by a decidable set of axioms.

Definition 1.9. A theory is axiomatizable if it is axiomatized by a decidable set of axioms.

Example 1.10. Any theory axiomatized by a finite set of sentences is axiomatizable, since any finite set is decidable. Thus, Q, for instance, is axiomatizable. Schematically axiomatized theories like PA are also axiomatizable.
For to test if B is among the axioms of PA, i.e., to compute the function χX where χX(B) = 1 if B is an axiom of PA and χX(B) = 0 otherwise, we can do the following: First, check if B is one of the axioms of Q. If it is, the answer is "yes" and χX(B) = 1. If not, test if it is an instance of the induction schema. This can be done systematically; in this case, perhaps it's easiest to see that it can be done as follows: Any instance of the induction schema begins with a number of universal quantifiers, and then a sub-formula that is a conditional. The consequent of that conditional is ∀x A(x, y1, . . . , yn) where x and y1, . . . , yn are all the free variables of A and the initial quantifiers of B bind the variables y1, . . . , yn. Once we have extracted this A and checked that its free variables match the variables bound by the universal quantifiers at the front and ∀x, we go on to check that the antecedent of the conditional matches

A(0, y1, . . . , yn) ∧ ∀x (A(x, y1, . . . , yn) → A(x′, y1, . . . , yn))

Again, if it does, B is an instance of the induction schema, and if it doesn't, B isn't.

In answering this question (and the more general question of which theories are complete or decidable) it will be useful to consider also the following definition. Recall that a set X is countable iff it is empty or if there is a surjective function f : ℕ → X. Such a function is called an enumeration of X.

Definition 1.11. A set X is called computably enumerable (c.e. for short) iff it is empty or it has a computable enumeration.

In addition to axiomatizability, another condition on theories to which the incompleteness theorems apply will be that they are strong enough to prove basic facts about computable functions and decidable relations. By "basic facts," we mean sentences which express what the values of computable functions are for each of their arguments.
And by "strong enough" we mean that the theories in question count these sentences among their theorems. For instance, consider a prototypical computable function: addition. The value of + for arguments 2 and 3 is 5, i.e., 2 + 3 = 5. A sentence in the language of arithmetic that expresses that the value of + for arguments 2 and 3 is 5 is: (2 + 3) = 5. And, e.g., Q proves this sentence. More generally, we would like there to be, for each computable function f(x1, x2), a formula A_f(x1, x2, y) in LA such that Q ⊢ A_f(n1, n2, m) whenever f(n1, n2) = m. In this way, Q proves that the value of f for arguments n1, n2 is m. In fact, we require that it proves a bit more, namely that no other number is the value of f for arguments n1, n2. And the same goes for decidable relations. This is made precise in the following two definitions.

Definition 1.12. A formula A(x1, ..., xk, y) represents the function f : N^k → N in Γ iff whenever f(n1, ..., nk) = m, then

1. Γ ⊢ A(n1, ..., nk, m), and
2. Γ ⊢ ∀y (A(n1, ..., nk, y) → y = m).

Definition 1.13. A formula A(x1, ..., xk) represents the relation R ⊆ N^k in Γ iff,

1. whenever R(n1, ..., nk), Γ ⊢ A(n1, ..., nk), and
2. whenever not R(n1, ..., nk), Γ ⊢ ¬A(n1, ..., nk).

A theory is "strong enough" for the incompleteness theorems to apply if it represents all computable functions and all decidable relations. Q and its extensions satisfy this condition, but it will take us a while to establish this: it is a non-trivial fact about the kinds of things Q can prove, and it is hard to show because Q has only a few axioms from which we'll have to prove all these facts. However, Q is a very weak theory. So although it's hard to prove that Q represents all computable functions, most interesting theories are stronger than Q, i.e., prove more than Q does. And if Q proves something, any stronger theory does; since Q represents all computable functions, every stronger theory does.
This means that many interesting theories meet this condition of the incompleteness theorems. So our hard work will pay off, since it shows that the incompleteness theorems apply to a wide range of theories. Certainly, any theory aiming to formalize "all of mathematics" must prove everything that Q proves, since it should at the very least be able to capture the results of elementary computations. So any theory that is a candidate for a theory of "all of mathematics" will be one to which the incompleteness theorems apply.

1.3 Overview of Incompleteness Results

Hilbert expected that mathematics could be formalized in an axiomatizable theory which it would be possible to prove complete and decidable. Moreover, he aimed to prove the consistency of this theory with very weak, "finitary," means, which would defend classical mathematics against the challenges of intuitionism. Gödel's incompleteness theorems showed that these goals cannot be achieved.

Gödel's first incompleteness theorem showed that a version of Russell and Whitehead's Principia Mathematica is not complete. But the proof was actually very general and applies to a wide variety of theories. This means that it wasn't just that Principia Mathematica did not manage to completely capture mathematics, but that no acceptable theory does. It took a while to isolate the features of theories that suffice for the incompleteness theorems to apply, and to generalize Gödel's proof so that it depends only on these features. But we are now in a position to state a very general version of the first incompleteness theorem for theories in the language LA of arithmetic.

Theorem 1.14. If Γ is a consistent and axiomatizable theory in LA which represents all computable functions and decidable relations, then Γ is not complete.

To say that Γ is not complete is to say that for at least one sentence A, Γ ⊬ A and Γ ⊬ ¬A. Such a sentence is called independent (of Γ).
We can in fact relatively quickly prove that there must be independent sentences. But the power of Gödel's proof of the theorem lies in the fact that it exhibits a specific example of such an independent sentence. The intriguing construction produces a sentence G_Γ, called a Gödel sentence for Γ, which is unprovable because in Γ, G_Γ is equivalent to the claim that G_Γ is unprovable in Γ. It does so constructively, i.e., given an axiomatization of Γ and a description of the proof system, the proof gives a method for actually writing down G_Γ.

The construction in Gödel's proof requires that we find a way to express in LA the properties of and operations on terms and formulas of LA itself. These include properties such as "A is a sentence," "δ is a derivation of A," and operations such as A[t/x]. This way must (a) express these properties and relations via a "coding" of symbols and sequences thereof (which is what terms, formulas, derivations, etc. are) as natural numbers (which is what LA can talk about). It must (b) do this in such a way that Γ will prove the relevant facts, so we must show that these properties are coded by decidable properties of natural numbers and the operations correspond to computable functions on natural numbers. This is called "arithmetization of syntax."

Before we investigate how syntax can be arithmetized, however, we will consider the condition that Γ is "strong enough," i.e., represents all computable functions and decidable relations. This requires that we give a precise definition of "computable." This can be done in a number of ways, e.g., via the model of Turing machines, or as those functions computable by programs in some general-purpose programming language. Since our aim is to represent these functions and relations in a theory in the language LA, however, it is best to pick a simple definition of computability of just numerical functions. This is the notion of recursive function.
So we will first discuss the recursive functions. We will then show that Q already represents all recursive functions and relations. This will allow us to apply the incompleteness theorem to specific theories such as Q and PA, since we will have established that these are examples of theories that are "strong enough."

The end result of the arithmetization of syntax is a formula Prov_Γ(x) which, via the coding of formulas as numbers, expresses provability from the axioms of Γ. Specifically, if A is coded by the number n, and Γ ⊢ A, then Γ ⊢ Prov_Γ(n). This "provability predicate" for Γ allows us also to express, in a certain sense, the consistency of Γ as a sentence of LA: let the "consistency statement" for Γ be the sentence ¬Prov_Γ(n), where we take n to be the code of a contradiction, e.g., of ⊥. The second incompleteness theorem states that consistent axiomatizable theories also do not prove their own consistency statements. The conditions required for this theorem to apply are a bit more stringent than just that the theory represents all computable functions and decidable relations, but we will show that PA satisfies them.

1.4 Undecidability and Incompleteness

Gödel's proof of the incompleteness theorems requires arithmetization of syntax. But even without that we can obtain some nice results just on the assumption that a theory represents all decidable relations. The proof is a diagonal argument similar to the proof of the undecidability of the halting problem.

Theorem 1.15. If Γ is a consistent theory that represents every decidable relation, then Γ is not decidable.

Proof. Suppose Γ were decidable. We show that if Γ represents every decidable relation, it must be inconsistent. Decidable properties (one-place relations) are represented by formulas with one free variable. Let A0(x), A1(x), ..., be a computable enumeration of all such formulas.
Now consider the following set D ⊆ N:

    D = {n : Γ ⊢ ¬An(n)}

The set D is decidable, since we can test if n ∈ D by first computing An(x), and from this ¬An(n). Obviously, substituting the term n for every free occurrence of x in An(x) and prefixing An(n) by ¬ is a mechanical matter. By assumption, Γ is decidable, so we can test if ¬An(n) ∈ Γ. If it is, n ∈ D, and if it isn't, n ∉ D. So D is likewise decidable.

Since Γ represents all decidable properties, it represents D. And the formulas which represent D in Γ are all among A0(x), A1(x), .... So let d be a number such that Ad(x) represents D in Γ. If d ∉ D, then, since Ad(x) represents D, Γ ⊢ ¬Ad(d). But that means that d meets the defining condition of D, and so d ∈ D. This contradicts d ∉ D. So by indirect proof, d ∈ D.

Since d ∈ D, by the definition of D, Γ ⊢ ¬Ad(d). On the other hand, since Ad(x) represents D in Γ, Γ ⊢ Ad(d). Hence, Γ is inconsistent. □

The preceding theorem shows that no consistent theory that represents all decidable relations can be decidable. We will show that Q does represent all decidable relations; this means that all theories that include Q, such as PA and TA, also do, and hence also are not decidable.

We can also use this result to obtain a weak version of the first incompleteness theorem. Any theory that is axiomatizable and complete is decidable. Consistent theories that are axiomatizable and represent all decidable properties then cannot be complete.

Theorem 1.16. If Γ is axiomatizable and complete it is decidable.

Proof. Any inconsistent theory is decidable, since inconsistent theories contain all sentences, so the answer to the question "is A ∈ Γ" is always "yes," i.e., can be decided.

So suppose Γ is consistent, and furthermore is axiomatizable, and complete. Since Γ is axiomatizable, it is computably enumerable. For we can enumerate all the correct derivations from the axioms of Γ by a computable function.
From a correct derivation we can compute the sentence it derives, and so together there is a computable function that enumerates all theorems of Γ. A sentence A is a theorem of Γ iff ¬A is not a theorem, since Γ is consistent and complete. We can therefore decide if A ∈ Γ as follows. Enumerate all theorems of Γ. When A appears on this list, we know that Γ ⊢ A. When ¬A appears on this list, we know that Γ ⊬ A. Since Γ is complete, one of these cases eventually obtains, so the procedure eventually produces an answer. □

Corollary 1.17. If Γ is consistent, axiomatizable, and represents every decidable property, it is not complete.

Proof. If Γ were complete, it would be decidable by the previous theorem (since it is axiomatizable and consistent). But since Γ represents every decidable property, it is not decidable, by the first theorem. □

Once we have established that, e.g., Q represents all decidable properties, the corollary tells us that Q must be incomplete. However, its proof does not provide an example of an independent sentence; it merely shows that such a sentence must exist. For this, we have to arithmetize syntax and follow Gödel's original proof idea. And of course, we still have to show the first claim, namely that Q does, in fact, represent all decidable properties.

It should be noted that not every interesting theory is incomplete or undecidable. There are many theories that are sufficiently strong to describe interesting mathematical facts that do not satisfy the conditions of Gödel's result. For instance, Pres = {A ∈ LA+ : N ⊨ A}, the set of sentences of the language of arithmetic without × true in the standard model, is both complete and decidable. This theory is called Presburger arithmetic, and proves all the truths about natural numbers that can be formulated just with 0, ′, and +.
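As an illustration of the decision procedure in the proof of Theorem 1.16, here is a minimal Python sketch. It is not from the text: the toy "theory" (whose sentences are the strings even(n) and not even(n)) and the names negate, toy_theorems, and decide are hypothetical stand-ins chosen so that the example runs. The only point it makes is the one in the proof: given a computable enumeration of the theorems of a consistent and complete theory, searching the list for A or ¬A always terminates.

```python
from itertools import count


def negate(sentence):
    """Toy negation (hypothetical): toggle a leading 'not ' prefix."""
    return sentence[4:] if sentence.startswith("not ") else "not " + sentence


def toy_theorems():
    """Computable enumeration of a toy theory that is consistent and
    complete for sentences of the form even(n) / not even(n)."""
    for n in count():
        yield f"even({n})" if n % 2 == 0 else f"not even({n})"


def decide(sentence, enumerate_theorems):
    """Return 1 if sentence is a theorem and 0 otherwise, by listing
    theorems until the sentence or its negation appears."""
    refutation = negate(sentence)
    for theorem in enumerate_theorems():
        if theorem == sentence:
            return 1  # the sentence itself was listed
        if theorem == refutation:
            return 0  # its negation was listed; by consistency it is no theorem
```

On a string that is not a sentence of the toy language the loop would never stop, which mirrors the role that completeness plays in the proof.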
Summary

Hilbert's program aimed to show that all of mathematics could be formalized in an axiomatized theory in a formal language, such as the language of arithmetic or of set theory. He believed that such a theory would be complete. That is, for every sentence A, either T ⊢ A or T ⊢ ¬A. In this sense then, T would have settled every mathematical question: it would either prove that it's true or that it's false. If Hilbert had been right, it would also have turned out that mathematics is decidable. That's because any axiomatizable theory is computably enumerable, i.e., there is a computable function that lists all its theorems. We can test if a sentence A is a theorem by listing all of them until we find A (in which case it is a theorem) or ¬A (in which case it isn't). Alas, Hilbert was wrong. Gödel proved that no axiomatizable, consistent theory that is "strong enough" is complete. That's the first incompleteness theorem. The requirement that the theory be "strong enough" amounts to it representing all computable functions and relations. Specifically, the very weak theory Q satisfies this property, and any theory that is at least as strong as Q also does. He also showed (this is the second incompleteness theorem) that the sentence that expresses the consistency of the theory is itself undecidable in it, i.e., the theory proves neither it nor its negation. So Hilbert's further aim of finding a "finitary" consistency proof of all of mathematics cannot be realized. For any finitary consistency proof would, presumably, be formalizable in a theory that captures all of mathematics. Finally, we established that theories that represent all computable functions and relations are not decidable. Note that although axiomatizability and completeness imply decidability, incompleteness does not imply undecidability.
So this result shows that the second of Hilbert's goals, namely that there be a procedure that decides if T ⊢ A or not, can also not be achieved, at least not for theories at least as strong as Q.

Problems

Problem 1.1. Show that TA = {A : N ⊨ A} is not axiomatizable. You may assume that TA represents all decidable properties.

CHAPTER 2 Recursive Functions

2.1 Introduction

In order to develop a mathematical theory of computability, one has to, first of all, develop a model of computability. We now think of computability as the kind of thing that computers do, and computers work with symbols. But at the beginning of the development of theories of computability, the paradigmatic example of computation was numerical computation. Mathematicians were always interested in number-theoretic functions, i.e., functions f : N^n → N that can be computed. So it is not surprising that at the beginning of the theory of computability, it was such functions that were studied. The most familiar examples of computable numerical functions, such as addition, multiplication, and exponentiation (of natural numbers), share an interesting feature: they can be defined recursively. It is thus quite natural to attempt a general definition of computable function on the basis of recursive definitions. Among the many possible ways to define number-theoretic functions recursively, one particularly simple pattern of definition here becomes central: so-called primitive recursion.

In addition to computable functions, we might be interested in computable sets and relations. A set is computable if we can compute the answer to whether or not a given number is an element of the set, and a relation is computable iff we can compute whether or not a tuple ⟨n1, ..., nk⟩ is an element of the relation. By considering the characteristic function of a set or relation, discussion of computable sets and relations can be subsumed under that of computable functions.
Thus we can define primitive recursive relations as well, e.g., the relation "n evenly divides m" is a primitive recursive relation.

Primitive recursive functions (those that can be defined using just primitive recursion) are not, however, the only computable number-theoretic functions. Many generalizations of primitive recursion have been considered, but the most powerful and widely accepted additional way of computing functions is by unbounded search. This leads to the definition of partial recursive functions, and a related definition of general recursive functions. General recursive functions are computable and total, and the definition characterizes exactly the partial recursive functions that happen to be total. Recursive functions can simulate every other model of computation (Turing machines, lambda calculus, etc.) and so represent one of the many accepted models of computation.

2.2 Primitive Recursion

A characteristic of the natural numbers is that every natural number can be reached from 0 by applying the successor operation +1 finitely many times: any natural number is either 0 or the successor of ... the successor of 0. One way to specify a function f : N → N that makes use of this fact is this: (a) specify what the value of f is for argument 0, and (b) also specify how to, given the value of f(x), compute the value of f(x + 1). For (a) tells us directly what f(0) is, so f is defined for 0. Now, using the instruction given by (b) for x = 0, we can compute f(1) = f(0 + 1) from f(0). Using the same instructions for x = 1, we compute f(2) = f(1 + 1) from f(1), and so on. For every natural number x, we'll eventually reach the step where we define f(x + 1) from f(x), and so f is defined for all x ∈ N.

For instance, suppose we specify h : N → N by the following two equations:

    h(0) = 1
    h(x + 1) = 2 * h(x)

If we already know how to multiply, then these equations give us the information required for (a) and (b) above.
Successively applying the second equation, we get that

    h(1) = 2 * h(0) = 2,
    h(2) = 2 * h(1) = 2 * 2,
    h(3) = 2 * h(2) = 2 * 2 * 2,
    ...

We see that the function h we have specified is h(x) = 2^x.

The characteristic feature of the natural numbers guarantees that there is only one function h that meets these two criteria. A pair of equations like these is called a definition by primitive recursion of the function h. It is so-called because we define h "recursively," i.e., the definition, specifically the second equation, involves h itself on the right-hand side. It is "primitive" because in defining h(x + 1) we only use the value h(x), i.e., the immediately preceding value. This is the simplest way of defining a function on N recursively.

We can define even more fundamental functions like addition and multiplication by primitive recursion. In these cases, however, the functions in question are 2-place. We fix one of the argument places, and use the other for the recursion. E.g., to define add(x, y) we can fix x and define the value first for y = 0 and then for y + 1 in terms of y. Since x is fixed, it will appear on the left and on the right side of the defining equations.

    add(x, 0) = x
    add(x, y + 1) = add(x, y) + 1

These equations specify the value of add for all x and y. To find add(2, 3), for instance, we apply the defining equations for x = 2, using the first to find add(2, 0) = 2, then using the second to successively find add(2, 1) = 2 + 1 = 3, add(2, 2) = 3 + 1 = 4, add(2, 3) = 4 + 1 = 5.

In the definition of add we used + on the right-hand side of the second equation, but only to add 1. In other words, we used the successor function succ(z) = z + 1 and applied it to the previous value add(x, y) to define add(x, y + 1). So we can think of the recursive definition as given in terms of a single function which we apply to the previous value.
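The two recursive definitions above can be read directly as loops. The following Python sketch is an illustration only and not part of the formal development; the function names simply mirror h, add, and succ from the text, and Python integers stand in for natural numbers. Each value is computed by starting from the base case and applying the second equation the required number of times.

```python
def succ(z):
    """Successor function: succ(z) = z + 1."""
    return z + 1


def h(x):
    """h(0) = 1, h(x + 1) = 2 * h(x)."""
    value = 1                # base case h(0)
    for _ in range(x):       # apply the second equation x times
        value = 2 * value
    return value


def add(x, y):
    """add(x, 0) = x, add(x, y + 1) = succ(add(x, y))."""
    value = x                # base case add(x, 0)
    for _ in range(y):       # apply the second equation y times
        value = succ(value)
    return value
```

Note that add uses nothing but succ in its recursion step, which is the observation made in the last paragraph above.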
However, it doesn't hurt (and sometimes is necessary) to allow the function to depend not just on the previous value but also on x and y. Consider:

    mult(x, 0) = 0
    mult(x, y + 1) = add(mult(x, y), x)

This is a primitive recursive definition of a function mult by applying the function add to both the preceding value mult(x, y) and the first argument x. It also defines the function mult(x, y) for all arguments x and y. For instance, mult(2, 3) is determined by successively computing mult(2, 0), mult(2, 1), mult(2, 2), and mult(2, 3):

    mult(2, 0) = 0
    mult(2, 1) = mult(2, 0 + 1) = add(mult(2, 0), 2) = add(0, 2) = 2
    mult(2, 2) = mult(2, 1 + 1) = add(mult(2, 1), 2) = add(2, 2) = 4
    mult(2, 3) = mult(2, 2 + 1) = add(mult(2, 2), 2) = add(4, 2) = 6

The general pattern then is this: to give a primitive recursive definition of a function h(x0, ..., xk−1, y), we provide two equations. The first defines the value of h(x0, ..., xk−1, 0) without reference to h. The second defines the value of h(x0, ..., xk−1, y + 1) in terms of h(x0, ..., xk−1, y), the other arguments x0, ..., xk−1, and y. Only the immediately preceding value of h may be used in that second equation. If we think of the operations given by the right-hand sides of these two equations as themselves being functions f and g, then the pattern to define a new function h by primitive recursion is this:

    h(x0, ..., xk−1, 0) = f(x0, ..., xk−1)
    h(x0, ..., xk−1, y + 1) = g(x0, ..., xk−1, y, h(x0, ..., xk−1, y))

In the case of add, we have k = 1 and f(x0) = x0 (the identity function), and g(x0, y, z) = z + 1 (the 3-place function that returns the successor of its third argument):

    add(x0, 0) = f(x0) = x0
    add(x0, y + 1) = g(x0, y, add(x0, y)) = succ(add(x0, y))

In the case of mult, we have f(x0) = 0 (the constant function always returning 0) and g(x0, y, z) = add(z, x0) (the 3-place function that returns the sum of its last and first argument):

    mult(x0, 0) = f(x0) = 0
    mult(x0, y + 1) = g(x0, y, mult(x0, y)) = add(mult(x0, y), x0)

2.3 Composition

If f and g are two one-place functions of natural numbers, we can compose them: h(x) = g(f(x)). The new function h(x) is then defined by composition from the functions f and g. We'd like to generalize this to functions of more than one argument. Here's one way of doing this: suppose f is a k-place function, and g0, ..., gk−1 are k functions which are all n-place. Then we can define a new n-place function h as follows:

    h(x0, ..., xn−1) = f(g0(x0, ..., xn−1), ..., gk−1(x0, ..., xn−1))

If f and all gi are computable, so is h: To compute h(x0, ..., xn−1), first compute the values yi = gi(x0, ..., xn−1) for each i = 0, ..., k − 1. Then feed these values into f to compute h(x0, ..., xn−1) = f(y0, ..., yk−1).

This may seem like an overly restrictive characterization of what happens when we compute a new function using some existing ones. For one thing, sometimes we do not use all the arguments of a function, as when we defined g(x, y, z) = succ(z) for use in the primitive recursive definition of add. Suppose we are allowed use of the following functions:

    P^n_i(x0, ..., xn−1) = xi

The functions P^n_i are called projection functions: P^n_i is an n-place function. Then g can be defined by

    g(x, y, z) = succ(P^3_2(x, y, z)).

Here the role of f is played by the 1-place function succ, so k = 1. And we have one 3-place function P^3_2 which plays the role of g0.
The result is a 3-place function that returns the successor of the third argument.

The projection functions also allow us to define new functions by reordering or identifying arguments. For instance, the function h(x) = add(x, x) can be defined by

    h(x0) = add(P^1_0(x0), P^1_0(x0)).

Here k = 2, n = 1, the role of f(y0, y1) is played by add, and the roles of g0(x0) and g1(x0) are both played by P^1_0(x0), the one-place projection function (aka the identity function).

If f(y0, y1) is a function we already have, we can define the function h(x0, x1) = f(x1, x0) by

    h(x0, x1) = f(P^2_1(x0, x1), P^2_0(x0, x1)).

Here k = 2, n = 2, and the roles of g0 and g1 are played by P^2_1 and P^2_0, respectively.

You may also worry that g0, ..., gk−1 are all required to have the same arity n. (Remember that the arity of a function is the number of arguments; an n-place function has arity n.) But adding the projection functions provides the desired flexibility. For example, suppose f and g are 3-place functions and h is the 2-place function defined by

    h(x, y) = f(x, g(x, x, y), y).

The definition of h can be rewritten with the projection functions, as

    h(x, y) = f(P^2_0(x, y), g(P^2_0(x, y), P^2_0(x, y), P^2_1(x, y)), P^2_1(x, y)).

Then h is the composition of f with P^2_0, l, and P^2_1, where

    l(x, y) = g(P^2_0(x, y), P^2_0(x, y), P^2_1(x, y)),

i.e., l is the composition of g with P^2_0, P^2_0, and P^2_1.

2.4 Primitive Recursion Functions

Let us record again how we can define new functions from existing ones using primitive recursion and composition.

Definition 2.1. Suppose f is a k-place function (k ≥ 1) and g is a (k + 2)-place function. The function defined by primitive recursion from f and g is the (k + 1)-place function h defined by the equations

    h(x0, ..., xk−1, 0) = f(x0, ..., xk−1)
    h(x0, ..., xk−1, y + 1) = g(x0, ..., xk−1, y, h(x0, ..., xk−1, y))

Definition 2.2. Suppose f is a k-place function, and g0, ..., gk−1 are k functions which are all n-place. The function defined by composition from f and g0, ..., gk−1 is the n-place function h defined by

    h(x0, ..., xn−1) = f(g0(x0, ..., xn−1), ..., gk−1(x0, ..., xn−1)).

In addition to succ and the projection functions

    P^n_i(x0, ..., xn−1) = xi,

for each natural number n and i < n, we will include among the primitive recursive functions the function zero(x) = 0.

Definition 2.3. The set of primitive recursive functions is the set of functions from N^n to N, defined inductively by the following clauses:

1. zero is primitive recursive.
2. succ is primitive recursive.
3. Each projection function P^n_i is primitive recursive.
4. If f is a k-place primitive recursive function and g0, ..., gk−1 are n-place primitive recursive functions, then the composition of f with g0, ..., gk−1 is primitive recursive.
5. If f is a k-place primitive recursive function and g is a (k + 2)-place primitive recursive function, then the function defined by primitive recursion from f and g is primitive recursive.

Put more concisely, the set of primitive recursive functions is the smallest set containing zero, succ, and the projection functions P^n_j, and which is closed under composition and primitive recursion.

Another way of describing the set of primitive recursive functions is by defining it in terms of "stages." Let S0 denote the set of starting functions: zero, succ, and the projections. These are the primitive recursive functions of stage 0. Once a stage Si has been defined, let Si+1 be the set of all functions you get by applying a single instance of composition or primitive recursion to functions already in Si. Then

    S = ⋃_{i ∈ N} Si

is the set of all primitive recursive functions.

Let us verify that add is a primitive recursive function.

Proposition 2.4. The addition function add(x, y) = x + y is primitive recursive.

Proof.
We already have a primitive recursive definition of add in terms of two functions f and g which matches the format of Definition 2.1:

    add(x0, 0) = f(x0) = x0
    add(x0, y + 1) = g(x0, y, add(x0, y)) = succ(add(x0, y))

So add is primitive recursive provided f and g are as well. f(x0) = x0 = P^1_0(x0), and the projection functions count as primitive recursive, so f is primitive recursive. The function g is the three-place function g(x0, y, z) defined by

    g(x0, y, z) = succ(z).

This does not yet tell us that g is primitive recursive, since g and succ are not quite the same function: succ is one-place, and g has to be three-place. But we can define g "officially" by composition as

    g(x0, y, z) = succ(P^3_2(x0, y, z))

Since succ and P^3_2 count as primitive recursive functions, g does as well, since it can be defined by composition from primitive recursive functions. □

Proposition 2.5. The multiplication function mult(x, y) = x * y is primitive recursive.

Proof. Exercise. □

Example 2.6. Here's our very first example of a primitive recursive definition:

    h(0) = 1
    h(y + 1) = 2 * h(y).

This definition does not fit the form required by Definition 2.1, since here k = 0. The definition also involves the constants 1 and 2. To get around the first problem, let's introduce a dummy argument and define the function h′:

    h′(x0, 0) = f(x0) = 1
    h′(x0, y + 1) = g(x0, y, h′(x0, y)) = 2 * h′(x0, y).

The function f(x0) = 1 can be defined from succ and zero by composition: f(x0) = succ(zero(x0)). The function g can be defined by composition from g′(z) = 2 * z and projections:

    g(x0, y, z) = g′(P^3_2(x0, y, z))

and g′ in turn can be defined by composition as

    g′(z) = mult(g′′(z), P^1_0(z))

and

    g′′(z) = succ(f(z)),

where f is as above: f(z) = succ(zero(z)). Now that we have h′ we can use composition again to let h(y) = h′(P^1_0(y), P^1_0(y)).
This shows that h can be defined from the basic functions using a sequence of compositions and primitive recursions, so h is primitive recursive.

2.5 Primitive Recursion Notations

One advantage to having the precise inductive description of the primitive recursive functions is that we can be systematic in describing them. For example, we can assign a "notation" to each such function, as follows. Use symbols zero, succ, and P^n_i for zero, successor, and the projections. Now suppose f is defined by composition from a k-place function h and n-place functions g0, ..., gk−1, and we have assigned notations H, G0, ..., Gk−1 to the latter functions. Then, using a new symbol Comp_{k,n}, we can denote the function f by Comp_{k,n}[H, G0, ..., Gk−1]. For the functions defined by primitive recursion, we can use analogous notations of the form Rec_k[G, H], where k + 1 is the arity of the function being defined. With this setup, we can denote the addition function by

    Rec_2[P^1_0, Comp_{1,3}[succ, P^3_2]].

Having these notations sometimes proves useful.

2.6 Primitive Recursive Functions are Computable

Suppose a function h is defined by primitive recursion

    h(x⃗, 0) = f(x⃗)
    h(x⃗, y + 1) = g(x⃗, y, h(x⃗, y))

and suppose the functions f and g are computable. (We use x⃗ to abbreviate x0, ..., xk−1.) Then h(x⃗, 0) can obviously be computed, since it is just f(x⃗) which we assume is computable. h(x⃗, 1) can then also be computed, since 1 = 0 + 1 and so h(x⃗, 1) is just

    h(x⃗, 1) = g(x⃗, 0, h(x⃗, 0)) = g(x⃗, 0, f(x⃗)).

We can go on in this way and compute

    h(x⃗, 2) = g(x⃗, 1, h(x⃗, 1)) = g(x⃗, 1, g(x⃗, 0, f(x⃗)))
    h(x⃗, 3) = g(x⃗, 2, h(x⃗, 2)) = g(x⃗, 2, g(x⃗, 1, g(x⃗, 0, f(x⃗))))
    h(x⃗, 4) = g(x⃗, 3, h(x⃗, 3)) = g(x⃗, 3, g(x⃗, 2, g(x⃗, 1, g(x⃗, 0, f(x⃗)))))
    ...

Thus, to compute h(x⃗, y) in general, successively compute h(x⃗, 0), h(x⃗, 1), ..., until we reach h(x⃗, y).
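The evaluation strategy just described, together with composition, can be sketched in Python. This is an illustration under assumptions, not the text's official machinery: the helper names proj, compose, and prim_rec are hypothetical and merely mirror the projections P^n_i and the Comp and Rec notations of section 2.5, and Python integers stand in for natural numbers.

```python
def zero(x):
    """The basic function zero(x) = 0."""
    return 0


def succ(x):
    """The basic function succ(x) = x + 1."""
    return x + 1


def proj(n, i):
    """Projection P^n_i: the n-place function returning argument number i."""
    def p(*args):
        assert len(args) == n
        return args[i]
    return p


def compose(f, *gs):
    """Composition: h(xs) = f(g0(xs), ..., g_{k-1}(xs))."""
    def h(*xs):
        return f(*(g(*xs) for g in gs))
    return h


def prim_rec(f, g):
    """Primitive recursion: h(xs, 0) = f(xs), h(xs, y+1) = g(xs, y, h(xs, y))."""
    def h(*args):
        *xs, y = args
        value = f(*xs)                # h(xs, 0)
        for i in range(y):            # successively h(xs, 1), ..., h(xs, y)
            value = g(*xs, i, value)
        return value
    return h


# add: f = P^1_0 and g(x0, y, z) = succ(P^3_2(x0, y, z))
add = prim_rec(proj(1, 0), compose(succ, proj(3, 2)))
# mult: f = zero and g(x0, y, z) = add(P^3_2(x0, y, z), P^3_0(x0, y, z))
mult = prim_rec(zero, compose(add, proj(3, 2), proj(3, 0)))
```

The loop in prim_rec is the "successively compute" procedure: it needs only finitely many calls to f and g, so h is computable whenever f and g are.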
Thus, a primitive recursive definition yields a new computable function if the functions f and g are computable. Composition of functions also results in a computable function if the functions f and gi are computable.

Since the basic functions zero, succ, and P^n_i are computable, and composition and primitive recursion yield computable functions from computable functions, this means that every primitive recursive function is computable.

2.7 Examples of Primitive Recursive Functions

We already have some examples of primitive recursive functions: the addition and multiplication functions add and mult. The identity function id(x) = x is primitive recursive, since it is just P^1_0. The constant functions const_n(x) = n are primitive recursive since they can be defined from zero and succ by successive composition. This is useful when we want to use constants in primitive recursive definitions, e.g., if we want to define the function f(x) = 2 * x, we can obtain it by composition from const_2(x) and multiplication as f(x) = mult(const_2(x), P^1_0(x)). We'll make use of this trick from now on.

Proposition 2.7. The exponentiation function exp(x, y) = x^y is primitive recursive.

Proof. We can define exp primitive recursively as

    exp(x, 0) = 1
    exp(x, y + 1) = mult(x, exp(x, y)).

Strictly speaking, this is not a recursive definition from primitive recursive functions. Officially, though, we have:

    exp(x, 0) = f(x)
    exp(x, y + 1) = g(x, y, exp(x, y)).

where

    f(x) = succ(zero(x)) = 1
    g(x, y, z) = mult(P^3_0(x, y, z), P^3_2(x, y, z)) = x * z

and so f and g are defined from primitive recursive functions by composition. □

Proposition 2.8. The predecessor function pred(y) defined by

    pred(y) = 0 if y = 0, and pred(y) = y − 1 otherwise

is primitive recursive.

Proof. Note that

    pred(0) = 0 and
    pred(y + 1) = y.

This is almost a primitive recursive definition.
It does not, strictly speaking, fit into the pattern of definition by primitive recursion, since that pattern requires at least one extra argument $x$. It is also odd in that it does not actually use $\mathrm{pred}(y)$ in the definition of $\mathrm{pred}(y + 1)$. But we can first define $\mathrm{pred}'(x, y)$ by
$$\begin{aligned} \mathrm{pred}'(x, 0) &= \mathrm{zero}(x) = 0, \\ \mathrm{pred}'(x, y + 1) &= P^3_1(x, y, \mathrm{pred}'(x, y)) = y, \end{aligned}$$
and then define $\mathrm{pred}$ from it by composition, e.g., as $\mathrm{pred}(x) = \mathrm{pred}'(\mathrm{zero}(x), P^1_0(x))$. □

Proposition 2.9. The factorial function $\mathrm{fac}(x) = x! = 1 \cdot 2 \cdot 3 \cdot \dots \cdot x$ is primitive recursive.

Proof. The obvious primitive recursive definition is
$$\begin{aligned} \mathrm{fac}(0) &= 1 \\ \mathrm{fac}(y + 1) &= \mathrm{fac}(y) \cdot (y + 1). \end{aligned}$$
Officially, we have to first define a two-place function $h$
$$\begin{aligned} h(x, 0) &= \mathrm{const}_1(x) \\ h(x, y + 1) &= g(x, y, h(x, y)) \end{aligned}$$
where $g(x, y, z) = \mathrm{mult}(P^3_2(x, y, z), \mathrm{succ}(P^3_1(x, y, z)))$, and then let
$$\mathrm{fac}(y) = h(P^1_0(y), P^1_0(y)).$$
From now on we'll be a bit more laissez-faire and not give the official definitions by composition and primitive recursion. □

Proposition 2.10. Truncated subtraction, $x \dot{-} y$, defined by
$$x \dot{-} y = \begin{cases} 0 & \text{if } x < y \\ x - y & \text{otherwise} \end{cases}$$
is primitive recursive.

Proof. We have:
$$\begin{aligned} x \dot{-} 0 &= x \\ x \dot{-} (y + 1) &= \mathrm{pred}(x \dot{-} y) \end{aligned}$$
□

Proposition 2.11. The distance between $x$ and $y$, $|x - y|$, is primitive recursive.

Proof. We have $|x - y| = (x \dot{-} y) + (y \dot{-} x)$, so the distance can be defined by composition from $+$ and $\dot{-}$, which are primitive recursive. □

Proposition 2.12. The maximum of $x$ and $y$, $\max(x, y)$, is primitive recursive.

Proof. We can define $\max(x, y)$ by composition from $+$ and $\dot{-}$ by
$$\max(x, y) = x + (y \dot{-} x).$$
If $x$ is the maximum, i.e., $x \geq y$, then $y \dot{-} x = 0$, so $x + (y \dot{-} x) = x + 0 = x$. If $y$ is the maximum, then $y \dot{-} x = y - x$, and so $x + (y \dot{-} x) = x + (y - x) = y$. □

Proposition 2.13. The minimum of $x$ and $y$, $\min(x, y)$, is primitive recursive.

Proof. Exercise. □

Proposition 2.14.
The set of primitive recursive functions is closed under the following two operations:

1. Finite sums: if $f(\vec{x}, z)$ is primitive recursive, then so is the function
$$g(\vec{x}, y) = \sum_{z=0}^{y} f(\vec{x}, z).$$

2. Finite products: if $f(\vec{x}, z)$ is primitive recursive, then so is the function
$$h(\vec{x}, y) = \prod_{z=0}^{y} f(\vec{x}, z).$$

Proof. For example, finite sums are defined recursively by the equations
$$\begin{aligned} g(\vec{x}, 0) &= f(\vec{x}, 0) \\ g(\vec{x}, y + 1) &= g(\vec{x}, y) + f(\vec{x}, y + 1). \end{aligned}$$
□

2.8 Primitive Recursive Relations

Definition 2.15. A relation $R(\vec{x})$ is said to be primitive recursive if its characteristic function,
$$\chi_R(\vec{x}) = \begin{cases} 1 & \text{if } R(\vec{x}) \\ 0 & \text{otherwise} \end{cases}$$
is primitive recursive.

In other words, when one speaks of a primitive recursive relation $R(\vec{x})$, one is referring to a relation of the form $\chi_R(\vec{x}) = 1$, where $\chi_R$ is a primitive recursive function which, on any input, returns either 1 or 0. For example, the relation $\mathrm{IsZero}(x)$, which holds if and only if $x = 0$, corresponds to the function $\chi_{\mathrm{IsZero}}$, defined using primitive recursion by
$$\chi_{\mathrm{IsZero}}(0) = 1, \qquad \chi_{\mathrm{IsZero}}(x + 1) = 0.$$
It should be clear that one can compose relations with other primitive recursive functions. So the following are also primitive recursive:

1. The equality relation, $x = y$, defined by $\mathrm{IsZero}(|x - y|)$

2. The less-than-or-equal relation, $x \leq y$, defined by $\mathrm{IsZero}(x \dot{-} y)$

Proposition 2.16. The set of primitive recursive relations is closed under boolean operations, that is, if $P(\vec{x})$ and $Q(\vec{x})$ are primitive recursive, so are

1. $\lnot P(\vec{x})$
2. $P(\vec{x}) \land Q(\vec{x})$
3. $P(\vec{x}) \lor Q(\vec{x})$
4. $P(\vec{x}) \to Q(\vec{x})$

Proof. Suppose $P(\vec{x})$ and $Q(\vec{x})$ are primitive recursive, i.e., their characteristic functions $\chi_P$ and $\chi_Q$ are. We have to show that the characteristic functions of $\lnot P(\vec{x})$, etc., are also primitive recursive.
$$\chi_{\lnot P}(\vec{x}) = \begin{cases} 0 & \text{if } \chi_P(\vec{x}) = 1 \\ 1 & \text{otherwise} \end{cases}$$
We can define $\chi_{\lnot P}(\vec{x})$ as $1 \dot{-} \chi_P(\vec{x})$.
$$\chi_{P \land Q}(\vec{x}) = \begin{cases} 1 & \text{if } \chi_P(\vec{x}) = \chi_Q(\vec{x}) = 1 \\ 0 & \text{otherwise} \end{cases}$$
We can define $\chi_{P \land Q}(\vec{x})$ as $\chi_P(\vec{x}) \cdot \chi_Q(\vec{x})$ or as $\min(\chi_P(\vec{x}), \chi_Q(\vec{x}))$. Similarly, $\chi_{P \lor Q}(\vec{x}) = \max(\chi_P(\vec{x}), \chi_Q(\vec{x}))$ and $\chi_{P \to Q}(\vec{x}) = \max(1 \dot{-} \chi_P(\vec{x}), \chi_Q(\vec{x}))$. □

Proposition 2.17. The set of primitive recursive relations is closed under bounded quantification, i.e., if $R(\vec{x}, z)$ is a primitive recursive relation, then so are the relations $(\forall z < y)\, R(\vec{x}, z)$ and $(\exists z < y)\, R(\vec{x}, z)$.

($(\forall z < y)\, R(\vec{x}, z)$ holds of $\vec{x}$ and $y$ if and only if $R(\vec{x}, z)$ holds for every $z$ less than $y$, and similarly for $(\exists z < y)\, R(\vec{x}, z)$.)

Proof. By convention, we take $(\forall z < 0)\, R(\vec{x}, z)$ to be true (for the trivial reason that there are no $z$ less than 0) and $(\exists z < 0)\, R(\vec{x}, z)$ to be false. A universal quantifier functions just like a finite product or iterated minimum, i.e., if $P(\vec{x}, y) \Leftrightarrow (\forall z < y)\, R(\vec{x}, z)$ then $\chi_P(\vec{x}, y)$ can be defined by
$$\begin{aligned} \chi_P(\vec{x}, 0) &= 1 \\ \chi_P(\vec{x}, y + 1) &= \min(\chi_P(\vec{x}, y), \chi_R(\vec{x}, y)). \end{aligned}$$
(The second equation reflects that every $z < y + 1$ satisfies $R$ iff every $z < y$ does and $R(\vec{x}, y)$ holds as well.) Bounded existential quantification can similarly be defined using $\max$. Alternatively, it can be defined from bounded universal quantification, using the equivalence $(\exists z < y)\, R(\vec{x}, z) \leftrightarrow \lnot(\forall z < y)\, \lnot R(\vec{x}, z)$. Note that, for example, a bounded quantifier of the form $(\exists x \leq y) \dots x \dots$ is equivalent to $(\exists x < y + 1) \dots x \dots$. □

Another useful primitive recursive function is the conditional function, $\mathrm{cond}(x, y, z)$, defined by
$$\mathrm{cond}(x, y, z) = \begin{cases} y & \text{if } x = 0 \\ z & \text{otherwise.} \end{cases}$$
This is defined recursively by
$$\mathrm{cond}(0, y, z) = y, \qquad \mathrm{cond}(x + 1, y, z) = z.$$
One can use this to justify definitions of primitive recursive functions by cases from primitive recursive relations:

Proposition 2.18. If $g_0(\vec{x}), \dots, g_m(\vec{x})$ are primitive recursive functions, and $R_0(\vec{x}), \dots, R_{m-1}(\vec{x})$ are primitive recursive relations, then the function $f$ defined by
$$f(\vec{x}) = \begin{cases} g_0(\vec{x}) & \text{if } R_0(\vec{x}) \\ g_1(\vec{x}) & \text{if } R_1(\vec{x}) \text{ and not } R_0(\vec{x}) \\ \quad\vdots \\ g_{m-1}(\vec{x}) & \text{if } R_{m-1}(\vec{x}) \text{ and none of the previous hold} \\ g_m(\vec{x}) & \text{otherwise} \end{cases}$$
is also primitive recursive.

Proof. When $m = 1$, this is just the function defined by
$$f(\vec{x}) = \mathrm{cond}(\chi_{\lnot R_0}(\vec{x}), g_0(\vec{x}), g_1(\vec{x})).$$
For $m$ greater than 1, one can just compose definitions of this form. □

2.9 Bounded Minimization

It is often useful to define a function as the least number satisfying some property or relation $P$. If $P$ is decidable, we can compute this function simply by trying out all the possible numbers, 0, 1, 2, \dots, until we find the least one satisfying $P$. This kind of unbounded search takes us out of the realm of primitive recursive functions. However, if we're only interested in the least number less than some independently given bound, we stay primitive recursive. In other words, and a bit more generally, suppose we have a primitive recursive relation $R(x, z)$. Consider the function that maps $x$ and $y$ to the least $z < y$ such that $R(x, z)$. It, too, can be computed, by testing whether $R(x, 0)$, $R(x, 1)$, \dots, $R(x, y - 1)$. But why is it primitive recursive?

Proposition 2.19. If $R(\vec{x}, z)$ is primitive recursive, so is the function $m_R(\vec{x}, y)$ which returns the least $z$ less than $y$ such that $R(\vec{x}, z)$ holds, if there is one, and $y$ otherwise. We will write the function $m_R$ as
$$(\min z < y)\, R(\vec{x}, z).$$

Proof. Note that there can be no $z < 0$ such that $R(\vec{x}, z)$ since there is no $z < 0$ at all. So $m_R(\vec{x}, 0) = 0$.

In case the bound is of the form $y + 1$ we have three cases: (a) There is a $z < y$ such that $R(\vec{x}, z)$, in which case $m_R(\vec{x}, y + 1) = m_R(\vec{x}, y)$. (b) There is no such $z < y$ but $R(\vec{x}, y)$ holds, then $m_R(\vec{x}, y + 1) = y$. (c) There is no $z < y + 1$ such that $R(\vec{x}, z)$, then $m_R(\vec{x}, y + 1) = y + 1$. So,
$$\begin{aligned} m_R(\vec{x}, 0) &= 0 \\ m_R(\vec{x}, y + 1) &= \begin{cases} m_R(\vec{x}, y) & \text{if } m_R(\vec{x}, y) \neq y \\ y & \text{if } m_R(\vec{x}, y) = y \text{ and } R(\vec{x}, y) \\ y + 1 & \text{otherwise.} \end{cases} \end{aligned}$$
Note that there is a $z < y$ such that $R(\vec{x}, z)$ iff $m_R(\vec{x}, y) \neq y$.
□

2.10 Primes

Bounded quantification and bounded minimization provide us with a good deal of machinery to show that natural functions and relations are primitive recursive. For example, consider the relation "$x$ divides $y$", written $x \mid y$. The relation $x \mid y$ holds if division of $y$ by $x$ is possible without remainder, i.e., if $y$ is an integer multiple of $x$. (If it doesn't hold, i.e., the remainder when dividing $y$ by $x$ is $> 0$, we write $x \nmid y$.) In other words, $x \mid y$ iff for some $z$, $x \cdot z = y$. Obviously, any such $z$, if it exists, must be $\leq y$. So, we have that $x \mid y$ iff for some $z \leq y$, $x \cdot z = y$. We can define the relation $x \mid y$ by bounded existential quantification from $=$ and multiplication by
$$x \mid y \Leftrightarrow (\exists z \leq y)\, (x \cdot z) = y.$$
We've thus shown that $x \mid y$ is primitive recursive.

A natural number $x$ is prime if it is neither 0 nor 1 and is only divisible by 1 and itself. In other words, prime numbers are such that, whenever $y \mid x$, either $y = 1$ or $y = x$. To test if $x$ is prime, we only have to check if $y \mid x$ for all $y \leq x$, since if $y > x$, then automatically $y \nmid x$. So, the relation $\mathrm{Prime}(x)$, which holds iff $x$ is prime, can be defined by
$$\mathrm{Prime}(x) \Leftrightarrow x \geq 2 \land (\forall y \leq x)\, (y \mid x \to y = 1 \lor y = x)$$
and is thus primitive recursive.

The primes are 2, 3, 5, 7, 11, etc. Consider the function $p(x)$ which returns the $x$th prime in that sequence, i.e., $p(0) = 2$, $p(1) = 3$, $p(2) = 5$, etc. (For convenience we will often write $p(x)$ as $p_x$, so $p_0 = 2$, $p_1 = 3$, etc.)

If we had a function $\mathrm{nextPrime}(x)$, which returns the first prime number larger than $x$, $p$ can be easily defined using primitive recursion:
$$\begin{aligned} p(0) &= 2 \\ p(x + 1) &= \mathrm{nextPrime}(p(x)) \end{aligned}$$
Since $\mathrm{nextPrime}(x)$ is the least $y$ such that $y > x$ and $y$ is prime, it can be easily computed by unbounded search. But it can also be defined by bounded minimization, thanks to a result due to Euclid: there is always a prime number between $x$ and $x! + 1$.
$$\mathrm{nextPrime}(x) = (\min y \leq x! + 1)\, (y > x \land \mathrm{Prime}(y)).$$
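The definitions of this section can be traced in a small Python sketch. It is meant only as an illustration: the function names are chosen here for the example, `bounded_min` follows the recursion for $m_R$ from Proposition 2.19 step by step, and the bound $y \leq x! + 1$ is written as a strict bound $y < x! + 2$.

```python
from math import factorial

def bounded_min(R, y):
    """(min z < y) R(z): the least z < y with R(z), or y if there is none."""
    m = 0                          # m_R(0) = 0
    for k in range(y):             # obtain m_R(k + 1) from m_R(k)
        if m != k:                 # case (a): a witness below k was already found
            continue
        m = k if R(k) else k + 1   # cases (b) and (c)
    return m

def divides(x, y):
    # x | y  iff  (exists z <= y) x * z = y
    return any(x * z == y for z in range(y + 1))

def is_prime(x):
    # x >= 2 and (for all y <= x)(y | x -> y = 1 or y = x)
    return x >= 2 and all(not divides(y, x) or y in (1, x) for y in range(x + 1))

def next_prime(x):
    # (min y <= x! + 1)(y > x and Prime(y))
    return bounded_min(lambda y: y > x and is_prime(y), factorial(x) + 2)

def p(n):
    value = 2                      # p(0) = 2
    for _ in range(n):             # p(n + 1) = nextPrime(p(n))
        value = next_prime(value)
    return value
```

The factorial bound grows very quickly, so this sketch is only practical for small arguments; its purpose is to mirror the definitions, not to compute primes efficiently.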
This definition shows that $\mathrm{nextPrime}(x)$ and hence $p(x)$ are (not just computable but) primitive recursive.

(If you're curious, here's a quick proof of Euclid's theorem. Suppose $p_n$ is the largest prime $\leq x$ and consider the product $p = p_0 \cdot p_1 \cdot \dots \cdot p_n$ of all primes $\leq x$. Either $p + 1$ is prime or there is a prime between $x$ and $p + 1$. Why? Suppose $p + 1$ is not prime. Then some prime number $q \mid p + 1$ where $q < p + 1$. None of the primes $\leq x$ divide $p + 1$. (By definition of $p$, each of the primes $p_i \leq x$ divides $p$, i.e., with remainder 0. So, each of the primes $p_i \leq x$ divides $p + 1$ with remainder 1, and so $p_i \nmid p + 1$.) Hence, $q$ is a prime $> x$ and $< p + 1$. And $p \leq x!$, so there is a prime $> x$ and $\leq x! + 1$.)

2.11 Sequences

The set of primitive recursive functions is remarkably robust. But we will be able to do even more once we have developed an adequate means of handling sequences. We will identify finite sequences of natural numbers with natural numbers in the following way: the sequence $\langle a_0, a_1, a_2, \dots, a_k \rangle$ corresponds to the number
$$p_0^{a_0+1} \cdot p_1^{a_1+1} \cdot p_2^{a_2+1} \cdot \dots \cdot p_k^{a_k+1}.$$
We add one to the exponents to guarantee that, for example, the sequences $\langle 2, 7, 3 \rangle$ and $\langle 2, 7, 3, 0, 0 \rangle$ have distinct numeric codes. We can take both 0 and 1 to code the empty sequence; for concreteness, let $\Lambda$ denote 0.

The reason that this coding of sequences works is the so-called Fundamental Theorem of Arithmetic: every natural number $n \geq 2$ can be written in one and only one way in the form
$$n = p_0^{a_0} \cdot p_1^{a_1} \cdot \dots \cdot p_k^{a_k}$$
with $a_k \geq 1$. This guarantees that the mapping $\langle\rangle(a_0, \dots, a_k) = \langle a_0, \dots, a_k \rangle$ is injective: different sequences are mapped to different numbers, so each number codes at most one sequence.

We'll now show that the operations of determining the length of a sequence, determining its $i$th element, appending an element to a sequence, and concatenating two sequences, are all primitive recursive.

Proposition 2.20.
The function $\mathrm{len}(s)$, which returns the length of the sequence $s$, is primitive recursive.

Proof. Let $R(i, s)$ be the relation defined by
$$R(i, s) \text{ iff } p_i \mid s \land p_{i+1} \nmid s.$$
$R$ is clearly primitive recursive. Whenever $s$ is the code of a non-empty sequence, i.e.,
$$s = p_0^{a_0+1} \cdot \dots \cdot p_k^{a_k+1},$$
$R(i, s)$ holds if $p_i$ is the largest prime such that $p_i \mid s$, i.e., $i = k$. The length of $s$ thus is $i + 1$ iff $p_i$ is the largest prime that divides $s$, so we can let
$$\mathrm{len}(s) = \begin{cases} 0 & \text{if } s = 0 \text{ or } s = 1 \\ 1 + (\min i < s)\, R(i, s) & \text{otherwise} \end{cases}$$
We can use bounded minimization, since there is only one $i$ that satisfies $R(i, s)$ when $s$ is a code of a sequence, and if $i$ exists it is less than $s$ itself. □

Proposition 2.21. The function $\mathrm{append}(s, a)$, which returns the result of appending $a$ to the sequence $s$, is primitive recursive.

Proof. $\mathrm{append}$ can be defined by:
$$\mathrm{append}(s, a) = \begin{cases} 2^{a+1} & \text{if } s = 0 \text{ or } s = 1 \\ s \cdot p_{\mathrm{len}(s)}^{a+1} & \text{otherwise.} \end{cases}$$
□

Proposition 2.22. The function $\mathrm{element}(s, i)$, which returns the $i$th element of $s$ (where the initial element is called the 0th), or 0 if $i$ is greater than or equal to the length of $s$, is primitive recursive.

Proof. Note that $a$ is the $i$th element of $s$ iff $p_i^{a+1}$ is the largest power of $p_i$ that divides $s$, i.e., $p_i^{a+1} \mid s$ but $p_i^{a+2} \nmid s$. So:
$$\mathrm{element}(s, i) = \begin{cases} 0 & \text{if } i \geq \mathrm{len}(s) \\ (\min a < s)\, (p_i^{a+2} \nmid s) & \text{otherwise.} \end{cases}$$
□

Instead of using the official names for the functions defined above, we introduce a more compact notation. We will use $(s)_i$ instead of $\mathrm{element}(s, i)$, and $\langle s_0, \dots, s_k \rangle$ to abbreviate
$$\mathrm{append}(\mathrm{append}(\dots \mathrm{append}(\Lambda, s_0) \dots), s_k).$$
Note that if $s$ has length $k$, the elements of $s$ are $(s)_0, \dots, (s)_{k-1}$.

Proposition 2.23. The function $\mathrm{concat}(s, t)$, which concatenates two sequences, is primitive recursive.

Proof. We want a function $\mathrm{concat}$ with the property that
$$\mathrm{concat}(\langle a_0, \dots, a_k \rangle, \langle b_0, \dots, b_l \rangle) = \langle a_0, \dots, a_k, b_0, \dots, b_l \rangle.$$
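Before constructing $\mathrm{concat}$ officially, it may help to see the coding at work on small numbers. The following Python sketch follows the definitions of $\mathrm{len}$, $\mathrm{append}$, and $\mathrm{element}$ given above; the names `nth_prime`, `length`, and `code` are chosen here for illustration, and `concat` simply appends the elements of $t$ to $s$ one at a time, which is one way to meet the property just stated.

```python
def nth_prime(i):
    """p_i, with p_0 = 2, found by trial division (fine for small examples)."""
    found = []
    n = 2
    while len(found) <= i:
        if all(n % q != 0 for q in found):
            found.append(n)
        n += 1
    return found[i]

def length(s):
    if s in (0, 1):                        # both 0 and 1 code the empty sequence
        return 0
    for i in range(s):                     # 1 + (min i < s) R(i, s)
        if s % nth_prime(i) == 0 and s % nth_prime(i + 1) != 0:
            return i + 1
    return s + 1                           # only reached if s is not a sequence code

def append(s, a):
    if s in (0, 1):
        return 2 ** (a + 1)
    return s * nth_prime(length(s)) ** (a + 1)

def element(s, i):
    if i >= length(s):
        return 0
    for a in range(s):                     # (min a < s) (p_i^(a+2) does not divide s)
        if s % nth_prime(i) ** (a + 2) != 0:
            return a
    return s

def code(seq):
    s = 0                                  # Lambda, the empty sequence
    for a in seq:
        s = append(s, a)
    return s

def concat(s, t):
    for n in range(length(t)):             # append (t)_0, ..., (t)_{len(t)-1} in turn
        s = append(s, element(t, n))
    return s
```

For instance, `code([2, 7, 3])` is $2^3 \cdot 3^8 \cdot 5^4$, and the extra factors $7 \cdot 11$ in `code([2, 7, 3, 0, 0])` are what keep the two codes distinct.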
We'll use a "helper" function $\mathrm{hconcat}(s, t, n)$ which concatenates the first $n$ elements of $t$ to $s$. This function can be defined by primitive recursion as follows:
$$\begin{aligned} \mathrm{hconcat}(s, t, 0) &= s \\ \mathrm{hconcat}(s, t, n + 1) &= \mathrm{append}(\mathrm{hconcat}(s, t, n), (t)_n) \end{aligned}$$
Then we can define $\mathrm{concat}$ by
$$\mathrm{concat}(s, t) = \mathrm{hconcat}(s, t, \mathrm{len}(t)).$$
□

We will write $s \frown t$ instead of $\mathrm{concat}(s, t)$.

It will be useful for us to be able to bound the numeric code of a sequence in terms of its length and its largest element. Suppose $s$ is a sequence of length $k$, each element of which is less than or equal to some number $x$. Then $s$ has at most $k$ prime factors, each at most $p_{k-1}$, and each raised to at most $x + 1$ in the prime factorization of $s$. In other words, if we define
$$\mathrm{sequenceBound}(x, k) = p_{k-1}^{k \cdot (x+1)},$$
then the numeric code of the sequence $s$ described above is at most $\mathrm{sequenceBound}(x, k)$.

Having such a bound on sequences gives us a way of defining new functions using bounded search. For example, we can define $\mathrm{concat}$ using bounded search. All we need to do is write down a primitive recursive specification of the object (number of the concatenated sequence) we are looking for, and a bound on how far to look. The following works:
$$\begin{aligned} \mathrm{concat}(s, t) = {} & (\min v < \mathrm{sequenceBound}(s + t, \mathrm{len}(s) + \mathrm{len}(t))) \\ & (\mathrm{len}(v) = \mathrm{len}(s) + \mathrm{len}(t) \land {} \\ & (\forall i < \mathrm{len}(s))\, ((v)_i = (s)_i) \land {} \\ & (\forall j < \mathrm{len}(t))\, ((v)_{\mathrm{len}(s)+j} = (t)_j)) \end{aligned}$$

Proposition 2.24. The function $\mathrm{subseq}(s, i, n)$ which returns the subsequence of $s$ of length $n$ beginning at the $i$th element, is primitive recursive.

Proof. Exercise. □

2.12 Trees

Sometimes it is useful to represent trees as natural numbers, just like we can represent sequences by numbers and properties of and operations on them by primitive recursive relations and functions on their codes. We'll use sequences and their codes to do this. A tree can be either a single node (possibly with a label) or else a node (possibly with a label) connected to a number of subtrees.
The node is called the root of the tree, and the subtrees it is connected to are its immediate subtrees.

We code trees recursively as a sequence $\langle k, d_1, \dots, d_k \rangle$, where $k$ is the number of immediate subtrees and $d_1, \dots, d_k$ the codes of the immediate subtrees. If the nodes have labels, they can be included after the immediate subtrees. So a tree consisting just of a single node with label $l$ would be coded by $\langle 0, l \rangle$, and a tree consisting of a root (labelled $l_1$) connected to two single nodes (labelled $l_2$, $l_3$) would be coded by $\langle 2, \langle 0, l_2 \rangle, \langle 0, l_3 \rangle, l_1 \rangle$.

Proposition 2.25. The function $\mathrm{SubtreeSeq}(t)$, which returns the code of a sequence the elements of which are the codes of all subtrees of the tree with code $t$, is primitive recursive.

Proof. First note that $\mathrm{ISubtrees}(t) = \mathrm{subseq}(t, 1, (t)_0)$ is primitive recursive and returns the codes of the immediate subtrees of a tree $t$. Now we can define a helper function $\mathrm{hSubtreeSeq}(t, n)$ which computes the sequence of all subtrees which are $n$ nodes removed from the root. The sequence of subtrees of $t$ which are 0 nodes removed from the root (in other words, which begin at the root of $t$) is the sequence consisting just of $t$. To obtain a sequence of all level $n + 1$ subtrees of $t$, we concatenate the level $n$ subtrees with a sequence consisting of all immediate subtrees of the level $n$ subtrees. To get a list of all these, note that if $f(x)$ is a primitive recursive function returning codes of sequences, then $g_f(s, k) = f((s)_0) \frown \dots \frown f((s)_k)$ is also primitive recursive:
$$\begin{aligned} g_f(s, 0) &= f((s)_0) \\ g_f(s, k + 1) &= g_f(s, k) \frown f((s)_{k+1}) \end{aligned}$$
For instance, if $s$ is a sequence of trees, then $h(s) = g_{\mathrm{ISubtrees}}(s, \mathrm{len}(s))$ gives the sequence of the immediate subtrees of the elements of $s$. We can use it to define $\mathrm{hSubtreeSeq}$ by
$$\begin{aligned} \mathrm{hSubtreeSeq}(t, 0) &= \langle t \rangle \\ \mathrm{hSubtreeSeq}(t, n + 1) &= \mathrm{hSubtreeSeq}(t, n) \frown h(\mathrm{hSubtreeSeq}(t, n)). \end{aligned}$$
The maximum level of subtrees in a tree coded by $t$, i.e., the maximum distance between the root and a leaf node, is bounded by the code $t$. So a sequence of codes of all subtrees of the tree coded by $t$ is given by $\mathrm{hSubtreeSeq}(t, t)$. □

2.13 Other Recursions

Using pairing and sequencing, we can justify more exotic (and useful) forms of primitive recursion. For example, it is often useful to define two functions simultaneously, such as in the following definition:
$$\begin{aligned} h_0(\vec{x}, 0) &= f_0(\vec{x}) \\ h_1(\vec{x}, 0) &= f_1(\vec{x}) \\ h_0(\vec{x}, y + 1) &= g_0(\vec{x}, y, h_0(\vec{x}, y), h_1(\vec{x}, y)) \\ h_1(\vec{x}, y + 1) &= g_1(\vec{x}, y, h_0(\vec{x}, y), h_1(\vec{x}, y)) \end{aligned}$$
This is an instance of simultaneous recursion. Another useful way of defining functions is to give the value of $h(\vec{x}, y + 1)$ in terms of all the values $h(\vec{x}, 0)$, \dots, $h(\vec{x}, y)$, as in the following definition:
$$\begin{aligned} h(\vec{x}, 0) &= f(\vec{x}) \\ h(\vec{x}, y + 1) &= g(\vec{x}, y, \langle h(\vec{x}, 0), \dots, h(\vec{x}, y) \rangle). \end{aligned}$$
The following schema captures this idea more succinctly:
$$h(\vec{x}, y) = g(\vec{x}, y, \langle h(\vec{x}, 0), \dots, h(\vec{x}, y - 1) \rangle)$$
with the understanding that the last argument to $g$ is just the empty sequence when $y$ is 0. In either formulation, the idea is that in computing the "successor step," the function $h$ can make use of the entire sequence of values computed so far. This is known as a course-of-values recursion. For a particular example, it can be used to justify the following type of definition:
$$h(\vec{x}, y) = \begin{cases} g(\vec{x}, y, h(\vec{x}, k(\vec{x}, y))) & \text{if } k(\vec{x}, y) < y \\ f(\vec{x}) & \text{otherwise} \end{cases}$$
In other words, the value of $h$ at $y$ can be computed in terms of the value of $h$ at any previous value, given by $k$.

You should think about how to obtain these functions using ordinary primitive recursion. One final version of primitive recursion is more flexible in that one is allowed to change the parameters (side values) along the way:
$$\begin{aligned} h(\vec{x}, 0) &= f(\vec{x}) \\ h(\vec{x}, y + 1) &= g(\vec{x}, y, h(k(\vec{x}), y)) \end{aligned}$$
This, too, can be simulated with ordinary primitive recursion. (Doing so is tricky.
For a hint, try unwinding the computation by hand.)

2.14 Non-Primitive Recursive Functions

The primitive recursive functions do not exhaust the intuitively computable functions. It should be intuitively clear that we can make a list of all the unary primitive recursive functions, $f_0, f_1, f_2, \dots$ such that we can effectively compute the value of $f_x$ on input $y$; in other words, the function $g(x, y)$, defined by
$$g(x, y) = f_x(y)$$
is computable. But then so is the function
$$h(x) = g(x, x) + 1 = f_x(x) + 1.$$
For each primitive recursive function $f_i$, the values of $h$ and $f_i$ differ at $i$. So $h$ is computable, but not primitive recursive; and one can say the same about $g$. This is an "effective" version of Cantor's diagonalization argument.

One can provide more explicit examples of computable functions that are not primitive recursive. For example, let the notation $g^n(x)$ denote $g(g(\dots g(x)))$, with $n$ $g$'s in all; and define a sequence $g_0, g_1, \dots$ of functions by
$$\begin{aligned} g_0(x) &= x + 1 \\ g_{n+1}(x) &= g_n^x(x) \end{aligned}$$
You can confirm that each function $g_n$ is primitive recursive. Each successive function grows much faster than the one before; $g_1(x)$ is equal to $2x$, $g_2(x)$ is equal to $2^x \cdot x$, and $g_3(x)$ grows roughly like an exponential stack of $x$ 2's. Ackermann's function is essentially the function $G(x) = g_x(x)$, and one can show that this grows faster than any primitive recursive function.

Let us return to the issue of enumerating the primitive recursive functions. Remember that we have assigned symbolic notations to each primitive recursive function; so it suffices to enumerate notations. We can assign a natural number $\#(F)$ to each notation $F$, recursively, as follows:
$$\begin{aligned} \#(\mathrm{zero}) &= \langle 0 \rangle \\ \#(\mathrm{succ}) &= \langle 1 \rangle \\ \#(P^n_i) &= \langle 2, n, i \rangle \\ \#(\mathrm{Comp}_{k,l}[H, G_0, \dots, G_{k-1}]) &= \langle 3, k, l, \#(H), \#(G_0), \dots, \#(G_{k-1}) \rangle \\ \#(\mathrm{Rec}_l[G, H]) &= \langle 4, l, \#(G), \#(H) \rangle \end{aligned}$$
Here we are using the fact that every sequence of numbers can be viewed as a natural number, using the coding of sequences from section 2.11. The upshot is that every notation is assigned a natural number. Of course, some sequences (and hence some numbers) do not correspond to notations; but we can let $f_i$ be the unary primitive recursive function with notation coded as $i$, if $i$ codes such a notation; and the constant 0 function otherwise. The net result is that we have an explicit way of enumerating the unary primitive recursive functions.

(In fact, some functions, like the constant zero function, will appear more than once on the list. This is not just an artifact of our coding, but also a result of the fact that the constant zero function has more than one notation. We will later see that one cannot computably avoid these repetitions; for example, there is no computable function that decides whether or not a given notation represents the constant zero function.)

We can now take the function $g(x, y)$ to be given by $f_x(y)$, where $f_x$ refers to the enumeration we have just described. How do we know that $g(x, y)$ is computable? Intuitively, this is clear: to compute $g(x, y)$, first "unpack" $x$, and see if it is a notation for a unary function. If it is, compute the value of that function on input $y$.

You may already be convinced that (with some work!) one can write a program (say, in Java or C++) that does this; and now we can appeal to the Church-Turing thesis, which says that anything that, intuitively, is computable can be computed by a Turing machine.

Of course, a more direct way to show that $g(x, y)$ is computable is to describe a Turing machine that computes it, explicitly. This would, in particular, avoid the Church-Turing thesis and appeals to intuition.
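To make the "unpack and evaluate" idea a little more concrete, here is a minimal Python sketch of an evaluator for notations. It rests on a simplifying assumption that is not in the text: notations are represented as nested Python tuples rather than as numeric codes, so the unpacking step is skipped. The tag names and the function `evaluate` are chosen here for illustration.

```python
def evaluate(notation, args):
    """Value of the function denoted by `notation` on the list of arguments `args`."""
    tag = notation[0]
    if tag == "zero":
        return 0
    if tag == "succ":
        return args[0] + 1
    if tag == "proj":                      # ("proj", n, i) stands for P^n_i
        return args[notation[2]]
    if tag == "comp":                      # ("comp", H, [G_0, ..., G_{k-1}])
        _, H, Gs = notation
        return evaluate(H, [evaluate(G, args) for G in Gs])
    if tag == "rec":                       # ("rec", F, G): h(xs, 0) = F(xs),
        _, F, G = notation                 #                h(xs, y+1) = G(xs, y, h(xs, y))
        *xs, y = args
        value = evaluate(F, xs)
        for i in range(y):
            value = evaluate(G, xs + [i, value])
        return value
    raise ValueError("not a notation")

# Addition, following Rec_2[P^1_0, Comp_{1,3}[succ, P^3_2]]
ADD = ("rec", ("proj", 1, 0), ("comp", ("succ",), [("proj", 3, 2)]))

# Doubling, obtained by composition: double(x) = add(P^1_0(x), P^1_0(x))
DOUBLE = ("comp", ADD, [("proj", 1, 0), ("proj", 1, 0)])
```

Every call terminates because each recursive call is on a strictly smaller notation and each loop is bounded by an argument, which matches the earlier observation that primitive recursive functions are computable.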
Soon we will have built up enough machinery to show that $g(x, y)$ is computable, appealing to a model of computation that can be simulated on a Turing machine: namely, the recursive functions.

2.15 Partial Recursive Functions

To motivate the definition of the recursive functions, note that our proof that there are computable functions that are not primitive recursive actually establishes much more. The argument was simple: all we used was the fact that it is possible to enumerate functions $f_0, f_1, \dots$ such that, as a function of $x$ and $y$, $f_x(y)$ is computable. So the argument applies to any class of functions that can be enumerated in such a way. This puts us in a bind: we would like to describe the computable functions explicitly; but any explicit description of a collection of computable functions cannot be exhaustive!

The way out is to allow partial functions to come into play. We will see that it is possible to enumerate the partial computable functions. In fact, we already pretty much know that this is the case, since it is possible to enumerate Turing machines in a systematic way. We will come back to our diagonal argument later, and explore why it does not go through when partial functions are included.

The question is now this: what do we need to add to the primitive recursive functions to obtain all the partial recursive functions? We need to do two things:

1. Modify our definition of the primitive recursive functions to allow for partial functions as well.

2. Add something to the definition, so that some new partial functions are included.

The first is easy. As before, we will start with zero, successor, and projections, and close under composition and primitive recursion. The only difference is that we have to modify the definitions of composition and primitive recursion to allow for the possibility that some of the terms in the definition are not defined.
If $f$ and $g$ are partial functions, we will write $f(x) \downarrow$ to mean that $f$ is defined at $x$, i.e., $x$ is in the domain of $f$; and $f(x) \uparrow$ to mean the opposite, i.e., that $f$ is not defined at $x$. We will use $f(x) \simeq g(x)$ to mean that either $f(x)$ and $g(x)$ are both undefined, or they are both defined and equal. We will use these notations for more complicated terms as well. We will adopt the convention that if $h$ and $g_0, \dots, g_k$ all are partial functions, then
$$h(g_0(\vec{x}), \dots, g_k(\vec{x}))$$
is defined if and only if each $g_i$ is defined at $\vec{x}$, and $h$ is defined at $g_0(\vec{x}), \dots, g_k(\vec{x})$. With this understanding, the definitions of composition and primitive recursion for partial functions are just as above, except that we have to replace "$=$" by "$\simeq$".

What we will add to the definition of the primitive recursive functions to obtain partial functions is the unbounded search operator. If $f(x, \vec{z})$ is any partial function on the natural numbers, define $\mu x\, f(x, \vec{z})$ to be

the least $x$ such that $f(0, \vec{z})$, $f(1, \vec{z})$, \dots, $f(x, \vec{z})$ are all defined, and $f(x, \vec{z}) = 0$, if such an $x$ exists,

with the understanding that $\mu x\, f(x, \vec{z})$ is undefined otherwise. This defines $\mu x\, f(x, \vec{z})$ uniquely.

Note that our definition makes no reference to Turing machines, or algorithms, or any specific computational model. But like composition and primitive recursion, there is an operational, computational intuition behind unbounded search. When it comes to the computability of a partial function, arguments where the function is undefined correspond to inputs for which the computation does not halt. The procedure for computing $\mu x\, f(x, \vec{z})$ will amount to this: compute $f(0, \vec{z})$, $f(1, \vec{z})$, $f(2, \vec{z})$ until a value of 0 is returned. If any of the intermediate computations do not halt, however, neither does the computation of $\mu x\, f(x, \vec{z})$.

If $R(x, \vec{z})$ is any relation, $\mu x\, R(x, \vec{z})$ is defined to be $\mu x\, (1 \dot{-} \chi_R(x, \vec{z}))$.
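The search procedure just described can be sketched in Python as follows. This is an informal illustration only: the names `mu` and `mu_rel` are chosen here, and an undefined value is modelled by a computation that never returns, so the sketch should only be run on inputs for which a witness exists.

```python
def mu(f, *zs):
    """Least x with f(x, zs) = 0; does not terminate if there is no such x."""
    x = 0
    while f(x, *zs) != 0:      # compute f(0, zs), f(1, zs), ... until 0 is returned
        x += 1
    return x

def mu_rel(R, *zs):
    """mu x R(x, zs), defined as mu x (1 -. chi_R(x, zs))."""
    return mu(lambda x, *ws: 1 - (1 if R(x, *ws) else 0), *zs)

# Example: the least x with x * x >= z, here for z = 10
least_root = mu_rel(lambda x, z: x * x >= z, 10)
```

If evaluating `f(x, *zs)` itself fails to return for some $x$ below the first zero, the loop never gets past that $x$, which matches the requirement that $f(0, \vec{z}), \dots, f(x, \vec{z})$ all be defined.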
In other words, $\mu x\, R(x, \vec{z})$ returns the least value of $x$ such that $R(x, \vec{z})$ holds. So, if $f(x, \vec{z})$ is a total function, $\mu x\, f(x, \vec{z})$ is the same as $\mu x\, (f(x, \vec{z}) = 0)$. But note that our original definition is more general, since it allows for the possibility that $f(x, \vec{z})$ is not everywhere defined (whereas, in contrast, the characteristic function of a relation is always total).

Definition 2.26. The set of partial recursive functions is the smallest set of partial functions from the natural numbers to the natural numbers (of various arities) containing zero, successor, and projections, and closed under composition, primitive recursion, and unbounded search.

Of course, some of the partial recursive functions will happen to be total, i.e., defined for every argument.

Definition 2.27. The set of recursive functions is the set of partial recursive functions that are total.

A recursive function is sometimes called "total recursive" to emphasize that it is defined everywhere.

2.16 The Normal Form Theorem

Theorem 2.28 (Kleene's Normal Form Theorem). There is a primitive recursive relation $T(e, x, s)$ and a primitive recursive function $U(s)$, with the following property: if $f$ is any partial recursive function, then for some $e$,
$$f(x) \simeq U(\mu s\, T(e, x, s))$$
for every $x$.

The proof of the normal form theorem is involved, but the basic idea is simple. Every partial recursive function has an index $e$, intuitively, a number coding its program or definition. If $f(x) \downarrow$, the computation can be recorded systematically and coded by some number $s$, and that $s$ codes the computation of $f$ on input $x$ can be checked primitive recursively using only $x$ and the definition $e$. This means that $T$ is primitive recursive. Given the full record of the computation $s$, the "upshot" of $s$ is the value of $f(x)$, and it can be obtained from $s$ primitive recursively as well.
The normal form theorem shows that only a single unbounded search is required for the definition of any partial recursive function. We can use the numbers $e$ as "names" of partial recursive functions, and write $\varphi_e$ for the function $f$ defined by the equation in the theorem. Note that any partial recursive function can have more than one index; in fact, every partial recursive function has infinitely many indices.

2.17 The Halting Problem

The halting problem in general is the problem of deciding, given the specification $e$ (e.g., program) of a computable function and a number $n$, whether the computation of the function on input $n$ halts, i.e., produces a result. Famously, Alan Turing proved that this problem itself cannot be solved by a computable function, i.e., the function
$$h(e, n) = \begin{cases} 1 & \text{if computation } e \text{ halts on input } n \\ 0 & \text{otherwise,} \end{cases}$$
is not computable.

In the context of partial recursive functions, the role of the specification of a program may be played by the index $e$ given in Kleene's normal form theorem. If $f$ is a partial recursive function, any $e$ for which the equation in the normal form theorem holds, is an index of $f$. Given a number $e$, the normal form theorem states that
$$\varphi_e(x) \simeq U(\mu s\, T(e, x, s))$$
is partial recursive, and for every partial recursive $f \colon \mathbb{N} \to \mathbb{N}$, there is an $e \in \mathbb{N}$ such that $\varphi_e(x) \simeq f(x)$ for all $x \in \mathbb{N}$. In fact, for each such $f$ there is not just one, but infinitely many such $e$. The halting function $h$ is defined by
$$h(e, x) = \begin{cases} 1 & \text{if } \varphi_e(x) \downarrow \\ 0 & \text{otherwise.} \end{cases}$$
Note that $h(e, x) = 0$ if $\varphi_e(x) \uparrow$, but also when $e$ is not the index of a partial recursive function at all.

Theorem 2.29. The halting function $h$ is not partial recursive.

Proof. If $h$ were partial recursive, we could define
$$d(y) = \begin{cases} 1 & \text{if } h(y, y) = 0 \\ \mu x\, x \neq x & \text{otherwise.} \end{cases}$$
From this definition it follows that

1. $d(y) \downarrow$ iff $\varphi_y(y) \uparrow$ or $y$ is not the index of a partial recursive function.

2. $d(y) \uparrow$ iff $\varphi_y(y) \downarrow$.
If $h$ were partial recursive, then $d$ would be partial recursive as well. Thus, by the Kleene normal form theorem, it has an index $e_d$. Consider the value of $h(e_d, e_d)$. There are two possible cases, 0 and 1.

1. If $h(e_d, e_d) = 1$ then $\varphi_{e_d}(e_d) \downarrow$. But $\varphi_{e_d} \simeq d$, and $d(e_d)$ is defined iff $h(e_d, e_d) = 0$. So $h(e_d, e_d) \neq 1$.

2. If $h(e_d, e_d) = 0$ then either $e_d$ is not the index of a partial recursive function, or it is and $\varphi_{e_d}(e_d) \uparrow$. But again, $\varphi_{e_d} \simeq d$, and $d(e_d)$ is undefined iff $\varphi_{e_d}(e_d) \downarrow$.

The upshot is that $e_d$ cannot, after all, be the index of a partial recursive function. But if $h$ were partial recursive, $d$ would be too, and so our definition of $e_d$ as an index of it would be admissible. We must conclude that $h$ cannot be partial recursive. □

2.18 General Recursive Functions

There is another way to obtain a set of total functions. Say a total function $f(x, \vec{z})$ is regular if for every sequence of natural numbers $\vec{z}$, there is an $x$ such that $f(x, \vec{z}) = 0$. In other words, the regular functions are exactly those functions to which one can apply unbounded search, and end up with a total function. One can, conservatively, restrict unbounded search to regular functions:

Definition 2.30. The set of general recursive functions is the smallest set of functions from the natural numbers to the natural numbers (of various arities) containing zero, successor, and projections, and closed under composition, primitive recursion, and unbounded search applied to regular functions.

Clearly every general recursive function is total. The difference between Definition 2.30 and Definition 2.27 is that in the latter one is allowed to use partial recursive functions along the way; the only requirement is that the function you end up with at the end is total. So the word "general," a historic relic, is a misnomer; on the surface, Definition 2.30 is less general than Definition 2.27.
But, fortunately, the difference is illusory; though the definitions are different, the set of general recursive functions and the set of recursive functions are one and the same.

Summary

In order to show that Q represents all computable functions, we need a precise model of computability that we can take as the basis for a proof. There are, of course, many models of computability, such as Turing machines. One model that plays a significant role historically (it is one of the first models proposed, and is also the one used by Gödel himself) is that of the recursive functions. The recursive functions are a class of arithmetical functions (that is, their domain and range are the natural numbers) that can be defined from a few basic functions using a few operations. The basic functions are zero, succ, and the projection functions. The operations are composition, primitive recursion, and regular minimization. Composition is simply a general version of "chaining together" functions: first apply one, then apply the other to the result. Primitive recursion defines a new function f from two functions g, h already defined, by stipulating that the value of f for 0 is given by g, and the value for any number n + 1 is given by h applied to f(n). Functions that can be defined using just these two principles are called primitive recursive. A relation is primitive recursive iff its characteristic function is. It turns out that a whole list of interesting functions and relations is primitive recursive (such as addition, multiplication, exponentiation, divisibility), and that we can define new primitive recursive functions and relations from old ones using principles such as bounded quantification and bounded minimization. In particular, this allowed us to show that we can deal with sequences of numbers in primitive recursive ways.
That is, there is a way to "code" sequences of numbers as single numbers in such a way that we can compute the i-th element, the length, the concatenation of two sequences, etc., all using primitive recursive functions operating on these codes. To obtain all the computable functions, we finally added definition by regular minimization to composition and primitive recursion. A function g(x, y) is regular iff, for every y, it takes the value 0 for at least one x. If g is regular, the least x such that g(x, y) = 0 always exists, and can be found simply by computing all the values of g(0, y), g(1, y), etc., until one of them is = 0. The resulting function f(y) = μx g(x, y) = 0 is the function defined by regular minimization from g. It is always total and computable. The resulting set of functions are called general recursive. One version of the Church-Turing Thesis says that the computable arithmetical functions are exactly the general recursive ones.

Problems

Problem 2.1. Prove Proposition 2.5 by showing that the primitive recursive definition of mult can be put into the form required by Definition 2.1 and showing that the corresponding functions f and g are primitive recursive.

Problem 2.2. Give the complete primitive recursive notation for mult.

Problem 2.3. Prove Proposition 2.13.

Problem 2.4. Show that

\[ f(x, y) = 2^{(2^{\cdot^{\cdot^{\cdot^{2^{x}}}}})} \quad (y \text{ 2's}) \]

is primitive recursive.

Problem 2.5. Show that integer division d(x, y) = ⌊x/y⌋ (i.e., division, where you disregard everything after the decimal point) is primitive recursive. When y = 0, we stipulate d(x, y) = 0. Give an explicit definition of d using primitive recursion and composition.

Problem 2.6. Suppose R(x⃗, z) is primitive recursive. Define the function m′_R(x⃗, y) which returns the least z less than y such that R(x⃗, z) holds, if there is one, and 0 otherwise, by primitive recursion from χ_R.

Problem 2.7.
Define integer division d(x, y) using bounded minimization.

Problem 2.8. Show that there is a primitive recursive function sconcat(s) with the property that

sconcat(⟨s_0, . . . , s_k⟩) = s_0 ⌒ . . . ⌒ s_k.

Problem 2.9. Show that there is a primitive recursive function tail(s) with the property that

tail(Λ) = 0 and
tail(⟨s_0, . . . , s_k⟩) = ⟨s_1, . . . , s_k⟩.

Problem 2.10. Prove Proposition 2.24.

Problem 2.11. The definition of hSubtreeSeq in the proof of Proposition 2.25 in general includes repetitions. Give an alternative definition which guarantees that the code of a subtree occurs only once in the resulting list.

CHAPTER 3
Arithmetization of Syntax

3.1 Introduction

In order to connect computability and logic, we need a way to talk about the objects of logic (symbols, terms, formulas, derivations), operations on them, and their properties and relations, in a way amenable to computational treatment. We can do this directly, by considering computable functions and relations on symbols, sequences of symbols, and other objects built from them. Since the objects of logical syntax are all finite and built from a countable set of symbols, this is possible for some models of computation. But other models of computation, such as the recursive functions, are restricted to numbers, their relations and functions. Moreover, ultimately we also want to be able to deal with syntax within certain theories, specifically, in theories formulated in the language of arithmetic. In these cases it is necessary to arithmetize syntax, i.e., to represent syntactic objects, operations on them, and their relations, as numbers, arithmetical functions, and arithmetical relations, respectively. The idea, which goes back to Leibniz, is to assign numbers to syntactic objects.

It is relatively straightforward to assign numbers to symbols as their "codes." Some symbols pose a bit of a challenge, since, e.g., there are infinitely many variables, and even infinitely many function symbols of each arity n. But of course it's possible to assign numbers to symbols systematically in such a way that, say, v_2 and v_3 are assigned different codes. Sequences of symbols (such as terms and formulas) are a bigger challenge. But if we can deal with sequences of numbers purely arithmetically (e.g., by the powers-of-primes coding of sequences), we can extend the coding of individual symbols to coding of sequences of symbols, and then further to sequences or other arrangements of formulas, such as derivations. This extended coding is called "Gödel numbering." Every term, formula, and derivation is assigned a Gödel number.

By coding sequences of symbols as sequences of their codes, and by choosing a system of coding sequences that can be dealt with using computable functions, we can then also deal with Gödel numbers using computable functions. In practice, all the relevant functions will be primitive recursive. For instance, computing the length of a sequence and computing the i-th element of a sequence from the code of the sequence are both primitive recursive. If the number coding the sequence is, e.g., the Gödel number of a formula A, we immediately see that the length of a formula and the (code of the) i-th symbol in a formula can also be computed from the Gödel number of A. It is a bit harder to prove that, e.g., the property of being the Gödel number of a correctly formed term, or of being the Gödel number of a correct derivation, is primitive recursive. It is nevertheless possible, because the sequences of interest (terms, formulas, derivations) are inductively defined.

As an example, consider the operation of substitution. If A is a formula, x a variable, and t a term, then A[t/x] is the result of replacing every free occurrence of x in A by t. Now suppose we have assigned Gödel numbers to A, x, t, say, k, l, and m, respectively.
The same scheme assigns a Gödel number to A[t/x], say, n. This mapping (of k, l, and m to n) is the arithmetical analog of the substitution operation. When the substitution operation maps A, x, t to A[t/x], the arithmetized substitution function maps the Gödel numbers k, l, m to the Gödel number n. We will see that this function is primitive recursive.

Arithmetization of syntax is not just of abstract interest, although it was originally a non-trivial insight that languages like the language of arithmetic, which do not come with mechanisms for "talking about" languages can, after all, formalize complex properties of expressions. It is then just a small step to ask what a theory in this language, such as Peano arithmetic, can prove about its own language (including, e.g., whether sentences are provable or true). This leads us to the famous limitative theorems of Gödel (about unprovability) and Tarski (the undefinability of truth). But the trick of arithmetizing syntax is also important in order to prove some important results in computability theory, e.g., about the computational power of theories or the relationship between different models of computability. The arithmetization of syntax serves as a model for arithmetizing other objects and properties. For instance, it is similarly possible to arithmetize configurations and computations (say, of Turing machines). This makes it possible to simulate computations in one model (e.g., Turing machines) in another (e.g., recursive functions).

3.2 Coding Symbols

The basic language L of first order logic makes use of the symbols

⊥ ¬ ∨ ∧ → ∀ ∃ = ( ) ,

together with countable sets of variables and constant symbols, and countable sets of function symbols and predicate symbols of arbitrary arity. We can assign codes to each of these symbols in such a way that every symbol is assigned a unique number as its code, and no two different symbols are assigned the same number.
We know that this is possible since the set of all symbols is countable and so there is a bijection between it and the set of natural numbers. But we want to make sure that we can recover the symbol (as well as some information about it, e.g., the arity of a function symbol) from its code in a computable way. There are many possible ways of doing this, of course. Here is one such way, which uses primitive recursive functions. (Recall that ⟨n_0, . . . , n_k⟩ is the number coding the sequence of numbers n_0, . . . , n_k.)

Definition 3.1. If s is a symbol of L, let the symbol code c_s be defined as follows:

1. If s is among the logical symbols, c_s is given by the following table:

   ⊥       ¬       ∨       ∧       →       ∀
   ⟨0,0⟩   ⟨0,1⟩   ⟨0,2⟩   ⟨0,3⟩   ⟨0,4⟩   ⟨0,5⟩

   ∃       =       (       )       ,
   ⟨0,6⟩   ⟨0,7⟩   ⟨0,8⟩   ⟨0,9⟩   ⟨0,10⟩

2. If s is the i-th variable v_i, then c_s = ⟨1, i⟩.
3. If s is the i-th constant symbol c_i, then c_s = ⟨2, i⟩.
4. If s is the i-th n-ary function symbol f^n_i, then c_s = ⟨3, n, i⟩.
5. If s is the i-th n-ary predicate symbol P^n_i, then c_s = ⟨4, n, i⟩.

Proposition 3.2. The following relations are primitive recursive:

1. Fn(x, n) iff x is the code of f^n_i for some i, i.e., x is the code of an n-ary function symbol.
2. Pred(x, n) iff x is the code of P^n_i for some i or x is the code of = and n = 2, i.e., x is the code of an n-ary predicate symbol.

Definition 3.3. If s_0, . . . , s_{n−1} is a sequence of symbols, its Gödel number is ⟨c_{s_0}, . . . , c_{s_{n−1}}⟩.

Note that codes and Gödel numbers are different things. For instance, the variable v_5 has a code c_{v_5} = ⟨1, 5⟩ = 2^2 · 3^6. But the variable v_5 considered as a term is also a sequence of symbols (of length 1). The Gödel number #v_5# of the term v_5 is ⟨c_{v_5}⟩ = 2^{c_{v_5}+1} = 2^{2^2 · 3^6 + 1}.

Example 3.4. Recall that if k_0, . . . , k_{n−1} is a sequence of numbers, then the code of the sequence ⟨k_0, . . .
, k_{n−1}⟩ in the power-of-primes coding is

\[ 2^{k_0+1} \cdot 3^{k_1+1} \cdots p_{n-1}^{k_{n-1}+1}, \]

where p_i is the i-th prime (starting with p_0 = 2). So for instance, the formula v_0 = 0, or, more explicitly, =(v_0, c_0), has the Gödel number

⟨c_=, c_(, c_{v_0}, c_,, c_{c_0}, c_)⟩.

Here, c_= is ⟨0, 7⟩ = 2^{0+1} · 3^{7+1}, c_{v_0} is ⟨1, 0⟩ = 2^{1+1} · 3^{0+1}, etc. So #=(v_0, c_0)# is

\[
\begin{aligned}
& 2^{c_{=}+1} \cdot 3^{c_{(}+1} \cdot 5^{c_{v_0}+1} \cdot 7^{c_{,}+1} \cdot 11^{c_{c_0}+1} \cdot 13^{c_{)}+1} \\
={}& 2^{2^1 \cdot 3^8 + 1} \cdot 3^{2^1 \cdot 3^9 + 1} \cdot 5^{2^2 \cdot 3^1 + 1} \cdot 7^{2^1 \cdot 3^{11} + 1} \cdot 11^{2^3 \cdot 3^1 + 1} \cdot 13^{2^1 \cdot 3^{10} + 1} \\
={}& 2^{13\,123} \cdot 3^{39\,367} \cdot 5^{13} \cdot 7^{354\,295} \cdot 11^{25} \cdot 13^{118\,099}.
\end{aligned}
\]

3.3 Coding Terms

A term is simply a certain kind of sequence of symbols: it is built up inductively from constants and variables according to the formation rules for terms. Since sequences of symbols can be coded as numbers (using a coding scheme for the symbols plus a way to code sequences of numbers), assigning Gödel numbers to terms is not difficult. The challenge is rather to show that the property a number has if it is the Gödel number of a correctly formed term is computable, or in fact primitive recursive.

Variables and constant symbols are the simplest terms, and testing whether x is the Gödel number of such a term is easy: Var(x) holds if x is #v_i# for some i. In other words, x is a sequence of length 1 and its single element (x)_0 is the code of some variable v_i, i.e., x is ⟨⟨1, i⟩⟩ for some i. Similarly, Const(x) holds if x is #c_i# for some i. Both of these relations are primitive recursive, since if such an i exists, it must be < x:

Var(x) ⇔ (∃i < x) x = ⟨⟨1, i⟩⟩
Const(x) ⇔ (∃i < x) x = ⟨⟨2, i⟩⟩

Proposition 3.5. The relations Term(x) and ClTerm(x) which hold iff x is the Gödel number of a term or a closed term, respectively, are primitive recursive.

Proof. A sequence of symbols s is a term iff there is a sequence s_0, . . . , s_{k−1} = s of terms which records how the term s was formed from constant symbols and variables according to the formation rules for terms.
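As a concrete aside before the proof continues, the symbol codes of Definition 3.1 and the power-of-primes coding recalled in Example 3.4 can be reproduced in a short Python sketch. The helper names (`primes`, `seq_code`, `logical`, `var`, `const`) are assumptions introduced for illustration only. The sketch computes the exponents appearing in Example 3.4 rather than the Gödel number itself, which is far too large to be worth printing.

```python
def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for short sequences)."""
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1


def seq_code(ks):
    """Power-of-primes code of k_0, ..., k_{n-1}: 2^(k_0+1) * 3^(k_1+1) * ..."""
    code = 1
    for p, k in zip(primes(), ks):
        code *= p ** (k + 1)
    return code


# Second components of the codes <0, j> of the logical symbols (Definition 3.1).
LOGICAL = {"⊥": 0, "¬": 1, "∨": 2, "∧": 3, "→": 4, "∀": 5,
           "∃": 6, "=": 7, "(": 8, ")": 9, ",": 10}


def logical(sym):
    return seq_code([0, LOGICAL[sym]])   # code <0, j>


def var(i):
    return seq_code([1, i])              # code <1, i> of the variable v_i


def const(i):
    return seq_code([2, i])              # code <2, i> of the constant c_i


# Symbol codes of =(v_0, c_0), in order, and the exponents c_s + 1 of
# 2, 3, 5, 7, 11, 13 in its Goedel number.
symbols = [logical("="), logical("("), var(0), logical(","), const(0), logical(")")]
exponents = [c + 1 for c in symbols]
print(var(5))      # prints: 2916
print(exponents)   # prints: [13123, 39367, 13, 354295, 25, 118099]
```

The value 2916 is 2^2 · 3^6, the code of v_5 computed in the text, and the printed exponents agree with the last line of Example 3.4. We now return to the proof of Proposition 3.5.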
To express that such a putative formation sequence follows the formation rules it has to be the case that, for each i < k, either

1. s_i is a variable v_j, or
2. s_i is a constant symbol c_j, or
3. s_i is built from n terms t_1, . . . , t_n occurring prior to place i using an n-place function symbol f^n_j.

To show that the corresponding relation on Gödel numbers is primitive recursive, we have to express this condition primitive recursively, i.e., using primitive recursive functions, relations, and bounded quantification.

Suppose y is the number that codes the sequence s_0, . . . , s_{k−1}, i.e., y = ⟨#s_0#, . . . , #s_{k−1}#⟩. It codes a formation sequence for the term with Gödel number x iff for all i < k:

1. Var((y)_i), or
2. Const((y)_i), or
3. there is an n and a number z = ⟨z_1, . . . , z_n⟩ such that each z_l is equal to some (y)_{i′} for i′ < i and

   (y)_i = #f^n_j(# ⌒ flatten(z) ⌒ #)#,

and moreover (y)_{k−1} = x. (The function flatten(z) turns the sequence ⟨#t_1#, . . . , #t_n#⟩ into #t_1, . . . , t_n# and is primitive recursive.)

The indices j, n, the Gödel numbers z_l of the terms t_l, and the code z of the sequence ⟨z_1, . . . , z_n⟩, in (3) are all less than y. We can replace k above with len(y). Hence we can express "y is the code of a formation sequence of the term with Gödel number x" in a way that shows that this relation is primitive recursive.

We now just have to convince ourselves that there is a primitive recursive bound on y. But if x is the Gödel number of a term, it must have a formation sequence with at most len(x) terms (since every term in the formation sequence of s must start at some place in s, and no two subterms can start at the same place). The Gödel number of each subterm of s is of course ≤ x. Hence, there always is a formation sequence with code ≤ x^{len(x)}.

For ClTerm, simply leave out the clause for variables. □

Proposition 3.6. The function num(n) = #n̄# is primitive recursive.

Proof.
We define num(n) by primitive recursion:

num(0) = #0#
num(n + 1) = #′(# ⌒ num(n) ⌒ #)#. □

3.4 Coding Formulas

Proposition 3.7. The relation Atom(x) which holds iff x is the Gödel number of an atomic formula, is primitive recursive.

Proof. The number x is the Gödel number of an atomic formula iff one of the following holds:

1. There are n, j < x, and z < x such that for each i < n, Term((z)_i) and

   x = #P^n_j(# ⌒ flatten(z) ⌒ #)#.

2. There are z_1, z_2 < x such that Term(z_1), Term(z_2), and

   x = #=(# ⌒ z_1 ⌒ #,# ⌒ z_2 ⌒ #)#.

3. x = #⊥#. □

Proposition 3.8. The relation Frm(x) which holds iff x is the Gödel number of a formula is primitive recursive.

Proof. A sequence of symbols s is a formula iff there is a formation sequence s_0, . . . , s_{k−1} = s of formulas which records how s was formed from atomic formulas according to the formation rules. The code for each s_i (and indeed the code of the sequence ⟨s_0, . . . , s_{k−1}⟩) is less than the code x of s. □

Proposition 3.9. The relation FreeOcc(x, z, i), which holds iff the i-th symbol of the formula with Gödel number x is a free occurrence of the variable with Gödel number z, is primitive recursive.

Proof. Exercise. □

Proposition 3.10. The property Sent(x) which holds iff x is the Gödel number of a sentence is primitive recursive.

Proof. A sentence is a formula without free occurrences of variables. So Sent(x) holds iff

(∀i < len(x)) (∀z < x) ((∃j < z) z = #v_j# → ¬FreeOcc(x, z, i)). □

3.5 Substitution

Recall that substitution is the operation of replacing all free occurrences of a variable u in a formula A by a term t, written A[t/u]. This operation, when carried out on Gödel numbers of variables, formulas, and terms, is primitive recursive.

Proposition 3.11. There is a primitive recursive function Subst(x, y, z) with the property that

Subst(#A#, #t#, #u#) = #A[t/u]#

Proof.
We can then define a function hSubst by primitive recursion as follows:

\[
\begin{aligned}
\mathrm{hSubst}(x, y, z, 0) &= \Lambda \\
\mathrm{hSubst}(x, y, z, i + 1) &=
\begin{cases}
\mathrm{hSubst}(x, y, z, i) \frown y & \text{if } \mathrm{FreeOcc}(x, z, i) \\
\mathrm{append}(\mathrm{hSubst}(x, y, z, i), (x)_i) & \text{otherwise.}
\end{cases}
\end{aligned}
\]

Subst(x, y, z) can now be defined as hSubst(x, y, z, len(x)). □

Proposition 3.12. The relation FreeFor(x, y, z), which holds iff the term with Gödel number y is free for the variable with Gödel number z in the formula with Gödel number x, is primitive recursive.

Proof. Exercise. □

3.6 Derivations in Natural Deduction

In order to arithmetize derivations, we must represent derivations as numbers. Since derivations are trees of formulas where each inference carries one or two labels, a recursive representation is the most obvious approach: we represent a derivation as a tuple, the components of which are the number of immediate sub-derivations leading to the premises of the last inference, the representations of these sub-derivations, and the end-formula, the discharge label of the last inference, and a number indicating the type of the last inference.

Definition 3.13. If δ is a derivation in natural deduction, then #δ# is defined inductively as follows:

1. If δ consists only of the assumption A, then #δ# is ⟨0, #A#, n⟩. The number n is 0 if it is an undischarged assumption, and the numerical label otherwise.

2. If δ ends in an inference with one, two, or three premises, then #δ# is

   ⟨1, #δ_1#, #A#, n, k⟩,
   ⟨2, #δ_1#, #δ_2#, #A#, n, k⟩, or
   ⟨3, #δ_1#, #δ_2#, #δ_3#, #A#, n, k⟩,

   respectively. Here δ_1, δ_2, δ_3 are the sub-derivations ending in the premise(s) of the last inference in δ, A is the conclusion of the last inference in δ, n is the discharge label of the last inference (0 if the inference does not discharge any assumptions), and k is given by the following table according to which rule was used in the last inference.
   Rule:  ∧Intro   ∧Elim   ∨Intro   ∨Elim
   k:     1        2       3        4

   Rule:  →Intro   →Elim   ¬Intro   ¬Elim
   k:     5        6       7        8

   Rule:  ⊥I       ⊥C      ∀Intro   ∀Elim
   k:     9        10      11       12

   Rule:  ∃Intro   ∃Elim   =Intro   =Elim
   k:     13       14      15       16

Example 3.14. Consider the very simple derivation

     [A ∧ B]^1
     ----------- ∧Elim
          A
   --------------- 1, →Intro
     (A ∧ B) → A

The Gödel number of the assumption would be d_0 = ⟨0, #(A ∧ B)#, 1⟩. The Gödel number of the derivation ending in the conclusion of ∧Elim would be d_1 = ⟨1, d_0, #A#, 0, 2⟩ (1 since ∧Elim has one premise, the Gödel number of conclusion A, 0 because no assumption is discharged, and 2 is the number coding ∧Elim). The Gödel number of the entire derivation then is ⟨1, d_1, #((A ∧ B) → A)#, 1, 5⟩, i.e.,

⟨1, ⟨1, ⟨0, #(A ∧ B)#, 1⟩, #A#, 0, 2⟩, #((A ∧ B) → A)#, 1, 5⟩.

Having settled on a representation of derivations, we must also show that we can manipulate Gödel numbers of such derivations primitive recursively, and express their essential properties and relations. Some operations are simple: e.g., given a Gödel number d of a derivation, EndFmla(d) = (d)_{(d)_0+1} gives us the Gödel number of its end-formula, DischargeLabel(d) = (d)_{(d)_0+2} gives us the discharge label and LastRule(d) = (d)_{(d)_0+3} the number indicating the type of the last inference. Some are much harder. We'll at least sketch how to do this. The goal is to show that the relation "δ is a derivation of A from Γ" is a primitive recursive relation of the Gödel numbers of δ and A.

Proposition 3.15. The following relations are primitive recursive:

1. A occurs as an assumption in δ with label n.
2. All assumptions in δ with label n are of the form A (i.e., we can discharge the assumption A using label n in δ).

Proof. We have to show that the corresponding relations between Gödel numbers of formulas and Gödel numbers of derivations are primitive recursive.

1. We want to show that Assum(x, d, n), which holds if x is the Gödel number of an assumption of the derivation with Gödel number d labelled n, is primitive recursive.
This is the case if the derivation with Gödel number ⟨0, x, n⟩ is a sub-derivation of d. Note that the way we code derivations is a special case of the coding of trees introduced in section 2.12, so the primitive recursive function SubtreeSeq(d) gives a sequence of Gödel numbers of all sub-derivations of d (of length at most d). So we can define

Assum(x, d, n) ⇔ (∃i < d) (SubtreeSeq(d))_i = ⟨0, x, n⟩.

2. We want to show that Discharge(x, d, n), which holds if all assumptions with label n in the derivation with Gödel number d are the formula with Gödel number x, is primitive recursive. But this relation holds iff

(∀y < d) (Assum(y, d, n) → y = x). □

Proposition 3.16. The property Correct(d) which holds iff the last inference in the derivation δ with Gödel number d is correct, is primitive recursive.

Proof. Here we have to show that for each rule of inference R the relation FollowsBy_R(d) is primitive recursive, where FollowsBy_R(d) holds iff d is the Gödel number of a derivation δ, and the end-formula of δ follows by a correct application of R from the immediate sub-derivations of δ.

A simple case is that of the ∧Intro rule. If δ ends in a correct ∧Intro inference, it looks like this:

    δ_1     δ_2
     A       B
   ------------- ∧Intro
       A ∧ B

Then the Gödel number d of δ is ⟨2, d_1, d_2, #(A ∧ B)#, 0, k⟩ where EndFmla(d_1) = #A#, EndFmla(d_2) = #B#, n = 0, and k = 1. So we can define FollowsBy_∧Intro(d) as

(d)_0 = 2 ∧ DischargeLabel(d) = 0 ∧ LastRule(d) = 1 ∧
EndFmla(d) = #(# ⌒ EndFmla((d)_1) ⌒ #∧# ⌒ EndFmla((d)_2) ⌒ #)#.

Another simple example is the =Intro rule. Here the premise is an empty derivation, i.e., (d)_1 = 0, and there is no discharge label, i.e., n = 0. However, A must be of the form t = t, for a closed term t.
Here, a primitive recursive definition is

(d)_0 = 1 ∧ (d)_1 = 0 ∧ DischargeLabel(d) = 0 ∧
(∃t < d) (ClTerm(t) ∧ EndFmla(d) = #=(# ⌒ t ⌒ #,# ⌒ t ⌒ #)#)

For a more complicated example, FollowsBy_→Intro(d) holds iff the end-formula of δ is of the form (A → B), where the end-formula of δ_1 is B, and any assumption in δ labelled n is of the form A. We can express this primitive recursively by

(d)_0 = 1 ∧
(∃a < d) (Discharge(a, (d)_1, DischargeLabel(d)) ∧
EndFmla(d) = (#(# ⌒ a ⌒ #→# ⌒ EndFmla((d)_1) ⌒ #)#))

(Think of a as the Gödel number of A.)

For another example, consider ∃Intro. Here, the last inference in δ is correct iff there is a formula A, a closed term t and a variable x such that A[t/x] is the end-formula of the derivation δ_1 and ∃x A is the conclusion of the last inference. So, FollowsBy_∃Intro(d) holds iff

(d)_0 = 1 ∧ DischargeLabel(d) = 0 ∧
(∃a < d) (∃x < d) (∃t < d) (ClTerm(t) ∧ Var(x) ∧
Subst(a, t, x) = EndFmla((d)_1) ∧ EndFmla(d) = (#∃# ⌒ x ⌒ a)).

We then define Correct(d) as

Sent(EndFmla(d)) ∧
(LastRule(d) = 1 ∧ FollowsBy_∧Intro(d)) ∨ · · · ∨
(LastRule(d) = 16 ∧ FollowsBy_=Elim(d)) ∨
(∃n < d) (∃x < d) (d = ⟨0, x, n⟩).

The first line ensures that the end-formula of d is a sentence. The last line covers the case where d is just an assumption. □

Proposition 3.17. The relation Deriv(d) which holds if d is the Gödel number of a correct derivation δ, is primitive recursive.

Proof. A derivation δ is correct if every one of its inferences is a correct application of a rule, i.e., if every one of its sub-derivations ends in a correct inference. So, Deriv(d) iff

(∀i < len(SubtreeSeq(d))) Correct((SubtreeSeq(d))_i) □

Proposition 3.18. The relation OpenAssum(z, d) that holds if z is the Gödel number of an undischarged assumption A of the derivation δ with Gödel number d, is primitive recursive.

Proof.
An occurrence of an assumption is discharged if it occurs with label n in a sub-derivation of δ that ends in a rule with discharge label n. So A is an undischarged assumption of δ if at least one of its occurrences is not discharged in δ. We must be careful: δ may contain both discharged and undischarged occurrences of A.

Consider a sequence δ_0, . . . , δ_k where δ_0 = d, δ_k is the assumption [A]^n (for some n), and δ_i is an immediate sub-derivation of δ_{i+1}. If such a sequence exists in which no δ_i ends in an inference with discharge label n, then A is an undischarged assumption of δ.

The primitive recursive function SubtreeSeq(d) provides us with a sequence of Gödel numbers of all sub-derivations of δ. Any sequence of Gödel numbers of sub-derivations of δ is a subsequence of it. Being a subsequence of is a primitive recursive relation: Subseq(s, s′) holds iff (∀i < len(s)) (∃j < len(s′)) (s)_i = (s′)_j. Being an immediate sub-derivation is as well: Subderiv(d, d′) iff (∃j < (d′)_0) d = (d′)_j. So we can define OpenAssum(z, d) by

(∃s < SubtreeSeq(d)) (Subseq(s, SubtreeSeq(d)) ∧ (s)_0 = d ∧
(∃n < d) ((s)_{len(s)−1} = ⟨0, z, n⟩ ∧
(∀i < (len(s) − 1)) (Subderiv((s)_i, (s)_{i+1}) ∧
DischargeLabel((s)_{i+1}) ≠ n))). □

Proposition 3.19. Suppose Γ is a primitive recursive set of sentences. Then the relation Prf_Γ(x, y) expressing "x is the code of a derivation δ of A from undischarged assumptions in Γ and y is the Gödel number of A" is primitive recursive.

Proof. Suppose "y ∈ Γ" is given by the primitive recursive predicate R_Γ(y). We have to show that Prf_Γ(x, y), which holds iff y is the Gödel number of a sentence A and x is the code of a natural deduction derivation with end formula A and all undischarged assumptions in Γ, is primitive recursive.

By Proposition 3.17, the property Deriv(x) which holds iff x is the Gödel number of a correct derivation δ in natural deduction is primitive recursive.
Thus we can define Prf_Γ(x, y) by

Prf_Γ(x, y) ⇔ Deriv(x) ∧ EndFmla(x) = y ∧
(∀z < x) (OpenAssum(z, x) → R_Γ(z)). □

Summary

The proof of the incompleteness theorems requires that we have a way to talk about provability in a theory (such as PA) in the language of the theory itself, i.e., in the language of arithmetic. But the language of arithmetic only deals with numbers, not with formulas or derivations. The solution to this problem is to define a systematic mapping from formulas and derivations to numbers. The number associated with a formula or a derivation is called its Gödel number. If A is a formula, #A# is its Gödel number. We showed that important operations on formulas turn into primitive recursive functions on the respective Gödel numbers. For instance, A[t/x], the operation of substituting a term t for every free occurrence of x in A, corresponds to an arithmetical function subst(n, m, k) which, if applied to the Gödel numbers of A, t, and x, yields the Gödel number of A[t/x]. In other words, subst(#A#, #t#, #x#) = #A[t/x]#. Likewise, properties of derivations turn into primitive recursive relations on the respective Gödel numbers. In particular, the property Deriv(n) that holds of n if it is the Gödel number of a correct derivation in natural deduction, is primitive recursive. Showing that these are primitive recursive required a fair amount of work, and at times some ingenuity, and depended essentially on the fact that operating with sequences is primitive recursive. If a theory T is decidable, then we can use Deriv to define a decidable relation Prf_T(n, m) which holds if n is the Gödel number of a derivation of the sentence with Gödel number m from T. This relation is primitive recursive if the set of axioms of T is, and merely general recursive if the axioms of T are decidable but not primitive recursive.

Problems

Problem 3.1. Show that the function flatten(z), which turns the sequence ⟨#t_1#, . . .
, #t_n#⟩ into #t_1, . . . , t_n#, is primitive recursive.

Problem 3.2. Give a detailed proof of Proposition 3.8 along the lines of the first proof of Proposition 3.5.

Problem 3.3. Give a detailed proof of Proposition 3.8 along the lines of the alternate proof of Proposition 3.5.

Problem 3.4. Prove Proposition 3.9. You may make use of the fact that any substring of a formula which is a formula is a subformula of it.

Problem 3.5. Prove Proposition 3.12.

Problem 3.6. Define the following properties as in Proposition 3.16:

1. FollowsBy_→Elim(d),
2. FollowsBy_=Elim(d),
3. FollowsBy_∨Elim(d),
4. FollowsBy_∀Intro(d).

For the last one, you will have to also show that you can test primitive recursively if the last inference of the derivation with Gödel number d satisfies the eigenvariable condition, i.e., the eigenvariable a of the ∀Intro inference occurs neither in the end-formula of d nor in an open assumption of d. You may use the primitive recursive predicate OpenAssum from Proposition 3.18 for this.

CHAPTER 4
Representability in Q

4.1 Introduction

The incompleteness theorems apply to theories in which basic facts about computable functions can be expressed and proved. We will describe a very minimal such theory called "Q" (or, sometimes, "Robinson's Q," after Raphael Robinson). We will say what it means for a function to be representable in Q, and then we will prove the following:

A function is representable in Q if and only if it is computable.

For one thing, this provides us with another model of computability. But we will also use it to show that the set {A : Q ⊢ A} is not decidable, by reducing the halting problem to it. By the time we are done, we will have proved much stronger things than this.
The language of Q is the language of arithmetic; Q consists of the following axioms (to be used in conjunction with the other axioms and rules of first-order logic with identity predicate):

∀x ∀y (x′ = y′ → x = y)   (Q1)
∀x 0 ≠ x′   (Q2)
∀x (x = 0 ∨ ∃y x = y′)   (Q3)
∀x (x + 0) = x   (Q4)
∀x ∀y (x + y′) = (x + y)′   (Q5)
∀x (x × 0) = 0   (Q6)
∀x ∀y (x × y′) = ((x × y) + x)   (Q7)
∀x ∀y (x < y ↔ ∃z (z′ + x) = y)   (Q8)

For each natural number n, define the numeral n̄ to be the term 0′′...′ where there are n tick marks in all. So, 0̄ is the constant symbol 0 by itself, 1̄ is 0′, 2̄ is 0′′, etc.

As a theory of arithmetic, Q is extremely weak; for example, you can't even prove very simple facts like ∀x x ≠ x′ or ∀x ∀y (x + y) = (y + x). But we will see that much of the reason that Q is so interesting is that it is so weak. In fact, it is just barely strong enough for the incompleteness theorem to hold. Another reason Q is interesting is that it has a finite set of axioms.

A stronger theory than Q (called Peano arithmetic PA) is obtained by adding a schema of induction to Q:

(A(0) ∧ ∀x (A(x) → A(x′))) → ∀x A(x)

where A(x) is any formula. If A(x) contains free variables other than x, we add universal quantifiers to the front to bind all of them (so that the corresponding instance of the induction schema is a sentence). For instance, if A(x, y) also contains the variable y free, the corresponding instance is

∀y ((A(0) ∧ ∀x (A(x) → A(x′))) → ∀x A(x))

Using instances of the induction schema, one can prove much more from the axioms of PA than from those of Q. In fact, it takes a good deal of work to find "natural" statements about the natural numbers that can't be proved in Peano arithmetic!

Definition 4.1. A function f(x_0, . . . , x_k) from the natural numbers to the natural numbers is said to be representable in Q if there is a formula A_f(x_0, . . . , x_k, y) such that whenever f(n_0, . . . , n_k) = m, Q proves

1.
A f (n0, . . . ,nk ,m) 2. ∀y (A f (n0, . . . ,nk , y) →m = y). There are other ways of stating the definition; for example, we could equivalently require that Q proves ∀y (A f (n0, . . . ,nk , y) ↔ y = m). Theorem 4.2. A function is representable in Q if and only if it is computable. There are two directions to proving the theorem. The leftto-right direction is fairly straightforward once arithmetization of syntax is in place. The other direction requires more work. Here is the basic idea: we pick "general recursive" as a way of making "computable" precise, and show that every general recursive function is representable in Q . Recall that a function is general recursive if it can be defined from zero, the successor function succ, and the projection functions P ni , using composition, primitive recursion, and regular minimization. So one way of showing that every general recursive function is representable in Q is to show that the basic functions are representable, and whenever some functions are representable, then so are the functions defined from them using composition, primitive recursion, and regular minimization. In other words, we might show that the basic functions are representable, and that the representable functions are "closed under" composition, primitive recursion, and regular minimization. This guarantees that every general recursive function is representable. It turns out that the step where we would show that representable functions are closed under primitive recursion is hard. 79 4.2. FUNCTIONS REPRESENTABLE IN Q ARE COMPUTABLE In order to avoid this step, we show first that in fact we can do without primitive recursion. That is, we show that every general recursive function can be defined from basic functions using composition and regular minimization alone. To do this, we show that primitive recursion can actually be done by a specific regular minimization. 
However, for this to work, we have to add some additional basic functions: addition, multiplication, and the characteristic function of the identity relation $\chi_=$. Then, we can prove the theorem by showing that all of these basic functions are representable in Q, and the representable functions are closed under composition and regular minimization.

4.2 Functions Representable in Q are Computable

Lemma 4.3. Every function that is representable in Q is computable.

Proof. Let's first give the intuitive idea for why this is true. If $f(x_0, \dots, x_k)$ is representable in Q, there is a formula $A_f(x_0, \dots, x_k, y)$ such that

$$Q \vdash A_f(\overline{n_0}, \dots, \overline{n_k}, \overline{m}) \quad\text{iff}\quad m = f(n_0, \dots, n_k).$$

To compute $f$, we do the following. List all the possible derivations $\delta$ in the language of arithmetic. This is possible to do mechanically. For each one, check if it is a derivation of a formula of the form $A_f(\overline{n_0}, \dots, \overline{n_k}, \overline{m})$. If it is, $m$ must equal $f(n_0, \dots, n_k)$ and we've found the value of $f$. The search terminates because $Q \vdash A_f(\overline{n_0}, \dots, \overline{n_k}, \overline{f(n_0, \dots, n_k)})$, so eventually we find a $\delta$ of the right sort.

This is not quite precise because our procedure operates on derivations and formulas instead of just on numbers, and we haven't explained exactly why "listing all possible derivations" is mechanically possible. But as we've seen, it is possible to code terms, formulas, and derivations by Gödel numbers. We've also introduced a precise model of computation, the general recursive functions. And we've seen that the relation $\mathrm{Prf}_Q(d, y)$, which holds iff $d$ is the Gödel number of a derivation of the formula with Gödel number $y$ from the axioms of Q, is (primitive) recursive. Other primitive recursive functions we'll need are num (Proposition 3.6) and Subst (Proposition 3.11). From these, it is possible to define $f$ by minimization; thus, $f$ is recursive.

First, define

$$A(n_0, \dots, n_k, m) = \mathrm{Subst}(\mathrm{Subst}(\dots \mathrm{Subst}(\#A_f\#, \mathrm{num}(n_0), \#x_0\#), \dots), \mathrm{num}(n_k), \#x_k\#), \mathrm{num}(m), \#y\#).$$

This looks complicated, but it's just the function $A(n_0, \dots, n_k, m) = \#A_f(\overline{n_0}, \dots, \overline{n_k}, \overline{m})\#$.

Now, consider the relation $R(n_0, \dots, n_k, s)$ which holds if $(s)_0$ is the Gödel number of a derivation from Q of $A_f(\overline{n_0}, \dots, \overline{n_k}, \overline{(s)_1})$:

$$R(n_0, \dots, n_k, s) \quad\text{iff}\quad \mathrm{Prf}_Q((s)_0, A(n_0, \dots, n_k, (s)_1)).$$

If we can find an $s$ such that $R(n_0, \dots, n_k, s)$ holds, we have found a pair of numbers, $(s)_0$ and $(s)_1$, such that $(s)_0$ is the Gödel number of a derivation of $A_f(\overline{n_0}, \dots, \overline{n_k}, \overline{(s)_1})$. So looking for $s$ is like looking for the pair $d$ and $m$ in the informal proof. And a computable function that "looks for" such an $s$ can be defined by regular minimization. Note that $R$ is regular: for every $n_0$, ..., $n_k$, there is a derivation $\delta$ of $Q \vdash A_f(\overline{n_0}, \dots, \overline{n_k}, \overline{f(n_0, \dots, n_k)})$, so $R(n_0, \dots, n_k, s)$ holds for $s = \langle \#\delta\#, f(n_0, \dots, n_k) \rangle$. So, we can write $f$ as

$$f(n_0, \dots, n_k) = (\mu s\; R(n_0, \dots, n_k, s))_1.$$ □

4.3 The Beta Function Lemma

In order to show that we can carry out primitive recursion if addition, multiplication, and $\chi_=$ are available, we need to develop functions that handle sequences. (If we had exponentiation as well, our task would be easier.) When we had primitive recursion, we could define things like the "n-th prime," and pick a fairly straightforward coding. But here we do not have primitive recursion (in fact, we want to show that we can do primitive recursion using minimization), so we need to be more clever.

Lemma 4.4. There is a function $\beta(d, i)$ such that for every sequence $a_0$, ..., $a_n$ there is a number $d$, such that for every $i \le n$, $\beta(d, i) = a_i$. Moreover, $\beta$ can be defined from the basic functions using just composition and regular minimization.

Think of $d$ as coding the sequence $\langle a_0, \dots, a_n \rangle$, and $\beta(d, i)$ returning the $i$-th element. (Note that this "coding" does not use the power-of-primes coding we're already familiar with!)
The lemma is fairly minimal; it doesn't say we can concatenate sequences or append elements, or even that we can compute $d$ from $a_0$, ..., $a_n$ using functions definable by composition and regular minimization. All it says is that there is a "decoding" function such that every sequence is "coded."

The use of the notation $\beta$ is Gödel's. To repeat, the hard part of proving the lemma is defining a suitable $\beta$ using the seemingly restricted resources, i.e., using just composition and minimization; however, we're allowed to use addition, multiplication, and $\chi_=$. There are various ways to prove this lemma, but one of the cleanest is still Gödel's original method, which used a number-theoretic fact called the Chinese Remainder theorem.

Definition 4.5. Two natural numbers $a$ and $b$ are relatively prime if their greatest common divisor is 1; in other words, they have no other divisors in common.

Definition 4.6. $a \equiv b \bmod c$ means $c \mid (a - b)$, i.e., $a$ and $b$ have the same remainder when divided by $c$.

Here is the Chinese Remainder theorem:

Theorem 4.7. Suppose $x_0$, ..., $x_n$ are (pairwise) relatively prime. Let $y_0$, ..., $y_n$ be any numbers. Then there is a number $z$ such that

$$\begin{aligned}
z &\equiv y_0 \bmod x_0 \\
z &\equiv y_1 \bmod x_1 \\
&\;\;\vdots \\
z &\equiv y_n \bmod x_n.
\end{aligned}$$

Here is how we will use the Chinese Remainder theorem: if $x_0$, ..., $x_n$ are bigger than $y_0$, ..., $y_n$ respectively, then we can take $z$ to code the sequence $\langle y_0, \dots, y_n \rangle$. To recover $y_i$, we need only divide $z$ by $x_i$ and take the remainder. To use this coding, we will need to find suitable values for $x_0$, ..., $x_n$.

A couple of observations will help us in this regard. Given $y_0$, ..., $y_n$, let

$$j = \max(n, y_0, \dots, y_n) + 1,$$

and let

$$\begin{aligned}
x_0 &= 1 + j! \\
x_1 &= 1 + 2 \cdot j! \\
x_2 &= 1 + 3 \cdot j! \\
&\;\;\vdots \\
x_n &= 1 + (n + 1) \cdot j!
\end{aligned}$$

Then two things are true:

1. $x_0$, ..., $x_n$ are relatively prime.
2. For each $i$, $y_i < x_i$.
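The two observations above can be checked numerically on small sequences. The following Python sketch is only an illustration under stated assumptions: the helper names `moduli` and `check_observations` are invented for this example and do not appear in the text, and the check uses Python's built-in arithmetic rather than anything definable in Q.

```python
from itertools import combinations
from math import factorial, gcd


def moduli(ys):
    """Return x_0, ..., x_n with x_i = 1 + (i + 1) * j!,
    where j = max(n, y_0, ..., y_n) + 1 (hypothetical helper)."""
    n = len(ys) - 1
    j = max([n] + list(ys)) + 1
    return [1 + (i + 1) * factorial(j) for i in range(n + 1)]


def check_observations(ys):
    """Check observation (1), pairwise relative primality of the x_i,
    and observation (2), y_i < x_i for each i."""
    xs = moduli(ys)
    pairwise_coprime = all(gcd(a, b) == 1 for a, b in combinations(xs, 2))
    each_bigger = all(y < x for y, x in zip(ys, xs))
    return pairwise_coprime, each_bigger
```

For the sequence 2, 0, 1 we have n = 2 and j = 3, so the moduli are 7, 13, and 19. A finite check like this is of course no substitute for the general argument.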
To see that (1) is true, note that if $p$ is a prime number and $p \mid x_i$ and $p \mid x_k$ for some $i \neq k$, then $p \mid 1 + (i + 1) j!$ and $p \mid 1 + (k + 1) j!$. But then $p$ divides their difference,

$$(1 + (i + 1) j!) - (1 + (k + 1) j!) = (i - k) j!.$$

Since $p$ divides $1 + (i + 1) j!$, it can't divide $j!$ as well (otherwise, the first division would leave a remainder of 1). So $p$ divides $i - k$, since $p$ divides $(i - k) j!$. But $|i - k|$ is at most $n$, and we have chosen $j > n$, so this implies that $p \mid j!$, again a contradiction. So there is no prime number dividing both $x_i$ and $x_k$. Clause (2) is easy: we have $y_i < j \le j! < x_i$.

Now let us prove the $\beta$ function lemma. Remember that we can use 0, successor, plus, times, $\chi_=$, projections, and any function defined from them using composition and minimization applied to regular functions. We can also use a relation if its characteristic function is so definable. As before we can show that these relations are closed under boolean combinations and bounded quantification; for example:

1. $\mathrm{not}(x) = \chi_=(x, 0)$
2. $(\min x \le z)\, R(x, y) = \mu x\, (R(x, y) \lor x = z)$
3. $(\exists x \le z)\, R(x, y) \Leftrightarrow R((\min x \le z)\, R(x, y), y)$

We can then show that all of the following are also definable without primitive recursion:

1. The pairing function, $J(x, y) = \frac{1}{2}[(x + y)(x + y + 1)] + x$
2. Projections $K(z) = (\min x \le z)\, (\exists y \le z\, [z = J(x, y)])$ and $L(z) = (\min y \le z)\, (\exists x \le z\, [z = J(x, y)])$.
3. $x < y$
4. $x \mid y$
5. The function $\mathrm{rem}(x, y)$ which returns the remainder when $y$ is divided by $x$

Now define

$$\beta^*(d_0, d_1, i) = \mathrm{rem}(1 + (i + 1) d_1, d_0)$$

and

$$\beta(d, i) = \beta^*(K(d), L(d), i).$$

This is the function we need. Given $a_0$, ..., $a_n$, as above, let

$$j = \max(n, a_0, \dots, a_n) + 1,$$

and let $d_1 = j!$. By the observations above, we know that $1 + d_1$, $1 + 2 d_1$, ..., $1 + (n + 1) d_1$ are relatively prime and all are bigger than $a_0$, ..., $a_n$.
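The definitions of the pairing function, rem, β∗, and β can be tried out directly. The sketch below is a hedged illustration, not the construction used in the proof: `K` and `L` invert the pairing function with an integer square root instead of the bounded searches in the text, and `encode` (a name invented here) finds a suitable d0 by a naive search that relies on the moduli being pairwise relatively prime.

```python
from math import factorial, isqrt


def J(x, y):
    """Pairing function J(x, y) = ((x + y)(x + y + 1)) / 2 + x."""
    return (x + y) * (x + y + 1) // 2 + x


def _unpair(z):
    # Shortcut (an assumption of this sketch): invert J arithmetically
    # rather than by the bounded minimization used in the text.
    w = (isqrt(8 * z + 1) - 1) // 2
    x = z - w * (w + 1) // 2
    return x, w - x


def K(z):
    return _unpair(z)[0]


def L(z):
    return _unpair(z)[1]


def rem(x, y):
    """Remainder when y is divided by x, as in the text."""
    return y % x


def beta_star(d0, d1, i):
    return rem(1 + (i + 1) * d1, d0)


def beta(d, i):
    return beta_star(K(d), L(d), i)


def encode(seq):
    """Hypothetical helper: return some d with beta(d, i) == seq[i].
    Takes d1 = j! and searches for d0 one congruence at a time."""
    n = len(seq) - 1
    d1 = factorial(max([n] + list(seq)) + 1)
    d0, step = 0, 1
    for i, a in enumerate(seq):
        m = 1 + (i + 1) * d1
        while d0 % m != a:   # terminates because step and m are coprime
            d0 += step
        step *= m
    return J(d0, d1)
```

For example, `encode([2, 0, 1])` uses d1 = 6 and moduli 7, 13, 19, and decoding with `beta` recovers 2, 0, 1. The code d is not claimed to be the least one.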
By the Chinese Remainder theorem there is a value $d_0$ such that for each $i$,

$$d_0 \equiv a_i \bmod (1 + (i + 1) d_1)$$

and so (because $d_1$ is greater than $a_i$),

$$a_i = \mathrm{rem}(1 + (i + 1) d_1, d_0).$$

Let $d = J(d_0, d_1)$. Then for each $i \le n$, we have

$$\beta(d, i) = \beta^*(d_0, d_1, i) = \mathrm{rem}(1 + (i + 1) d_1, d_0) = a_i$$

which is what we need. This completes the proof of the $\beta$-function lemma.

4.4 Simulating Primitive Recursion

Now we can show that definition by primitive recursion can be "simulated" by regular minimization using the beta function. Suppose we have $f(\vec{x})$ and $g(\vec{x}, y, z)$. Then the function $h(\vec{x}, y)$ defined from $f$ and $g$ by primitive recursion is

$$\begin{aligned}
h(\vec{x}, 0) &= f(\vec{x}) \\
h(\vec{x}, y + 1) &= g(\vec{x}, y, h(\vec{x}, y)).
\end{aligned}$$

We need to show that $h$ can be defined from $f$ and $g$ using just composition and regular minimization, using the basic functions and functions defined from them using composition and regular minimization (such as $\beta$).

Lemma 4.8. If $h$ can be defined from $f$ and $g$ using primitive recursion, it can be defined from $f$, $g$, the functions zero, succ, $P^n_i$, add, mult, $\chi_=$, using composition and regular minimization.

Proof. First, define an auxiliary function $\hat{h}(\vec{x}, y)$ which returns the least number $d$ such that $d$ codes a sequence which satisfies

1. $(d)_0 = f(\vec{x})$, and
2. for each $i < y$, $(d)_{i+1} = g(\vec{x}, i, (d)_i)$,

where now $(d)_i$ is short for $\beta(d, i)$. In other words, $\hat{h}$ returns a code of the sequence $\langle h(\vec{x}, 0), h(\vec{x}, 1), \dots, h(\vec{x}, y) \rangle$. We can write $\hat{h}$ as

$$\hat{h}(\vec{x}, y) = \mu d\; (\beta(d, 0) = f(\vec{x}) \land (\forall i < y)\; \beta(d, i + 1) = g(\vec{x}, i, \beta(d, i))).$$

Note: no primitive recursion is needed here, just minimization. The function we minimize is regular because of the beta function lemma (Lemma 4.4).

But now we have

$$h(\vec{x}, y) = \beta(\hat{h}(\vec{x}, y), y),$$

so $h$ can be defined from the basic functions using just composition and regular minimization. □

4.5 Basic Functions are Representable in Q

First we have to show that all the basic functions are representable in Q.
In the end, we need to show how to assign to each $k$-ary basic function $f(x_0, \dots, x_{k-1})$ a formula $A_f(x_0, \dots, x_{k-1}, y)$ that represents it.

We will be able to represent zero, successor, plus, times, the characteristic function for equality, and projections. In each case, the appropriate representing formula is entirely straightforward; for example, zero is represented by the formula $y = 0$, successor is represented by the formula $x_0' = y$, and addition is represented by the formula $(x_0 + x_1) = y$. The work involves showing that Q can prove the relevant sentences; for example, saying that addition is represented by the formula above involves showing that for every pair of natural numbers $m$ and $n$, Q proves

$$(\overline{n} + \overline{m}) = \overline{n + m} \quad\text{and}\quad \forall y\, ((\overline{n} + \overline{m}) = y \to y = \overline{n + m}).$$

Proposition 4.9. The zero function $\mathrm{zero}(x) = 0$ is represented in Q by $y = 0$.

Proposition 4.10. The successor function $\mathrm{succ}(x) = x + 1$ is represented in Q by $y = x'$.

Proposition 4.11. The projection function $P^n_i(x_0, \dots, x_{n-1}) = x_i$ is represented in Q by $y = x_i$.

Proposition 4.12. The characteristic function of $=$,

$$\chi_=(x_0, x_1) = \begin{cases} 1 & \text{if } x_0 = x_1 \\ 0 & \text{otherwise} \end{cases}$$

is represented in Q by

$$(x_0 = x_1 \land y = \overline{1}) \lor (x_0 \neq x_1 \land y = \overline{0}).$$

The proof requires the following lemma.

Lemma 4.13. Given natural numbers $n$ and $m$, if $n \neq m$, then $Q \vdash \overline{n} \neq \overline{m}$.

Proof. Use induction on $n$ to show that for every $m$, if $n \neq m$, then $Q \vdash \overline{n} \neq \overline{m}$.

In the base case, $n = 0$. If $m$ is not equal to 0, then $m = k + 1$ for some natural number $k$. We have an axiom that says $\forall x\; 0 \neq x'$. By a quantifier axiom, replacing $x$ by $\overline{k}$, we can conclude $0 \neq \overline{k}'$. But $\overline{k}'$ is just $\overline{m}$.

In the induction step, we can assume the claim is true for $n$, and consider $n + 1$. Let $m$ be any natural number. There are two possibilities: either $m = 0$ or for some $k$ we have $m = k + 1$. The first case is handled as above. In the second case, suppose $n + 1 \neq k + 1$. Then $n \neq k$. By the induction hypothesis for $n$ we have $Q \vdash \overline{n} \neq \overline{k}$.
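We have an axiom that says $\forall x\, \forall y\, (x' = y' \to x = y)$. Using a quantifier axiom, we have $\overline{n}' = \overline{k}' \to \overline{n} = \overline{k}$. Using propositional logic, we can conclude, in Q, $\overline{n} \neq \overline{k} \to \overline{n}' \neq \overline{k}'$. Using modus ponens, we can conclude $\overline{n}' \neq \overline{k}'$, which is what we want, since $\overline{k}'$ is $\overline{m}$. □

Note that the lemma does not say much: in essence it says that Q can prove that different numerals denote different objects. For example, Q proves $0'' \neq 0'''$. But showing that this holds in general requires some care. Note also that although we are using induction, it is induction outside of Q.

Proof of Proposition 4.12. If $n = m$, then $\overline{n}$ and $\overline{m}$ are the same term, and $\chi_=(n, m) = 1$. But $Q \vdash (\overline{n} = \overline{m} \land \overline{1} = \overline{1})$, so it proves $A_=(\overline{n}, \overline{m}, \overline{1})$. If $n \neq m$, then $\chi_=(n, m) = 0$. By Lemma 4.13, $Q \vdash \overline{n} \neq \overline{m}$ and so also $(\overline{n} \neq \overline{m} \land \overline{0} = \overline{0})$. Thus $Q \vdash A_=(\overline{n}, \overline{m}, \overline{0})$.

For the second part, we also have two cases. If $n = m$, we have to show that $Q \vdash \forall y\, (A_=(\overline{n}, \overline{m}, y) \to y = \overline{1})$. Arguing informally, suppose $A_=(\overline{n}, \overline{m}, y)$, i.e.,

$$(\overline{n} = \overline{n} \land y = \overline{1}) \lor (\overline{n} \neq \overline{n} \land y = \overline{0}).$$

The left disjunct implies $y = \overline{1}$ by logic; the right contradicts $\overline{n} = \overline{n}$, which is provable by logic.

Suppose, on the other hand, that $n \neq m$. Then $A_=(\overline{n}, \overline{m}, y)$ is

$$(\overline{n} = \overline{m} \land y = \overline{1}) \lor (\overline{n} \neq \overline{m} \land y = \overline{0}).$$

Here, the left disjunct contradicts $\overline{n} \neq \overline{m}$, which is provable in Q by Lemma 4.13; the right disjunct entails $y = \overline{0}$. □

Proposition 4.14. The addition function $\mathrm{add}(x_0, x_1) = x_0 + x_1$ is represented in Q by $y = (x_0 + x_1)$.

Lemma 4.15. $Q \vdash (\overline{n} + \overline{m}) = \overline{n + m}$

Proof. We prove this by induction on $m$. If $m = 0$, the claim is that $Q \vdash (\overline{n} + 0) = \overline{n}$. This follows by axiom Q4. Now suppose the claim for $m$; let's prove the claim for $m + 1$, i.e., prove that $Q \vdash (\overline{n} + \overline{m + 1}) = \overline{n + m + 1}$. Note that $\overline{m + 1}$ is just $\overline{m}'$, and $\overline{n + m + 1}$ is just $\overline{n + m}'$. By axiom Q5, $Q \vdash (\overline{n} + \overline{m}') = (\overline{n} + \overline{m})'$. By induction hypothesis, $Q \vdash (\overline{n} + \overline{m}) = \overline{n + m}$. So $Q \vdash (\overline{n} + \overline{m}') = \overline{n + m}'$. □

Proof of Proposition 4.14. The formula $A_{\mathrm{add}}(x_0, x_1, y)$ representing add is $y = (x_0 + x_1)$.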
First we show that if $\mathrm{add}(n, m) = k$, then $Q \vdash A_{\mathrm{add}}(\overline{n}, \overline{m}, \overline{k})$, i.e., $Q \vdash \overline{k} = (\overline{n} + \overline{m})$. But since $k = n + m$, $\overline{k}$ just is $\overline{n + m}$, and we've shown in Lemma 4.15 that $Q \vdash (\overline{n} + \overline{m}) = \overline{n + m}$.

We also have to show that if $\mathrm{add}(n, m) = k$, then

$$Q \vdash \forall y\, (A_{\mathrm{add}}(\overline{n}, \overline{m}, y) \to y = \overline{k}).$$

Suppose we have $(\overline{n} + \overline{m}) = y$. Since $Q \vdash (\overline{n} + \overline{m}) = \overline{n + m}$, we can replace the left side with $\overline{n + m}$ and get $\overline{n + m} = y$, for arbitrary $y$. □

Proposition 4.16. The multiplication function $\mathrm{mult}(x_0, x_1) = x_0 \cdot x_1$ is represented in Q by $y = (x_0 \times x_1)$.

Proof. Exercise. □

Lemma 4.17. $Q \vdash (\overline{n} \times \overline{m}) = \overline{n \cdot m}$

Proof. Exercise. □

Recall that we use $\times$ for the function symbol of the language of arithmetic, and $\cdot$ for the ordinary multiplication operation on numbers. So $\cdot$ can appear between expressions for numbers (such as in $m \cdot n$) while $\times$ appears only between terms of the language of arithmetic (such as in $(\overline{m} \times \overline{n})$). Even more confusingly, $+$ is used for both the function symbol and the addition operation. When it appears between terms, e.g., in $(\overline{n} + \overline{m})$, it is the 2-place function symbol of the language of arithmetic, and when it appears between numbers, e.g., in $n + m$, it is the addition operation. This includes the case $\overline{n + m}$: this is the standard numeral corresponding to the number $n + m$.

4.6 Composition is Representable in Q

Suppose $h$ is defined by

$$h(x_0, \dots, x_{l-1}) = f(g_0(x_0, \dots, x_{l-1}), \dots, g_{k-1}(x_0, \dots, x_{l-1})),$$

where we have already found formulas $A_f$, $A_{g_0}$, ..., $A_{g_{k-1}}$ representing the functions $f$, and $g_0$, ..., $g_{k-1}$, respectively. We have to find a formula $A_h$ representing $h$.

Let's start with a simple case, where all functions are 1-place, i.e., consider $h(x) = f(g(x))$. If $A_f(y, z)$ represents $f$, and $A_g(x, y)$ represents $g$, we need a formula $A_h(x, z)$ that represents $h$. Note that $h(x) = z$ iff there is a $y$ such that both $z = f(y)$ and $y = g(x)$.
(If $h(x) = z$, then $g(x)$ is such a $y$; if such a $y$ exists, then since $y = g(x)$ and $z = f(y)$, $z = f(g(x))$.) This suggests that $\exists y\, (A_g(x, y) \land A_f(y, z))$ is a good candidate for $A_h(x, z)$. We just have to verify that Q proves the relevant formulas.

Proposition 4.18. If $h(n) = m$, then $Q \vdash A_h(\overline{n}, \overline{m})$.

Proof. Suppose $h(n) = m$, i.e., $f(g(n)) = m$. Let $k = g(n)$. Then

$$Q \vdash A_g(\overline{n}, \overline{k})$$

since $A_g$ represents $g$, and

$$Q \vdash A_f(\overline{k}, \overline{m})$$

since $A_f$ represents $f$. Thus,

$$Q \vdash A_g(\overline{n}, \overline{k}) \land A_f(\overline{k}, \overline{m})$$

and consequently also

$$Q \vdash \exists y\, (A_g(\overline{n}, y) \land A_f(y, \overline{m})),$$

i.e., $Q \vdash A_h(\overline{n}, \overline{m})$. □

Proposition 4.19. If $h(n) = m$, then $Q \vdash \forall z\, (A_h(\overline{n}, z) \to z = \overline{m})$.

Proof. Suppose $h(n) = m$, i.e., $f(g(n)) = m$. Let $k = g(n)$. Then

$$Q \vdash \forall y\, (A_g(\overline{n}, y) \to y = \overline{k})$$

since $A_g$ represents $g$, and

$$Q \vdash \forall z\, (A_f(\overline{k}, z) \to z = \overline{m})$$

since $A_f$ represents $f$. Using just a little bit of logic, we can show that also

$$Q \vdash \forall z\, (\exists y\, (A_g(\overline{n}, y) \land A_f(y, z)) \to z = \overline{m}),$$

i.e., $Q \vdash \forall z\, (A_h(\overline{n}, z) \to z = \overline{m})$. □

The same idea works in the more complex case where $f$ and $g_i$ have arity greater than 1.

Proposition 4.20. If $A_f(y_0, \dots, y_{k-1}, z)$ represents $f(y_0, \dots, y_{k-1})$ in Q, and $A_{g_i}(x_0, \dots, x_{l-1}, y)$ represents $g_i(x_0, \dots, x_{l-1})$ in Q, then

$$\exists y_0 \dots \exists y_{k-1}\, (A_{g_0}(x_0, \dots, x_{l-1}, y_0) \land \dots \land A_{g_{k-1}}(x_0, \dots, x_{l-1}, y_{k-1}) \land A_f(y_0, \dots, y_{k-1}, z))$$

represents

$$h(x_0, \dots, x_{l-1}) = f(g_0(x_0, \dots, x_{l-1}), \dots, g_{k-1}(x_0, \dots, x_{l-1})).$$

Proof. Exercise. □

4.7 Regular Minimization is Representable in Q

Let's consider unbounded search. Suppose $g(x, z)$ is regular and representable in Q, say by the formula $A_g(x, z, y)$. Let $f$ be defined by $f(z) = \mu x\, [g(x, z) = 0]$. We would like to find a formula $A_f(z, y)$ representing $f$. The value of $f(z)$ is that number $x$ which (a) satisfies $g(x, z) = 0$ and (b) is the least such, i.e., for any $w < x$, $g(w, z) \neq 0$. So the following is a natural choice:

$$A_f(z, y) \equiv A_g(y, z, 0) \land \forall w\, (w < y \to \neg A_g(w, z, 0)).$$
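Conditions (a) and (b) can be read off directly as a computation. The Python sketch below is an informal illustration only: `mu` performs the unbounded search, and `satisfies_A_f` (a name invented here) evaluates the two conjuncts of the candidate formula in the standard model, with `g` given as an ordinary Python function rather than as a formula of the language of arithmetic.

```python
def mu(g, z):
    """Least x with g(x, z) == 0. Terminates only if g is regular,
    i.e., such an x exists for every z."""
    x = 0
    while g(x, z) != 0:
        x += 1
    return x


def satisfies_A_f(g, z, y):
    """Semantic reading of A_f(z, y): (a) g(y, z) = 0, and
    (b) g(w, z) != 0 for every w < y."""
    return g(y, z) == 0 and all(g(w, z) != 0 for w in range(y))


def g_example(x, z):
    """A sample regular function: 0 once x * x >= z, and 1 before that."""
    return 0 if x * x >= z else 1
```

With `g_example`, `mu(g_example, 10)` is the least x whose square is at least 10, and `satisfies_A_f(g_example, 10, y)` holds for that value of y and no other. This mirrors the semantics of the formula; the work in the rest of the section is showing that Q can prove the corresponding sentences.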
In the general case, of course, we would have to replace $z$ with $z_0$, ..., $z_k$.

The proof, again, will involve some lemmas about things Q is strong enough to prove.

Lemma 4.21. For every constant symbol $a$ and every natural number $n$,

$$Q \vdash (a' + \overline{n}) = (a + \overline{n})'.$$

Proof. The proof is, as usual, by induction on $n$. In the base case, $n = 0$, we need to show that Q proves $(a' + 0) = (a + 0)'$. But we have:

$$Q \vdash (a' + 0) = a' \quad \text{by axiom Q4} \tag{4.1}$$
$$Q \vdash (a + 0) = a \quad \text{by axiom Q4} \tag{4.2}$$
$$Q \vdash (a + 0)' = a' \quad \text{by eq. (4.2)} \tag{4.3}$$
$$Q \vdash (a' + 0) = (a + 0)' \quad \text{by eq. (4.1) and eq. (4.3)}$$

In the induction step, we can assume that we have shown that $Q \vdash (a' + \overline{n}) = (a + \overline{n})'$. Since $\overline{n + 1}$ is $\overline{n}'$, we need to show that Q proves $(a' + \overline{n}') = (a + \overline{n}')'$. We have:

$$Q \vdash (a' + \overline{n}') = (a' + \overline{n})' \quad \text{by axiom Q5} \tag{4.4}$$
$$Q \vdash (a' + \overline{n})' = (a + \overline{n}')' \quad \text{by the inductive hypothesis and axiom Q5} \tag{4.5}$$
$$Q \vdash (a' + \overline{n}') = (a + \overline{n}')' \quad \text{by eq. (4.4) and eq. (4.5).}$$ □

It is again worth mentioning that this is weaker than saying that Q proves $\forall x\, \forall y\; (x' + y) = (x + y)'$. Although this sentence is true in N, Q does not prove it.

Lemma 4.22. $Q \vdash \forall x\; \neg x < 0$.

Proof. We give the proof informally (i.e., only giving hints as to how to construct the formal derivation).

We have to prove $\neg a < 0$ for an arbitrary $a$. By the definition of $<$, we need to prove $\neg \exists y\; (y' + a) = 0$ in Q. We'll assume $\exists y\; (y' + a) = 0$ and prove a contradiction. Suppose $(b' + a) = 0$. Using Q3, we have that $a = 0 \lor \exists y\; a = y'$. We distinguish cases.

Case 1: $a = 0$ holds. From $(b' + a) = 0$, we have $(b' + 0) = 0$. By axiom Q4 of Q, we have $(b' + 0) = b'$, and hence $b' = 0$. But by axiom Q2 we also have $b' \neq 0$, a contradiction.

Case 2: For some $c$, $a = c'$. But then we have $(b' + c') = 0$. By axiom Q5, we have $(b' + c)' = 0$, again contradicting axiom Q2. □

Lemma 4.23. For every natural number $n$,

$$Q \vdash \forall x\, (x < \overline{n + 1} \to (x = \overline{0} \lor \dots \lor x = \overline{n})).$$

Proof. We use induction on $n$. Let us consider the base case, when $n = 0$.
In that case, we need to show $a < \overline{1} \to a = 0$, for arbitrary $a$. Suppose $a < \overline{1}$. Then by the defining axiom for $<$, we have $\exists y\; (y' + a) = 0'$ (since $\overline{1} \equiv 0'$).

Suppose $b$ has that property, i.e., we have $(b' + a) = 0'$. We need to show $a = 0$. By axiom Q3, we have either $a = 0$ or that there is a $c$ such that $a = c'$. In the former case, there is nothing to show. So suppose $a = c'$. Then we have $(b' + c') = 0'$. By axiom Q5 of Q, we have $(b' + c)' = 0'$. By axiom Q1, we have $(b' + c) = 0$. But this means, by axiom Q8, that $c < 0$, contradicting Lemma 4.22.

Now for the inductive step. We prove the case for $n + 1$, assuming the case for $n$. So suppose $a < \overline{n + 2}$. Again using Q3 we can distinguish two cases: $a = 0$ and for some $c$, $a = c'$. In the first case, $a = \overline{0} \lor \dots \lor a = \overline{n + 1}$ follows trivially. In the second case, we have $c' < \overline{n + 2}$, i.e., $c' < \overline{n + 1}'$. By axiom Q8, for some $d$, $(d' + c') = \overline{n + 1}'$. By axiom Q5, $(d' + c)' = \overline{n + 1}'$. By axiom Q1, $(d' + c) = \overline{n + 1}$, and so $c < \overline{n + 1}$ by axiom Q8. By inductive hypothesis, $c = \overline{0} \lor \dots \lor c = \overline{n}$. From this, we get $c' = \overline{0}' \lor \dots \lor c' = \overline{n}'$ by logic, and so $a = \overline{1} \lor \dots \lor a = \overline{n + 1}$ since $a = c'$. □

Lemma 4.24. For every $m \in \mathbb{N}$,

$$Q \vdash \forall y\, ((y < \overline{m} \lor \overline{m} < y) \lor y = \overline{m}).$$

Proof. By induction on $m$. First, consider the case $m = 0$. $Q \vdash \forall y\, (y = 0 \lor \exists z\; y = z')$ by Q3. Let $a$ be arbitrary. Then either $a = 0$ or for some $b$, $a = b'$. In the former case, we also have $(a < 0 \lor 0 < a) \lor a = 0$. But if $a = b'$, then $(b' + 0) = (a + 0)$ by the logic of $=$. By Q4, $(a + 0) = a$, so we have $(b' + 0) = a$, and hence $\exists z\; (z' + 0) = a$. By the definition of $<$ in Q8, $0 < a$. If $0 < a$, then also $(a < 0 \lor 0 < a) \lor a = 0$.

Now suppose we have

$$Q \vdash \forall y\, ((y < \overline{m} \lor \overline{m} < y) \lor y = \overline{m})$$

and we want to show

$$Q \vdash \forall y\, ((y < \overline{m + 1} \lor \overline{m + 1} < y) \lor y = \overline{m + 1}).$$

Let $a$ be arbitrary. By Q3, either $a = 0$ or for some $b$, $a = b'$. In the first case, we have $(\overline{m}' + a) = \overline{m + 1}$ by Q4, and so $a < \overline{m + 1}$ by Q8.

Now consider the second case, $a = b'$.
By the induction hypothesis, $(b < \overline{m} \lor \overline{m} < b) \lor b = \overline{m}$.

The first disjunct $b < \overline{m}$ is equivalent (by Q8) to $\exists z\; (z' + b) = \overline{m}$. Suppose $c$ has this property. If $(c' + b) = \overline{m}$, then also $(c' + b)' = \overline{m}'$. By Q5, $(c' + b)' = (c' + b')$. Hence, $(c' + b') = \overline{m}'$. We get $\exists u\; (u' + b') = \overline{m + 1}$ by existentially generalizing on $c$ and keeping in mind that $\overline{m}' \equiv \overline{m + 1}$. Hence, if $b < \overline{m}$ then $b' < \overline{m + 1}$ and so $a < \overline{m + 1}$.

Now suppose $\overline{m} < b$, i.e., $\exists z\; (z' + \overline{m}) = b$. Suppose $c$ is such a $z$, i.e., $(c' + \overline{m}) = b$. By logic, $(c' + \overline{m})' = b'$. By Q5, $(c' + \overline{m}') = b'$. Since $a = b'$ and $\overline{m}' \equiv \overline{m + 1}$, $(c' + \overline{m + 1}) = a$. By Q8, $\overline{m + 1} < a$.

Finally, assume $b = \overline{m}$. Then, by logic, $b' = \overline{m}'$, and so $a = \overline{m + 1}$.

Hence, from each disjunct of the case for $m$ and $b$, we can obtain the corresponding disjunct for $m + 1$ and $a$. □

Proposition 4.25. If $A_g(x, z, y)$ represents $g(x, z)$ in Q, then

$$A_f(z, y) \equiv A_g(y, z, 0) \land \forall w\, (w < y \to \neg A_g(w, z, 0))$$

represents $f(z) = \mu x\, [g(x, z) = 0]$.

Proof. First we show that if $f(n) = m$, then $Q \vdash A_f(\overline{n}, \overline{m})$, i.e.,

$$Q \vdash A_g(\overline{m}, \overline{n}, 0) \land \forall w\, (w < \overline{m} \to \neg A_g(w, \overline{n}, 0)).$$

Since $A_g(x, z, y)$ represents $g(x, z)$ and $g(m, n) = 0$ if $f(n) = m$, we have

$$Q \vdash A_g(\overline{m}, \overline{n}, 0).$$

If $f(n) = m$, then for every $k < m$, $g(k, n) \neq 0$. So

$$Q \vdash \neg A_g(\overline{k}, \overline{n}, 0).$$

We get that

$$Q \vdash \forall w\, (w < \overline{m} \to \neg A_g(w, \overline{n}, 0)) \tag{4.6}$$

by Lemma 4.22 in case $m = 0$ and by Lemma 4.23 otherwise.

Now let's show that if $f(n) = m$, then $Q \vdash \forall y\, (A_f(\overline{n}, y) \to y = \overline{m})$. We again sketch the argument informally, leaving the formalization to the reader.

Suppose $A_f(\overline{n}, b)$. From this we get (a) $A_g(b, \overline{n}, 0)$ and (b) $\forall w\, (w < b \to \neg A_g(w, \overline{n}, 0))$. By Lemma 4.24, $(b < \overline{m} \lor \overline{m} < b) \lor b = \overline{m}$. We'll show that both $b < \overline{m}$ and $\overline{m} < b$ lead to a contradiction.

If $\overline{m} < b$, then $\neg A_g(\overline{m}, \overline{n}, 0)$ from (b). But $m = f(n)$, so $g(m, n) = 0$, and so $Q \vdash A_g(\overline{m}, \overline{n}, 0)$ since $A_g$ represents $g$. So we have a contradiction.

Now suppose $b < \overline{m}$. Then since $Q \vdash \forall w\, (w < \overline{m} \to \neg A_g(w, \overline{n}, 0))$ by eq. (4.6), we get $\neg A_g(b, \overline{n}, 0)$.
This again contradicts (a). □

4.8 Computable Functions are Representable in Q

Theorem 4.26. Every computable function is representable in Q.

Proof. For definiteness, and using the Church-Turing Thesis, let's say that a function is computable iff it is general recursive. The general recursive functions are those which can be defined from the zero function zero, the successor function succ, and the projection functions $P^n_i$ using composition, primitive recursion, and regular minimization. By Lemma 4.8, any function $h$ that can be defined from $f$ and $g$ by primitive recursion can also be defined using composition and regular minimization from $f$, $g$, and zero, succ, $P^n_i$, add, mult, $\chi_=$. Consequently, a function is general recursive iff it can be defined from zero, succ, $P^n_i$, add, mult, $\chi_=$ using composition and regular minimization.

We've furthermore shown that the basic functions in question are representable in Q (Propositions 4.9 to 4.12, 4.14 and 4.16), and that any function defined from representable functions by composition or regular minimization (Proposition 4.20, Proposition 4.25) is also representable. Thus every general recursive function is representable in Q. □

We have shown that the set of computable functions can be characterized as the set of functions representable in Q. In fact, the proof is more general. From the definition of representability, it is not hard to see that any theory extending Q (or in which one can interpret Q) can represent the computable functions. But, conversely, in any proof system in which the notion of proof is computable, every representable function is computable. So, for example, the set of computable functions can be characterized as the set of functions representable in Peano arithmetic, or even Zermelo-Fraenkel set theory. As Gödel noted, this is somewhat surprising.
We will see that when it comes to provability, questions are very sensitive to which theory you consider; roughly, the stronger the axioms, the more you can prove. But across a wide range of axiomatic theories, the representable functions are exactly the computable ones; stronger theories do not represent more functions as long as they are axiomatizable.

4.9 Representing Relations

Let us say what it means for a relation to be representable.

Definition 4.27. A relation $R(x_0, \dots, x_k)$ on the natural numbers is representable in Q if there is a formula $A_R(x_0, \dots, x_k)$ such that whenever $R(n_0, \dots, n_k)$ is true, Q proves $A_R(\overline{n_0}, \dots, \overline{n_k})$, and whenever $R(n_0, \dots, n_k)$ is false, Q proves $\neg A_R(\overline{n_0}, \dots, \overline{n_k})$.

Theorem 4.28. A relation is representable in Q if and only if it is computable.

Proof. For the forwards direction, suppose $R(x_0, \dots, x_k)$ is represented by the formula $A_R(x_0, \dots, x_k)$. Here is an algorithm for computing $R$: on input $n_0$, ..., $n_k$, simultaneously search for a proof of $A_R(\overline{n_0}, \dots, \overline{n_k})$ and a proof of $\neg A_R(\overline{n_0}, \dots, \overline{n_k})$. By our hypothesis, the search is bound to find one or the other; if it is the first, report "yes," and otherwise, report "no."

In the other direction, suppose $R(x_0, \dots, x_k)$ is computable. By definition, this means that the function $\chi_R(x_0, \dots, x_k)$ is computable. By Theorem 4.2, $\chi_R$ is represented by a formula, say $A_{\chi_R}(x_0, \dots, x_k, y)$. Let $A_R(x_0, \dots, x_k)$ be the formula $A_{\chi_R}(x_0, \dots, x_k, \overline{1})$. Then for any $n_0$, ..., $n_k$, if $R(n_0, \dots, n_k)$ is true, then $\chi_R(n_0, \dots, n_k) = 1$, in which case Q proves $A_{\chi_R}(\overline{n_0}, \dots, \overline{n_k}, \overline{1})$, and so Q proves $A_R(\overline{n_0}, \dots, \overline{n_k})$. On the other hand, if $R(n_0, \dots, n_k)$ is false, then $\chi_R(n_0, \dots, n_k) = 0$. This means that Q proves

$$\forall y\, (A_{\chi_R}(\overline{n_0}, \dots, \overline{n_k}, y) \to y = \overline{0}).$$

Since Q proves $\overline{0} \neq \overline{1}$, Q proves $\neg A_{\chi_R}(\overline{n_0}, \dots, \overline{n_k}, \overline{1})$, and so it proves $\neg A_R(\overline{n_0}, \dots, \overline{n_k})$.
□

4.10 Undecidability

We call a theory T undecidable if there is no computational procedure which, after finitely many steps and unfailingly, provides a correct answer to the question "does T prove A?" for any sentence A in the language of T. So Q would be decidable iff there were a computational procedure which decides, given a sentence A in the language of arithmetic, whether $Q \vdash A$ or not. We can make this more precise by asking: Is the relation $\mathrm{Prov}_Q(y)$, which holds of $y$ iff $y$ is the Gödel number of a sentence provable in Q, recursive? The answer is: no.

Theorem 4.29. Q is undecidable, i.e., the relation

$$\mathrm{Prov}_Q(y) \Leftrightarrow \mathrm{Sent}(y) \land \exists x\; \mathrm{Prf}_Q(x, y)$$

is not recursive.

Proof. Suppose it were. Then we could solve the halting problem as follows: Given $e$ and $n$, we know that $\varphi_e(n)\downarrow$ iff there is an $s$ such that $T(e, n, s)$, where $T$ is Kleene's predicate from Theorem 2.28. Since $T$ is primitive recursive it is representable in Q by a formula $B_T$, that is, $Q \vdash B_T(\overline{e}, \overline{n}, \overline{s})$ iff $T(e, n, s)$. If $Q \vdash B_T(\overline{e}, \overline{n}, \overline{s})$ then also $Q \vdash \exists y\; B_T(\overline{e}, \overline{n}, y)$. If no such $s$ exists, then $Q \vdash \neg B_T(\overline{e}, \overline{n}, \overline{s})$ for every $s$. But Q is ω-consistent, i.e., if $Q \vdash \neg A(\overline{n})$ for every $n \in \mathbb{N}$, then $Q \nvdash \exists y\; A(y)$. We know this because the axioms of Q are true in the standard model N. So, $Q \nvdash \exists y\; B_T(\overline{e}, \overline{n}, y)$. In other words, $Q \vdash \exists y\; B_T(\overline{e}, \overline{n}, y)$ iff there is an $s$ such that $T(e, n, s)$, i.e., iff $\varphi_e(n)\downarrow$. From $e$ and $n$ we can compute $\#\exists y\; B_T(\overline{e}, \overline{n}, y)\#$; let $g(e, n)$ be the primitive recursive function which does that. So

$$h(e, n) = \begin{cases} 1 & \text{if } \mathrm{Prov}_Q(g(e, n)) \\ 0 & \text{otherwise.} \end{cases}$$

This would show that $h$ is recursive if $\mathrm{Prov}_Q$ is. But $h$ is not recursive, by Theorem 2.29, so $\mathrm{Prov}_Q$ cannot be either. □

Corollary 4.30. First-order logic is undecidable.

Proof. If first-order logic were decidable, provability in Q would be as well, since $Q \vdash A$ iff $\vdash O \to A$, where $O$ is the conjunction of the axioms of Q.
□

Summary

In order to show how theories like Q can "talk" about computable functions, and especially about provability (via Gödel numbers), we established that Q represents all computable functions. By "Q represents $f(n)$" we mean that there is a formula $A_f(x, y)$ in $\mathcal{L}_A$ which expresses that $f(x) = y$, and Q can prove that it does. This, in turn, means that whenever $f(n) = m$, then $Q \vdash A_f(\overline{n}, \overline{m})$ and $Q \vdash \forall y\, (A_f(\overline{n}, y) \to y = \overline{m})$. (Here, $\overline{n}$ is the standard numeral for $n$, i.e., the term $0^{\prime\dots\prime}$ with $n$ ′s. The term $\overline{n}$ picks out the number $n$ in the standard model N, so it's a convenient way of representing the number $n$ in $\mathcal{L}_A$.) To prove that Q represents all computable functions we go back to the characterization of computable functions as those that can be defined from zero, succ, and the projection functions, by composition, primitive recursion, and regular minimization. While it is relatively easy to prove that the basic functions are representable and that functions defined by composition and regular minimization from representable functions are also representable, primitive recursion is harder. We showed that we can actually avoid definition by primitive recursion, if we allow a few additional basic functions (namely, addition, multiplication, and the characteristic function of $=$). This required a beta function which allows us to deal with sequences of numbers in a rudimentary way, and which can be defined without using primitive recursion.

Problems

Problem 4.1. Prove that $y = 0$, $y = x'$, and $y = x_i$ represent zero, succ, and $P^n_i$, respectively.

Problem 4.2. Prove Lemma 4.17.

Problem 4.3. Use Lemma 4.17 to prove Proposition 4.16.

Problem 4.4. Using the proofs of Proposition 4.18 and Proposition 4.19 as a guide, carry out the proof of Proposition 4.20 in detail.

Problem 4.5. Show that if $R$ is representable in Q, so is $\chi_R$.
CHAPTER 5

Incompleteness and Provability

5.1 Introduction

Hilbert thought that a system of axioms for a mathematical structure, such as the natural numbers, is inadequate unless it allows one to derive all true statements about the structure. Combined with his later interest in formal systems of deduction, this suggests that he thought that we should guarantee that, say, the formal system we are using to reason about the natural numbers is not only consistent, but also complete, i.e., every statement in its language is either derivable or its negation is. Gödel's first incompleteness theorem shows that no such system of axioms exists: there is no complete, consistent, axiomatizable formal system for arithmetic. In fact, no "sufficiently strong," consistent, axiomatizable mathematical theory is complete.

A more important goal of Hilbert's, the centerpiece of his program for the justification of modern ("classical") mathematics, was to find finitary consistency proofs for formal systems representing classical reasoning. With regard to Hilbert's program, then, Gödel's second incompleteness theorem was a much bigger blow.

The second incompleteness theorem can be stated in vague terms, like the first incompleteness theorem. Roughly speaking, it says that no sufficiently strong theory of arithmetic can prove its own consistency. We will have to take "sufficiently strong" to include a little bit more than Q.

The idea behind Gödel's original proof of the incompleteness theorem can be found in the Epimenides paradox. Epimenides, a Cretan, asserted that all Cretans are liars; a more direct form of the paradox is the assertion "this sentence is false." Essentially, by replacing truth with derivability, Gödel was able to formalize a sentence which, in a roundabout way, asserts that it itself is not derivable. If that sentence were derivable, the theory would then be inconsistent.
Gödel showed that the negation of that sentence is also not derivable from the system of axioms he was considering. (For this second part, Gödel had to assume that the theory T is what's called "ω-consistent." ω-Consistency is related to consistency, but is a stronger property. A few years after Gödel, Rosser showed that assuming simple consistency of T is enough.)

The first challenge is to understand how one can construct a sentence that refers to itself. For every formula A in the language of Q, let ⌜A⌝ denote the numeral corresponding to #A#. Think about what this means: A is a formula in the language of Q, #A# is a natural number, and ⌜A⌝ is a term in the language of Q. So every formula A in the language of Q has a name, ⌜A⌝, which is a term in the language of Q; this provides us with a conceptual framework in which formulas in the language of Q can "say" things about other formulas. The following lemma is known as the fixed-point lemma.

Lemma 5.1. Let T be any theory extending Q, and let B(x) be any formula with only the variable x free. Then there is a sentence A such that T ⊢ A ↔ B(⌜A⌝).

The lemma asserts that given any property B(x), there is a sentence A that asserts "B(x) is true of me," and T "knows" this.

How can we construct such a sentence? Consider the following version of the Epimenides paradox, due to Quine:

"Yields falsehood when preceded by its quotation" yields falsehood when preceded by its quotation.

This sentence is not directly self-referential. It simply makes an assertion about the syntactic objects between quotes, and, in doing so, it is on par with sentences like

1. "Robert" is a nice name.
2. "I ran." is a short sentence.
3. "Has three words" has three words.

But what happens when one takes the phrase "yields falsehood when preceded by its quotation," and precedes it with a quoted version of itself? Then one has the original sentence! In short, the sentence asserts that it is false.
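Quine's construction can be mimicked with ordinary strings. The following is only an informal sketch, not part of the formal development: the helper names quote, diagonalize, and quoted_part are made up for this illustration, and quotation is modeled naively by wrapping a phrase in double quotes. It shows that preceding the phrase by its own quotation produces exactly the sentence that talks about that phrase.

```python
def quote(expression: str) -> str:
    """Return a naive 'quotation' of an expression: the text in double quotes."""
    return f'"{expression}"'


def diagonalize(expression: str) -> str:
    """Precede an expression by its own quotation."""
    return f"{quote(expression)} {expression}"


def quoted_part(sentence: str) -> str:
    """Return the text between the first pair of double quotes in a sentence."""
    start = sentence.index('"') + 1
    end = sentence.index('"', start)
    return sentence[start:end]


# The phrase used in Quine's version of the liar sentence.
PHRASE = "yields falsehood when preceded by its quotation"
LIAR = diagonalize(PHRASE)

if __name__ == "__main__":
    print(LIAR)
    # The sentence is about the quoted phrase; diagonalizing that phrase
    # gives back the whole sentence, so the sentence describes itself.
    print(diagonalize(quoted_part(LIAR)) == LIAR)
```

Nothing in this sketch decides whether anything is true or false; it only illustrates the syntactic operation of preceding an expression by its own quotation.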
5.2 The Fixed-Point Lemma

The fixed-point lemma says that for any formula B(x), there is a sentence A such that T ⊢ A ↔ B(⌜A⌝), provided T extends Q. In the case of the liar sentence, we'd want A to be equivalent (provably in T) to "⌜A⌝ is false," i.e., the statement that #A# is the Gödel number of a false sentence. To understand the idea of the proof, it will be useful to compare it with Quine's informal gloss of A as, "'yields a falsehood when preceded by its own quotation' yields a falsehood when preceded by its own quotation." The operation of taking an expression, and then forming a sentence by preceding this expression by its own quotation may be called diagonalizing the expression, and the result its diagonalization. So, the diagonalization of 'yields a falsehood when preceded by its own quotation' is "'yields a falsehood when preceded by its own quotation' yields a falsehood when preceded by its own quotation." Now note that Quine's liar sentence is not the diagonalization of 'yields a falsehood' but of 'yields a falsehood when preceded by its own quotation.' So the property being diagonalized to yield the liar sentence itself involves diagonalization!

In the language of arithmetic, we form quotations of a formula with one free variable by computing its Gödel number and then substituting the standard numeral for that Gödel number into the free variable. The diagonalization of E(x) is E(n̄), where n = #E(x)# and n̄ is the standard numeral for n. (From now on, let's abbreviate the standard numeral for #E(x)# as ⌜E(x)⌝.) So if B(x) is "is a falsehood," then "yields a falsehood if preceded by its own quotation," would be "yields a falsehood when applied to the Gödel number of its diagonalization." If we had a symbol diag for the function diag(n) which computes the Gödel number of the diagonalization of the formula with Gödel number n, we could write E(x) as B(diag(x)).
And Quine's version of the liar sentence would then be the diagonalization of it, i.e., E(⌜E(x)⌝) or B(diag(⌜B(diag(x))⌝)). Of course, B(x) could now be any other property, and the same construction would work. For the incompleteness theorem, we'll take B(x) to be "x is not derivable in T." Then E(x) would be "yields a sentence not derivable in T when applied to the Gödel number of its diagonalization."

To formalize this in T, we have to find a way to formalize diag. The function diag(n) is computable, in fact, it is primitive recursive: if n is the Gödel number of a formula E(x), diag(n) returns the Gödel number of E(⌜E(x)⌝). (Recall, ⌜E(x)⌝ is the standard numeral of the Gödel number of E(x), i.e., of #E(x)#.) If diag were a function symbol in T representing the function diag, we could take A to be the formula B(diag(⌜B(diag(x))⌝)). Notice that

diag(#B(diag(x))#) = #B(diag(⌜B(diag(x))⌝))# = #A#.

Assuming T can derive

diag(⌜B(diag(x))⌝) = ⌜A⌝,

it can derive B(diag(⌜B(diag(x))⌝)) ↔ B(⌜A⌝). But the left hand side is, by definition, A.

Of course, diag will in general not be a function symbol of T, and certainly is not one of Q. But, since diag is computable, it is representable in Q by some formula D_diag(x, y). So instead of writing B(diag(x)) we can write ∃y (D_diag(x, y) ∧ B(y)). Otherwise, the proof sketched above goes through, and in fact, it goes through already in Q.

Lemma 5.2. Let B(x) be any formula with one free variable x. Then there is a sentence A such that Q ⊢ A ↔ B(⌜A⌝).

Proof. Given B(x), let E(x) be the formula ∃y (D_diag(x, y) ∧ B(y)) and let A be its diagonalization, i.e., the formula E(⌜E(x)⌝).

Since D_diag represents diag, and diag(#E(x)#) = #A#, Q can derive

D_diag(⌜E(x)⌝, ⌜A⌝)   (5.1)
∀y (D_diag(⌜E(x)⌝, y) → y = ⌜A⌝).   (5.2)

Now we show that Q ⊢ A ↔ B(⌜A⌝). We argue informally, using just logic and facts derivable in Q.

First, suppose A, i.e., E(⌜E(x)⌝).
Going back to the definition of E(x), we see that E(⌜E(x)⌝) just is

∃y (D_diag(⌜E(x)⌝, y) ∧ B(y)).

Consider such a y. Since D_diag(⌜E(x)⌝, y), by eq. (5.2), y = ⌜A⌝. So, from B(y) we have B(⌜A⌝).

Now suppose B(⌜A⌝). By eq. (5.1), we have D_diag(⌜E(x)⌝, ⌜A⌝) ∧ B(⌜A⌝). It follows that ∃y (D_diag(⌜E(x)⌝, y) ∧ B(y)). But that's just E(⌜E(x)⌝), i.e., A. □

You should compare this to the proof of the fixed-point lemma in computability theory. The difference is that here we want to define a statement in terms of itself, whereas there we wanted to define a function in terms of itself; this difference aside, it is really the same idea.

5.3 The First Incompleteness Theorem

We can now describe Gödel's original proof of the first incompleteness theorem. Let T be any computably axiomatized theory in a language extending the language of arithmetic, such that T includes the axioms of Q. This means that, in particular, T represents computable functions and relations.

We have argued that, given a reasonable coding of formulas and proofs as numbers, the relation Prf_T(x, y) is computable, where Prf_T(x, y) holds if and only if x is the Gödel number of a derivation of the formula with Gödel number y in T. In fact, for the particular theory that Gödel had in mind, Gödel was able to show that this relation is primitive recursive, using the list of 45 functions and relations in his paper. The 45th relation, xBy, is just Prf_T(x, y) for his particular choice of T. Remember that where Gödel uses the word "recursive" in his paper, we would now use the phrase "primitive recursive."

Since Prf_T(x, y) is computable, it is representable in T. We will use Prf_T(x, y) to refer to the formula that represents it. Let Prov_T(y) be the formula ∃x Prf_T(x, y). This describes the 46th relation, Bew(y), on Gödel's list. As Gödel notes, this is the only relation that "cannot be asserted to be recursive."
What he probably meant is this: from the definition, it is not clear that it is computable; and later developments, in fact, show that it isn't.

Let T be an axiomatizable theory containing Q. Then Prf_T(x, y) is decidable, hence representable in Q by a formula Prf_T(x, y). Let Prov_T(y) be the formula we described above. By the fixed-point lemma, there is a formula G_T such that Q (and hence T) derives

G_T ↔ ¬Prov_T(⌜G_T⌝).   (5.3)

Note that G_T says, in essence, "G_T is not derivable in T."

Lemma 5.3. If T is a consistent, axiomatizable theory extending Q, then T ⊬ G_T.

Proof. Suppose T derives G_T. Then there is a derivation, and so, for some number m, the relation Prf_T(m, #G_T#) holds. But then Q derives the sentence Prf_T(m, ⌜G_T⌝). So Q derives ∃x Prf_T(x, ⌜G_T⌝), which is, by definition, Prov_T(⌜G_T⌝). By eq. (5.3), Q derives ¬G_T, and since T extends Q, so does T. We have shown that if T derives G_T, then it also derives ¬G_T, and hence it would be inconsistent. □

Definition 5.4. A theory T is ω-consistent if the following holds: if ∃x A(x) is any sentence and T derives ¬A(0), ¬A(1), ¬A(2), . . . then T does not prove ∃x A(x).

Note that every ω-consistent theory is also consistent. This follows simply from the fact that if T is inconsistent, then T ⊢ A for every A. In particular, if T is inconsistent, it derives both ¬A(n) for every n and also derives ∃x A(x). So, if T is inconsistent, it is ω-inconsistent. By contraposition, if T is ω-consistent, it must be consistent.

Lemma 5.5. If T is an ω-consistent, axiomatizable theory extending Q, then T ⊬ ¬G_T.

Proof. We show that if T derives ¬G_T, then it is ω-inconsistent. Suppose T derives ¬G_T. If T is inconsistent, it is ω-inconsistent, and we are done. Otherwise, T is consistent, so it does not derive G_T by Lemma 5.3. Since there is no derivation of G_T in T, Q derives

¬Prf_T(0, ⌜G_T⌝), ¬Prf_T(1, ⌜G_T⌝), ¬Prf_T(2, ⌜G_T⌝), . . .

and so does T. On the other hand, by eq.
(5.3), ¬G_T is equivalent to ∃x Prf_T(x, ⌜G_T⌝). So T is ω-inconsistent. □

Theorem 5.6. Let T be any ω-consistent, axiomatizable theory extending Q. Then T is not complete.

Proof. If T is ω-consistent, it is consistent, so T ⊬ G_T by Lemma 5.3. By Lemma 5.5, T ⊬ ¬G_T. This means that T is incomplete, since it derives neither G_T nor ¬G_T. □

5.4 Rosser's Theorem

Can we modify Gödel's proof to get a stronger result, replacing "ω-consistent" with simply "consistent"? The answer is "yes," using a trick discovered by Rosser. Rosser's trick is to use a "modified" derivability predicate RProv_T(y) instead of Prov_T(y).

Theorem 5.7. Let T be any consistent, axiomatizable theory extending Q. Then T is not complete.

Proof. Recall that Prov_T(y) is defined as ∃x Prf_T(x, y), where Prf_T(x, y) represents the decidable relation which holds iff x is the Gödel number of a derivation of the sentence with Gödel number y. The relation that holds between x and y if x is the Gödel number of a refutation of the sentence with Gödel number y is also decidable. Let not(x) be the primitive recursive function which does the following: if x is the code of a formula A, not(x) is a code of ¬A. Then Ref_T(x, y) holds iff Prf_T(x, not(y)). Let Ref_T(x, y) represent it. Then, if T ⊢ ¬A and δ is a corresponding derivation, Q ⊢ Ref_T(⌜δ⌝, ⌜A⌝). We define RProv_T(y) as

∃x (Prf_T(x, y) ∧ ∀z (z < x → ¬Ref_T(z, y))).

Roughly, RProv_T(y) says "there is a proof of y in T, and there is no shorter refutation of y." Assuming T is consistent, RProv_T(y) is true of the same numbers as Prov_T(y); but from the point of view of provability in T (and we now know that there is a difference between truth and provability!) the two have different properties. If T is inconsistent, then the two do not hold of the same numbers! (RProv_T(y) is often read as "y is Rosser provable." This may be confusing since, as just discussed, Rosser provability is not some special kind of provability: in inconsistent theories, there are sentences that are provable but not Rosser provable. To avoid the confusion, you could instead read it as "y is shmovable.")

By the fixed-point lemma, there is a formula R_T such that

Q ⊢ R_T ↔ ¬RProv_T(⌜R_T⌝).   (5.4)

In contrast to the proof of Theorem 5.6, here we claim that if T is consistent, T doesn't derive R_T, and T also doesn't derive ¬R_T. (In other words, we don't need the assumption of ω-consistency.)

First, let's show that T ⊬ R_T. Suppose it did, so there is a derivation of R_T from T; let n be its Gödel number. Then Q ⊢ Prf_T(n, ⌜R_T⌝), since Prf_T represents Prf_T in Q. Also, for each k < n, k is not the Gödel number of a derivation of ¬R_T, since T is consistent. So for each k < n, Q ⊢ ¬Ref_T(k, ⌜R_T⌝). By Lemma 4.23, Q ⊢ ∀z (z < n → ¬Ref_T(z, ⌜R_T⌝)). Thus,

Q ⊢ ∃x (Prf_T(x, ⌜R_T⌝) ∧ ∀z (z < x → ¬Ref_T(z, ⌜R_T⌝))),

but that's just RProv_T(⌜R_T⌝). By eq. (5.4), Q ⊢ ¬R_T. Since T extends Q, also T ⊢ ¬R_T. We've assumed that T ⊢ R_T, so T would be inconsistent, contrary to the assumption of the theorem.

Now, let's show that T ⊬ ¬R_T. Again, suppose it did, and suppose n is the Gödel number of a derivation of ¬R_T. Then Ref_T(n, #R_T#) holds, and since Ref_T represents Ref_T in Q, Q ⊢ Ref_T(n, ⌜R_T⌝). We'll again show that T would then be inconsistent because it would also derive R_T. Since Q ⊢ R_T ↔ ¬RProv_T(⌜R_T⌝), and since T extends Q, it suffices to show that Q ⊢ ¬RProv_T(⌜R_T⌝). The sentence ¬RProv_T(⌜R_T⌝), i.e.,

¬∃x (Prf_T(x, ⌜R_T⌝) ∧ ∀z (z < x → ¬Ref_T(z, ⌜R_T⌝)))

is logically equivalent to

∀x (Prf_T(x, ⌜R_T⌝) → ∃z (z < x ∧ Ref_T(z, ⌜R_T⌝)))

We argue informally using logic, making use of facts about what Q derives. Suppose x is arbitrary and Prf_T(x, ⌜R_T⌝). We already know that T ⊬ R_T, and so for every k, Q ⊢ ¬Prf_T(k, ⌜R_T⌝). Thus, for every k it follows that x ≠ k.
In particular, we have (a) that x ≠ n. We also have ¬(x = 0 ∨ x = 1 ∨ ⋯ ∨ x = n − 1) and so by Lemma 4.23, (b) ¬(x < n). By Lemma 4.24, n < x. Since Q ⊢ Ref_T(n, ⌜R_T⌝), we have n < x ∧ Ref_T(n, ⌜R_T⌝), and from that ∃z (z < x ∧ Ref_T(z, ⌜R_T⌝)). Since x was arbitrary we get, as required, that

∀x (Prf_T(x, ⌜R_T⌝) → ∃z (z < x ∧ Ref_T(z, ⌜R_T⌝))). □

5.5 Comparison with Gödel's Original Paper

It is worthwhile to spend some time with Gödel's 1931 paper. The introduction sketches the ideas we have just discussed. Even if you just skim through the paper, it is easy to see what is going on at each stage: first Gödel describes the formal system P (syntax, axioms, proof rules); then he defines the primitive recursive functions and relations; then he shows that xBy is primitive recursive, and argues that the primitive recursive functions and relations are represented in P. He then goes on to prove the incompleteness theorem, as above. In Section 3, he shows that one can take the unprovable assertion to be a sentence in the language of arithmetic. This is the origin of the β-lemma, which is what we also used to handle sequences in showing that the recursive functions are representable in Q. Gödel doesn't go so far as to isolate a minimal set of axioms that suffice, but we now know that Q will do the trick. Finally, in Section 4, he sketches a proof of the second incompleteness theorem.

5.6 The Derivability Conditions for PA

Peano arithmetic, or PA, is the theory extending Q with induction axioms for all formulas. In other words, one adds to Q axioms of the form

(A(0) ∧ ∀x (A(x) → A(x′))) → ∀x A(x)

for every formula A. Notice that this is really a schema, which is to say, infinitely many axioms (and it turns out that PA is not finitely axiomatizable). But since one can effectively determine whether or not a string of symbols is an instance of an induction axiom, the set of axioms for PA is computable.
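To get a feeling for why the induction schema stands for infinitely many axioms, it can help to generate a few instances mechanically. The sketch below is only illustrative and rests on assumptions that are not in the text: a formula A(x) is modeled as a Python function from a term (a string) to the text of the formula with that term plugged in, and the names Formula, induction_instance, self_identity, and zero_plus are hypothetical. It generates instances of the schema; recognizing whether an arbitrary string is an instance would additionally require a parser for formulas, which is not attempted here.

```python
from typing import Callable

# Assumption for this sketch: a formula with one free variable is a function
# that takes a term (as a string) and returns the formula with the term
# substituted for the free variable.
Formula = Callable[[str], str]


def induction_instance(A: Formula, var: str = "x") -> str:
    """Build the induction axiom (A(0) ∧ ∀x (A(x) → A(x′))) → ∀x A(x) as text."""
    base = A("0")           # A(0)
    hypothesis = A(var)     # A(x)
    step = A(f"{var}′")     # A(x′)
    return (
        f"(({base}) ∧ ∀{var} (({hypothesis}) → ({step})))"
        f" → ∀{var} ({hypothesis})"
    )


def self_identity(t: str) -> str:
    """The formula t = t."""
    return f"{t} = {t}"


def zero_plus(t: str) -> str:
    """The formula (0 + t) = t."""
    return f"(0 + {t}) = {t}"


if __name__ == "__main__":
    # Each choice of A yields a different axiom of PA.
    for formula in (self_identity, zero_plus):
        print(induction_instance(formula))
```

Every formula A supplied to induction_instance yields a distinct axiom, which is the sense in which the schema is an infinite but mechanically describable set of axioms.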
PA is a much more robust theory than Q. For example, one can easily prove that addition and multiplication are commutative, using induction in the usual way. In fact, most finitary number-theoretic and combinatorial arguments can be carried out in PA.

Since PA is computably axiomatized, the derivability predicate Prf_PA(x, y) is computable and hence represented in Q (and so, in PA). As before, we will take Prf_PA(x, y) to denote the formula representing the relation. Let Prov_PA(y) be the formula ∃x Prf_PA(x, y), which, intuitively, says "y is provable from the axioms of PA." The reason we need a little bit more than the axioms of Q is that we need to know that the theory we are using is strong enough to derive a few basic facts about this derivability predicate. In fact, what we need are the following facts:

P1. If PA ⊢ A, then PA ⊢ Prov_PA(⌜A⌝).

P2. For all formulas A and B, PA ⊢ Prov_PA(⌜A → B⌝) → (Prov_PA(⌜A⌝) → Prov_PA(⌜B⌝)).

P3. For every formula A, PA ⊢ Prov_PA(⌜A⌝) → Prov_PA(⌜Prov_PA(⌜A⌝)⌝).

The only way to verify that these three properties hold is to describe the formula Prov_PA(y) carefully and use the axioms of PA to describe the relevant formal proofs. Conditions P1 and P2 are easy; it is really condition P3 that requires work. (Think about what kind of work it entails . . . ) Carrying out the details would be tedious and uninteresting, so here we will ask you to take it on faith that PA has the three properties listed above. A reasonable choice of Prov_PA(y) will also satisfy

P4. If PA ⊢ Prov_PA(⌜A⌝), then PA ⊢ A.

But we will not need this fact.

Incidentally, Gödel was lazy in the same way we are being now. At the end of the 1931 paper, he sketches the proof of the second incompleteness theorem, and promises the details in a later paper. He never got around to it; since everyone who understood the argument believed that it could be carried out, he did not need to fill in the details.
5.7 The Second Incompleteness Theorem

How can we express the assertion that PA doesn't prove its own consistency? Saying PA is inconsistent amounts to saying that PA ⊢ 0 = 1. So we can take the consistency statement Con_PA to be the sentence ¬Prov_PA(⌜0 = 1⌝), and then the following theorem does the job:

Theorem 5.8. If PA is consistent, then PA does not derive Con_PA.

It is important to note that the theorem depends on the particular representation of Con_PA (i.e., the particular representation of Prov_PA(y)). All we will use is that the representation of Prov_PA(y) satisfies the three derivability conditions, so the theorem generalizes to any theory with a derivability predicate having these properties.

It is informative to read Gödel's sketch of an argument, since the theorem follows like a good punch line. It goes like this. Let G_PA be the Gödel sentence that we constructed in the proof of Theorem 5.6. We have shown "If PA is consistent, then PA does not derive G_PA." If we formalize this in PA, we have a proof of

Con_PA → ¬Prov_PA(⌜G_PA⌝).

Now suppose PA derives Con_PA. Then it derives ¬Prov_PA(⌜G_PA⌝). But since G_PA is a Gödel sentence, this is equivalent to G_PA. So PA derives G_PA.

But: we know that if PA is consistent, it doesn't derive G_PA! So if PA is consistent, it can't derive Con_PA.

To make the argument more precise, we will let G_PA be the Gödel sentence for PA and use the derivability conditions (P1)–(P3) to show that PA derives Con_PA → G_PA. This will show that PA doesn't derive Con_PA. Here is a sketch of the proof, in PA. (For simplicity, we drop the PA subscripts.)

G ↔ ¬Prov(⌜G⌝)   (5.5)   G is a Gödel sentence
G → ¬Prov(⌜G⌝)   (5.6)   from eq. (5.5)
G → (Prov(⌜G⌝) → ⊥)   (5.7)   from eq. (5.6) by logic
Prov(⌜G → (Prov(⌜G⌝) → ⊥)⌝)   (5.8)   from eq. (5.7) by condition P1
Prov(⌜G⌝) → Prov(⌜(Prov(⌜G⌝) → ⊥)⌝)   (5.9)   from eq. (5.8) by condition P2
Prov(⌜G⌝) → (Prov(⌜Prov(⌜G⌝)⌝) → Prov(⌜⊥⌝))   (5.10)   from eq.
(5.9) by condition P2 and logic
Prov(⌜G⌝) → Prov(⌜Prov(⌜G⌝)⌝)   (5.11)   by P3
Prov(⌜G⌝) → Prov(⌜⊥⌝)   (5.12)   from eq. (5.10) and eq. (5.11) by logic
Con → ¬Prov(⌜G⌝)   (5.13)   contraposition of eq. (5.12) and Con ≡ ¬Prov(⌜⊥⌝)
Con → G   from eq. (5.5) and eq. (5.13) by logic

The use of logic in the above involves just elementary facts from propositional logic, e.g., eq. (5.7) uses ⊢ ¬A ↔ (A → ⊥) and eq. (5.12) uses A → (B → C), A → B ⊢ A → C. The use of condition P2 in eq. (5.9) and eq. (5.10) relies on instances of P2, Prov(⌜A → B⌝) → (Prov(⌜A⌝) → Prov(⌜B⌝)). In the first one, A ≡ G and B ≡ Prov(⌜G⌝) → ⊥; in the second, A ≡ Prov(⌜G⌝) and B ≡ ⊥.

The more abstract version of the second incompleteness theorem is as follows:

Theorem 5.9. Let T be any consistent, axiomatized theory extending Q and let Prov_T(y) be any formula satisfying derivability conditions P1–P3 for T. Then T does not derive Con_T.

The moral of the story is that no "reasonable" consistent theory for mathematics can derive its own consistency statement. Suppose T is a theory of mathematics that includes Q and Hilbert's "finitary" reasoning (whatever that may be). Then, the whole of T cannot derive the consistency statement of T, and so, a fortiori, the finitary fragment can't derive the consistency statement of T either. In that sense, there cannot be a finitary consistency proof for "all of mathematics."

There is some leeway in interpreting the term "finitary," and Gödel, in the 1931 paper, grants the possibility that something we may consider "finitary" may lie outside the kinds of mathematics Hilbert wanted to formalize. But Gödel was being charitable; today, it is hard to see how we might find something that can reasonably be called finitary but is not formalizable in, say, ZFC.

5.8 Löb's Theorem

The Gödel sentence for a theory T is a fixed point of ¬Prov_T(x), i.e., a sentence G such that T ⊢ ¬Prov_T(⌜G⌝) ↔ G.
It is not derivable, because if T ⊢ G, (a) by derivability condition (1), T ⊢ Prov_T(⌜G⌝), and (b) T ⊢ G together with T ⊢ ¬Prov_T(⌜G⌝) ↔ G gives T ⊢ ¬Prov_T(⌜G⌝), and so T would be inconsistent. Now it is natural to ask about the status of a fixed point of Prov_T(x), i.e., a sentence H such that T ⊢ Prov_T(⌜H⌝) ↔ H. If it were derivable, T ⊢ Prov_T(⌜H⌝) by condition (1), but the same conclusion follows if we apply modus ponens to the equivalence above. Hence, we don't get that T is inconsistent, at least not by the same argument as in the case of the Gödel sentence. This of course does not show that T does derive H.

We can make headway on this question if we generalize it a bit. The left-to-right direction of the fixed point equivalence, Prov_T(⌜H⌝) → H, is an instance of a general schema called a reflection principle: Prov_T(⌜A⌝) → A. It is called that because it expresses, in a sense, that T can "reflect" about what it can derive; basically it says, "If T can derive A, then A is true," for any A. This is true for sound theories only, of course, and this suggests that theories will in general not derive every instance of it. So which instances can a theory (strong enough, and satisfying the derivability conditions) derive? Certainly all those where A itself is derivable. And that's it, as the next result shows.

Theorem 5.10. Let T be an axiomatizable theory extending Q, and suppose Prov_T(y) is a formula satisfying conditions P1–P3 from section 5.7. If T derives Prov_T(⌜A⌝) → A, then in fact T derives A.

Put differently, if T ⊬ A, then T ⊬ Prov_T(⌜A⌝) → A. This result is known as Löb's theorem.

The heuristic for the proof of Löb's theorem is a clever proof that Santa Claus exists. (If you don't like that conclusion, you are free to substitute any other conclusion you would like.) Here it is:

1. Let X be the sentence, "If X is true, then Santa Claus exists."
2. Suppose X is true.
3.
Then what it says holds; i.e., we have: if X is true, then Santa Claus exists.
4. Since we are assuming X is true, we can conclude that Santa Claus exists, by modus ponens from (2) and (3).
5. We have succeeded in deriving (4), "Santa Claus exists," from the assumption (2), "X is true." By conditional proof, we have shown: "If X is true, then Santa Claus exists."
6. But this is just the sentence X. So we have shown that X is true.
7. But then, by the argument (2)–(4) above, Santa Claus exists.

A formalization of this idea, replacing "is true" with "is derivable," and "Santa Claus exists" with A, yields the proof of Löb's theorem. The trick is to apply the fixed-point lemma to the formula Prov_T(y) → A. The fixed point of that corresponds to the sentence X in the preceding sketch.

Proof of Theorem 5.10. Suppose A is a sentence such that T derives Prov_T(⌜A⌝) → A. Let B(y) be the formula Prov_T(y) → A, and use the fixed-point lemma to find a sentence D such that T derives D ↔ B(⌜D⌝). Then each of the following is derivable in T:

D ↔ (Prov_T(⌜D⌝) → A)   (5.14)   D is a fixed point of B(y)
D → (Prov_T(⌜D⌝) → A)   (5.15)   from eq. (5.14)
Prov_T(⌜D → (Prov_T(⌜D⌝) → A)⌝)   (5.16)   from eq. (5.15) by condition P1
Prov_T(⌜D⌝) → Prov_T(⌜Prov_T(⌜D⌝) → A⌝)   (5.17)   from eq. (5.16) using condition P2
Prov_T(⌜D⌝) → (Prov_T(⌜Prov_T(⌜D⌝)⌝) → Prov_T(⌜A⌝))   (5.18)   from eq. (5.17) using P2 again
Prov_T(⌜D⌝) → Prov_T(⌜Prov_T(⌜D⌝)⌝)   (5.19)   by derivability condition P3
Prov_T(⌜D⌝) → Prov_T(⌜A⌝)   (5.20)   from eq. (5.18) and eq. (5.19)
Prov_T(⌜A⌝) → A   (5.21)   by assumption of the theorem
Prov_T(⌜D⌝) → A   (5.22)   from eq. (5.20) and eq. (5.21)
(Prov_T(⌜D⌝) → A) → D   (5.23)   from eq. (5.14)
D   (5.24)   from eq. (5.22) and eq. (5.23)
Prov_T(⌜D⌝)   (5.25)   from eq. (5.24) by condition P1
A   from eq. (5.21) and eq.
(5.25) □

With Löb's theorem in hand, there is a short proof of the second incompleteness theorem (for theories having a derivability predicate satisfying conditions P1–P3): if T ⊢ Prov_T(⌜⊥⌝) → ⊥, then T ⊢ ⊥. If T is consistent, T ⊬ ⊥. So, T ⊬ Prov_T(⌜⊥⌝) → ⊥, i.e., T ⊬ Con_T. We can also apply it to show that H, the fixed point of Prov_T(x), is derivable. For since

T ⊢ Prov_T(⌜H⌝) ↔ H

in particular

T ⊢ Prov_T(⌜H⌝) → H

and so by Löb's theorem, T ⊢ H.

5.9 The Undefinability of Truth

The notion of definability depends on having a formal semantics for the language of arithmetic. We have described a set of formulas and sentences in the language of arithmetic. The "intended interpretation" is to read such sentences as making assertions about the natural numbers, and such an assertion can be true or false. Let N be the structure with domain ℕ and the standard interpretation for the symbols in the language of arithmetic. Then N ⊨ A means "A is true in the standard interpretation."

Definition 5.11. A relation R(x_1, . . . , x_k) of natural numbers is definable in N if and only if there is a formula A(x_1, . . . , x_k) in the language of arithmetic such that for every n_1, . . . , n_k, R(n_1, . . . , n_k) if and only if N ⊨ A(n_1, . . . , n_k).

Put differently, a relation is definable in N if and only if it is representable in the theory TA, where TA = {A : N ⊨ A} is the set of true sentences of arithmetic. (If this is not immediately clear to you, you should go back and check the definitions and convince yourself that this is the case.)

Lemma 5.12. Every computable relation is definable in N.

Proof. It is easy to check that the formula representing a relation in Q defines the same relation in N. □

Now one can ask, is the converse also true? That is, is every relation definable in N computable? The answer is no. For example:

Lemma 5.13. The halting relation is definable in N.

Proof.
Let H be the halting relation, i.e.,

H = {⟨e, x⟩ : ∃s T(e, x, s)}.

Let D_T define T in N. Then

H = {⟨e, x⟩ : N ⊨ ∃s D_T(e, x, s)},

so ∃s D_T(z, x, s) defines H in N. □

What about TA itself? Is it definable in arithmetic? That is: is the set {#A# : N ⊨ A} definable in arithmetic? Tarski's theorem answers this in the negative.

Theorem 5.14. The set of true sentences of arithmetic is not definable in arithmetic.

Proof. Suppose D(x) defined it, i.e., N ⊨ A iff N ⊨ D(⌜A⌝). By the fixed-point lemma, there is a sentence A such that Q ⊢ A ↔ ¬D(⌜A⌝), and hence N ⊨ A ↔ ¬D(⌜A⌝). But then N ⊨ A if and only if N ⊨ ¬D(⌜A⌝), which contradicts the fact that D(x) is supposed to define the set of true statements of arithmetic. □

Tarski applied this analysis to a more general philosophical notion of truth. Given any language L, Tarski argued that an adequate notion of truth for L would have to satisfy, for each sentence X,

'X' is true if and only if X.

Tarski's oft-quoted example, for English, is the sentence

'Snow is white' is true if and only if snow is white.

However, for any language strong enough to represent the diagonal function, and any linguistic predicate T(x), we can construct a sentence X satisfying "X if and only if not T('X')." Given that we do not want a truth predicate to declare some sentences to be both true and false, Tarski concluded that one cannot specify a truth predicate for all sentences in a language without, somehow, stepping outside the bounds of the language. In other words, the truth predicate for a language cannot be defined in the language itself.

Summary

The first incompleteness theorem states that for any consistent, axiomatizable theory T that extends Q, there is a sentence G_T such that T ⊬ G_T. G_T is constructed in such a way that G_T, in a roundabout way, says "T does not prove G_T." Since T does not, in fact, prove it, what it says is true.
If N ⊨ T, then T does not prove any false claims, so T ⊬ ¬G_T. Such a sentence is independent or undecidable in T. Gödel's original proof established that G_T is independent on the assumption that T is ω-consistent. Rosser improved the result by finding a different sentence R_T which is neither provable nor refutable in T as long as T is simply consistent.

The construction of G_T is effective: given an axiomatization of T we could, in principle, write down G_T. The "roundabout way" in which G_T states its own unprovability is a special case of a general result, the fixed-point lemma. It states that for any formula B(y) in L_A, there is a sentence A such that Q ⊢ A ↔ B(⌜A⌝). (Here, ⌜A⌝ is the standard numeral for the Gödel number of A, i.e., #A#.) To obtain G_T, we use the formula ¬Prov_T(y) as B(y). We get Prov_T as the culmination of our previous efforts: We know that Prf_T(n, m), which holds if n is the Gödel number of a derivation of the sentence with Gödel number m from T, is primitive recursive. We also know that Q represents all primitive recursive relations, and so there is some formula Prf_T(x, y) that represents Prf_T in Q. The formula Prov_T(y), i.e., ∃x Prf_T(x, y), then expresses provability in T. (It doesn't represent it though: if T ⊢ A, then Q ⊢ Prov_T(⌜A⌝); but if T ⊬ A, then Q does not in general prove ¬Prov_T(⌜A⌝).)

The second incompleteness theorem establishes that T also does not prove the sentence Con_T that expresses that T is consistent, i.e., ¬Prov_T(⌜⊥⌝). The proof of the second incompleteness theorem requires some additional conditions on T, the provability conditions. PA satisfies them, although Q does not. Theories that satisfy the provability conditions also satisfy Löb's theorem: T ⊢ Prov_T(⌜A⌝) → A iff T ⊢ A.

The fixed-point theorem also has another important consequence. We say a property R(n) is definable in L_A if there is a formula A_R(x) such that N ⊨ A_R(n̄) iff R(n) holds.
For instance, the property Prov_T is definable, since the formula Prov_T(y) defines it. The property n has iff it is the Gödel number of a sentence true in N, however, is not definable. This is Tarski's theorem about the undefinability of truth.

Problems

Problem 5.1. Every ω-consistent theory is consistent. Show that the converse does not hold, i.e., that there are consistent but ω-inconsistent theories. Do this by showing that Q ∪ {¬G_Q} is consistent but ω-inconsistent.

Problem 5.2. Show that PA derives G_PA → Con_PA.

Problem 5.3. Let T be a computably axiomatized theory, and let Prov_T be a derivability predicate for T. Consider the following four statements:
1. If T ⊢ A, then T ⊢ Prov_T(⌜A⌝).
2. T ⊢ A → Prov_T(⌜A⌝).
3. If T ⊢ Prov_T(⌜A⌝), then T ⊢ A.
4. T ⊢ Prov_T(⌜A⌝) → A.
Under what conditions is each of these statements true?

Problem 5.4. Show that Q(n) ⇔ n ∈ {#A# : Q ⊢ A} is definable in arithmetic.

CHAPTER 6
Models of Arithmetic

6.1 Introduction

The standard model of arithmetic is the structure N with |N| = ℕ in which 0, ′, +, ×, and < are interpreted as you would expect. That is, 0 is 0, ′ is the successor function, + is interpreted as addition and × as multiplication of the numbers in ℕ. Specifically,

0^N = 0
′^N(n) = n + 1
+^N(n, m) = n + m
×^N(n, m) = nm

Of course, there are structures for L_A that have domains other than ℕ. For instance, we can take M with domain |M| = {a}* (the finite sequences of the single symbol a, i.e., ∅, a, aa, aaa, …), and interpretations

0^M = ∅
′^M(s) = s ⌒ a
+^M(a^n, a^m) = a^{n+m}
×^M(a^n, a^m) = a^{nm}

where a^n is the sequence consisting of n a's.

These two structures are "essentially the same" in the sense that the only difference is the elements of the domains but not how the elements of the domains are related among each other by the interpretation functions. We say that the two structures are isomorphic.

It is an easy consequence of the compactness theorem that any theory true in N also has models that are not isomorphic to N.
Such structures are called non-standard. The interesting thing about them is that while the elements of a standard model (i.e., N, but also all structures isomorphic to it) are exhausted by the values of the standard numerals n̄, i.e.,

|N| = {Val^N(n̄) : n ∈ ℕ},

that isn't the case in non-standard models: if M is non-standard, then there is at least one x ∈ |M| such that x ≠ Val^M(n̄) for all n.

These non-standard elements are pretty neat: they are "infinite natural numbers." But their existence also explains, in a sense, the incompleteness phenomena. Consider an example, e.g., the consistency statement for Peano arithmetic, Con_PA, i.e., ¬∃x Prf_PA(x, ⌜⊥⌝). Since PA neither proves Con_PA nor ¬Con_PA, either can be consistently added to PA. Since PA is consistent, N ⊨ Con_PA, and consequently N ⊭ ¬Con_PA. So N is not a model of PA ∪ {¬Con_PA}, and all its models must be non-standard. Models of PA ∪ {¬Con_PA} must contain some element that serves as the witness that makes ∃x Prf_PA(x, ⌜⊥⌝) true, i.e., a Gödel number of a derivation of a contradiction from PA. Such an element can't be standard, since PA ⊢ ¬Prf_PA(n̄, ⌜⊥⌝) for every n.

6.2 Reducts and Expansions

Often it is useful or necessary to compare languages which have symbols in common, as well as structures for these languages. The most common case is when all the symbols in a language L are also part of a language L′, i.e., L ⊆ L′. An L-structure M can then always be expanded to an L′-structure by adding interpretations of the additional symbols while leaving the interpretations of the common symbols the same. On the other hand, from an L′-structure M′ we can obtain an L-structure simply by "forgetting" the interpretations of the symbols that do not occur in L.

Definition 6.1. Suppose L ⊆ L′, M is an L-structure and M′ is an L′-structure. M is the reduct of M′ to L, and M′ is an expansion of M to L′ iff
1. |M| = |M′|.
2. For every constant symbol c ∈ L, c^M = c^{M′}.
3. For every function symbol f ∈ L, f^M = f^{M′}.
4. For every predicate symbol P ∈ L, P^M = P^{M′}.

Proposition 6.2. If an L-structure M is a reduct of an L′-structure M′, then for all L-sentences A, M ⊨ A iff M′ ⊨ A.

Proof. Exercise. □

Definition 6.3. When we have an L-structure M, and L′ = L ∪ {P} is the expansion of L obtained by adding a single n-place predicate symbol P, and R ⊆ |M|^n is an n-place relation, then we write (M, R) for the expansion M′ of M with P^{M′} = R.

6.3 Isomorphic Structures

First-order structures can be alike in one of two ways. One way in which they can be alike is that they make the same sentences true. We call such structures elementarily equivalent. But structures can be very different and still make the same sentences true; for instance, one can be countable and the other not. This is because there are lots of features of a structure that cannot be expressed in first-order languages, either because the language is not rich enough, or because of fundamental limitations of first-order logic such as the Löwenheim-Skolem theorem. So another, stricter, aspect in which structures can be alike is if they are fundamentally the same, in the sense that they only differ in the objects that make them up, but not in their structural features. A way of making this precise is by the notion of an isomorphism.

Definition 6.4. Given two structures M and M′ for the same language L, we say that M is elementarily equivalent to M′, written M ≡ M′, if and only if for every sentence A of L, M ⊨ A iff M′ ⊨ A.

Definition 6.5. Given two structures M and M′ for the same language L, we say that M is isomorphic to M′, written M ≃ M′, if and only if there is a function h : |M| → |M′| such that:
1. h is injective: if h(x) = h(y) then x = y;
2. h is surjective: for every y ∈ |M′| there is x ∈ |M| such that h(x) = y;
3. for every constant symbol c: h(c^M) = c^{M′};
4. for every n-place predicate symbol P: ⟨a_1, …, a_n⟩ ∈ P^M iff ⟨h(a_1), …, h(a_n)⟩ ∈ P^{M′};
5. for every n-place function symbol f: h(f^M(a_1, …, a_n)) = f^{M′}(h(a_1), …, h(a_n)).

Theorem 6.6. If M ≃ M′ then M ≡ M′.

Proof. Let h be an isomorphism of M onto M′. For any assignment s, h ∘ s is the composition of h and s, i.e., the assignment in M′ such that (h ∘ s)(x) = h(s(x)). By induction on t and A one can prove the stronger claims:
a. h(Val^M_s(t)) = Val^{M′}_{h∘s}(t).
b. M, s ⊨ A iff M′, h ∘ s ⊨ A.

The first is proved by induction on the complexity of t.
1. If t ≡ c, then Val^M_s(c) = c^M and Val^{M′}_{h∘s}(c) = c^{M′}. Thus, h(Val^M_s(t)) = h(c^M) = c^{M′} (by (3) of Definition 6.5) = Val^{M′}_{h∘s}(t).
2. If t ≡ x, then Val^M_s(x) = s(x) and Val^{M′}_{h∘s}(x) = h(s(x)). Thus, h(Val^M_s(x)) = h(s(x)) = Val^{M′}_{h∘s}(x).
3. If t ≡ f(t_1, …, t_n), then Val^M_s(t) = f^M(Val^M_s(t_1), …, Val^M_s(t_n)) and Val^{M′}_{h∘s}(t) = f^{M′}(Val^{M′}_{h∘s}(t_1), …, Val^{M′}_{h∘s}(t_n)). The induction hypothesis is that for each i, h(Val^M_s(t_i)) = Val^{M′}_{h∘s}(t_i). So,

$$\begin{aligned}
h(\mathrm{Val}^{M}_{s}(t)) &= h(f^{M}(\mathrm{Val}^{M}_{s}(t_1), \dots, \mathrm{Val}^{M}_{s}(t_n))) \\
&= f^{M'}(h(\mathrm{Val}^{M}_{s}(t_1)), \dots, h(\mathrm{Val}^{M}_{s}(t_n))) && (6.1) \\
&= f^{M'}(\mathrm{Val}^{M'}_{h \circ s}(t_1), \dots, \mathrm{Val}^{M'}_{h \circ s}(t_n)) && (6.2) \\
&= \mathrm{Val}^{M'}_{h \circ s}(t)
\end{aligned}$$

Here, eq. (6.1) follows by (5) of Definition 6.5 and eq. (6.2) by induction hypothesis.

Part (b) is left as an exercise.

If A is a sentence, the assignments s and h ∘ s are irrelevant, and we have M ⊨ A iff M′ ⊨ A. □

Definition 6.7. An automorphism of a structure M is an isomorphism of M onto itself.

6.4 The Theory of a Structure

Every structure M makes some sentences true, and some false. The set of all the sentences it makes true is called its theory. That set is in fact a theory, since anything it entails must be true in all its models, including M.

Definition 6.8. Given a structure M, the theory of M is the set Th(M) of sentences that are true in M, i.e., Th(M) = {A : M ⊨ A}.
We also use the term "theory" informally to refer to sets of sentences having an intended interpretation, whether deductively closed or not.

Proposition 6.9. For any M, Th(M) is complete.

Proof. For any sentence A either M ⊨ A or M ⊨ ¬A, so either A ∈ Th(M) or ¬A ∈ Th(M). □

Proposition 6.10. If N ⊨ A for every A ∈ Th(M), then M ≡ N.

Proof. Since N ⊨ A for all A ∈ Th(M), Th(M) ⊆ Th(N). If N ⊨ A, then N ⊭ ¬A, so ¬A ∉ Th(M). Since Th(M) is complete, A ∈ Th(M). So, Th(N) ⊆ Th(M), and we have M ≡ N. □

Remark 1. Consider R = ⟨ℝ, <⟩, the structure whose domain is the set ℝ of the real numbers, in the language comprising only a 2-place predicate symbol interpreted as the < relation over the reals. Clearly R is uncountable; however, since Th(R) is obviously consistent, by the Löwenheim-Skolem theorem it has a countable model, say S, and by Proposition 6.10, R ≡ S. Moreover, since R and S are not isomorphic (their domains have different cardinalities), this shows that the converse of Theorem 6.6 fails in general.

6.5 Standard Models of Arithmetic

The language of arithmetic L_A is obviously intended to be about numbers, specifically, about natural numbers. So, "the" standard model N is special: it is the model we want to talk about. But in logic, we are often just interested in structural properties, and any two structures that are isomorphic share those. So we can be a bit more liberal, and consider any structure that is isomorphic to N "standard."

Definition 6.11. A structure for L_A is standard if it is isomorphic to N.

Proposition 6.12. If a structure M is standard, its domain is the set of values of the standard numerals, i.e.,

|M| = {Val^M(n̄) : n ∈ ℕ}

Proof. Clearly, every Val^M(n̄) ∈ |M|. We just have to show that every x ∈ |M| is equal to Val^M(n̄) for some n. Since M is standard, it is isomorphic to N. Suppose g : ℕ → |M| is an isomorphism. Then g(n) = g(Val^N(n̄)) = Val^M(n̄). But for every x ∈ |M|, there is an n ∈ ℕ such that g(n) = x, since g is surjective.
□

If a structure M for L_A is standard, the elements of its domain can all be named by the standard numerals 0̄, 1̄, 2̄, …, i.e., the terms 0, 0′, 0′′, etc. Of course, this does not mean that the elements of |M| are the numbers, just that we can pick them out the same way we can pick out the numbers in |N|.

Proposition 6.13. If M ⊨ Q, and |M| = {Val^M(n̄) : n ∈ ℕ}, then M is standard.

Proof. We have to show that M is isomorphic to N. Consider the function g : ℕ → |M| defined by g(n) = Val^M(n̄). By the hypothesis, g is surjective. It is also injective: Q ⊢ n̄ ≠ m̄ whenever n ≠ m. Thus, since M ⊨ Q, M ⊨ n̄ ≠ m̄ whenever n ≠ m. Thus, if n ≠ m, then Val^M(n̄) ≠ Val^M(m̄), i.e., g(n) ≠ g(m).

We also have to verify that g is an isomorphism.
1. We have g(0^N) = g(0) since 0^N = 0. By definition of g, g(0) = Val^M(0̄). But 0̄ is just the constant symbol 0, and the value of a term which happens to be a constant symbol is given by what the structure assigns to that constant symbol, i.e., Val^M(0) = 0^M. So we have g(0^N) = 0^M as required.
2. g(′^N(n)) = g(n + 1), since ′ in N is the successor function on ℕ. Then, g(n + 1) = Val^M($\overline{n+1}$) by definition of g. But $\overline{n+1}$ is the same term as n̄′, so Val^M($\overline{n+1}$) = Val^M(n̄′). By the definition of the value function, this is = ′^M(Val^M(n̄)). Since Val^M(n̄) = g(n) we get g(′^N(n)) = ′^M(g(n)).
3. g(+^N(n, m)) = g(n + m), since + in N is the addition function on ℕ. Then, g(n + m) = Val^M($\overline{n+m}$) by definition of g. But Q ⊢ $\overline{n+m}$ = (n̄ + m̄), so Val^M($\overline{n+m}$) = Val^M(n̄ + m̄). By the definition of the value function, this is = +^M(Val^M(n̄), Val^M(m̄)). Since Val^M(n̄) = g(n) and Val^M(m̄) = g(m), we get g(+^N(n, m)) = +^M(g(n), g(m)).
4. g(×^N(n, m)) = ×^M(g(n), g(m)): Exercise.
5. ⟨n, m⟩ ∈ <^N iff n < m. If n < m, then Q ⊢ n̄ < m̄, and also M ⊨ n̄ < m̄. Thus ⟨Val^M(n̄), Val^M(m̄)⟩ ∈ <^M, i.e., ⟨g(n), g(m)⟩ ∈ <^M. If n ≮ m, then Q ⊢ ¬n̄ < m̄, and consequently M ⊭ n̄ < m̄. Thus, as before, ⟨g(n), g(m)⟩ ∉ <^M.
Together, we get: ⟨n, m⟩ ∈ <^N iff ⟨g(n), g(m)⟩ ∈ <^M. □

The function g is the most obvious way of defining a mapping from ℕ to the domain of any other structure M for L_A, since every such M contains elements named by 0̄, 1̄, 2̄, etc. So it isn't surprising that if M makes at least some basic statements about the n̄'s true in the same way that N does, and g is also bijective, then g will turn out to be an isomorphism. In fact, if |M| contains no elements other than what the n̄'s name, it's the only one.

Proposition 6.14. If M is standard, then g from the proof of Proposition 6.13 is the only isomorphism from N to M.

Proof. Suppose h : ℕ → |M| is an isomorphism between N and M. We show that g = h by induction on n. If n = 0, then g(0) = 0^M by definition of g. But since h is an isomorphism, h(0) = h(0^N) = 0^M, so g(0) = h(0).

Now consider the case for n + 1. We have

$$\begin{aligned}
g(n + 1) &= \mathrm{Val}^{M}(\overline{n+1}) && \text{by definition of } g \\
&= \mathrm{Val}^{M}(\overline{n}') && \text{since } \overline{n+1} \equiv \overline{n}' \\
&= {\prime}^{M}(\mathrm{Val}^{M}(\overline{n})) && \text{by definition of } \mathrm{Val}^{M}(t') \\
&= {\prime}^{M}(g(n)) && \text{by definition of } g \\
&= {\prime}^{M}(h(n)) && \text{by induction hypothesis} \\
&= h({\prime}^{N}(n)) && \text{since } h \text{ is an isomorphism} \\
&= h(n + 1)
\end{aligned}$$

□

For any countably infinite set M, there's a bijection between ℕ and M, so every such set M is potentially the domain of a standard model M. In fact, once you pick an object z ∈ M and a suitable function s as 0^M and ′^M, the interpretations of +, ×, and < are already fixed. Only functions s : M → M \ {z} that are both injective and surjective are suitable in a standard model as ′^M. The range of s cannot contain z, since otherwise ∀x 0 ≠ x′ would be false. That sentence is true in N, and so M also has to make it true. The function s has to be injective, since the successor function ′^N in N is, and that ′^N is injective is expressed by a sentence true in N. It has to be surjective because otherwise there would be some x ∈ M \ {z} not in the range of s, i.e., the sentence ∀x (x = 0 ∨ ∃y y′ = x) would be false in M, but it is true in N.
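The map g of Proposition 6.13 can be tried out on the structure with domain {a}* from Section 6.1. The following Python sketch is only an illustration under stated assumptions: strings of the letter a stand for the elements of that structure, the function names (`zero_M`, `succ_M`, `add_M`, `mul_M`, `less_M`, `g`, `check_isomorphism`) are hypothetical, and the interpretation of < by comparing lengths is an assumption, since the text does not specify it. A finite check like this cannot prove that g is an isomorphism; it only confirms the defining conditions on the arguments tested.

```python
# Sketch: the structure M over {a}* and the map g(n) = Val^M(numeral of n).
# All names are illustrative; less_M is an assumed interpretation of <.

def zero_M():
    return ""                 # the empty sequence interprets 0


def succ_M(s):
    return s + "a"            # successor appends one a


def add_M(s, t):
    return s + t              # a^n followed by a^m is a^(n+m)


def mul_M(s, t):
    return s * len(t)         # a^n repeated m times is a^(nm)


def less_M(s, t):
    return len(s) < len(t)    # assumption: shorter sequences are "less"


def g(n):
    """Value in M of the numeral for n, i.e., 0 followed by n strokes."""
    value = zero_M()
    for _ in range(n):
        value = succ_M(value)
    return value


def check_isomorphism(bound):
    """Check the isomorphism conditions for g on all n, m < bound."""
    if g(0) != zero_M():
        return False
    for n in range(bound):
        if g(n + 1) != succ_M(g(n)):
            return False
        for m in range(bound):
            if g(n + m) != add_M(g(n), g(m)):
                return False
            if g(n * m) != mul_M(g(n), g(m)):
                return False
            if (n < m) != less_M(g(n), g(m)):
                return False
    return True


def is_injective_below(bound):
    """Check that g takes distinct values on 0, ..., bound - 1."""
    values = [g(n) for n in range(bound)]
    return len(set(values)) == len(values)
```

Calling `check_isomorphism(12)` and `is_injective_below(50)` exercises conditions (3) to (5) of Definition 6.5 and injectivity on an initial segment of ℕ; surjectivity holds here because every finite sequence of a's is g(n) for n its length.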
6.6 Non-Standard Models

We call a structure for L_A standard if it is isomorphic to N. If a structure isn't isomorphic to N, it is called non-standard.

Definition 6.15. A structure M for L_A is non-standard if it is not isomorphic to N. The elements x ∈ |M| which are equal to Val^M(n̄) for some n ∈ ℕ are called standard numbers (of M), and those not, non-standard numbers.

By Proposition 6.12, any standard structure for L_A contains only standard elements. Conversely, by Proposition 6.13, a non-standard structure that is a model of Q must contain at least one non-standard element. In fact, the existence of a non-standard element guarantees that the structure is non-standard.

Proposition 6.16. If a structure M for L_A contains a non-standard number, M is non-standard.

Proof. Suppose not, i.e., suppose M is standard but contains a non-standard number x. Let g : ℕ → |M| be an isomorphism. It is easy to see (by induction on n) that g(Val^N(n̄)) = Val^M(n̄). In other words, g maps standard numbers of N to standard numbers of M. If M contains a non-standard number, g cannot be surjective, contrary to hypothesis. □

It is easy enough to specify non-standard structures for L_A. For instance, take the structure with domain ℤ and interpret all non-logical symbols as usual. Since negative numbers are not values of n̄ for any n, this structure is non-standard. Of course, it will not be a model of arithmetic in the sense that it makes the same sentences true as N. For instance, ∀x x′ ≠ 0 is false. However, we can prove that non-standard models of arithmetic exist easily enough, using the compactness theorem.

Proposition 6.17. Let TA = {A : N ⊨ A} be the theory of N. TA has a countable non-standard model.

Proof. Expand L_A by a new constant symbol c and consider the set of sentences

Γ = TA ∪ {c ≠ 0̄, c ≠ 1̄, c ≠ 2̄, …}

Any model M_c of Γ would contain an element x = c^{M_c} which is non-standard, since x ≠ Val^{M_c}(n̄) for all n ∈ ℕ. Also, obviously, M_c ⊨ TA, since TA ⊆ Γ.
If we turn M_c into a structure M for L_A simply by forgetting about c, its domain still contains the non-standard x, and also M ⊨ TA. The latter is guaranteed since c does not occur in TA. So, it suffices to show that Γ has a model.

We use the compactness theorem to show that Γ has a model: if every finite subset of Γ is satisfiable, so is Γ. Consider any finite subset Γ_0 ⊆ Γ. Γ_0 includes some sentences of TA and some of the form c ≠ n̄, but only finitely many. Suppose k is the largest number so that c ≠ k̄ ∈ Γ_0. Define N_k by expanding N to include the interpretation c^{N_k} = k + 1. N_k ⊨ Γ_0: if A ∈ TA, N_k ⊨ A since N_k is just like N in all respects except c, and c does not occur in A. And N_k ⊨ c ≠ n̄, since n ≤ k, and Val^{N_k}(c) = k + 1. Thus, every finite subset of Γ is satisfiable. □

6.7 Models of Q

We know that there are non-standard structures that make the same sentences true as N does, i.e., that are models of TA. Since N ⊨ Q, any model of TA is also a model of Q. Q is much weaker than TA, e.g., Q ⊬ ∀x ∀y (x + y) = (y + x). Weaker theories are easier to satisfy: they have more models. E.g., Q has models which make ∀x ∀y (x + y) = (y + x) false, but those cannot also be models of TA, or PA for that matter. Models of Q are also relatively simple: we can specify them explicitly.

Example 6.18. Consider the structure K with domain |K| = ℕ ∪ {a} and interpretations

$$0^{K} = 0$$
$${\prime}^{K}(x) = \begin{cases} x + 1 & \text{if } x \in \mathbb{N} \\ a & \text{if } x = a \end{cases}$$
$$+^{K}(x, y) = \begin{cases} x + y & \text{if } x, y \in \mathbb{N} \\ a & \text{otherwise} \end{cases}$$
$$\times^{K}(x, y) = \begin{cases} xy & \text{if } x, y \in \mathbb{N} \\ 0 & \text{if } x = a \text{ and } y = 0 \\ a & \text{otherwise} \end{cases}$$
$${<}^{K} = \{\langle x, y\rangle : x, y \in \mathbb{N} \text{ and } x < y\} \cup \{\langle x, a\rangle : x \in |K|\}$$

(The middle clause for ×^K is needed so that K satisfies ∀x (x × 0) = 0.)

To show that K ⊨ Q we have to verify that all axioms of Q are true in K. For convenience, let's write x∗ for ′^K(x) (the "successor" of x in K), x ⊕ y for +^K(x, y) (the "sum" of x and y in K), x ⊗ y for ×^K(x, y) (the "product" of x and y in K), and x ≼ y for ⟨x, y⟩ ∈ <^K.
With these abbreviations, we can give the operations in K more perspicuously as follows (n and m range over ℕ):

x | x∗
n | n + 1
a | a

x ⊕ y | m | a
n | n + m | a
a | a | a

x ⊗ y | 0 | m > 0 | a
n | 0 | nm | a
a | 0 | a | a

We have n ≼ m iff n < m for n, m ∈ ℕ and x ≼ a for all x ∈ |K|.

K ⊨ ∀x ∀y (x′ = y′ → x = y) since ∗ is injective. K ⊨ ∀x 0 ≠ x′ since 0 is not a ∗-successor in K. K ⊨ ∀x (x = 0 ∨ ∃y x = y′) since for every n > 0, n = (n − 1)∗, and a = a∗.

K ⊨ ∀x (x + 0) = x since n ⊕ 0 = n + 0 = n, and a ⊕ 0 = a by definition of ⊕. K ⊨ ∀x ∀y (x + y′) = (x + y)′ is a bit trickier. If n, m are both standard, we have:

(n ⊕ m∗) = (n + (m + 1)) = (n + m) + 1 = (n ⊕ m)∗

since ⊕ and ∗ agree with + and ′ on standard numbers. Now suppose x ∈ |K|. Then

(x ⊕ a∗) = (x ⊕ a) = a = a∗ = (x ⊕ a)∗

The remaining case is if y ∈ |K| but x = a. Here we also have to distinguish cases according to whether y = n is standard or y = a:

(a ⊕ n∗) = (a ⊕ (n + 1)) = a = a∗ = (a ⊕ n)∗
(a ⊕ a∗) = (a ⊕ a) = a = a∗ = (a ⊕ a)∗

This is of course a bit more detailed than needed. For instance, since a ⊕ z = a whatever z is, we can immediately conclude a ⊕ a∗ = a. The remaining axioms can be verified the same way.

K is thus a model of Q. Its "addition" ⊕ is also commutative. But there are other sentences true in N but false in K, and vice versa. For instance, a ≼ a, so K ⊨ ∃x x < x and K ⊭ ∀x ¬x < x. This shows that Q ⊬ ∀x ¬x < x.

Example 6.19. Consider the structure L with domain |L| = ℕ ∪ {a, b} and interpretations ′^L = ∗, +^L = ⊕ given by

x | x∗
n | n + 1
a | a
b | b

x ⊕ y | m | a | b
n | n + m | b | a
a | a | b | a
b | b | b | a

Since ∗ is injective, 0 is not in its range, and every x ∈ |L| other than 0 is, axioms Q1–Q3 are true in L. For any x, x ⊕ 0 = x, so Q4 is true as well. For Q5, consider x ⊕ y∗ and (x ⊕ y)∗. They are equal if x and y are both standard, since then ∗ and ⊕ agree with ′ and +. If x is non-standard, and y is standard, we have x ⊕ y∗ = x = x∗ = (x ⊕ y)∗.
If x and y are both non-standard, we have four cases:

a ⊕ a∗ = b = b∗ = (a ⊕ a)∗
b ⊕ b∗ = a = a∗ = (b ⊕ b)∗
b ⊕ a∗ = b = b∗ = (b ⊕ a)∗
a ⊕ b∗ = a = a∗ = (a ⊕ b)∗

If x is standard, but y is non-standard, we have

n ⊕ a∗ = n ⊕ a = b = b∗ = (n ⊕ a)∗
n ⊕ b∗ = n ⊕ b = a = a∗ = (n ⊕ b)∗

So, L ⊨ Q5. However, a ⊕ 0 ≠ 0 ⊕ a, so L ⊭ ∀x ∀y (x + y) = (y + x).

We've explicitly constructed models of Q in which the non-standard elements live "beyond" the standard elements. In fact, that much is required by the axioms. A non-standard element x cannot be ≼ 0, since Q ⊢ ∀x ¬x < 0 (see Lemma 4.22). Also, for every n, Q ⊢ ∀x (x < n̄′ → (x = 0̄ ∨ x = 1̄ ∨ ⋯ ∨ x = n̄)) (Lemma 4.23), so we can't have a ≼ n for any n > 0.

6.8 Models of PA

Any non-standard model of TA is also one of PA. We know that non-standard models of TA and hence of PA exist. We also know that such non-standard models contain non-standard "numbers," i.e., elements of the domain that are "beyond" all the standard "numbers." But how are they arranged? How many are there? We've seen that models of the weaker theory Q can contain as few as a single non-standard number. But these simple structures are not models of PA or TA.

The key to understanding the structure of models of PA or TA is to see what facts are derivable in these theories. For instance, already PA proves that ∀x x ≠ x′ and ∀x ∀y (x + y) = (y + x), so this rules out simple structures (in which these sentences are false) as models of PA.

Suppose M is a model of PA. Then if PA ⊢ A, M ⊨ A. Let's again use 𝐳 for 0^M, ∗ for ′^M, ⊕ for +^M, ⊗ for ×^M, and ≼ for <^M. Any sentence A then states some condition about 𝐳, ∗, ⊕, ⊗, and ≼, and if M ⊨ A that condition must be satisfied. For instance, if M ⊨ Q1, i.e., M ⊨ ∀x ∀y (x′ = y′ → x = y), then ∗ must be injective.

Proposition 6.20. In M, ≼ is a linear strict order, i.e., it satisfies:
1. Not x ≼ x for any x ∈ |M|.
2. If x ≼ y and y ≼ z then x ≼ z.
3. For any x ≠ y, x ≼ y or y ≼ x.

Proof. PA proves:
1. ∀x ¬x < x
2. ∀x ∀y ∀z ((x < y ∧ y < z) → x < z)
3. ∀x ∀y ((x < y ∨ y < x) ∨ x = y) □

Proposition 6.21. 𝐳 is the least element of |M| in the ≼-ordering. For any x, x ≼ x∗, and x∗ is the ≼-least element with that property. For any x ≠ 𝐳, there is a unique y such that y∗ = x. (We call y the "predecessor" of x in M, and denote it by ∗x.)

Proof. Exercise. □

Proposition 6.22. All standard elements of M are less than (according to ≼) all non-standard elements.

Proof. We'll use n as short for Val^M(n̄), a standard element of M. Already Q proves that, for any n ∈ ℕ, ∀x (x < n̄′ → (x = 0̄ ∨ x = 1̄ ∨ ⋯ ∨ x = n̄)). There are no elements that are ≼ 𝐳. So if n is standard and x is non-standard, we cannot have x ≼ n. By definition, a non-standard element is one that isn't Val^M(n̄) for any n ∈ ℕ, so x ≠ n as well. Since ≼ is a linear order, we must have n ≼ x. □

Proposition 6.23. Every non-standard element x of |M| is an element of the subset

… ≼ ∗∗∗x ≼ ∗∗x ≼ ∗x ≼ x ≼ x∗ ≼ x∗∗ ≼ x∗∗∗ ≼ …

We call this subset the block of x and write it as [x]. It has no least and no greatest element. It can be characterized as the set of those y ∈ |M| such that, for some standard n, x ⊕ n = y or y ⊕ n = x.

Proof. Clearly, such a set [x] always exists since every element y of |M| has a unique successor y∗ and every y ≠ 𝐳 has a unique predecessor ∗y. For successive elements y, y∗ we have y ≼ y∗ and y∗ is the ≼-least element of |M| such that y is ≼-less than it. Since always ∗y ≼ y and y ≼ y∗, [x] has no least or greatest element. If y ∈ [x] then x ∈ [y], for then either y∗…∗ = x or x∗…∗ = y. If y∗…∗ = x (with n ∗'s), then y ⊕ n = x and conversely, since PA ⊢ ∀x x′…′ = (x + n̄) (if n is the number of ′'s). □

Proposition 6.24. If [x] ≠ [y] and x ≼ y, then for any u ∈ [x] and any v ∈ [y], u ≼ v.

Proof. Note that PA ⊢ ∀x ∀y (x < y → (x′ < y ∨ x′ = y)).
Thus, if u ≼ v, we also have u ⊕ n∗ ≼ v for any n if [u] ≠ [v].

Any u ∈ [x] is ≼ y: x ≼ y by assumption. If u ≼ x, u ≼ y by transitivity. And if x ≼ u but u ∈ [x], we have u = x ⊕ n∗ for some n, and so u ≼ y by the fact just proved.

Now suppose that v ∈ [y] is ≼ y, i.e., v ⊕ m∗ = y for some standard m. This rules out v ≼ x, otherwise y = v ⊕ m∗ ≼ x. Clearly also, x ≠ v, otherwise x ⊕ m∗ = v ⊕ m∗ = y and we would have [x] = [y]. So, x ≼ v. But then also x ⊕ n∗ ≼ v for any n. Hence, if x ≼ u and u ∈ [x], we have u ≼ v. If u ≼ x then u ≼ v by transitivity.

Lastly, if y ≼ v, u ≼ v since, as we've shown, u ≼ y and y ≼ v. □

Corollary 6.25. If [x] ≠ [y], [x] ∩ [y] = ∅.

Proof. Suppose z ∈ [x] and x ≼ y. Then z ≼ u for all u ∈ [y]. If z ∈ [y], we would have z ≼ z. Similarly if y ≼ x. □

This means that the blocks themselves can be ordered in a way that respects ≼: for distinct blocks, [x] ≼ [y] iff x ≼ y, or, equivalently, if u ≼ v for any u ∈ [x] and v ∈ [y]. Clearly, the standard block [0] is the least block. It intersects with no non-standard block, and no two non-standard blocks intersect either. Specifically, you cannot "reach" a different block by taking repeated successors or predecessors.

Proposition 6.26. If x and y are non-standard, then x ≼ x ⊕ y and x ⊕ y ∉ [x].

Proof. If y is non-standard, then y ≠ 𝐳. PA ⊢ ∀x ∀y (y ≠ 0 → x < (x + y)), so x ≼ x ⊕ y. Now suppose x ⊕ y ∈ [x]. Since x ≼ x ⊕ y, we would have x ⊕ n∗ = x ⊕ y for some standard n. But PA ⊢ ∀x ∀y ∀z ((x + y) = (x + z) → y = z) (the cancellation law for addition). This would mean y = n∗ for some standard n; but y is assumed to be non-standard. □

Proposition 6.27. There is no least non-standard block.

Proof. PA ⊢ ∀x ∃y ((y + y) = x ∨ (y + y)′ = x), i.e., that every x is divisible by 2 (possibly with remainder 1). If x is non-standard, so is y. By the preceding proposition, y ≼ y ⊕ y and y ⊕ y ∉ [y]. Then also y ≼ (y ⊕ y)∗ and (y ⊕ y)∗ ∉ [y]. But x = y ⊕ y or x = (y ⊕ y)∗, so y ≼ x and y ∉ [x]. □

Proposition 6.28. There is no largest block.
Proof. Exercise. □

Proposition 6.29. The ordering of the blocks is dense. That is, if x ≼ y and [x] ≠ [y], then there is a block [z] distinct from both that is between them.

Proof. Suppose x ≼ y. As before, x ⊕ y is divisible by two (possibly with remainder): there is a z ∈ |M| such that either x ⊕ y = z ⊕ z or x ⊕ y = (z ⊕ z)∗. The element z is the "average" of x and y, and x ≼ z and z ≼ y. □

In a countable non-standard model, the non-standard blocks are therefore ordered like the rationals: they form a countably infinite dense linear ordering without endpoints. One can show that any two such countably infinite orderings are isomorphic. It follows that for any two countable non-standard models M_1 and M_2 of true arithmetic, their reducts to the language containing < and = only are isomorphic. Indeed, an isomorphism h can be defined as follows: the standard parts of M_1 and M_2 are isomorphic to the standard model N and hence to each other. The blocks making up the non-standard part are themselves ordered like the rationals and therefore isomorphic; an isomorphism of the blocks can be extended to an isomorphism within the blocks by matching up arbitrary elements in each, and then taking the image of the successor of x in M_1 to be the successor of the image of x in M_2. Note that it does not follow that M_1 and M_2 are isomorphic in the full language of arithmetic (indeed, isomorphism is always relative to a language), as there are non-isomorphic ways to define addition and multiplication over |M_1| and |M_2|. (This also follows from a famous theorem due to Vaught that the number of countable models of a complete theory cannot be 2.)

6.9 Computable Models of Arithmetic

The standard model N has two nice features. Its domain is the natural numbers ℕ, i.e., its elements are just the kinds of things we want to talk about using the language of arithmetic, and the standard numeral n̄ actually picks out n.
The other nice feature is that the interpretations of the non-logical symbols of L_A are all computable. The successor, addition, and multiplication functions which serve as ′^N, +^N, and ×^N are computable functions of numbers. (Computable by Turing machines, or definable by primitive recursion, say.) And the less-than relation on ℕ, i.e., <^N, is decidable.

Non-standard models of arithmetical theories such as Q and PA must contain non-standard elements. Thus their domains typically include elements in addition to ℕ. However, any countable structure can be built on any countably infinite set, including ℕ. So there are also non-standard models with domain ℕ. In such models M, of course, at least some numbers cannot play the roles they usually play, since some k must be different from Val^M(n̄) for all n ∈ ℕ.

Definition 6.30. A structure M for L_A is computable iff |M| = ℕ and ′^M, +^M, ×^M are computable functions and <^M is a decidable relation.

Example 6.31. Recall the structure K from Example 6.18. Its domain was |K| = ℕ ∪ {a} and interpretations

$$0^{K} = 0$$
$${\prime}^{K}(x) = \begin{cases} x + 1 & \text{if } x \in \mathbb{N} \\ a & \text{if } x = a \end{cases}$$
$$+^{K}(x, y) = \begin{cases} x + y & \text{if } x, y \in \mathbb{N} \\ a & \text{otherwise} \end{cases}$$
$$\times^{K}(x, y) = \begin{cases} xy & \text{if } x, y \in \mathbb{N} \\ 0 & \text{if } x = a \text{ and } y = 0 \\ a & \text{otherwise} \end{cases}$$
$${<}^{K} = \{\langle x, y\rangle : x, y \in \mathbb{N} \text{ and } x < y\} \cup \{\langle x, a\rangle : x \in |K|\}$$

But |K| is countably infinite and so is equinumerous with ℕ. For instance, g : ℕ → |K| with g(0) = a and g(n) = n − 1 for n > 0 is a bijection. We can turn it into an isomorphism between a new model K′ of Q and K. In K′, we have to assign different functions and relations to the symbols of L_A, since different elements of ℕ play the roles of standard and non-standard numbers.

Specifically, 0 now plays the role of a, not of the smallest standard number. The smallest standard number is now 1. So we assign 0^{K′} = 1. The successor function is also different now: given a standard number, i.e., an n > 0, it still returns n + 1. But 0 now plays the role of a, which is its own successor. So ′^{K′}(0) = 0.
Since each n > 0 now plays the role of the standard number n − 1, for addition and multiplication we likewise have

$$+^{K'}(x, y) = \begin{cases} x + y - 1 & \text{if } x, y > 0 \\ 0 & \text{otherwise} \end{cases}$$
$$\times^{K'}(x, y) = \begin{cases} (x - 1)(y - 1) + 1 & \text{if } x, y > 0 \\ 1 & \text{if } x = 0 \text{ and } y = 1 \\ 0 & \text{otherwise} \end{cases}$$

And we have ⟨x, y⟩ ∈ <^{K′} iff x < y and x > 0 and y > 0, or if y = 0. All of these functions are computable functions of natural numbers and <^{K′} is a decidable relation on ℕ, but they are not the same functions as successor, addition, and multiplication on ℕ, and <^{K′} is not the same relation as < on ℕ.

This example shows that Q has computable non-standard models with domain ℕ. However, the following result shows that this is not true for models of PA (and thus also for models of TA).

Theorem 6.32 (Tennenbaum's Theorem). N is, up to isomorphism, the only computable model of PA.

Summary

A model of arithmetic is a structure for the language L_A of arithmetic. There is one distinguished such model, the standard model N, with |N| = ℕ and interpretations of 0, ′, +, ×, and < given by 0, the successor, addition, and multiplication functions on ℕ, and the less-than relation. N is a model of the theories Q and PA.

More generally, a structure for L_A is called standard iff it is isomorphic to N. Two structures are isomorphic if there is an isomorphism between them, i.e., a bijective function which preserves the interpretations of constant symbols, function symbols, and predicate symbols. By the isomorphism theorem, isomorphic structures are elementarily equivalent, i.e., they make the same sentences true. In standard models, the domain is just the set of values of all the numerals n̄.

Models of Q and PA that are not isomorphic to N are called non-standard. In non-standard models, the domain is not exhausted by the values of the numerals. An element x ∈ |M| where x ≠ Val^M(n̄) for all n ∈ ℕ is called a non-standard element of M. If M ⊨ Q, non-standard elements must obey the axioms of Q, e.g., they have unique successors, they can be added and multiplied, and compared using <.
The standard elements of M are all less than (according to <^M) all the non-standard elements. Non-standard models exist because of the compactness theorem, and for Q they can relatively easily be given explicitly. Such models can be used to show that, e.g., Q is not strong enough to prove certain sentences, e.g., Q ⊬ ∀x ∀y (x + y) = (y + x). This is done by defining a non-standard M in which non-standard elements don't obey the law of commutativity. Non-standard models of PA cannot be so easily specified explicitly. By showing that PA proves certain sentences, we can investigate the structure of the non-standard part of a non-standard model of PA. If a non-standard model M of PA is countable, every non-standard element is part of a "block" of non-standard elements which are ordered like ℤ by <^M. These blocks themselves are arranged like ℚ, i.e., there is no smallest or largest block, and there is always a block in between any two blocks.

Any countably infinite model is isomorphic to one with domain ℕ. If the interpretations of ′, +, ×, and < in such a model are computable, we say it is a computable model. The standard model N is computable, since the successor, addition, and multiplication functions and the less-than relation on ℕ are computable. It is possible to define computable non-standard models of Q, but N is, up to isomorphism, the only computable model of PA. This is Tennenbaum's Theorem.

Problems

Problem 6.1. Prove Proposition 6.2.

Problem 6.2. Carry out the proof of (b) of Theorem 6.6 in detail. Make sure to note where each of the five properties characterizing isomorphisms of Definition 6.5 is used.

Problem 6.3. Show that for any structure M, if X is a definable subset of M, and h is an automorphism of M, then X = {h(x) : x ∈ X} (i.e., X is fixed under h).

Problem 6.4. Show that the converse of Proposition 6.12 is false, i.e., give an example of a structure M with |M| = {Val^M(n̄) : n ∈ ℕ} that is not isomorphic to N.

Problem 6.5.
Recall that Q contains the axioms

∀x ∀y (x′ = y′ → x = y) (Q1)
∀x 0 ≠ x′ (Q2)
∀x (x = 0 ∨ ∃y x = y′) (Q3)

Give structures M_1, M_2, M_3 such that

1. M_1 ⊨ Q1, M_1 ⊨ Q2, M_1 ⊭ Q3;
2. M_2 ⊨ Q1, M_2 ⊭ Q2, M_2 ⊨ Q3; and
3. M_3 ⊭ Q1, M_3 ⊨ Q2, M_3 ⊨ Q3.

Obviously, you just have to specify 0^{M_i} and ′^{M_i} for each.

Problem 6.6. Prove that K from Example 6.18 satisfies the remaining axioms of Q,

∀x (x × 0) = 0 (Q6)
∀x ∀y (x × y′) = ((x × y) + x) (Q7)
∀x ∀y (x < y ↔ ∃z (z′ + x) = y) (Q8)

Find a sentence only involving ′ that is true in N but false in K.

Problem 6.7. Expand L of Example 6.19 to include ⊗ and ≼ that interpret × and <. Show that your structure satisfies the remaining axioms of Q,

∀x (x × 0) = 0 (Q6)
∀x ∀y (x × y′) = ((x × y) + x) (Q7)
∀x ∀y (x < y ↔ ∃z (z′ + x) = y) (Q8)

Problem 6.8. In L of Example 6.19, a∗ = a and b∗ = b. Is there a model of Q in which a∗ = b and b∗ = a?

Problem 6.9. Find sentences in L_A derivable in PA (and hence true in N) which guarantee the properties of z, ∗, and ≼ in Proposition 6.21.

Problem 6.10. Show that in a non-standard model of PA, there is no largest block.

Problem 6.11. Write out a detailed proof of Proposition 6.29. Which sentence must PA derive in order to guarantee the existence of z? Why is x ≼ z and z ≼ y, and why is [x] ≠ [z] and [z] ≠ [y]?

Problem 6.12. Give a structure L′ with |L′| = ℕ isomorphic to L of Example 6.19.

CHAPTER 7
Second-Order Logic

7.1 Introduction

In first-order logic, we combine the non-logical symbols of a given language, i.e., its constant symbols, function symbols, and predicate symbols, with the logical symbols to express things about first-order structures.
This is done using the notion of satisfaction, which relates a structure M, together with a variable assignment s, and a formula A: M, s ⊨ A holds iff what A expresses when its constant symbols, function symbols, and predicate symbols are interpreted as M says, and its free variables are interpreted as s says, is true. The interpretation of the identity predicate = is built into the definition of M, s ⊨ A, as is the interpretation of ∀ and ∃. The former is always interpreted as the identity relation on the domain |M| of the structure, and the quantifiers are always interpreted as ranging over the entire domain. But, crucially, quantification is only allowed over elements of the domain, and so only object variables are allowed to follow a quantifier.

In second-order logic, both the language and the definition of satisfaction are extended to include free and bound function and predicate variables, and quantification over them. These variables are related to function symbols and predicate symbols the same way that object variables are related to constant symbols. They play the same role in the formation of terms and formulas of second-order logic, and quantification over them is handled in a similar way. In the standard semantics, the second-order quantifiers range over all possible objects of the right type (n-place functions from |M| to |M| for function variables, n-place relations for predicate variables). For instance, while ∀v_0 (P^1_0(v_0) ∨ ¬P^1_0(v_0)) is a formula in both first- and second-order logic, in the latter we can also consider ∀V^1_0 ∀v_0 (V^1_0(v_0) ∨ ¬V^1_0(v_0)) and ∃V^1_0 ∀v_0 (V^1_0(v_0) ∨ ¬V^1_0(v_0)). Since these contain no free variables, they are sentences of second-order logic. Here, V^1_0 is a second-order 1-place predicate variable. The allowable interpretations of V^1_0 are the same that we can assign to a 1-place predicate symbol like P^1_0, i.e., subsets of |M|.
Quantification over them then amounts to saying that ∀v_0 (V^1_0(v_0) ∨ ¬V^1_0(v_0)) holds for all ways of assigning a subset of |M| as the value of V^1_0, or for at least one. Since every set either contains or fails to contain a given object, both are true in any structure.

Since second-order logic can quantify over subsets of the domain as well as functions, it is to be expected that some amount, at least, of set theory can be carried out in second-order logic. By "carry out," we mean that it is possible to express set-theoretic properties and statements in second-order logic, and that this is possible without any special, non-logical vocabulary for sets (e.g., the membership predicate symbol of set theory). For instance, we can define unions and intersections of sets and the subset relationship, but also compare the sizes of sets, and state results such as Cantor's Theorem.

7.2 Terms and Formulas

Like in first-order logic, expressions of second-order logic are built up from a basic vocabulary containing variables, constant symbols, predicate symbols and sometimes function symbols. From them, together with logical connectives, quantifiers, and punctuation symbols such as parentheses and commas, terms and formulas are formed. The difference is that in addition to variables for objects, second-order logic also contains variables for relations and functions, and allows quantification over them. So the logical symbols of second-order logic are those of first-order logic, plus:

1. A countably infinite set of second-order relation variables of every arity n: V^n_0, V^n_1, V^n_2, . . .
2. A countably infinite set of second-order function variables: u^n_0, u^n_1, u^n_2, . . .

Just as we use x, y, z as meta-variables for first-order variables v_i, we'll use X, Y, Z, etc., as meta-variables for V^n_i and u, v, etc., as meta-variables for u^n_i.
The non-logical symbols of a second-order language are specified the same way a first-order language is: by listing its constant symbols, function symbols, and predicate symbols. In first-order logic, the identity predicate = is usually included.

In first-order logic, the non-logical symbols of a language L are crucial to allow us to express anything interesting. There are of course sentences that use no non-logical symbols, but with only = it is hard to say anything interesting. In second-order logic, since we have an unlimited supply of relation and function variables, we can say anything we can say in a first-order language even without a special supply of non-logical symbols.

Definition 7.1 (Second-order Terms). The set of second-order terms of L, Trm2(L), is defined by adding to Definition B.4 the clause

1. If u is an n-place function variable and t_1, . . . , t_n are terms, then u(t_1, . . . , t_n) is a term.

So, a second-order term looks just like a first-order term, except that where a first-order term contains a function symbol f^n_i, a second-order term may contain a function variable u^n_i in its place.

Definition 7.2 (Second-order formula). The set of second-order formulas Frm2(L) of the language L is defined by adding to Definition B.4 the clauses

1. If X is an n-place predicate variable and t_1, . . . , t_n are second-order terms of L, then X(t_1, . . . , t_n) is an atomic formula.
2. If A is a formula and u is a function variable, then ∀u A is a formula.
3. If A is a formula and X is a predicate variable, then ∀X A is a formula.
4. If A is a formula and u is a function variable, then ∃u A is a formula.
5. If A is a formula and X is a predicate variable, then ∃X A is a formula.

7.3 Satisfaction

To define the satisfaction relation M, s ⊨ A for second-order formulas, we have to extend the definitions to cover second-order variables. The notion of a structure is the same for second-order logic as it is for first-order logic.
There is only a difference for variable assignments s: these now must provide values not just for the first-order variables, but also for the second-order variables.

Definition 7.3 (Variable Assignment). A variable assignment s for a structure M is a function which maps each

1. object variable v_i to an element of |M|, i.e., s(v_i) ∈ |M|;
2. n-place relation variable V^n_i to an n-place relation on |M|, i.e., s(V^n_i) ⊆ |M|^n;
3. n-place function variable u^n_i to an n-place function from |M| to |M|, i.e., s(u^n_i) : |M|^n → |M|.

A structure assigns a value to each constant symbol and function symbol, and a variable assignment assigns objects and functions to each object and function variable. Together, they let us assign a value to every term.

Definition 7.4 (Value of a Term). If t is a term of the language L, M is a structure for L, and s is a variable assignment for M, the value Val^M_s(t) is defined as for first-order terms, plus the following clause:

t ≡ u(t_1, . . . , t_n): Val^M_s(t) = s(u)(Val^M_s(t_1), . . . , Val^M_s(t_n)).

Definition 7.5 (x-Variant). If s is a variable assignment for a structure M, then any variable assignment s′ for M which differs from s at most in what it assigns to x is called an x-variant of s. If s′ is an x-variant of s we write s ∼_x s′. (Similarly for second-order variables X or u.)

Definition 7.6 (Satisfaction). For second-order formulas A, the definition of satisfaction is like Definition B.23 with the addition of:

1. A ≡ X^n(t_1, . . . , t_n): M, s ⊨ A iff ⟨Val^M_s(t_1), . . . , Val^M_s(t_n)⟩ ∈ s(X^n).
2. A ≡ ∀X B: M, s ⊨ A iff for every X-variant s′ of s, M, s′ ⊨ B.
3. A ≡ ∃X B: M, s ⊨ A iff there is an X-variant s′ of s so that M, s′ ⊨ B.
4. A ≡ ∀u B: M, s ⊨ A iff for every u-variant s′ of s, M, s′ ⊨ B.
5. A ≡ ∃u B: M, s ⊨ A iff there is a u-variant s′ of s so that M, s′ ⊨ B.

Example 7.7. Consider the formula ∀z (X(z) ↔ ¬Y(z)).
It contains no second-order quantifiers, but does contain the second-order variables X and Y (here understood to be one-place). The corresponding first-order sentence ∀z (P(z) ↔ ¬R(z)) says that whatever falls under the interpretation of P does not fall under the interpretation of R and vice versa. In a structure, the interpretation of a predicate symbol P is given by the interpretation P^M. But for second-order variables like X and Y, the interpretation is provided, not by the structure itself, but by a variable assignment. Since the second-order formula is not a sentence (it includes free variables X and Y), it is only satisfied relative to a structure M together with a variable assignment s.

M, s ⊨ ∀z (X(z) ↔ ¬Y(z)) whenever the elements of s(X) are not elements of s(Y), and vice versa, i.e., iff s(Y) = |M| \ s(X). So for instance, take |M| = {1, 2, 3}. Since no predicate symbols, function symbols, or constant symbols are involved, the domain of M is all that is relevant. Now for s_1(X) = {1, 2} and s_1(Y) = {3}, we have M, s_1 ⊨ ∀z (X(z) ↔ ¬Y(z)).

By contrast, if we have s_2(X) = {1, 2} and s_2(Y) = {2, 3}, then M, s_2 ⊭ ∀z (X(z) ↔ ¬Y(z)). That's because there is a z-variant s′_2 of s_2 with s′_2(z) = 2 where M, s′_2 ⊨ X(z) (since 2 ∈ s′_2(X)) but M, s′_2 ⊭ ¬Y(z) (since also s′_2(z) ∈ s′_2(Y)).

Example 7.8. M, s ⊨ ∃Y (∃y Y(y) ∧ ∀z (X(z) ↔ ¬Y(z))) iff there is an s′ ∼_Y s such that M, s′ ⊨ (∃y Y(y) ∧ ∀z (X(z) ↔ ¬Y(z))). And that is the case iff s′(Y) ≠ ∅ (so that M, s′ ⊨ ∃y Y(y)) and, as in the previous example, s′(Y) = |M| \ s′(X). In other words, M, s ⊨ ∃Y (∃y Y(y) ∧ ∀z (X(z) ↔ ¬Y(z))) iff |M| \ s(X) is non-empty, i.e., s(X) ≠ |M|. So, the formula is satisfied, e.g., if |M| = {1, 2, 3} and s(X) = {1, 2}, but not if s(X) = {1, 2, 3} = |M|.
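The satisfaction claims in the two examples above can be checked mechanically on the three-element domain. The following Python sketch is only an illustration: the function names are hypothetical, sets stand in for the values s(X) and s(Y), and the existential second-order quantifier is evaluated by brute-force enumeration of all subsets, which is feasible only for small finite domains.

```python
from itertools import chain, combinations


def subsets(domain):
    """Return all subsets of a finite domain as frozensets."""
    items = sorted(domain)
    return [
        frozenset(c)
        for c in chain.from_iterable(
            combinations(items, k) for k in range(len(items) + 1)
        )
    ]


def complement_formula(domain, X, Y):
    """Check M, s |= forall z (X(z) <-> not Y(z)) with s(X) = X, s(Y) = Y."""
    return all((z in X) == (z not in Y) for z in domain)


def exists_nonempty_complement(domain, X):
    """Check M, s |= exists Y (exists y Y(y) and forall z (X(z) <-> not Y(z)))."""
    return any(
        len(Y) > 0 and complement_formula(domain, X, Y)
        for Y in subsets(domain)
    )


domain = {1, 2, 3}
print(complement_formula(domain, {1, 2}, {3}))         # True  (s_1 in Example 7.7)
print(complement_formula(domain, {1, 2}, {2, 3}))      # False (s_2 in Example 7.7)
print(exists_nonempty_complement(domain, {1, 2}))      # True  (Example 7.8)
print(exists_nonempty_complement(domain, {1, 2, 3}))   # False (Example 7.8)
```

The quantifier over Y is handled exactly as the satisfaction clause prescribes: every Y-variant of the assignment is tried, here by running through all subsets of the domain.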
Since the formula is not satisfied whenever s(X) = |M|, the sentence ∀X ∃Y (∃y Y(y) ∧ ∀z (X(z) ↔ ¬Y(z))) is never satisfied: for any structure M, the assignment s(X) = |M| will make the sentence false. On the other hand, the sentence ∃X ∃Y (∃y Y(y) ∧ ∀z (X(z) ↔ ¬Y(z))) is satisfied relative to any assignment s, since we can always find an X-variant s′ of s with s′(X) ≠ |M|.

7.4 Semantic Notions

The central logical notions of validity, entailment, and satisfiability are defined the same way for second-order logic as they are for first-order logic, except that the underlying satisfaction relation is now that for second-order formulas. A second-order sentence, of course, is a formula in which all variables, including predicate and function variables, are bound.

Definition 7.9 (Validity). A sentence A is valid, ⊨ A, iff M ⊨ A for every structure M.

Definition 7.10 (Entailment). A set of sentences Γ entails a sentence A, Γ ⊨ A, iff for every structure M with M ⊨ Γ, M ⊨ A.

Definition 7.11 (Satisfiability). A set of sentences Γ is satisfiable if M ⊨ Γ for some structure M. If Γ is not satisfiable it is called unsatisfiable.

7.5 Expressive Power

Quantification over second-order variables is responsible for an immense increase in the expressive power of the language over that of first-order logic. Second-order existential quantification lets us say that functions or relations with certain properties exist. In first-order logic, the only way to do that is to specify a non-logical symbol (i.e., a function symbol or predicate symbol) for this purpose. Second-order universal quantification lets us say that all subsets of, relations on, or functions from the domain to the domain have a property. In first-order logic, we can only say that the subsets, relations, or functions assigned to one of the non-logical symbols of the language have a property.
And when we say that subsets, relations, or functions exist that have a property, or that all of them have it, we can use second-order quantification in specifying this property as well. This lets us define relations not definable in first-order logic, and express properties of the domain not expressible in first-order logic.

Definition 7.12. If M is a structure for a language L, a relation R ⊆ |M|^2 is definable in L if there is some formula A_R(x, y) with only the variables x and y free, such that R(a, b) holds (i.e., ⟨a, b⟩ ∈ R) iff M, s ⊨ A_R(x, y) for s(x) = a and s(y) = b.

Example 7.13. In first-order logic we can define the identity relation Id_|M| (i.e., {⟨a, a⟩ : a ∈ |M|}) by the formula x = y. In second-order logic, we can define this relation without =. For if a and b are the same element of |M|, then they are elements of the same subsets of |M| (since sets are determined by their elements). Conversely, if a and b are different, then they are not elements of the same subsets: e.g., a ∈ {a} but b ∉ {a} if a ≠ b. So "being elements of the same subsets of |M|" is a relation that holds of a and b iff a = b. It is a relation that can be expressed in second-order logic, since we can quantify over all subsets of |M|. Hence, the following formula defines Id_|M|:

∀X (X(x) ↔ X(y))

Example 7.14. If R is a two-place predicate symbol, R^M is a two-place relation on |M|. Perhaps somewhat confusingly, we'll use R both for the predicate symbol and for the relation R^M itself. The transitive closure R∗ of R is the relation that holds between a and b iff for some c_1, . . . , c_k, R(a, c_1), R(c_1, c_2), . . . , R(c_k, b) holds. This includes the case k = 0, i.e., if R(a, b) holds, so does R∗(a, b). This means that R ⊆ R∗. In fact, R∗ is the smallest relation that includes R and that is transitive.
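The two descriptions of R∗ just given, by chains of R-steps and as the smallest transitive relation including R, can be compared on a small finite domain. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, relations are represented as sets of pairs, and the "smallest relation" is computed by intersecting every transitive relation on the domain that includes R, a brute-force search that is exponential in the size of the domain.

```python
from itertools import product


def is_transitive(rel):
    """Check that (a, b) and (b, d) in rel always imply (a, d) in rel."""
    return all((a, d) in rel for (a, b) in rel for (c, d) in rel if b == c)


def closure_by_paths(rel):
    """Chain R-steps: add (a, d) for (a, b), (b, d) until nothing new appears."""
    closure = set(rel)
    while True:
        new_pairs = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new_pairs <= closure:
            return closure
        closure |= new_pairs


def closure_by_intersection(domain, rel):
    """Intersect all transitive relations on the domain that include rel."""
    pairs = list(product(sorted(domain), repeat=2))
    optional = [p for p in pairs if p not in rel]
    result = set(pairs)  # the full relation is transitive and includes rel
    for mask in range(2 ** len(optional)):
        candidate = set(rel) | {p for i, p in enumerate(optional) if (mask >> i) & 1}
        if is_transitive(candidate):
            result &= candidate
    return result


domain = {0, 1, 2}
R = {(0, 1), (1, 2)}
print(sorted(closure_by_paths(R)))                               # [(0, 1), (0, 2), (1, 2)]
print(closure_by_intersection(domain, R) == closure_by_paths(R)) # True
```

Ranging over all candidate relations is the finite analogue of quantifying over all two-place relations on the domain, which is what second-order quantification provides.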
We can say in second-order logic that X is a transitive relation that includes R:

B_R(X) ≡ ∀x ∀y (R(x, y) → X(x, y)) ∧ ∀x ∀y ∀z ((X(x, y) ∧ X(y, z)) → X(x, z)).

The first conjunct says that R ⊆ X and the second that X is transitive. To say that X is the smallest such relation is to say that it is itself included in every relation that includes R and is transitive. So we can define the transitive closure of R by the formula

R∗(X) ≡ B_R(X) ∧ ∀Y (B_R(Y) → ∀x ∀y (X(x, y) → Y(x, y))).

We have M, s ⊨ R∗(X) iff s(X) = R∗. The transitive closure of R cannot be expressed in first-order logic.

7.6 Describing Infinite and Countable Domains

A set M is (Dedekind) infinite iff there is an injective function f : M → M which is not surjective, i.e., with ran(f) ≠ M. In first-order logic, we can consider a one-place function symbol f and say that the function f^M assigned to it in a structure M is injective and ran(f) ≠ |M|:

∀x ∀y (f(x) = f(y) → x = y) ∧ ∃y ∀x y ≠ f(x).

If M satisfies this sentence, f^M : |M| → |M| is injective but not surjective, and so |M| must be infinite. If |M| is infinite, and hence such a function exists, we can let f^M be that function and M will satisfy the sentence. However, this requires that our language contains the non-logical symbol f we use for this purpose. In second-order logic, we can simply say that such a function exists. This no longer requires f, and we obtain the sentence in pure second-order logic

Inf ≡ ∃u (∀x ∀y (u(x) = u(y) → x = y) ∧ ∃y ∀x y ≠ u(x)).

M ⊨ Inf iff |M| is infinite. We can then define Fin ≡ ¬Inf; M ⊨ Fin iff |M| is finite. No single sentence of pure first-order logic can express that the domain is infinite, although an infinite set of them can. There is no set of sentences of pure first-order logic that is satisfied in a structure iff its domain is finite.

Proposition 7.15. M ⊨ Inf iff |M| is infinite.

Proof.
M ⊨ Inf iff M, s ⊨ ∀x ∀y (u(x) = u(y) → x = y) ∧ ∃y ∀x y ≠ u(x) for some s. If it does, s(u) is an injective function, and some y ∈ |M| is not in the range of s(u). Conversely, if there is an injective f : |M| → |M| with ran(f) ≠ |M|, then s(u) = f is such a variable assignment. □

A set M is countable if there is an enumeration m_0, m_1, m_2, . . . of its elements (without repetitions but possibly finite). Such an enumeration exists iff there is an element z ∈ M and a function f : M → M such that z, f(z), f(f(z)), . . . , are all the elements of M. For if the enumeration exists, z = m_0 and f(m_k) = m_{k+1} (or f(m_k) = m_k if m_k is the last element of the enumeration) are the requisite element and function. On the other hand, if such a z and f exist, then z, f(z), f(f(z)), . . . , is an enumeration of M, and M is countable. We can express the existence of z and f in second-order logic to produce a sentence true in a structure iff the structure is countable:

Count ≡ ∃z ∃u ∀X ((X(z) ∧ ∀x (X(x) → X(u(x)))) → ∀x X(x))

Proposition 7.16. M ⊨ Count iff |M| is countable.

Proof. Suppose |M| is countable, and let m_0, m_1, . . . , be an enumeration. By removing repetitions we can guarantee that no m_k appears twice. Define f(m_k) = m_{k+1} and let s(z) = m_0 and s(u) = f. We show that

M, s ⊨ ∀X ((X(z) ∧ ∀x (X(x) → X(u(x)))) → ∀x X(x))

Suppose s′ ∼_X s is arbitrary, and let M = s′(X). Suppose further that M, s′ ⊨ (X(z) ∧ ∀x (X(x) → X(u(x)))). Then s′(z) ∈ M and whenever x ∈ M, also s′(u)(x) ∈ M. In other words, since s′ ∼_X s, m_0 ∈ M and if x ∈ M then f(x) ∈ M, so m_0 ∈ M, m_1 = f(m_0) ∈ M, m_2 = f(f(m_0)) ∈ M, etc. Thus, M = |M|, and so M, s′ ⊨ ∀x X(x). Since s′ was an arbitrary X-variant of s, we are done: M ⊨ Count.

Now assume that M ⊨ Count, i.e.,

M, s ⊨ ∀X ((X(z) ∧ ∀x (X(x) → X(u(x)))) → ∀x X(x))

for some s. Let m = s(z) and f = s(u) and consider M = {m, f(m), f(f(m)), . . . }.
Let s′ be the X-variant of s with s′(X) = M. Then

M, s′ ⊨ (X(z) ∧ ∀x (X(x) → X(u(x)))) → ∀x X(x)

by assumption. Also, M, s′ ⊨ X(z) since s′(X) = M ∋ m = s′(z), and also M, s′ ⊨ ∀x (X(x) → X(u(x))) since whenever x ∈ M also f(x) ∈ M. So, since both the antecedent and the conditional are satisfied, the consequent must also be: M, s′ ⊨ ∀x X(x). But that means that M = |M|, and so |M| is countable since M is, by definition. □

7.7 Second-order Arithmetic

Recall that the theory PA of Peano arithmetic includes the eight axioms of Q,

∀x x′ ≠ 0
∀x ∀y (x′ = y′ → x = y)
∀x (x = 0 ∨ ∃y x = y′)
∀x (x + 0) = x
∀x ∀y (x + y′) = (x + y)′
∀x (x × 0) = 0
∀x ∀y (x × y′) = ((x × y) + x)
∀x ∀y (x < y ↔ ∃z (z′ + x) = y)

plus all sentences of the form

(A(0) ∧ ∀x (A(x) → A(x′))) → ∀x A(x).

The latter is a "schema," i.e., a pattern that generates infinitely many sentences of the language of arithmetic, one for each formula A(x). We call this schema the (first-order) axiom schema of induction. In second-order Peano arithmetic PA2, induction can be stated as a single sentence. PA2 consists of the first eight axioms above plus the (second-order) induction axiom:

∀X ((X(0) ∧ ∀x (X(x) → X(x′))) → ∀x X(x)).

It says that if a subset X of the domain contains 0^M and with any x ∈ |M| also contains ′^M(x) (i.e., it is "closed under successor") it contains everything in the domain (i.e., X = |M|).

The induction axiom guarantees that any structure satisfying it contains only those elements of |M| the axioms require to be there, i.e., the values of n̄ for n ∈ ℕ. A model of PA2 contains no non-standard numbers.

Theorem 7.17. If M ⊨ PA2 then |M| = {Val^M(n̄) : n ∈ ℕ}.

Proof. Let N = {Val^M(n̄) : n ∈ ℕ}, and suppose M ⊨ PA2. Of course, for any n ∈ ℕ, Val^M(n̄) ∈ |M|, so N ⊆ |M|.

Now for inclusion in the other direction. Consider a variable assignment s with s(X) = N.
By assumption,

M ⊨ ∀X ((X(0) ∧ ∀x (X(x) → X(x′))) → ∀x X(x)), thus
M, s ⊨ (X(0) ∧ ∀x (X(x) → X(x′))) → ∀x X(x).

Consider the antecedent of this conditional. Val^M(0) ∈ N, and so M, s ⊨ X(0). The second conjunct, ∀x (X(x) → X(x′)), is also satisfied. For suppose x ∈ N. By definition of N, x = Val^M(n̄) for some n. That gives ′^M(x) = Val^M(k̄) for k = n + 1. So, ′^M(x) ∈ N.

We have that M, s ⊨ X(0) ∧ ∀x (X(x) → X(x′)). Consequently, M, s ⊨ ∀x X(x). But that means that for every x ∈ |M| we have x ∈ s(X) = N. So, |M| ⊆ N. □

Corollary 7.18. Any two models of PA2 are isomorphic.

Proof. By Theorem 7.17, the domain of any model of PA2 is exhausted by the values Val^M(n̄). Any such model is also a model of Q. By Proposition 6.13, any such model is standard, i.e., isomorphic to N. □

Above we defined PA2 as the theory that contains the first eight arithmetical axioms plus the second-order induction axiom. In fact, thanks to the expressive power of second-order logic, only the first two of the arithmetical axioms plus induction are needed for second-order Peano arithmetic.

Proposition 7.19. Let PA2† be the second-order theory containing the first two arithmetical axioms (the successor axioms) and the second-order induction axiom. Then ≤, +, and × are definable in PA2†.

Proof. To show that ≤ is definable, we have to find a formula A_≤(x, y) such that N ⊨ A_≤(n̄, m̄) iff n ≤ m. Consider the formula

B(x, Y) ≡ Y(x) ∧ ∀y (Y(y) → Y(y′))

Clearly, B(n̄, Y) is satisfied by a set Y ⊆ ℕ iff {m : n ≤ m} ⊆ Y, so we can take A_≤(x, y) ≡ ∀Y (B(x, Y) → Y(y)). □

Corollary 7.20. M ⊨ PA2 iff M ⊨ PA2†.

Proof. Immediate from Proposition 7.19. □

7.8 Second-order Logic is not Axiomatizable

Theorem 7.21. Second-order logic is undecidable.

Proof. A first-order sentence is valid in first-order logic iff it is valid in second-order logic, and first-order logic is undecidable. □

Theorem 7.22.
There is no sound and complete proof system for second-order logic.

Proof. Let A be a sentence in the language of arithmetic. N ⊨ A iff PA2 ⊨ A. Let P be the conjunction of the nine axioms of PA2. PA2 ⊨ A iff ⊨ P → A, i.e., M ⊨ P → A for every structure M. Now consider the sentence ∀z ∀u ∀u′ ∀u″ ∀L (P′ → A′) resulting by replacing 0 by z, ′ by the one-place function variable u, + and × by the two-place function variables u′ and u″, respectively, and < by the two-place relation variable L, and universally quantifying. It is a valid sentence of pure second-order logic iff the original sentence P → A was valid iff PA2 ⊨ A iff N ⊨ A. Thus if there were a sound and complete proof system for second-order logic, we could use it to define a computable enumeration f : ℕ → Sent(L_A) of the sentences true in N. This function would be representable in Q by some first-order formula B_f(x, y). Then the formula ∃x B_f(x, y) would define the set of true first-order sentences of N, contradicting Tarski's Theorem. □

7.9 Second-order Logic is not Compact

Call a set of sentences Γ finitely satisfiable if every one of its finite subsets is satisfiable. First-order logic has the property that if a set of sentences Γ is finitely satisfiable, it is satisfiable. This property is called compactness. It has an equivalent version involving entailment: if Γ ⊨ A, then already Γ_0 ⊨ A for some finite subset Γ_0 ⊆ Γ. In this version it is an immediate corollary of the completeness theorem: for if Γ ⊨ A, by completeness Γ ⊢ A. But a derivation can only make use of finitely many sentences of Γ.

Compactness is not true for second-order logic. There are sets of second-order sentences that are finitely satisfiable but not satisfiable, and that entail some A without a finite subset entailing A.

Theorem 7.23. Second-order logic is not compact.

Proof. Recall that

Inf ≡ ∃u (∀x ∀y (u(x) = u(y) → x = y) ∧ ∃y ∀x y ≠ u(x))

is satisfied in a structure iff its domain is infinite.
Let A_≥n be a sentence that asserts that the domain has at least n elements, e.g.,

A_≥n ≡ ∃x_1 . . . ∃x_n (x_1 ≠ x_2 ∧ x_1 ≠ x_3 ∧ · · · ∧ x_{n−1} ≠ x_n).

Consider the set of sentences

Γ = {¬Inf, A_≥1, A_≥2, A_≥3, . . . }.

It is finitely satisfiable, since for any finite subset Γ_0 ⊆ Γ there is some k so that A_≥k ∈ Γ_0 but no A_≥n ∈ Γ_0 for n > k. If |M| has k elements, M ⊨ Γ_0. But Γ is not satisfiable: if M ⊨ ¬Inf, |M| must be finite, say, of size k. Then M ⊭ A_≥k+1. □

7.10 The Löwenheim-Skolem Theorem Fails for Second-order Logic

The (Downward) Löwenheim-Skolem Theorem states that every set of sentences with an infinite model has a countable model. It, too, is a consequence of the completeness theorem: the proof of completeness generates a model for any consistent set of sentences, and that model is countable. There is also an Upward Löwenheim-Skolem Theorem, which guarantees that if a set of sentences has a countably infinite model it also has an uncountable model. Both theorems fail in second-order logic.

Theorem 7.24. The Löwenheim-Skolem Theorem fails for second-order logic: There are sentences with infinite models but no countable models.

Proof. Recall that

Count ≡ ∃z ∃u ∀X ((X(z) ∧ ∀x (X(x) → X(u(x)))) → ∀x X(x))

is true in a structure M iff |M| is countable, so ¬Count is true in M iff |M| is uncountable. There are such structures: take any uncountable set as the domain, e.g., ℘(ℕ) or ℝ. So ¬Count has infinite models but no countable models. □

Theorem 7.25. There are sentences with countably infinite but no uncountable models.

Proof. Count ∧ Inf is true in N but not in any structure M with |M| uncountable. □

7.11 Comparing Sets

Proposition 7.26. The formula ∀x (X(x) → Y(x)) defines the subset relation, i.e., M, s ⊨ ∀x (X(x) → Y(x)) iff s(X) ⊆ s(Y).

Proposition 7.27. The formula ∀x (X(x) ↔ Y(x)) defines the identity relation on sets, i.e., M, s ⊨ ∀x (X(x) ↔ Y(x)) iff s(X) = s(Y).

Proposition 7.28.
The formula ∃x X(x) defines the property of being non-empty, i.e., M, s ⊨ ∃x X(x) iff s(X) ≠ ∅.

A set X is no larger than a set Y, X ⪯ Y, iff there is an injective function f : X → Y. Since we can express that a function is injective, and also that its values for arguments in X are in Y, we can also define the relation of being no larger than on subsets of the domain.

Proposition 7.29. The formula

∃u (∀x (X(x) → Y(u(x))) ∧ ∀x ∀y (u(x) = u(y) → x = y))

defines the relation of being no larger than.

Two sets are the same size, or "equinumerous," X ≈ Y, iff there is a bijective function f : X → Y.

Proposition 7.30. The formula

∃u (∀x (X(x) → Y(u(x))) ∧ ∀x ∀y (u(x) = u(y) → x = y) ∧ ∀y (Y(y) → ∃x (X(x) ∧ y = u(x))))

defines the relation of being equinumerous with.

We will abbreviate these formulas, respectively, as X ⊆ Y, X = Y, X ≠ ∅, X ⪯ Y, and X ≈ Y. (This may be slightly confusing, since we use the same notation when we speak informally about sets X and Y, but here the notation is an abbreviation for formulas in second-order logic involving one-place relation variables X and Y.)

Proposition 7.31. The sentence ∀X ∀Y ((X ⪯ Y ∧ Y ⪯ X) → X ≈ Y) is valid.

Proof. The sentence is satisfied in a structure M if, for any subsets X ⊆ |M| and Y ⊆ |M|, if X ⪯ Y and Y ⪯ X then X ≈ Y. But this holds for any sets X and Y: it is the Schröder-Bernstein Theorem. □

7.12 Cardinalities of Sets

Just as we can express that the domain is finite or infinite, countable or uncountable, we can define the property of a subset of |M| being finite or infinite, countable or uncountable.

Proposition 7.32. The formula

Inf(X) ≡ ∃u (∀x ∀y (u(x) = u(y) → x = y) ∧ ∀x (X(x) → X(u(x))) ∧ ∃y (X(y) ∧ ∀x (X(x) → y ≠ u(x))))

is satisfied with respect to a variable assignment s iff s(X) is infinite. (The middle conjunct requires that u maps s(X) into itself; without it, the formula would already be satisfied by a one-element set in a two-element domain.)

Proposition 7.33.
The formula

Count(X) ≡ ∃z ∃u (X(z) ∧ ∀x (X(x) → X(u(x))) ∧ ∀Y ((Y(z) ∧ ∀x (Y(x) → Y(u(x)))) → X = Y))

is satisfied with respect to a variable assignment s iff s(X) is countable.

We know from Cantor's Theorem that there are uncountable sets, and in fact, that there are infinitely many different levels of infinite sizes. Set theory develops an entire arithmetic of sizes of sets, and assigns infinite cardinal numbers to sets. The natural numbers serve as the cardinal numbers measuring the sizes of finite sets. The cardinality of countably infinite sets is the first infinite cardinality, called ℵ_0 ("aleph-nought" or "aleph-zero"). The next infinite size is ℵ_1. It is the smallest size a set can be without being countable (i.e., of size ℵ_0). We can define "X has size ℵ_0" as

Aleph0(X) ≡ Inf(X) ∧ Count(X).

X has size ℵ_1 iff it is infinite and not itself of size ℵ_0, and all its subsets are finite, have size ℵ_0, or are equinumerous with X itself. Hence we can express this by the formula

Aleph1(X) ≡ ∀Y (Y ⊆ X → (¬Inf(Y) ∨ Aleph0(Y) ∨ Y ≈ X)) ∧ Inf(X) ∧ ¬Aleph0(X).

(The disjunct Y ≈ X is needed because X is a subset of itself.) Being of size ℵ_2 is defined similarly, etc.

There is one size of special interest, the so-called cardinality of the continuum. It is the size of ℘(ℕ), or, equivalently, the size of ℝ. That a set is the size of the continuum can also be expressed in second-order logic, but requires a bit more work.

7.13 The Power of the Continuum

In second-order logic we can quantify over subsets of the domain, but not over sets of subsets of the domain. To do this directly, we would need third-order logic. For instance, if we wanted to state Cantor's Theorem that there is no injective function from the power set of a set to the set itself, we might try to formulate it as "for every set X, and every set P, if P is the power set of X, then not P ⪯ X." And to say that P is the power set of X would require formalizing that the elements of P are all and only the subsets of X, so something like ∀Y (P(Y) ↔ Y ⊆ X).
The problem lies in P(Y): that is not a formula of second-order logic, since only terms can be arguments to one-place relation variables like P.

We can, however, simulate quantification over sets of sets, if the domain is large enough. The idea is to make use of the fact that two-place relations R relate elements of the domain to elements of the domain. Given such an R, we can collect all the elements to which some x is R-related: {y ∈ |M| : R(x, y)} is the set "coded by" x. Conversely, if Z ⊆ ℘(|M|) is some collection of subsets of |M|, and there are at least as many elements of |M| as there are sets in Z, then there is also a relation R ⊆ |M|^2 such that every Y ∈ Z is coded by some x using R.

Definition 7.34. If R ⊆ |M|^2, then x R-codes {y ∈ |M| : R(x, y)}. Y R-codes ℘(X) iff for every Z ⊆ X, some x ∈ Y R-codes Z, and every x ∈ Y R-codes some Z ⊆ X.

Proposition 7.35. The formula

Codes(x, R, Y) ≡ ∀y (Y(y) ↔ R(x, y))

expresses that s(x) s(R)-codes s(Y). The formula

Pow(Y, R, X) ≡ ∀Z (Z ⊆ X → ∃x (Y(x) ∧ Codes(x, R, Z))) ∧ ∀x (Y(x) → ∀Z (Codes(x, R, Z) → Z ⊆ X))

expresses that s(Y) s(R)-codes the power set of s(X).

With this trick, we can express statements about the power set by quantifying over the codes of subsets rather than the subsets themselves. For instance, Cantor's Theorem can now be expressed by saying that there is no injective function from the domain of any relation that codes the power set of X to X itself.

Proposition 7.36. The sentence

∀X ∀R (Pow(R, X) → ¬∃u (∀x ∀y (u(x) = u(y) → x = y) ∧ ∀Y (Codes(x, R, Y) → X(u(x)))))

is valid.

The power set of a countably infinite set is uncountable, and so its cardinality is larger than that of any countably infinite set (which is ℵ_0). The size of ℘(ℕ) is called the "power of the continuum," since it is the same size as the set of points on the real number line, ℝ. If the domain is large enough to code the power set of a countably infinite set, we can express that a set is the size of the continuum by saying that it is equinumerous with any set Y that codes the power set of a set X of size ℵ_0. (If the domain is not large enough, i.e., it contains no subset equinumerous with ℝ, then there can also be no relation that codes ℘(X).)

Proposition 7.37. If ℝ ⪯ |M|, then the formula

Cont(X) ≡ ∀X ∀Y ∀R ((Aleph0(X) ∧ Pow(Y, R, X)) → ¬Y ⪯ X)

expresses that s(X) ≈ ℝ.

Proposition 7.38. |M| ≈ ℝ iff

M ⊨ ∃X ∃Y ∃R (Aleph0(X) ∧ Pow(Y, R, X) ∧ ∃u (∀x ∀y (u(x) = u(y) → x = y) ∧ ∀y (Y(y) → ∃x y = u(x)))).

The Continuum Hypothesis is the statement that the size of the continuum is the first uncountable cardinality, i.e., that ℘(ℕ) has size ℵ_1.

Proposition 7.39. The Continuum Hypothesis is true iff

CH ≡ ∀X (Aleph1(X) ↔ Cont(X))

is valid.

Note that it isn't true that ¬CH is valid iff the Continuum Hypothesis is false. In a countable domain, there are no subsets of size ℵ_1 and also no subsets of the size of the continuum, so CH is always true in a countable domain. However, we can give a different sentence that is valid iff the Continuum Hypothesis is false:

Proposition 7.40. The Continuum Hypothesis is false iff

NCH ≡ ∀X (Cont(X) → ∃Y (Y ⊆ X ∧ ¬Count(Y) ∧ ¬X ≈ Y))

is valid.

Summary

Second-order logic is an extension of first-order logic by variables for relations and functions, which can be quantified. Structures for second-order logic are just like first-order structures and give the interpretations of all non-logical symbols of the language. Variable assignments, however, also assign relations and functions on the domain to the second-order variables. The satisfaction relation is defined for second-order formulas just like in the first-order case, but extended to deal with second-order variables and quantifiers.

Second-order quantifiers make second-order logic more expressive than first-order logic.
For instance, the identity relation on the domain of a structure can be defined without =, by ∀X (X(x) ↔ X(y)). Second-order logic can express the transitive closure of a relation, which is not expressible in first-order logic. Second-order quantifiers can also express properties of the domain: that it is finite or infinite, countable or uncountable. This means that, e.g., there is a second-order sentence Inf such that M ⊨ Inf iff |M| is infinite. Importantly, these are pure second-order sentences, i.e., they contain no non-logical symbols. Because of the compactness and Löwenheim-Skolem theorems, there are no first-order sentences that have these properties. It also shows that the compactness and Löwenheim-Skolem theorems fail for second-order logic.

Second-order quantification also makes it possible to replace first-order schemas by single sentences. For instance, second-order arithmetic PA² consists of the axioms of Q plus the single induction axiom

∀X ((X(0) ∧ ∀x (X(x) → X(x′))) → ∀x X(x)).

In contrast to first-order PA, all second-order models of PA² are isomorphic to the standard model. In other words, PA² has no non-standard models.

Since second-order logic includes first-order logic, it is undecidable. First-order logic is at least axiomatizable, i.e., it has a sound and complete proof system. Second-order logic is not axiomatizable: it has no such proof system. Thus, the set of validities of second-order logic is highly non-computable. In fact, pure second-order logic can express set-theoretic claims like the continuum hypothesis, which are independent of set theory.

Problems

Problem 7.1. Show that ∀X (X(x) → X(y)) (note: → not ↔!) defines Id_|M|.

Problem 7.2. The sentence Inf ∧ Count is true in all and only countably infinite domains. Adjust the definition of Count so that it becomes a different sentence that directly expresses that the domain is countably infinite, and prove that it does.

Problem 7.3.
Complete the proof of Proposition 7.19.

Problem 7.4. Give an example of a set Γ and a sentence A so that Γ ⊨ A but for every finite subset Γ₀ ⊆ Γ, Γ₀ ⊭ A.

CHAPTER 8

The Lambda Calculus

8.1 Overview

The lambda calculus was originally designed by Alonzo Church in the early 1930s as a basis for constructive logic, and not as a model of the computable functions. But it was soon shown to be equivalent to other definitions of computability, such as the Turing computable functions and the partial recursive functions. The fact that this initially came as a small surprise makes the characterization all the more interesting.

Lambda notation is a convenient way of referring to a function directly by a symbolic expression which defines it, instead of defining a name for it. Instead of saying "let f be the function defined by f(x) = x + 3," one can say, "let f be the function λx. (x + 3)." In other words, λx. (x + 3) is just a name for the function that adds three to its argument. In this expression, x is a dummy variable, or a placeholder: the same function can just as well be denoted by λy. (y + 3).

The notation works even with other parameters around. For example, suppose g(x, y) is a function of two variables, and k is a natural number. Then λx. g(x, k) is the function which maps any x to g(x, k).

This way of defining a function from a symbolic expression is known as lambda abstraction. The flip side of lambda abstraction is application: assuming one has a function f (say, defined on the natural numbers), one can apply it to any value, like 2. In conventional notation, of course, we write f(2) for the result.

What happens when you combine lambda abstraction with application? Then the resulting expression can be simplified, by "plugging" the applicand in for the abstracted variable. For example, (λx. (x + 3))(2) can be simplified to 2 + 3.
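Lambda abstraction and application have direct counterparts in programming languages with anonymous functions. The following is a minimal sketch in Python, offered only as an illustration; the names `add_three`, `add_three_renamed`, `g`, `k`, and `g_with_k` are made up for this example and do not come from the text.

```python
# Lambda abstraction: refer to the function "add three" directly by an
# expression, as in λx. (x + 3).
add_three = lambda x: x + 3

# The bound variable is only a placeholder: λy. (y + 3) is the same function.
add_three_renamed = lambda y: y + 3

# Abstraction with other parameters around: g and k are fixed, and only x
# is abstracted, as in λx. g(x, k). (This g and k are illustrative choices.)
def g(x, y):
    return x * y

k = 10
g_with_k = lambda x: g(x, k)

# Application: the argument 2 is plugged in for the abstracted variable,
# so (λx. (x + 3))(2) simplifies to 2 + 3.
result = (lambda x: x + 3)(2)
```

Note that Python evaluates `result` all the way to a number, whereas the text so far only speaks of simplifying the expression to 2 + 3.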
Up to this point, we have done nothing but introduce new notations for conventional notions. The lambda calculus, however, represents a more radical departure from the set-theoretic viewpoint. In this framework:

1. Everything denotes a function.
2. Functions can be defined using lambda abstraction.
3. Anything can be applied to anything else.

For example, if F is a term in the lambda calculus, F(F) is always assumed to be meaningful. This liberal framework is known as the untyped lambda calculus, where "untyped" means "no restriction on what can be applied to what."

There is also a typed lambda calculus, which is an important variation on the untyped version. Although in many ways the typed lambda calculus is similar to the untyped one, it is much easier to reconcile with a classical set-theoretic framework, and has some very different properties.

Research on the lambda calculus has proved to be central in theoretical computer science, and in the design of programming languages. LISP, designed by John McCarthy in the 1950s, is an early example of a language that was influenced by these ideas.

8.2 The Syntax of the Lambda Calculus

One starts with a sequence of variables x, y, z, … and some constant symbols a, b, c, …. The set of terms is defined inductively, as follows:

1. Each variable is a term.
2. Each constant is a term.
3. If M and N are terms, so is (MN).
4. If M is a term and x is a variable, then (λx. M) is a term.

The system without any constants at all is called the pure lambda calculus.

We will follow a few notational conventions:

Convention 8.1.
1. When parentheses are left out, application takes place from left to right. For example, if M, N, P, and Q are terms, then MNPQ abbreviates (((MN)P)Q).
2. Again, when parentheses are left out, lambda abstraction is to be given the widest scope possible. For example, λx. MNP is read λx. (MNP).
3.
A lambda can be used to abstract multiple variables. For example, λxyz. M is short for λx. λy. λz. M.

For example, λxy. xxyxλz. xz abbreviates λx. λy. ((((xx)y)x)λz. (xz)).

You should memorize these conventions. They will drive you crazy at first, but you will get used to them, and after a while they will drive you less crazy than having to deal with a morass of parentheses.

Two terms that differ only in the names of the bound variables are called α-equivalent; for example, λx. x and λy. y. It will be convenient to think of these as being the "same" term; in other words, when we say that M and N are the same, we also mean "up to renamings of the bound variables." Variables that are in the scope of a λ are called "bound," while others are called "free." There are no free variables in the previous example; but in (λz. yz)x, y and x are free, and z is bound.

8.3 Reduction of Lambda Terms

What can one do with lambda terms? Simplify them. If M and N are any lambda terms and x is any variable, we can use M[N/x] to denote the result of substituting N for x in M, after renaming any bound variables of M that would interfere with the free variables of N after the substitution. For example,

(λw. xxw)[yyz/x] = λw. (yyz)(yyz)w.

Alternative notations for substitution are [N/x]M and also M[x/N]. Beware!

Intuitively, (λx. M)N and M[N/x] have the same meaning; the act of replacing the first term by the second is called β-contraction. (λx. M)N is called a redex and M[N/x] its contractum. Generally, if it is possible to change a term P to P′ by β-contraction of some subterm, we say that P β-reduces to P′ in one step, and write P → P′. If from P we can obtain P′ with some number of one-step reductions (possibly none), then P β-reduces to P′; in symbols, P ↠ P′. A term that cannot be β-reduced any further is called β-irreducible, or β-normal.
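Substitution and one-step β-reduction as just described can be sketched as a small program. The following Python sketch makes several assumptions that are not in the text: terms are represented as nested tuples, fresh variable names have the form `v0`, `v1`, …, and `step` always contracts the leftmost-outermost redex (the text allows any redex to be contracted). All function names are hypothetical.

```python
import itertools

# Terms as tuples: ("var", x), ("app", M, N), ("lam", x, M).
def var(x): return ("var", x)
def app(m, n): return ("app", m, n)
def lam(x, m): return ("lam", x, m)

def free_vars(t):
    if t[0] == "var":
        return {t[1]}
    if t[0] == "app":
        return free_vars(t[1]) | free_vars(t[2])
    return free_vars(t[2]) - {t[1]}          # λ binds its variable

def fresh(avoid):
    """Return a variable name v0, v1, ... not occurring in avoid."""
    for i in itertools.count():
        if f"v{i}" not in avoid:
            return f"v{i}"

def subst(m, x, n):
    """M[N/x]: substitute n for free x in m, renaming bound variables
    of m that would otherwise capture free variables of n."""
    if m[0] == "var":
        return n if m[1] == x else m
    if m[0] == "app":
        return app(subst(m[1], x, n), subst(m[2], x, n))
    y, body = m[1], m[2]
    if y == x:                               # x is bound here: no free x below
        return m
    if y in free_vars(n) and x in free_vars(body):
        z = fresh(free_vars(n) | free_vars(body) | {x})
        body, y = subst(body, y, var(z)), z  # rename the bound variable
    return lam(y, subst(body, x, n))

def step(t):
    """Contract the leftmost-outermost redex; None if t is β-normal."""
    if t[0] == "app":
        f, a = t[1], t[2]
        if f[0] == "lam":                    # t is a redex (λx. M)N
            return subst(f[2], f[1], a)      # its contractum M[N/x]
        r = step(f)
        if r is not None:
            return app(r, a)
        r = step(a)
        return None if r is None else app(f, r)
    if t[0] == "lam":
        r = step(t[2])
        return None if r is None else lam(t[1], r)
    return None

def reduce_to_normal_form(t, max_steps=100):
    """Iterate step; give up (return None) after max_steps contractions."""
    for _ in range(max_steps):
        nxt = step(t)
        if nxt is None:
            return t
        t = nxt
    return None
```

With `x = var("x")` and so on, `subst(lam("w", app(app(x, x), w)), "x", app(app(y, y), z))` reproduces the substitution example (λw. xxw)[yyz/x] given above. The step bound in `reduce_to_normal_form` is a practical cutoff only: a term may fail to reach a normal form no matter how many steps are taken.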
We will say "reduces" instead of "β-reduces," etc., when the context is clear, writing → for a single reduction step and ↠ for any number of steps (possibly none).

Let us consider some examples.

1. We have
(λx. xxy)λz. z → (λz. z)(λz. z)y → (λz. z)y → y.

2. "Simplifying" a term can make it more complex:
(λx. xxy)(λx. xxy) → (λx. xxy)(λx. xxy)y → (λx. xxy)(λx. xxy)yy → …

3. It can also leave a term unchanged:
(λx. xx)(λx. xx) → (λx. xx)(λx. xx).

4. Also, some terms can be reduced in more than one way; for example,
(λx. (λy. yx)z)v → (λy. yv)z
by contracting the outermost application; and
(λx. (λy. yx)z)v → (λx. zx)v
by contracting the innermost one. Note, in this case, however, that both terms further reduce to the same term, zv.

The final outcome in the last example is not a coincidence, but rather illustrates a deep and important property of the lambda calculus, known as the "Church-Rosser property."

8.4 The Church-Rosser Property

Theorem 8.2. Let M, N_1, and N_2 be terms, such that M ↠ N_1 and M ↠ N_2. Then there is a term P such that N_1 ↠ P and N_2 ↠ P.

Corollary 8.3. Suppose M can be reduced to normal form. Then this normal form is unique.

Proof. If M ↠ N_1 and M ↠ N_2, by the previous theorem there is a term P such that N_1 and N_2 both reduce to P. If N_1 and N_2 are both in normal form, this can only happen if N_1 ≡ P ≡ N_2. □

Finally, we will say that two terms M and N are β-equivalent, or just equivalent, if they reduce to a common term; in other words, if there is some P such that M ↠ P and N ↠ P. This is written M =_β N. Using Theorem 8.2, you can check that =_β is an equivalence relation, with the additional property that for every M and N, if M ↠ N or N ↠ M, then M =_β N. (In fact, one can show that =_β is the smallest equivalence relation having this property.)

8.5 Currying

A λ-abstract λx. M represents a function of one argument, which is quite a limitation when we want to define functions accepting multiple arguments.
One way to do this would be by extending the λ-calculus to allow the formation of pairs, triples, etc., in which case, say, a three-place function λx. M would expect its argument to be a triple. However, it is more convenient to do this by Currying.

Let's consider an example. If we want to define a function that accepts two arguments and returns the first, we write λx. λy. x, which literally is a function that accepts an argument and returns a function that accepts another argument and returns the first argument while it drops the second. Let's see what happens when we apply it to two arguments:

(λx. λy. x)MN → (λy. M)N → M

In general, to write a function with parameters x_1, …, x_n defined by some term N, we can write λx_1. λx_2. … λx_n. N. If we apply it to n arguments we get:

(λx_1. λx_2. … λx_n. N)M_1 … M_n
→ ((λx_2. … λx_n. N)[M_1/x_1])M_2 … M_n
≡ (λx_2. … λx_n. N[M_1/x_1])M_2 … M_n
⋮
→ N[M_1/x_1] … [M_n/x_n]

The last line literally means substituting M_i for x_i in the body of the function definition, which is exactly what we want when applying multiple arguments to a function.

8.6 Lambda Definability

At first glance, the lambda calculus is just a very abstract calculus of expressions that represent functions and applications of them to others. Nothing in the syntax of the lambda calculus suggests that these are functions of particular kinds of objects; in particular, the syntax includes no mention of natural numbers. Its basic operations, application and lambda abstraction, are operations that apply to any function, not just functions on natural numbers.

Nevertheless, with some ingenuity, it is possible to define arithmetical functions, i.e., functions on the natural numbers, in the lambda calculus. To do this, we define, for each natural number n ∈ ℕ, a special λ-term n̄, the Church numeral for n. (Church numerals are named for Alonzo Church.)

Definition 8.4.
If n ∈ ℕ, the corresponding Church numeral n̄ represents n:

n̄ ≡ λfx. f^n(x)

Here, f^n(x) stands for the result of applying f to x n times. For example, 0̄ is λfx. x, and 3̄ is λfx. f(f(f x)). For compound expressions the overline extends over the whole expression, e.g., $\overline{n+1}$ is the Church numeral for n + 1.

The Church numeral n̄ is encoded as a lambda term which represents a function accepting two arguments f and x, and returns f^n(x). Church numerals are evidently in normal form.

A representation of natural numbers in the lambda calculus is only useful, of course, if we can compute with them. Computing with Church numerals in the lambda calculus means applying a λ-term F to such a Church numeral, and reducing the combined term F n̄ to a normal form. If it always reduces to a normal form, and the normal form is always a Church numeral m̄, we can think of the output of the computation as being the number m. We can then think of F as defining a function f : ℕ → ℕ, namely the function such that f(n) = m iff F n̄ ↠ m̄. Because of the Church-Rosser property, normal forms are unique if they exist. So if F n̄ ↠ m̄, there can be no other term in normal form, in particular no other Church numeral, that F n̄ reduces to.

Conversely, given a function f : ℕ → ℕ, we can ask if there is a term F that defines f in this way. In that case we say that F λ-defines f, and that f is λ-definable. We can generalize this to many-place and partial functions.

Definition 8.5. Suppose f : ℕ^k → ℕ. We say that a lambda term F λ-defines f if for all n_0, …, n_{k−1},

F n̄_0 n̄_1 … n̄_{k−1} ↠ $\overline{f(n_0, n_1, …, n_{k−1})}$

if f(n_0, …, n_{k−1}) is defined, and F n̄_0 n̄_1 … n̄_{k−1} has no normal form otherwise.

Very simple examples are the constant functions. The term C_k ≡ λx. k̄ λ-defines the function c_k : ℕ → ℕ such that c_k(n) = k. For C_k n̄ ≡ (λx. k̄)n̄ → k̄ for any n. The identity function is λ-defined by λx. x. More complex functions are of course harder to define, and often require a lot of ingenuity.
So it is perhaps surprising that every computable function is λ-definable. The converse is also true: if a function is λ-definable, it is computable.

8.7 λ-Definable Arithmetical Functions

Proposition 8.6. The successor function succ is λ-definable.

Proof. A term that λ-defines the successor function is

Succ ≡ λa. λfx. f(afx).

Succ is a function that accepts as argument a number a, and evaluates to another function, λfx. f(afx). That function is not itself a Church numeral. However, if the argument a is a Church numeral, it reduces to one. Consider:

(λa. λfx. f(afx)) n̄ → λfx. f(n̄fx).

The embedded term n̄fx contains a redex, since n̄ is λfx. f^n(x). So n̄fx ↠ f^n(x) and so, for the entire term we have Succ n̄ ↠ λfx. f(f^n(x)), i.e., $\overline{n+1}$. □

Proposition 8.7. The addition function add is λ-definable.

Proof. Addition is λ-defined by the term

Add ≡ λab. λfx. af(bfx)

or, alternatively,

Add′ ≡ λab. a Succ b.

The first term works as follows: Add first accepts two numbers a and b. The result is a function that accepts f and x and returns af(bfx). If a and b are Church numerals n̄ and m̄, this reduces to f^{n+m}(x), which is identical to f^n(f^m(x)). Or, slowly:

(λab. λfx. af(bfx)) n̄ m̄ ↠ λfx. n̄f(m̄fx)
↠ λfx. n̄f(f^m(x))
↠ λfx. f^n(f^m(x)) ≡ $\overline{n+m}$.

The second representation of addition, Add′, works differently: Applied to two Church numerals n̄ and m̄,

Add′ n̄ m̄ ↠ n̄ Succ m̄.

But n̄fx always reduces to f^n(x). So,

n̄ Succ m̄ ↠ Succ^n(m̄).

And since Succ λ-defines the successor function, and the successor function applied n times to m gives n + m, this in turn reduces to $\overline{n+m}$. □

Proposition 8.8. Multiplication is λ-definable by the term

Mult ≡ λab. λfx. a(bf)x

Proof. To see how this works, suppose we apply Mult to Church numerals n̄ and m̄: Mult n̄ m̄ reduces to λfx. n̄(m̄f)x. The term m̄f defines a function which applies f to its argument m times.
Consequently, n̄(m̄f)x applies the function "apply f m times" itself n times to x. In other words, we apply f to x, n·m times. But the resulting normal term is just the Church numeral $\overline{nm}$. □

We can actually simplify this term further by η-reduction:

Mult ≡ λab. λf. a(bf).

The definition of exponentiation as a λ-term is surprisingly simple:

Exp ≡ λbe. eb.

The first argument b is the base and the second e is the exponent. Intuitively, e f is f^e by our encoding of numbers. If you find it hard to understand, we can still define exponentiation also by iterated multiplication:

Exp′ ≡ λbe. e(Mult b)1̄.

Predecessor and subtraction on Church numerals are not as simple as we might think: they require an encoding of pairs.

8.8 Pairs and Predecessor

Definition 8.9. The pair of M and N (written ⟨M, N⟩) is defined as follows:

⟨M, N⟩ ≡ λf. f MN.

Intuitively it is a function that accepts a function, and applies that function to the two elements of the pair. Following this idea we have this constructor, which takes two terms and returns the pair containing them:

Pair ≡ λmn. λf. f mn

Given a pair, we also want to recover its elements. For this we need two access functions, which accept a pair as argument and return the first or second elements in it:

Fst ≡ λp. p(λmn. m)
Snd ≡ λp. p(λmn. n)

Now with pairs we can λ-define the predecessor function:

Pred ≡ λn. Fst(n(λp. ⟨Snd p, Succ(Snd p)⟩)⟨0̄, 0̄⟩)

Remember that n̄fx reduces to f^n(x); in this case f is a function that accepts a pair p and returns a new pair containing the second component of p and the successor of the second component; x is the pair ⟨0̄, 0̄⟩. Thus, the result is ⟨0̄, 0̄⟩ for n = 0, and ⟨$\overline{n−1}$, n̄⟩ otherwise. Pred then returns the first component of the result.

Subtraction can be defined as Pred applied to a, b times:

Sub ≡ λab. b Pred a.

8.9 Truth Values and Relations

We can encode truth values in the pure lambda calculus as follows:

true ≡ λx. λy. x
false ≡ λx. λy. y

Truth values are represented as selectors, i.e., functions that accept two arguments and return one of them. The truth value true selects its first argument, and false its second. For example, true MN always reduces to M, while false MN always reduces to N.

Definition 8.10. We call a relation R ⊆ ℕ^k λ-definable if there is a term R such that

R n̄_1 … n̄_k ↠ true

whenever R(n_1, …, n_k) and

R n̄_1 … n̄_k ↠ false

otherwise.

For instance, the relation IsZero = {0}, which holds of 0 and of 0 only, is λ-definable by

IsZero ≡ λn. n(λx. false) true.

How does it work? Since Church numerals are defined as iterators (functions which apply their first argument n times to the second), we set the initial value to be true, and for every step of iteration, we return false regardless of the result of the last iteration. This step will be applied to the initial value n times, and the result will be true if and only if the step is not applied at all, i.e., when n = 0.

On the basis of this representation of truth values, we can further define some truth functions. Here are two, the representations of negation and conjunction:

Not ≡ λx. x false true
And ≡ λx. λy. x y false

The function "Not" accepts one argument, and returns true if the argument is false, and false if the argument is true. The function "And" accepts two truth values as arguments, and should return true iff both arguments are true. Truth values are represented as selectors (described above), so when x is a truth value and is applied to two arguments, the result will be the first argument if x is true and the second argument otherwise. Now And takes its two arguments x and y, and in return passes y and false to its first argument x. Assuming x is a truth value, the result will evaluate to y if x is true, and to false if x is false, which is just what is desired.

Note that we assume here that only truth values are used as arguments to And. If it is passed other terms, the result (i.e., the normal form, if it exists) may well not be a truth value.

8.10 Primitive Recursive Functions are λ-Definable

Recall that the primitive recursive functions are those that can be defined from the basic functions zero, succ, and P^n_i by composition and primitive recursion.

Lemma 8.11. The basic primitive recursive functions zero, succ, and projections P^n_i are λ-definable.

Proof. They are λ-defined by the following terms:

Zero ≡ λa. λfx. x
Succ ≡ λa. λfx. f(afx)
Proj^n_i ≡ λx_0 … x_{n−1}. x_i
□

Lemma 8.12. Suppose the k-ary function f, and n-ary functions g_0, …, g_{k−1}, are λ-definable by terms F, G_0, …, G_{k−1}, and h is defined from them by composition. Then h is λ-definable.

Proof. h can be λ-defined by the term

H ≡ λx_0 … x_{n−1}. F (G_0 x_0 … x_{n−1}) … (G_{k−1} x_0 … x_{n−1})

We leave verification of this fact as an exercise. □

Note that Lemma 8.12 did not require that f and g_0, …, g_{k−1} are primitive recursive; it is only required that they are total and λ-definable.

Lemma 8.13. Suppose f is an n-ary function and g is an (n + 2)-ary function, both λ-definable by terms F and G, and the function h is defined from f and g by primitive recursion. Then h is also λ-definable.

Proof. Recall that h is defined by

h(x_1, …, x_n, 0) = f(x_1, …, x_n)
h(x_1, …, x_n, y + 1) = g(x_1, …, x_n, y, h(x_1, …, x_n, y)).

Informally speaking, the primitive recursive definition iterates the application of the function g y times and applies it to f(x_1, …, x_n). This is reminiscent of the definition of Church numerals, which are also defined as iterators.

For simplicity, we give the definition and proof for a single additional argument x. The function h is λ-defined by:

H ≡ λx. λy. Snd(y D ⟨0̄, F x⟩)

where

D ≡ λp.
⟨Succ(Fst p), (G x (Fst p)(Snd p))⟩

The iteration state we maintain is a pair, the first component of which is the current y and the second the corresponding value of h. For every step of iteration we create a pair of new values of y and h; after the iteration is done we return the second part of the pair, and that is the final h value. We now prove this is indeed a representation of primitive recursion.

We want to prove that for any n and m, H n̄ m̄ ↠ $\overline{h(n,m)}$. To do this we first show that if D_n ≡ D[n̄/x], then

D_n^m ⟨0̄, F n̄⟩ ↠ ⟨m̄, $\overline{h(n,m)}$⟩.

We proceed by induction on m.

If m = 0, we want D_n^0 ⟨0̄, F n̄⟩ ↠ ⟨0̄, $\overline{h(n,0)}$⟩. But D_n^0 ⟨0̄, F n̄⟩ just is ⟨0̄, F n̄⟩. Since F λ-defines f, this reduces to ⟨0̄, $\overline{f(n)}$⟩, and since f(n) = h(n, 0), this is ⟨0̄, $\overline{h(n,0)}$⟩.

Now suppose that D_n^m ⟨0̄, F n̄⟩ ↠ ⟨m̄, $\overline{h(n,m)}$⟩. We want to show that D_n^{m+1} ⟨0̄, F n̄⟩ ↠ ⟨$\overline{m+1}$, $\overline{h(n,m+1)}$⟩.

D_n^{m+1} ⟨0̄, F n̄⟩ ≡ D_n(D_n^m ⟨0̄, F n̄⟩)
↠ D_n ⟨m̄, $\overline{h(n,m)}$⟩ (by IH)
≡ (λp. ⟨Succ(Fst p), (G n̄ (Fst p)(Snd p))⟩)⟨m̄, $\overline{h(n,m)}$⟩
→ ⟨Succ(Fst ⟨m̄, $\overline{h(n,m)}$⟩), (G n̄ (Fst ⟨m̄, $\overline{h(n,m)}$⟩)(Snd ⟨m̄, $\overline{h(n,m)}$⟩))⟩
↠ ⟨Succ m̄, (G n̄ m̄ $\overline{h(n,m)}$)⟩
↠ ⟨$\overline{m+1}$, $\overline{g(n,m,h(n,m))}$⟩

Since g(n, m, h(n, m)) = h(n, m + 1), we are done.

Finally, consider

H n̄ m̄ ≡ (λx. λy. Snd(y (λp. ⟨Succ(Fst p), (G x (Fst p)(Snd p))⟩)⟨0̄, F x⟩)) n̄ m̄
↠ Snd(m̄ (λp. ⟨Succ(Fst p), (G n̄ (Fst p)(Snd p))⟩)⟨0̄, F n̄⟩)
≡ Snd(m̄ D_n ⟨0̄, F n̄⟩), since the term λp. ⟨Succ(Fst p), (G n̄ (Fst p)(Snd p))⟩ is D_n
↠ Snd(D_n^m ⟨0̄, F n̄⟩)
↠ Snd ⟨m̄, $\overline{h(n,m)}$⟩
↠ $\overline{h(n,m)}$. □

Proposition 8.14. Every primitive recursive function is λ-definable.

Proof. By Lemma 8.11, all basic functions are λ-definable, and by Lemma 8.12 and Lemma 8.13, the λ-definable functions are closed under composition and primitive recursion. □

8.11 Fixpoints

Suppose we wanted to define the factorial function by recursion as a term Fac with the following property:

Fac ≡ λn. IsZero n 1̄ (Mult n (Fac(Pred n)))

That is, the factorial of n is 1 if n = 0, and n times the factorial of n − 1 otherwise.
Of course, we cannot define the term Fac this way since Fac itself occurs in the right-hand side. Such recursive definitions involving self-reference are not part of the lambda calculus. Defining a term, e.g., by

Mult ≡ λab. a(Add b)0̄

only involves previously defined terms in the right-hand side, such as Add. We can always remove Add by replacing it with its defining term. This would give the term Mult as a pure lambda term; if Add itself involved defined terms (as, e.g., Add′ does), we could continue this process and finally arrive at a pure lambda term.

However, this is not true in the case of recursive definitions like the one of Fac above. If we replace the occurrence of Fac on the right-hand side with the definition of Fac itself, we get:

Fac ≡ λn. IsZero n 1̄ (Mult n ((λn. IsZero n 1̄ (Mult n (Fac(Pred n))))(Pred n)))

and we still haven't gotten rid of Fac on the right-hand side. Clearly, if we repeat this process, the definition keeps growing longer and the process never results in a pure lambda term. Thus this way of defining factorial (or more generally recursive functions) is not feasible.

The recursive definition does tell us something, though: If f were a term representing the factorial function, then the term

Fac′ ≡ λg. λn. IsZero n 1̄ (Mult n (g(Pred n)))

applied to the term f, i.e., Fac′ f, also represents the factorial function. That is, if we regard Fac′ as a function accepting a function and returning a function, the value of Fac′ f is just f, provided f is the factorial. A function f with the property that Fac′ f =_β f is called a fixpoint of Fac′. So, the factorial is a fixpoint of Fac′.

There are terms in the lambda calculus that compute the fixpoints of a given term, and these terms can then be used to turn a term like Fac′ into the definition of the factorial.

Definition 8.15. The Y-combinator is the term:

Y ≡ (λux. x(uux))(λux. x(uux)).

Theorem 8.16.
Y has the property that Y g ↠ g(Y g) for any term g. Thus, Y g is always a fixpoint of g.

Proof. Let's abbreviate (λux. x(uux)) by U, so that Y ≡ UU. Then

Y g ≡ (λux. x(uux))U g
→ (λx. x(UU x))g
→ g(UU g) ≡ g(Y g).

Since g(Y g) and Y g both reduce to g(Y g), g(Y g) =_β Y g, so Y g is a fixpoint of g. □

Of course, since Y g contains a redex, the reduction can continue indefinitely:

Y g ↠ g(Y g)
↠ g(g(Y g))
↠ g(g(g(Y g)))
…

So we can think of Y g as g applied to itself infinitely many times. If we apply g to it one additional time, we, so to speak, aren't doing anything extra; g applied to g applied infinitely many times to Y g is still g applied to Y g infinitely many times.

Note that the above sequence of β-reduction steps starting with Y g is infinite. So if we apply Y g to some term, i.e., consider (Y g)N, that term will also reduce to infinitely many different terms, namely (g(Y g))N, (g(g(Y g)))N, …. It is nevertheless possible that some other sequence of reduction steps does terminate in a normal form.

Take the factorial for instance. Define Fac as Y Fac′ (i.e., a fixpoint of Fac′). Then:

Fac 3̄ ↠ Y Fac′ 3̄
↠ Fac′(Y Fac′) 3̄
≡ (λx. λn. IsZero n 1̄ (Mult n (x(Pred n)))) Fac 3̄
↠ IsZero 3̄ 1̄ (Mult 3̄ (Fac(Pred 3̄)))
↠ Mult 3̄ (Fac 2̄).

Similarly,

Fac 2̄ ↠ Mult 2̄ (Fac 1̄)
Fac 1̄ ↠ Mult 1̄ (Fac 0̄)

but

Fac 0̄ ↠ Fac′(Y Fac′) 0̄
≡ (λx. λn. IsZero n 1̄ (Mult n (x(Pred n)))) Fac 0̄
↠ IsZero 0̄ 1̄ (Mult 0̄ (Fac(Pred 0̄)))
↠ 1̄.

So together

Fac 3̄ ↠ Mult 3̄ (Mult 2̄ (Mult 1̄ 1̄)).

What goes for Fac′ goes for any recursive definition. Suppose we have a recursive equation

g x_1 … x_n =_β N

where N may contain g and x_1, …, x_n. Then there is always a term G ≡ (Y λg. λx_1 … x_n. N) such that

G x_1 … x_n =_β N[G/g].

For by the fixpoint theorem,

G ≡ (Y λg. λx_1 … x_n. N) ↠ (λg. λx_1 … x_n. N)(Y λg. λx_1 … x_n. N)
≡ (λg. λx_1 … x_n. N)G

and consequently

G x_1 …
x_n ↠ (λg. λx_1 … x_n. N)G x_1 … x_n
→ (λx_1 … x_n. N[G/g]) x_1 … x_n
↠ N[G/g].

The Y combinator of Definition 8.15 is due to Alan Turing. Alonzo Church had proposed a different version which we'll call Y_C:

Y_C ≡ λg. (λx. g(xx))(λx. g(xx)).

Church's combinator is a bit weaker than Turing's in that Y_C g =_β g(Y_C g) but not Y_C g ↠ g(Y_C g). Let V be the term λx. g(xx), so that Y_C ≡ λg. VV. Then

VV ≡ (λx. g(xx))V → g(VV) and thus
Y_C g ≡ (λg. VV)g → VV → g(VV), but also
g(Y_C g) ≡ g((λg. VV)g) → g(VV).

In other words, Y_C g and g(Y_C g) reduce to a common term g(VV); so Y_C g =_β g(Y_C g). This is often enough for applications.

8.12 Minimization

The general recursive functions are those that can be obtained from the basic functions zero, succ, P^n_i by composition, primitive recursion, and regular minimization. To show that all general recursive functions are λ-definable we have to show that any function defined by regular minimization from a λ-definable function is itself λ-definable.

Lemma 8.17. If f(x_1, …, x_k, y) is regular and λ-definable, then h defined by

h(x_1, …, x_k) = μy [f(x_1, …, x_k, y) = 0]

is also λ-definable.

Proof. Suppose the lambda term F λ-defines the regular function f(x⃗, y). To λ-define h we use a search function and a fixpoint combinator:

Search ≡ λg. λf x⃗ y. IsZero(f x⃗ y) y (g f x⃗ (Succ y))
H ≡ λx⃗. (Y Search) F x⃗ 0̄,

where Y is any fixpoint combinator. Informally speaking, Search is a self-referencing function: starting with y, test whether f x⃗ y is zero: if so, return y, otherwise call itself with Succ y. Thus (Y Search) F n̄_1 … n̄_k 0̄ returns the least m for which f(n_1, …, n_k, m) = 0.

Specifically, observe that

(Y Search) F n̄_1 … n̄_k m̄ ↠ m̄

if f(n_1, …, n_k, m) = 0, or

↠ (Y Search) F n̄_1 … n̄_k $\overline{m+1}$

otherwise. Since f is regular, f(n_1, …, n_k, y) = 0 for some y, and so

(Y Search) F n̄_1 … n̄_k 0̄ ↠ $\overline{h(n_1, …, n_k)}$. □

Proposition 8.18.
Every general recursive function is λ-definable.

Proof. By Lemma 8.11, all basic functions are λ-definable, and by Lemma 8.12, Lemma 8.13, and Lemma 8.17, the λ-definable functions are closed under composition, primitive recursion, and regular minimization. □

8.13 Partial Recursive Functions are λ-Definable

Partial recursive functions are those obtained from the basic functions by composition, primitive recursion, and unbounded minimization. They differ from general recursive functions in that the functions used in unbounded search are not required to be regular. Not requiring regularity means that functions defined by minimization may sometimes not be defined.

At first glance it might seem that the same methods used to show that the (total) general recursive functions are all λ-definable can be used to prove that all partial recursive functions are λ-definable. For instance, the composition of f with g is λ-defined by λx. F(Gx) if f and g are λ-defined by terms F and G, respectively. However, when the functions are partial, this is problematic. When g(x) is undefined, Gx has no normal form. In most cases this means that F(Gx) has no normal form either, which is what we want. But consider the case where F is λx. λy. y: then F(Gx) does have a normal form, namely λy. y, even though Gx has none.

This problem is not insurmountable, and there are ways to λ-define all partial recursive functions in such a way that undefined values are represented by terms without a normal form. These ways are, however, somewhat more complicated and less intuitive than the approach we have taken for general recursive functions. We record the theorem here without proof:

Theorem 8.19. All partial recursive functions are λ-definable.

8.14 λ-Definable Functions are Recursive

Not only are all partial recursive functions λ-definable, the converse is true, too. That is, all λ-definable functions are partial recursive.

Theorem 8.20. If a partial function f is λ-definable, it is partial recursive.

Proof. We only sketch the proof. First, we arithmetize λ-terms, i.e., systematically assign Gödel numbers to λ-terms, using the usual power-of-primes coding of sequences. Then we define a partial recursive function normalize(t) which takes the Gödel number t of a lambda term as argument, and which returns the Gödel number of the normal form of that term if it has one, and is undefined otherwise. Then we define two partial recursive functions toChurch and fromChurch that map natural numbers to and from the Gödel numbers of the corresponding Church numerals.

Using these recursive functions, we can define the function f as a partial recursive function. There is a lambda term F that λ-defines f. To compute f(n1, . . . , nk), first obtain the Gödel numbers of the corresponding Church numerals using toChurch(ni), and append these to #F# to obtain the Gödel number of the term $F\,\overline{n_1} \dots \overline{n_k}$. Now use normalize on this Gödel number. If f(n1, . . . , nk) is defined, $F\,\overline{n_1} \dots \overline{n_k}$ has a normal form (which must be a Church numeral), and otherwise it has no normal form (and so normalize applied to the Gödel number of $F\,\overline{n_1} \dots \overline{n_k}$ is undefined). Finally, use fromChurch on the Gödel number of the normalized term. □

Problems

Problem 8.1. The term Succ′ ≡ λn. λf x. n f (f x) λ-defines the successor function. Explain why.

Problem 8.2. Multiplication can be λ-defined by the term Mult′ ≡ λab. a(Add b)0. Explain why this works.

Problem 8.3. Explain why the access functions Fst and Snd work.

Problem 8.4. Define the functions Or and Xor representing the truth functions of inclusive and exclusive disjunction using the encoding of truth values as λ-terms.

Problem 8.5. Complete the proof of Lemma 8.12 by showing that $H\,\overline{n_0} \dots \overline{n_{n-1}}$ −→ $\overline{h(n_0, \dots, n_{n-1})}$.
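The constructions of this chapter can be tried out in any language with first-class functions. The following is a minimal sketch in Python, not part of the text's development: the names `church`, `unchurch`, `succ_prime`, `Z`, `search`, and `mu` are chosen here for illustration. Because Python evaluates arguments eagerly, the sketch uses the strict fixpoint combinator (often called Z) in place of Y, and the search runs over ordinary integers rather than Church numerals; it illustrates only the self-referential shape of Search from Section 8.12, where the recursive call passes f and the arguments along and increments y.

```python
def church(n):
    """Church numeral for n: maps f and x to f applied n times to x."""
    def numeral(f):
        def apply(x):
            for _ in range(n):
                x = f(x)
            return x
        return apply
    return numeral


def unchurch(c):
    """Read a Church numeral back as a Python int."""
    return c(lambda k: k + 1)(0)


# Succ' from Problem 8.1: \n. \f x. n f (f x)
succ_prime = lambda n: lambda f: lambda x: n(f)(f(x))

# Strict fixpoint combinator: Z(g) behaves like g(Z(g)) when applied to arguments.
Z = lambda g: (lambda x: g(lambda *a: x(x)(*a)))(lambda x: g(lambda *a: x(x)(*a)))

# Search: given "itself" as g, test f(xs, y); return y if it is 0, else continue at y + 1.
search = lambda g: lambda f, xs, y: y if f(*xs, y) == 0 else g(f, xs, y + 1)


def mu(f, *xs):
    """Least y with f(xs, y) == 0; loops forever if f is not regular."""
    return Z(search)(f, xs, 0)


# Example: the least y with y * y >= x, written as a function that is 0 exactly there.
least_root = mu(lambda x, y: 0 if y * y >= x else 1, 10)
three_plus_one = unchurch(succ_prime(church(3)))
```

Here `mu` terminates only because the example function takes the value 0 for some y, which is the role regularity plays in Lemma 8.17.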
APPENDIX A

Derivations in Arithmetic Theories

When we showed that all general recursive functions are representable in Q, and in the proofs of the incompleteness theorems, we claimed that various things are provable in Q and PA. The proofs of these claims, however, just gave the arguments informally without exhibiting actual derivations in natural deduction. We provide some of these derivations in this chapter.

For instance, in Lemma 4.15 we proved that, for all n and m ∈ N, $Q \vdash (\overline{n} + \overline{m}) = \overline{n+m}$. We did this by induction on m.

Proof of Lemma 4.15. Base case: m = 0. Then what has to be proved is that, for all n, $Q \vdash \overline{n} + \overline{0} = \overline{n+0}$. Since $\overline{0}$ is just 0 and $\overline{n+0}$ is $\overline{n}$, this amounts to showing that $Q \vdash (\overline{n} + 0) = \overline{n}$. The derivation

\begin{prooftree}
\AxiomC{$\forall x\,(x + 0) = x$}
\RightLabel{$\forall$Elim}
\UnaryInfC{$(\overline{n} + 0) = \overline{n}$}
\end{prooftree}

is a natural deduction derivation of $(\overline{n} + 0) = \overline{n}$ with one undischarged assumption, and that undischarged assumption is an axiom of Q.

Inductive step: Suppose that, for any n, $Q \vdash (\overline{n} + \overline{m}) = \overline{n+m}$ (say, by a derivation $\delta_{n,m}$). We have to show that also $Q \vdash (\overline{n} + \overline{m+1}) = \overline{n+m+1}$. Note that $\overline{m+1} \equiv \overline{m}'$, and that $\overline{n+m+1} \equiv \overline{n+m}'$. So we are looking for a derivation of $(\overline{n} + \overline{m}') = \overline{n+m}'$ from the axioms of Q. Our derivation may use the derivation $\delta_{n,m}$ which exists by inductive hypothesis.

\begin{prooftree}
\AxiomC{$\delta_{n,m}$}
\noLine
\UnaryInfC{$(\overline{n} + \overline{m}) = \overline{n+m}$}
\AxiomC{$\forall x\,\forall y\,(x + y') = (x + y)'$}
\RightLabel{$\forall$Elim}
\UnaryInfC{$\forall y\,(\overline{n} + y') = (\overline{n} + y)'$}
\RightLabel{$\forall$Elim}
\UnaryInfC{$(\overline{n} + \overline{m}') = (\overline{n} + \overline{m})'$}
\RightLabel{$=$Elim}
\BinaryInfC{$(\overline{n} + \overline{m}') = \overline{n+m}'$}
\end{prooftree}

In the last =Elim inference, we replace the subterm $\overline{n} + \overline{m}$ of the right side $(\overline{n} + \overline{m})'$ of the right premise by the term $\overline{n+m}$. □

In Lemma 4.22, we showed that Q ⊢ ∀x ¬x < 0. What does an actual derivation look like?

Proof of Lemma 4.22. To prove a universal claim like this, we use ∀Intro, which requires a derivation of ¬a < 0. Looking at axiom Q8, this means proving ¬∃z (z′ + a) = 0. Specifically, if we had a proof of the latter, Q8 would allow us to prove the former (recall that A ↔ B is short for (A → B) ∧ (B → A)).
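\begin{prooftree}
\AxiomC{$\neg\exists z\,(z' + a) = 0$}
\AxiomC{$\forall x\,\forall y\,(x < y \leftrightarrow \exists z\,(z' + x) = y)$}
\RightLabel{$\forall$Elim}
\UnaryInfC{$\forall y\,(a < y \leftrightarrow \exists z\,(z' + a) = y)$}
\RightLabel{$\forall$Elim}
\UnaryInfC{$a < 0 \leftrightarrow \exists z\,(z' + a) = 0$}
\RightLabel{$\land$Elim}
\UnaryInfC{$a < 0 \to \exists z\,(z' + a) = 0$}
\AxiomC{$[a < 0]^1$}
\RightLabel{$\to$Elim}
\BinaryInfC{$\exists z\,(z' + a) = 0$}
\RightLabel{$\neg$Elim}
\BinaryInfC{$\bot$}
\LeftLabel{1}
\RightLabel{$\neg$Intro}
\UnaryInfC{$\neg a < 0$}
\end{prooftree}

This is a derivation of ¬a < 0 from ¬∃z (z′ + a) = 0 (and Q8); let's call it δ1.

Now how do we prove ¬∃z (z′ + a) = 0 from the axioms of Q? To prove a negated claim like this, we'd need a derivation of the form

\begin{prooftree}
\AxiomC{$[\exists z\,(z' + a) = 0]^2$}
\noLine
\UnaryInfC{$\vdots$}
\noLine
\UnaryInfC{$\bot$}
\LeftLabel{2}
\RightLabel{$\neg$Intro}
\UnaryInfC{$\neg\exists z\,(z' + a) = 0$}
\end{prooftree}

To get a contradiction from an existential claim, we introduce a constant b for the existentially quantified variable z and use ∃Elim:

\begin{prooftree}
\AxiomC{$[\exists z\,(z' + a) = 0]^2$}
\AxiomC{$[(b' + a) = 0]^3$}
\noLine
\UnaryInfC{$\delta_2$}
\noLine
\UnaryInfC{$\bot$}
\LeftLabel{3}
\RightLabel{$\exists$Elim}
\BinaryInfC{$\bot$}
\LeftLabel{2}
\RightLabel{$\neg$Intro}
\UnaryInfC{$\neg\exists z\,(z' + a) = 0$}
\end{prooftree}

Now the task is to fill in δ2, i.e., prove ⊥ from (b′ + a) = 0 and the axioms of Q. Q2 says that 0 can't be the successor of some number, so one way of doing that would be to show that (b′ + a) is equal to the successor of some number. Since that expression itself is a sum, the axioms for addition must come into play. If a = 0, Q5 would tell us that (b′ + a) = b′, i.e., b′ + a is the successor of some number, namely of b. On the other hand, if a = c′ for some c, then (b′ + a) = (b′ + c′) by =Elim, and (b′ + c′) = (b′ + c)′ by Q6. So again, b′ + a is the successor of a number, in this case b′ + c. So the strategy is to divide the task into these two cases. We also have to verify that Q proves that one of these cases holds, i.e., Q ⊢ a = 0 ∨ ∃y (a = y′), but this follows directly from Q3 by ∀Elim. Here are the two cases:

Case 1: Prove ⊥ from a = 0 and (b′ + a) = 0 (and axioms Q2, Q5):

\begin{prooftree}
\AxiomC{$\forall x\,\neg 0 = x'$}
\RightLabel{$\forall$Elim}
\UnaryInfC{$\neg 0 = b'$}
\AxiomC{$\forall x\,(x + 0) = x$}
\RightLabel{$\forall$Elim}
\UnaryInfC{$(b' + 0) = b'$}
\AxiomC{$a = 0$}
\AxiomC{$(b' + a) = 0$}
\RightLabel{$=$Elim}
\BinaryInfC{$(b' + 0) = 0$}
\doubleLine
\UnaryInfC{$0 = (b' + 0)$}
\RightLabel{$=$Elim}
\BinaryInfC{$0 = b'$}
\RightLabel{$\neg$Elim}
\BinaryInfC{$\bot$}
\end{prooftree}

Call this derivation δ3. (We've abbreviated the derivation of 0 = (b′ + 0) from (b′ + 0) = 0 by a double inference line.)

Case 2: Prove ⊥ from ∃y a = y′ and (b′ + a) = 0 (and axioms Q2, Q6).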
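We first show how to derive ⊥ from a = c′ and (b′ + a) = 0.

\begin{prooftree}
\AxiomC{$\forall x\,\neg 0 = x'$}
\RightLabel{$\forall$Elim}
\UnaryInfC{$\neg 0 = (b' + c)'$}
\AxiomC{$a = c'$}
\AxiomC{$(b' + a) = 0$}
\RightLabel{$=$Elim}
\BinaryInfC{$(b' + c') = 0$}
\AxiomC{$\forall x\,\forall y\,(x + y') = (x + y)'$}
\RightLabel{$\forall$Elim}
\UnaryInfC{$\forall y\,(b' + y') = (b' + y)'$}
\RightLabel{$\forall$Elim}
\UnaryInfC{$(b' + c') = (b' + c)'$}
\RightLabel{$=$Elim}
\BinaryInfC{$0 = (b' + c)'$}
\RightLabel{$\neg$Elim}
\BinaryInfC{$\bot$}
\end{prooftree}

Call this δ4. We get the required derivation δ5 by applying ∃Elim and discharging the assumption a = c′:

\begin{prooftree}
\AxiomC{$\exists y\, a = y'$}
\AxiomC{$[a = c']^6 \quad (b' + a) = 0$}
\noLine
\UnaryInfC{$\delta_4$}
\noLine
\UnaryInfC{$\bot$}
\LeftLabel{6}
\RightLabel{$\exists$Elim}
\BinaryInfC{$\bot$}
\end{prooftree}

Putting everything together, the full proof looks like this (the subderivation ending in the ∨Elim inference is δ2):

\begin{prooftree}
\AxiomC{$[\exists z\,(z' + a) = 0]^2$}
\AxiomC{$\forall x\,(x = 0 \lor \exists y\,(x = y'))$}
\RightLabel{$\forall$Elim}
\UnaryInfC{$a = 0 \lor \exists y\,(a = y')$}
\AxiomC{$[a = 0]^7 \quad [(b' + a) = 0]^3$}
\noLine
\UnaryInfC{$\delta_3$}
\noLine
\UnaryInfC{$\bot$}
\AxiomC{$[\exists y\, a = y']^7 \quad [(b' + a) = 0]^3$}
\noLine
\UnaryInfC{$\delta_5$}
\noLine
\UnaryInfC{$\bot$}
\LeftLabel{7}
\RightLabel{$\lor$Elim}
\TrinaryInfC{$\bot$}
\LeftLabel{3}
\RightLabel{$\exists$Elim}
\BinaryInfC{$\bot$}
\LeftLabel{2}
\RightLabel{$\neg$Intro}
\UnaryInfC{$\neg\exists z\,(z' + a) = 0$}
\noLine
\UnaryInfC{$\delta_1$}
\noLine
\UnaryInfC{$\neg a < 0$}
\RightLabel{$\forall$Intro}
\UnaryInfC{$\forall x\,\neg x < 0$}
\end{prooftree}

□

In the proof of Theorem 5.7, we defined RProv(y) as

∃x (Prf(x, y) ∧ ∀z (z < x → ¬Ref(z, y))).

Prf(x, y) is the formula representing the proof relation of T (a consistent, axiomatizable extension of Q) in Q, and Ref(z, y) is the formula representing the refutation relation. That means that if n is the Gödel number of a proof of A, then $Q \vdash \mathrm{Prf}(\overline{n}, \ulcorner A\urcorner)$, and otherwise $Q \vdash \neg\mathrm{Prf}(\overline{n}, \ulcorner A\urcorner)$. Similarly, if n is the Gödel number of a proof of ¬A, then $Q \vdash \mathrm{Ref}(\overline{n}, \ulcorner A\urcorner)$, and otherwise $Q \vdash \neg\mathrm{Ref}(\overline{n}, \ulcorner A\urcorner)$. We use the Diagonal Lemma to find a sentence R such that Q ⊢ R ↔ ¬RProv(⌜R⌝). Rosser's Theorem states that T ⊬ R and T ⊬ ¬R. Both claims were proved indirectly: we show that if T ⊢ R, then T is inconsistent, i.e., T ⊢ ⊥, and the same if T ⊢ ¬R.

Proof of Theorem 5.7. First we prove some things about <. By Lemma 4.23, we know that $Q \vdash \forall x\,(x < \overline{n+1} \to (x = \overline{0} \lor \dots \lor x = \overline{n}))$ for every n. So of course also (if n > 1), $Q \vdash \forall x\,(x < \overline{n} \to (x = \overline{0} \lor \dots \lor x = \overline{n-1}))$. We can use this to derive $a = \overline{0} \lor \dots \lor a = \overline{n-1}$ from $a < \overline{n}$:

\begin{prooftree}
\AxiomC{$a < \overline{n}$}
\AxiomC{$\forall x\,(x < \overline{n} \to (x = \overline{0} \lor \dots \lor x = \overline{n-1}))$}
\RightLabel{$\forall$Elim}
\UnaryInfC{$a < \overline{n} \to (a = \overline{0} \lor \dots \lor a = \overline{n-1})$}
\RightLabel{$\to$Elim}
\BinaryInfC{$a = \overline{0} \lor \dots \lor a = \overline{n-1}$}
\end{prooftree}

Let's call this derivation λ1.

Now, to show that T ⊬ R, we assume that T ⊢ R (with a derivation δ) and show that T then would be inconsistent.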
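Let n be the Gödel number of δ. Since Prf represents the proof relation in Q, there is a derivation δ1 of $\mathrm{Prf}(\overline{n}, \ulcorner R\urcorner)$. Furthermore, no k < n is the Gödel number of a refutation of R since T is assumed to be consistent, so for each k < n, $Q \vdash \neg\mathrm{Ref}(\overline{k}, \ulcorner R\urcorner)$; let ρk be the corresponding derivation. We get a derivation of RProv(⌜R⌝):

\begin{prooftree}
\AxiomC{$\delta_1$}
\noLine
\UnaryInfC{$\mathrm{Prf}(\overline{n}, \ulcorner R\urcorner)$}
\AxiomC{$[a < \overline{n}]^1$}
\noLine
\UnaryInfC{$\lambda_1$}
\noLine
\UnaryInfC{$a = \overline{0} \lor \dots \lor a = \overline{n-1}$}
\AxiomC{$\dots$}
\AxiomC{$[a = \overline{k}]^2$}
\AxiomC{$\rho_k$}
\noLine
\UnaryInfC{$\neg\mathrm{Ref}(\overline{k}, \ulcorner R\urcorner)$}
\RightLabel{$=$Elim}
\BinaryInfC{$\neg\mathrm{Ref}(a, \ulcorner R\urcorner)$}
\AxiomC{$\dots$}
\LeftLabel{2}
\RightLabel{$\lor$Elim$^*$}
\QuaternaryInfC{$\neg\mathrm{Ref}(a, \ulcorner R\urcorner)$}
\LeftLabel{1}
\RightLabel{$\to$Intro}
\UnaryInfC{$a < \overline{n} \to \neg\mathrm{Ref}(a, \ulcorner R\urcorner)$}
\RightLabel{$\forall$Intro}
\UnaryInfC{$\forall z\,(z < \overline{n} \to \neg\mathrm{Ref}(z, \ulcorner R\urcorner))$}
\RightLabel{$\land$Intro}
\BinaryInfC{$\mathrm{Prf}(\overline{n}, \ulcorner R\urcorner) \land \forall z\,(z < \overline{n} \to \neg\mathrm{Ref}(z, \ulcorner R\urcorner))$}
\RightLabel{$\exists$Intro}
\UnaryInfC{$\exists x\,(\mathrm{Prf}(x, \ulcorner R\urcorner) \land \forall z\,(z < x \to \neg\mathrm{Ref}(z, \ulcorner R\urcorner)))$}
\end{prooftree}

(We abbreviate multiple applications of ∨Elim by ∨Elim∗ above.) We've shown that if T ⊢ R there would be a derivation of RProv(⌜R⌝). Then, since T ⊢ R ↔ ¬RProv(⌜R⌝), and hence also T ⊢ RProv(⌜R⌝) → ¬R, we'd have T ⊢ ¬R and T would be inconsistent.

Now let's show that T ⊬ ¬R. Again, suppose it did. Then there is a derivation ρ of ¬R with Gödel number m (a refutation of R), and so $Q \vdash \mathrm{Ref}(\overline{m}, \ulcorner R\urcorner)$ by a derivation ρ1. Since we assume T is consistent, T ⊬ R. So for all k, k is not a Gödel number of a derivation of R, and hence $Q \vdash \neg\mathrm{Prf}(\overline{k}, \ulcorner R\urcorner)$ by a derivation πk. So we have:

\begin{prooftree}
\AxiomC{$\lambda_2$}
\noLine
\UnaryInfC{$a = \overline{0} \lor \dots \lor a = \overline{m} \lor \overline{m} < a$}
\AxiomC{$\dots$}
\AxiomC{$[\mathrm{Prf}(a, \ulcorner R\urcorner)]^1 \quad [a = \overline{k}]^2$}
\noLine
\UnaryInfC{$\pi'_k$}
\noLine
\UnaryInfC{$\bot$}
\RightLabel{$\bot_I$}
\UnaryInfC{$\overline{m} < a$}
\AxiomC{$\dots$}
\AxiomC{$[\overline{m} < a]^2$}
\LeftLabel{2}
\RightLabel{$\lor$Elim$^*$}
\QuinaryInfC{$\overline{m} < a$}
\AxiomC{$\rho_1$}
\noLine
\UnaryInfC{$\mathrm{Ref}(\overline{m}, \ulcorner R\urcorner)$}
\RightLabel{$\land$Intro}
\BinaryInfC{$\overline{m} < a \land \mathrm{Ref}(\overline{m}, \ulcorner R\urcorner)$}
\RightLabel{$\exists$Intro}
\UnaryInfC{$\exists z\,(z < a \land \mathrm{Ref}(z, \ulcorner R\urcorner))$}
\LeftLabel{1}
\RightLabel{$\to$Intro}
\UnaryInfC{$\mathrm{Prf}(a, \ulcorner R\urcorner) \to \exists z\,(z < a \land \mathrm{Ref}(z, \ulcorner R\urcorner))$}
\RightLabel{$\forall$Intro}
\UnaryInfC{$\forall x\,(\mathrm{Prf}(x, \ulcorner R\urcorner) \to \exists z\,(z < x \land \mathrm{Ref}(z, \ulcorner R\urcorner)))$}
\doubleLine
\UnaryInfC{$\neg\exists x\,(\mathrm{Prf}(x, \ulcorner R\urcorner) \land \forall z\,(z < x \to \neg\mathrm{Ref}(z, \ulcorner R\urcorner)))$}
\end{prooftree}

where π′k is the derivation

\begin{prooftree}
\AxiomC{$\pi_k$}
\noLine
\UnaryInfC{$\neg\mathrm{Prf}(\overline{k}, \ulcorner R\urcorner)$}
\AxiomC{$a = \overline{k}$}
\AxiomC{$\mathrm{Prf}(a, \ulcorner R\urcorner)$}
\RightLabel{$=$Elim}
\BinaryInfC{$\mathrm{Prf}(\overline{k}, \ulcorner R\urcorner)$}
\RightLabel{$\neg$Elim}
\BinaryInfC{$\bot$}
\end{prooftree}

and λ2 is

\begin{prooftree}
\AxiomC{$\lambda_3$}
\noLine
\UnaryInfC{$(a < \overline{m} \lor a = \overline{m}) \lor \overline{m} < a$}
\AxiomC{$[a < \overline{m}]^3$}
\noLine
\UnaryInfC{$\lambda_1$}
\noLine
\UnaryInfC{$a = \overline{0} \lor \dots \lor a = \overline{m-1}$}
\UnaryInfC{$a = \overline{0} \lor \dots \lor a = \overline{m} \lor \overline{m} < a$}
\AxiomC{$[a = \overline{m}]^3$}
\UnaryInfC{$a = \overline{0} \lor \dots \lor a = \overline{m} \lor \overline{m} < a$}
\AxiomC{$[\overline{m} < a]^3$}
\RightLabel{$\lor$Intro$^*$}
\UnaryInfC{$a = \overline{0} \lor \dots \lor a = \overline{m} \lor \overline{m} < a$}
\LeftLabel{3}
\RightLabel{$\lor$Elim$^2$}
\QuaternaryInfC{$a = \overline{0} \lor \dots \lor a = \overline{m} \lor \overline{m} < a$}
\end{prooftree}

(The derivation λ3 exists by Lemma 4.24. We abbreviate repeated use of ∨Intro by ∨Intro∗ and the double use of ∨Elim to derive $a = \overline{0} \lor \dots \lor a = \overline{m} \lor \overline{m} < a$ from $(a < \overline{m} \lor a = \overline{m}) \lor \overline{m} < a$ as ∨Elim².) □

APPENDIX B

First-order Logic

B.1 First-Order Languages

Expressions of first-order logic are built up from a basic vocabulary containing variables, constant symbols, predicate symbols and sometimes function symbols. From them, together with logical connectives, quantifiers, and punctuation symbols such as parentheses and commas, terms and formulas are formed.

Informally, predicate symbols are names for properties and relations, constant symbols are names for individual objects, and function symbols are names for mappings. These, except for the identity predicate =, are the non-logical symbols and together make up a language. Any first-order language L is determined by its non-logical symbols. In the most general case, L contains infinitely many symbols of each kind.

In the general case, we make use of the following symbols in first-order logic:

1. Logical symbols
a) Logical connectives: ¬ (negation), ∧ (conjunction), ∨ (disjunction), → (conditional), ∀ (universal quantifier), ∃ (existential quantifier).
b) The propositional constant for falsity ⊥.
c) The two-place identity predicate =.
d) A countably infinite set of variables: v0, v1, v2, . . .

2. Non-logical symbols, making up the standard language of first-order logic
a) A countably infinite set of n-place predicate symbols for each n > 0: A^n_0, A^n_1, A^n_2, . . .
b) A countably infinite set of constant symbols: c0, c1, c2, . . .
c) A countably infinite set of n-place function symbols for each n > 0: f^n_0, f^n_1, f^n_2, . . .

3. Punctuation marks: (, ), and the comma.

Most of our definitions and results will be formulated for the full standard language of first-order logic. However, depending on the application, we may also restrict the language to only a few predicate symbols, constant symbols, and function symbols.

Example B.1.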
The language LA of arithmetic contains a single two-place predicate symbol <, a single constant symbol 0, one one-place function symbol ′, and two two-place function symbols + and ×.

Example B.2. The language of set theory LZ contains only the single two-place predicate symbol ∈.

Example B.3. The language of orders L≤ contains only the two-place predicate symbol ≤.

Again, these are conventions: officially, these are just aliases, e.g., <, ∈, and ≤ are aliases for A^2_0, 0 for c0, ′ for f^1_0, + for f^2_0, × for f^2_1.

In addition to the primitive connectives and quantifiers introduced above, we also use the following defined symbols: ↔ (biconditional), ⊤ (truth).

A defined symbol is not officially part of the language, but is introduced as an informal abbreviation: it allows us to abbreviate formulas which would, if we only used primitive symbols, get quite long. This is obviously an advantage. The bigger advantage, however, is that proofs become shorter. If a symbol is primitive, it has to be treated separately in proofs. The more primitive symbols, therefore, the longer our proofs.

We might treat all the propositional operators and both quantifiers as primitive symbols of the language. We might instead choose a smaller stock of primitive symbols and treat the other logical operators as defined. "Truth functionally complete" sets of Boolean operators include {¬, ∨}, {¬, ∧}, and {¬, →}; these can be combined with either quantifier for an expressively complete first-order language.

You may be familiar with two other logical operators: the Sheffer stroke | (named after Henry Sheffer), and Peirce's arrow ↓, also known as Quine's dagger. When given their usual readings of "nand" and "nor" (respectively), these operators are truth functionally complete by themselves.

B.2 Terms and Formulas

Once a first-order language L is given, we can define expressions built up from the basic vocabulary of L. These include in particular terms and formulas.
Definition B.4 (Terms). The set of terms Trm(L) of L is defined inductively by:

1. Every variable is a term.
2. Every constant symbol of L is a term.
3. If f is an n-place function symbol and t1, . . . , tn are terms, then f(t1, . . . , tn) is a term.
4. Nothing else is a term.

A term containing no variables is a closed term.

The constant symbols appear in our specification of the language and the terms as a separate category of symbols, but they could instead have been included as zero-place function symbols. We could then do without the second clause in the definition of terms. We just have to understand f(t1, . . . , tn) as just f by itself if n = 0.

Definition B.5 (Formula). The set of formulas Frm(L) of the language L is defined inductively as follows:

1. ⊥ is an atomic formula.
2. If R is an n-place predicate symbol of L and t1, . . . , tn are terms of L, then R(t1, . . . , tn) is an atomic formula.
3. If t1 and t2 are terms of L, then =(t1, t2) is an atomic formula.
4. If A is a formula, then ¬A is a formula.
5. If A and B are formulas, then (A ∧ B) is a formula.
6. If A and B are formulas, then (A ∨ B) is a formula.
7. If A and B are formulas, then (A → B) is a formula.
8. If A is a formula and x is a variable, then ∀x A is a formula.
9. If A is a formula and x is a variable, then ∃x A is a formula.
10. Nothing else is a formula.

The definitions of the set of terms and that of formulas are inductive definitions. Essentially, we construct the set of formulas in infinitely many stages. In the initial stage, we pronounce all atomic formulas to be formulas; this corresponds to the first few cases of the definition, i.e., the cases for ⊥, R(t1, . . . , tn) and =(t1, t2). "Atomic formula" thus means any formula of this form.

The other cases of the definition give rules for constructing new formulas out of formulas already constructed.
At the second stage, we can use them to construct formulas out of atomic formulas. At the third stage, we construct new formulas from the atomic formulas and those obtained in the second stage, and so on. A formula is anything that is eventually constructed at such a stage, and nothing else.

By convention, we write = between its arguments and leave out the parentheses: t1 = t2 is an abbreviation for =(t1, t2). Moreover, ¬=(t1, t2) is abbreviated as t1 ≠ t2. When writing a formula (B ∗ C) constructed from B, C using a two-place connective ∗, we will often leave out the outermost pair of parentheses and write simply B ∗ C.

Definition B.6. Formulas constructed using the defined operators are to be understood as follows:

1. ⊤ abbreviates ¬⊥.
2. A ↔ B abbreviates (A → B) ∧ (B → A).

If we work in a language for a specific application, we will often write two-place predicate symbols and function symbols between the respective terms, e.g., t1 < t2 and (t1 + t2) in the language of arithmetic and t1 ∈ t2 in the language of set theory. The successor function in the language of arithmetic is even written conventionally after its argument: t′. Officially, however, these are just conventional abbreviations for A^2_0(t1, t2), f^2_0(t1, t2), A^2_0(t1, t2) and f^1_0(t), respectively.

Definition B.7 (Syntactic identity). The symbol ≡ expresses syntactic identity between strings of symbols, i.e., A ≡ B iff A and B are strings of symbols of the same length and which contain the same symbol in each place.

The ≡ symbol may be flanked by strings obtained by concatenation, e.g., A ≡ (B ∨ C) means: the string of symbols A is the same string as the one obtained by concatenating an opening parenthesis, the string B, the ∨ symbol, the string C, and a closing parenthesis, in this order.
If this is the case, then we know that the first symbol of A is an opening parenthesis, A contains B as a substring (starting at the second symbol), that substring is followed by ∨, etc.

B.3 Free Variables and Sentences

Definition B.8 (Free occurrences of a variable). The free occurrences of a variable in a formula are defined inductively as follows:

1. A is atomic: all variable occurrences in A are free.
2. A ≡ ¬B: the free variable occurrences of A are exactly those of B.
3. A ≡ (B ∗ C): the free variable occurrences of A are those in B together with those in C.
4. A ≡ ∀x B: the free variable occurrences in A are all of those in B except for occurrences of x.
5. A ≡ ∃x B: the free variable occurrences in A are all of those in B except for occurrences of x.

Definition B.9 (Bound Variables). An occurrence of a variable in a formula A is bound if it is not free.

Definition B.10 (Scope). If ∀x B is an occurrence of a subformula in a formula A, then the corresponding occurrence of B in A is called the scope of the corresponding occurrence of ∀x. Similarly for ∃x.

If B is the scope of a quantifier occurrence ∀x or ∃x in A, then the free occurrences of x in B are bound in ∀x B and ∃x B. We say that these occurrences are bound by the mentioned quantifier occurrence.

Example B.11. Consider the following formula:

$\exists v_0\, \underbrace{A^2_0(v_0, v_1)}_{B}$

B represents the scope of ∃v0. The quantifier binds the occurrence of v0 in B, but does not bind the occurrence of v1. So v1 is a free variable in this case.

We can now see how this might work in a more complicated formula A:

$\forall v_0\, \underbrace{(A^1_0(v_0) \to A^2_0(v_0, v_1))}_{B} \to \exists v_1\, \underbrace{(A^2_1(v_0, v_1) \lor \forall v_0\, \overbrace{\neg A^1_1(v_0)}^{D})}_{C}$

B is the scope of the first ∀v0, C is the scope of ∃v1, and D is the scope of the second ∀v0. The first ∀v0 binds the occurrences of v0 in B, ∃v1 the occurrence of v1 in C, and the second ∀v0 binds the occurrence of v0 in D.
The first occurrence of v1 and the fourth occurrence of v0 are free in A. The last occurrence of v0 is free in D, but bound in C and A.

Definition B.12 (Sentence). A formula A is a sentence iff it contains no free occurrences of variables.

B.4 Substitution

Definition B.13 (Substitution in a term). We define s[t/x], the result of substituting t for every occurrence of x in s, recursively:

1. s ≡ c: s[t/x] is just s.
2. s ≡ y: s[t/x] is also just s, provided y is a variable and y ≢ x.
3. s ≡ x: s[t/x] is t.
4. s ≡ f(t1, . . . , tn): s[t/x] is f(t1[t/x], . . . , tn[t/x]).

Definition B.14. A term t is free for x in A if none of the free occurrences of x in A occur in the scope of a quantifier that binds a variable in t.

Example B.15.

1. v8 is free for v1 in ∃v3 A^2_4(v3, v1)
2. f^2_1(v1, v2) is not free for v0 in ∀v2 A^2_4(v0, v2)

Definition B.16 (Substitution in a formula). If A is a formula, x is a variable, and t is a term free for x in A, then A[t/x] is the result of substituting t for all free occurrences of x in A.

1. A ≡ ⊥: A[t/x] is ⊥.
2. A ≡ P(t1, . . . , tn): A[t/x] is P(t1[t/x], . . . , tn[t/x]).
3. A ≡ t1 = t2: A[t/x] is t1[t/x] = t2[t/x].
4. A ≡ ¬B: A[t/x] is ¬B[t/x].
5. A ≡ (B ∧ C): A[t/x] is (B[t/x] ∧ C[t/x]).
6. A ≡ (B ∨ C): A[t/x] is (B[t/x] ∨ C[t/x]).
7. A ≡ (B → C): A[t/x] is (B[t/x] → C[t/x]).
8. A ≡ ∀y B: A[t/x] is ∀y B[t/x], provided y is a variable other than x; otherwise A[t/x] is just A.
9. A ≡ ∃y B: A[t/x] is ∃y B[t/x], provided y is a variable other than x; otherwise A[t/x] is just A.

Note that substitution may be vacuous: If x does not occur in A at all, then A[t/x] is just A.

The restriction that t must be free for x in A is necessary to exclude cases like the following. If A ≡ ∃y x < y and t ≡ y, then A[t/x] would be ∃y y < y. In this case the free variable y is "captured" by the quantifier ∃y upon substitution, and that is undesirable.
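The definitions of substitution and of "free for" translate fairly directly into a recursive program. The following Python sketch is one possible rendering, not part of the text: the tuple encoding of terms and formulas and the function names (`term_vars`, `free_vars`, `subst_term`, `free_for`, `subst`) are assumptions made here for illustration. It refuses to substitute when the term is not free for the variable, which is exactly the capture case just described.

```python
# Hypothetical encoding (an assumption of this sketch):
#   terms:    ("var", name) | ("const", name) | ("fn", name, (args...))
#   formulas: ("bot",) | ("atom", pred, (terms...)) | ("not", A)
#             | ("and" | "or" | "imp", A, B) | ("all" | "ex", var_name, A)
BINARY = ("and", "or", "imp")


def term_vars(t):
    """Set of variables occurring in the term t."""
    if t[0] == "var":
        return {t[1]}
    if t[0] == "const":
        return set()
    return set().union(*[term_vars(u) for u in t[2]])


def free_vars(A):
    """Variables with at least one free occurrence in A (Definition B.8)."""
    tag = A[0]
    if tag == "bot":
        return set()
    if tag == "atom":
        return set().union(*[term_vars(u) for u in A[2]])
    if tag == "not":
        return free_vars(A[1])
    if tag in BINARY:
        return free_vars(A[1]) | free_vars(A[2])
    return free_vars(A[2]) - {A[1]}          # quantifier binds A[1]


def subst_term(s, t, x):
    """s[t/x] for terms (Definition B.13)."""
    if s[0] == "var":
        return t if s[1] == x else s
    if s[0] == "const":
        return s
    return ("fn", s[1], tuple(subst_term(u, t, x) for u in s[2]))


def free_for(t, x, A):
    """True iff t is free for x in A (Definition B.14)."""
    tag = A[0]
    if tag in ("bot", "atom"):
        return True
    if tag == "not":
        return free_for(t, x, A[1])
    if tag in BINARY:
        return free_for(t, x, A[1]) and free_for(t, x, A[2])
    y, B = A[1], A[2]
    if y == x or x not in free_vars(B):
        return True                          # no free occurrence of x in this scope
    return y not in term_vars(t) and free_for(t, x, B)


def subst(A, t, x):
    """A[t/x] (Definition B.16); raises ValueError if t is not free for x in A."""
    if not free_for(t, x, A):
        raise ValueError("term is not free for the variable in this formula")
    tag = A[0]
    if tag == "bot":
        return A
    if tag == "atom":
        return ("atom", A[1], tuple(subst_term(u, t, x) for u in A[2]))
    if tag == "not":
        return ("not", subst(A[1], t, x))
    if tag in BINARY:
        return (tag, subst(A[1], t, x), subst(A[2], t, x))
    y, B = A[1], A[2]
    return A if y == x else (tag, y, subst(B, t, x))


# The formula  ∃y x < y  from the text.
A = ("ex", "y", ("atom", "<", (("var", "x"), ("var", "y"))))
ok = subst(A, ("var", "z"), "x")             # ∃y z < y
captured = free_for(("var", "y"), "x", A)    # y would be captured
```

Checking `free_for` up front, rather than renaming bound variables, mirrors the text's choice to restrict substitution instead of defining a capture-avoiding variant.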
For instance, we would like it to be the case that whenever ∀x B holds, so does B[t/x]. But consider ∀x ∃y x < y (here B is ∃y x < y). It is a sentence that is true about, e.g., the natural numbers: for every number x there is a number y greater than it. If we allowed y as a possible substitution for x, we would end up with B[y/x] ≡ ∃y y < y, which is false. We prevent this by requiring that none of the free variables in t would end up being bound by a quantifier in A.

We often use the following convention to avoid cumbersome notation: If A is a formula with a free variable x, we write A(x) to indicate this. When it is clear which A and x we have in mind, and t is a term (assumed to be free for x in A(x)), then we write A(t) as short for A(x)[t/x].

B.5 Structures for First-order Languages

First-order languages are, by themselves, uninterpreted: the constant symbols, function symbols, and predicate symbols have no specific meaning attached to them. Meanings are given by specifying a structure. It specifies the domain, i.e., the objects which the constant symbols pick out, the function symbols operate on, and the quantifiers range over. In addition, it specifies which constant symbols pick out which objects, how a function symbol maps objects to objects, and which objects the predicate symbols apply to. Structures are the basis for semantic notions in logic, e.g., the notions of consequence, validity, and satisfiability. They are variously called "structures," "interpretations," or "models" in the literature.

Definition B.17 (Structures). A structure M for a language L of first-order logic consists of the following elements:

1. Domain: a non-empty set, |M|
2. Interpretation of constant symbols: for each constant symbol c of L, an element c^M ∈ |M|
3. Interpretation of predicate symbols: for each n-place predicate symbol R of L (other than =), an n-place relation R^M ⊆ |M|^n
4.
Interpretation of function symbols: for each n-place function symbol f of L, an n-place function f^M : |M|^n → |M|

Example B.18. A structure M for the language of arithmetic consists of a set, an element of |M|, 0^M, as interpretation of the constant symbol 0, a one-place function ′^M : |M| → |M|, two two-place functions +^M and ×^M, both |M|^2 → |M|, and a two-place relation <^M ⊆ |M|^2.

An obvious example of such a structure is the following:

1. |N| = N
2. 0^N = 0
3. ′^N(n) = n + 1 for all n ∈ N
4. +^N(n, m) = n + m for all n, m ∈ N
5. ×^N(n, m) = n · m for all n, m ∈ N
6. <^N = {⟨n, m⟩ : n ∈ N, m ∈ N, n < m}

The structure N for LA so defined is called the standard model of arithmetic, because it interprets the non-logical constants of LA exactly how you would expect.

However, there are many other possible structures for LA. For instance, we might take as the domain the set Z of integers instead of N, and define the interpretations of 0, ′, +, ×, < accordingly. But we can also define structures for LA which have nothing even remotely to do with numbers.

Example B.19. A structure M for the language LZ of set theory requires just a set and a single two-place relation. So technically, e.g., the set of people plus the relation "x is older than y" could be used as a structure for LZ, as well as N together with n ≥ m for n, m ∈ N.

A particularly interesting structure for LZ in which the elements of the domain are actually sets, and the interpretation of ∈ actually is the relation "x is an element of y" is the structure HF of hereditarily finite sets:

1. |HF| = ∅ ∪ ℘(∅) ∪ ℘(℘(∅)) ∪ ℘(℘(℘(∅))) ∪ . . . ;
2. ∈^HF = {⟨x, y⟩ : x, y ∈ |HF|, x ∈ y}.

The stipulations we make as to what counts as a structure impact our logic. For example, the choice to prevent empty domains ensures, given the usual account of satisfaction (or truth) for quantified sentences, that ∃x (A(x) ∨ ¬A(x)) is valid, that is, a logical truth.
And the stipulation that all constant symbols must refer to an object in the domain ensures that existential generalization is a sound pattern of inference: A(a), therefore ∃x A(x). If we allowed names to refer outside the domain, or to not refer, then we would be on our way to a free logic, in which existential generalization requires an additional premise: A(a) and ∃x x = a, therefore ∃x A(x).

B.6 Satisfaction of a Formula in a Structure

The basic notions that relate expressions such as terms and formulas, on the one hand, and structures on the other, are those of value of a term and satisfaction of a formula. Informally, the value of a term is an element of a structure: if the term is just a constant, its value is the object assigned to the constant by the structure, and if it is built up using function symbols, the value is computed from the values of constants and the functions assigned to the function symbols in the term. A formula is satisfied in a structure if the interpretation given to the predicates makes the formula true in the domain of the structure. This notion of satisfaction is specified inductively: the specification of the structure directly states when atomic formulas are satisfied, and we define when a complex formula is satisfied depending on the main connective or quantifier and whether or not the immediate subformulas are satisfied. The case of the quantifiers here is a bit tricky, as the immediate subformula of a quantified formula has a free variable, and structures don't specify the values of variables. In order to deal with this difficulty, we also introduce variable assignments and define satisfaction not with respect to a structure alone, but with respect to a structure plus a variable assignment.

Definition B.20 (Variable Assignment).
A variable assignment s for a structure M is a function which maps each variable to an element of |M|, i.e., s : Var → |M|.

A structure assigns a value to each constant symbol, and a variable assignment to each variable. But we want to use terms built up from them to also name elements of the domain. For this we define the value of terms inductively. For constant symbols and variables the value is just as the structure or the variable assignment specifies it; for more complex terms it is computed recursively using the functions the structure assigns to the function symbols.

Definition B.21 (Value of Terms). If t is a term of the language L, M is a structure for L, and s is a variable assignment for M, the value Val^M_s(t) is defined as follows:

1. t ≡ c: Val^M_s(t) = c^M.
2. t ≡ x: Val^M_s(t) = s(x).
3. t ≡ f(t1, . . . , tn): Val^M_s(t) = f^M(Val^M_s(t1), . . . , Val^M_s(tn)).

Definition B.22 (x-Variant). If s is a variable assignment for a structure M, then any variable assignment s′ for M which differs from s at most in what it assigns to x is called an x-variant of s. If s′ is an x-variant of s we write s ∼x s′.

Note that an x-variant of an assignment s does not have to assign something different to x. In fact, every assignment counts as an x-variant of itself.

Definition B.23 (Satisfaction). Satisfaction of a formula A in a structure M relative to a variable assignment s, in symbols: M, s ⊨ A, is defined recursively as follows. (We write M, s ⊭ A to mean "not M, s ⊨ A.")

1. A ≡ ⊥: M, s ⊭ A.
2. A ≡ R(t1, . . . , tn): M, s ⊨ A iff ⟨Val^M_s(t1), . . . , Val^M_s(tn)⟩ ∈ R^M.
3. A ≡ t1 = t2: M, s ⊨ A iff Val^M_s(t1) = Val^M_s(t2).
4. A ≡ ¬B: M, s ⊨ A iff M, s ⊭ B.
5. A ≡ (B ∧ C): M, s ⊨ A iff M, s ⊨ B and M, s ⊨ C.
6. A ≡ (B ∨ C): M, s ⊨ A iff M, s ⊨ B or M, s ⊨ C (or both).
7. A ≡ (B → C): M, s ⊨ A iff M, s ⊭ B or M, s ⊨ C (or both).
8. A ≡ ∀x B: M, s ⊨ A iff for every x-variant s′ of s, M, s′ ⊨ B.
9.
A ≡ ∃x B: M, s ⊨ A iff there is an x-variant s′ of s so that M, s′ ⊨ B.

The variable assignments are important in the last two clauses. We cannot define satisfaction of ∀x B(x) by "for all a ∈ |M|, M ⊨ B(a)." We cannot define satisfaction of ∃x B(x) by "for at least one a ∈ |M|, M ⊨ B(a)." The reason is that a is not a symbol of the language, and so B(a) is not a formula (that is, B[a/x] is undefined). We also cannot assume that we have constant symbols or terms available that name every element of M, since there is nothing in the definition of structures that requires it. Even in the standard language the set of constant symbols is countably infinite, so if |M| is not countable there aren't even enough constant symbols to name every object.

Example B.24. Let L = {a, b, f, R} where a and b are constant symbols, f is a two-place function symbol, and R is a two-place predicate symbol. Consider the structure M defined by:

1. |M| = {1, 2, 3, 4}
2. a^M = 1
3. b^M = 2
4. f^M(x, y) = x + y if x + y ≤ 3 and = 3 otherwise.
5. R^M = {⟨1, 1⟩, ⟨1, 2⟩, ⟨2, 3⟩, ⟨2, 4⟩}

The function s(x) = 1 that assigns 1 ∈ |M| to every variable is a variable assignment for M.

Then

Val^M_s(f(a, b)) = f^M(Val^M_s(a), Val^M_s(b)).

Since a and b are constant symbols, Val^M_s(a) = a^M = 1 and Val^M_s(b) = b^M = 2. So

Val^M_s(f(a, b)) = f^M(1, 2) = 1 + 2 = 3.

To compute the value of f(f(a, b), a) we have to consider

Val^M_s(f(f(a, b), a)) = f^M(Val^M_s(f(a, b)), Val^M_s(a)) = f^M(3, 1) = 3,

since 3 + 1 > 3. Since s(x) = 1 and Val^M_s(x) = s(x), we also have

Val^M_s(f(f(a, b), x)) = f^M(Val^M_s(f(a, b)), Val^M_s(x)) = f^M(3, 1) = 3.

An atomic formula R(t1, t2) is satisfied if the tuple of values of its arguments, i.e., ⟨Val^M_s(t1), Val^M_s(t2)⟩, is an element of R^M. So, e.g., we have M, s ⊨ R(b, f(a, b)) since ⟨Val^M_s(b), Val^M_s(f(a, b))⟩ = ⟨2, 3⟩ ∈ R^M, but M, s ⊭ R(x, f(a, b)) since ⟨1, 3⟩ ∉ R^M.
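The computations in Example B.24 can be replayed mechanically. The sketch below encodes the structure M and the clauses of Definition B.21 in Python; the dictionary layout, the tuple encoding of terms, and the names `value` and `sat_atom` are assumptions made for this illustration and do not come from the text.

```python
from collections import defaultdict

# The structure M of Example B.24 (encoding chosen for this sketch).
DOMAIN = {1, 2, 3, 4}
CONSTS = {"a": 1, "b": 2}
FUNCS = {"f": lambda x, y: x + y if x + y <= 3 else 3}
RELS = {"R": {(1, 1), (1, 2), (2, 3), (2, 4)}}

# The variable assignment s with s(x) = 1 for every variable x.
s = defaultdict(lambda: 1)


def value(term, assignment):
    """Value of a term in M under the assignment, by the three clauses of Definition B.21."""
    kind = term[0]
    if kind == "const":                      # t is a constant symbol c
        return CONSTS[term[1]]
    if kind == "var":                        # t is a variable x
        return assignment[term[1]]
    # t is f(t1, ..., tn), encoded as ("fn", "f", t1, ..., tn)
    args = [value(u, assignment) for u in term[2:]]
    return FUNCS[term[1]](*args)


def sat_atom(pred, terms, assignment):
    """Satisfaction of the atomic formula pred(terms): is the tuple of values in the relation?"""
    return tuple(value(u, assignment) for u in terms) in RELS[pred]


a, b, x = ("const", "a"), ("const", "b"), ("var", "x")
fab = ("fn", "f", a, b)                      # f(a, b)

v1 = value(fab, s)                           # f^M(1, 2)
v2 = value(("fn", "f", fab, a), s)           # f^M(3, 1)
v3 = value(("fn", "f", fab, x), s)           # f^M(3, s(x))
holds = sat_atom("R", [b, fab], s)           # is <2, 3> in R^M?
fails = sat_atom("R", [x, fab], s)           # is <1, 3> in R^M?
```

Representing s as a `defaultdict` is a convenience for the constant assignment of the example; any mapping from variable names to elements of the domain would do.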
To determine if a non-atomic formula A is satisfied, you apply the clause in the inductive definition that applies to the main connective. For instance, the main connective in R(a, a) → (R(b, x) ∨ R(x, b)) is the →, and

M, s ⊨ R(a, a) → (R(b, x) ∨ R(x, b)) iff M, s ⊭ R(a, a) or M, s ⊨ R(b, x) ∨ R(x, b)

Since M, s ⊨ R(a, a) (because ⟨1, 1⟩ ∈ R^M) we can't yet determine the answer and must first figure out if M, s ⊨ R(b, x) ∨ R(x, b):

M, s ⊨ R(b, x) ∨ R(x, b) iff M, s ⊨ R(b, x) or M, s ⊨ R(x, b)

And this is the case, since M, s ⊨ R(x, b) (because ⟨1, 2⟩ ∈ R^M).

Recall that an x-variant of s is a variable assignment that differs from s at most in what it assigns to x. For every element of |M|, there is an x-variant of s: s1(x) = 1, s2(x) = 2, s3(x) = 3, s4(x) = 4, and with si(y) = s(y) = 1 for all variables y other than x. These are all the x-variants of s for the structure M, since |M| = {1, 2, 3, 4}. Note, in particular, that s1 = s is also an x-variant of s, i.e., s is always an x-variant of itself.

To determine if an existentially quantified formula ∃x A(x) is satisfied, we have to determine if M, s′ ⊨ A(x) for at least one x-variant s′ of s. So,

M, s ⊨ ∃x (R(b, x) ∨ R(x, b)),

since M, s1 ⊨ R(b, x) ∨ R(x, b) (s3 would also fit the bill). But,

M, s ⊭ ∃x (R(b, x) ∧ R(x, b))

since for none of the si, M, si ⊨ R(b, x) ∧ R(x, b).

To determine if a universally quantified formula ∀x A(x) is satisfied, we have to determine if M, s′ ⊨ A(x) for all x-variants s′ of s. So,

M, s ⊨ ∀x (R(x, a) → R(a, x)),

since M, si ⊨ R(x, a) → R(a, x) for all si (M, s1 ⊨ R(a, x) and M, sj ⊭ R(x, a) for j = 2, 3, and 4). But,

M, s ⊭ ∀x (R(a, x) → R(x, a))

since M, s2 ⊭ R(a, x) → R(x, a) (because M, s2 ⊨ R(a, x) and M, s2 ⊭ R(x, a)).

For a more complicated case, consider

∀x (R(a, x) → ∃y R(x, y)).

Since M, s3 ⊭ R(a, x) and M, s4 ⊭ R(a, x), the interesting cases where we have to worry about the consequent of the conditional are only s1 and s2.
Does M, s_1 ⊨ ∃y R(x,y) hold? It does if there is at least one y-variant s′_1 of s_1 so that M, s′_1 ⊨ R(x,y). In fact, s_1 is such a y-variant (s_1(x) = 1, s_1(y) = 1, and ⟨1,1⟩ ∈ R^M), so the answer is yes. To determine if M, s_2 ⊨ ∃y R(x,y) we have to look at the y-variants of s_2. Here, s_2 itself does not satisfy R(x,y) (s_2(x) = 2, s_2(y) = 1, and ⟨2,1⟩ ∉ R^M). However, consider s′_2 ∼_y s_2 with s′_2(y) = 3. M, s′_2 ⊨ R(x,y) since ⟨2,3⟩ ∈ R^M, and so M, s_2 ⊨ ∃y R(x,y). In sum, for every x-variant s_i of s, either M, s_i ⊭ R(a,x) (i = 3, 4) or M, s_i ⊨ ∃y R(x,y) (i = 1, 2), and so

M, s ⊨ ∀x (R(a,x) → ∃y R(x,y)).

On the other hand,

M, s ⊭ ∃x (R(a,x) ∧ ∀y R(x,y)).

The only x-variants s_i of s with M, s_i ⊨ R(a,x) are s_1 and s_2. But for each, there is in turn a y-variant s′_i ∼_y s_i so that M, s′_i ⊭ R(x,y): take s′_1(y) = 4, since ⟨1,4⟩ ∉ R^M, and s′_2(y) = 1, since ⟨2,1⟩ ∉ R^M. So M, s_i ⊭ ∀y R(x,y) for i = 1, 2. In sum, none of the x-variants s_i ∼_x s are such that M, s_i ⊨ R(a,x) ∧ ∀y R(x,y).

B.7 Variable Assignments

A variable assignment s provides a value for every variable, and there are infinitely many of them. This is of course not necessary. We require variable assignments to assign values to all variables simply because it makes things a lot easier. The value of a term t, and whether or not a formula A is satisfied in a structure with respect to s, only depend on the assignments s makes to the variables in t and the free variables of A. This is the content of the next two propositions. To make the idea of "depends on" precise, we show that any two variable assignments that agree on all the variables in t give the same value, and that if two variable assignments agree on all free variables of A, then A is satisfied relative to one iff it is satisfied relative to the other.

Proposition B.25. If the variables in a term t are among x_1, ..., x_n, and s_1(x_i) = s_2(x_i) for i = 1, ..., n, then Val^M_{s_1}(t) = Val^M_{s_2}(t).

Proof. By induction on the complexity of t.
For the base case, t can be a constant symbol or one of the variables x_1, ..., x_n. If t = c, then Val^M_{s_1}(t) = c^M = Val^M_{s_2}(t). If t = x_i, s_1(x_i) = s_2(x_i) by the hypothesis of the proposition, and so Val^M_{s_1}(t) = s_1(x_i) = s_2(x_i) = Val^M_{s_2}(t).

For the inductive step, assume that t = f(t_1, ..., t_k) and that the claim holds for t_1, ..., t_k. Then

Val^M_{s_1}(t) = Val^M_{s_1}(f(t_1, ..., t_k))
= f^M(Val^M_{s_1}(t_1), ..., Val^M_{s_1}(t_k)).

For j = 1, ..., k, the variables of t_j are among x_1, ..., x_n. So by induction hypothesis, Val^M_{s_1}(t_j) = Val^M_{s_2}(t_j). So,

Val^M_{s_1}(t) = Val^M_{s_1}(f(t_1, ..., t_k))
= f^M(Val^M_{s_1}(t_1), ..., Val^M_{s_1}(t_k))
= f^M(Val^M_{s_2}(t_1), ..., Val^M_{s_2}(t_k))
= Val^M_{s_2}(f(t_1, ..., t_k)) = Val^M_{s_2}(t). □

Proposition B.26. If the free variables in A are among x_1, ..., x_n, and s_1(x_i) = s_2(x_i) for i = 1, ..., n, then M, s_1 ⊨ A iff M, s_2 ⊨ A.

Proof. We use induction on the complexity of A. For the base case, where A is atomic, A can be: ⊥, R(t_1, ..., t_k) for a k-place predicate R and terms t_1, ..., t_k, or t_1 = t_2 for terms t_1 and t_2.

1. A ≡ ⊥: both M, s_1 ⊭ A and M, s_2 ⊭ A.

2. A ≡ R(t_1, ..., t_k): let M, s_1 ⊨ A. Then

⟨Val^M_{s_1}(t_1), ..., Val^M_{s_1}(t_k)⟩ ∈ R^M.

For i = 1, ..., k, Val^M_{s_1}(t_i) = Val^M_{s_2}(t_i) by Proposition B.25. So we also have ⟨Val^M_{s_2}(t_1), ..., Val^M_{s_2}(t_k)⟩ ∈ R^M, and hence M, s_2 ⊨ A.

3. A ≡ t_1 = t_2: suppose M, s_1 ⊨ A. Then Val^M_{s_1}(t_1) = Val^M_{s_1}(t_2). So,

Val^M_{s_2}(t_1) = Val^M_{s_1}(t_1) (by Proposition B.25)
= Val^M_{s_1}(t_2) (since M, s_1 ⊨ t_1 = t_2)
= Val^M_{s_2}(t_2) (by Proposition B.25),

so M, s_2 ⊨ t_1 = t_2.

Now assume M, s_1 ⊨ B iff M, s_2 ⊨ B for all formulas B less complex than A. The induction step proceeds by cases determined by the main operator of A. In each case, we only demonstrate the forward direction of the biconditional; the proof of the reverse direction is symmetrical.
In all cases except those for the quantifiers, we apply the induction hypothesis to sub-formulas B of A. The free variables of B are among those of A. Thus, if s_1 and s_2 agree on the free variables of A, they also agree on those of B, and the induction hypothesis applies to B.

1. A ≡ ¬B: if M, s_1 ⊨ A, then M, s_1 ⊭ B, so by the induction hypothesis, M, s_2 ⊭ B, hence M, s_2 ⊨ A.

2. A ≡ B ∧ C: exercise.

3. A ≡ B ∨ C: if M, s_1 ⊨ A, then M, s_1 ⊨ B or M, s_1 ⊨ C. By induction hypothesis, M, s_2 ⊨ B or M, s_2 ⊨ C, so M, s_2 ⊨ A.

4. A ≡ B → C: exercise.

5. A ≡ ∃x B: if M, s_1 ⊨ A, there is an x-variant s′_1 of s_1 so that M, s′_1 ⊨ B. Let s′_2 be the x-variant of s_2 that assigns the same thing to x as does s′_1. The free variables of B are among x_1, ..., x_n, and x. We have s′_1(x_i) = s′_2(x_i), since s′_1 and s′_2 are x-variants of s_1 and s_2, respectively, and by hypothesis s_1(x_i) = s_2(x_i). Also, s′_1(x) = s′_2(x) by the way we have defined s′_2. Then the induction hypothesis applies to B and s′_1, s′_2, so M, s′_2 ⊨ B. Hence, there is an x-variant of s_2 that satisfies B, and so M, s_2 ⊨ A.

6. A ≡ ∀x B: exercise.

By induction, we get that M, s_1 ⊨ A iff M, s_2 ⊨ A whenever the free variables in A are among x_1, ..., x_n and s_1(x_i) = s_2(x_i) for i = 1, ..., n. □

Sentences have no free variables, so any two variable assignments assign the same things to all the (zero) free variables of any sentence. The proposition just proved then means that whether or not a sentence is satisfied in a structure relative to a variable assignment is completely independent of the assignment. We'll record this fact. It justifies the definition of satisfaction of a sentence in a structure (without mentioning a variable assignment) that follows.

Corollary B.27. If A is a sentence and s a variable assignment, then M, s ⊨ A iff M, s′ ⊨ A for every variable assignment s′.

Proof. Let s′ be any variable assignment.
Since A is a sentence, it has no free variables, and so every variable assignment s′ trivially assigns the same things to all free variables of A as does s. So the condition of Proposition B.26 is satisfied, and we have M, s ⊨ A iff M, s′ ⊨ A. □

Definition B.28. If A is a sentence, we say that a structure M satisfies A, M ⊨ A, iff M, s ⊨ A for all variable assignments s.

If M ⊨ A, we also simply say that A is true in M.

Proposition B.29. Let M be a structure, A be a sentence, and s a variable assignment. M ⊨ A iff M, s ⊨ A.

Proof. Exercise. □

Proposition B.30. Suppose A(x) only contains x free, and M is a structure. Then:

1. M ⊨ ∃x A(x) iff M, s ⊨ A(x) for at least one variable assignment s.
2. M ⊨ ∀x A(x) iff M, s ⊨ A(x) for all variable assignments s.

Proof. Exercise. □

B.8 Extensionality

Extensionality, sometimes called relevance, can be expressed informally as follows: the only factors that bear upon the satisfaction of a formula A in a structure M relative to a variable assignment s are the size of the domain and the assignments made by M and s to the elements of the language that actually appear in A.

One immediate consequence of extensionality is that where two structures M and M′ agree on all the elements of the language appearing in a sentence A and have the same domain, M and M′ must also agree on whether or not A itself is true.

Proposition B.31 (Extensionality). Let A be a formula, and M_1 and M_2 be structures with |M_1| = |M_2|, and s a variable assignment on |M_1| = |M_2|. If c^{M_1} = c^{M_2}, R^{M_1} = R^{M_2}, and f^{M_1} = f^{M_2} for every constant symbol c, relation symbol R, and function symbol f occurring in A, then M_1, s ⊨ A iff M_2, s ⊨ A.

Proof. First prove (by induction on t) that for every term t, Val^{M_1}_s(t) = Val^{M_2}_s(t). Then prove the proposition by induction on A, making use of the claim just proved for the induction basis (where A is atomic).
□

Corollary B.32 (Extensionality for Sentences). Let A be a sentence and M_1, M_2 as in Proposition B.31. Then M_1 ⊨ A iff M_2 ⊨ A.

Proof. Follows from Proposition B.31 by Corollary B.27. □

Moreover, the value of a term, and whether or not a structure satisfies a formula, only depend on the values of its subterms.

Proposition B.33. Let M be a structure, t and t′ terms, and s a variable assignment. Let s′ ∼_x s be the x-variant of s given by s′(x) = Val^M_s(t′). Then Val^M_s(t[t′/x]) = Val^M_{s′}(t).

Proof. By induction on t.

1. If t is a constant, say, t ≡ c, then t[t′/x] = c, and Val^M_s(c) = c^M = Val^M_{s′}(c).

2. If t is a variable other than x, say, t ≡ y, then t[t′/x] = y, and Val^M_s(y) = Val^M_{s′}(y) since s′ ∼_x s.

3. If t ≡ x, then t[t′/x] = t′. But Val^M_{s′}(x) = Val^M_s(t′) by definition of s′.

4. If t ≡ f(t_1, ..., t_n) then we have:

Val^M_s(t[t′/x])
= Val^M_s(f(t_1[t′/x], ..., t_n[t′/x])) by definition of t[t′/x]
= f^M(Val^M_s(t_1[t′/x]), ..., Val^M_s(t_n[t′/x])) by definition of Val^M_s(f(...))
= f^M(Val^M_{s′}(t_1), ..., Val^M_{s′}(t_n)) by induction hypothesis
= Val^M_{s′}(t) by definition of Val^M_{s′}(f(...)) □

Proposition B.34. Let M be a structure, A a formula, t a term, and s a variable assignment. Let s′ ∼_x s be the x-variant of s given by s′(x) = Val^M_s(t). Then M, s ⊨ A[t/x] iff M, s′ ⊨ A.

Proof. Exercise. □

B.9 Semantic Notions

Given the definition of structures for first-order languages, we can define some basic semantic properties of and relationships between sentences. The simplest of these is the notion of validity of a sentence. A sentence is valid if it is satisfied in every structure. Valid sentences are those that are satisfied regardless of how the non-logical symbols in them are interpreted.
Valid sentences are therefore also called logical truths: they are true, i.e., satisfied, in any structure and hence their truth depends only on the logical symbols occurring in them and their syntactic structure, but not on the non-logical symbols or their interpretation.

Definition B.35 (Validity). A sentence A is valid, ⊨ A, iff M ⊨ A for every structure M.

Definition B.36 (Entailment). A set of sentences Γ entails a sentence A, Γ ⊨ A, iff for every structure M with M ⊨ Γ, M ⊨ A.

Definition B.37 (Satisfiability). A set of sentences Γ is satisfiable if M ⊨ Γ for some structure M. If Γ is not satisfiable it is called unsatisfiable.

Proposition B.38. A sentence A is valid iff Γ ⊨ A for every set of sentences Γ.

Proof. For the forward direction, let A be valid, and let Γ be a set of sentences. Let M be a structure so that M ⊨ Γ. Since A is valid, M ⊨ A, hence Γ ⊨ A.

For the contrapositive of the reverse direction, let A be invalid, so there is a structure M with M ⊭ A. When Γ = {⊤}, since ⊤ is valid, M ⊨ Γ. Hence, there is a structure M so that M ⊨ Γ but M ⊭ A, hence Γ does not entail A. □

Proposition B.39. Γ ⊨ A iff Γ ∪ {¬A} is unsatisfiable.

Proof. For the forward direction, suppose Γ ⊨ A and suppose to the contrary that there is a structure M so that M ⊨ Γ ∪ {¬A}. Since M ⊨ Γ and Γ ⊨ A, M ⊨ A. Also, since M ⊨ Γ ∪ {¬A}, M ⊨ ¬A, so we have both M ⊨ A and M ⊭ A, a contradiction. Hence, there can be no such structure M, so Γ ∪ {¬A} is unsatisfiable.

For the reverse direction, suppose Γ ∪ {¬A} is unsatisfiable. So for every structure M, either M ⊭ Γ or M ⊨ A. Hence, for every structure M with M ⊨ Γ, M ⊨ A, so Γ ⊨ A. □

Proposition B.40. If Γ ⊆ Γ′ and Γ ⊨ A, then Γ′ ⊨ A.

Proof. Suppose that Γ ⊆ Γ′ and Γ ⊨ A. Let M be such that M ⊨ Γ′; then M ⊨ Γ, and since Γ ⊨ A, we get that M ⊨ A. Hence, whenever M ⊨ Γ′, M ⊨ A, so Γ′ ⊨ A. □

Theorem B.41 (Semantic Deduction Theorem). Γ ∪ {A} ⊨ B iff Γ ⊨ A → B.

Proof.
For the forward direction, let Γ ∪ {A} ⊨ B and let M be a structure so that M ⊨ Γ. If M ⊨ A, then M ⊨ Γ ∪ {A}, so since Γ ∪ {A} entails B, we get M ⊨ B. Therefore, M ⊨ A → B, so Γ ⊨ A → B.

For the reverse direction, let Γ ⊨ A → B and M be a structure so that M ⊨ Γ ∪ {A}. Then M ⊨ Γ, so M ⊨ A → B, and since M ⊨ A, M ⊨ B. Hence, whenever M ⊨ Γ ∪ {A}, M ⊨ B, so Γ ∪ {A} ⊨ B. □

Proposition B.42. Let M be a structure, and A(x) a formula with one free variable x, and t a closed term. Then:

1. A(t) ⊨ ∃x A(x)
2. ∀x A(x) ⊨ A(t)

Proof. 1. Suppose M ⊨ A(t). Let s be a variable assignment with s(x) = Val^M(t). Then M, s ⊨ A(t) since A(t) is a sentence. By Proposition B.34, M, s ⊨ A(x). By Proposition B.30, M ⊨ ∃x A(x).

2. Exercise. □

B.10 Theories

Definition B.43. A set of sentences Γ is closed iff, whenever Γ ⊨ A then A ∈ Γ. The closure of a set of sentences Γ is {A : Γ ⊨ A}.

We say that Γ is axiomatized by a set of sentences ∆ if Γ is the closure of ∆.

Example B.44. The theory of strict linear orders in the language L_< is axiomatized by the set

∀x ¬x < x,
∀x ∀y ((x < y ∨ y < x) ∨ x = y),
∀x ∀y ∀z ((x < y ∧ y < z) → x < z)

It completely captures the intended structures: every strict linear order is a model of this axiom system, and vice versa, if R is a strict linear order on a set X, then the structure M with |M| = X and <^M = R is a model of this theory.

Example B.45. The theory of groups in the language 1 (constant symbol), * (two-place function symbol) is axiomatized by

∀x (x * 1) = x
∀x ∀y ∀z (x * (y * z)) = ((x * y) * z)
∀x ∃y (x * y) = 1

Example B.46. The theory of Peano arithmetic is axiomatized by the following sentences in the language of arithmetic L_A.

¬∃x x′ = 0
∀x ∀y (x′ = y′ → x = y)
∀x ∀y (x < y ↔ ∃z (z′ + x) = y)
∀x (x + 0) = x
∀x ∀y (x + y′) = (x + y)′
∀x (x × 0) = 0
∀x ∀y (x × y′) = ((x × y) + x)

plus all sentences of the form

(A(0) ∧ ∀x (A(x) → A(x′))) → ∀x A(x)

Since there are infinitely many sentences of the latter form, this axiom system is infinite. The latter form is called the induction schema. (Actually, the induction schema is a bit more complicated than we let on here.)

The third axiom is an explicit definition of <.

Summary

A first-order language consists of constant, function, and predicate symbols. Function and predicate symbols take a specified number of arguments. In the language of arithmetic, e.g., we have a single constant symbol 0, one 1-place function symbol ′, two 2-place function symbols + and ×, and one 2-place predicate symbol <. From variables and constant and function symbols we form the terms of a language. From the terms of a language together with its predicate symbols, as well as the identity symbol =, we form the atomic formulas. And in turn from them, using the logical connectives ¬, ∨, ∧, →, ↔ and the quantifiers ∀ and ∃ we form its formulas. Since we are careful to always include necessary parentheses in the process of forming terms and formulas, there is always exactly one way of reading a formula. This makes it possible to define things by induction on the structure of formulas.

Occurrences of variables in formulas are sometimes governed by a corresponding quantifier: if a variable occurs in the scope of a quantifier it is considered bound, otherwise free. These concepts all have inductive definitions, and we also inductively define the operation of substitution of a term for a variable in a formula. Formulas without free variable occurrences are called sentences.

The semantics for a first-order language is given by a structure for that language. It consists of a domain, and an element of that domain is assigned to each constant symbol. Function symbols are interpreted by functions and relation symbols by relations on the domain.
A function from the set of variables to the domain is a variable assignment. The relation of satisfaction relates structures, variable assignments and formulas; M, s ⊨ A is defined by induction on the structure of A. M, s ⊨ A only depends on the interpretation of the symbols actually occurring in A, and in particular does not depend on s if A contains no free variables. So if A is a sentence, M ⊨ A if M, s ⊨ A for any (or all) s.

The satisfaction relation is the basis for all semantic notions. A sentence is valid, ⊨ A, if it is satisfied in every structure. A sentence A is entailed by a set of sentences Γ, Γ ⊨ A, iff M ⊨ A for all M which satisfy every sentence in Γ. A set Γ is satisfiable iff there is some structure that satisfies every sentence in Γ, otherwise unsatisfiable. These notions are interrelated, e.g., Γ ⊨ A iff Γ ∪ {¬A} is unsatisfiable.

Problems

Problem B.1. Give an inductive definition of the bound variable occurrences along the lines of Definition B.8.

Problem B.2. Let L = {c, f, A} with one constant symbol, one one-place function symbol and one two-place predicate symbol, and let the structure M be given by

1. |M| = {1, 2, 3}
2. c^M = 3
3. f^M(1) = 2, f^M(2) = 3, f^M(3) = 2
4. A^M = {⟨1,2⟩, ⟨2,3⟩, ⟨3,3⟩}

(a) Let s(v) = 1 for all variables v. Find out whether

M, s ⊨ ∃x (A(f(z), c) → ∀y (A(y,x) ∨ A(f(y),x)))

Explain why or why not.

(b) Give a different structure and variable assignment in which the formula is not satisfied.

Problem B.3. Complete the proof of Proposition B.26.

Problem B.4. Prove Proposition B.29.

Problem B.5. Prove Proposition B.30.

Problem B.6. Suppose L is a language without function symbols. Given a structure M, c a constant symbol and a ∈ |M|, define M[a/c] to be the structure that is just like M, except that c^{M[a/c]} = a. Define M ||= A for sentences A by:

1. A ≡ ⊥: not M ||= A.
2. A ≡ R(d_1, ..., d_n): M ||= A iff ⟨d_1^M, ..., d_n^M⟩ ∈ R^M.
3. A ≡ d_1 = d_2: M ||= A iff d_1^M = d_2^M.
4.
A ≡ ¬B: M ||= A iff not M ||= B.
5. A ≡ (B ∧ C): M ||= A iff M ||= B and M ||= C.
6. A ≡ (B ∨ C): M ||= A iff M ||= B or M ||= C (or both).
7. A ≡ (B → C): M ||= A iff not M ||= B or M ||= C (or both).
8. A ≡ ∀x B: M ||= A iff for all a ∈ |M|, M[a/c] ||= B[c/x], if c does not occur in B.
9. A ≡ ∃x B: M ||= A iff there is an a ∈ |M| such that M[a/c] ||= B[c/x], if c does not occur in B.

Let x_1, ..., x_n be all free variables in A, c_1, ..., c_n constant symbols not in A, a_1, ..., a_n ∈ |M|, and s(x_i) = a_i.

Show that M, s ⊨ A iff M[a_1/c_1, ..., a_n/c_n] ||= A[c_1/x_1] ... [c_n/x_n].

(This problem shows that it is possible to give a semantics for first-order logic that makes do without variable assignments.)

Problem B.7. Suppose that f is a function symbol not in A(x,y). Show that there is a structure M such that M ⊨ ∀x ∃y A(x,y) iff there is an M′ such that M′ ⊨ ∀x A(x, f(x)).

(This problem is a special case of what's known as Skolem's Theorem; ∀x A(x, f(x)) is called a Skolem normal form of ∀x ∃y A(x,y).)

Problem B.8. Carry out the proof of Proposition B.31 in detail.

Problem B.9. Prove Proposition B.34.

Problem B.10.
1. Show that Γ ⊨ ⊥ iff Γ is unsatisfiable.
2. Show that Γ ∪ {A} ⊨ ⊥ iff Γ ⊨ ¬A.
3. Suppose c does not occur in A or Γ. Show that Γ ⊨ ∀x A iff Γ ⊨ A[c/x].

Problem B.11. Complete the proof of Proposition B.42.

APPENDIX C
Natural Deduction

C.1 Natural Deduction

Natural deduction is a derivation system intended to mirror actual reasoning (especially the kind of regimented reasoning employed by mathematicians). Actual reasoning proceeds by a number of "natural" patterns. For instance, proof by cases allows us to establish a conclusion on the basis of a disjunctive premise, by establishing that the conclusion follows from either of the disjuncts. Indirect proof allows us to establish a conclusion by showing that its negation leads to a contradiction.
Conditional proof establishes a conditional claim "if . . . then . . . " by showing that the consequent follows from the antecedent. Natural deduction is a formalization of some of these natural inferences. Each of the logical connectives and quantifiers comes with two rules, an introduction and an elimination rule, and they each correspond to one such natural inference pattern. For instance, →Intro corresponds to conditional proof, and ∨Elim to proof by cases. A particularly simple rule is ∧Elim which allows the inference from A ∧ B to A (or B).

One feature that distinguishes natural deduction from other derivation systems is its use of assumptions. A derivation in natural deduction is a tree of formulas. A single formula stands at the root of the tree of formulas, and the "leaves" of the tree are formulas from which the conclusion is derived. In natural deduction, some leaf formulas play a role inside the derivation but are "used up" by the time the derivation reaches the conclusion. This corresponds to the practice, in actual reasoning, of introducing hypotheses which only remain in effect for a short while. For instance, in a proof by cases, we assume the truth of each of the disjuncts; in conditional proof, we assume the truth of the antecedent; in indirect proof, we assume the truth of the negation of the conclusion. This way of introducing hypothetical assumptions and then doing away with them in the service of establishing an intermediate step is a hallmark of natural deduction. The formulas at the leaves of a natural deduction derivation are called assumptions, and some of the rules of inference may "discharge" them. For instance, if we have a derivation of B from some assumptions which include A, then the →Intro rule allows us to infer A → B and discharge any assumption of the form A.
(To keep track of which assumptions are discharged at which inferences, we label the inference and the assumptions it discharges with a number.) The assumptions that remain undischarged at the end of the derivation are together sufficient for the truth of the conclusion, and so a derivation establishes that its undischarged assumptions entail its conclusion.

The relation Γ ⊢ A based on natural deduction holds iff there is a derivation in which A is the last sentence in the tree, and every leaf which is undischarged is in Γ. A is a theorem in natural deduction iff there is a derivation in which A is the last sentence and all assumptions are discharged. For instance, here is a derivation that shows that ⊢ (A ∧ B) → A:

\[
1\;\dfrac{\dfrac{[A \land B]^1}{A}\;\land\text{Elim}}{(A \land B) \to A}\;{\to}\text{Intro}
\]

The label 1 indicates that the assumption A ∧ B is discharged at the →Intro inference.

A set Γ is inconsistent iff Γ ⊢ ⊥ in natural deduction. The rule ⊥_I makes it so that from an inconsistent set, any sentence can be derived.

Natural deduction systems were developed by Gerhard Gentzen and Stanisław Jaśkowski in the 1930s, and later developed by Dag Prawitz and Frederic Fitch. Because its inferences mirror natural methods of proof, it is favored by philosophers. The versions developed by Fitch are often used in introductory logic textbooks. In the philosophy of logic, the rules of natural deduction have sometimes been taken to give the meanings of the logical operators ("proof-theoretic semantics").

C.2 Rules and Derivations

Natural deduction systems are meant to closely parallel the informal reasoning used in mathematical proof (hence it is somewhat "natural"). Natural deduction proofs begin with assumptions. Inference rules are then applied. Assumptions are "discharged" by the ¬Intro, →Intro, ∨Elim and ∃Elim inference rules, and the label of the discharged assumption is placed beside the inference for clarity.

Definition C.1 (Assumption).
An assumption is any sentence in the topmost position of any branch.

Derivations in natural deduction are certain trees of sentences, where the topmost sentences are assumptions, and if a sentence stands below one, two, or three other sentences, it must follow correctly by a rule of inference. The sentences at the top of the inference are called the premises and the sentence below the conclusion of the inference. The rules come in pairs, an introduction and an elimination rule for each logical operator. They introduce a logical operator in the conclusion or remove a logical operator from a premise of the rule. Some of the rules allow an assumption of a certain type to be discharged. To indicate which assumption is discharged by which inference, we also assign labels to both the assumption and the inference. This is indicated by writing the assumption as "[A]^n."

It is customary to consider rules for all the logical operators ∧, ∨, →, ¬, and ⊥, even if some of those are considered as defined.

C.3 Propositional Rules

Rules for ∧

\[
\dfrac{A \quad B}{A \land B}\;\land\text{Intro}
\qquad
\dfrac{A \land B}{A}\;\land\text{Elim}
\qquad
\dfrac{A \land B}{B}\;\land\text{Elim}
\]

Rules for ∨

\[
\dfrac{A}{A \lor B}\;\lor\text{Intro}
\qquad
\dfrac{B}{A \lor B}\;\lor\text{Intro}
\qquad
n\;\dfrac{A \lor B \quad \begin{matrix}[A]^n \\ \vdots \\ C\end{matrix} \quad \begin{matrix}[B]^n \\ \vdots \\ C\end{matrix}}{C}\;\lor\text{Elim}
\]

Rules for →

\[
n\;\dfrac{\begin{matrix}[A]^n \\ \vdots \\ B\end{matrix}}{A \to B}\;{\to}\text{Intro}
\qquad
\dfrac{A \to B \quad A}{B}\;{\to}\text{Elim}
\]

Rules for ¬

\[
n\;\dfrac{\begin{matrix}[A]^n \\ \vdots \\ \bot\end{matrix}}{\lnot A}\;\lnot\text{Intro}
\qquad
\dfrac{\lnot A \quad A}{\bot}\;\lnot\text{Elim}
\]

Rules for ⊥

\[
\dfrac{\bot}{A}\;\bot_I
\qquad
n\;\dfrac{\begin{matrix}[\lnot A]^n \\ \vdots \\ \bot\end{matrix}}{A}\;\bot_C
\]

Note that ¬Intro and ⊥_C are very similar: the difference is that ¬Intro derives a negated sentence ¬A but ⊥_C a positive sentence A.

Whenever a rule indicates that some assumption may be discharged, we take this to be a permission, but not a requirement. E.g., in the →Intro rule, we may discharge any number of assumptions of the form A in the derivation of the premise B, including zero.
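As a semantic sanity check on the propositional rules that discharge no assumptions, one can verify by truth tables that their premises always make their conclusion true. The following Python sketch does this for instances with sentence letters A and B; the helper names (`entails`, `conj`, `disj`, `imp`, `neg`, `RULES`) and the encoding of sentences as functions on valuations are assumptions made for illustration only. It says nothing about the discharging rules, whose correctness concerns whole derivations rather than single inferences.

```python
from itertools import product

ATOMS = ("A", "B")


def entails(premises, conclusion):
    """True if every valuation making all premises true makes the conclusion true."""
    for values in product([False, True], repeat=len(ATOMS)):
        v = dict(zip(ATOMS, values))
        if all(p(v) for p in premises) and not conclusion(v):
            return False
    return True


# Sentences are modelled as functions from valuations to truth values.
A = lambda v: v["A"]
B = lambda v: v["B"]
bot = lambda v: False


def neg(p):
    return lambda v: not p(v)


def conj(p, q):
    return lambda v: p(v) and q(v)


def disj(p, q):
    return lambda v: p(v) or q(v)


def imp(p, q):
    return lambda v: (not p(v)) or q(v)


# Each rule is given as (premises, conclusion).
RULES = {
    "∧Intro": ([A, B], conj(A, B)),
    "∧Elim (left)": ([conj(A, B)], A),
    "∧Elim (right)": ([conj(A, B)], B),
    "∨Intro (left)": ([A], disj(A, B)),
    "∨Intro (right)": ([B], disj(A, B)),
    "→Elim": ([imp(A, B), A], B),
    "¬Elim": ([neg(A), A], bot),
    "⊥I": ([bot], A),
}

results = {name: entails(ps, c) for name, (ps, c) in RULES.items()}
for name, ok in results.items():
    print(name, ok)

# By contrast, inferring A from A ∨ B is not truth-preserving.
print(entails([disj(A, B)], A))
```

Since A and B range over both truth values, the check covers every instance of these rules, whatever sentences are substituted for A and B.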
C.4 Quantifier Rules

Rules for ∀

\[
\dfrac{A(a)}{\forall x\,A(x)}\;\forall\text{Intro}
\qquad
\dfrac{\forall x\,A(x)}{A(t)}\;\forall\text{Elim}
\]

In the rules for ∀, t is a ground term (a term that does not contain any variables), and a is a constant symbol which does not occur in the conclusion ∀x A(x), or in any assumption which is undischarged in the derivation ending with the premise A(a). We call a the eigenvariable of the ∀Intro inference.

Rules for ∃

\[
\dfrac{A(t)}{\exists x\,A(x)}\;\exists\text{Intro}
\qquad
n\;\dfrac{\exists x\,A(x) \quad \begin{matrix}[A(a)]^n \\ \vdots \\ C\end{matrix}}{C}\;\exists\text{Elim}
\]

Again, t is a ground term, and a is a constant which does not occur in the premise ∃x A(x), in the conclusion C, or any assumption which is undischarged in the derivations ending with the two premises (other than the assumptions A(a)). We call a the eigenvariable of the ∃Elim inference.

The condition that an eigenvariable neither occur in the premises nor in any assumption that is undischarged in the derivations leading to the premises for the ∀Intro or ∃Elim inference is called the eigenvariable condition.

We use the term "eigenvariable" even though a in the above rules is a constant. This has historical reasons.

In ∃Intro and ∀Elim there are no restrictions, and the term t can be anything, so we do not have to worry about any conditions. On the other hand, in the ∃Elim and ∀Intro rules, the eigenvariable condition requires that the constant symbol a does not occur anywhere in the conclusion or in an undischarged assumption. The condition is necessary to ensure that the system is sound, i.e., only derives sentences from undischarged assumptions from which they follow. Without this condition, the following would be allowed:

\[
\dfrac{\exists x\,A(x) \quad \dfrac{[A(a)]^1}{\forall x\,A(x)}\;{*}\forall\text{Intro}}{\forall x\,A(x)}\;\exists\text{Elim}
\]

However, ∃x A(x) ⊭ ∀x A(x).

C.5 Derivations

We've said what an assumption is, and we've given the rules of inference. Derivations in natural deduction are inductively generated from these: each derivation either is an assumption on its own, or consists of one, two, or three derivations followed by a correct inference.
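The inductive shape of derivations, and the bookkeeping of discharged assumptions by labels, can be sketched as a small tree data type. This is an illustrative sketch only: the class `Derivation`, its fields, and the function `open_assumptions` are hypothetical names, sentences are kept as plain strings, and the code does not check that an inference is a correct application of its rule. It only computes which assumptions remain undischarged.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Derivation:
    """A derivation: an assumption on its own, or sub-derivations plus an inference."""
    conclusion: str
    rule: str = "assumption"
    premises: Tuple["Derivation", ...] = ()
    # For an assumption: its label, if any.
    # For an inference: the label of the assumptions it discharges, if any.
    label: Optional[int] = None


def open_assumptions(d, discharged=frozenset()):
    """Sentences at the leaves that no inference below them discharges."""
    if d.rule == "assumption":
        return set() if d.label in discharged else {d.conclusion}
    if d.label is not None:
        discharged = discharged | {d.label}
    result = set()
    for p in d.premises:
        result |= open_assumptions(p, discharged)
    return result


# The derivation of (A ∧ B) → A: assumption, then ∧Elim, then →Intro.
leaf = Derivation("A ∧ B", label=1)
step = Derivation("A", "∧Elim", (leaf,))
root = Derivation("(A ∧ B) → A", "→Intro", (step,), label=1)

print(open_assumptions(step))  # the assumption A ∧ B is still open here
print(open_assumptions(root))  # nothing is left open after →Intro

# Discharging is a permission: →Intro may discharge no assumption at all.
vacuous = Derivation("A → B", "→Intro", (Derivation("B"),), label=1)
print(open_assumptions(vacuous))
```

A label is only treated as discharged for leaves that lie above the discharging inference, which matches the way labels are read in a derivation tree.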
Definition C.2 (Derivation). A derivation of a sentence A from assumptions Γ is a tree of sentences satisfying the following conditions:

1. The topmost sentences of the tree are either in Γ or are discharged by an inference in the tree.
2. The bottommost sentence of the tree is A.
3. Every sentence in the tree except the sentence A at the bottom is a premise of a correct application of an inference rule whose conclusion stands directly below that sentence in the tree.

We then say that A is the conclusion of the derivation and that A is derivable from Γ.

Example C.3. Every assumption on its own is a derivation. So, e.g., C by itself is a derivation, and so is D by itself. We can obtain a new derivation from these by applying, say, the ∧Intro rule,

\[
\dfrac{A \quad B}{A \land B}\;\land\text{Intro}
\]

These rules are meant to be general: we can replace the A and B in it with any sentences, e.g., by C and D. Then the conclusion would be C ∧ D, and so

\[
\dfrac{C \quad D}{C \land D}\;\land\text{Intro}
\]

is a correct derivation. Of course, we can also switch the assumptions, so that D plays the role of A and C that of B. Thus,

\[
\dfrac{D \quad C}{D \land C}\;\land\text{Intro}
\]

is also a correct derivation.

We can now apply another rule, say, →Intro, which allows us to conclude a conditional and allows us to discharge any assumption that is identical to the antecedent of that conditional. So both of the following would be correct derivations:

\[
1\;\dfrac{\dfrac{[C]^1 \quad D}{C \land D}\;\land\text{Intro}}{C \to (C \land D)}\;{\to}\text{Intro}
\qquad
1\;\dfrac{\dfrac{C \quad [D]^1}{C \land D}\;\land\text{Intro}}{D \to (C \land D)}\;{\to}\text{Intro}
\]

Remember that discharging of assumptions is a permission, not a requirement: we don't have to discharge the assumptions. In particular, we can apply a rule even if the assumptions are not present in the derivation. For instance, the following is legal, even though there is no assumption A to be discharged:

\[
1\;\dfrac{B}{A \to B}\;{\to}\text{Intro}
\]

C.6 Examples of Derivations

Example C.4. Let's give a derivation of the sentence (A ∧ B) → A.

We begin by writing the desired conclusion at the bottom of the derivation.
\[
(A \land B) \to A
\]

Next, we need to figure out what kind of inference could result in a sentence of this form. The main operator of the conclusion is →, so we'll try to arrive at the conclusion using the →Intro rule. It is best to write down the assumptions involved and label the inference rules as you progress, so it is easy to see whether all assumptions have been discharged at the end of the proof.

\[
1\;\dfrac{\begin{matrix}[A \land B]^1 \\ \vdots \\ A\end{matrix}}{(A \land B) \to A}\;{\to}\text{Intro}
\]

We now need to fill in the steps from the assumption A ∧ B to A. Since we only have one connective to deal with, ∧, we must use the ∧Elim rule. This gives us the following proof:

\[
1\;\dfrac{\dfrac{[A \land B]^1}{A}\;\land\text{Elim}}{(A \land B) \to A}\;{\to}\text{Intro}
\]

We now have a correct derivation of (A ∧ B) → A.

Example C.5. Now let's give a derivation of (¬A ∨ B) → (A → B).

We begin by writing the desired conclusion at the bottom of the derivation.

\[
(\lnot A \lor B) \to (A \to B)
\]

To find a logical rule that could give us this conclusion, we look at the logical connectives in the conclusion: ¬, ∨, and →. We only care at the moment about the first occurrence of → because it is the main operator of the conclusion, while ¬, ∨ and the second occurrence of → are inside the scope of another connective, so we will take care of those later. We therefore start with the →Intro rule. A correct application must look like this:

\[
1\;\dfrac{\begin{matrix}[\lnot A \lor B]^1 \\ \vdots \\ A \to B\end{matrix}}{(\lnot A \lor B) \to (A \to B)}\;{\to}\text{Intro}
\]

This leaves us with two possibilities to continue. Either we can keep working from the bottom up and look for another application of the →Intro rule, or we can work from the top down and apply a ∨Elim rule. Let us apply the latter. We will use the assumption ¬A ∨ B as the leftmost premise of ∨Elim. For a valid application of ∨Elim, the other two premises must be identical to the conclusion A → B, but each may be derived in turn from another assumption, namely the two disjuncts of ¬A ∨ B.
So our derivation will look like this:

\[
1\;\dfrac{2\;\dfrac{[\lnot A \lor B]^1 \quad \begin{matrix}[\lnot A]^2 \\ \vdots \\ A \to B\end{matrix} \quad \begin{matrix}[B]^2 \\ \vdots \\ A \to B\end{matrix}}{A \to B}\;\lor\text{Elim}}{(\lnot A \lor B) \to (A \to B)}\;{\to}\text{Intro}
\]

In each of the two branches on the right, we want to derive A → B, which is best done using →Intro.

\[
1\;\dfrac{2\;\dfrac{[\lnot A \lor B]^1 \quad 3\;\dfrac{\begin{matrix}[\lnot A]^2,\ [A]^3 \\ \vdots \\ B\end{matrix}}{A \to B}\;{\to}\text{Intro} \quad 4\;\dfrac{\begin{matrix}[B]^2,\ [A]^4 \\ \vdots \\ B\end{matrix}}{A \to B}\;{\to}\text{Intro}}{A \to B}\;\lor\text{Elim}}{(\lnot A \lor B) \to (A \to B)}\;{\to}\text{Intro}
\]

For the two missing parts of the derivation, we need derivations of B from ¬A and A in the middle, and from A and B on the right. Let's take the former first. ¬A and A are the two premises of ¬Elim:

\[
\begin{matrix}\dfrac{[\lnot A]^2 \quad [A]^3}{\bot}\;\lnot\text{Elim} \\ \vdots \\ B\end{matrix}
\]

By using ⊥_I, we can obtain B as a conclusion and complete the branch.

\[
1\;\dfrac{2\;\dfrac{[\lnot A \lor B]^1 \quad 3\;\dfrac{\dfrac{\dfrac{[\lnot A]^2 \quad [A]^3}{\bot}\;\lnot\text{Elim}}{B}\;\bot_I}{A \to B}\;{\to}\text{Intro} \quad 4\;\dfrac{\begin{matrix}[B]^2,\ [A]^4 \\ \vdots \\ B\end{matrix}}{A \to B}\;{\to}\text{Intro}}{A \to B}\;\lor\text{Elim}}{(\lnot A \lor B) \to (A \to B)}\;{\to}\text{Intro}
\]

Let's now look at the rightmost branch. Here it's important to realize that the definition of derivation allows assumptions to be discharged but does not require them to be. In other words, if we can derive B from one of the assumptions A and B without using the other, that's ok. And to derive B from B is trivial: B by itself is such a derivation, and no inferences are needed. So we can simply delete the assumption A.

\[
1\;\dfrac{2\;\dfrac{[\lnot A \lor B]^1 \quad 3\;\dfrac{\dfrac{\dfrac{[\lnot A]^2 \quad [A]^3}{\bot}\;\lnot\text{Elim}}{B}\;\bot_I}{A \to B}\;{\to}\text{Intro} \quad \dfrac{[B]^2}{A \to B}\;{\to}\text{Intro}}{A \to B}\;\lor\text{Elim}}{(\lnot A \lor B) \to (A \to B)}\;{\to}\text{Intro}
\]

Note that in the finished derivation, the rightmost →Intro inference does not actually discharge any assumptions.

Example C.6. So far we have not needed the ⊥_C rule. It is special in that it allows us to discharge an assumption that isn't a sub-formula of the conclusion of the rule. It is closely related to the ⊥_I rule. In fact, the ⊥_I rule is a special case of the ⊥_C rule; there is a logic called "intuitionistic logic" in which only ⊥_I is allowed. The ⊥_C rule is a last resort when nothing else works. For instance, suppose we want to derive A ∨ ¬A. Our usual strategy would be to attempt to derive A ∨ ¬A using ∨Intro. But this would require us to derive either A or ¬A from no assumptions, and this can't be done.
⊥C to the rescue!

  [¬(A ∨ ¬A)]¹
        ⋮
        ⊥
1 ──────────── ⊥C
     A ∨ ¬A

Now we're looking for a derivation of ⊥ from ¬(A ∨ ¬A). Since ⊥ is the conclusion of ¬Elim we might try that:

  [¬(A ∨ ¬A)]¹     [¬(A ∨ ¬A)]¹
        ⋮                ⋮
       ¬A                A
  ───────────────────────────── ¬Elim
                ⊥
1 ───────────────────────────── ⊥C
             A ∨ ¬A

Our strategy for finding a derivation of ¬A calls for an application of ¬Intro:

  [¬(A ∨ ¬A)]¹, [A]²
           ⋮
           ⊥                   [¬(A ∨ ¬A)]¹
2 ──────────────── ¬Intro            ⋮
          ¬A                         A
  ────────────────────────────────────────── ¬Elim
                      ⊥
1 ────────────────────────────────────────── ⊥C
                   A ∨ ¬A

Here, we can get ⊥ easily by applying ¬Elim to the assumption ¬(A ∨ ¬A) and A ∨ ¬A, which follows from our new assumption A by ∨Intro:

                     [A]²
                   ──────── ∨Intro
  [¬(A ∨ ¬A)]¹      A ∨ ¬A
  ───────────────────────── ¬Elim
              ⊥                      [¬(A ∨ ¬A)]¹
2 ───────────────── ¬Intro                 ⋮
             ¬A                            A
  ──────────────────────────────────────────────── ¬Elim
                         ⊥
1 ──────────────────────────────────────────────── ⊥C
                      A ∨ ¬A

On the right side we use the same strategy, except we get A by ⊥C:

                     [A]²                                  [¬A]³
                   ──────── ∨Intro                       ──────── ∨Intro
  [¬(A ∨ ¬A)]¹      A ∨ ¬A            [¬(A ∨ ¬A)]¹        A ∨ ¬A
  ───────────────────────── ¬Elim     ─────────────────────────── ¬Elim
              ⊥                                     ⊥
2 ───────────────── ¬Intro            3 ───────────────── ⊥C
             ¬A                                     A
  ────────────────────────────────────────────────────────── ¬Elim
                              ⊥
1 ────────────────────────────────────────────────────────── ⊥C
                           A ∨ ¬A

C.7 Derivations with Quantifiers

Example C.7. When dealing with quantifiers, we have to make sure not to violate the eigenvariable condition, and sometimes this requires us to play around with the order of carrying out certain inferences. In general, it helps to try and take care of rules subject to the eigenvariable condition first (they will be lower down in the finished proof).

Let's see how we'd give a derivation of the formula ∃x ¬A(x) → ¬∀x A(x). Starting as usual, we write

  ∃x ¬A(x) → ¬∀x A(x)

We start by writing down what it would take to justify that last step using the →Intro rule.

      [∃x ¬A(x)]¹
           ⋮
       ¬∀x A(x)
1 ─────────────────── →Intro
  ∃x ¬A(x) → ¬∀x A(x)

Since there is no obvious rule to apply to ¬∀x A(x), we will proceed by setting up the derivation so we can use the ∃Elim rule. Here we must pay attention to the eigenvariable condition, and choose a constant that does not appear in ∃x ¬A(x) or any assumptions that it depends on. (Since no constant symbols appear, however, any choice will do fine.)
                  [¬A(a)]²
                      ⋮
  [∃x ¬A(x)]¹     ¬∀x A(x)
2 ───────────────────────── ∃Elim
          ¬∀x A(x)
1 ───────────────────────── →Intro
     ∃x ¬A(x) → ¬∀x A(x)

In order to derive ¬∀x A(x), we will attempt to use the ¬Intro rule: this requires that we derive a contradiction, possibly using ∀x A(x) as an additional assumption. Of course, this contradiction may involve the assumption ¬A(a), which will be discharged by the ∃Elim inference. We can set it up as follows:

                  [¬A(a)]², [∀x A(x)]³
                            ⋮
                            ⊥
                  3 ─────────────── ¬Intro
  [∃x ¬A(x)]¹          ¬∀x A(x)
2 ──────────────────────────────────── ∃Elim
               ¬∀x A(x)
1 ──────────────────────────────────── →Intro
          ∃x ¬A(x) → ¬∀x A(x)

It looks like we are close to getting a contradiction. The easiest rule to apply is ∀Elim, which has no eigenvariable conditions. Since we can use any term we want to replace the universally quantified x, it makes the most sense to continue using a so we can reach a contradiction.

                               [∀x A(x)]³
                               ────────── ∀Elim
                  [¬A(a)]²        A(a)
                  ────────────────────── ¬Elim
                            ⊥
                  3 ─────────────── ¬Intro
  [∃x ¬A(x)]¹          ¬∀x A(x)
2 ──────────────────────────────────── ∃Elim
               ¬∀x A(x)
1 ──────────────────────────────────── →Intro
          ∃x ¬A(x) → ¬∀x A(x)

It is important, especially when dealing with quantifiers, to double check at this point that the eigenvariable condition has not been violated. Since the only rule we applied that is subject to the eigenvariable condition was ∃Elim, and the eigenvariable a does not occur in any assumptions it depends on, this is a correct derivation.

Example C.8. Sometimes we may derive a formula from other formulas. In these cases, we may have undischarged assumptions. It is important to keep track of our assumptions as well as the end goal.

Let's see how we'd give a derivation of the formula ∃x C(x,b) from the assumptions ∃x (A(x) ∧ B(x)) and ∀x (B(x) → C(x,b)). Starting as usual, we write the conclusion at the bottom.

  ∃x C(x,b)

We have two premises to work with. To use the first, i.e., to try to find a derivation of ∃x C(x,b) from ∃x (A(x) ∧ B(x)), we would use the ∃Elim rule. Since it has an eigenvariable condition, we will apply that rule first.
We get the following:

                        [A(a) ∧ B(a)]¹
                              ⋮
  ∃x (A(x) ∧ B(x))        ∃x C(x,b)
1 ─────────────────────────────────── ∃Elim
               ∃x C(x,b)

The two assumptions we are working with share B. It may be useful at this point to apply ∧Elim to separate out B(a).

                        [A(a) ∧ B(a)]¹
                        ────────────── ∧Elim
                             B(a)
                              ⋮
  ∃x (A(x) ∧ B(x))        ∃x C(x,b)
1 ─────────────────────────────────── ∃Elim
               ∃x C(x,b)

The second assumption we have to work with is ∀x (B(x) → C(x,b)). Since there is no eigenvariable condition we can instantiate x with the constant symbol a using ∀Elim to get B(a) → C(a,b). We now have both B(a) → C(a,b) and B(a). Our next move should be a straightforward application of the →Elim rule.

                        ∀x (B(x) → C(x,b))           [A(a) ∧ B(a)]¹
                        ────────────────── ∀Elim     ────────────── ∧Elim
                          B(a) → C(a,b)                   B(a)
                        ──────────────────────────────────────────── →Elim
                                         C(a,b)
                                            ⋮
  ∃x (A(x) ∧ B(x))                      ∃x C(x,b)
1 ───────────────────────────────────────────────────── ∃Elim
                        ∃x C(x,b)

We are so close! One application of ∃Intro and we have reached our goal.

                        ∀x (B(x) → C(x,b))           [A(a) ∧ B(a)]¹
                        ────────────────── ∀Elim     ────────────── ∧Elim
                          B(a) → C(a,b)                   B(a)
                        ──────────────────────────────────────────── →Elim
                                         C(a,b)
                                       ─────────── ∃Intro
  ∃x (A(x) ∧ B(x))                      ∃x C(x,b)
1 ───────────────────────────────────────────────────── ∃Elim
                        ∃x C(x,b)

Since we ensured at each step that the eigenvariable conditions were not violated, we can be confident that this is a correct derivation.

Example C.9. Give a derivation of the formula ¬∀x A(x) from the assumptions ∀x A(x) → ∃y B(y) and ¬∃y B(y). Starting as usual, we write the target formula at the bottom.

  ¬∀x A(x)

The last line of the derivation is a negation, so let's try using ¬Intro. This will require that we figure out how to derive a contradiction.

  [∀x A(x)]¹
       ⋮
       ⊥
1 ────────── ¬Intro
   ¬∀x A(x)

So far so good. We can use ∀Elim but it's not obvious if that will help us get to our goal. Instead, let's use one of our assumptions. ∀x A(x) → ∃y B(y) together with ∀x A(x) will allow us to use the →Elim rule.

  ∀x A(x) → ∃y B(y)    [∀x A(x)]¹
  ─────────────────────────────── →Elim
              ∃y B(y)
                 ⋮
                 ⊥
1 ─────────────────── ¬Intro
            ¬∀x A(x)

We now have one final assumption to work with, and it looks like this will help us reach a contradiction by using ¬Elim.
               ∀x A(x) → ∃y B(y)    [∀x A(x)]¹
               ─────────────────────────────── →Elim
  ¬∃y B(y)                 ∃y B(y)
  ───────────────────────────────── ¬Elim
                  ⊥
1 ───────────────────── ¬Intro
             ¬∀x A(x)

C.8 Derivations with Identity predicate

Derivations with identity predicate require additional inference rules.

  ───── =Intro
  t = t

  t₁ = t₂    A(t₁)               t₁ = t₂    A(t₂)
  ──────────────── =Elim         ──────────────── =Elim
       A(t₂)                          A(t₁)

In the above rules, t, t₁, and t₂ are closed terms. The =Intro rule allows us to derive any identity statement of the form t = t outright, from no assumptions.

Example C.10. If s and t are closed terms, then A(s), s = t ⊢ A(t):

  s = t    A(s)
  ───────────── =Elim
      A(t)

This may be familiar as the "principle of substitutability of identicals," or Leibniz' Law.

Example C.11. We derive the sentence

  ∀x ∀y ((A(x) ∧ A(y)) → x = y)

from the sentence

  ∃x ∀y (A(y) → y = x)

We develop the derivation backwards:

  ∃x ∀y (A(y) → y = x)    [A(a) ∧ A(b)]¹
                    ⋮
                  a = b
1 ───────────────────────── →Intro
    (A(a) ∧ A(b)) → a = b
  ─────────────────────────── ∀Intro
  ∀y ((A(a) ∧ A(y)) → a = y)
  ───────────────────────────── ∀Intro
  ∀x ∀y ((A(x) ∧ A(y)) → x = y)

We'll now have to use the main assumption: since it is an existential formula, we use ∃Elim to derive the intermediary conclusion a = b.

                          [∀y (A(y) → y = c)]²    [A(a) ∧ A(b)]¹
                                          ⋮
  ∃x ∀y (A(y) → y = x)                  a = b
2 ───────────────────────────────────────────── ∃Elim
                     a = b
1 ───────────────────────── →Intro
    (A(a) ∧ A(b)) → a = b
  ─────────────────────────── ∀Intro
  ∀y ((A(a) ∧ A(y)) → a = y)
  ───────────────────────────── ∀Intro
  ∀x ∀y ((A(x) ∧ A(y)) → x = y)

The sub-derivation on the top right is completed by using its assumptions to show that a = c and b = c. This requires two separate derivations. The derivation for a = c is as follows:

  [∀y (A(y) → y = c)]²          [A(a) ∧ A(b)]¹
  ──────────────────── ∀Elim    ────────────── ∧Elim
      A(a) → a = c                   A(a)
  ──────────────────────────────────────── →Elim
                  a = c

From a = c and b = c we derive a = b by =Elim.

C.9 Proof-Theoretic Notions

Just as we've defined a number of important semantic notions (validity, entailment, satisfiability), we now define corresponding proof-theoretic notions. These are not defined by appeal to satisfaction of sentences in structures, but by appeal to the derivability or non-derivability of certain sentences from others. It was an important discovery that these notions coincide.
That they do is the content of the soundness and completeness theorems.

Definition C.12 (Theorems). A sentence A is a theorem if there is a derivation of A in natural deduction in which all assumptions are discharged. We write ⊢ A if A is a theorem and ⊬ A if it is not.

Definition C.13 (Derivability). A sentence A is derivable from a set of sentences Γ, Γ ⊢ A, if there is a derivation with conclusion A and in which every assumption is either discharged or is in Γ. If A is not derivable from Γ we write Γ ⊬ A.

Definition C.14 (Consistency). A set of sentences Γ is inconsistent iff Γ ⊢ ⊥. If Γ is not inconsistent, i.e., if Γ ⊬ ⊥, we say it is consistent.

Proposition C.15 (Reflexivity). If A ∈ Γ, then Γ ⊢ A.

Proof. The assumption A by itself is a derivation of A where every undischarged assumption (i.e., A) is in Γ. □

Proposition C.16 (Monotony). If Γ ⊆ ∆ and Γ ⊢ A, then ∆ ⊢ A.

Proof. Any derivation of A from Γ is also a derivation of A from ∆. □

Proposition C.17 (Transitivity). If Γ ⊢ A and {A} ∪ ∆ ⊢ B, then Γ ∪ ∆ ⊢ B.

Proof. If Γ ⊢ A, there is a derivation δ₀ of A with all undischarged assumptions in Γ. If {A} ∪ ∆ ⊢ B, then there is a derivation δ₁ of B with all undischarged assumptions in {A} ∪ ∆. Now consider:

  ∆, [A]¹
     δ₁
     B                  Γ
1 ─────── →Intro        δ₀
   A → B                A
  ───────────────────────── →Elim
              B

The undischarged assumptions are now all among Γ ∪ ∆, so this shows Γ ∪ ∆ ⊢ B. □

When Γ = {A₁, A₂, . . . , Aₖ} is a finite set we may use the simplified notation A₁, A₂, . . . , Aₖ ⊢ B for Γ ⊢ B; in particular, A ⊢ B means that {A} ⊢ B.

Note that if Γ ⊢ A and A ⊢ B, then Γ ⊢ B. It follows also that if A₁, . . . , Aₙ ⊢ B and Γ ⊢ Aᵢ for each i, then Γ ⊢ B.

Proposition C.18. Γ is inconsistent iff Γ ⊢ A for every sentence A.

Proof. Exercise. □

Proposition C.19 (Compactness).
1. If Γ ⊢ A then there is a finite subset Γ₀ ⊆ Γ such that Γ₀ ⊢ A.
2. If every finite subset of Γ is consistent, then Γ is consistent.

Proof. 1.
If Γ ⊢ A, then there is a derivation δ of A from Γ. Let Γ₀ be the set of undischarged assumptions of δ. Since any derivation is finite, Γ₀ can only contain finitely many sentences. So, δ is a derivation of A from a finite Γ₀ ⊆ Γ.

2. This is the contrapositive of (1) for the special case A ≡ ⊥. □

Summary

Proof systems provide purely syntactic methods for characterizing consequence and compatibility between sentences. Natural deduction is one such proof system. A derivation in it consists of a tree of formulas. The topmost formulas of a derivation are assumptions. All other formulas, for the derivation to be correct, must be correctly justified by one of a number of inference rules. These come in pairs: an introduction and an elimination rule for each connective and quantifier. For instance, if a formula A is justified by a →Elim rule, the preceding formulas (the premises) must be B → A and B (for some B). Some inference rules also allow assumptions to be discharged. For instance, if A → B is inferred from B using →Intro, any occurrences of A as assumptions in the derivation leading to the premise B may be discharged, and are given a label that is also recorded at the inference.

If there is a derivation with end formula A and all assumptions are discharged, we say A is a theorem and write ⊢ A. If all undischarged assumptions are in some set Γ, we say A is derivable from Γ and write Γ ⊢ A. If Γ ⊢ ⊥ we say Γ is inconsistent, otherwise consistent. These notions are interrelated, e.g., Γ ⊢ A iff Γ ∪ {¬A} ⊢ ⊥. They are also related to the corresponding semantic notions, e.g., if Γ ⊢ A then Γ ⊨ A. This property of natural deduction (what can be derived from Γ is guaranteed to be entailed by Γ) is called soundness. The soundness theorem is proved by induction on the length of derivations, showing that each individual inference preserves entailment of its conclusion from open assumptions provided its premises are entailed by their open assumptions.
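As an optional cross-check on the worked examples of this appendix, the following is a minimal sketch in Lean 4 (core library only) of Examples C.4 through C.10, written as proof terms. It is not part of the natural deduction system defined here: the namespace and theorem names are hypothetical, and the correspondence between Lean constructs and inference rules noted in the comments is an informal reading, with Classical.byContradiction standing in for ⊥C.

```lean
namespace NDSketch  -- hypothetical namespace, used only to avoid name clashes

-- Example C.4: (A ∧ B) → A.
-- `fun h => ...` plays the role of →Intro (discharging h); `h.left` of ∧Elim.
theorem exC4 (A B : Prop) : (A ∧ B) → A :=
  fun h => h.left

-- Example C.5: (¬A ∨ B) → (A → B).
-- `Or.elim` plays the role of ∨Elim; `absurd` combines ¬Elim with ⊥I.
theorem exC5 (A B : Prop) : (¬A ∨ B) → (A → B) :=
  fun h => Or.elim h
    (fun hna ha => absurd ha hna)  -- middle branch: B from ¬A and A
    (fun hb _ => hb)               -- right branch: the assumption A is unused

-- Example C.6: A ∨ ¬A, with classical reasoning standing in for ⊥C.
theorem exC6 (A : Prop) : A ∨ ¬A :=
  Classical.byContradiction fun h =>
    have hna : ¬A := fun ha => h (Or.inl ha)
    have ha : A := Classical.byContradiction fun hna' => h (Or.inr hna')
    hna ha

-- Example C.7: ∃x ¬A(x) → ¬∀x A(x).
-- `Exists.elim` plays the role of ∃Elim; the bound `a` is the eigenvariable.
theorem exC7 {α : Type} (A : α → Prop) : (∃ x, ¬A x) → ¬∀ x, A x :=
  fun h => Exists.elim h (fun a hna hall => hna (hall a))

-- Example C.8: ∃x C(x,b) from ∃x (A(x) ∧ B(x)) and ∀x (B(x) → C(x,b)).
theorem exC8 {α : Type} (A B : α → Prop) (C : α → α → Prop) (b : α)
    (h₁ : ∃ x, A x ∧ B x) (h₂ : ∀ x, B x → C x b) : ∃ x, C x b :=
  Exists.elim h₁ (fun a hab => Exists.intro a (h₂ a hab.right))

-- Example C.9: ¬∀x A(x) from ∀x A(x) → ∃y B(y) and ¬∃y B(y).
theorem exC9 {α : Type} (A B : α → Prop)
    (h₁ : (∀ x, A x) → ∃ y, B y) (h₂ : ¬∃ y, B y) : ¬∀ x, A x :=
  fun hall => h₂ (h₁ hall)

-- Example C.10: A(t) from A(s) and s = t; `▸` plays the role of =Elim.
theorem exC10 {α : Type} (A : α → Prop) (s t : α)
    (hs : A s) (h : s = t) : A t :=
  h ▸ hs

end NDSketch
```

In this reading, undischarged assumptions of a derivation appear as hypotheses of the theorem (as in exC8 and exC9), while discharged assumptions appear as variables bound by `fun`.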
Problems

Problem C.1. Give derivations of the following:
1. ¬(A → B) → (A ∧ ¬B)
2. (A → C) ∨ (B → C) from the assumption (A ∧ B) → C

Problem C.2. Give derivations of the following:
1. ∃y A(y) → B from the assumption ∀x (A(x) → B)
2. ∃x (A(x) → ∀y A(y))

Problem C.3. Prove that = is both symmetric and transitive, i.e., give derivations of ∀x ∀y (x = y → y = x) and ∀x ∀y ∀z ((x = y ∧ y = z) → x = z).

Problem C.4. Give derivations of the following formulas:
1. ∀x ∀y ((x = y ∧ A(x)) → A(y))
2. ∃x A(x) ∧ ∀y ∀z ((A(y) ∧ A(z)) → y = z) → ∃x (A(x) ∧ ∀y (A(y) → y = x))

Problem C.5. Prove Proposition C.18.

APPENDIX D
Biographies

D.1 Alonzo Church

Fig. D.1: Alonzo Church

Alonzo Church was born in Washington, DC on June 14, 1903. In early childhood, an air gun incident left Church blind in one eye. He finished preparatory school in Connecticut in 1920 and began his university education at Princeton that same year. He completed his doctoral studies in 1927. After a couple of years abroad, Church returned to Princeton. Church was known to be exceedingly polite and careful. His blackboard writing was immaculate, and he would preserve important papers by carefully covering them in Duco cement (a clear glue). Outside of his academic pursuits, he enjoyed reading science fiction magazines and was not afraid to write to the editors if he spotted any inaccuracies in the writing.

Church's academic achievements were great. Together with his students Stephen Kleene and Barkley Rosser, he developed a theory of effective calculability, the lambda calculus, independently of Alan Turing's development of the Turing machine. The two definitions of computability are equivalent, and give rise to what is now known as the Church-Turing Thesis, that a function of the natural numbers is effectively computable if and only if it is computable via Turing machine (or lambda calculus).
He also proved what is now known as Church's Theorem: The decision problem for the validity of first-order formulas is unsolvable.

Church continued his work into old age. In 1967 he left Princeton for UCLA, where he was professor until his retirement in 1990. Church passed away on August 1, 1995 at the age of 92.

Further Reading

For a brief biography of Church, see Enderton (2019). Church's original writings on the lambda calculus and the Entscheidungsproblem (Church's Thesis) are Church (1936a,b). Aspray (1984) records an interview with Church about the Princeton mathematics community in the 1930s. Church wrote a series of book reviews for the Journal of Symbolic Logic from 1936 until 1979. They are all archived on John MacFarlane's website (MacFarlane, 2015).

D.2 Kurt Gödel

Kurt Gödel (ger-dle) was born on April 28, 1906 in Brünn in the Austro-Hungarian empire (now Brno in the Czech Republic). Due to his inquisitive and bright nature, young Kurtele was often called "Der kleine Herr Warum" (Little Mr. Why) by his family. He excelled in academics from primary school onward, where he got less than the highest grade only in mathematics. Gödel was often absent from school due to poor health and was exempt from physical education. He was diagnosed with rheumatic fever during his childhood. Throughout his life, he believed this permanently affected his heart despite medical assessment saying otherwise.

Gödel began studying at the University of Vienna in 1924 and completed his doctoral studies in 1929. He first intended to study physics, but his interests soon moved to mathematics and especially logic, in part due to the influence of the philosopher Rudolf Carnap. His dissertation, written under the supervision of Hans Hahn, proved the completeness theorem of first-order predicate logic with identity (Gödel, 1929). Only a year later, he obtained his most famous results, the first and second incompleteness theorems (published in Gödel 1931).
During his time in Vienna, Gödel was heavily involved with the Vienna Circle, a group of scientifically-minded philosophers that included Carnap, whose work was especially influenced by Gödel's results.

Fig. D.2: Kurt Gödel

In 1938, Gödel married Adele Nimbursky. His parents were not pleased: not only was she six years older than him and already divorced, but she worked as a dancer in a nightclub. Social pressures did not affect Gödel, however, and they remained happily married until his death.

After Nazi Germany annexed Austria in 1938, Gödel and Adele emigrated to the United States, where he took up a position at the Institute for Advanced Study in Princeton, New Jersey. Despite his introversion and eccentric nature, Gödel's time at Princeton was collaborative and fruitful. He published essays in set theory, philosophy and physics. Notably, he struck up a particularly strong friendship with his colleague at the IAS, Albert Einstein.

In his later years, Gödel's mental health deteriorated. His wife's hospitalization in 1977 meant she was no longer able to cook his meals for him. Having suffered from mental health issues throughout his life, he succumbed to paranoia. Deathly afraid of being poisoned, Gödel refused to eat. He died of starvation on January 14, 1978, in Princeton.

Further Reading

For a complete biography of Gödel's life, see John Dawson (1997). For further biographical pieces, as well as essays about Gödel's contributions to logic and philosophy, see Wang (1990), Baaz et al. (2011), Takeuti et al. (2003), and Sigmund et al. (2007).

Gödel's PhD thesis is available in the original German (Gödel, 1929). The original text of the incompleteness theorems is (Gödel, 1931). All of Gödel's published and unpublished writings, as well as a selection of correspondence, are available in English in his Collected Works (Feferman et al., 1986, 1990).

For a detailed treatment of Gödel's incompleteness theorems, see Smith (2013).
For an informal, philosophical discussion of Gödel's theorems, see Mark Linsenmayer's podcast (Linsenmayer, 2014).

D.3 Rózsa Péter

Rózsa Péter was born Rózsa Politzer, in Budapest, Hungary, on February 17, 1905. She is best known for her work on recursive functions, which was essential for the creation of the field of recursion theory.

Péter was raised during harsh political times (WWI raged when she was a teenager) but was able to attend the affluent Maria Terezia Girls' School in Budapest, from where she graduated in 1922. She then studied at Pázmány Péter University (later renamed Loránd Eötvös University) in Budapest. She began studying chemistry at the insistence of her father, but later switched to mathematics, and graduated in 1927. Although she had the credentials to teach high school mathematics, the economic situation at the time was dire as the Great Depression affected the world economy. During this time, Péter took odd jobs as a tutor and private teacher of mathematics. She eventually returned to university to take up graduate studies in mathematics. She had originally planned to work in number theory, but after finding out that her results had already been proven, she almost gave up on mathematics altogether. She was encouraged to work on Gödel's incompleteness theorems, and unknowingly proved several of his results in different ways. This restored her confidence, and Péter went on to write her first papers on recursion theory, inspired by David Hilbert's foundational program. She received her PhD in 1935, and in 1937 she became an editor for the Journal of Symbolic Logic.

Fig. D.3: Rózsa Péter

Péter's early papers are widely credited as founding contributions to the field of recursive function theory. In Péter (1935a), she investigated the relationship between different kinds of recursion. In Péter (1935b), she showed that a certain recursively defined function is not primitive recursive.
This simplified an earlier result due to Wilhelm Ackermann. Péter's simplified function is what's now often called the Ackermann function and sometimes, more properly, the Ackermann-Péter function. She wrote the first book on recursive function theory (Péter, 1951).

Despite the importance and influence of her work, Péter did not obtain a full-time teaching position until 1945. During the Nazi occupation of Hungary during World War II, Péter was not allowed to teach due to anti-Semitic laws. In 1944 the government created a Jewish ghetto in Budapest; the ghetto was cut off from the rest of the city and attended by armed guards. Péter was forced to live in the ghetto until 1945 when it was liberated. She then went on to teach at the Budapest Teachers Training College, and from 1955 onward at Eötvös Loránd University. She was the first female Hungarian mathematician to become an Academic Doctor of Mathematics, and the first woman to be elected to the Hungarian Academy of Sciences.

Péter was known as a passionate teacher of mathematics, who preferred to explore the nature and beauty of mathematical problems with her students rather than to merely lecture. As a result, she was affectionately called "Aunt Rosa" by her students. Péter died in 1977 at the age of 71.

Further Reading

For more biographical reading, see O'Connor and Robertson (2014) and Andrásfai (1986). Tamassy (1994) conducted a brief interview with Péter. For a fun read about mathematics, see Péter's book Playing With Infinity (Péter, 2010).

D.4 Julia Robinson

Julia Bowman Robinson was an American mathematician. She is known mainly for her work on decision problems, and most famously for her contributions to the solution of Hilbert's tenth problem. Robinson was born in St. Louis, Missouri on December 8, 1919. At a young age Robinson recalls being intrigued by numbers (Reid, 1986, 4).
At age nine she contracted scarlet fever and suffered from several recurrent bouts of rheumatic fever. This forced her to spend much of her time in bed, putting her behind in her education. Although she was able to catch up with the help of private tutors, the physical effects of her illness had a lasting impact on her life.

Despite her childhood struggles, Robinson graduated high school with several awards in mathematics and the sciences. She started her university career at San Diego State College, and transferred to the University of California, Berkeley as a senior. There she was highly influenced by mathematician Raphael Robinson. They quickly became good friends, and married in 1941. As a spouse of a faculty member, Robinson was barred from teaching in the mathematics department at Berkeley. Although she continued to audit mathematics classes, she hoped to leave university and start a family. Not long after her wedding, however, Robinson contracted pneumonia. She was told that there was substantial scar tissue buildup on her heart due to the rheumatic fever she suffered as a child. Due to the severity of the scar tissue, the doctor predicted that she would not live past forty and she was advised not to have children (Reid, 1986, 13).

Fig. D.4: Julia Robinson

Robinson was depressed for a long time, but eventually decided to continue studying mathematics. She returned to Berkeley and completed her PhD in 1948 under the supervision of Alfred Tarski. The first-order theory of the real numbers had been shown to be decidable by Tarski, and from Gödel's work it followed that the first-order theory of the natural numbers is undecidable. It was a major open problem whether the first-order theory of the rationals is decidable or not. In her thesis (1949), Robinson proved that it was not.

Interested in decision problems, Robinson next attempted to find a solution to Hilbert's tenth problem.
This problem was one of a famous list of 23 mathematical problems posed by David Hilbert in 1900. The tenth problem asks whether there is an algorithm that will answer, in a finite amount of time, whether or not a polynomial equation with integer coefficients, such as 3x² − 2y + 3 = 0, has a solution in the integers. Such questions are known as Diophantine problems. After some initial successes, Robinson joined forces with Martin Davis and Hilary Putnam, who were also working on the problem. They succeeded in showing that exponential Diophantine problems (where the unknowns may also appear as exponents) are undecidable, and showed that a certain conjecture (later called "J.R.") implies that Hilbert's tenth problem is undecidable (Davis et al., 1961). Robinson continued to work on the problem for the next decade. In 1970, the young Russian mathematician Yuri Matijasevich finally proved the J.R. hypothesis. The combined result is now called the Matijasevich-Robinson-Davis-Putnam theorem, or MRDP theorem for short. Matijasevich and Robinson became friends and collaborated on several papers. In a letter to Matijasevich, Robinson once wrote that "actually I am very pleased that working together (thousands of miles apart) we are obviously making more progress than either one of us could alone" (Matijasevich, 1992, 45).

Robinson was the first female president of the American Mathematical Society, and the first woman to be elected to the National Academy of Sciences. She died on July 30, 1985 at the age of 65 after being diagnosed with leukemia.

Further Reading

Robinson's mathematical papers are available in her Collected Works (Robinson, 1996), which also includes a reprint of her National Academy of Sciences biographical memoir (Feferman, 1994). Robinson's older sister Constance Reid published an "Autobiography of Julia," based on interviews (Reid, 1986), as well as a full memoir (Reid, 1996).
A short documentary about Robinson and Hilbert's tenth problem was directed by George Csicsery (Csicsery, 2016). For a brief memoir about Yuri Matijasevich's collaborations with Robinson, and her influence on his work, see Matijasevich (1992).

D.5 Alfred Tarski

Fig. D.5: Alfred Tarski

Alfred Tarski was born on January 14, 1901 in Warsaw, Poland (then part of the Russian Empire). Described as "Napoleonic," Tarski was boisterous, talkative, and intense. His energy was often reflected in his lectures: he once set fire to a wastebasket while disposing of a cigarette during a lecture, and was forbidden from lecturing in that building again.

Tarski had a thirst for knowledge from a young age. Although later in life he would tell students that he studied logic because it was the only class in which he got a B, his high school records show that he got A's across the board, even in logic. He studied at the University of Warsaw from 1918 to 1924. Tarski first intended to study biology, but became interested in mathematics, philosophy, and logic, as the university was the center of the Warsaw School of Logic and Philosophy. Tarski earned his doctorate in 1924 under the supervision of Stanisław Leśniewski.

Before emigrating to the United States in 1939, Tarski completed some of his most important work while working as a secondary school teacher in Warsaw. His works on logical consequence and logical truth were written during this time. In 1939, Tarski was visiting the United States for a lecture tour. During his visit, Germany invaded Poland, and because of his Jewish heritage, Tarski could not return. His wife and children remained in Poland until the end of the war, but were then able to emigrate to the United States as well. Tarski taught at Harvard, the College of the City of New York, the Institute for Advanced Study at Princeton, and finally the University of California, Berkeley.
There he founded the multidisciplinary program in Logic and the Methodology of Science. Tarski died on October 26, 1983 at the age of 82.

Further Reading

For more on Tarski's life, see the biography Alfred Tarski: Life and Logic (Feferman and Feferman, 2004). Tarski's seminal works on logical consequence and truth are available in English in (Corcoran, 1983). All of Tarski's original works have been collected into a four volume series, (Tarski, 1981).

Photo Credits

Alonzo Church, p. 253: Portrait of Alonzo Church, undated, photographer unknown. Alonzo Church Papers; 1924–1995, (C0948) Box 60, Folder 3. Manuscripts Division, Department of Rare Books and Special Collections, Princeton University Library. © Princeton University. The Open Logic Project has obtained permission to use this image for inclusion in non-commercial OLP-derived materials. Permission from Princeton University is required for any other use.

Kurt Gödel, p. 255: Portrait of Kurt Gödel, ca. 1925, photographer unknown. From the Shelby White and Leon Levy Archives Center, Institute for Advanced Study, Princeton, NJ, USA, on deposit at Princeton University Library, Manuscript Division, Department of Rare Books and Special Collections, Kurt Gödel Papers, (C0282), Box 14b, #110000. The Open Logic Project has obtained permission from the Institute's Archives Center to use this image for inclusion in non-commercial OLP-derived materials. Permission from the Archives Center is required for any other use.

Rózsa Péter, p. 257: Portrait of Rózsa Péter, undated, photographer unknown. Courtesy of Béla Andrásfai.

Julia Robinson, p. 259: Portrait of Julia Robinson, unknown photographer, courtesy of Neil D. Reid. The Open Logic Project has obtained permission to use this image for inclusion in non-commercial OLP-derived materials. Permission is required for any other use.

Alfred Tarski, p. 261: Passport photo of Alfred Tarski, 1939.
Cropped and restored from a scan of Tarski's passport by Joel Fuller. Original courtesy of Bancroft Library, University of California, Berkeley. Alfred Tarski Papers, Banc MSS 84/49. The Open Logic Project has obtained permission to use this image for inclusion in non-commercial OLP-derived materials. Permission from Bancroft Library is required for any other use. Bibliography Andrásfai, Béla. 1986. Rózsa (Rosa) Péter. Periodica Polytechnica Electrical Engineering 30(2-3): 139–145. URL http://www.pp. bme.hu/ee/article/view/4651. Aspray, William. 1984. The Princeton mathematics community in the 1930s: Alonzo Church. URL http://www.princeton. edu/mudd/finding_aids/mathoral/pmc05.htm. Interview. Baaz, Matthias, Christos H. Papadimitriou, Hilary W. Putnam, Dana S. Scott, and Charles L. Harper Jr. 2011. Kurt Gödel and the Foundations of Mathematics: Horizons of Truth. Cambridge: Cambridge University Press. Church, Alonzo. 1936a. A note on the Entscheidungsproblem. Journal of Symbolic Logic 1: 40–41. Church, Alonzo. 1936b. An unsolvable problem of elementary number theory. American Journal of Mathematics 58: 345–363. Corcoran, John. 1983. Logic, Semantics, Metamathematics. Indianapolis: Hackett, 2nd ed. Csicsery, George. 2016. Zala films: Julia Robinson and Hilbert's tenth problem. URL http://www.zalafilms.com/films/ juliarobinson.html. 265 266 BIBLIOGRAPHY Davis, Martin, Hilary Putnam, and Julia Robinson. 1961. The decision problem for exponential Diophantine equations. Annals of Mathematics 74(3): 425–436. URL http://www.jstor. org/stable/1970289. Enderton, Herbert B. 2019. Alonzo Church: Life and Work. In The Collected Works of Alonzo Church, eds. Tyler Burge and Herbert B. Enderton. Cambridge, MA: MIT Press. Feferman, Anita and Solomon Feferman. 2004. Alfred Tarski: Life and Logic. Cambridge: Cambridge University Press. Feferman, Solomon. 1994. Julia Bowman Robinson 1919–1985. Biographical Memoirs of the National Academy of Sciences 63: 1–28. 
URL http://www.nasonline.org/publications/ biographical-memoirs/memoir-pdfs/robinson-julia. pdf. Feferman, Solomon, JohnW. Dawson Jr., Stephen C. Kleene, Gregory H. Moore, Robert M. Solovay, and Jean van Heijenoort. 1986. Kurt Gödel: Collected Works. Vol. 1: Publications 1929–1936. Oxford: Oxford University Press. Feferman, Solomon, JohnW. Dawson Jr., Stephen C. Kleene, Gregory H. Moore, Robert M. Solovay, and Jean van Heijenoort. 1990. Kurt Gödel: Collected Works. Vol. 2: Publications 1938–1974. Oxford: Oxford University Press. Gödel, Kurt. 1929. Über die Vollständigkeit des Logikkalküls [On the completeness of the calculus of logic]. Dissertation, Universität Wien. Reprinted and translated in Feferman et al. (1986), pp. 60–101. Gödel, Kurt. 1931. über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I [On formally undecidable propositions of Principia Mathematica and related systems I]. Monatshefte für Mathematik und Physik 38: 173–198. Reprinted and translated in Feferman et al. (1986), pp. 144– 195. 267 BIBLIOGRAPHY John Dawson, Jr. 1997. Logical Dilemmas: The Life and Work of Kurt Gödel. Boca Raton: CRC Press. Linsenmayer, Mark. 2014. The partially examined life: Gödel on math. URL http://www.partiallyexaminedlife.com/ 2014/06/16/ep95-godel/. Podcast audio. MacFarlane, John. 2015. Alonzo Church's JSL reviews. URL http://johnmacfarlane.net/church.html. Matijasevich, Yuri. 1992. My collaboration with Julia Robinson. The Mathematical Intelligencer 14(4): 38–45. O'Connor, John J. and Edmund F. Robertson. 2014. Rózsa Péter. URL http://www-groups.dcs.st-and.ac.uk/~history/ Biographies/Peter.html. Péter, Rózsa. 1935a. Über den Zusammenhang der verschiedenen Begriffe der rekursiven Funktion. Mathematische Annalen 110: 612–632. Péter, Rózsa. 1935b. Konstruktion nichtrekursiver Funktionen. Mathematische Annalen 111: 42–60. Péter, Rózsa. 1951. Rekursive Funktionen. Budapest: Akademiai Kiado. English translation in (Péter, 1967). Péter, Rózsa. 
1967. Recursive Functions. New York: Academic Press.

Péter, Rózsa. 2010. Playing with Infinity. New York: Dover. URL https://books.google.ca/books?id=6V3wNs4uv_4C&lpg=PP1&ots=BkQZaHcR99&lr&pg=PP1#v=onepage&q&f=false.

Reid, Constance. 1986. The autobiography of Julia Robinson. The College Mathematics Journal 17: 3–21.

Reid, Constance. 1996. Julia: A Life in Mathematics. Cambridge: Cambridge University Press. URL https://books.google.ca/books?id=lRtSzQyHf9UC&lpg=PP1&pg=PP1#v=onepage&q&f=false.

Robinson, Julia. 1949. Definability and decision problems in arithmetic. Journal of Symbolic Logic 14(2): 98–114. URL http://www.jstor.org/stable/2266510.

Robinson, Julia. 1996. The Collected Works of Julia Robinson. Providence: American Mathematical Society.

Sigmund, Karl, John Dawson, Kurt Mühlberger, Hans Magnus Enzensberger, and Juliette Kennedy. 2007. Kurt Gödel: Das Album–The Album. The Mathematical Intelligencer 29(3): 73–76.

Smith, Peter. 2013. An Introduction to Gödel's Theorems. Cambridge: Cambridge University Press.

Takeuti, Gaisi, Nicholas Passell, and Mariko Yasugi. 2003. Memoirs of a Proof Theorist: Gödel and Other Logicians. Singapore: World Scientific.

Tamassy, Istvan. 1994. Interview with Róza Péter. Modern Logic 4(3): 277–280.

Tarski, Alfred. 1981. The Collected Works of Alfred Tarski, vol. I–IV. Basel: Birkhäuser.

Wang, Hao. 1990. Reflections on Kurt Gödel. Cambridge: MIT Press.

About the Open Logic Project

The Open Logic Text is an open-source, collaborative textbook of formal meta-logic and formal methods, starting at an intermediate level (i.e., after an introductory formal logic course). Though aimed at a non-mathematical audience (in particular, students of philosophy and computer science), it is rigorous.

Coverage of some topics currently included may not yet be complete, and many sections still require substantial revision. We plan to expand the text to cover more topics in the future.
We also plan to add features to the text, such as a glossary, a list of further reading, historical notes, pictures, better explanations, sections explaining the relevance of results to philosophy, computer science, and mathematics, and more problems and examples. If you find an error, or have a suggestion, please let the project team know.

The project operates in the spirit of open source. Not only is the text freely available, but we also provide the LaTeX source under the Creative Commons Attribution license, which gives anyone the right to download, use, modify, re-arrange, convert, and re-distribute our work, as long as they give appropriate credit. Please see the Open Logic Project website at openlogicproject.org for additional information.