with an Introduction to Formal Languages

JOHN CARROLL, San Diego State University
DARRELL LONG, University of California, Santa Cruz

PRENTICE HALL, Englewood Cliffs, New Jersey 07632

Library of Congress Cataloging-in-Publication Data
Carroll, John. Theory of finite automata. Bibliography: p. Includes index. 1. Sequential machine theory. 2. Formal languages. I. Long, Darrell. II. Title. QA267.5.S4C35 1989 511 88-22416 ISBN 0-13-913708-4

Editorial/production supervision: Kathleen Schiaparelli and Joan McCulley
Manufacturing buyer: Mary Noonan

for Bonnie
for Mary

The author and publisher of this book have used their best efforts in preparing this book. These efforts include the development, research, and testing of the theories and programs to determine their effectiveness. The author and publisher make no warranty of any kind, expressed or implied, with regard to these programs or the documentation contained in this book. The author and publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs.

TRADEMARK INFORMATION
UNIX is a registered trademark of AT&T Bell Laboratories.
Turing's World, copyright 1986 by Jon Barwise and John Etchemendy.
Apple Macintosh is a registered trademark of Apple Computer Inc.

© 1989 by Prentice-Hall, Inc., A Division of Simon & Schuster, Englewood Cliffs, New Jersey 07632

All rights reserved. No part of this book may be reproduced, in any form or by any means, without permission in writing from the publisher.

Printed in the United States of America
10 9 8 7 6 5 4 3 2

ISBN 0-13-913708-4

PRENTICE-HALL INTERNATIONAL (UK) LIMITED, London
PRENTICE-HALL OF AUSTRALIA PTY. LIMITED, Sydney
PRENTICE-HALL CANADA INC., Toronto
PRENTICE-HALL HISPANOAMERICANA, S.A., Mexico
PRENTICE-HALL OF INDIA PRIVATE LIMITED, New Delhi
PRENTICE-HALL OF JAPAN, INC., Tokyo
SIMON & SCHUSTER ASIA PTE.
LTD., Singapore
EDITORA PRENTICE-HALL DO BRASIL, LTDA., Rio de Janeiro

CONTENTS

PREFACE vii

0 PRELIMINARIES 1
0.1 Logic and Set Theory 1
0.2 Relations 5
0.3 Functions 8
0.4 Cardinality and Induction 13
0.5 Recursion 17
0.6 Backus-Naur Form 18
Exercises 20

1 INTRODUCTION AND BASIC DEFINITIONS 23
1.1 Alphabets and Words 24
1.2 Definition of a Finite Automaton 28
1.3 Examples of Finite Automata 41
1.4 Circuit Implementation of Finite Automata 46
1.5 Applications of Finite Automata 54
Exercises 58

2 CHARACTERIZATION OF FAD LANGUAGES 65
2.1 Right Congruences 65
2.2 Nerode's Theorem 70
2.3 Pumping Lemmas 75
Exercises 80

3 MINIMIZATION OF FINITE AUTOMATA 86
3.1 Homomorphisms and Isomorphisms 86
3.2 Minimization Algorithms 97
Exercises 110

4 NONDETERMINISTIC FINITE AUTOMATA 116
4.1 Definitions and Basic Theorems 116
4.2 Circuit Implementation of NDFAs 131
4.3 NDFAs with Lambda Transitions 134
Exercises 140

5 CLOSURE PROPERTIES 146
5.1 FAD Languages and Basic Closure Theorems 146
5.2 Further Closure Properties 160
Exercises 170

6 REGULAR EXPRESSIONS 178
6.1 Algebra of Regular Expressions 178
6.2 Regular Sets as FAD Languages 182
6.3 Language Equations 185
6.4 FAD Languages as Regular Sets; Closure Properties 200
Exercises 204

7 FINITE-STATE TRANSDUCERS 210
7.1 Basic Definitions 211
7.2 Minimization of Finite-State Transducers 217
7.3 Moore Sequential Machines 225
7.4 Transducer Applications and Circuit Implementation 237
Exercises 244

8 REGULAR GRAMMARS 253
8.1 Overview of the Grammar Hierarchy 253
8.2 Right-Linear Grammars and Automata 261
8.3 Regular Grammars and Regular Expressions 267
Exercises 278

9 CONTEXT-FREE LANGUAGES 284
9.1 Parse Trees 284
9.2 Ambiguity 290
9.3 Canonical Forms 301
9.4 Pumping Theorem 315
9.5 Closure Properties 319
Exercises 323

10 PUSHDOWN AUTOMATA 327
10.1 Definitions and Examples 327
10.2 Equivalence of PDAs and CFGs 339
10.3 Equivalence of Acceptance by Final State and Empty Stack 349
10.4 Closure Properties and Deterministic Pushdown
Automata 352
Exercises 360

11 TURING MACHINES 364
11.1 Definitions and Examples 364
11.2 Variants of Turing Machines 376
11.3 Turing Machines, LBAs, and Grammars 382
11.4 Closure Properties and the Hierarchy Theorem 399
Exercises 401

12 DECIDABILITY 405
12.1 Decidable Questions about Regular Languages 405
12.2 Other Decidable Questions 414
12.3 An Undecidable Problem 417
12.4 Turing Decidability 422
12.5 Turing-Decidable Languages 424
Exercises 428

REFERENCES 432
INDEX 433

PREFACE

It often seems that mathematicians regularly provide answers well before the rest of the world finds reasons to ask the questions. The operation of the networks of relays used in the first computers is exactly described by Boolean functions. George Boole thereby made his contribution to computer science in the mid-1800s, and Boolean algebra is used today to represent modern TTL (transistor-transistor logic) circuits. In the 1930s, Alan Turing formalized the concept of an algorithm with his presentation of an abstract computing device and characterized the limitations of such machines. In the 1950s, the abstraction of the concepts behind natural language grammars provided the theoretical basis for computer languages that today guides the design of compilers. These three major foundations of computer science (the mathematical description of computational networks, the limitations of mechanical computation, and the formal specification of languages) are highly interrelated disciplines, and all require a great deal of mathematical maturity to appreciate. A computer science undergraduate is often expected to deal with all these concepts, typically armed only with a course in discrete mathematics. This presentation attempts to make it possible for the average student to acquire more than just the facts about the subject.
It is aimed at providing a reasonable level of understanding about the methods of proof and the attendant thought processes, without burdening the instructor with the formidable task of simplifying the material. The majority of the proofs are written with a level of detail that should leave no doubt about how to proceed from one step to the next. These same proofs thereby provide a template for the exercises and serve as examples of how to produce formal proofs in the mathematical areas of computer science. It is not unreasonable to expect to read and understand the material presented here in a nonclassroom setting. The text is therefore a useful supplement for those approaching a course in computation or formal languages with some trepidation.

This text develops the standard mathematical models of computational devices and investigates the cognitive and generative capabilities of such machines. The engineering viewpoint is addressed, both in relation to the construction of such devices and in the applications of the theory to real-world machines such as traffic controllers and vending machines. The software viewpoint is also considered, providing insight into the underpinnings of computer languages. Examples and applications relating to compiler construction abound.

This material can be tailored to several types of courses. A course in formal languages that stressed the development of mathematical skills could easily span two semesters. At the other extreme, a course designed as a prerequisite for a formal languages sequence might cover Chapters 1 through 7 and parts of Chapters 8 and 12. In particular, Chapter 8 is written so that the discussion of the more robust grammars (Section 8.1) can be entirely omitted. Section 12.1 is exclusively devoted to results pertaining to the constructs described in the earlier chapters, and Section 12.3 provides a natural introduction to the theory of computability by developing the halting problem without
relying on Turing machine concepts.

Several people played significant roles in shaping this text. The book grew out of a set of lecture notes taken by Jack Porter, a student in a one-semester course on finite automata taught by Sara Baase at San Diego State in the 1970s. Baase's course was based on five weeks of lectures by Richard M. Karp at the University of California, Berkeley. The lecture notes were revised by William Root during the semesters he taught the course at San Diego State. The authors are also indebted to the many students who helped refine the presentation by suggesting clarifications and identifying typos, inaccuracies, and sundry other sins. Special thanks to Jon Barwise and John Etchemendy at Stanford University for their permission to incorporate examples from their Turing's World Macintosh software package, available from Kinko's Academic Courseware Exchange, 255 West Stanley Ave., Ventura, CA 93001. Robin Fishbaugh was instrumental in shepherding the class notes through their various electronic forms; her numerous contributions are gratefully acknowledged.

[Cartoon courtesy of Alexis A. Gilliland]

CHAPTER 0

PRELIMINARIES

This chapter reviews some of the basic concepts used in this text. Many can be found in standard texts on discrete mathematics. Much of the notation employed in later chapters is also presented here.

0.1 LOGIC AND SET THEORY

A basic familiarity with the nature of formal proofs is assumed; most proofs given in this text are complete and rigorous, and the reader is encouraged to work the exercises in similar detail. A knowledge of logic circuits would be necessary to construct the machines discussed in this text. Important terminology and techniques are reviewed here.
Unambiguous statements that can take on the values True or False (denoted by 1 and 0, respectively) can be combined with connectives such as and (∧), or (∨), and not (¬) to form more complex statements. The truth tables for several useful connectives are given in Figure 0.1, along with the symbols representing the physical devices that implement these connectives. As an example of a complex statement, consider the assertion that two statements p and q take on the same value. This can be rephrased as: Either (p is true and q is true) or (p is false and q is false). As the truth table for not shows, a statement r is false exactly when ¬r is true; the above assertion could be further refined to: Either (p is true and q is true) or (¬p is true and ¬q is true).

[Gate symbols for the NOT, AND, OR, NAND, and NOR gates appear here.]

p q | p∧q    p q | p∨q    p q | p↑q    p q | p↓q
1 1 |  1     1 1 |  1     1 1 |  0     1 1 |  0
1 0 |  0     1 0 |  1     1 0 |  1     1 0 |  0
0 1 |  0     0 1 |  1     0 1 |  1     0 1 |  0
0 0 |  0     0 0 |  0     0 0 |  1     0 0 |  1

Figure 0.1 Common logic gates and their truth tables

In symbols, this can be abbreviated as:

(p∧q) ∨ (¬p∧¬q)

The truth table covering the four combinations of truth values of p and q can be built from the truth tables defining ∧, ∨, and ¬, as shown in Figure 0.2. The truth table shows that the assertion is indeed true in the two cases where p and q reflect the same values, and false in the two cases where the values assigned to p and q differ. When the statement that r and s always take on the same value is indeed true, we often write r iff s (r if and only if s). It can also be denoted by r ⇔ s (r is equivalent to s).

p q | ¬p ¬q | ¬p∧¬q | p∧q | (p∧q)∨(¬p∧¬q)
1 1 |  0  0 |   0   |  1  |      1
1 0 |  0  1 |   0   |  0  |      0
0 1 |  1  0 |   0   |  0  |      0
0 0 |  1  1 |   1   |  0  |      1

Figure 0.2 Truth tables for various compound expressions

Consider the statement (p∧q) ∨ (p↓q). Truth tables can be constructed to verify that (p∧q) ∨ (¬p∧¬q) and (p∧q) ∨ (p↓q) have identical truth tables, and thus (p∧q) ∨ (¬p∧¬q) ⇔ (p∧q) ∨ (p↓q).
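The equivalence claimed above can be checked mechanically. The following Python sketch (an illustration, not part of the text) enumerates all four rows of the truth table for both formulas:

```python
from itertools import product

def equiv_classic(p, q):
    # (p AND q) OR (NOT p AND NOT q)
    return (p and q) or ((not p) and (not q))

def equiv_nor(p, q):
    # (p AND q) OR (p NOR q)
    return (p and q) or (not (p or q))

# Enumerate all four combinations of truth values.
for p, q in product([True, False], repeat=2):
    assert equiv_classic(p, q) == equiv_nor(p, q)
```

Both formulas are true exactly when p and q agree, confirming the equivalence by exhaustive checking.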
EXAMPLE 0.1

Circuitry for realizing each of the above statements is displayed in Figure 0.3. Since the two statements are equivalent, the circuits will exhibit the same behavior for all combinations of input signals p and q. The second circuit would be less costly to build since it contains fewer components, and tangible benefits therefore arise when equivalent but less cumbersome statements can be derived. Techniques for minimizing such circuitry are presented in most discrete mathematics texts.

[Figure 0.3 Functionally equivalent circuits, realizing (p∧q)∨(¬p∧¬q) and (p∧q)∨(p↓q)]

Example 0.1 shows that it is straightforward to implement statement formulas by circuitry. Recall that the location of the 1 values in the truth table can be used to find the corresponding principal disjunctive normal form (PDNF) for the expression represented by the truth table. For example, the truth table corresponding to NAND has 3 rows with 1 values (p = 1, q = 0; p = 0, q = 1; p = 0, q = 0), leading to three terms in the PDNF expression: (p∧¬q) ∨ (¬p∧q) ∨ (¬p∧¬q). This formula can be implemented as the circuit illustrated in Figure 0.4, and thus a NAND gate can be replaced by this combination of three ANDs and one OR gate. This circuit relies on the assurance that a quantity of interest (such as p) will generally be available in both its negated and unnegated forms. Hence we can count on access to an input line representing ¬p (rather than feeding the input for p into a NOT gate).

[Figure 0.4 A circuit equivalent to a single NAND gate, with input lines p, ¬p, q, ¬q]

In a similar fashion, any statement formula can be represented as a group of AND gates feeding a single OR gate. In larger truth tables, there may be many more 1 values, and hence more complex statements may need many AND gates.
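The PDNF construction just described can be sketched in Python (our illustration, not the book's notation): collect the rows of the truth table where the target connective is 1, and OR together one AND-term per such row.

```python
from itertools import product

def nand(p, q):
    return not (p and q)

# The rows where NAND is 1; each becomes one AND-term (minterm) of the PDNF.
minterms = [(p, q) for p, q in product([True, False], repeat=2) if nand(p, q)]

def pdnf(p, q):
    # OR together one AND-term per true row; a term matches exactly that row.
    return any((p == vp) and (q == vq) for vp, vq in minterms)

# The PDNF agrees with NAND on every input, as the text asserts.
for p, q in product([True, False], repeat=2):
    assert pdnf(p, q) == nand(p, q)

assert len(minterms) == 3   # three 1-rows, hence three AND-terms
```

The same recipe works for any connective: a larger truth table simply yields more minterms, matching the remark that more complex statements may need many AND gates.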
Regardless of the statement complexity, however, circuits based on the PDNF of an expression will allow for a fast response to changing input signals, since no signal must propagate through more than two gates.

Other useful equivalences are given in Figure 0.5. Each rule has a dual, written on the same line.

(p∨q)∧r ⇔ (p∧r)∨(q∧r)    (p∧q)∨r ⇔ (p∨r)∧(q∨r)    (distributive laws)
(p∨q)∨r ⇔ p∨(q∨r)        (p∧q)∧r ⇔ p∧(q∧r)        (associative laws)
p∨q ⇔ q∨p                p∧q ⇔ q∧p                (commutative laws)
¬(p∨q) ⇔ ¬p∧¬q           ¬(p∧q) ⇔ ¬p∨¬q           (De Morgan's laws)
(p∨q)∧p ⇔ p              (p∧q)∨p ⇔ p              (absorption laws)
p∨¬p ⇔ True              p∧¬p ⇔ False             (mutual exclusion)

Figure 0.5 Some useful equivalences and their duals

Predicates are often used to make statements about certain objects, such as the numbers in the set ℤ of integers. For example, Q might represent the property of being less than 5, in which case Q(x) will represent the statement "x is less than 5." Thus, Q(3) is true, while Q(7) is false. It is often necessary to make global statements such as: All integers have the property P, which can be denoted by (∀x ∈ ℤ)P(x). Note that the dummy variable x was used to state the concept in a convenient form; x is not meant to represent a particular object, and the statement could be equivalently phrased as (∀i ∈ ℤ)P(i). For the predicate Q defined above, the statement (∀x ∈ ℤ)Q(x) is false, while, when applied to more restricted domains, (∀x ∈ {1, 2, 3})Q(x) is true, since it is in this case equivalent to Q(1)∧Q(2)∧Q(3), or (1 < 5)∧(2 < 5)∧(3 < 5). In a similar fashion, the statement that some integers have the property P will be denoted by (∃i ∈ ℤ)P(i). For the predicate Q defined above, (∃i ∈ {4, 5, 6})Q(i) is true, since it is equivalent to Q(4)∨Q(5)∨Q(6), or (4 < 5)∨(5 < 5)∨(6 < 5). The statement (∃y ∈ {7, 8, 9})Q(y) is false.
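Over finite domains, the quantifiers ∀ and ∃ behave exactly like Python's all and any. A small sketch (using a finite range as a stand-in for ℤ, which is our simplification):

```python
Z = range(-10, 11)           # a small finite stand-in for the integers
def Q(x):                    # the predicate "x is less than 5"
    return x < 5

assert not all(Q(x) for x in Z)         # (forall x in Z)Q(x) is false
assert all(Q(x) for x in {1, 2, 3})     # but true over {1, 2, 3}
assert any(Q(i) for i in {4, 5, 6})     # (exists i)Q(i): 4 < 5 suffices
assert not any(Q(y) for y in {7, 8, 9})

# Quantifier duality: not-forall is the same as exists-not.
assert (not all(Q(x) for x in Z)) == any(not Q(x) for x in Z)
```

The final assertion illustrates the duality between ¬∀ and ∃¬ used in the next paragraph.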
Note that asserting that it is not the case that all objects have the property P is equivalent to saying that there is at least one object that does not have the property P. In symbols, we have

¬(∀x ∈ ℤ)P(x) ⇔ (∃x ∈ ℤ)(¬P(x))

Similarly,

¬(∃x ∈ ℤ)P(x) ⇔ (∀x ∈ ℤ)(¬P(x))

Given two statements A and B, if B is true whenever A is true, we will say that A implies B, and write A ⇒ B. For example, the truth tables show that p∧q ⇒ p∨q, since for the case where p∧q is true (p = 1, q = 1), p∨q is true also. In the cases where p∧q is false, the value of p∨q is immaterial.

A basic knowledge of set theory is assumed. Some standard special symbols will be repeatedly used to designate common sets.

∇ Definition 0.1
The set of natural numbers is given by ℕ = {0, 1, 2, 3, 4, ...}.
The set of integers is given by ℤ = {..., -2, -1, 0, 1, 2, ...}.
The set of rational numbers is given by ℚ = {a/b | a ∈ ℤ, b ∈ ℤ, b ≠ 0}.
The set of real numbers (points on the number line) will be denoted by ℝ. Δ

The following concepts and notation will be used frequently throughout the text.

∇ Definition 0.2. Let A and B be sets. A is a subset of B if every element of A also belongs to B; that is, A ⊆ B iff (∀x)(x ∈ A ⇒ x ∈ B). Δ

∇ Definition 0.3. Two sets A and B are said to be equal if they contain exactly the same elements; that is, A = B iff (∀x)(x ∈ A ⇔ x ∈ B). Δ

Thus, two sets A and B are equal iff A ⊆ B and B ⊆ A. The symbol ⊂ will be used to denote a proper subset: A ⊂ B iff A ⊆ B and A ≠ B.

∇ Definition 0.4. For sets A and B, the cross product of A with B is the set of all ordered pairs from A and B; that is, A × B = {(a, b) | a ∈ A ∧ b ∈ B}. Δ

0.2 RELATIONS

Relations are used to describe relationships between members of sets of objects. Formally, a relation is just a subset of a cross product of two sets.

∇ Definition 0.5. Let X and Y be sets. A relation R from X to Y is simply a subset of X × Y. If (a, b) ∈ R, we write aRb. If (a, b) ∉ R, we write a R̸ b.
If X = Y, we say R is a relation in X. Δ

EXAMPLE 0.2

Let X = {1, 2, 3}. The familiar relation < (less than) would then consist of the following ordered pairs: <: {(1, 2), (1, 3), (2, 3)}, by which we mean to indicate that 1 < 2, 1 < 3, and 2 < 3. (3, 3) ∉ <, since 3 ≮ 3.

Some relations have special properties. For example, the relation "less than" is transitive, by which we mean that for any numbers x, y, and z, if x < y and y < z, then x < z. Definition 0.6 describes an important class of relations that have some familiar properties.

∇ Definition 0.6
A relation is reflexive iff (∀x)(xRx).
A relation is symmetric iff (∀x)(∀y)(xRy ⇒ yRx).
A relation is transitive iff (∀x)(∀y)(∀z)((xRy ∧ yRz) ⇒ xRz).
An equivalence relation is a relation that is reflexive, symmetric, and transitive. Δ

EXAMPLE 0.3

< is not an equivalence relation; while it is transitive, it is not reflexive, since 3 ≮ 3. (It is also not symmetric, since 2 < 3 but 3 ≮ 2.)

EXAMPLE 0.4

Let X = ℕ. The familiar relation = (equality) is an equivalence relation.

= : {(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), ...},

and it is clear that (∀x)(∀y)(x = y ⇒ y = x). The equality relation is therefore symmetric, and it is likewise obvious that = is also reflexive and transitive.

∇ Definition 0.7. Let R be an equivalence relation in X, and let h ∈ X. Then [h]R refers to the equivalence class consisting of all entities that are related to h by the equivalence relation R; that is, [h]R = {y | yRh}. Δ

EXAMPLE 0.5

The equivalence classes for = are singleton sets: [1]= = {1}, [5]= = {5}, and so on.

EXAMPLE 0.6

Let X = ℤ × ℤ, and define the relation R in X by

(u, v)R(w, x) iff ux = vw

If (x, y) is viewed as the fraction x/y, then R is the relation that identifies equivalent fractions: 2/3 R 14/21, since 2·21 = 3·14. In this sense, R can be viewed as the equality operator on the set of rational numbers ℚ.
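When a relation is given as a finite set of ordered pairs, the three defining properties of Definition 0.6 can be tested directly. A hypothetical helper (ours, not the text's):

```python
def is_equivalence(pairs, X):
    R = set(pairs)
    reflexive = all((x, x) in R for x in X)
    symmetric = all((y, x) in R for (x, y) in R)
    transitive = all((x, z) in R
                     for (x, y) in R for (y2, z) in R if y == y2)
    return reflexive and symmetric and transitive

X = {1, 2, 3}
less_than = {(1, 2), (1, 3), (2, 3)}       # Example 0.2
equality = {(x, x) for x in X}             # Example 0.4, restricted to X

assert not is_equivalence(less_than, X)    # fails reflexivity (and symmetry)
assert is_equivalence(equality, X)
```

This mirrors Examples 0.3 and 0.4: < fails to be an equivalence relation, while = satisfies all three properties.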
Note that in this context the equivalence class [2/8]R represents the set of all "names" for the point one-fourth of the way between 0 and 1; that is,

[2/8]R = {..., -3/-12, -2/-8, -1/-4, 1/4, 2/8, 3/12, 4/16, 5/20, ...}

There are therefore many other ways of designating this same set; for example,

[1/4]R = {..., -3/-12, -2/-8, -1/-4, 1/4, 2/8, 3/12, 4/16, 5/20, ...}

EXAMPLE 0.7

Let X = ℕ and choose an n ∈ ℕ. Define Rₙ by

x Rₙ y iff (∃i ∈ ℤ)(x − y = i·n)

That is, two numbers are related if their difference is a multiple of n. Equivalently, x and y must have the same remainder upon dividing each of them by n if we are to have x Rₙ y. Rₙ can be shown to be an equivalence relation for each natural number n. The equivalence classes of R₂, for example, are the two familiar sets, the even numbers and the odd numbers. The equivalence classes for R₃ are

[0]R₃ = {0, 3, 6, 9, 12, 15, ...}
[1]R₃ = {1, 4, 7, 10, 13, ...}
[2]R₃ = {2, 5, 8, 11, 14, ...}

Rₙ is often called congruence modulo n, and x Rₙ y is commonly denoted by x ≡ y (mod n) or x ≡ₙ y.

If R is an equivalence relation in X, then every element of X belongs to exactly one equivalence class of R. X is therefore the union of the equivalence classes of R, and in this sense R partitions the set X into disjoint subsets. Conversely, a partition of X defines an equivalence relation in X; the sets of the partition can be thought of as the equivalence classes of the resulting relation.

∇ Definition 0.8. Given a set X and sets A₁, A₂, ..., Aₙ, P = {A₁, A₂, ..., Aₙ} is a partition of X if the sets in P are all subsets of X, they cover X, and are pairwise disjoint. That is, the following three conditions are satisfied:

(∀i ∈ {1, 2, ..., n})(Aᵢ ⊆ X)
(∀x ∈ X)(∃i ∈ {1, 2, ..., n} ∋ x ∈ Aᵢ)
(∀i, j ∈ {1, 2, ..., n})(i ≠ j ⇒ Aᵢ ∩ Aⱼ = ∅) Δ

∇ Definition 0.9. Given a set X and a partition P = {A₁, A₂, ..., Aₙ} of X, the relation R(P) in X induced by P is given by

(∀x ∈ X)(∀y ∈ X)(x R(P) y ⇔ (∃i ∈ {1, 2, ...
, n} ∋ x ∈ Aᵢ ∧ y ∈ Aᵢ)) Δ

R(P) thus relates elements that belong to the same subset of P.

EXAMPLE 0.8

Let X = {1, 2, 3, 4, 5} and consider the relation Q = R(S) induced by the partition S = {{1, 2}, {3, 5}, {4}}. Since 1 and 2 are in the same set, they should be related by Q, while 1 Q̸ 4 because 1 and 4 belong to different sets of the partition. Q can be described by

Q = {(1, 1), (1, 2), (2, 1), (2, 2), (3, 3), (3, 5), (4, 4), (5, 3), (5, 5)}

It is straightforward to check that Q satisfies the three properties needed to qualify as an equivalence relation, and the equivalence classes of Q are

[1]Q = {1, 2}
[2]Q = {1, 2}
[3]Q = {3, 5}
[4]Q = {4}
[5]Q = {3, 5}

The set of distinct equivalence classes of Q can be used to partition X; note that these three classes comprise S. In a similar manner, the three distinct equivalence classes of R₃ in Example 0.7 form a partition of ℕ. A "finer" partition of X can be obtained by breaking up the equivalence classes of Q into smaller (and hence more numerous) sets. The resulting relation is called a refinement of Q.

∇ Definition 0.10. Given two equivalence relations R and Q in a set X, R is a refinement of Q iff R ⊆ Q; that is, (∀x ∈ X)(∀y ∈ X)((x, y) ∈ R ⇒ (x, y) ∈ Q). Δ

EXAMPLE 0.9

Consider Q = {(1, 1), (1, 2), (2, 1), (2, 2), (3, 3), (3, 5), (4, 4), (5, 3), (5, 5)} and S = {(1, 1), (2, 2), (3, 3), (3, 5), (4, 4), (5, 3), (5, 5)}. S is clearly a subset of Q, and hence S refines Q. Note that the partition induced by S, {{1}, {2}, {3, 5}, {4}}, indeed splits up the partition induced by Q, which was {{1, 2}, {3, 5}, {4}}. While it may at first seem strange, the fact that S contains fewer ordered pairs than Q guarantees that S will yield more equivalence classes than Q.

0.3 FUNCTIONS

A function f is a special type of relation in which each first coordinate is associated with one and only one second coordinate, in which case we can use functional notation f(x) to indicate the unique element f associates with a given first coordinate x.
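The relation induced by a partition, as in Example 0.8 of the previous section, can be generated mechanically. A short Python sketch (the helper name is ours):

```python
def induced_relation(partition):
    # x R(P) y iff x and y lie in the same block of the partition.
    return {(x, y) for block in partition for x in block for y in block}

S = [{1, 2}, {3, 5}, {4}]
Q = induced_relation(S)

assert (1, 2) in Q and (2, 1) in Q     # 1 and 2 share a block
assert (1, 4) not in Q                 # 1 and 4 lie in different blocks
assert len(Q) == 9                     # exactly the nine pairs of Example 0.8
```

The nine generated pairs are precisely those listed in Example 0.8, and refining the partition (splitting a block) removes pairs from the induced relation, matching Example 0.9.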
In the previous section we concentrated on relations in X, that is, subsets of X × X. The set of first coordinates of a function f (the domain X) is often different from the set of possible second coordinates (the codomain Y), and hence f will be a subset of X × Y.

∇ Definition 0.11. A function f: X → Y is a subset of X × Y for which

1. (∀x ∈ X)(∃y ∈ Y ∋ xfy).
2. (∀x ∈ X)((xfy₁ ∧ xfy₂) ⇒ y₁ = y₂). Δ

When a pair of elements are related by a function, we will write f(a) = b instead of afb or (a, b) ∈ f. The criteria for being a function could then be rephrased as (∀x ∈ X)(∃y ∈ Y ∋ f(x) = y), and (∀x₁ ∈ X)(∀x₂ ∈ X)(x₁ = x₂ ⇒ f(x₁) = f(x₂)).

EXAMPLE 0.10

Let n be a positive integer. Define fₙ: ℕ → ℕ by

fₙ(j) = the smallest natural number i for which j ≡ i mod n.

f₃(j), for example, is a function and is represented by the ordered pairs f₃: {(0, 0), (1, 1), (2, 2), (3, 0), (4, 1), ...}. This implies that f₃(0) = 0, f₃(1) = 1, f₃(2) = 2, f₃(3) = 0, and so on. Note that f₃ is a subset of the relation R₃ given in Example 0.7. If R₃ were presented as a function, it would not be well defined; that is, R₃ does not satisfy Definition 0.11. For example, 2 R₃ 5 and 2 R₃ 8, but 5 ≠ 8, and so R₃(2) is not a meaningful expression, since there is no unique object that R₃ associates with 2. In this case, R₃ violated Definition 0.11 by associating more than one object with a given first coordinate; in general, a proposed relation may also fail to be well defined by associating no objects with a potential first coordinate.

EXAMPLE 0.11

Consider the "function" g: ℚ → ℕ defined by g(m/n) = m. This apparently straightforward definition is fundamentally flawed. According to the formula, g(2/8) = 2, g(7/9) = 7, g(5/10) = 5, and so forth. However, 2/8 = 5/20, but g(2/8) = 2 ≠ 5 = g(5/20), and Definition 0.11 is again violated; g(0.25) is not a well-defined quantity, and thus the "function" g is not well defined.
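The contrast between the function f₃ and the relation R₃ can be made concrete; a small sketch (restricting ℕ to a finite range purely for illustration):

```python
def f3(j):
    # f_3(j): the smallest natural number i with j congruent to i (mod 3).
    return j % 3

assert [f3(j) for j in range(5)] == [0, 1, 2, 0, 1]   # matches Example 0.10

# R_3 as a set of ordered pairs over a finite slice of the naturals.
R3 = {(x, y) for x in range(10) for y in range(10) if (x - y) % 3 == 0}

# f_3 is a subset of R_3, as the text observes.
assert all((j, f3(j)) in R3 for j in range(10))

# But R_3 associates many second coordinates with 2, so "R_3(2)" is not single valued.
images_of_2 = {y for (x, y) in R3 if x == 2}
assert images_of_2 == {2, 5, 8}
```

The last assertion exhibits the failure of Definition 0.11: more than one object is associated with the first coordinate 2.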
Had g truly been a function, it would have passed the test: if x = y, then g(x) = g(y). The problem with this seemingly innocent definition is that 0.25 is actually an equivalence class of fractions (recall Example 0.6), and the definition of g was based on just one representative of that class. We observed that two representatives (2/8 and 5/20) of the same class gave conflicting answers (2 and 5) for the value that g associated with their class (0.25). While it is possible to define functions on a set of equivalence classes in a consistent manner, it will always be important to verify that such functions are single valued.

Selection criteria, which determine whether a candidate does or does not belong to a given set, are special types of functions.

∇ Definition 0.12. Given a set A, the characteristic function χA associated with A is defined by

χA(x) = 1 if x ∈ A    and    χA(x) = 0 if x ∉ A Δ

EXAMPLE 0.12

The characteristic function for the set of odd numbers is the function f₂ given in Example 0.10.

To say that a set is well defined essentially means that the characteristic function associated with that set is a well-defined function. A set of equivalence classes can be ill defined if the definition is based on the representatives of those equivalence classes.

EXAMPLE 0.13

Consider the "set" B of fractions that have odd numerators, whose characteristic "function" is defined by:

χB(m/n) = 1 if m is odd    and    χB(m/n) = 0 if m is even

This characteristic function suffers from flaws similar to those found in the function g in Example 0.11. 1/4 = 2/8, and yet χB(1/4) = 1 while χB(2/8) = 0, which implies that the fraction 1/4 belongs to B, while 2/8 is not an element of B. Due to this ambiguous definition of set membership, B is not a well-defined set. B failed to pass the test: if x = y, then (x ∈ B iff y ∈ B).

The definition of a relation requires the specification of the domain, codomain, and the ordered pairs comprising the relation.
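Python's fractions.Fraction stores each rational in lowest terms, which makes it easy to replay Example 0.11 and to see how choosing a canonical representative repairs the definition (the repair is our illustration, not the text's):

```python
from fractions import Fraction

# Two representatives of the same rational number:
a, b = (2, 8), (5, 20)
assert Fraction(*a) == Fraction(*b)            # 2/8 = 5/20 as rationals

# g(m/n) = m is not well defined: equal inputs give different outputs.
def g(m, n):
    return m

assert g(*a) != g(*b)                          # 2 versus 5, as in Example 0.11

# Basing g on the canonical (lowest-terms) representative repairs this:
def g_fixed(m, n):
    return Fraction(m, n).numerator

assert g_fixed(*a) == g_fixed(*b) == 1
```

Passing the test "if x = y, then g(x) = g(y)" is exactly what distinguishes g_fixed from g.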
For relations that are functions, every domain element must occur as a first coordinate. However, the set of elements that occur as second coordinates need not include all the codomain (as was the case in the function f₃ in Example 0.10).

∇ Definition 0.13. The range of a function f: X → Y is given by {y ∈ Y | ∃x ∈ X ∋ f(x) = y}. Δ

Conditions similar to those imposed on the behavior of first coordinates of a function may also be placed on second coordinates, yielding specialized types of functions. Functions for which the range encompasses all the codomain, for example, are called surjective.

∇ Definition 0.14. A function f: X → Y is onto or surjective iff (∀y ∈ Y)(∃x ∈ X ∋ f(x) = y); that is, a set of ordered pairs representing an onto function must have at least one first coordinate associated with any given second coordinate. Δ

EXAMPLE 0.14

The function g: {1, 2, 3} → {a, b} defined by g(1) = a, g(2) = b, and g(3) = a is onto, since both codomain elements are part of the range of g. However, the function h: {1, 2, 3} → {a, b, c} defined by h(1) = a, h(2) = b, and h(3) = a is not onto, since no domain element maps to c. The function f: ℕ → ℕ defined by f(i) = i + 1 (∀i = 0, 1, 2, ...) is not onto, since there is no element x for which f(x) = 0.

∇ Definition 0.15. A function f: X → Y is one to one or injective iff (∀x₁ ∈ X)(∀x₂ ∈ X)(f(x₁) = f(x₂) ⇒ x₁ = x₂); that is, an injective function must not have more than one first coordinate associated with any given second coordinate. Δ

EXAMPLE 0.15

The function f: ℕ → ℕ defined by f(i) = i + 1 (∀i = 0, 1, 2, ...) is clearly injective, since if f(i) = f(j) then i + 1 = j + 1, and so i must equal j. The function g: {1, 2, 3} → {a, b} defined by g(1) = a, g(2) = b, and g(3) = a is not one to one, since g(1) = g(3), but 1 ≠ 3.

∇ Definition 0.16. A function is a bijection iff it is one to one and onto (injective and surjective); that is, it must satisfy

1.
(∀x₁ ∈ X)(∀x₂ ∈ X)(f(x₁) = f(x₂) ⇒ x₁ = x₂).
2. (∀y ∈ Y)(∃x ∈ X ∋ f(x) = y). Δ

A bijective function must therefore have exactly one first coordinate associated with any given second coordinate.

EXAMPLE 0.16

The function f: ℕ → ℕ defined by f(i) = i + 1 (∀i = 0, 1, 2, ...) is injective but not surjective, so it is not a bijection. However, the function b: ℤ → ℤ defined by b(i) = i + 1 (∀i = ..., -2, -1, 0, 1, 2, ...) is a bijection. Note that while the rule for b remains the same as for f, both the domain and range have been expanded, and many more ordered pairs have been added to form b.

It is often appropriate to take the results produced by one function and apply the rule specified by a second function. For example, we may have a list associating students with their height in inches (that is, we have a function relating names with numbers). The conversion rule for changing inches into centimeters is also a function (associating any given number of inches with the corresponding length in centimeters), which can be applied to the heights given in the student list to produce a new list matching student names with their height in centimeters. This new list is referred to as the composition of the original two functions.

∇ Definition 0.17. The composition of two functions f: X → Y and g: Y → Z is given by

g∘f = {(x, z) | ∃y ∈ Y ∋ (x, y) ∈ f and (y, z) ∈ g} Δ

Note that the composition is not defined unless the codomain of the first function matches the domain of the second function. In functional notation, g∘f = {(x, z) | ∃y ∈ Y ∋ f(x) = y and g(y) = z}, and therefore, when g∘f is defined, it can be described by the rule g∘f(x) = g(f(x)).
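Definition 0.17 translates directly into code; a sketch (ours) composing a mod-3 reduction with an increment function, checked over a finite slice of the naturals:

```python
def compose(g, f):
    # g o f: apply f first, then g.
    return lambda x: g(f(x))

def f3(j):
    return j % 3          # the smallest natural i with j congruent to i (mod 3)

def succ(i):
    return i + 1          # f(i) = i + 1

f_after_f3 = compose(succ, f3)    # f o f3
f3_after_f = compose(f3, succ)    # f3 o f

assert [f_after_f3(j) for j in range(6)] == [1, 2, 3, 1, 2, 3]
assert [f3_after_f(j) for j in range(6)] == [1, 2, 0, 1, 2, 0]
assert f_after_f3(2) != f3_after_f(2)     # order of composition matters
```

The two result lists show that f∘f₃ and f₃∘f are different functions, which is the point of the example that follows.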
} and is represented by the rule fo!J(j ) =!J(j ) + 1, which happens to be the smallest positive number that is congruent to j + 1 mod 3. Note that f3 0 f(j ) =f3(j + 1), which happens to be the smallest natural number that is congruent to j + 1mod 3. This represents the different set of ordered pairs {(O, 1), (1, 2), (2, 0), (3, 1), (4, 2), (5, 0), ... }. In most cases, i-s r s-). V Theorem 0.1. Let the functions f: X ---7 Y and g: Y ---7 Z be onto. Then g of is onto. Proof. See the exercises. V Theorem 0.2 Let the functions f: X ---7 Y and g: Y ---7 Z be one to one. Then g of is one to one. Proof. See the exercises. V Definition 0.18. The converse of a relation R, written -R, is defined by -R = {(y,x)\(x,y)E R} The converse of a function f is likewise -f={(Y,x)l(x,y) Ef} If -f happens to be a function, it is called the inverse of I and is denoted by t '. !:i When the inverse exists, itis~ppropriate to usefunctional ~otation for I-I also, añ we therefore have, for any elements a and b;rJ(b) =a iffI(a) = b. Note that If I: X ---7 Y then j"' I: Y ---7 X. EXAMPLE 0.18 Consider the ordered pairs for the relation <: {(1,2), (l, 3), (2, 3)}. The converse is then -<: {(2,1), (3,1), (3,2)}. Thus, the converse of "less than" is the relation "greater than. " The function b: 0---7 ~ defined by b(i) = i + 1 (Vi = , -2, -1,0, 1,2, ) has the inverse b -I: 6---76 defined by b -'(i) = i 1 (Vi = , -2, -1,0,1,2, ). The inverse of the function that increments integers by 1 is the function that decrements integers by the same amount. Sec. 0.4 Cardinality and Induction 13 The function f: 0-') 0defined by f(i ) = i2 ('Vi = ... , 2, -1,0, 1,2, ... ) has a converse that is not a function over the given domain and codomain; the inverse notation is inappropriate, sincer 1(3) is not defined, nor isr 1( -4), Not surprisingly, if the converse of f is to be a function, the codomain of f (which will be the new domain of f-l) must satisfy conditions similar to those imposed on the domain off. 
In particular:

∇ Theorem 0.3. Let f: X → Y be a function. The converse of f is a function iff f is a bijection.

Proof. See the exercises. Δ

If f is a bijection, f⁻¹ must exist and will also be a bijection. In fact, the compositions f⁻¹∘f and f∘f⁻¹ are the identity functions on the domain and codomain, respectively (see the exercises).

0.4 CARDINALITY AND INDUCTION

The size of various sets will frequently be of interest in the topics covered in this text, and it will occasionally be necessary to consider the set of all subsets of a given set.

∇ Definition 0.19. Given a set A, the power set of A, denoted by p(A) or 2^A, is

p(A) = {X | X ⊆ A}  Δ

EXAMPLE 0.19

p({a, b, c}) = {∅, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}} and p({ }) = {∅}. Note that {∅} ≠ ∅.

∇ Definition 0.20. Two sets X and Y are equipotent if there exists a bijection f: X → Y, and we will write ||X|| = ||Y||. ||X|| denotes the cardinality of X, that is, the number of elements in X. Δ

That is, sets with the same cardinality or "size" are equipotent. The equipotent relation is reflexive, symmetric, and transitive and is therefore an equivalence relation.

EXAMPLE 0.20

The function g: {a, b, c} → {x, y, z} defined by g(a) = z, g(b) = y, and g(c) = x is a bijection, and thus ||{a, b, c}|| = ||{x, y, z}||. The equivalence class consisting of all sets that are equipotent to {a, b, c} is generally associated with the cardinal number 3. Thus, ||{a, b, c}|| = 3; ||{ }|| = 0. {a, b, c} is not equipotent to { }, and hence 3 ≠ 0.

The subset relation allows the sizes of sets to be ordered:

||A|| ≤ ||B|| iff (∃C)(C ⊆ B ∧ ||A|| = ||C||)

We will write ||A|| < ||B|| iff (||A|| ≤ ||B|| and ||A|| ≠ ||B||). The observations about {a, b, c} and { } imply that 0 < 3. For ℕ = {0, 1, 2, 3, 4, 5, 6, ...} and 𝔼 = {0, 2, 4, 6, ...}, the function f: ℕ → 𝔼, defined by f(x) = 2x, is a bijection. The set of natural numbers ℕ is countably infinite, and its size is often denoted by ℵ₀ = ||ℕ||.
The doubling function f shows that ||ℕ|| = ||𝔼||. Similarly, it can be shown that ℤ and ℕ × ℕ are also countably infinite (see the exercises). A set that is equipotent to one of its proper subsets is called an infinite set. Since ||ℕ|| = ||𝔼|| and yet 𝔼 ⊂ ℕ, we know that ℕ must be infinite. No such correspondence between {a, b, c} and any of its proper subsets is possible, so {a, b, c} is a finite set. 3 is therefore a finite cardinal number, while ℵ₀ represents an infinite cardinal number.

Theorem 0.4 compares the size of a set A with the number of subsets of A and shows that ||A|| < ||p(A)||. For the sets in Example 0.19, we see that 3 < 8 and 0 < 1, which is not unexpected. It is perhaps surprising to find that the theorem also applies to infinite sets; for example, ||ℕ|| < ||p(ℕ)||. This means that there are cardinal numbers larger than ℵ₀; there are infinite sets that are not countably infinite. Indeed, the next theorem implies that there is an unending progression of infinite cardinal numbers.

∇ Theorem 0.4. Let A be any set. Then ||A|| < ||p(A)||.

Proof. There is a bijection between A and the set of all singleton subsets of A, as shown by the function s: A → {{x} | x ∈ A} defined by s(z) = {z} for each z ∈ A. Since {{x} | x ∈ A} ⊆ p(A), we have ||A|| ≤ ||p(A)||. It remains to show that ||A|| ≠ ||p(A)||. By definition of cardinality, we must show that there cannot exist a bijection between A and p(A). The following proof by contradiction will show this. Assume f: A → p(A) is a function; we will demonstrate that there must exist a set in p(A) that is not in the range of f, and hence f cannot be onto. Consider an element z of A and the set f(z) to which it maps. f(z) is a subset of A, and hence z may or may not belong to f(z). Define B to be the set {y ∈ A | y ∉ f(y)}. B is then the set of all elements of A that do not appear in the set corresponding to their image under f.
It is impossible for B to be in the range of f, for if it were, then there would be an element of A that maps to this subset: assume w ∈ A and f(w) = B. Since w is an element of A, it might belong to B, which is a subset of A. If w ∈ B, then w ∈ f(w), since f(w) = B; but the elements for which y ∈ f(y) were exactly the ones omitted from B, and thus we would have w ∉ B, which is a contradiction. Our speculation that w might belong to B is therefore incorrect. The only other option is that w does not belong to B. But if w ∉ B = f(w), then w is one of the elements that are supposed to be in B, and we are again faced with the impossibility that w ∉ B and w ∈ B. In all cases, we reach a contradiction if we assume that there exists an element w for which f(w) = B. Thus, B is a member of the codomain that is not in the range of f, and f is therefore not a bijection. Δ

Sets that are finite or countably infinite are called countable or denumerable because their elements can be arranged one after the other (enumerated).

We will often need to prove that a given statement is true in an infinite variety of cases that can be enumerated by the natural numbers 0, 1, 2, .... The assertion that the sum of the first n positive numbers can be predicted by multiplying n by the number one larger than n and dividing the result by 2 seems to be true for various test values of n:

1 + 2 + 3 = 3(3 + 1)/2
1 + 2 + 3 + 4 + 5 = 5(5 + 1)/2

and so on. We would like to show that the assertion is true for all values of n = 1, 2, 3, ..., but we clearly could never check the arithmetic individually for an infinite number of cases. The assertion, which varies according to the particular number n we choose, can be represented by the statement

P(n): 1 + 2 + 3 + ... + (n - 2) + (n - 1) + n adds up to (n + 1)n/2.

Note that P(n) is not a number; it is the assertion that two numbers are the same and therefore will only take on the values True and False.
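Both halves of Theorem 0.4's argument can be illustrated concretely for finite sets. The sketch below is our own (the function names and the particular mapping f are hypothetical): it checks ||p(A)|| = 2^||A|| for a three-element set and constructs the diagonal set B for one attempted map into the power set.

```python
from itertools import combinations

def power_set(a):
    """All subsets of a, as frozensets (Definition 0.19)."""
    s = list(a)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

a = {"a", "b", "c"}
assert len(power_set(a)) == 2 ** len(a)   # 3 < 8, matching Example 0.19

def diagonal_set(a, f):
    """B = {y ∈ A | y ∉ f(y)}, the set built in the proof of Theorem 0.4."""
    return frozenset(y for y in a if y not in f(y))

# One attempted f: A → p(A) (an arbitrary, hypothetical choice):
f = {"a": frozenset(), "b": frozenset({"a", "b"}), "c": frozenset({"a"})}.get
b = diagonal_set(a, f)
assert all(f(z) != b for z in a)          # B is missed by f, as the proof predicts
```

No matter how the mapping f is chosen, the final assertion holds; that is precisely why no f can be onto.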
We would like to show that P(n) is true for each positive integer n; that is, (∀n)P(n). Notice that if you were to attempt to check whether P(101) was true, your work would be considerably simplified if you already knew how the first 100 numbers added up. If the first 100 summed to 5050, it is clear that 1 + 2 + ... + 99 + 100 + 101 = (1 + 2 + ... + 99 + 100) + 101 = 5050 + 101 = 5151; the hard part of the calculation can be done without doing arithmetic with 101 separate numbers. Checking that (101 + 1)101/2 agrees with 5151 shows that P(101) is indeed true [that is, as long as we are sure that our calculations in verifying P(100) are correct]. Essentially the same technique could have been used to show that P(6) followed from P(5). This trick of using the results of previous cases to help verify further cases is reflected in the principle of mathematical induction.

∇ Theorem 0.5. Let P(n) be a statement for each natural number n ∈ ℕ. From the two hypotheses

i. P(0)
ii. (∀m ∈ ℕ)(P(m) ⇒ P(m + 1))

we can conclude (∀n ∈ ℕ)P(n). Δ

The fundamental soundness of the principle is obvious in light of the following analogy: Assume you can reach the basement of some building (hypothesis i). If you were assured that from any floor m you could reach the next higher floor (hypothesis ii), you would then be assured that you could reach any floor you wished ((∀n ∈ ℕ)P(n)). Similar statements can be made from other starting points; for example, beginning with P(4) and (∀m ≥ 4)(P(m) ⇒ P(m + 1)), we can derive the conclusion (∀n ≥ 4)P(n); had we started on the fourth floor of the building, we could reach any of the higher floors.

EXAMPLE 0.21

Consider the statement discussed above, where P(n) was the assertion that 1 + 2 + 3 + ... + (n - 2) + (n - 1) + n adds up to (n + 1)n/2. We will begin with P(1) (the basis step) and note that 1 = (1 + 1)1/2, so P(1) is indeed true.
For the inductive step, let m be an arbitrary (but fixed) positive integer, and assume P(m) is true; that is, 1 + 2 + 3 + ... + (m - 2) + (m - 1) + m adds up to (m + 1)m/2. We need to show P(m + 1): 1 + 2 + 3 + ... + (m + 1 - 2) + (m + 1 - 1) + (m + 1) adds up to (m + 1 + 1)(m + 1)/2. As in the case of proceeding from 100 to 101, we will use the fact that the first m integers add up correctly (the induction assumption) to see how the first m + 1 integers add up. We have:

1 + 2 + 3 + ... + (m + 1 - 2) + (m + 1 - 1) + (m + 1)
  = (1 + 2 + 3 + ... + (m - 1) + m) + (m + 1)
  = (m + 1)m/2 + (m + 1)
  = (m + 1)m/2 + (m + 1)·2/2
  = ((m + 1)m + (m + 1)·2)/2
  = (m + 1)(m + 2)/2
  = (m + 1 + 1)(m + 1)/2

P(m + 1) is therefore true, and P(m + 1) indeed follows from P(m). Since m was arbitrary, (∀m)(P(m) ⇒ P(m + 1)) and, by induction, (∀n ≥ 1)P(n). The formula is therefore true for every positive integer n. It is interesting to note that, with the usual convention of defining the sum of no integers to be zero, the formula also holds for n = 0, and P(0) could have been used as the basis step to prove (∀n ∈ ℕ)P(n).

EXAMPLE 0.22

Consider the statement: Any statement formula using the n variables p₁, p₂, ..., pₙ has an equivalent expression that contains fewer than n·2ⁿ operators. This can be proved by induction on the statement

P(n): Any statement formula using n or fewer variables has an equivalent expression that contains fewer than n·2ⁿ operators.

Basis step: A statement formula in one variable must be either p, ¬p, T, or F, each of which requires at most one operator, and since 1 < 1·2¹, P(1) is true.

Inductive step: Assume P(m) is true; we need to prove that P(m + 1) is true, which is to say that we need to ensure that the statement holds not just for formulas with m or fewer variables, but also for formulas with m + 1 variables. Thus, choose an expression S containing the variables p₁, p₂, ...
, pₘ, pₘ₊₁. Consider the principal disjunctive normal form (PDNF) of S. This expression is equivalent to S and has terms that can be separated into two categories: (1) those that contain the term pₘ₊₁, and (2) those that contain the term ¬pₘ₊₁. While the PDNF may very well contain more than the desired number of terms, the distributive law can be used to factor pₘ₊₁ out of all the terms in (1), leaving an expression of the form C ∧ pₘ₊₁, where C is a formula containing only the variables p₁, p₂, ..., pₘ. Similarly, ¬pₘ₊₁ can be factored out of all the terms in (2), leaving an expression of the form D ∧ ¬pₘ₊₁, where D is also a formula containing only the variables p₁, p₂, ..., pₘ. S can therefore be written as (C ∧ pₘ₊₁) ∨ (D ∧ ¬pₘ₊₁), which contains the four operators ∧, ∨, ∧, and ¬, plus the operators that comprise the formulas for C and D. However, since both C and D contain only the m variables p₁, p₂, ..., pₘ, the induction assumption ensures that they each have equivalent representations using fewer than m·2ᵐ operators. S can therefore be written in a form containing fewer than 4 + m·2ᵐ + m·2ᵐ operators, which can be shown to be less than (m + 1)·2ᵐ⁺¹ for all positive numbers m. Since S was an arbitrary expression with m + 1 variables, we have shown that any statement formula using exactly m + 1 variables has an equivalent expression that contains fewer than (m + 1)·2ᵐ⁺¹ operators. Since P(m) was assumed true, we likewise know that any statement formula using m or fewer variables also has an equivalent expression with the required number of operators. P(m + 1) is therefore true, and P(m + 1) indeed follows from P(m). Since m was an arbitrary positive integer, (∀m ≥ 1)(P(m) ⇒ P(m + 1)), and by induction (∀n ≥ 1)P(n). The result is therefore true for every positive integer n.
0.5 RECURSION

Since this text will be dealing with devices that repeatedly perform certain operations, it is important to understand the recursive definition of functions and how to effectively investigate the properties of such functions. Recall that the factorial function (f(n) = n!) is defined to be the product of the first n integers. Thus,

f(1) = 1
f(2) = 1·2 = 2
f(3) = 1·2·3 = 6
f(4) = 1·2·3·4 = 24

and so on. Note that the individual definitions get longer as n increases. If we adopt the convention that f(0) = 1, the factorial function can be recursively defined in terms of other values produced by the function.

∇ Definition 0.21. For x ∈ ℕ, define

f(x) = 1,           if x = 0
f(x) = x·f(x - 1),  if x > 0  Δ

This definition implies that f(3) = 3·f(2) = 3·2·f(1) = 3·2·1·f(0) = 3·2·1·1 = 6.

0.6 BACKUS-NAUR FORM

The syntax of programming languages is often illustrated with syntax diagrams or described in Backus-Naur Form (BNF) notation.

EXAMPLE 0.23

The constraints for integer constants, which may begin with a sign and must consist of one or more digits, are succinctly described by the following productions (replacement rules):

<sign> ::= + | -
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<natural> ::= <digit> | <digit><natural>
<integer> ::= <natural> | <sign><natural>

The symbol | represents "or," and the rule <sign> ::= + | - should be interpreted to mean that the token <sign> can be replaced by either the symbol + or the symbol -. A typical integer constant is therefore +12, since it can be derived by applying the above rules in the following fashion:

<integer> ⇒ <sign><natural>
          ⇒ +<natural>
          ⇒ +<digit><natural>
          ⇒ +1<natural>
          ⇒ +1<digit>
          ⇒ +12

Syntax diagrams for each of the four productions are shown in Figure 0.6. These can be combined to form a diagram that does not involve the intermediate tokens <sign>, <digit>, and <natural> (see Figure 0.7).
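The four productions above can be turned directly into a recognizer. The sketch below is our own (the function names are not part of the text); each function mirrors one production:

```python
def is_digit(s):                  # <digit> ::= 0 | 1 | ... | 9
    return len(s) == 1 and s in "0123456789"

def is_natural(s):                # <natural> ::= <digit> | <digit><natural>
    if len(s) == 1:
        return is_digit(s)
    return len(s) > 1 and is_digit(s[0]) and is_natural(s[1:])

def is_integer(s):                # <integer> ::= <natural> | <sign><natural>
    if s[:1] in ("+", "-"):       # <sign> ::= + | -
        return is_natural(s[1:])
    return is_natural(s)

assert is_integer("+12")          # the derivation shown above
assert is_integer("42") and is_integer("-007")
assert not any(map(is_integer, ["", "+", "12.5", "--3"]))
```

The recursive call in `is_natural` corresponds exactly to the production's reuse of <natural> on its own right-hand side, which is how BNF describes strings of arbitrary length with finitely many rules.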
[Figure 0.6 Syntax diagrams for the components of integer constants]

[Figure 0.7 A syntax diagram for integer constants]

EXERCISES

0.1. Construct truth tables for:
(a) ¬r ∨ (¬p → ¬q)
(b) (p ∧ ¬q) ∨ ¬(p ↑ q)

0.2. Draw circuit diagrams for:
(a) ¬(r ∨ (¬p → ¬q)) ↑ (s ∧ p)
(b) (p ∧ ¬q) ∨ ¬(p ↑ q)

0.3. Show that the sets {1, 2} × {a, b} and {a, b} × {1, 2} are not equal.

0.4. Let X = {1, 2, 3, 4}.
(a) Determine the set of ordered pairs comprising the relation <.
(b) Determine the set of ordered pairs comprising the relation =.
(c) Since relations are sets of ordered pairs, it makes sense to union them together. Determine the set = ∪ <.
(d) Determine the set of ordered pairs comprising the relation ≤.

0.5. Let n be a natural number. Show that congruence modulo n, ≡ₙ, is an equivalence relation.

0.6. Let X = ℕ. Determine the equivalence classes for congruence modulo 0.

0.7. Let X = ℕ. Determine the equivalence classes for congruence modulo 1.

0.8. Let X = ℝ. Determine the equivalence classes for congruence modulo 1.

0.9. Let R be an arbitrary equivalence relation in X. Prove that the distinct equivalence classes of R form a partition of X.

0.10. Given a set X and a partition P = {A₁, A₂, ..., Aₙ} of X, prove that X equals the union of the sets in P.

0.11. Given a set X and a partition P = {A₁, A₂, ..., Aₙ} of X, prove that the relation R(P) in X induced by P is an equivalence relation.

0.12. Let X = {1, 2, 3, 4}.
(a) Give an example of a partition P for which R(P) is a function.
(b) Give an example of a partition P for which R(P) is not a function.

0.13. The following "proof" seems to indicate that a relation that is symmetric and transitive must also be reflexive: By symmetry, xRy ⇒ yRx. Thus we have (xRy ∧ yRx). By transitivity, (xRy ∧ yRx) ⇒ xRx. Hence (∀x)(xRx).
Find the flaw in this "proof" and give an example of a relation that is symmetric and transitive but not reflexive.

0.14. Let R be an arbitrary equivalence relation in X. Prove that the equality relation on X refines R.

0.15. Consider the "function" t: ℝ → ℝ defined by pairing x with the real number whose cosine is x.
(a) Show that t is not well defined.
(b) Adjust the domain and range of t to produce a valid function.

0.16. Consider the function s': ℝ → ℝ defined by s'(x) = x². Show that the converse of s' is not a function.

0.17. Let ℙ be the set of nonnegative real numbers, and consider the function s: ℙ → ℙ defined by s(x) = x². Show that s⁻¹ exists.

0.18. Let f: X → Y be an arbitrary function. Prove that the converse of f is a function iff f is a bijection.

0.19. (a) Let -A denote the complement of a set A. Prove that -(-A) = A.
(b) Let R˘ denote the converse of a relation R. Prove that the converse of R˘ is R.

0.20. Let the functions f: X → Y and g: Y → Z be one to one. Prove that g∘f is one to one.

0.21. Let the functions f: X → Y and g: Y → Z be onto. Prove that g∘f is onto.

0.22. Define two functions f and g for which f∘g = g∘f.

0.23. Define, if possible, a bijection between:
(a) ℕ and ℤ
(b) ℕ and ℕ × ℕ
(c) ℕ and ℚ
(d) ℕ and {a, b, c}

0.24. Use induction to prove that the sum of the cubes of the first n positive integers adds up to n²(n + 1)²/4.

0.25. Use induction to prove that the sum of the first n positive integers is less than n² (for n > 1).

0.26. Use induction to prove that, for n > 3, n! > n².

0.27. Use induction to prove that, for n > 3, n! > 2ⁿ.

0.28. Use induction to prove that 1² + 2² + ... + n² = n(n + 1)(2n + 1)/6.

0.29. Prove by induction that X ∩ (X₁ ∪ X₂ ∪ ... ∪ Xₙ) = (X ∩ X₁) ∪ (X ∩ X₂) ∪ ... ∪ (X ∩ Xₙ).

0.30. Let -A denote the complement of the set A. Prove -(X₁ ∪ X₂ ∪ ... ∪ Xₙ) = (-X₁) ∩ (-X₂) ∩ ... ∩ (-Xₙ) by induction.

0.31.
Use induction to prove that there are 2ⁿ subsets of a set of size n; that is, for a finite set A, ||p(A)|| = 2^||A||.

0.32. The principle of mathematical induction is often stated in the following form, which requires (apparently) stronger hypotheses to reach the desired conclusion: Let P(n) be a statement for each natural number n ∈ ℕ. From the two hypotheses

i. P(0)
ii. (∀m ∈ ℕ)(((∀i ≤ m)P(i)) ⇒ P(m + 1))

we can conclude (∀n ∈ ℕ)P(n). Prove that this strong form of induction is equivalent to the statement of induction given in the text. Hint: Consider the restatement of the hypothesis given in Example 0.22.

0.33. Determine what types of strings are defined by the following BNF:

<sign> ::= + | -
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<natural> ::= <digit> | <digit><natural>
<integer> ::= <natural> | <sign><natural>
<real constant> ::= <integer> | <integer>. | <integer>.<natural> | <integer>.<natural>E<integer>

0.34. A set X is cofinite if the complement of X (with respect to some generally understood universal set) is finite. Let the universal set be ℤ. Give an example of
(a) A finite set
(b) A cofinite set
(c) A set that is neither finite nor cofinite

0.35. Consider the equipotent relation, which relates sets to other sets.
(a) Prove that this relation is reflexive.
(b) Prove that this relation is symmetric.
(c) Prove that this relation is transitive.

0.36. Define a function that will show that ||ℕ|| = ||ℕ × ℕ||.

0.37. Show that ℕ is equipotent to ℤ.

0.38. Show that ℕ is equipotent to ℚ.

0.39. Show that p(ℕ) is equipotent to {f: ℕ → {Yes, No} | f is a function}.

0.40. Show that p(ℕ) is equipotent to ℝ.

0.41. Draw a circuit diagram that will implement the function q given by the truth table shown in Figure 0.8.

0.42. (a) Draw a circuit diagram that will implement the function q₁ given by the truth table shown in Figure 0.9.
(b) Draw a circuit diagram that will implement the function q₂ given by the truth table shown in Figure 0.9.
(c) Draw a circuit diagram that will implement the function q₃ given by the truth table shown in Figure 0.9.

[Figure 0.8 The truth table for Exercise 0.41]

[Figure 0.9 The truth table for Exercise 0.42]

CHAPTER 1

INTRODUCTION AND BASIC DEFINITIONS

This chapter introduces the concept of a finite automaton, which is perhaps the simplest form of abstract computing device. Although finite automata theory is concerned with relatively simple machines, it is an important foundation of a large number of concrete and abstract applications. The finite-state control of a finite automaton is also at the heart of more complex computing devices such as finite-state transducers (Chapter 7), pushdown automata (Chapter 10), and Turing machines (Chapter 11). Applications for finite automata can be found in the algorithms used for string matching in text editors and spelling checkers and in the lexical analyzers used by assemblers and compilers. In fact, the best-known string-matching algorithms are based on finite automata. Although finite automata are generally thought of as abstract computing devices, other noncomputer applications are possible. These applications include traffic signals and vending machines, or any device in which there are a finite set of inputs and a finite set of things that must be "remembered" by the device.
Briefly, a deterministic finite automaton, also called a recognizer or acceptor, is a mathematical model of a finite-state computing device that recognizes a set of words over some alphabet; this set of words is called the language accepted by the automaton. For each word over the alphabet of the automaton, there is a unique path through the automaton; if the path ends in what is called a final or accepting state, then the word traversing this path is in the language accepted by the automaton.

Finite automata represent one attempt at employing a finite description to rigorously define a (possibly) infinite set of words (that is, a language). Given such a description, the criterion for membership in the language is straightforward and well defined; there are simple algorithms for ascertaining whether a given word belongs to the set. In this respect, such devices model one of the behaviors we require of a compiler: recognizing syntactically correct programs. Actually, finite automata have inherent limitations that make them unsuitable for modeling the compilers of modern programming languages, but they serve as an instructive first approximation. Compilers must also be capable of producing object code from source code, and a model of a simple translation device is presented in Chapter 7 and enhanced in later chapters.

Logic circuitry can easily be devised to implement these automata in hardware. With appropriate data structures, these devices can likewise be modeled with software. An example is the highly interactive Turing's World, developed at Stanford University by Jon Barwise and John Etchemendy. This Apple Macintosh graphics package and the accompanying tutorial are particularly useful in experimenting with many forms of automata. Both hardware and software approaches will be explored in this chapter. We begin our formal treatment with some fundamental definitions.
1.1 ALPHABETS AND WORDS

The devices we will consider are meant to react to and manipulate symbols. Different applications may employ different character sets, and we will therefore take care to explicitly mention the alphabet under consideration.

∇ Definition 1.1. Σ is an alphabet iff Σ is a finite nonempty set of symbols. Δ

An element of an alphabet is often called a letter, although there is no reason to restrict the symbols in an alphabet to consist solely of single characters. Some familiar examples of alphabets are the 26-letter English alphabet and the ASCII character set, which represents a standard set of computer codes. In this text we will usually make use of shorter, simpler alphabets, like those given in Example 1.1.

EXAMPLE 1.1

i. {0, 1}
ii. {a, b, c}
iii. {(0, 0), (0, 1), (1, 0), (1, 1)}

It is important to emphasize that the elements (letters) of an alphabet are not restricted to single characters. In example (iii) above, the alphabet is composed of the ordered pairs in {0, 1} × {0, 1}. Such an alphabet will be utilized in Chapter 7 when we use sequential machines to construct a simple binary adder.

Based on the definition of an alphabet, we can define composite entities called words or strings, which are finite sequences of symbols from the alphabet.

∇ Definition 1.2. For a given alphabet Σ and a natural number n, a sequence of symbols a₁a₂...aₙ is a word (or string) over the alphabet Σ of length n iff for each i = 1, 2, ..., n, aᵢ ∈ Σ. Δ

As formally specified in Definition 1.5, the order in which the symbols of the word occur will be deemed significant, and therefore a word of length 3 can be identified with an ordered triple belonging to Σ × Σ × Σ. Indeed, one may view the three-letter word bca as a convenient shorthand for the ordered triple (b, c, a). A word over an alphabet is thus an ordered string of symbols, where each symbol in the string is an element of the given alphabet.
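Definition 1.2's identification of words with ordered sequences of symbols is easy to experiment with, since Python strings behave exactly this way. A small sketch of our own (the names are illustrative, not the text's notation):

```python
sigma = {"0", "1"}                       # an alphabet (Definition 1.1)

def is_word(x, alphabet):
    """x is a word over Σ iff each of its symbols is an element of Σ."""
    return all(a in alphabet for a in x)

assert is_word("0110", sigma)            # a word of length 4 over {0, 1}
assert not is_word("0120", sigma)        # 2 is not a letter of this alphabet
assert tuple("bca") == ("b", "c", "a")   # the word bca as the triple (b, c, a)
```

The last line mirrors the remark above: a word of length 3 over Σ is just a convenient shorthand for an element of Σ × Σ × Σ.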
An obvious example of words is what you are reading right now: words (or strings) over the standard English alphabet. In some contexts, these strings of symbols are occasionally called sentences.

EXAMPLE 1.2

Let Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}; some examples of words over this alphabet are

i. 42
ii. 242342

Even though only three different members of Σ occur in the second example, the length of 242342 is 6, as each symbol is counted each time it occurs. To easily and succinctly express these concepts, the absolute value notation will be employed to denote the length of a string. Thus, |42| = 2, |242342| = 6, and |a₁a₂a₃a₄| = 4.

∇ Definition 1.3. For a given alphabet Σ and a word x = a₁a₂...aₙ over Σ, |x| denotes the length of x. That is, |a₁a₂...aₙ| = n. Δ

It is possible to join together two strings to form a composite word; this process is called concatenation. The concatenation of two strings of symbols produces one longer string of symbols, which is made up of the characters in the first string, followed immediately by the symbols of the second string.

∇ Definition 1.4. Given an alphabet Σ, let x = a₁...aₙ and y = b₁...bₘ be strings where each aᵢ ∈ Σ and each bⱼ ∈ Σ. The concatenation of the strings x and y, denoted by x·y, is the juxtaposition of x and y; that is, x·y = a₁...aₙb₁...bₘ. Δ

Note in Definition 1.4 that |x·y| = n + m = |x| + |y|. Some examples of string concatenation are

i. aaa·bbb = aaabbb
ii. home·run = homerun
iii. a²·b³ = aabbb

Example (iii) illustrates a shorthand notation for denoting strings. Placing a superscript after a symbol means that this entity is a string made by concatenating it to itself the specified number of times. In a similar fashion, (ac)³ is meant to express acacac. Note that an equal sign was used in the above examples. Formally, two strings are equal if they have the same number of symbols and these symbols match, character for character.

∇ Definition 1.5.
Given an alphabet Σ, let x = a₁...aₙ and y = b₁...bₘ be strings over Σ. x and y are equal iff n = m and for each i = 1, 2, ..., n, aᵢ = bᵢ. Δ

The operation of concatenation has certain algebraic properties: it is associative, and it is not commutative. That is,

i. (∀x ∈ Σ*)(∀y ∈ Σ*)(∀z ∈ Σ*) x·(y·z) = (x·y)·z.
ii. For most strings x and y, x·y ≠ y·x.

When the operation of concatenation is clear from the context, we will adopt the convention of omitting the symbol for the operator (as is done in arithmetic with the multiplication operator). Thus xyz refers to x·y·z. In fact, in Chapter 6 it will be seen that the operation of concatenation has many algebraic properties that are similar to those of arithmetic multiplication.

It is often necessary to count the number of occurrences of a given symbol within a word. The notation described in the next definition will be an especially useful shorthand in many contexts.

∇ Definition 1.6. Given an alphabet Σ and some b ∈ Σ, the length of a word w with respect to b, denoted |w|_b, is the number of occurrences of the letter b within that word. Δ

EXAMPLE 1.3

i. |abb|_b = 2
ii. |abb|_c = 0
iii. |0000000011188818888881|₁ = 5

∇ Definition 1.7. Given an alphabet Σ, the empty word, denoted by λ, is defined to be the (unique) word consisting of zero letters. Δ

The empty word is often denoted by ε in other formal language texts. The empty string serves as the identity element for concatenation. That is, for all strings x,

x·λ = λ·x = x

Even though the empty word is represented by a single character, λ is a string but is not a member of any alphabet: λ ∉ Σ.

A particular string x can be divided into substrings in several ways. If we choose to break x up into three substrings u, v, and w, there are many ways to accomplish this. For example, if x = abccdbc, it could be written as ab·ccd·bc; that is, x = uvw, where u = ab, v = ccd, and w = bc.
This x could also be written as abc·λ·cdbc, where u = abc, v = λ, and w = cdbc. In this second case, |x| = 7 = 3 + 0 + 4 = |u| + |v| + |w|.

A fundamental structure in formal languages involves sets of words. A simple example of such a set is Σᵏ, the collection of all words of exactly length k (for some k ∈ ℕ) that can be constructed from the letters of Σ.

∇ Definition 1.8. Given an alphabet Σ and a nonnegative integer k ∈ ℕ, we define

Σᵏ = {x | x is a word over Σ and |x| = k}  Δ

EXAMPLE 1.4

If Σ = {0, 1}, then

Σ⁰ = {λ}
Σ¹ = {0, 1}
Σ² = {00, 01, 10, 11}
Σ³ = {000, 001, 010, 011, 100, 101, 110, 111}

λ is the only element of Σ⁰, the set of all words containing zero letters from Σ. There is no difficulty in letting λ be an element (and the only element) of Σ⁰, since each Σᵏ is not necessarily an alphabet, but is instead a set of words; λ, according to the definition, is indeed a word consisting of zero letters.

∇ Definition 1.9. Given an alphabet Σ, define

Σ* = ∪(k≥0) Σᵏ = Σ⁰ ∪ Σ¹ ∪ Σ² ∪ Σ³ ∪ ...

and

Σ⁺ = ∪(k≥1) Σᵏ = Σ¹ ∪ Σ² ∪ Σ³ ∪ ...  Δ

Σ* is the set of all words that may be constructed from the letters of an alphabet Σ. Σ⁺ is the set of all nonempty words that may be constructed from Σ.

Σ*, like the set of natural numbers, is an infinite set. Although Σ* is infinite, each word in Σ* is of finite length. This property follows from the definition of Σ* and a property of natural numbers: any k ∈ ℕ must by definition be a finite number. Σ* is defined to be the union of all Σᵏ, k ∈ ℕ. Since each such k is a finite number and every word in Σᵏ is of length k, every word in Σᵏ must be of finite length. Furthermore, since Σ* is the union of all such Σᵏ, every word in Σ* must also be of finite length. While Σ* can contain arbitrarily long words, each of these words must be finite, just as every number in ℕ is finite. Since Σ* is the union of all Σᵏ for k ∈ ℕ, Σ* must also contain Σ⁰.
In other words, besides containing all words that can be constructed from one or more letters of Σ, Σ* also contains the empty word λ. While λ ∉ Σ, λ ∈ Σ*. λ represents a string and not a symbol, and thus the empty string cannot be in the alphabet Σ. However, λ is included in Σ*, since Σ* is not just an alphabet, but a collection of words over the alphabet Σ. Note, however, that Σ⁺ is Σ* − {λ}; Σ⁺ specifically excludes λ.

1.2 DEFINITION OF A FINITE AUTOMATON

We now have the building blocks necessary to define deterministic finite automata. A deterministic finite automaton is a mathematical model of a machine that accepts a particular set of words over some alphabet Σ. A useful visualization of this concept might be referred to as the black box model. This conceptualization is built around a black box that houses the finite-state control. This control reacts to the information provided by the read head, which extracts data from the input tape. The control also governs the operation of the output indicator, often depicted as an acceptance light, as shown in Figure 1.1. There is no limit to the number of symbols that can be on the tape (although each individual word must be of finite length). As the input tape is read by the machine, state transitions, which alter the current state of the automaton, take place within the black box. Depending on the word contained on the input tape, the light bulb either lights or remains dark when the end of the input string is reached, indicating acceptance or rejection of the word, respectively. We assume that the input head can sense when it has passed the last symbol on the tape.

[Figure 1.1 A model of a finite-state acceptor]

In some sense, a personal computer fits the finite-state control model; it reacts
However, the number of possible bit patterns that even a small computer can assume is so astronomically large that it is totally impractical to model a computer in this fashion. Finite-state machines can be profitably used to describe portions of a computer (such as parts of the arithmetic/logic unit, as discussed in Chapter 7, Example 7.15) and other devices that assume a reasonable number of states. Although finite automata are usually thought of as processing strings of letters over some alphabet, the input can conceptually be elements from any finite set. A useful example is the "brain" of a vending machine, which, say, dispenses 30¢ candy bars.

EXAMPLE 1.5

The input to the vending machine is the set of coins {nickel, dime, quarter}, represented by n, d, and q in Figure 1.2. The machine may only "remember" a finite number of things; in this case, it will keep track of the amount of money that has been dropped into the machine. Thus, the machine may be in the "state" of remembering that no money has yet been deposited (denoted in this example by <0¢>), or that a single nickel has been inserted (the state labeled <5¢>), or that either a dime or two nickels have been deposited (<10¢>), and so on. Note that from state <0¢> there is an arrow labeled by the dime token d pointing to the state <10¢>, indicating that, at a time when the machine "believes" that no money has been deposited, the insertion of a dime causes the machine to transfer to the state that remembers that ten cents has been deposited. From the <0¢> state, the arrows in the diagram show that if two nickels (n) are input the machine moves through the <5¢> state and likewise ends in the state labeled <10¢>. The vending machine thus counts the amount of change dropped into the machine (up to 50¢). The machine begins in the state labeled <0¢> and follows the arrows to higher-numbered states as coins are inserted.
For example, depositing a nickel, a dime, and then a quarter would move the machine to the states <5¢>, <15¢>, and then <40¢>. The states labeled 30¢ and above are doubly encircled to indicate that enough money has been deposited; if 30¢ or more has been deposited, then the machine "accepts," indicating that a candy bar may be selected.

Figure 1.2 An implementation of a vending machine

Finite automata are appropriate whenever there are a finite number of inputs and only a finite number of situations must be distinguished by the machine. Other applications include traffic signals and elevators (as discussed in Chapter 7). We now present a formal mathematical definition of a finite-state machine.

Definition 1.10. A deterministic finite automaton or deterministic finite acceptor (DFA) is a quintuple <Σ, S, s0, δ, F>, where

i. Σ is the input alphabet (a finite nonempty set of symbols).
ii. S is a finite nonempty set of states.
iii. s0 is the start (or initial) state, an element of S.
iv. δ is the state transition function; δ: S × Σ → S.
v. F is the set of final (or accepting) states, a (possibly empty) subset of S.

The input alphabet, Σ, for any deterministic finite automaton A, is the set of symbols that can appear on the input tape. Each successive symbol in a word will cause a transition from the present state to another state in the machine. As specified by the δ function, there is exactly one such state transition for each combination of a symbol a ∈ Σ and a state s ∈ S. This is the origin of the word "deterministic" in the phrase "deterministic finite automaton." The various states represent the memory of the machine. Since the number of states in the machine is finite, the number of distinguishable situations that can be remembered by the machine is also finite.
This limitation of the device's ability to store its past history is the origin of the word "finite" in the phrase "deterministic finite automaton." At any given time during processing, if the previous history of the machine is considered to be the reactions of the DFA to the letters that have already been read, then the current state represents all that is known about the history of the machine.

The start state of the machine is the state in which the machine always begins processing a string. From this state, successive input symbols from Σ are used by the δ function to arrive at successive states in the machine. Processing stops when the string of symbols is exhausted. The state in which the machine is left can either be a final state, in which case the word is accepted, or it can be any one of the other states of S, in which case the word is rejected.

To produce a formal description of the concepts defined above, it is necessary to enumerate each part of the quintuple that comprises the DFA. Σ, S, s0, and F are easily enumerated, but the function δ can often be tedious to describe. One device used to display the mapping δ is the state transition diagram. Besides graphically displaying the transitions of the δ function, the state transition diagram for a deterministic finite automaton also illustrates the other four parts of the quintuple. A finite automaton state transition diagram is a directed graph. The states of the machine represent the vertices of the graph, while the mapping of the δ function describes the edges. Final states are denoted by a doubly encircled state, and the start state is identified by a straight incoming arrow. Each domain element of the transition function corresponds to an edge in the directed graph. We formally define a finite automaton state transition diagram for <Σ, S, s0, δ, F> as a directed graph G = (V, E), as follows:

i. V = S,
ii. E = {(s, t, a) | s, t ∈ S, a ∈ Σ ∧ δ(s, a) = t},

where V is the set of vertices of the graph, and E is the set of edges connecting these vertices. Each element of E is an ordered triple, (s, t, a), such that s is the origin vertex, t is the terminus, and a is the letter from Σ labeling the edge. Thus, for any vertex there is exactly one edge leaving that vertex for each element of Σ.

EXAMPLE 1.6

In the DFA shown in Figure 1.3, the set of edges E of the graph G is given by E = {(s0, s1, a), (s0, s2, b), (s1, s1, a), (s1, s2, b), (s2, s1, a), (s2, s0, b)}. The figure also shows that s0 is the designated start state and that s1 is the only final state.

Figure 1.3 The DFA described in Example 1.6

The state transition function for a finite automaton is often represented in the form of a state transition table. A state transition table is a matrix with the rows of the matrix labeled and indexed by the states of the machine, and the columns of the matrix labeled and indexed by the elements of the input alphabet; the entries in the table are the states to which the DFA will move. Formally, let T be a state transition table for some deterministic finite automaton A = <Σ, S, s0, δ, F>, and let s ∈ S and a ∈ Σ. Then the value of each matrix entry is given by the equation

    (∀s ∈ S)(∀a ∈ Σ) T[s, a] = δ(s, a)

For the automaton in Example 1.6, the state transition table is

    δ    a    b
    s0   s1   s2
    s1   s1   s2
    s2   s1   s0

This table represents the following transitions:

    δ(s0, a) = s1    δ(s0, b) = s2
    δ(s1, a) = s1    δ(s1, b) = s2
    δ(s2, a) = s1    δ(s2, b) = s0

State transition tables are the most common method of representing the basic structure of an automaton within a computer. When represented as an array in the memory of the computer, access is very fast and the structure lends itself easily to manipulation by the computer. Techniques such as depth-first search are easily and efficiently implemented when the state transition diagram is represented as a table.
Figure 1.4 illustrates an implementation of the δ function via transition tables in Pascal.

    type
      Sigma = 'a' .. 'c';
      State = (s0, s1, s2);
    var
      TransitionTable : array [State, Sigma] of State;

    function Delta(S : State; A : Sigma) : State;
    begin
      Delta := TransitionTable[S, A]
    end; { Delta }

Figure 1.4 A Pascal implementation of a state transition function

With δ, we can describe the state in which we will find ourselves after processing a single letter. We also want to be able to describe the state at which we will arrive after processing an entire string. We will extend the δ function to cover entire strings rather than just single letters; δ̄(s, x) will be the state we wind up at when starting at s and processing, in order, all the letters of the string x. While this is a relatively easy concept to (vaguely) state in English, it is somewhat awkward to formally define. To facilitate formal proofs concerning DFAs, we use the following recursive definition.

Definition 1.11. Given a DFA A = <Σ, S, s0, δ, F>, the extended state transition function for A, denoted δ̄, is a function δ̄: S × Σ* → S defined recursively as follows:

i. (∀s ∈ S)(∀a ∈ Σ)            δ̄(s, a) = δ(s, a)
ii. (∀s ∈ S)                    δ̄(s, λ) = s
iii. (∀s ∈ S)(∀x ∈ Σ*)(∀a ∈ Σ)  δ̄(s, ax) = δ̄(δ(s, a), x)

The δ̄ function extends the δ function from single letters to words. Whereas the δ function maps pairs of states and letters to other states, the δ̄ function maps pairs of states and words to other states. (i) is the observation that δ and δ̄ treat single letters the same; this fact is not really essential to the definition of δ̄, since it can be deduced from (ii) and (iii) (see the exercises). The δ̄ function maps the current state s and the first letter a1 of a word w = a1 ... an via the δ function to some other state t. It is then recursively applied with the new state t and the remainder of the word, that is, with a2 ... an. The recursion stops when the remainder of the word is the empty word λ.
See Examples 1.7 and 1.11 for illustrations of computations using this recursive definition. Since the recursion of the δ̄ function all takes place at the end of the string, δ̄ is called tail recursive. Tail recursion is easily transformed into iteration by applying δ to successive letters of the input word and using the result of the previous application of δ as an input to the current application. Figure 1.5 gives an implementation of the δ̄ function in Pascal. Recursion has been replaced by iteration, and previous function results are saved in an auxiliary variable T. The function Delta, the input alphabet Sigma, and the state set State agree with the definitions given in Figure 1.4.

It stands to reason that if we start in state s and word y takes us to state r, and if we start in state r and word x takes us to state t, then the word yx should take us from state s all the way to t. That is, if δ̄(s, y) = r and δ̄(r, x) = t, then δ̄(s, yx) should equal t, also. We can indeed prove this, as shown with the following theorem.

Theorem 1.1. Let A = <Σ, S, s0, δ, F> be a DFA. Then

    (∀x ∈ Σ*)(∀y ∈ Σ*)(∀s ∈ S)(δ̄(s, yx) = δ̄(δ̄(s, y), x))

Proof. Define P(n) by

    (∀x ∈ Σ*)(∀y ∈ Σ^n)(∀s ∈ S)(δ̄(s, yx) = δ̄(δ̄(s, y), x))

Basis step: P(0): Let y ∈ Σ^0 (⇒ y = λ).

    δ̄(δ̄(s, y), x) = δ̄(δ̄(s, λ), x)    (since y = λ)
                  = δ̄(s, x)           (by Definition 1.11ii)
                  = δ̄(s, λx)          (since x = λ·x)
                  = δ̄(s, yx)          (since y = λ)

Inductive step: Assume P(m): (∀x ∈ Σ*)(∀y ∈ Σ^m)(∀s ∈ S)(δ̄(s, yx) = δ̄(δ̄(s, y), x)). For any z ∈ Σ^(m+1), (∃a ∈ Σ^1)(∃y ∈ Σ^m) such that z = ay.
Then

    δ̄(s, zx) = δ̄(s, ayx)              (by definition of z)
             = δ̄(δ(s, a), yx)         (by Definition 1.11iii)
             = δ̄(t, yx)               (since (∃t ∈ S) such that δ(s, a) = t)
             = δ̄(δ̄(t, y), x)          (by the induction assumption)
             = δ̄(δ̄(δ(s, a), y), x)    (by definition of t)
             = δ̄(δ̄(s, ay), x)         (by Definition 1.11iii)
             = δ̄(δ̄(s, z), x)          (by definition of z)

Therefore, P(m) ⇒ P(m + 1), and since this implication holds for any nonnegative integer m, by the principle of mathematical induction we can say that P(n) is true for all n ∈ N. Since the statement therefore holds for any string y of any length, the assertion is indeed true for all y in Σ*. This completes the proof of the theorem.

Note that the statement of Theorem 1.1 is very similar to rule iii of the recursive definition of the extended state transition function (Definition 1.11), with the string y replacing the single letter a. We will see a remarkable number of situations like this, where a recursive rule defined for a single symbol extends in a natural manner to a similar rule for arbitrary strings.

As alluded to earlier, the state in which a string terminates is significant; in particular, it is important to determine whether the terminal state for a string happens to be one of the states that was designated to be a final state.

Definition 1.12. Given a DFA A = <Σ, S, s0, δ, F>, A accepts a word w ∈ Σ* iff δ̄(s0, w) ∈ F.

    const
      MaxWordLength = 255;  { an arbitrary constraint }
    type
      Word = record
        Length : 0 .. MaxWordLength;
        Letters : packed array [0 .. MaxWordLength] of Sigma
      end; { Word }

    function DeltaBar(S : State; W : Word) : State;
      { uses the function Delta defined previously }
    var
      T : State;
      I : 0 .. MaxWordLength;
    begin
      T := S;
      if W.Length > 0 then
        for I := 1 to W.Length do
          T := Delta(T, W.Letters[I]);
      DeltaBar := T
    end; { DeltaBar }

Figure 1.5 A Pascal implementation of the extended state transition function

We say a word w is accepted by a machine A = <Σ, S, s0, δ, F> iff the extended state transition function δ̄ associated with A maps to a final state from s0 when processing the word w. This means that the path from the start state ultimately leads to a final state when the word w is presented to the machine. We will occasionally say that A recognizes w; a DFA is sometimes referred to as a recognizer.

Definition 1.13. Given a DFA A = <Σ, S, s0, δ, F>, A rejects a word w ∈ Σ* iff δ̄(s0, w) ∉ F.

In other words, a word w is rejected by a machine A = <Σ, S, s0, δ, F> iff the δ̄ function associated with A maps to a nonfinal state from s0 when processing the word w.

EXAMPLE 1.7

Let A = <Σ, S, s0, δ, F>, where

    Σ = {0, 1}
    S = {q0, q1}
    s0 = q0
    F = {q1}

and δ is given by the transition table

    δ    0    1
    q0   q0   q1
    q1   q1   q0

The structure of this automaton is shown in Figure 1.6. To see how some of the above definitions apply, let x = 0100:

    δ̄(q0, x) = δ̄(q0, 0100)
             = δ̄(δ(q0, 0), 100)
             = δ̄(q0, 100)
             = δ̄(δ(q0, 1), 00)
             = δ̄(q1, 00)
             = δ̄(δ(q1, 0), 0)
             = δ̄(q1, 0)
             = δ(q1, 0)
             = q1

Thus, δ̄(q0, x) = q1 ∈ F, which means that x is accepted by A; A recognizes x. Now let y = 1100:

    δ̄(q0, y) = δ̄(q0, 1100)
             = δ̄(δ(q0, 1), 100)
             = δ̄(q1, 100)
             = δ̄(δ(q1, 1), 00)
             = δ̄(q0, 00)
             = δ̄(δ(q0, 0), 0)
             = δ̄(q0, 0)
             = δ(q0, 0)
             = q0

Therefore, δ̄(q0, y) = q0 ∉ F, which means that y is not accepted by A.

Figure 1.6 The DFA discussed in Example 1.7

Following the Pascal conventions defined in the previous programming fragments, the function Accept defined in Figure 1.7 tests for acceptance of a string by consulting a FinalState set and using DeltaBar to refer to the TransitionTable. The functions Delta, DeltaBar, and Accept can be combined to form a Pascal program that models a DFA.
The sample fragments given in Figures 1.4, 1.5, and 1.7 pass the candidate string as a parameter. A full program would be complicated by several constraints, including the awkward way in which strings must be handled in Pascal. To highlight the correspondence between the code modules and the automata definitions, the program given in Figure 1.8 handles input at the character level rather than at the word level.

    function Accept(W : Word) : Boolean;
      { returns TRUE iff W is accepted by the DFA }
    begin
      Accept := DeltaBar(s0, W) in FinalState
    end; { Accept }

Figure 1.7 A Pascal implementation of a test for acceptance

The definitions in the procedure Initialize reflect the structure of the DFA shown in Figure 1.9. Invoking this program will produce a response to a single input word. For example, a typical exchange would be

    cba
    Rejected

Running this program again might produce

    cccc
    Accepted

This behavior is essentially the same as that of the C program shown in Figure 1.10. The succinct coding clearly shows the relationship between the components of the quintuple for the DFA and the corresponding code.

    program DFA(input, output);
    { This program tests whether input strings are accepted by the }
    { automaton displayed in Figure 1.9. The program expects input }
    { from the keyboard, delimited by a carriage return. No error  }
    { checking is done; letters outside ['a' .. 'c'] cause a range }
    { error.                                                       }
    type
      Sigma = 'a' .. 'c';
      State = (s0, s1, s2);
    var
      TransitionTable : array [State, Sigma] of State;
      FinalState : set of State;

    function Delta(s : State; c : Sigma) : State;
    begin
      Delta := TransitionTable[s, c]
    end; { Delta }

    function DeltaBar(s : State) : State;
    var
      t : State;
    begin
      t := s;
      { Step through the keyboard input one letter at a time. }
      while not eoln(input) do
      begin
        t := Delta(t, input^);
        get(input)
      end;
      DeltaBar := t
    end; { DeltaBar }

    function Accept : boolean;
    begin
      Accept := DeltaBar(s0) in FinalState
    end; { Accept }

    procedure Initialize;
    begin
      FinalState := [s2];
      { Set up the state transition table. }
      TransitionTable[s0, 'a'] := s1;
      TransitionTable[s0, 'b'] := s0;
      TransitionTable[s0, 'c'] := s2;
      TransitionTable[s1, 'a'] := s2;
      TransitionTable[s1, 'b'] := s0;
      TransitionTable[s1, 'c'] := s0;
      TransitionTable[s2, 'a'] := s0;
      TransitionTable[s2, 'b'] := s0;
      TransitionTable[s2, 'c'] := s1
    end; { Initialize }

    begin { DFA }
      Initialize;
      if Accept then
        writeln(output, 'Accepted')
      else
        writeln(output, 'Rejected')
    end. { DFA }

Figure 1.8 A Pascal program that emulates the DFA shown in Figure 1.9

Figure 1.9 The DFA emulated by the programs in Figures 1.8 and 1.10

Definition 1.14. Given an alphabet Σ, L is a language over the alphabet Σ iff L ⊆ Σ*.

A language is a collection of words over some alphabet. If the alphabet is denoted by Σ, then a language L over Σ is a subset of Σ*. Since L ⊆ Σ*, L may be finite or infinite. Clearly, the words used in the English language are a subset of the words over the Roman alphabet, and this collection is therefore a language according to our definition. Note that a language L, in this context, is simply a list of words; neither syntax nor semantics is involved in the specification of L. Thus, a language as defined by Definition 1.14 has little of the structure or relationships one would normally expect of either a natural language (like English) or a programming language (like Pascal).

EXAMPLE 1.8

Some other examples of valid languages are

i. ∅
ii. {w ∈ {0, 1}* | |w| > 5}
iii. {λ}
iv. {λ, bilbo, frodo, samwise}
v. {x ∈ {a, b}* | |x|_a = |x|_b}

    #include <stdio.h>

    #define to_int(c)   ((int) c - (int) 'a')
    #define FINAL_STATE s_2

    enum state { s_0, s_1, s_2 };

    /*
    ** This table implements the state transition function and is
    ** indexed by the current state and the current input letter.
    */
    enum state transition_table[3][3] = {
        { s_1, s_0, s_2 },
        { s_2, s_0, s_0 },
        { s_0, s_0, s_1 }
    };

    enum state delta(s, c)
    enum state s;
    char c;
    {
        return transition_table[(int) s][to_int(c)];
    }

    enum state delta_bar(s)
    enum state s;
    {
        enum state t;
        char c;

        t = s;
        /*
        ** Step through the input one letter at a time.
        */
        while ((char) (c = getchar()) != '\n')
            t = delta(t, c);
        return t;
    }

    main()
    {
        if (delta_bar(s_0) == FINAL_STATE)
            printf("Accepted\n");
        else
            printf("Rejected\n");
        exit(0);
    }

Figure 1.10 A C program that emulates the DFA shown in Figure 1.9

The empty language, denoted by ∅ or { }, is different from {λ}, the language consisting of only the empty word λ. Whereas the empty language consists of zero words, the language consisting of λ contains one word (which contains zero letters). The distinction is analogous to an example involving sets of numbers: the set {0}, containing only the integer 0, is still a larger set than the empty set.

Every DFA differentiates between words that do not reach final states and words that do. In this sense, each automaton defines a language.

Definition 1.15. Given a DFA A = <Σ, S, s0, δ, F>, the language accepted by A, denoted L(A), is defined to be

    L(A) = {w ∈ Σ* | δ̄(s0, w) ∈ F}

L(A), the language accepted by a finite automaton A, is the set of all words w from Σ* for which δ̄(s0, w) ∈ F. In order for a word w to be contained in L(B), the path through the finite automaton B, as determined by the letters in w, must lead from the start state to one of the final states. For deterministic finite automata, the path for a given word w is unique: there is only one path since, at any given state in the automaton, there is exactly one transition for each a ∈ Σ.
This is not necessarily the case for another variety of finite automaton, the nondeterministic finite automaton, as will be seen in Chapter 4.

Definition 1.16. Given an alphabet Σ, a language L ⊆ Σ* is finite automaton definable (FAD) iff there exists some DFA B = <Σ, S, s0, δ, F> such that L = L(B).

The set of all words over {0, 1} that contain an odd number of 1s is finite automaton definable, as evidenced by the automaton in Example 1.7, which accepts exactly this set of words.

1.3 EXAMPLES OF FINITE AUTOMATA

This section illustrates the definitions of the quintuples and the state transition diagrams for some nontrivial automata. The following example and Example 1.11 deal with the recognition of tokens, an important issue in the construction of compilers.

EXAMPLE 1.9

The set of FORTRAN identifiers is a finite automaton definable language. This statement can be proved by verifying that the following machine accepts the set of all valid FORTRAN 66 identifiers. These identifiers, which represent variable, subroutine, and array names, can contain from 1 to 6 (nonblank) characters, must begin with an alphabetic character, can be followed by up to 5 letters or digits, and may have embedded blanks. In this example, we have ignored the difference between capital and lowercase letters, and □ represents a blank.

    Σ = ASCII
    Γ = ASCII − {a, b, c, ..., x, y, z, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, □}
    S = {s0, s1, s2, s3, s4, s5, s6, s7}
    s0 = s0

    δ    a  b  c ... y  z    0  1 ... 8  9    □    Γ
    s0   s1 s1 s1 ... s1 s1   s7 s7 ... s7 s7   s0   s7
    s1   s2 s2 s2 ... s2 s2   s2 s2 ... s2 s2   s1   s7
    s2   s3 s3 s3 ... s3 s3   s3 s3 ... s3 s3   s2   s7
    s3   s4 s4 s4 ... s4 s4   s4 s4 ... s4 s4   s3   s7
    s4   s5 s5 s5 ... s5 s5   s5 s5 ... s5 s5   s4   s7
    s5   s6 s6 s6 ... s6 s6   s6 s6 ... s6 s6   s5   s7
    s6   s7 s7 s7 ... s7 s7   s7 s7 ... s7 s7   s6   s7
    s7   s7 s7 s7 ... s7 s7   s7 s7 ... s7 s7   s7   s7

    F = {s1, s2, s3, s4, s5, s6}

The entries under the column labeled Γ show the transitions taken for each member of the set Γ.
The state transition diagram of the machine corresponding to this quintuple is displayed in Figure 1.11. Note that, while each of the 26 letters transitions from s0 to s1, a single arrow labeled a–z is sufficient to denote all these transitions. Similarly, the transition labeled Σ from s7 indicates that every element of the alphabet follows the same path.

Figure 1.11 A DFA that recognizes valid FORTRAN identifiers

EXAMPLE 1.10

The DFA M shown in Figure 1.12 accepts only those strings that have an even number of bs and an even number of as. Thus,

    L(M) = {x ∈ {a, b}* | |x|_a ≡ 0 mod 2 ∧ |x|_b ≡ 0 mod 2}

Figure 1.12 The DFA M discussed in Example 1.10

The corresponding quintuple for M = <Σ, S, s0, δ, F> has the following components:

    Σ = {a, b}
    S = {<0,0>, <0,1>, <1,0>, <1,1>}
    s0 = <0,0>

    δ        a      b
    <0,0>  <1,0>  <0,1>
    <0,1>  <1,1>  <0,0>
    <1,0>  <0,0>  <1,1>
    <1,1>  <0,1>  <1,0>

    F = {<0,0>}

Note that the transition function can be succinctly specified by

    δ(<i,j>, a) = <1−i, j>  and  δ(<i,j>, b) = <i, 1−j>  for all i, j ∈ {0, 1}

See the exercises for some other problems involving congruence modulo 2.

EXAMPLE 1.11

Consider a typical set of all real number constants in modified scientific notation format described by the BNF in Table 1.1.

TABLE 1.1

    <sign> ::= + | −
    <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
    <natural> ::= <digit> | <digit><natural>
    <integer> ::= <natural> | <sign><natural>
    <real constant> ::= <integer> | <integer>. | <integer>.<natural> | <integer>.<natural>E<integer>

This set of productions defines real number constants like +192., since

    <real constant> ⇒ <integer>.
                    ⇒ <sign><natural>.
                    ⇒ +<natural>.
                    ⇒ +<digit><natural>.
                    ⇒ +1<natural>.
                    ⇒ +1<digit><natural>.
                    ⇒ +1<digit><digit>.
                    ⇒ +1<digit>2.
                    ⇒ +192.

Other possibilities are

    1
    3.1415
    2.718281828
    27.
    42.42
    1.0E−32

while the following strings do not qualify:

    .01
    1.+1
    8.E*10

The set of all real number constants that can be derived from the productions given in Table 1.1 is a FAD language. Let R be the deterministic finite automaton defined below. The corresponding state transition diagram is given in Figure 1.13.

    Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, +, −, E, .}
    S = {s0, s1, s2, s3, s4, s5, s6, s7, s8}
    s0 = s0

    δ    0-9   +    −    E    .
    s0   s2    s1   s1   s7   s7
    s1   s2    s7   s7   s7   s7
    s2   s2    s7   s7   s7   s3
    s3   s8    s7   s7   s7   s7
    s4   s5    s6   s6   s7   s7
    s5   s5    s7   s7   s7   s7
    s6   s5    s7   s7   s7   s7
    s7   s7    s7   s7   s7   s7
    s8   s8    s7   s7   s4   s7

(Each of the ten digit columns is identical, so they are collapsed into the single column labeled 0-9.)

    F = {s2, s3, s5, s8}

The language accepted by R, that is, L(R), is exactly the set of all real number constants in modified scientific notation format described by the BNF in Table 1.1.

Figure 1.13 A DFA that recognizes real number constants

For example, let x = 3.1415:

    δ̄(s0, x) = δ̄(s0, 3.1415)
             = δ̄(δ(s0, 3), .1415)
             = δ̄(s2, .1415)
             = δ̄(δ(s2, .), 1415)
             = δ̄(s3, 1415)
             = δ̄(δ(s3, 1), 415)
             = δ̄(s8, 415)
             = δ̄(δ(s8, 4), 15)
             = δ̄(s8, 15)
             = δ̄(δ(s8, 1), 5)
             = δ̄(s8, 5)
             = δ(s8, 5)
             = s8

s8 ∈ F, and therefore 3.1415 ∈ L(R).

While many important classes of strings such as numerical constants (Example 1.11) and identifiers (Example 1.9) are FAD, not all languages that can be described by BNF can be recognized by DFAs. These limitations will be investigated in Chapters 8 and 9, and a more capable type of automaton will be defined in Chapter 10.

1.4 CIRCUIT IMPLEMENTATION OF FINITE AUTOMATA

Now that we have described the mathematical nature of deterministic finite automata, let us turn to the physical implementation of such devices.
We will investigate the sort of physical components that actually go into the "brain" of, say, a vending machine. Recall that the basic building blocks of digital logic circuits are logic gates; using 0 or False to represent a low voltage (ground) and 1 or True to represent a higher voltage (often +5 volts), the basic gates have the truth tables shown in Figure 1.14.

Figure 1.14 Common logic gates (NOT, AND, OR, NAND, NOR) and their truth tables

Since our DFA will examine one letter at a time, we will generally need some type of timing mechanism, which will be regulated by a clock pulse; we will read one letter per pulse and allow enough interim time for transient signals to propagate through our network as we change states and move to the next letter on the input tape. The clock pulse will alternate between high and low voltages, as shown in Figure 1.15.

Figure 1.15 A typical clock pulse pattern for latched circuits

For applications such as vending machines, the periodic clock pulse would be replaced by a device that pulsed whenever a new input (such as the insertion of a coin) was detected.

We need to retain the present status of the network (current state, letter, and so forth) as we move on to the next input symbol. This is achieved through the use of a D flip-flop (D stands for data or delay), which uses NAND gates and the clock signal to store the current value of, say, p′, between clock pulses. The symbol for a D flip-flop (sometimes called a latch) is shown in Figure 1.16, along with the actual gates that comprise the circuit.
The output, p and ¬p, will reflect the value of the input signal p′ only after the high clock pulse is received and will retain that value after the clock drops to low (even if p′ subsequently changes) until the next clock pulse comes along, at which time the output will reflect the new current value of p′. This is best illustrated by referring to the NAND truth table and tracing the changes in the circuit. Begin with clock = p = p′ = 0 and ¬p = 1, and verify that the circuit is stable. Now assume that p′ changes to 1, and note that, although some internal values may change, p and ¬p remain at 0 and 1, respectively; the old value of p′ has been "remembered" by the D flip-flop. Contrast this with the behavior when we strobe the clock: assume that the clock now also changes to 1, so that we now have clock = p′ = ¬p = 1 and p = 0. When the signal propagates through the network, we find that p and ¬p have changed to reflect the new value of p′; clock = p = p′ = 1, and ¬p = 0.

Figure 1.16 (a) A data flip-flop or latch (b) The circuitry for a D flip-flop

We will also have to represent the letters of our input alphabet by high and low voltages (that is, combinations of 0s and 1s). The ASCII alphabet, for example, is quite naturally represented by 8 bits, a1a2a3a4a5a6a7a8, where B, for example, has the bit pattern 01000010 (binary 66). One of these bit patterns should be reserved for indicating the end of our input string, <EOS>. Our convention will be to reserve binary zero for this role, which means our ASCII end-of-string symbol would be 00000000 (or NULL). In actual applications using the ASCII alphabet, however, a more appropriate choice for <EOS> might be 00001101 (a carriage return) or 00001010 (a line feed) or 00100000 (a space).
Our alphabets are likely to be far smaller than the ASCII character set, and we will hence need fewer than 8 bits of information to encode our letters. For example, if Σ = {b, c}, 2 bits, a1 and a2, will suffice. Our choice of encoding could be 00 = <EOS>, 01 = b, 10 = c, and 11 is unused.

In a similar fashion, we must encode state names. A machine with S = {r0, r1, r2, r3, r4, r5} would need 3 bits (denoted by t1, t2, and t3) to represent the six states. The most natural encoding would be r0 = 000, r1 = 001, r2 = 010, r3 = 011, r4 = 100, and r5 = 101, with the combinations 110 and 111 left unused.

Finally, a mechanism for differentiating between final and nonfinal states must be implemented (although this need not be engaged until the <EOS> symbol is encountered). Recall that we must illuminate the "acceptance light" if the machine terminates in a final state and leave it unlit if the string on the input tape is instead rejected by the DFA. A second "rejection light" can be added to the physical model, and exactly one of the two will light when <EOS> is scanned by the input head.

EXAMPLE 1.12

When building a logical circuit from the definition of a DFA, we will find it convenient to treat <EOS> as an input symbol, and define the state transition function for it by (∀s ∈ S)(δ(s, <EOS>) = s). Thus, the DFA in Figure 1.17a should be thought of as shown in Figure 1.17b. As we have only two states, a single state bit will suffice, representing s0 by t1 = 0 and s1 by t1 = 1. Since Σ = {b, c}, we will again use 2 bits, a1 and a2, to represent the input symbols. As before, 00 = <EOS>, 01 = b, 10 = c, and 11 is unused.
(a) (b) Figure 1.17 (a) The DFA discussed in Example 1.12 (b) The expanded state transition diagram for the DFA implemented in Figure 1.18 Determining the state transition function will require knowledge of the current state (represented by the status of t l) and the current input symbol (represented by the pair of bits 31 and 32' These three input values will allow the next state t{ to be calculated. From the 8 function, we know that 8(so,b) = So 8(so,c) = Sl 8(Sh b) = So 8(Shc) = So These specifications correspond to the following four rows of the truth table for t;: 50 Introduction and Basic Definitions Chap. 1 tt at a2 t; t1 a[ a2 t; 0 0 1 0 So O. l=b So 0 1 0 1 which represents So 1 o=c sl 1 0 1 0 sl 0 l=b So sl -So1 1 0 0 1 o=c Adding the state transitions for <EOS> and using * to represent the outcome for the two rows corresponding to the unused combination 818z = 11 fills out the eight rows of the complete truth table, as shown in Table 1.2. TABLE 1.2 tt at az t; 0 0 0 0 0 0 1 0 0 1 0 1 0 1 1 * 1 0 0 1 1 0 1 0 1 1 0 0 1 1 1 * If we arbitrarily assume that the two don't-care combinations (*) are zero, the principle disjunctive normal form of t; contains just two terms: (----,tl /\ 81/\ ----'8z) V (t1/\ ----,81/\ ----'8z). It is profitable to reassign the don't-care value in the fourth row to 1, 'since the expression can then be shortened to (----,t j/\8j)V(tl/\----'81/\----'8z) by applying standard techniques for minimizing Boolean functions. Incorporating this into a feedback loop with a D flip-flop provides the heart of the digital logic circuit representing the DFA, as shown in Figure 1.18. t, f---+-+---l T,f--.-+--f--l clock accept reject -'3"-'a, .. ---.10---->, ..----/ <EGS> Figure 1.18 The circuitry implementing the DFA discussed in Example 1.12 Sec. 
The accept portion of the circuitry ensures that we do not indicate acceptance when passing through the final state; it is only activated when we are in a final state while scanning the <EOS> symbol. Similarly, the reject circuitry can only be activated when the <EOS> symbol is encountered. When there are several final states, this part of the circuitry becomes correspondingly more complex.

It is instructive to follow the effect a string such as bcc has on the above circuit. Define ai(j) as the jth value the bit ai takes on as the string bcc is processed; that is, ai(j) is the value of ai during the jth clock pulse. We then have

a1(1) = 0    a2(1) = 1    ⇒ b
a1(2) = 1    a2(2) = 0    ⇒ c
a1(3) = 1    a2(3) = 0    ⇒ c
a1(4) = 0    a2(4) = 0    ⇒ <EOS>

Trace the circuit through four clock pulses (starting with t1 = 0), and observe the current values that t1 assumes, noting that it corresponds to the appropriate state of the machine as each input symbol is scanned.

Note that a six-state machine would require more and substantially larger truth tables. Since a state encoding would now need to specify t1, t2, and t3, three different truth tables (for t1', t2', and t3') must be constructed to predict the next state transition. More significantly, the input variables would include t1, t2, t3, a1, and a2, making each table 32 rows long. Three D flip-flop feedback loops would be necessary to store the three values t1, t2, and t3.

Also, physical logic circuits of this type have the disconcerting habit of initializing to some random configuration the first time power is applied to the network. A true working model would thus need a reset circuit to initialize each ti to 0 in order to ensure that the machine starts in state s0. Slightly more complex set-reset flip-flops can be used to provide a hardware solution to this problem. However, a simple algorithmic solution would require the input tape to have a leading start-of-string symbol <SOS>.
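The four-clock-pulse trace above can also be checked mechanically. The following Python sketch (our own notation, not part of the text) evaluates the minimized next-state expression t1' = (¬t1 ∧ a1) ∨ (t1 ∧ ¬a1 ∧ ¬a2) from Example 1.12 on the string bcc and records the value of t1 after each pulse:

```python
# Next-state Boolean function from Example 1.12, after minimization:
# t1' = (not t1 and a1) or (t1 and not a1 and not a2)
def next_t1(t1, a1, a2):
    return (not t1 and a1) or (t1 and not a1 and not a2)

# Input encoding from the example: 00 = <EOS>, 01 = b, 10 = c.
ENC = {"b": (0, 1), "c": (1, 0), "<EOS>": (0, 0)}

def run(symbols):
    t1, trace = 0, [0]          # start in s0, i.e., t1 = 0
    for sym in symbols:
        a1, a2 = ENC[sym]
        t1 = int(next_t1(t1, a1, a2))
        trace.append(t1)
    return trace

print(run(["b", "c", "c", "<EOS>"]))  # [0, 0, 1, 0, 0]
```

The trace 0, 0, 1, 0, 0 matches the state sequence s0, s0, s1, s0, s0 that the transition diagram of Figure 1.17 predicts for bcc.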
The definition of the state transition function should be expanded so that scanning the <SOS> symbol from any state will automatically transfer control to s0. We will adopt the convention that <SOS> will be represented by the highest binary code; in ASCII, for example, this would be 11111111, while in the preceding example it would be 11. To promote uniformity in the exercises, it is suggested that <SOS> always be given the highest binary code and <EOS> be represented by binary zero; as in the examples given here, the symbols in Σ should be numbered sequentially according to their natural alphabetical order. In a similar fashion, numbered states should be given their corresponding binary codes. The reader should note, however, that other encodings might result in less complex circuitry.

EXAMPLE 1.13

As a more complex example of automaton circuitry, consider the DFA displayed in Figure 1.19. Two flip-flops t1 and t2 will be necessary to represent the three states, most naturally encoded as s0 = 00, s1 = 01, and s2 = 10, with the combination 11 unused.

Figure 1.19 The DFA discussed in Example 1.13

Employing both the <SOS> and <EOS> encodings yields the DFA in Figure 1.20. Note that we must account for the possibility that the circuitry might be randomly initialized to t1 = 1 and t2 = 1; we must ensure that scanning the <SOS> symbol moves us back into the "real" part of the machine. Two bits of information (a1 and a2) are also needed to describe the input symbols. Following our conventions, we assign <EOS> = 00, a = 01, b = 10, and <SOS> = 11. The truth table for both the transition function and the conditions for acceptance is given in Table 1.3. In the first row, t1 = 0 and t2 = 0 indicate state s0, while a1 = 0 and a2 = 0 denote the <EOS> symbol. Since δ(s0, <EOS>) = s0, t1' = 0 and t2' = 0. We do not want to accept a string that ends in s0, so accept = 0 also. The remaining rows are determined similarly.
The (nonminimized) circuitry for this DFA is shown in Figure 1.21.

TABLE 1.3
t1  t2  a1  a2  t1'  t2'  accept
0   0   0   0   0    0    0
0   0   0   1   0    1    0
0   0   1   0   1    0    0
0   0   1   1   0    0    0
0   1   0   0   0    1    1
0   1   0   1   1    0    0
0   1   1   0   0    0    0
0   1   1   1   0    0    0
1   0   0   0   1    0    1
1   0   0   1   0    0    0
1   0   1   0   0    1    0
1   0   1   1   0    0    0
1   1   0   0   *    *    *
1   1   0   1   *    *    *
1   1   1   0   *    *    *
1   1   1   1   0    0    0

Figure 1.20 The expanded state transition diagram for the DFA implemented in Figure 1.21

Figure 1.21 The circuitry implementing the DFA discussed in Example 1.13

1.5 APPLICATIONS OF FINITE AUTOMATA

In this chapter we have described the simplest form of finite automaton, the DFA. Other forms of automata, such as nondeterministic finite automata, pushdown automata, and Turing machines, are introduced later in the text. We close this chapter with three examples to motivate the material in the succeeding chapters. When presenting automata in this chapter, we made no effort to construct the minimal machine. A minimal machine for a given language is one that has the least number of states required to accept that language.

EXAMPLE 1.14

In Example 1.5, the vending machine kept track of the total amount of change that had been deposited. Since the candy bars cost only 30¢, there is no need to count beyond 30¢. In this sense, the machine is not optimal, since a less complex machine can perform the same task, as shown in Figure 1.22.
The corresponding quintuple is <{n, d, q}, {s0, s5, s10, s15, s20, s25, s30}, s0, δ, {s30}>, where for each state si, δ is defined by

δ(si, n) = s_min{30, i+5}
δ(si, d) = s_min{30, i+10}
δ(si, q) = s_min{30, i+25}

Note that the higher-numbered states in Example 1.5 were all effectively "remembering" the same thing: that enough coins had been deposited. These final states have been coalesced into a single final state to produce the more efficient machine in Figure 1.22.

Figure 1.22 The automaton discussed in Example 1.14

In the next two chapters, we develop the theoretical background and algorithms necessary to construct from an arbitrary DFA the minimal machine that accepts the same language.

As another illustration of the utility of concepts relating to finite-state machines, we will consider the formalism used by many text editors to search for a particular target string pattern in a text file. To find ababb in a file, for example, a naive approach might consist of checking whether the first five characters of the file fit this pattern, then checking characters 2 through 6 for a match, and so on. This results in examining file characters more than once; it ought to be possible to remember past values and avoid such duplication. Consider the text string aabababbb. By the time the fifth character is scanned, we have matched the first four characters of ababb. Unfortunately, a, the sixth character of aabababbb, does not produce the final match; however, since characters 4, 5, and 6 (aba) now match the first three characters of the target string, it does allow for the possibility of characters 4 through 8 matching (as is indeed the case in this example). This leads to a general rule: if we have matched the first four letters of the target string, and the next character happens to be a (rather than the desired b), we must remember that we have now matched the first three letters of the target string.
"Rules" such as these are actually the state transitions in the DFA given in the next example. State si represents having matched the first i characters of the target string, and the rule developed above is succinctly stated as δ(s4, a) = s3.

EXAMPLE 1.15

A DFA that accepts all strings that contain ababb as a substring is displayed in Figure 1.23. The corresponding quintuple is <{a, b}, {s0, s1, s2, s3, s4, s5}, s0, δ, {s5}>, where δ is defined by

δ    a    b
s0   s1   s0
s1   s1   s2
s2   s3   s0
s3   s1   s4
s4   s3   s5
s5   s5   s5

Figure 1.23 A DFA that accepts strings containing ababb

It is a worthwhile exercise to test the operation of this DFA on several text strings and verify that the automaton is indeed in state si exactly when it has matched the first i characters of the target string. Note that if we did not care what the third character of the substring was (that is, if we were searching for occurrences of ababb or abbbb), a trivial modification of the above machine would allow us to search for both substrings at once, as shown in Figure 1.24. The corresponding quintuple is <{a, b}, {s0, s1, s2, s3, s4, s5}, s0, δ, {s5}>, where δ is defined by

δ    a    b
s0   s1   s0
s1   s1   s2
s2   s3   s3
s3   s1   s4
s4   s3   s5
s5   s5   s5

Figure 1.24 A DFA that accepts strings that contain either ababb or abbbb

In this case, we required one letter between the initial part of the search string (ab) and the terminal part (bb). It is possible to modify the machine to accept strings that contain ab, followed by any number of letters, followed by bb. This type of machine would be useful for identifying comments in many programming languages. For example, a Pascal comment is essentially of the form (*, followed by most combinations of letters, followed by the first occurrence of *). It should be noted that the machine in Example 1.15 is highly specialized and tailored to the specific string ababb; other target strings would require completely different recognizers.
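The transition table of Example 1.15 translates directly into a table-driven recognizer. The sketch below (in Python, with names of our own choosing) encodes that δ and reports whether a text string contains ababb:

```python
# Transition table of the DFA in Example 1.15 (state i = "first i
# characters of ababb matched"; s5 is the accepting trap state).
DELTA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 3, (2, "b"): 0,
    (3, "a"): 1, (3, "b"): 4,
    (4, "a"): 3, (4, "b"): 5,
    (5, "a"): 5, (5, "b"): 5,
}

def accepts(text):
    state = 0
    for ch in text:
        state = DELTA[(state, ch)]
    return state == 5

print(accepts("aabababba"))  # True  (contains ababb)
print(accepts("abab"))       # False
```

Running it on the string aabababbb from the discussion above confirms that the machine enters s5 at the ninth character, exactly where the naive scan finally finds its match.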
While it appears to require much thought to generate the appropriate DFA for a given string, we will see how the tools presented in Chapter 4 can be used to automate the entire process.

Example 1.15 indicates how automata can be used to guide the construction of software for matching designated patterns. Finite-state machines are also useful in designing hardware that detects designated sequences. Example 4.7 will explore a communications application, and the following discussion illustrates how these concepts can be applied to help evaluate the performance of computers.

A computer program is essentially a linear list of machine instructions, stored in consecutive memory locations. Each memory location holds a sequence of bits that can be thought of as words comprised of 0s and 1s. Different types of instructions are represented by different patterns of bits. The CPU sequentially fetches these instructions and chooses its next action by examining the incoming bit pattern to determine the type of instruction that should be executed. The sequences of bits that encode the instruction type are called opcodes.

Various performance advantages can be attained when one part of the CPU prefetches the next instruction while another part executes the current instruction. However, computers must have the capability of altering the order in which instructions are executed; branch instructions allow the CPU to skip the anticipated next instruction and instead begin executing the instructions stored in some other area of memory. When a branch occurs, the prefetched instruction will generally need to be replaced by the proper instruction from the new area of memory. The consequent delay can degrade the speed with which instructions are executed. Irrespective of prefetching problems, it should be clear that a branch instruction followed immediately by another branch instruction is inefficient.
If a CPU is found to be regularly executing two or more consecutive branch instructions, it may be worthwhile to consider replacing such series of branches with a single branch to the ultimate destination [FERR]. Such information would be determined by monitoring the instruction stream and searching for patterns that represent consecutive branch opcodes. This activity is essentially the pattern recognition problem discussed in Example 1.15.

It is unwise to try to collect the data representing the contents of the instruction stream on secondary storage so that it can be analyzed later. The volume of information and the speed with which it is generated preclude the collection of a sufficiently large set of data points. Instead, the preferred solution uses a specially tailored piece of hardware to monitor the contents of the CPU opcode register and increment a hardware counter each time the appropriate patterns are detected. The heart of this monitor can be built by transforming the appropriate automaton into the corresponding logic circuitry, as outlined in Section 1.4. Unlike the automaton in Example 1.15, the automaton model for this application would allow transitions out of the final state, so that it may continue to search for successive patterns. The resulting logic circuitry would accept as input the bit patterns currently present in the opcode register, and send a pulse to the counter mechanism each time the accept circuitry was energized. Note that in this case we would not want to inhibit the accept circuitry by requiring an <EOS> symbol to be scanned. Indeed, we want the light on our conceptual black box to flicker as we process the data, since we are intent on counting the number of times it flickers during the course of our monitoring.

EXAMPLE 1.16

We close this chapter with an illustration of the manner in which computational algorithms can profitably use the automaton abstraction.
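A software analogue of such a monitor can be sketched as follows. The one-letter "opcodes" here (B for a branch, O for anything else) are a hypothetical encoding of our own, but the automaton mirrors the discussion: it keeps transitions out of the final state and pulses a counter on every match instead of halting:

```python
# Hypothetical monitor: count how often two consecutive branch opcodes
# ("B") occur in an instruction stream.  Like the hardware described in
# the text, the recognizer keeps running after each match rather than
# stopping at an accepting state.
def count_double_branches(stream):
    matched, hits = 0, 0
    for op in stream:
        matched = matched + 1 if op == "B" else 0  # prefix of "BB" matched
        if matched >= 2:
            hits += 1        # pulse the hardware counter
    return hits

print(count_double_branches("OBBOBBBO"))  # 3
```

Note that overlapping occurrences each produce a pulse: the run BBB contains two back-to-back branch pairs, just as the flickering light would indicate.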
Network communications between independent processors are governed by a protocol that implements a finite-state control [TANE]. The Kermit protocol, developed at Columbia University, is widely employed to communicate between processors and is still most often used for its original purpose: to transfer files between micros and mainframes [DACR]. During a file transfer, the send portion of Kermit on the source host is responsible for delivering data to the receive portion of the Kermit process on the destination host. The receive portion of Kermit reacts to incoming data in much the same way as the machines presented in this chapter.

The receive program starts in a state of waiting for a transfer request (in the form of an initialization packet) to signal the commencement of a file transfer (state R in Figure 1.25). When such a packet is received, Kermit transitions to the RF state, where it awaits a file-header packet (which specifies the name of the file about to be transferred). Upon receipt of the file-header packet, it enters the RD state, where it processes a succession of data packets (which comprise the body of the file being transferred). An EOF packet should arrive after all the data are sent, which can then be followed by another file-header packet (if there is a sequence of files to be transferred) or by a break packet (if the transfer is complete). In the latter case, Kermit reverts to the start state R and awaits the next transfer request. The send portion of the Kermit process on the source host follows the behavior of a slightly more complex automaton.

Figure 1.25 The state transition diagram for the receive portion of Kermit, as discussed in Example 1.16

The state transition diagram given in Figure 1.25 succinctly describes the logic of the receive portion of the Kermit protocol; for simplicity, timeouts and error conditions are not reflected in the diagram.
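The receive logic just described can be prototyped as a dictionary-driven state machine. The sketch below is one plausible reading of Figure 1.25, not a faithful Kermit implementation; in particular, routing the EOF transition back to RF, and letting RF accept a break, are our assumptions about details the figure would settle:

```python
# A sketch of the Kermit receive automaton (timeouts/errors omitted).
# Packets: S = send-intention, H = file header, D = data, Z = EOF,
# B = break.  States: R, RF, RD as in the text; A = abort.
# Whether RF itself accepts B is an assumption here.
TRANS = {
    ("R",  "S"): "RF",
    ("RF", "H"): "RD",
    ("RF", "B"): "R",
    ("RD", "D"): "RD",
    ("RD", "Z"): "RF",
}

def receive(packets):
    state = "R"
    for p in packets:
        state = TRANS.get((state, p), "A")   # unexpected packet -> abort
        if state == "A":
            break
    return state

print(receive("SHDDZB"))   # R   (one file transferred cleanly)
print(receive("SD"))       # A   (data packet before file header)
```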
The input alphabet is {B, D, Z, H, S}, where B represents a break, D is a data packet, Z is EOF, H is a file-header packet, and S is a send-intention packet. The state set is {A, R, RF, RD}, where A denotes the abort state, R signifies receive, RF is receive file-header, and RD is receive data. Note that unexpected packets (such as a data packet received in the start state R, or a break packet received when data packets are expected in state RD) cause a transition to the abort state A.

In actuality, the receive protocol does more than just observe the incoming packets; Kermit sends an acknowledgment (ACK or NAK) of each packet back to the source host. Receipt of the file header should also cause an appropriate file to be created and opened, and each succeeding data packet should be verified and its contents placed sequentially in the new file. A machine model that incorporates actions in response to input is the subject of Chapter 7, where automata with output are explored.

EXERCISES

1.1. Recall how we defined the extended transition function in this chapter:

(∀s ∈ S)(∀a ∈ Σ)            δt(s, a) = δ(s, a)
(∀s ∈ S)                    δt(s, λ) = s
(∀s ∈ S)(∀x ∈ Σ*)(∀a ∈ Σ)   δt(s, ax) = δt(δ(s, a), x)

This extension, here denoted δt, was tail recursive; tail recursion means that all the recursion takes place at the end of the string. Let us now define an alternative extended transition function, δh, thusly:

(∀s ∈ S)(∀a ∈ Σ)            δh(s, a) = δ(s, a)
(∀s ∈ S)                    δh(s, λ) = s
(∀s ∈ S)(∀a ∈ Σ)(∀x ∈ Σ*)   δh(s, xa) = δ(δh(s, x), a)

It is clear from the definition of δh that all the recursion takes place at the head of the string. For this reason, δh is called head recursive. Show that the two definitions result in the same extension of δ; that is, prove by mathematical induction that

(∀s ∈ S)(∀x ∈ Σ*)(δt(s, x) = δh(s, x))

1.2. Consider Example 1.14. The vending machine accepts coins as input, but if you change your mind (or find you do not have enough change), it will not refund your money.
Modify this example to have another input, <coin-return>, which is represented by r and which will conceptually return all your coins.

1.3. (a) Specify the quintuple corresponding to the DFA displayed in Figure 1.26.
(b) Describe the language defined by the DFA displayed in Figure 1.26.

Figure 1.26 The automaton discussed in Exercise 1.3

1.4. Construct a state transition diagram and enumerate all five parts of a deterministic finite automaton A = <{a, b, c}, S, s0, δ, F> such that L(A) = {x | |x| is a multiple of 2 or 3}.

1.5. Let Σ = {0, 1}. Construct deterministic finite automata that will accept each of the following languages, if possible.
(a) L1 = {x | |x| mod 7 = 4}
(b) L2 = Σ* − {w | (∃n)(w = a1 ⋯ an ∧ an = 1)}
(c) L3 = {y | |y|0 = |y|1}

1.6. Let Σ = {a, b}.
(a) Construct deterministic finite automata A1, A2, A3, and A4 such that:
i. L(A1) = {x | (|x|a is odd) ∧ (|x|b is even)}
ii. L(A2) = {y | (|y|a is even) ∨ (|y|b is odd)}
iii. L(A3) = {z | (|z|a is even) ⊻ (|z|b is even)} (⊻ represents exclusive-or)
iv. L(A4) = {z | |z|a is even}
(b) How does the structure of each of these machines relate to the one defined in Example 1.10?

1.7. Modify the machine M defined in Example 1.10 so that the language accepted by the machine consists of strings x ∈ {a, b}*, where both |x|a and |x|b are even and |x| > 0; that is, the new machine should accept L(M) − {λ}.

1.8. Let M = <Σ, S, s0, δ, F> be an (arbitrary) DFA that accepts the language L(M). Write down a general procedure for modifying this machine so that it will accept L(M) − {λ}. (Specify the five parts of the new machine and justify your statements.) It may be helpful to do this for a specific machine (as in Exercise 1.7) before attempting the general case.

1.9. Let M = <Σ, S, s0, δ, F> be an (arbitrary) DFA that accepts the language L(M). Write down a general procedure for modifying this machine so that it will accept L(M) ∪ {λ}.
(Specify the five parts of the new machine and justify your statements.)

1.10. Let Σ = {a, b, d} and Ψ = {x ∈ Σ* | (x begins with d) ∨ (x contains two consecutive bs)}.
(a) Draw a machine that will accept Ψ.
(b) Formally specify the five parts of the DFA from part (a).

1.11. Let Σ = {a, b, c} and Φ = {x ∈ Σ* | every b in x is immediately followed by c}.
(a) Draw a machine that will accept Φ.
(b) Formally specify the five parts of the DFA from part (a).

1.12. Let Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. Consider the base 10 numbers formed by strings from Σ*: 14 represents fourteen, the three-digit string 205 represents two hundred and five, and so on. Let Ω = {x ∈ Σ* | the number represented by x is evenly divisible by 7} = {λ, 0, 00, 000, ..., 7, 07, 007, ..., 14, 21, 28, 35, ...}.
(a) Draw a machine that will accept Ω.
(b) Formally specify the five parts of the DFA from part (a).

1.13. Let Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. Let Γ = {x ∈ Σ* | the number represented by x is evenly divisible by 3}.
(a) Draw a three-state machine that will accept Γ.
(b) Formally specify the five parts of the DFA from part (a).

1.14. Let Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. Let K = {x ∈ Σ* | the number represented by x is evenly divisible by 5}.
(a) Draw a five-state DFA that accepts K.
(b) Formally specify the five parts of the DFA from part (a).
(c) Draw a two-state DFA that accepts K.
(d) Formally specify the five parts of the DFA from part (c).

1.15. Let Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. Draw a DFA that accepts the first eight primes.

1.16. (a) Find all ten combinations of u, v, and w such that uvw = cab (one such combination is u = c, v = λ, w = ab).
(b) In general, if x is of length n, and uvw = x, how many distinct combinations of u, v, and w will satisfy this constraint?

1.17. Let Σ = {a, b} and E = {x ∈ Σ* | x contains (at least) two consecutive bs ∧ x does not contain two consecutive as}. Draw a machine that will accept E.

1.18.
The FORTRAN identifier recognizer in Example 1.9 accepted all alphabetic words, including those like DO, DATA, END, and STOP, which have different uses in FORTRAN. Modify Figure 1.11 to produce a DFA that will also reject the words DO and DATA while still accepting all other valid FORTRAN identifiers.

1.19. Consider the machine defined in Example 1.11. This machine accepts most real-number constants in scientific notation. However, this machine does have some (possibly desirable) limitations. These limitations include requiring that a 0 precede the decimal point when specifying a number with a mantissa less than 1.
(a) Modify Figure 1.13 so that it will accept the set of real-number constants described by the following BNF.

<sign> ::= + | -
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<natural> ::= <digit> | <digit><natural>
<integer> ::= <natural> | <sign><natural>
<real constant> ::= <integer> | <integer>. | .<natural> | <sign>.<natural> | .<natural>E<integer> | <sign>.<natural>E<integer> | <integer>.<natural> | <integer>.<natural>E<integer>

(b) Write a program in your favorite programming language to implement the automaton derived in part (a). The program should read a line of text and state whether or not the word on that line was accepted.

1.20. Show that part (i) of Definition 1.11 is implied by parts (ii) and (iii) of that definition.

1.21. Develop a more succinct description of the transition function given in Example 1.9 (compare with the description in Example 1.10).

1.22. Let the universal set be {a, b}*. Give an example of
(a) A finite set.
(b) A cofinite set.
(c) A set that is neither finite nor cofinite.

1.23. Consider the DFA given in Figure 1.27.
(a) Specify the quintuple for this machine.
(b) Describe the language defined by this machine.

Figure 1.27 The DFA discussed in Exercise 1.23

1.24. Consider the set consisting of the names of everyone in China. Is this set a FAD language?

1.25.
Consider the set of all legal infix arithmetic expressions over the alphabet {A, B, +, -, *, /} without parentheses (assume normal precedence rules apply). Is this set a FAD language? If so, draw the machine.

1.26. Consider an arbitrary deterministic finite automaton M.
(a) What aspect of the machine determines whether λ ∈ L(M)?
(b) Specify a condition that would guarantee that L(M) = Σ*.
(c) Specify a condition that would guarantee that L(M) = ∅.

1.27. Construct deterministic finite automata to accept each of the following languages.
(a) {x ∈ {a, b, c}* | abc is a substring of x}
(b) {x ∈ {a, b, c}* | acaba is a substring of x}

1.28. Consider Example 1.14. The vending machine had as input nickels, dimes, and quarters. When 30¢ had been deposited, a candy bar could be selected. Modify this machine to also accept pennies, denoted by p, as an additional input. How does this affect the number of states in the machine?

1.29. (a) Describe the language defined by the following quintuple (compare with Figure 1.28).

Σ = {a, b}        δ(t0, a) = t0
S = {t0, t1}      δ(t0, b) = t1
s0 = t0           δ(t1, a) = t1
F = {t1}          δ(t1, b) = t0

(b) Rigorously prove the statement you made in part (a). Hint: First prove the inductive statement P(n): (∀x ∈ Σⁿ)((δ(t0, x) = t0 ⟺ |x|b is even) ∧ (δ(t0, x) = t1 ⟺ |x|b is odd)).

Figure 1.28 The DFA discussed in Exercise 1.29

1.30. Consider a vending machine that accepts as input pennies, nickels, dimes, and quarters and dispenses candy bars.
(a) Draw a DFA that models this machine.
(b) Define the quintuple for this machine.
(c) How many states are absolutely necessary to build this machine?

1.31. Consider a vending machine that accepts as input nickels, dimes, and quarters and dispenses candy bars.
(a) Draw a DFA that models this machine.
(b) How many states are absolutely necessary to build this machine?
(c) Using the standard encoding conventions, draw a circuit diagram for this machine (include <EOS> but not <SOS> in the input alphabet).

1.32. Using the standard encoding conventions, draw a circuit diagram that will implement the machine given in Exercise 1.29, as follows:
(a) Implements both <EOS> and <SOS>.
(b) Uses neither <EOS> nor <SOS>.

1.33. Using the standard encoding conventions, draw a circuit diagram that will implement the machine given in Exercise 1.7, as follows:
(a) Implements both <EOS> and <SOS>.
(b) Uses neither <EOS> nor <SOS>.

1.34. Modify Example 1.12 so that it correctly handles the <SOS> symbol; draw the new circuit diagram.

1.35. Using the standard encoding conventions, draw a circuit diagram that will implement the machine given in Example 1.6, as follows:
(a) Implements both <EOS> and <SOS>.
(b) Uses neither <EOS> nor <SOS>.

1.36. Using the standard encoding conventions, draw a circuit diagram that will implement the machine given in Example 1.10, as follows:
(a) Implements both <EOS> and <SOS>.
(b) Uses neither <EOS> nor <SOS>.

1.37. Using the standard encoding conventions, draw a circuit diagram that will implement the machine given in Example 1.14 (include <EOS> but not <SOS> in the input alphabet).

1.38. Using the standard encoding conventions, draw a circuit diagram that will implement the machine given in Example 1.16; include the <SOS> and <EOS> symbols.

1.39. Let Σ = {a, b, c}. Let L = {x ∈ {a, b, c}* | |x|b = 2}.
(a) Draw a DFA that accepts L.
(b) Formally specify the five parts of a DFA that accepts L.

1.40. Draw a DFA accepting {x ∈ {a, b, c}* | every b in x is eventually followed by c}; that is, x might look like baabacaa, or bcacc, and so on.

1.41. Let Σ = {a, b}. Consider the language consisting of all words that have neither consecutive as nor consecutive bs.
(a) Draw a DFA that accepts this language.
(b) Formally specify the five parts of a DFA that accepts this language.

1.42. Let Σ = {a, b, c}.
Let L = {x ∈ {a, b, c}* | |x| ≡ 0 mod 3}.
(a) Draw a DFA that accepts L.
(b) Formally specify the five parts of a DFA that accepts L.

1.43. Let Σ = {a, b, (, *, )}. Recall that a Pascal comment is essentially of the form: (* followed by most combinations of letters followed by the first occurrence of *). While the appropriate alphabet for Pascal is the ASCII character set, for simplicity we will let Σ = {a, b, (, *, )}. Note that (*b(*b(a)b*) is a single valid comment, since all characters prior to the first *) (including the second (* ) are considered part of the comment. Consequently, comments cannot be nested.
(a) Draw a DFA that recognizes all strings that contain exactly one valid Pascal comment (and no illegal portions of comments, as in aa(*b*)b(*a).
(b) Draw a DFA that recognizes all strings that contain zero or more valid (that is, unnested) Pascal comments. For example, a(*b(*bb*)ba*)aa and a(*b are not valid, while a()a(**)b(*ab*) is valid.

1.44. (a) Is the set of all postfix expressions over {A, B, +, -, *, /} with two or fewer operators a FAD language? If it is, draw a machine.
(b) Is the set of all postfix expressions over {A, B, +, -, *, /} with four or fewer operators a FAD language? If it is, draw a machine.
(c) Is the set of all postfix expressions over {A, B, +, -, *, /} with eight or fewer operators a FAD language? If it is, draw a machine.
(d) Do you think the set of all postfix expressions over {A, B, +, -, *, /} is a FAD language? Why or why not?

1.45. Let Σ = {a, b, c}. Consider the language consisting of all words that begin and end with different letters.
(a) Draw a DFA that accepts this language.
(b) Formally specify the five parts of a DFA that accepts this language.

1.46. Let Σ = {a, b, c}.
(a) Draw a DFA that rejects all words for which the last two letters match.
(b) Formally specify the five parts of the DFA.

1.47. Let Σ = {a, b, c}.
(a) Draw a DFA that rejects all words for which the first two letters match. (b) Formally specify the five parts of the DFA. 1.48. Prove that the empty word is unique; that is, using the definition of equality of strings, show that if x and yare empty words then x = y. 1.49. For any two strings x and y, show that Ixy 1= Ixl + Iy I. 1.50. (a) Draw the DFA corresponding to C = <{a, b, c},{to, t.l,qo,8, {h}>, where 8(to, a) = to 8(t!, a) = to 8(ta, b) = h 8(t!, b) = t l 8(to, c) = t l 8(h, c) = to (b) Describe L(C). (c) Using the standard encoding conventions, draw a circuit diagram for this machine (include <EOS> but not <SOS> in the input alphabet). 1.51. Let I = {I, V, X, L, C, D, M}. Recall that VVI is not considered to be a Roman numeral. (a) Draw a DFA that recognizes strict-order Roman numerals; that is, 9 must be represented by VIllI rather than IX, and so on. (b) Draw a DFA that recognizes the set of all Roman numerals; that is, 9 can be represented by IX, 40 by XL, and so on. (c) Write a Pascal program based on your answer to part (b) that recognizes the set of all Roman numerals. 1.52. Describe the setof words accepted by the DFA in Figure 1.9. 1.53. Let I = {O, 1, 2, 3, 4,5,6,7,8, 9}. Let L, = {x E I *1 the sum of the digits of x is evenly divisible by n}. Thus, L7 = {A, 0, 7, 00, 07,16,25,34,43,52,59,61,68,70,77,86,95,000,007, ... }. (a) Draw a machine that will accept L7 • (b) Formally specify the five parts of the DFA given in part (a). (c) Draw a machine that will accept L3 . (d) Formally specify the five parts of the DFA given in part (c): (e) Formally specify the five parts of a DFA that will recognize Ln. 1.54. Consider the last row of Table 1.3. Unlike the preceding three rows, the outputs in this row are not marked with the don't-care symbol. Explain. CHAPTER CHARACTERIZATION of FAD LANGUAGES Programming languages can be thought of, in a limited sense, as conforming to the definition of a language given in Chapter 1. 
We can consider a text file as being one long "word," that is, a string of characters (including spaces, carriage returns, and so on). In this sense, each Pascal program can be thought of as a single word over the ASCII alphabet. We might define the language Pascal as the set of all valid Pascal programs (that is, the valid "words" are those text files that would compile with no compiler errors). This and many other languages are too complicated to be represented by the machines described in Chapter 1. Indeed, even reliably matching an unlimited number of begin and end statements in a file is beyond the capabilities of a DFA. The goals for this chapter are to develop some tools for identifying these non-FAD languages and to investigate the underlying structure of finite automaton definable languages. We begin with an exploration of the relations that describe that structure.

2.1 RIGHT CONGRUENCES

To characterize the structure of FAD languages, we will be dealing with relations over Σ*; that is, we will relate strings to other strings. Recall that an equivalence relation must be reflexive, symmetric, and transitive. The identity relation over Σ*, in which each string is related to itself but to no other string, is an example of an equivalence relation. The main tool we will need to understand which kinds of languages can be
For example, if δ(s, a) = t, then, given any word x in the class corresponding to the state s, appending an a to this word to form xa is guaranteed to produce a word listed in the class corresponding to the state t. Right congruences, defined below, allow us to break up Σ* in the same fashion that a DFA breaks up Σ*.

▽ Definition 2.1. Given an alphabet Σ, a relation R between pairs of strings (R ⊆ Σ* × Σ*) is a right congruence on Σ* iff the following four conditions hold:

(∀x ∈ Σ*) (x R x)  (R)
(∀x, y ∈ Σ*) (x R y ⇒ y R x)  (S)
(∀x, y, z ∈ Σ*) (x R y ∧ y R z ⇒ x R z)  (T)
(∀x, y ∈ Σ*) (x R y ⇒ (∀u ∈ Σ*)(xu R yu))  (RC)

Note that if P is a right congruence then the first three conditions imply that P must be an equivalence relation; for example, if Σ = {a, b}, aa P aa by reflexivity, and if (abb, aba) ∈ P, then by symmetry (aba, abb) ∈ P, and so forth. Furthermore, if abb P aba, then the right congruence property guarantees that

abba P abaa  if u = a
abbb P abab  if u = b
abbaa P abaaa  if u = aa
abbbbaabb P ababbaabb  if u = bbaabb

and so on. Thus, the presence of just one ordered pair in P requires the existence of many, many more ordered pairs. This might seem to make right congruences rather rare objects; there are, however, an infinite number of them, many of them rather simple, as shown by the following examples.

EXAMPLE 2.1

Let Σ = {a, b}, and let R be defined by x R y ⇔ |x| − |y| is even. It is easy to show that this R is an equivalence relation (see the exercises) and partitions Σ* into two equivalence classes: the even-length words and the odd-length words. Furthermore, R is a right congruence: for example, if x = abb and y = baabb, then abb R baabb, since |x| − |y| = 3 − 5 = −2, which is even. Note that for any choice of u, |xu| − |yu| will also be −2, and thus abbu R baabbu for every choice of u. The same is true for any other pair of words x and y that are related by R, and so R is indeed a right congruence.
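The four conditions of Definition 2.1 can be checked mechanically on a finite sample of strings. The following sketch (Python is used here for brevity, though the programming examples in this text are in Pascal) verifies (R), (S), (T), and (RC) for the relation R of Example 2.1 on all words over {a, b} of length at most 3, with suffixes u of length at most 2; a finite test cannot prove the properties in general, but it illustrates exactly what each condition demands.

```python
from itertools import product

SIGMA = "ab"

def words(max_len):
    """All words over SIGMA of length 0 through max_len."""
    return ["".join(p) for n in range(max_len + 1)
            for p in product(SIGMA, repeat=n)]

def related(x, y):
    """The relation R of Example 2.1: x R y iff |x| - |y| is even."""
    return (len(x) - len(y)) % 2 == 0

ws, us = words(3), words(2)

# (R) reflexivity, (S) symmetry, (T) transitivity, (RC) right congruence
assert all(related(x, x) for x in ws)
assert all(related(y, x) for x in ws for y in ws if related(x, y))
assert all(related(x, z) for x in ws for y in ws for z in ws
           if related(x, y) and related(y, z))
assert all(related(x + u, y + u)
           for x in ws for y in ws if related(x, y) for u in us)
```

Substituting the relation Ψ of Exercise 2.1 (|x| − |y| is odd) for `related` causes the reflexivity and transitivity checks to fail, which is one way to begin that exercise.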
EXAMPLE 2.2

Let Σ = {a, b, c}, and let R₂ be defined by x R₂ y ⇔ x and y end with the same letter. It is straightforward to show that R₂ is a right congruence (see the exercises) and partitions Σ* into four equivalence classes: those words ending in a, those words ending in b, those words ending in c, and {λ}.

The relation R₂ was based on the placement of letters within words, while Example 2.1 was based solely on the length of the words. The following definition illustrates a way to produce a relation on Σ* based on a given set of words L.

▽ Definition 2.2. Given an alphabet Σ and a language L ⊆ Σ*, the relation induced by L on Σ*, denoted by R_L, is defined by

(∀x, y ∈ Σ*)(x R_L y ⇔ (∀u ∈ Σ*)(xu ∈ L ⇔ yu ∈ L))

EXAMPLE 2.3

Let K be the set of all words over {a, b} that are of odd length. Those strings that are in K are used to define exactly which pairs of strings are in R_K. For example, we can determine that ab R_K bbaa, since it is true that, for any u ∈ Σ*, either abu ∉ K and bbaau ∉ K (when |u| is even) or abu ∈ K and bbaau ∈ K (when |u| is odd). Note that ab and a are not related by R_K, since there are choices for u that would violate the definition of R_K: with u = λ, abλ ∉ K and yet aλ ∈ K. In this case, R_K turns out to be the same as the relation R defined in Example 2.1. Recall that relations are sets of ordered pairs, and thus the claim that these two relations are equal means that they are equal as sets; an ordered pair belongs to R exactly when it belongs to R_K:

R = R_K iff (∀x, y ∈ Σ*)(x R y ⇔ x R_K y)

The strings ab and bbaa are related by R in Example 2.1, and they are likewise related by R_K. A similar statement is true for any other pair that was in the relation R; it will be in R_K, also. Additionally, it can be shown that elements that were not in R will not be in R_K either. Notice that R_K relates more than just the words in K; neither ab nor bbaa belongs to K, and yet they were related to each other.
This simple language K happens to partition Σ* into two equivalence classes, corresponding to the language itself and its complement. Less trivial languages will often form many equivalence classes. The relation R_L defined by a language L has all the properties given in Definition 2.1.

▽ Theorem 2.1. Let Σ be an alphabet. If L is any language over Σ (that is, L ⊆ Σ*), the relation R_L given in Definition 2.2 must be a right congruence.

Proof. See the exercises. Δ

Note that the above theorem is very broad in scope: any language, no matter how complex, always induces a relation that satisfies all four properties of a right congruence. Thus, R_L always partitions Σ* into equivalence classes. One useful measure of the complexity of a language L is the degree to which it fragments Σ*, that is, the number of equivalence classes of R_L.

▽ Definition 2.3. Given an equivalence relation P, the rank of P, denoted rk(P), is defined to be the number of distinct (and nonempty) equivalence classes of P. Δ

The rank of the relation in Example 2.3 was 2, since there were two equivalence classes, the set of even-length words and the set of odd-length words. In Example 2.2, rk(R₂) = 4. The rank of R_L can be thought of as a measure of the complexity of the underlying language L. Thus, for K in Example 2.3, rk(R_K) = 2, and K might consequently be considered to be a relatively simple language. Some languages are too complex to be recognized by finite automata; this relationship will be explored in the subsequent sections. While the way in which a language gives rise to a partition of Σ* may seem mysterious and highly nonintuitive, a deterministic finite automaton naturally distributes the words of Σ* into equivalence classes. The following definition describes the manner in which a DFA partitions Σ*.

▽ Definition 2.4.
Given a DFA M = <Σ, S, s₀, δ, F>, define a relation R^M on Σ* as follows:

(∀x, y ∈ Σ*)(x R^M y ⇔ δ(s₀, x) = δ(s₀, y))

R^M relates all strings that, when starting at s₀, wind up at the same state. It is easy to show that R^M will be an equivalence relation with (usually) one equivalence class for each state of M (remember that equivalence classes are by definition nonempty; what type of state might not have an equivalence class associated with it?). It is also straightforward to show that the properties of the state transition function guarantee that R^M is in fact a right congruence (see the exercises). The equivalence classes of R^M are called initial sets and will be of further interest in later chapters. For a DFA M = <Σ, S, s₀, δ, F> and a given state t from M, I(M, t) = {x | δ(s₀, x) = t}. This initial set can be thought of as the language accepted by a machine similar to M, but which has t as its only final state. That is, if we define M_t = <Σ, S, s₀, δ, {t}>, then I(M, t) = L(M_t).

The notation presented here allows a concise method of denoting both relations defined by languages and relations defined by automata. It is helpful to observe that even in the absence of context, R_X indicates that a relation based on the language X is being described (since X occurs as a subscript), while the relation R^Y identifies Y as a machine (since Y occurs as a superscript).

Just as each DFA M gives rise to a right congruence R^M, many right congruences Q can be associated with a DFA, which will be called A_Q. It can be shown that, if some of the equivalence classes of Q are singled out to form a language L, A_Q will recognize L.

▽ Definition 2.5.
Given a right congruence Q of finite rank and a language L that is the union of some of the equivalence classes of Q, A_Q is defined by

A_Q = <Σ, S_Q, s0_Q, δ_Q, F_Q>

where

S_Q = {[x]_Q | x ∈ Σ*}
s0_Q = [λ]_Q
F_Q = {[x]_Q | x ∈ L}

and δ_Q is defined by

(∀x ∈ Σ*)(∀a ∈ Σ)(δ_Q([x]_Q, a) = [xa]_Q)

Note that this is a finite-state machine since rk(Q) < ∞, and that if L₁ were a different collection of equivalence classes of Q, A_Q would remain the same except for the placement of the final states. In other words, F_Q is the only aspect of this machine that depends on the language L (or L₁). As small as this change might be, it should be noted that A_Q is defined both by Q and the language L. It is left for the reader to show that A_Q is well-defined and that L(A_Q) = L (see the exercises). The corresponding statements will be proved in detail in the next section for the important special case where Q = R_L.

EXAMPLE 2.4

Let Q ⊆ {a}* × {a}* be the equivalence relation with the following equivalence classes:

[λ]_Q = {λ} = {a}⁰
[a]_Q = {a} = {a}¹
[aa]_Q = {a}² ∪ {a}³ ∪ {a}⁴ ∪ {a}⁵ ∪ ...

It is easy to show that Q is a right congruence (see the exercises). If L₁ were defined to be [λ]_Q ∪ [a]_Q, then A_Q would have the structure shown in Figure 2.1a. For the language defined by the different combination of equivalence classes given by L₂ = [λ]_Q ∪ [aa]_Q, A_Q would look like the DFA given in Figure 2.1b. This example illustrates that it is the right congruence Q that establishes the start state and the transitions, while the language L determines the final state set. It should also be clear why L must be a union of equivalence classes from Q. The figure shows that a machine with the structure imposed by Q cannot possibly both reject aaa and accept aaaa. Either the entire equivalence class [aa]_Q must belong to L, or none of the strings from [aa]_Q can belong to L.
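The construction of Definition 2.5 can be carried out concretely for the rank-3 congruence Q of Example 2.4. In the Python sketch below (an illustration, not part of the formal development), each state of A_Q is named by a canonical representative of its class, δ_Q follows the rule δ_Q([x]_Q, a) = [xa]_Q, and only the choice of final classes distinguishes the machine for L₁ = [λ]_Q ∪ [aa... the first language from the second (called L₂ here, since only its final-state set differs).

```python
def cls(x):
    """Canonical representative of the Q-class of x (Example 2.4)."""
    return x if len(x) < 2 else "aa"

STATES = ["", "a", "aa"]                   # [lambda]_Q, [a]_Q, [aa]_Q
DELTA = {q: cls(q + "a") for q in STATES}  # delta_Q([x]_Q, a) = [xa]_Q

def accepts(final_classes, word):
    q = ""                                 # start state [lambda]_Q
    for _ in word:                         # the only input letter is a
        q = DELTA[q]
    return q in final_classes

L1_FINALS = {"", "a"}    # L1 = [lambda]_Q union [a]_Q
L2_FINALS = {"", "aa"}   # L2 = [lambda]_Q union [aa]_Q

assert accepts(L1_FINALS, "") and accepts(L1_FINALS, "a")
assert not accepts(L1_FINALS, "aa")
assert accepts(L2_FINALS, "aaa") and not accepts(L2_FINALS, "a")
```

Only the final-state sets differ between the two machines, just as the text observes: Q fixes the start state and the transitions, and L fixes F_Q.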
Figure 2.1 (a) The automaton for L₁ in Example 2.4 (b) The automaton for L₂ in Example 2.4

2.2 NERODE'S THEOREM

In this section, we will show that languages that partition Σ* into a finite number of equivalence classes can be represented by finite automata, while those that yield an infinite number of classes would require a machine with an infinite number of states.

EXAMPLE 2.5

The language K given in Example 2.3 can be represented by a finite automaton with two states; all words that have an even number of letters eventually wind up at state s₀, while all the odd words are taken by δ to s₁. This machine is shown in Figure 2.2. It is no coincidence that these states split up the words of Σ* into the same equivalence classes that R_K does.

There is an intimate relationship between languages that can be represented by a machine with a finite number of states and languages that induce right congruences with a finite number of equivalence classes, as shown by the following theorem.

Figure 2.2 The DFA discussed in Example 2.5

▽ Theorem 2.2: Nerode's Theorem. Let L be a language over an alphabet Σ; the following statements are all equivalent:

1. L is FAD.
2. There exists a right congruence R on Σ* for which L is the (possibly empty) union of some of the equivalence classes of R and rk(R) < ∞.
3. rk(R_L) < ∞.

Proof. Because of the transitivity of ⇒, it will be sufficient to show only the three implications (1) ⇒ (2), (2) ⇒ (3), and (3) ⇒ (1), rather than all six of them.

Proof of (1) ⇒ (2): Assume (1); that is, let L be FAD. Then there is a machine that accepts L; that is, there exists a finite automaton M = <Σ, S, s₀, δ, F> such that L(M) = L. Consider the relation R^M on Σ* based on this machine M as given in Definition 2.4: (∀x, y ∈ Σ*)(x R^M y ⇔ δ(s₀, x) = δ(s₀, y)). This R^M will be the relation R we need to prove (2).
For each s ∈ S, consider I(M, s) = {x ∈ Σ* | δ(s₀, x) = s}, which represents all strings that wind up at state s (from s₀). Note that it is easy to define automata for which it is impossible to reach certain states from the start state; for such states, I(M, s) would be empty. Then ∀s ∈ S, I(M, s) is either an equivalence class of R^M or I(M, s) = ∅. Since there is at most one equivalence class per state, and there are a finite number of states, it follows that rk(R^M) is also finite: rk(R^M) ≤ ||S|| < ∞. However, we have

L = L(M) = {x ∈ Σ* | δ(s₀, x) ∈ F} = ∪_{f∈F} {x ∈ Σ* | δ(s₀, x) = f} = ∪_{f∈F} I(M, f)

That is, L is the union of some of the equivalence classes of the right congruence R^M, and R^M is indeed of finite rank, and hence (2) is satisfied. Thus (1) ⇒ (2).

Proof of (2) ⇒ (3): Assume that (2) holds; that is, there is a right congruence R for which L is the union of some of the equivalence classes of R, and rk(R) < ∞. Note that we no longer have (1) as an assumption; there is no machine (as yet) associated with L.

Case 1: It could be that L is the empty union; that is, that L = ∅. In this case, it is easy to show that R_L has only one equivalence class (Σ*), and thus rk(R_L) = 1 < ∞ and (3) will be satisfied.

Case 2: In the nontrivial case, L is the union of one or more of the equivalence classes of the given right congruence R, and it is possible to show that this R must then be closely related to the R_L induced by the original language L. In particular, for any strings x and y,

x R y ⇒ (since R is a right congruence)
(∀u ∈ Σ*)(xu R yu) ⇒ (by definition of [ ])
(∀u ∈ Σ*)([xu]_R = [yu]_R) ⇒ (by definition of L as a union of [ ]'s)
(∀u ∈ Σ*)(xu ∈ L ⇔ yu ∈ L) ⇒ (by definition of R_L)
x R_L y

(∀x ∈ Σ*)(∀y ∈ Σ*)(x R y ⇒ x R_L y) means that R refines R_L, and thus each equivalence class of R is entirely contained in an equivalence class of R_L; that is, each equivalence class of R_L must be a union of one or more equivalence classes of R.
Thus, there are at least as many equivalence classes in R as in R_L, and so rk(R_L) ≤ rk(R). But by hypothesis, rk(R) is finite, and so R_L must be of finite rank also, and (3) is satisfied. Thus, in either case, (2) ⇒ (3).

Proof of (3) ⇒ (1): Assume now that condition (3) holds; that is, L is a language for which R_L is of finite rank. Once again, note that all we know is that R_L has a finite number of equivalence classes; we do not have either (1) or (2) as a hypothesis. Indeed, we wish to show (1) by proving that L is accepted by some finite automaton. We will base the structure of this automaton on the right congruence R_L, using Definition 2.5 with Q = R_L. A_{R_L} is then defined by

A_{R_L} = <Σ, S_{R_L}, s0_{R_L}, δ_{R_L}, F_{R_L}>

where

S_{R_L} = {[x]_{R_L} | x ∈ Σ*}
s0_{R_L} = [λ]_{R_L}
F_{R_L} = {[x]_{R_L} | x ∈ L}

and δ_{R_L} is defined by

(∀x ∈ Σ*)(∀a ∈ Σ)(δ_{R_L}([x]_{R_L}, a) = [xa]_{R_L})

The basic idea in this construction is to define one state for each equivalence class in R_L, use the equivalence class containing λ as the start state, use those classes that were made up of words in L as final states, and define δ in a natural manner. We claim that this machine is really a well-defined finite automaton and that it does behave as we wish it to; that is, the language accepted by A_{R_L} really is L. In other words, L(A_{R_L}) = L. First, note that S_{R_L} is a finite set, since [by the only assumption we have in (3)] R_L consists of only a finite number of equivalence classes. It can be shown that F_{R_L} is well defined; if [z]_{R_L} = [y]_{R_L}, then either (both z ∈ L and y ∈ L) or (neither z nor y belongs to L) (why?). The reader should show that δ_{R_L} is similarly well defined; that is, if [z]_{R_L} = [y]_{R_L}, it follows that δ_{R_L} is forced to take both transitions to the same state ([za]_{R_L} = [ya]_{R_L}).
Also, a straightforward induction on |y| shows that the rule for δ_{R_L} on single letters extends to a similar rule for strings:

(∀x ∈ Σ*)(∀y ∈ Σ*)(δ_{R_L}([x]_{R_L}, y) = [xy]_{R_L})

With this preliminary work out of the way, it is possible to easily show that L(A_{R_L}) = L. Let x be any element of Σ*. Then

x ∈ L(A_{R_L}) ⇔ (by definition of L(A_{R_L}))
δ_{R_L}(s0_{R_L}, x) ∈ F_{R_L} ⇔ (by definition of s0_{R_L})
δ_{R_L}([λ]_{R_L}, x) ∈ F_{R_L} ⇔ (by definition of δ_{R_L} and induction)
[λx]_{R_L} ∈ F_{R_L} ⇔ (by definition of λ)
[x]_{R_L} ∈ F_{R_L} ⇔ (by definition of F_{R_L})
x ∈ L

Consequently, L is exactly the language accepted by this finite automaton; so L must be FAD, and (1) is satisfied. Thus (3) ⇒ (1). We have therefore come full circle, and all three conditions are equivalent. Δ

The correspondence described by Nerode's theorem can best be illustrated by an example.

Figure 2.3 The DFA N discussed in Example 2.6

EXAMPLE 2.6

Let L be the following FAD language over Σ = {0, 1}: L = Σ* − {λ} = Σ⁺. There are many finite automata that accept L, one of which is the DFA N given in Figure 2.3. This four-state machine gives rise to a four-equivalence-class right congruence as described in (1) ⇒ (2), where

[λ]_{R^N} = I(N, s₀) = {λ}, since λ is the only string that ends up at s₀
[1]_{R^N} = I(N, s₁) = {y | |y| is odd, and y ends with a 1} = {z | δ(s₀, z) = s₁}
[11]_{R^N} = I(N, s₂) = {y | |y| is even} − {λ} = {z | δ(s₀, z) = s₂}
[000]_{R^N} = I(N, s₃) = {y | |y| is odd, and y ends with a 0} = {z | δ(s₀, z) = s₃}

Note that L is indeed I(N, s₁) ∪ I(N, s₂) ∪ I(N, s₃), which is the union of all the equivalence classes that correspond to final states in N, as required by (2).

To illustrate (2) ⇒ (3), let R be the equivalence relation R^N defined above, let L again be Σ⁺, and note that (2) is satisfied: L = [1]_R ∪ [11]_R ∪ [000]_R, the union of three of the equivalence classes of a right congruence of rank 4 (which is finite). As in the proof of (2) ⇒ (3), R_L is refined by R, but in this case R and R_L are not equal.
All the relations from R still hold, such as 11 R 1111, so 11 R_L 1111; 0 R 000, and thus 0 R_L 000; and so forth. It can also be shown that 11 R_L 000, even though 11 and 000 were not related by R (apply the definition of R_L to convince yourself of this). Thus, everything in [11]_R is related by R_L to everything in [000]_R; that is, all these strings belong to the same equivalence class of R_L, even though they formed separate equivalence classes in R. It may at first appear strange, but the fact that there are more relations in R_L means that there are fewer equivalence classes in R_L than in R. Indeed, R_L has only two equivalence classes, {λ} and L. In this case, three equivalence classes of R collapse to form one large equivalence class of R_L. Thus

{λ} = [λ]_{R_L} = [λ]_R and L = [11]_{R_L} = [1]_R ∪ [11]_R ∪ [000]_R

and, as we were assured by (2) ⇒ (3), R refines R_L.

To illustrate (3) ⇒ (1), let's continue to use the L and R_L given above. Since R_L is of rank 2, we are assured of finding a two-state machine that will accept L. A_{R_L} in this case would take the form of the automaton P displayed in Figure 2.4. In this DFA, for example, δ([11]_{R_L}, 0) = [110]_{R_L} = [11]_{R_L}, and [λ]_{R_L} is the start state. [11]_{R_L} is a final state since 11 ∈ L. Verify that this machine accepts all words except λ; that is, L(A_{R_L}) = L.

Figure 2.4 The DFA P discussed in Example 2.6

EXAMPLE 2.7

Assume Q is defined to be the right congruence R given in Example 2.6, and L is again Σ⁺, which is the union of three of the equivalence classes of Q: [1]_Q, [11]_Q, and [000]_Q. The automaton A_Q is given in Figure 2.5.

Figure 2.5 The automaton discussed in Example 2.7

If we were instead to begin with the same language L, but use the two-state machine P at the end of Example 2.6 to represent L, we would find that L would consist of only one equivalence class, R^P would have only two equivalence classes, and R^P would in this case be the same as R_L (see the exercises).
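The collapse from the four classes of R to the two classes of R_L can also be observed computationally. The Python sketch below (illustrative only; it approximates the quantifier over all u ∈ Σ* in Definition 2.2 by testing suffixes up to a fixed length, which happens to suffice for this language) groups short words over {0, 1} by their behavior with respect to L = Σ⁺ and finds exactly two classes: {λ} and everything else.

```python
from itertools import product

SIGMA = "01"

def in_L(w):
    """L = Sigma+ : every nonempty word over {0, 1}."""
    return len(w) > 0

def words(max_len):
    return ["".join(p) for n in range(max_len + 1)
            for p in product(SIGMA, repeat=n)]

SUFFIXES = words(3)   # finite stand-in for "all u in Sigma*"

def signature(x):
    """x R_L y should hold exactly when x and y share this signature."""
    return tuple(in_L(x + u) for u in SUFFIXES)

classes = {}          # signature -> first representative found
for x in words(4):
    classes.setdefault(signature(x), x)

assert len(classes) == 2                      # rk(R_L) = 2
assert sorted(classes.values()) == ["", "0"]  # {lambda} and L
```

By Definition 2.5, these two classes become the two states of A_{R_L}, the machine P of Figure 2.4.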
R^P turns out to be as simple as R_L because the machine we started with was as "simple" as we could get and still represent L. In Chapter 3 we will characterize the idea of a machine being "as simple as possible," that is, minimal. The two machines given in Example 2.6 accept the same language. It will be convenient to formalize this notion of distinct machines "performing the same task," and we therefore make the following definition.

▽ Definition 2.6. Two DFAs A and B are equivalent iff L(A) = L(B). Δ

EXAMPLE 2.8

The DFAs N from Example 2.6 and A_Q from Example 2.7 are equivalent since L(N) = Σ⁺ = L(A_Q).

▽ Definition 2.7. A DFA A = <Σ, S_A, s0_A, δ_A, F_A> is minimal iff for every DFA B = <Σ, S_B, s0_B, δ_B, F_B> for which L(A) = L(B), |S_A| ≤ |S_B|. Δ

An automaton is therefore minimal if no equivalent machine has fewer states.

EXAMPLE 2.9

The DFA N from Example 2.6 is clearly not minimal since the automaton A_Q from Example 2.7 is equivalent and has fewer states than N. The techniques from Chapter 3 can be used to verify that the automaton A_Q from Example 2.7 is minimal. More importantly, minimization techniques will be explored in Chapter 3 that will allow an optimal machine (like this A_Q) to be produced from an inefficient automaton (like N).

2.3 PUMPING LEMMAS

As you have probably noticed by now, finding R_L and counting the equivalence classes is not a very practical way of verifying that a suspected language cannot be defined by a finite automaton. It would be nice to have a better way to determine if a given language is unwieldy. The pumping lemma will supply us with such a technique. It is based on the observation that if your automaton processes a "long enough" word it must eventually visit (at least) one state more than once. Let A = <Σ, S, s₀, δ, F>, and consider starting at some state s and processing a word x of length 5.
We will pass through state s and perhaps five other states, although these states may not all be distinct if we visit some of them repeatedly while processing the five letters in x; thus the total will be six states (or less). Note that if A has 10 states (||S|| = 10) and |x| = 12, we cannot go to 13 different states while processing x; (at least) one state must be visited more than once.

Figure 2.6 A path with a loop

In general, if n = ||S||, then any string x whose length is equal to or greater than n must pass through some state q twice while being processed by A, as shown in Figure 2.6. Here the arrows are meant to represent the path taken while processing several letters, and the intermediate states are not shown. The strings u, v, and w are defined as

u = the first few letters of x, which take us to the state q
v = the next few letters of x, which again take us back to q
w = the rest of the string x

Then, with x = uvw, we have δ(s, u) = q, δ(s, uv) = q, and in fact δ(q, v) = q. Also, δ(s, x) = δ(s, uvw) = f and δ(q, w) = f, as is clear from the diagram. Now consider the string uw, that is, the string x with the v part "removed":

δ(s, uw) = δ(δ(s, u), w) = δ(q, w) = f (why?)

That is, the string uw winds up in the same place uvw does; this is illustrated in Figure 2.7a. Note that a similar thing happens if uv²w is processed:

δ(s, uvvw) = δ(δ(s, u), vvw) = δ(q, vvw) = δ(δ(q, v), vw) = δ(q, vw) = δ(δ(q, v), w) = δ(q, w) = f

This behavior is illustrated in Figure 2.7b.

Figure 2.7 (a) The path that bypasses the loop (b) The path that traverses the loop twice

In general, it can be proved by induction that (∀i ∈ ℕ)(δ(s, uvⁱw) = f = δ(s, uvw)). Notice that we do reach q two distinct times, which implies that the string v contains at least one letter; that is, |v| ≥ 1. Also, after the first n letters of x, we must have already repeated a state, and thus some state q can be found such that |uv| ≤ n.
If s happens to be the start state s₀ and f is a final state, we have now shown that: If A = <Σ, S, s₀, δ, F>, where ||S|| = n, then, given any string x = a₁a₂a₃ ... a_m, where m ≥ n and δ(s₀, x) = f ∈ F [which implies x ∈ L(A)], the states δ(s₀, λ), δ(s₀, a₁), δ(s₀, a₁a₂), δ(s₀, a₁a₂a₃), ..., δ(s₀, a₁a₂ ... a_n) cannot all be distinct, and so x can be broken up into strings u, v, and w such that

x = uvw, |uv| ≤ n, |v| ≥ 1

and (∀i ∈ ℕ)(δ(s₀, uvⁱw) = f); that is, (∀i ∈ ℕ)(uvⁱw ∈ L(A)). In other words, given any "long" string in L(A), there is a part of the string that can be "pumped" to produce even longer strings in L(A). Thus, if L is FAD, there exists an automaton (with a finite number n of states), and thus for some n ∈ ℕ the above statement should hold. We have just proved what is generally known as the pumping lemma, which we now state formally.

▽ Theorem 2.3: The Pumping Lemma. Let L be an FAD language over Σ. Then (∃n ∈ ℕ)(∀x ∈ L ∋ |x| ≥ n)(∃u, v, w ∈ Σ*) ∋ x = uvw, |uv| ≤ n, |v| ≥ 1, and (∀i ∈ ℕ)(uvⁱw ∈ L).

Proof. Given above. Δ

EXAMPLE 2.10

Let E be the set of all even-length words over {a, b}. There is a two-state machine that accepts E, so E is FAD, and the pumping lemma applies if n is, say, 5. Then ∀x ∋ |x| > 5, if x = a₁a₂a₃ ... a_j ∈ E (that is, j is even), we can choose u = λ, v = a₁a₂, and w = a₃a₄ ... a_j. Note that |uv| = 2 ≤ 5, |v| = 2 ≥ 1, and |uvⁱw| = j + 2(i − 1), which is even, and so (∀i ∈ ℕ)(uvⁱw ∈ E).

If Example 2.10 does not appear truly exciting, there is good reason: the pumping lemma is generally not applied to FAD languages! (Note: We will see an application later.) The pumping lemma is often applied to show languages are not FAD (by proving that the language does not satisfy the pumping lemma). Note that the contrapositive of Theorem 2.3 is:

▽ Theorem 2.4. Let L be a language over Σ. If

(∀n ∈ ℕ)(∃x ∈ L ∋ |x| ≥ n)(∀u, v, w ∈ Σ* ∋ x = uvw, |uv| ≤ n, |v| ≥ 1)(∃i ∈ ℕ ∋ uvⁱw ∉ L),

then L is not FAD.

Proof. See the exercises.
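The decomposition used in Example 2.10 is easy to check mechanically. The Python fragment below (an illustration using the same choices as the example: u = λ and v = the first two letters) confirms that pumping v any number of times keeps the word in E.

```python
def in_E(x):
    """E: the even-length words over {a, b}."""
    return len(x) % 2 == 0

def pump(u, v, w, i):
    """The pumped word u v^i w."""
    return u + v * i + w

n = 5
x = "ababab"                    # a word of E with |x| > n
u, v, w = "", x[:2], x[2:]      # the decomposition from Example 2.10
assert len(u + v) <= n and len(v) >= 1
assert all(in_E(pump(u, v, w, i)) for i in range(10))
```

Note that i = 0 is included: removing v entirely also leaves an even-length word, which is the case exploited later in the discussion preceding Theorem 2.7.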
EXAMPLE 2.11

Consider L₄ = {y ∈ {0, 1}* | |y|₁ = |y|₀}. We will use Theorem 2.4 to show L₄ is not FAD: Let n be given, and choose x = 0ⁿ1ⁿ. Then x ∈ L₄, since |x|₁ = n = |x|₀. It should be observed that x must be dependent on n, and we have no control over n (in particular, n cannot be replaced by some constant; similarly, while i may be chosen to be a fixed constant, a proof that covers all possible combinations of u, v, and w must be given). Note that this choice of x is "long enough" in that |x| = 2n ≥ n, as required by Theorem 2.4. For any combination of u, v, w ∈ Σ* such that x = uvw, |uv| ≤ n, |v| ≥ 1, we hope to find a value for i such that uvⁱw ∉ L₄. Since |uv| ≤ n and the first n letters of x are all zeros, this narrows down the choices for u, v, and w. They must be of the form u = 0ʲ and v = 0ᵏ (since |uv| ≤ n and x starts with n zeros), and w must be the "rest of the string" and look something like w = 0ᵐ1ⁿ. The constraints on u, v, and w imply that j + k ≤ n, k ≥ 1, and j + k + m = n. If i = 2, we have that uv²w = 0ⁿ⁺ᵏ1ⁿ ∉ L₄ (why?). Thus, by Theorem 2.4, L₄ is not FAD [or, alternately, because the conclusion of the pumping lemma (Theorem 2.3) does not hold, L₄ cannot be FAD].

It is instructive to attempt to build a DFA that attempts to recognize the language L₄. As you begin to see what such a machine must look like, it will become clear that no matter how many states you add (that is, no matter how large n becomes) there will always be some strings ("long" strings) that would require even more states. Your construction may also suggest what the equivalence classes of R_{L₄} must look like (see the exercises). How many equivalence classes are there? (You should be able to answer this last question without referring to any constructions.) A similar argument can be made to show that no DFA can recognize the set of all fully parenthesized infix expressions (see the exercises).
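The argument of Example 2.11 quantifies over every legal decomposition of x = 0ⁿ1ⁿ, and for a fixed n that exhaustive check is small enough to run directly. The Python sketch below (illustrative, with n fixed at 8) enumerates all u, v, w with x = uvw, |uv| ≤ n, |v| ≥ 1 and confirms that i = 2 always pumps the word out of L₄.

```python
n = 8
x = "0" * n + "1" * n           # the witness string from Example 2.11

def in_L4(y):
    """L4: words over {0, 1} with equally many ones and zeros."""
    return y.count("0") == y.count("1")

assert in_L4(x)

# Every decomposition x = uvw with |uv| <= n, |v| >= 1 fails at i = 2,
# because v lies entirely within the leading zeros.
for uv_len in range(1, n + 1):
    for v_len in range(1, uv_len + 1):
        u = x[:uv_len - v_len]
        v = x[uv_len - v_len:uv_len]
        w = x[uv_len:]
        assert u + v + w == x
        assert not in_L4(u + v * 2 + w)   # i = 2 adds extra zeros
```

Running this for one n is not a proof, of course; the proof in the text works because the same argument succeeds for every n at once.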
Matching parentheses, like matching 0s and 1s in the last example, requires unlimited storage. We have seen that DFAs are adequate vehicles for pattern matching and token identification, but a more complex model is clearly needed to implement functions like the parsing of arithmetic expressions. Pushdown automata, discussed in Chapter 10, augment the finite memory with an unbounded stack, allowing more complex languages to be recognized. Intuitively, we would not expect finite-state machines to be able to differentiate between arbitrarily long integers. While modular arithmetic, which only differentiates between a finite number of remainders, should be representable by finite automata, unrestricted arithmetic is likely to be impossible. For example, {aⁱbʲcᵏ | i, j, k ∈ ℕ and i + j = k} cannot be recognized by any DFA, while the language {aⁱbʲcᵏ | i, j, k ∈ ℕ and i + j ≡ k mod 3} is FAD. Checking whether two numbers are relatively prime is likewise too difficult for a DFA, as shown by the proof in the following example.

EXAMPLE 2.12

Consider L = {aⁱbʲ | i, j ∈ ℕ and i and j are relatively prime}. We will use Theorem 2.4 to show L is not FAD: Let n be given, and choose a prime p larger than n + 1 (we can be assured such a p exists since there are an infinite number of primes). Let x = a^p b^((p−1)!). Since p has no factors other than 1 and p, it has no nontrivial factor in common with (p−1)·(p−2)· ... ·3·2·1, and so p and (p−1)! are relatively prime, which guarantees that x ∈ L. The length of x is clearly greater than n, so Theorem 2.3 should apply, which implies that there must exist a combination u, v, w ∈ Σ* such that x = uvw, |uv| ≤ n, |v| ≥ 1; we hope to find a value for i such that uvⁱw ∉ L. Since |uv| ≤ n and the first n letters of x are all a's, there must exist integers j, k, and m for which u = aʲ and v = aᵏ, and w must be the "rest of the string"; that is, w = aᵐ b^((p−1)!). The constraints on u, v, and w imply that j + k ≤ n, k ≥ 1, and j + k + m = p.
If i = 0, we have that uv⁰w = a^(p−k) b^((p−1)!). But p − k is a number between p − 1 and p − n and hence must match one of the nontrivial factors in (p−1)!, which means that uv⁰w ∉ L (why?). Therefore, Theorem 2.3 has been violated, so L could not have been FAD.

The details of the basic argument used to prove the pumping lemma can be varied to produce other theorems of a similar nature: for example, when processing x, there must be a state q′ repeated within the last n letters. This gives rise to the following variation of the pumping lemma.

▽ Theorem 2.5. Let L be an FAD language over Σ. Then (∃n ∈ ℕ)(∀x ∈ L ∋ |x| ≥ n)(∃u, v, w ∈ Σ*) ∋ x = uvw, |vw| ≤ n, |v| ≥ 1, and (∀i ∈ ℕ)(uvⁱw ∈ L).

Proof. See the exercises. Δ

The new condition |vw| ≤ n reflects the constraint that some state must be repeated within the last n letters. The contrapositive of Theorem 2.5 can be useful in demonstrating that certain languages are not FAD. By repeating our original reasoning while assuming the string x takes us to a nonfinal state, we obtain yet another variation.

▽ Theorem 2.6. Let L be an FAD language over Σ. Then (∃n ∈ ℕ)(∀x ∉ L ∋ |x| ≥ n)(∃u, v, w ∈ Σ*) ∋ x = uvw, |uv| ≤ n, |v| ≥ 1, and (∀i ∈ ℕ)(uvⁱw ∉ L).

Proof. See the exercises. Δ

Notice that Theorem 2.6 guarantees that if one "long" string is not in the language then there is an entire sequence of strings that cannot be in the language. There are some examples of languages in the exercises where Theorem 2.4 is hard to apply, but where Theorem 2.5 (or Theorem 2.6) is appropriate. When i = 0, the pumping lemma states that given a "long" string (uvw) in L there is a shorter string (uw) that is also in L. If this new string is still of length greater than n, the pumping lemma can be reapplied to find a still shorter string, and so on. This technique is the basis for proving the following theorem.

▽ Theorem 2.7. Let M be an n-state DFA accepting L. Then (∀x ∈ L ∋ x = a₁a₂ ... a_m and m ≥ n)(∃ an increasing sequence i₁, i₂, ..., i_j) for which a_{i₁}a_{i₂} ... a_{i_j} ∈ L, and j < n.

Proof. See the exercises. Δ

Note that a_{i₁}a_{i₂} ... a_{i_j} represents a string formed by "removing" letters from perhaps several places in x, and that this new string has length less than n. Theorem 2.7 can be applied in areas that do not initially seem to relate to DFAs. Consider an arbitrary right congruence R of (finite) rank n. It can be shown that each equivalence class of R is guaranteed to contain a representative of length less than n. For example, consider the relation R given by

[λ]_R = {λ}
[11111]_R = {y | |y| is odd, and y ends with a 1}
[0101]_R = {y | |y| is even and |y| > 0}
[00000]_R = {y | |y| is odd, and y ends with a 0}

In this relation, rk(R) = 4, and appropriate representatives of length less than 4 are λ, 1, 11, and 100, respectively. That is, [λ]_R = [λ]_R, [1]_R = [11111]_R, [11]_R = [0101]_R, and [100]_R = [00000]_R. By constructing a DFA based on the right congruence R, Theorem 2.7 can be used to prove that every equivalence class of R has a "short" representative (see the exercises).
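The "short representative" argument can be made concrete by building the DFA whose states are the four classes of the relation R just given and searching it breadth-first: the shortest word reaching each state is a representative of length less than rk(R) = 4. The following Python sketch (the state names are descriptive labels invented here, not notation from the text) does exactly that.

```python
from collections import deque

# States are the four classes of R; transitions follow [x] --a--> [xa].
START = "empty"                 # the class of lambda
DELTA = {
    ("empty", "0"): "odd_0", ("empty", "1"): "odd_1",
    ("odd_0", "0"): "even",  ("odd_0", "1"): "even",
    ("odd_1", "0"): "even",  ("odd_1", "1"): "even",
    ("even",  "0"): "odd_0", ("even",  "1"): "odd_1",
}

# Breadth-first search finds the shortest word reaching each class.
shortest = {START: ""}
queue = deque([START])
while queue:
    q = queue.popleft()
    for a in "01":
        t = DELTA[(q, a)]
        if t not in shortest:
            shortest[t] = shortest[q] + a
            queue.append(t)

rank = 4
assert all(len(w) < rank for w in shortest.values())
```

Here the representatives found are λ, 0, 1, and 00; the text's choices λ, 1, 11, and 100 are equally valid, since any member of a class may serve as its representative.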
The states corresponding to desired words can simply be made final states. More reasonable enhancements to automata will be explored later. Nondeterminism will be presented in Chapter 4, and machines with extended capabilities will be defined and investigated in Chapters 10 and 11.

EXERCISES

2.1. Let Σ = {a, b, c}. Show that the relation Ψ ⊆ Σ* × Σ* defined by xΨy ⇔ |x| − |y| is odd is not a right congruence. (Is it an equivalence relation?)

2.2. Let Σ = {a, b, c}. Consider the relation Q ⊆ Σ* × Σ* defined by xQy ⇔ |x|_a − |y|_a ≡ 0 mod 3. (a) Show that Q is an equivalence relation. (b) Assume that part (a) is true, and show that Q is a right congruence.

2.3. Let Σ = {a, b, c}. Find all languages L such that rk(R_L) = 1. Justify your conclusions.

2.4. Let P ⊆ {0, 1}* × {0, 1}* be the equivalence relation with the following equivalence classes:

[Λ]_P = {Λ} = {0, 1}⁰
[1]_P = {0, 1} = {0, 1}¹
[00]_P = {0, 1}² ∪ {0, 1}³ ∪ {0, 1}⁴ ∪ {0, 1}⁵ ∪ ···

Show that P is a right congruence.

2.5. For the relation P defined in Exercise 2.4, find all languages L ∋ R_L = P.

2.6. For the relation Q defined in Exercise 2.2, find all languages L ∋ R_L = Q.

2.7. Let Σ = {a, b}. Define the relation Q by ΛQΛ, and (∀x ≠ Λ)¬(ΛQx), and (∀x ≠ Λ)(∀y ≠ Λ)[xQy ⇔ (|x| is even ∧ |y| is even) ∨ (|x| is odd ∧ |y| is odd)], which implies that (∀x ≠ Λ)(∀y ≠ Λ)[¬(xQy) ⇔ (|x| is even ∧ |y| is odd) ∨ (|x| is odd ∧ |y| is even)].
(a) Show that Q is a right congruence, and list the equivalence classes.
(b) Define L = [Λ]_Q ∪ [aa]_Q. Find a simple description for L, and list the equivalence classes of R_L. (Note that Q does refine R_L.)
(c) Draw a machine with states corresponding to the equivalence classes of Q. Arrange the final states so that the machine accepts L (that is, find A_Q).
(d) Draw A_{R_L}.
(e) Consider the machine in part (c) above (A_Q). Does it look like A_{R_L}? Can you rearrange the final states in A_Q (producing a new language K) so that A_{R_K} looks like your new A_Q? Illustrate.
(f) Consider all eight languages found by taking unions of equivalence classes from Q, and see which ones would satisfy the criteria of part (e).

2.8. Let Σ = {a}. Let I be the identity relation on Σ*. (a) Show that I is a right congruence. (b) What do the equivalence classes of I look like?

2.9. Let Σ = {a}, and let I be the identity relation on Σ*. Let L = {Λ} ∪ {a} ∪ {aa}, which is the union of three of the equivalence classes of I. I has infinite rank. Does Nerode's theorem imply that L is not FAD? Explain.

2.10. Define a machine M = <Σ, S_M, s₀, δ, F_M> for which rk(R_{L(M)}) ≠ ||S_M||.

2.11. Carefully show that F_{R_L} is a well-defined set; that is, show that the rule that assigns equivalence classes to F_{R_L} is unambiguous.

2.12. Carefully show that δ_{R_L} is well defined; that is, that δ_{R_L} is a function.

2.13. Use induction to show that (∀x ∈ Σ*)(∀y ∈ Σ*)(δ̄_{R_L}([x]_{R_L}, y) = [xy]_{R_L}).

2.14. Consider the automaton P derived in Example 2.6; find R_P and notice that R_P = R_L.

2.15. Find R_A for each machine A you built in the exercises of Chapter 1; compare R_A to R_{L(A)}.

2.16. Prove by induction that, for the strings defined in the discussion of the pumping lemma, (∀i ∈ ℕ)(δ̄(s, uvⁱw) = f = δ̄(s, uvw)).

2.17. Prove Theorem 2.1.

2.18. (a) Find a language that gives rise to the relation I defined in Exercise 2.8. (b) Could such a language be FAD? Explain.

2.19. Starting with Theorem 2.3 as a given hypothesis, prove Theorem 2.4.

2.20. Prove Theorem 2.5 by constructing an argument similar to that given for Theorem 2.3.

2.21. Prove Theorem 2.6 by constructing an argument similar to that given for Theorem 2.3.

2.22. Prove Theorem 2.7.

2.23. Let L = {x ∈ {a, b}* | |x|_a < |x|_b}. Show L is not FAD.

2.24. Let G = {x ∈ {a, b}* | |x|_a ≥ |x|_b}. Show G is not FAD.

2.25. Let P = {y ∈ {d}* | ∃ prime p ∋ y = dᵖ} = {dd, ddd, ddddd, d⁷, d¹¹, d¹³, ...}. Prove that P is not FAD.

2.26. Let Γ = {x ∈ {0, 1, 2}* | ∃w ∈ {0, 1}* ∋ x = w2w} = {2, 121, 020, 11211, 10210, ...}.
Prove that Γ is not FAD.

2.27. Let Ψ = {x ∈ {0, 1}* | ∃w ∈ {0, 1}* ∋ x = ww} = {Λ, 00, 11, 0000, 1010, 1111, ...}. Prove that Ψ is not FAD.

2.28. Define the reverse of a string w as follows: if w = a₁a₂a₃a₄···aₙ₋₁aₙ, then wʳ = aₙaₙ₋₁···a₄a₃a₂a₁. Let K = {w ∈ {0, 1}* | w = wʳ} = {Λ, 0, 1, 00, 11, 000, 010, 101, 111, 0000, 0110, ...}. Prove K is not FAD.

2.29. Let Φ = {x ∈ {a, b, c}* | ∃j, k, m ∈ ℕ ∋ x = aʲbᵏcᵐ, where j ≥ 3 and k = m}. Prove Φ is not FAD. Hint: The first version of the pumping lemma is hard to apply here (why?).

2.30. Let C = {y ∈ {d}* | ∃ nonprime q ∋ y = d^q} = {Λ, d, d⁴, d⁶, d⁸, d⁹, d¹⁰, ...}. Show C is not FAD. Hint: The first version of the pumping lemma is hard to apply here (why?).

2.31. Assume Σ = {a, b} and L is a language for which R_L has the following three equivalence classes: {Λ}, {all odd-length words}, {all even-length words except Λ}. (a) Why couldn't L = {x | |x| is odd}? (Hint: recompute R_{{x | |x| is odd}}.) (b) List the languages L that could give rise to this R_L.

2.32. Let Σ = {a, b} and let Ψ = {x ∈ Σ* | x has an even number of a's and ends with (at least) one b}. Describe R_Ψ, and draw a machine accepting Ψ.

2.33. Let S = {x ∈ {a}* | ∃j ∈ ℕ ∋ |x| = j²} = {Λ, a, aaaa, a⁹, a¹⁶, a²⁵, ...}. Prove that S is not FAD.

2.34. Let Φ = {x ∈ {b}* | ∃j ∈ ℕ ∋ |x| = 2ʲ} = {b, bb, bbbb, b⁸, b¹⁶, b³², ...}. Prove that Φ is not FAD.

2.35. Let Σ = {a, b}. Assume R_L has the following five equivalence classes: {Λ}, {a}, {aa}, {a³, a⁴, a⁵, a⁶, ...}, {x | x contains (at least) one b}. Also assume that L consists of exactly one of these equivalence classes. (a) Which equivalence class is L? (b) List the other languages L that could give rise to this R_L (and note that they might consist of several equivalence classes).

2.36. Let Ω = {y ∈ {0, 1}* | (y contains exactly one 0) ∨ (y contains an even number of 1s)}. Find R_Ω.

2.37. Let Σ = {a, b}, and let L₁ = {x ∈ Σ* | |x|_a > |x|_b} and L₂ = {x ∈ Σ* | |x|_a < 3}. Which of the following are FAD? Support your answers.
(a) L₁ (b) L₂ (c) L₁ ∩ L₂ (d) −L₂ (e) L₁ ∪ L₂

2.38. Let m ∈ ℕ and let R_m be defined by x R_m y ⇔ |x| − |y| is a multiple of m.
(a) Prove that R_m is a right congruence.
(b) Show that R₂ ∩ R₃ is R₆, and hence also a right congruence. (Note, for example, that (Λ, aaaaaa) ∈ R₆ since (Λ, aaaaaa) ∈ R₃ and (Λ, aaaaaa) ∈ R₂; how do the equivalence classes of R₂ and R₃ compare?)
(c) Show that, in general, if R and S are right congruences, then so is R ∩ S.
(d) Now consider R₂ ∪ R₃, and show that this is not a right congruence because it is not even an equivalence relation.
(e) Prove that if R and S are right congruences and R ∪ S happens to be an equivalence relation, then R ∪ S will be a right congruence, also.

2.39. Give an example of two right congruences R₁ and R₂ over Σ* for which R₁ ∪ R₂ is not a right congruence.

2.40. Let Σ = {a, b} and let Γ = {Λ, a, ab, ba, bb, bbb} ∪ {x ∈ Σ* | |x| ≥ 4}.
(a) Use the definition of R_Γ to show ab R_Γ ba.
(b) Use the definition of R_Γ to show ab is not related by R_Γ to bb.
(c) Show that the equivalence classes of R_Γ are {Λ}, {a}, {b}, {aa}, {bb}, {ab, ba}, {x | x ≠ bbb ∧ |x| = 3}, {x | x = bbb ∨ |x| ≥ 4}.
(d) Draw the minimal-state DFA that accepts Γ.

2.41. Prove that the relation R_M given in Definition 2.4 is a right congruence.

2.42. We can view a text file as being one long "word," that is, a string of characters (including spaces, carriage returns, and so on). In this sense, each Pascal program can be considered to be a single word over the ASCII alphabet. We can define the language Pascal as the set of all valid Pascal programs (that is, the valid words are those text files that would compile with no compiler errors). Is this language FAD?

2.43. Define "Short Pascal" as the collection of valid Pascal programs that are composed of less than 1 million characters. Is Short Pascal FAD? Any volunteers for building the appropriate DFA?

2.44. Let Σ = {a, b, c}, and define L = {aⁿbᵏcʲ | n < 3 or (n ≥ 3 and k = j)}.
(a) Show that for this language the conclusion of Theorem 2.3 holds, but the hypothesis of Theorem 2.3 does not hold. (b) Is the contrapositive of Theorem 2.3 true? Explain.

2.45. Carefully show that F_Q in Definition 2.5 is a well-defined set.

2.46. Carefully show that δ_Q in Definition 2.5 is well defined.

2.47. For δ_Q in Definition 2.5, use induction to show that (∀x ∈ Σ*)(∀y ∈ Σ*)(δ̄_Q([x]_Q, y) = [xy]_Q).

2.48. For the L and Q in Definition 2.5, prove that L(A_Q) = L.

2.49. Given L and Q as in Definition 2.5, A_Q is a machine to which we can apply Definition 2.4. Prove or give a counterexample: Q = R_{A_Q}.

2.50. Given L and Q = R_L as in Definition 2.5, A_{R_L} is a machine to which we can apply Definition 2.4. Prove or give a counterexample: R_L = R_{A_{R_L}}.

2.51. Show that the converse of Theorem 2.3 is false. (Hint: See Exercise 2.29 and let L = Φ.)

2.52. Let L = {x ∈ {a, b}* | |x|_a = 2|x|_b}. Prove that L is not FAD.

2.53. Consider the language K defined in Exercise 2.28. (a) Find [110]_{R_K}; that is, find all strings y for which y R_K 110. (b) Describe R_K.

2.54. Prove or give a counterexample: R_{L₁} ∩ R_{L₂} = R_{L₁ ∩ L₂}.

2.55. Given a right congruence R over Σ* for which rk(R) = n, prove that each equivalence class of R contains a representative whose length is less than n.

2.56. For the R given in Example 2.6, find all languages L for which R_L = R_N.

2.57. Consider the languages defined in Exercise 1.6. Find the right congruences induced by each of these four languages.

2.58. Assume L ⊆ {a}* and Λ R_L a. List all possible choices for the language L.

2.59. Assume L ⊆ {a}* and a R_L aa. List all possible choices for the language L.

2.60. Assume L ⊆ {a}* and Λ R_L aa. List all possible choices for the language L.

2.61. (a) Give an example of a DFA M for which R_M = R_{L(M)}. (b) Give an example of a DFA M for which R_M ≠ R_{L(M)}. (c) For every DFA M, show that R_M refines R_{L(M)}.

2.62. Find R_M and R_{L(M)} for the machine M described in Example 1.5.

2.63.
Is A_{R_{L(A)}} always equivalent to A? Explain.

2.64. Consider L₄ = {y ∈ {0, 1}* | |y|₀ = |y|₁} as given in Example 2.11. Let n be given and consider x = (01)ⁿ = 010101···0101. Then |x| = 2n > n; but if u = 0101, v = 01, and w = (01)ⁿ⁻³, then (∀i ∈ ℕ)(uvⁱw ∈ L₄). Does this mean L₄ is FAD? Explain.

2.65. Consider L₄ = {y ∈ {0, 1}* | |y|₀ = |y|₁} as given in Example 2.11. Find R_{L₄}.

2.66. Show that the set of all postfix expressions over the alphabet {A, B, +, −} is not FAD.

2.67. Show that the set of all parenthesized infix expressions over the alphabet {A, B, +, −, (, )} is not FAD.

2.68. For a given language L, how does R_L compare to R_{−L}; that is, how does the right congruence generated by a language compare to the right congruence generated by its complement? Justify your statement.

2.69. Let Σ = {a, b}. Assume the right congruence Q has the following equivalence classes: {Λ}, {a}, {b}, {x | |x| ≥ 2}. Show that there is no language L such that R_L = Q.

2.70. Let Q be the equivalence relation with the two equivalence classes {Λ, a, aa} and {a³, a⁴, a⁵, ...}. (a) Show that Q is not a right congruence. (b) Attempt to build A_Q (ignoring F_Q for the moment), and describe any difficulties that you encounter. (c) Explain how the failure in part (a) is related to the difficulties found in part (b).

2.71. Let Σ = {a, b, c}. Show that {aⁱbʲcᵏ | i, j, k ∈ ℕ and i + j = k} is not FAD.

2.72. Let Q ⊆ {a}* × {a}* be the equivalence relation with the following equivalence classes:

[Λ]_Q = {Λ} = {a}⁰
[a]_Q = {a} = {a}¹
[aa]_Q = {a}² ∪ {a}³ ∪ {a}⁴ ∪ {a}⁵ ∪ ···

Show that Q is a right congruence.

2.73. Let Σ = {a, b, c}, and let R₂ be defined by x R₂ y ⇔ x and y end with the same letter. (a) Show that R₂ is an equivalence relation. (b) Assume that part (a) is true, and show that R₂ is a right congruence.

2.74. Let Σ = {a, b, c}, and let R₃ be defined by x R₃ y ⇔ x and y begin with the same letter. (a) Show that R₃ is an equivalence relation.
(b) Assume that part (a) is true, and show that R₃ is a right congruence.

2.75. Let Σ = {a, b}. Which of the following languages are FAD? (Support your answers.)
(a) L₁ = all words over Σ* for which the last letter matches the first letter.
(b) L₂ = all odd-length words over Σ* for which the first letter matches the center letter.
(c) L₃ = all words over Σ* for which the last letter matches none of the other letters.
(d) L₄ = all even-length words over Σ* for which the two center letters match.
(e) L₅ = all odd-length words over Σ* for which the center letter matches none of the other letters.

2.76. In the proof of (2) ⇒ (3) in Nerode's theorem: (a) Complete the proof of case 1. (b) Could case 1 actually be included under case 2?

2.77. Consider the right congruence property (RC) in Definition 2.1. Show that the implication could be replaced by an equivalence; that is, property (RC) could be rephrased as (∀x, y ∈ Σ*)(xRy ⇔ (∀u ∈ Σ*)(xuRyu)).

2.78. Given a DFA M = <Σ, S, s₀, δ, F>, assume that δ̄(s, u) = q and δ̄(q, v) = q. Use induction to show that (∀i ∈ ℕ)(δ̄(s, uvⁱ) = q).

2.79. Let L be the set of all strings that agree with some initial part of the pattern 0¹10²10³10⁴1··· = 0100100010000100000100···. Thus, L = {0, 01, 010, 0100, 01001, 010010, 0100100, ...}. Prove that L is not FAD.

2.80. Consider the following BNF over the three-symbol alphabet {a, ), (}, and show that the resulting language is not FAD.

<simple> ::= a | (<simple>)

2.81. (a) Let Σ = {0, 1}. Let L₂ = {x ∈ Σ* | the base 2 number represented by x is a power of 2}. Show that L₂ is FAD. (b) Let Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. Let L₁₀ = {x ∈ Σ* | the base 10 number represented by x is a power of 2}. Prove that L₁₀ is not FAD.

CHAPTER 3

MINIMIZATION OF FINITE AUTOMATA

We have seen that there are many different automata that can be used to represent a given language.
We would like to be able to find an automaton for a language L that is minimal, that is, a machine that represents the language with the fewest states possible. Finding such an optimal DFA will involve transforming a given automaton into the most efficient equivalent machine. To accomplish this transformation effectively, we must have a set of clear, unequivocal directions specifying how to proceed. A procedure is a finite set of instructions that unambiguously defines deterministic, discrete steps for performing some task. As anyone who has programmed a computer knows, it is possible to generate procedures that will never halt for some inputs (or perhaps for all inputs, if the program is seriously flawed). An algorithm is a procedure that is guaranteed to halt on all (legal) inputs. In this chapter we will specify a procedure for finding a minimal machine and then justify that this procedure is actually an algorithm. Thus, the theorems and definitions will show how to transform an inefficient DFA into an optimal automaton in a straightforward manner that can be easily programmed.

3.1 HOMOMORPHISMS AND ISOMORPHISMS

One of our goals for this chapter can be stated as follows: Given a language L, we wish to survey all the machines that recognize L and choose the machine (or machines) that is "smallest." It will be seen that there is indeed a unique smallest machine: A_{R_L}. The automaton A_{R_L} will be unique in the sense that any other optimal DFA looks exactly like A_{R_L} except for a trivial relabeling of the state names. The concept of two automata "looking alike" will have to be formalized to provide a basis for our rigorous statements. Machines that "look alike" will be called isomorphic, and the relabeling specification will be called an isomorphism. We have already learned some facts about A_{R_L}, which stem from the proof of Nerode's theorem.
These are summarized below and show that A_{R_L} is indeed one of the optimal machines for the language L.

∇ Corollary 3.1. For any FAD language L, L(A_{R_L}) = L.

Proof. This was shown when (3) ⇒ (1) in Theorem 2.2 was proved. Δ

Also, in the proof of (1) ⇒ (2) in Nerode's theorem, the relation R_M (defined by a given DFA M = <Σ, S, s₀, δ, F> for which L(M) = L) was used to show ||S|| ≥ rk(R_M). Furthermore, in (2) ⇒ (3), right congruences such as R_M that satisfied property (2) must be refinements of R_L, and so rk(R_M) ≥ rk(R_L). Thus ||S|| ≥ rk(R_M) ≥ rk(R_L) = ||S_{R_L}||, which leads immediately to the following corollary.

∇ Corollary 3.2. For any FAD language L, A_{R_L} is a minimal deterministic finite automaton that accepts L.

Proof. The proof follows from the definition of a minimal DFA (Definition 2.7); that is, if M = <Σ, S, s₀, δ, F> is any machine that also accepts L, then ||S|| ≥ ||S_{R_L}||. Δ

Besides being in some sense "the simplest," the minimal machine has some other nice properties. For example, if A is minimal, then the right congruence generated by A is identical to the right congruence generated by the language recognized by A; that is, R_A = R_{L(A)} (see the exercises). Examples 3.1 and 3.2 illustrate the two basic ways a DFA can have superfluous states.

∇ Definition 3.1. A state s in a finite automaton A = <Σ, S, s₀, δ, F> is called accessible iff

∃x_s ∈ Σ* ∋ δ̄(s₀, x_s) = s

The automaton A is called connected iff

(∀s ∈ S)(∃x_s ∈ Σ*)(δ̄(s₀, x_s) = s)

That is, a connected machine requires all states to be accessible; every state s of S must be "reachable" from s₀ by some string (x_s) in Σ* (different states will require different strings, and hence it is convenient to associate an appropriate string x_s with the state s). States that are not accessible are sometimes called disconnected, inaccessible, or unreachable.
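Accessibility in the sense of Definition 3.1 is an ordinary reachability question about the transition diagram, so it can be decided by a breadth-first search from s₀. The Python sketch below is illustrative only: the three-state transition table is invented to be consistent with the strings mentioned in Example 3.1 (x_q = Λ or 10, x_t = 0 or 111, and no path into r), since Figure 3.1 itself is not reproduced here.

```python
from collections import deque

def accessible_states(alphabet, delta, start):
    """Breadth-first search for the states reachable from the start state
    (the accessible states of Definition 3.1)."""
    reached = {start}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for a in alphabet:
            t = delta[(s, a)]
            if t not in reached:
                reached.add(t)
                queue.append(t)
    return reached

# Hypothetical machine: q is the start state, and no transition leads into r.
delta = {('q', '0'): 't', ('q', '1'): 't',
         ('t', '0'): 'q', ('t', '1'): 't',
         ('r', '0'): 'q', ('r', '1'): 't'}
conn = accessible_states('01', delta, 'q')   # {'q', 't'}: r is inaccessible
```

A machine is connected exactly when this search returns all of its states; the states it misses can be thrown away without affecting the accepted language.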
EXAMPLE 3.1

The machine defined in Figure 3.1 satisfies the definition of a deterministic finite automaton, but it is disconnected, since r cannot be reached by any string from the start state q. Note that x_q could be Λ or 10, while x_t might be 0 or 111. There is no candidate for x_r. Furthermore, r could be "thrown away" without affecting the language that this machine accepts. This will be one of the techniques we will use to minimize finite automata: removing the inaccessible states.

[Figure 3.1: The automaton discussed in Example 3.1]

There is a second way for an automaton to have superfluous states, as shown by the automata in the following examples. An overabundance of states may be present, recording nonessential information and consequently distinguishing between strings in ways that are unnecessary.

EXAMPLE 3.2

Consider the four-state DFA over {a, b}* in which s₀ is the start and only final state, defined in Figure 3.2. This automaton is clearly connected, but it is still not optimal. This machine accepts all strings whose length is a multiple of 3, and s₁ and s₂ are really "remembering" the same information, that is, that we currently have read a string whose length is one more than a multiple of 3. The fact that some strings that end in a are sent to s₁, while those that end in b may be sent to s₂, is of no real importance; we do not have to "remember" what the last letter in the string actually was in order to correctly accept the given language. The states s₁ and s₂ are in some sense equivalent, since they are performing the same function. The careful reader may have noticed that this language could have been recognized with a three-state machine, in which a single state combines the functions of s₁ and s₂.

Now consider the automaton shown in Figure 3.3, in which there are three superfluous states.
This automaton accepts the same language as the DFA in Figure 3.2, but this time not only are s₁ and s₂ performing the same function, but s₃ and s₄ are "equivalent," and s₀ and s₅ are both "remembering" that there has been a multiple of three letters seen so far. Note that it is not enough to check that s₁ and s₂ take you to exactly the same places (as was the case in the first example); in this example, the arrows coming out of s₁ and s₂ do not point to the same places.

[Figure 3.2: The first automaton discussed in Example 3.2]
[Figure 3.3: The second automaton discussed in Example 3.2]

The important thing is that when leaving s₁ or s₂ upon seeing an a, we go to equivalent states, and when processing a b from s₁ or s₂, we also go to equivalent states. However, deciding whether two states are equivalent or not is perhaps a little less straightforward than it may at first seem. This sets the stage for the appropriate definition of equivalence.

∇ Definition 3.2. Given a finite automaton A = <Σ, S, s₀, δ, F>, there is a relation between the states of A called E_A, the state equivalence relation on A, defined by

(∀s ∈ S)(∀t ∈ S)(s E_A t ⇔ (∀x ∈ Σ*)(δ̄(s, x) ∈ F ⇔ δ̄(t, x) ∈ F))

In other words, we will relate s and t iff it is not possible to distinguish whether we are starting from state s or state t; each string x ∈ Σ* will either take us to a final state both when starting from s and when starting from t, or neither s nor t will take x to a final state. Another way of looking at this concept is to define new machines that "look like" A but have different start states. Given a finite automaton A = <Σ, S, s₀, δ, F> and two states s, t ∈ S, define a new automaton Aᵗ = <Σ, S, t, δ, F> that has t as a start state, and another automaton Aˢ = <Σ, S, s, δ, F> having s as a start state. Then s E_A t ⇔ L(Aˢ) = L(Aᵗ). (Why is this an equivalent definition?)
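Although Definition 3.2 quantifies over every string in Σ*, two states of a k-state DFA that are distinguishable at all are distinguishable by some string of length less than k (a standard fact, stated here without proof), so E_A can be decided by brute-force search. The machine below is hypothetical, invented so that three of its four states turn out to be mutually equivalent:

```python
from itertools import product

def run(delta, s, x):
    """Follow the transition function from state s over the string x."""
    for a in x:
        s = delta[(s, a)]
    return s

def equivalent(delta, finals, alphabet, k, s, t):
    """Decide s E_A t (Definition 3.2) for a k-state DFA by testing
    every string of length less than k."""
    for n in range(k):
        for x in product(alphabet, repeat=n):
            if (run(delta, s, x) in finals) != (run(delta, t, x) in finals):
                return False
    return True

# Hypothetical four-state machine: s1, s2, s3 are final and closed under
# the transitions, so no string can tell them apart; s0 is not final.
delta = {('s0', '0'): 's0', ('s0', '1'): 's1',
         ('s1', '0'): 's2', ('s1', '1'): 's3',
         ('s2', '0'): 's3', ('s2', '1'): 's1',
         ('s3', '0'): 's1', ('s3', '1'): 's2'}
finals = {'s1', 's2', 's3'}
```

Here equivalent(delta, finals, '01', 4, 's1', 's2') is True, while s0 is separated from the final states by the empty string alone. The same loop also implements the alternative view s E_A t ⇔ L(Aˢ) = L(Aᵗ), since it compares the two machines' behaviors string by string.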
These sets of words will be used in later chapters and are referred to as terminal sets. T(A, t) will denote the set of all words that reach final states from t, and thus T(A, t) = L(Aᵗ) = {x | δ̄(t, x) ∈ F}. In terms of the black box model presented in Chapter 1, we see that we cannot distinguish between Aˢ and Aᵗ by placing matching strings on the input tapes and observing the acceptance lights of the two machines. For any string, both Aˢ and Aᵗ will accept, or both will reject; without looking inside the black boxes, there is no way to tell whether we are starting in state s or state t. This highlights the sense in which s and t are deemed equivalent: we cannot distinguish between s and t by the subsequent behavior of the automaton. The modified automaton Aᵗ, which gives rise to the terminal set T(A, t), can be contrasted with the modified automaton A_t = <Σ, S, s₀, δ, {t}> from Chapter 2, which recognized the initial set I(A, t) = {x | δ̄(s₀, x) = t}. Notice that initial sets are comprised of strings that move from the start state to the distinguished state t, while terminal sets are made up of strings that go from t to a final state.

EXAMPLE 3.3

The automaton N discussed in Example 2.6 (Figure 3.4) has the following relations comprising E_N:

s₁ E_N s₁, s₁ E_N s₂, s₁ E_N s₃
s₂ E_N s₁, s₂ E_N s₂, s₂ E_N s₃
s₃ E_N s₁, s₃ E_N s₂, s₃ E_N s₃

[Figure 3.4: The automaton N discussed in Example 3.3]

This can be succinctly described by listing the equivalence classes: [s₀]_{E_N} = {s₀}, [s₁]_{E_N} = [s₂]_{E_N} = [s₃]_{E_N} = {s₁, s₂, s₃}, and we will abuse our notation slightly and blur the distinction between the relation E_N and the partition it generates by writing E_N = {{s₀}, {s₁, s₂, s₃}}. Recall that Example 2.6 showed that the minimal machine that accepted L(N) had two states; it will be seen that it is no coincidence that E_N has exactly the same number of equivalence classes.

∇ Definition 3.3.
A finite automaton A = <Σ, S, s₀, δ, F> is called reduced iff (∀s, t ∈ S)(s E_A t ⇔ s = t). Δ

In a reduced machine, E_A must be the identity relation on S, and in this case each equivalence class will contain only a single element.

EXAMPLE 3.4

The automaton N in Figure 3.4 is not reduced, since Example 3.3 shows that [s₂]_{E_N} contains three states. On the other hand, the automaton A displayed in Figure 3.5a is reduced, since [s₀]_{E_A} = {s₀} and [s₁]_{E_A} = {s₁}.

The concepts of homomorphism and isomorphism will play an integral part in justifying the correctness of the algorithms that produce the optimal DFA for a given language. We need to formalize what we mean when we say that two automata are "the same." The following examples illustrate the criteria that must exist between similar machines.

EXAMPLE 3.5

We now consider the automaton B shown in Figure 3.5b, which looks suspiciously like the DFA A given in Figure 3.5a. In fact, it is basically the "same" machine. While it has been oriented differently (which has no effect on the δ function), and the start state has been labeled q₀ rather than s₀, and the final state is called q₁ rather than s₁, A and B are otherwise "identical." For such a relabeling to truly reflect the same automaton structure, certain conditions must be met, as illustrated in the following examples.

EXAMPLE 3.6

Consider machine C, defined by the state transition diagram given in Figure 3.5c. This machine is identical to B, except for the position of the start state. However, it is not the same machine as B, since it behaves differently (and in fact accepts a different language). Thus we see that it is important for the start state of one machine to correspond to the start state of the other machine. Note that we cannot
circumvent this by letting q₀ correspond to s₁ and q₁ correspond to s₀, since other problems will develop, as shown next in Example 3.7.

[Figure 3.5: (a) The automaton A (b) The automaton B (c) The automaton C (d) The automaton D (e) The automaton E]

EXAMPLE 3.7

Let machine D be defined by the state transition diagram given in Figure 3.5d. The automata B and D (Figures 3.5b and 3.5d) look much the same, with start states corresponding, but they are not the same (and will in fact accept different languages), because we cannot get the final states to correspond correctly.

Even if we do get the start and final states to agree, we still have to make sure that the transitions correspond. This is illustrated in the next example.

EXAMPLE 3.8

Consider the machine E given in Figure 3.5e. In this automaton, when leaving the start state, we travel to a final state if we see 0 and remain at the start state (which is nonfinal) if we see 1; this is different from what happened in machine A, where we traveled to a final state regardless of whether we saw 0 or 1.

Thus it is seen that we not only have to find a correspondence (which can be thought of as a function μ) between the states of our two machines, but we must do this in a way that satisfies the above three conditions (or else we cannot claim the machines are "the same"). This is summed up in the following definition of a homomorphism.

∇ Definition 3.4. Given two finite automata, A = <Σ, S_A, s_{0A}, δ_A, F_A> and B = <Σ, S_B, s_{0B}, δ_B, F_B>, and a function μ: S_A → S_B, μ is called a finite automata homomorphism from A to B iff the following three conditions hold:

i. μ(s_{0A}) = s_{0B}
ii. (∀s ∈ S_A)(s ∈ F_A ⇔ μ(s) ∈ F_B)
iii. (∀s ∈ S_A)(∀a ∈ Σ)(μ(δ_A(s, a)) = δ_B(μ(s), a))

Note that μ is called a homomorphism from A to B, but it is actually a function between state sets, that is, from S_A to S_B.
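The three conditions of Definition 3.4 translate directly into code. In the sketch below, the two two-state machines are modeled loosely on A and B of Examples 3.5 through 3.8 (their transition tables are invented, since Figure 3.5 is not reproduced), and condition (ii) is read as an "iff," so nonfinal states must also map to nonfinal states:

```python
def is_homomorphism(mu, A, B):
    """Check conditions (i)-(iii) of Definition 3.4 for mu : S_A -> S_B.
    A machine is a tuple (alphabet, states, start, delta, finals)."""
    alphabet, states_a, start_a, delta_a, finals_a = A
    _, _, start_b, delta_b, finals_b = B
    return (mu[start_a] == start_b and                          # (i)
            all((s in finals_a) == (mu[s] in finals_b)          # (ii)
                for s in states_a) and
            all(mu[delta_a[(s, a)]] == delta_b[(mu[s], a)]      # (iii)
                for s in states_a for a in alphabet))

# Hypothetical machines: from the start state, both letters lead to the
# single final state, and both letters lead back again.
A = (['0', '1'], ['s0', 's1'], 's0',
     {('s0', '0'): 's1', ('s0', '1'): 's1',
      ('s1', '0'): 's0', ('s1', '1'): 's0'}, {'s1'})
B = (['0', '1'], ['q0', 'q1'], 'q0',
     {('q0', '0'): 'q1', ('q0', '1'): 'q1',
      ('q1', '0'): 'q0', ('q1', '1'): 'q0'}, {'q1'})
mu = {'s0': 'q0', 's1': 'q1'}    # start maps to start, final to final
bad = {'s0': 'q1', 's1': 'q0'}   # start states fail to correspond
```

is_homomorphism(mu, A, B) is True, while the swapped map bad fails condition (i), echoing the problem with machine C in Example 3.6. Adding the requirement that the map be a bijection turns this test into an isomorphism check.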
EXAMPLE 3.9

Machines A and B in Example 3.5 are homomorphic, since the homomorphism μ: {s₀, s₁} → {q₀, q₁} defined by μ(s₀) = q₀ and μ(s₁) = q₁ satisfies the three conditions.

The following example shows that even if we can find a homomorphism that satisfies the three conditions, the machines might not be the same.

EXAMPLE 3.10

Let M = <Σ, {s₀, s₁, s₂}, s₀, δ_M, {s₁}> and N = <Σ, {q₀, q₁}, q₀, δ_N, {q₁}> be given by the state transition diagrams in Figures 3.6a and 3.6b. Define a homomorphism ψ: {s₀, s₁, s₂} → {q₀, q₁} by ψ(s₀) = q₀, ψ(s₁) = q₁, and ψ(s₂) = q₀. Note that the start state maps to the start state, final states map to final states (and nonfinal states map to nonfinal states), and, furthermore, the transitions agree. Here is the statement that the 0 transition out of state s₀ is consistent:

ψ(δ_M(s₀, 0)) = δ_N(ψ(s₀), 0)

[Figure 3.6: (a) The DFA M discussed in Example 3.10 (b) The DFA N discussed in Example 3.10]
Applying this rule to each choice of letters a and states s, we have 1jJ(8M(so,0» = IjJ(SI) = ql = 8N(qo, 0) = 8N(IjJ(so), 0) 1jJ(8M(so, 1» = IjJ(SI) = ql = 8N(qo, 1) = 8N(IjJ(so), 1) 1jJ(8M(sl> 0» = IjJ(SI) = ql = 8N(ql> 0) = 8N(IjJ(SI), 0) 1jJ(8M(sl> 1» = ljJ(so) = qo= 8N(ql> 1) = 8N(IjJ(SI), 1) 1jJ(8M(S2'0» = IjJ(SI) = ql = 8N(qo, 0) = 8N(IjJ(S2), 0) 1jJ(8M(S2' 1» = IjJ(SI) = ql = 8N(qo, 1) = 8N(IjJ(S2), 1) Hence IjJ is a homomorphism between M and N even though M has three states and N has two states. While the existence of a homomorphism is not enough to ensure that the machines are "the same," the exercises for this chapter indicate that the existence of a homomorphism is enough to ensure that the machines are equivalent. The extra condition we need to guarantee that the machines are identical (except for a trivial renaming of the states) is that IjJ be a bijection. V Definition 3.5. Given two finite automata A = <I, SA,SOA' 8A,FA> and B = <I, SB, SOS' 8B, FB>, and a function u.; SÃ Sa, f.L is called a finite automata isomorphism from A to B iff the following five conditions hold: I, f.L( SOA) = sos' ii. ('Is E SA)(S E FA ~ f.L(s) E FB) . iii. ('Is E SA)('1aE I)(f.L(8A(s, a» = 8B(f.L(S), a». iv. f.L is a one-to-one function from SA to SB. v. f.L is onto SB. il EXAMPLE 3.11 f.L from Example 3.9 is an isomorphism. Example 3.5 illustrated that the automaton A was essentially "the same" as B except for the way the states were named. Note that f.L can be thought of as the recipe for relabeling the states of A to form a Sec. 3.1 Homomorphisms and Isomorphisms 95 machine that would then be in the very strictest sense absolutely identical to B. I\J from Example 3.10 is not an isomorphism because it is not one to one. V Definition 3.6. Given two finite automata A = <I,SA, SOA' 8A, FA> and B = <I, Sa, s08,8a, Fa>, A is said to be isomorphic to B iff there exists a finite automata isomorphism between A and B, and we will write A == B. 
d EXAMPLE 3.12 Machines A and B from Examples 3.4 and 3.5 are isomorphic. Machines M and N from Example 3.10 are not isomorphic (and not just because the particular function I\J fails to satisfy the conditions; we must actually prove that no function exists that qualifies as an isomorphism between M and N). Now that we have rigorously defined the concept of two machines being "essentially identical," we can prove that, given a language L, any reduced and connected machine A accepting L must be minimal, that is, have as few states as possible for that particular language. We will prove this assertion by showing that any such A is isomorphic to ARL , which was shown in Corollary 3.2 to be the "smallest" possible machine for L. V Theorem 3.1. Let L be any FAD language over an alphabet I, and let A = <I, S, sO, 8, F> be any reduced and connected automaton that accepts L. Then A==ARL• Proof. We must try to define a reasonable function I-L from the states of A to the states of ARL (which you should recall corresponded to equivalence classes of RL) . A natural way to define I-L (which happens to work!) is: For each s E S, find a string z, E l* 1 8(so,xs) = s. (Since A is connected, we are guaranteed to find such an X s' In fact, there may be many strings that take us from So to s; choose anyone of them, and call itxs' ) We need to map s to some equivalence class of Rj ; the logical choice is the class rontaining Xs' Thus we define I-L(s) = [Xs]RL An immediate question comes to mind: There may be several strings that we could use for X s ; does it matter which one we choose to find the equivalence class? It would not do if, say, RL consisted of two equivalence classes, the even-length strings = [l1]R L and the odd-length strings = [Ok, and both 8(so,0) and 8(so,11) equaled s. Then, on the one hand, I-L(s) should be [O]RL , and, on the other hand, it should be [l1]R L ' I-L must be a function; it cannot send s to two different equivalence classes. 
Note that there would be no problem if δ̄(s₀, 11) = s and δ̄(s₀, 1111) = s, since [11]_{R_L} = [1111]_{R_L}, both of which represent the set of all even-length strings. Here x_s could be 11, or it could be 1111, and there is no inconsistency in the way in which μ(s) is defined; in either case, s is mapped by μ to the class of even-length strings. Thus we must first show:

1. μ is well-defined (which means it is defined everywhere, and the definitions are consistent; that is, if there are two choices for x_s, say y and z, then [y]_{R_L} = [z]_{R_L}). Since A is connected, each state s can be reached by some string x_s; that is, (∀s ∈ S)(∃x_s ∈ Σ*)(δ̄(s₀, x_s) = s), and so there is indeed (at least) one equivalence class ([x_s]_{R_L}) to which s maps under μ. We therefore have μ(s) = [x_s]_{R_L}; thus μ is defined everywhere. We must still make sure that μ is not multiply defined: Let x, y ∈ Σ* and assume δ̄(s₀, x) = δ̄(s₀, y). Then
δ̄(s₀, x) = δ̄(s₀, y) ⇒ (by definition of =)
(∀u ∈ Σ*)(δ̄(δ̄(s₀, x), u) ∈ F ⇔ δ̄(δ̄(s₀, y), u) ∈ F) ⇒ (by Theorem 1.1)
(∀u ∈ Σ*)(δ̄(s₀, xu) ∈ F ⇔ δ̄(s₀, yu) ∈ F) ⇒ (by definition of L)
(∀u ∈ Σ*)(xu ∈ L ⇔ yu ∈ L) ⇒ (by definition of R_L)
x R_L y ⇒ (by definition of [ ])
[x]_{R_L} = [y]_{R_L}
Thus, if both x and y take us from s₀ to s, then it does not matter whether we let μ(s) equal [x]_{R_L} or [y]_{R_L}, since they are identical. μ is therefore a bona fide function.

2. μ is onto S_{R_L}. Every equivalence class must be the image of some state in S, since (∀[x]_{R_L} ∈ S_{R_L})([x]_{R_L} = μ(δ̄(s₀, x))), and so δ̄(s₀, x) maps to [x]_{R_L}.

3. μ(s₀) = s_{0R_L}.

4. Final states map to final states; that is, (∀s ∈ S)(μ(s) ∈ F_{R_L} ⇔ s ∈ F). Choose an s ∈ S and pick a corresponding x_s ∈ Σ* such that δ̄(s₀, x_s) = s. Then
s ∈ F ⇔ (by definition of x_s, L) x_s ∈ L(A) ⇔ (by definition of L) x_s ∈ L ⇔ (by definition of F_{R_L}) [x_s] ∈ F_{R_L} ⇔ (by definition of μ) μ(s) ∈ F_{R_L}

5. The transitions match up; that is, (∀s ∈ S)(∀a ∈ Σ)(μ(δ(s, a)) = δ_{R_L}(μ(s), a)).
Choose an s ∈ S and pick a corresponding x_s ∈ Σ* such that δ̄(s₀, x_s) = s. Note that this implies that [x_s] = μ(s) = μ(δ̄(s₀, x_s)). Then
μ(δ(s, a)) = (by definition of x_s)
μ(δ(δ̄(s₀, x_s), a)) = (by Theorem 1.1)
μ(δ̄(s₀, x_s a)) = (by definition of μ)
[x_s a]_{R_L} = (by definition of δ_{R_L})
δ_{R_L}([x_s]_{R_L}, a) = (by definition of μ and x_s)
δ_{R_L}(μ(s), a)

So far we have not needed the fact that A was reduced. In fact, we have now proved that μ is a homomorphism from A to A_{R_L} as long as A is merely connected. However, if A is reduced, we can show:

6. μ is one to one; that is, if μ(s) = μ(t), then s = t. Let s, t ∈ S and assume μ(s) = μ(t).
μ(s) = μ(t) ⇒ (by definition of =)
(∀u ∈ Σ*)(δ̄_{R_L}(μ(s), u) = δ̄_{R_L}(μ(t), u)) ⇒ [by property (5), induction]
(∀u ∈ Σ*)(μ(δ̄(s, u)) = μ(δ̄(t, u))) ⇒ (by definition of =)
(∀u ∈ Σ*)(μ(δ̄(s, u)) ∈ F_{R_L} ⇔ μ(δ̄(t, u)) ∈ F_{R_L}) ⇒ [by property (4) above]
(∀u ∈ Σ*)(δ̄(s, u) ∈ F ⇔ δ̄(t, u) ∈ F) ⇒ (by definition of E_A)
s E_A t ⇒ (since A is reduced)
s = t

Thus, by results (1) through (6), μ is a well-defined homomorphism that is also a bijection; so μ is an isomorphism and therefore A ≅ A_{R_L}. Δ

∇ Corollary 3.3. Let A and B be reduced and connected finite automata. Under these conditions, A is equivalent to B iff A ≅ B.
Proof. If A ≅ B, it is easy to show that A is equivalent to B (as indicated in the exercises, this implication is true even if A and B are not reduced and connected). Now assume the hypothesis that A and B are reduced and connected does hold, and that A is equivalent to B. Since A is reduced and connected, Theorem 3.1 gives A ≅ A_{R_{L(A)}}; similarly, B ≅ A_{R_{L(B)}}. Since L(A) = L(B), A_{R_{L(A)}} = A_{R_{L(B)}}. Therefore, A ≅ A_{R_{L(A)}} = A_{R_{L(B)}} ≅ B. Δ

3.2 MINIMIZATION ALGORITHMS

From the results in the previous section, it follows that a reduced and connected finite automaton must be minimal.
This section demonstrates how to transform an existing DFA into an equivalent machine that is both reduced and connected and hence is the most efficient machine possible for the given language. The designer of an automaton can therefore focus solely on producing a machine that recognizes the correct set of strings (without regard for efficiency), knowing that the techniques presented in this section can later be employed to shrink the DFA to its optimal size. The concepts explored in Chapters 4, 5, and 6 will provide further tools to aid in the design process and corresponding techniques to achieve optimality.

∇ Corollary 3.4. A reduced and connected deterministic finite automaton A = <Σ, S, s₀, δ, F> is minimal.
Proof. By Theorem 3.1, there is an isomorphism between A and A_{R_L}. Since an isomorphism is a bijection between the state sets, ||S|| = ||S_{R_L}||. By Corollary 3.2, A_{R_L} has the smallest number of states, and therefore so does A. Δ

Thus, if we had a machine for L that we could verify was reduced and connected, we would be able to state that we had found the minimal machine accepting L. We therefore would like some algorithms for determining whether a machine M has these properties. We would also like to find a method for transforming a nonoptimal machine into one with the desired properties. The simplest transformation is from a disconnected machine to a connected machine: given any machine A, we will define a connected machine A^c that accepts the same language that A did; that is, L(A) = L(A^c).

∇ Definition 3.7.
Given a finite automaton A = <Σ, S, s₀, δ, F>, define a new automaton A^c = <Σ, S^c, s₀^c, δ^c, F^c>, called A connected, by
S^c = {s ∈ S | ∃x ∈ Σ* ∋ δ̄(s₀, x) = s}
s₀^c = s₀
F^c = F ∩ S^c = {f ∈ F | ∃x ∈ Σ* ∋ δ̄(s₀, x) = f}
and δ^c is derived from the restriction of δ to S^c × Σ:
(∀a ∈ Σ)(∀s ∈ S^c)(δ^c(s, a) = δ(s, a)) Δ

A^c is thus simply the machine A with the unreachable states "thrown away"; s₀ can be reached by x = λ, so it is a valid choice for the start state in A^c. F^c is simply the final states that can be reached from s₀, and δ^c is the collection of transitions that still come from (and consequently point to) states in the connected portion. Actually, δ^c was defined to be the transitions that merely come from states in S^c, with no mention of any restrictions on the range of δ^c. We must have, however, δ^c: S^c × Σ → S^c; in order for A^c to be well defined, δ^c must be shown to map into the proper range. It would not do to have a transition leading from a state in S^c to a state that is not in the new state set of A^c. The fact that δ^c does indeed have the desired properties is relegated to the exercises.

EXAMPLE 3.13
Let M = <{a, b}, {q₀, q₁, q₂, q₃}, q₀, δ, {q₁, q₃}>, as illustrated in Figure 3.7a. By inspection, the only states that can be reached from the start state are q₀ and q₃. Hence M^c = <{a, b}, {q₀, q₃}, q₀, δ^c, {q₃}>. The resulting automaton is shown in Figure 3.7b. An algorithm for effectively computing S^c will be presented later.

[Figure 3.7: (a) The DFA M discussed in Example 3.13. (b) The DFA M^c discussed in Example 3.13.]

∇ Theorem 3.2. Given any finite automaton A, the new machine A^c is indeed connected.
Proof. This is an immediate consequence of the way S^c was defined. Δ

Definition 3.7 and Theorem 3.2 would be of little consequence if it were not for the fact that A and A^c accept the same language. A^c is in fact equivalent to A, as proved in Theorem 3.3.

∇ Theorem 3.3.
Given any finite automaton A = <Σ, S, s₀, δ, F>, A and A^c are equivalent; that is, L(A^c) = L(A).
Proof. Let x ∈ Σ*. Then:
x ∈ L(A) ⇔ (by definition of L)
∃s ∈ S ∋ (δ̄(s₀, x) = s ∧ s ∈ F) ⇔ (by definition of S^c)
s ∈ S^c ∧ s ∈ F ⇔ (by definition of ∩)
s ∈ (S^c ∩ F) ⇔ (by definition of F^c)
s ∈ F^c ⇔ (by definition of s)
δ̄(s₀, x) ∈ F^c ⇔ (by definition of δ^c and induction)
δ̄^c(s₀, x) ∈ F^c ⇔ (by definition of s₀^c)
δ̄^c(s₀^c, x) ∈ F^c ⇔ (by definition of L)
x ∈ L(A^c) Δ

Thus, given any machine A, we can find an equivalent machine (that is, a machine that accepts the same language as A) that is connected. Furthermore, there is an algorithm that can be applied to find A^c (that is, we don't just know that such a machine exists, we actually have a method for calculating what it is). The definition of S^c implies that there is a procedure for finding S^c: one can begin enumerating the strings x in Σ*, and by applying the transition function to each x, the new states that are reached can be included in S^c. This is not a very satisfactory process because there are an infinite number of strings in Σ* to check. However, the indicated proof for Theorem 2.7 shows that, if a state can be reached by a "long" string, then it can be reached by a "short" string. Thus, we will only need to check the "short" strings. In particular,
S^c = ⋃_{x ∈ Σ*} {δ̄(s₀, x)} = ⋃_{x ∈ Q} {δ̄(s₀, x)}
where Q consists of the "short" strings: Q = {x ∈ Σ* | |x| < ||S||}. Thus, Q is the set of all strings of length less than the number of states in the DFA. Q is a finite set, and therefore we can check all strings x in Q in a finite amount of time; we therefore have an algorithm (that is, a procedure that is guaranteed to halt) for finding S^c, and consequently an algorithm for constructing A^c. Thus, given any machine, we can find an equivalent machine that is connected. The above method is not very efficient because many calculations are constantly repeated.
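The brute-force procedure just described — apply δ̄ to every string of length less than ||S|| — can be sketched directly. The dict-based DFA encoding and the sample machine below are our own illustration, not the text's; the sketch is fine for tiny machines but wasteful in general, for exactly the reason noted above.

```python
# Brute-force computation of S^c: evaluate the extended transition function
# on every string x with |x| < ||S|| and collect the states reached.
from itertools import product

def connected_states_bruteforce(states, sigma, s0, delta):
    def delta_bar(s, x):                       # extended transition function
        for a in x:
            s = delta[(s, a)]
        return s
    short = (x for n in range(len(states))     # all strings with |x| < ||S||
             for x in product(sigma, repeat=n))
    return {delta_bar(s0, x) for x in short}

# Hypothetical three-state DFA in which s2 is unreachable.
delta = {('s0','a'):'s1', ('s0','b'):'s0', ('s1','a'):'s1', ('s1','b'):'s1',
         ('s2','a'):'s0', ('s2','b'):'s2'}
print(sorted(connected_states_bruteforce({'s0','s1','s2'}, 'ab', 's0', delta)))
```

Note the duplicated work: the prefix of every string is re-walked from s₀ each time, which is the inefficiency the text goes on to remove with Definition 3.10.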
A better algorithm, based on Definition 3.10, will be presented later. We now turn our attention to building a reduced machine from an arbitrary machine. The following definition gives a consistent way to combine the redundant states identified by the state equivalence relation E_A.

∇ Definition 3.8. Given a finite automaton A = <Σ, S, s₀, δ, F>, define a new finite automaton A/E_A, called A modulo its state equivalence relation, by
A/E_A = <Σ, S_{E_A}, s_{0E_A}, δ_{E_A}, F_{E_A}>
where
S_{E_A} = {[s]_{E_A} | s ∈ S}
s_{0E_A} = [s₀]_{E_A}
F_{E_A} = {[s]_{E_A} | s ∈ F}
and δ_{E_A} is defined by
(∀a ∈ Σ)(∀[s] ∈ S_{E_A})(δ_{E_A}([s]_{E_A}, a) = [δ(s, a)]_{E_A}) Δ

Thus, there is one state in A/E_A for each equivalence class in E_A, the new start state is the equivalence class containing s₀, and the final states are those equivalence classes that are made up of states from F. The transition function is also defined in a natural manner: Given an equivalence class [t]_{E_A} and a letter a, choose one state, say t, from the class and see what state the old transition specified (δ(t, a)). The new transition function will choose the equivalence class containing this new state ([δ(t, a)]_{E_A}). Once again, there may be several states in an equivalence class and thus several states from which to choose. We must make sure that the definition of δ_{E_A} does not depend on which state of [t]_{E_A} we choose (that is, we must ascertain that δ_{E_A} is well defined). Similarly, F_{E_A} should be shown to be well defined (see the exercises).

It stands to reason that if we coalesce all the states that performed the same function (that is, were related by E_A) into a single state, the resulting machine should no longer have distinct states that perform the same function. We can indeed prove that this is the case, that is, that A/E_A is reduced.

∇ Theorem 3.4. Given a finite automaton A = <Σ, S, s₀, δ, F>, A/E_A is reduced.
Proof. Note that the state equivalence relation for A/E_A is E_{(A/E_A)}, not E_A.
We need to show that if two states of A/E_A are related by the state equivalence relation for A/E_A, then those two states are identical; that is,
(∀s, t ∈ S_{E_A})(s E_{(A/E_A)} t ⇔ s = t)
Assume s, t ∈ S_{E_A}. Then ∃s′, t′ ∈ S ∋ s = [s′]_{E_A} and t = [t′]_{E_A}; furthermore,
s E_{(A/E_A)} t ⇔ (by definition of s′, t′)
[s′]_{E_A} E_{(A/E_A)} [t′]_{E_A} ⇔ (by definition of E_{(A/E_A)})
(∀x ∈ Σ*)(δ̄_{E_A}([s′], x) ∈ F_{E_A} ⇔ δ̄_{E_A}([t′], x) ∈ F_{E_A}) ⇔ (by δ_{E_A} and induction)
(∀x ∈ Σ*)([δ̄(s′, x)] ∈ F_{E_A} ⇔ [δ̄(t′, x)] ∈ F_{E_A}) ⇔ (by definition of F_{E_A})
(∀x ∈ Σ*)(δ̄(s′, x) ∈ F ⇔ δ̄(t′, x) ∈ F) ⇔ (by definition of E_A)
s′ E_A t′ ⇔ (by definition of [ ])
[s′]_{E_A} = [t′]_{E_A} ⇔ (by definition of s, t)
s = t Δ

Since we ultimately want to first apply Definition 3.7 to find a connected DFA and then apply Definition 3.8 to reduce that DFA, we wish to show that this process of obtaining a reduced machine does not destroy connectedness. We can be assured that if Definition 3.8 is applied to a connected machine the result will then be both connected (Theorem 3.5) and reduced (Theorem 3.4).

∇ Theorem 3.5. If A = <Σ, S, s₀, δ, F> is connected, then A/E_A is connected.
Proof. We need to show that every state in A/E_A can be reached from the start state of A/E_A. Assume s ∈ S_{E_A}. Then ∃s′ ∈ S ∋ s = [s′]_{E_A}; but A was connected, and so there exists an x ∈ Σ* such that δ̄(s₀, x) = s′; that is, there is a string that will take us from s₀ to s′ in the original machine A. This same string will take us from s_{0E_A} to s in A/E_A, since
δ̄(s₀, x) = s′ ⇒ (by definition of =)
[δ̄(s₀, x)]_{E_A} = [s′]_{E_A} ⇒ (by definition of δ_{E_A} and induction)
δ̄_{E_A}([s₀]_{E_A}, x) = [s′]_{E_A} ⇒ (by definition of s_{0E_A})
δ̄_{E_A}(s_{0E_A}, x) = [s′]_{E_A}
Therefore, every state s ∈ S_{E_A} can be reached from the start state, and A/E_A is thus connected. Δ

Finally, we want to show that we do not change the language by reducing the machine. The following theorem proves that A/E_A and A are indeed equivalent.

∇ Theorem 3.6.
Given a finite automaton A = <Σ, S, s₀, δ, F>, L(A/E_A) = L(A).
Proof.
x ∈ L(A/E_A) ⇔ (by definition of L)
δ̄_{E_A}(s_{0E_A}, x) ∈ F_{E_A} ⇔ (by definition of s_{0E_A})
δ̄_{E_A}([s₀]_{E_A}, x) ∈ F_{E_A} ⇔ (by definition of δ_{E_A} and induction)
[δ̄(s₀, x)]_{E_A} ∈ F_{E_A} ⇔ (by definition of F_{E_A})
δ̄(s₀, x) ∈ F ⇔ (by definition of L)
x ∈ L(A) Δ

∇ Theorem 3.7. Given a finite automaton definable language L and any finite automaton A that accepts L, there exists an algorithm for constructing the unique (up to isomorphism) minimum-state finite automaton accepting L.
Proof. For the finite automaton A that accepts L, there is an algorithm for finding the set of connected states in A, and therefore there exists an algorithm for constructing A^c, which is a connected automaton with the property that L(A^c) = L(A) = L. Furthermore, there exists an algorithm for computing E_{A^c}, the state equivalence relation on A^c; consequently, there is an algorithm for constructing A^c/E_{A^c}, which is a reduced, connected automaton with the property that L(A^c/E_{A^c}) = L(A^c) = L(A) = L. From the main theorem on minimization (Theorem 3.1), we know that A^c/E_{A^c} ≅ A_{R_L}, and A_{R_L} is the unique (up to isomorphism) minimum-state finite automaton accepting L. Consequently, the derived automaton A^c/E_{A^c} is likewise a minimum-state automaton. Δ

The remainder of the chapter is devoted to developing the methods for computing S^c and E_A and justifying that the resulting algorithms are indeed correct. Our formal definition of E_A requires that an infinite number of strings be checked before we can find the equivalence classes upon which A^c/E_{A^c} is based. If we could find an algorithm to generate E_A, we would then have an algorithm for building the minimal machine. This is the motivation for Definition 3.9.

∇ Definition 3.9.
Given a finite automaton A = <Σ, S, s₀, δ, F> and an integer i, define the i-th partial state equivalence relation on A, a relation between the states of A denoted by E_{iA}, by
(∀s, t ∈ S)(s E_{iA} t ⇔ (∀x ∈ Σ* ∋ |x| ≤ i)(δ̄(s, x) ∈ F ⇔ δ̄(t, x) ∈ F)) Δ

Thus E_{iA} relates states that cannot be distinguished by strings of length i or less. Contrast this with the definition of E_A, which related states that could not be distinguished by any string of any length. E_{0A} denotes a relatively weak criterion that is progressively strengthened with successive E_{iA} relations. As illustrated by Example 3.14, these relations culminate in the relation we seek, E_A.

EXAMPLE 3.14
Let B be the DFA illustrated in Figure 3.8. Consider the relation E_{0B}. The empty string λ can differentiate between q₀ and the final states, but cannot differentiate between q₁, q₂, q₃, and q₄. Thus E_{0B} has two equivalence classes, {q₀} and {q₁, q₂, q₃, q₄}. In E_{1B}, λ still differentiates q₀ from the other states, but the string 1 can distinguish q₃ from q₁, q₂, and q₄, since δ(q₃, 1) ∉ F but δ(q_i, 1) ∈ F for i = 1, 2, and 4. We still cannot distinguish between q₁, q₂, and q₄ with strings of length 0 or 1, so these remain together and E_{1B} = {{q₀}, {q₃}, {q₁, q₂, q₄}}. Similarly, since δ̄(q₁, 11) ∈ F but δ̄(q₂, 11) ∉ F and δ̄(q₄, 11) ∉ F, E_{2B} = {{q₀}, {q₃}, {q₁}, {q₂, q₄}}. Further investigation shows E_{2B} = E_{3B} = E_{4B} = E_{5B} = ⋯, and indeed E_B = E_{2B}.

[Figure 3.8: The DFA B discussed in Example 3.14.]

The i-th state equivalence relation provides a convenient vehicle for computing E_A. The behavior exhibited by the relations in Example 3.14 follows a pattern that is similar for all deterministic finite automata. The following observations will culminate in a proof that the calculation of successive partial state equivalence relations is guaranteed to lead to the relation E_A.
Given an integer i and a finite alphabet Σ, there is clearly an algorithm for finding E_{iA}, since there are only a finite number of strings in Σ^0 ∪ Σ^1 ∪ Σ^2 ∪ ⋯ ∪ Σ^i. Furthermore, given every E_{iA}, there is an expression for E_A:
E_A = E_{0A} ∩ E_{1A} ∩ E_{2A} ∩ E_{3A} ∩ ⋯ ∩ E_{nA} ∩ ⋯ = ⋂_{j=0}^{∞} E_{jA}
The proof is relegated to the exercises and is related to the fact that
Σ* = Σ^0 ∪ Σ^1 ∪ Σ^2 ∪ ⋯ ∪ Σ^n ∪ ⋯
Finally, it should be clear that if two states cannot be distinguished by strings of length 7 or less, they cannot be distinguished by strings of length 6 or less, which means E_{7A} is a refinement of E_{6A}. This principle generalizes, as formalized below.

∇ Lemma 3.1. Given a finite automaton A = <Σ, S, s₀, δ, F> and an integer m, E_{m+1,A} is a refinement of E_{mA}, which means
(∀s, t ∈ S)(s E_{m+1,A} t ⇒ s E_{mA} t)
or E_{m+1,A} ⊆ E_{mA}.
Proof. See the exercises. Δ

Lemma 3.2 shows that each E_{mA} is related to the desired E_A. Lemma 3.1 thus shows that successive E_{mA} relations come closer to "looking like" E_A.

∇ Lemma 3.2. Given a finite automaton A = <Σ, S, s₀, δ, F> and an integer m, E_A is a refinement of E_{mA}, and so E_A ⊆ E_{mA}. That is,
(∀s, t ∈ S)(s E_A t ⇒ s E_{mA} t)
Proof. Let s, t ∈ S. Then
s E_A t ⇒ (by definition of E_A)
(∀x ∈ Σ*)(δ̄(s, x) ∈ F ⇔ δ̄(t, x) ∈ F) ⇒ (true for all x, so it is true for all "short" x)
(∀x ∈ Σ* ∋ |x| ≤ m)(δ̄(s, x) ∈ F ⇔ δ̄(t, x) ∈ F) ⇒ (by definition of E_{mA})
s E_{mA} t Δ
While it is clearly possible to find a given E_{mA} by applying the definition to each of the strings in Σ^0 ∪ Σ^1 ∪ Σ^2 ∪ ⋯ ∪ Σ^m, there is a much more efficient way if E_{m−1,A} is already known, as outlined in Theorem 3.8. A starting point is provided by E_{0A}, which can be found very easily, as shown by Lemma 3.3. From E_{0A}, E_{1A} can then be found using Theorem 3.8, and then E_{2A}, and so on.

∇ Lemma 3.3. Given a finite automaton A = <Σ, S, s₀, δ, F>, E_{0A} has two equivalence classes, F and S − F (unless either F or S − F is empty, in which case there is only one equivalence class, S).
Proof. The proof follows immediately from the definition of E_{0A}; the empty string λ differentiates between final and nonfinal states, producing the equivalence classes outlined above. Δ

Given E_{0A} as a starting point, Theorem 3.8 shows how successive relations can be efficiently calculated.

∇ Theorem 3.8. Given a finite automaton A = <Σ, S, s₀, δ, F>,
(∀s ∈ S)(∀t ∈ S)(∀i ∈ ℕ)(s E_{i+1,A} t ⇔ s E_{iA} t ∧ (∀a ∈ Σ)(δ(s, a) E_{iA} δ(t, a)))
Proof. Let s ∈ S, t ∈ S. Then
s E_{i+1,A} t ⇔ (∀x ∈ Σ* ∋ |x| ≤ i + 1)(δ̄(s, x) ∈ F ⇔ δ̄(t, x) ∈ F)
⇔ (∀x ∈ Σ* ∋ |x| ≤ i)[δ̄(s, x) ∈ F ⇔ δ̄(t, x) ∈ F] ∧ (∀y ∈ Σ* ∋ |y| = i + 1)[δ̄(s, y) ∈ F ⇔ δ̄(t, y) ∈ F]
⇔ (∀x ∈ Σ* ∋ |x| ≤ i)[δ̄(s, x) ∈ F ⇔ δ̄(t, x) ∈ F] ∧ (∀y ∈ Σ* ∋ 1 ≤ |y| ≤ i + 1)[δ̄(s, y) ∈ F ⇔ δ̄(t, y) ∈ F]
⇔ (∀x ∈ Σ* ∋ |x| ≤ i)[δ̄(s, x) ∈ F ⇔ δ̄(t, x) ∈ F] ∧ (∀a ∈ Σ)(∀x ∈ Σ* ∋ |x| ≤ i)[δ̄(s, ax) ∈ F ⇔ δ̄(t, ax) ∈ F]
⇔ s E_{iA} t ∧ (∀a ∈ Σ)(∀x ∈ Σ* ∋ |x| ≤ i)(δ̄(δ(s, a), x) ∈ F ⇔ δ̄(δ(t, a), x) ∈ F)
⇔ s E_{iA} t ∧ (∀a ∈ Σ)(δ(s, a) E_{iA} δ(t, a)) Δ

Note that Theorem 3.8 gives a far superior method for determining successive E_{iA} relations. The definition required the examination of many (long) strings using the δ̄ function; Theorem 3.8 allows us to simply check a few letters using the δ function. Theorems 3.9, 3.10, and 3.11 will assure us that E_A will eventually be found. The following theorem guarantees that the relations, should they ever begin to look alike, will continue to look alike as successive relations are computed.

∇ Theorem 3.9. Given a finite automaton A = <Σ, S, s₀, δ, F>,
(∃m ∈ ℕ ∋ E_{mA} = E_{m+1,A}) ⇒ (∀k ∈ ℕ)(E_{m+k,A} = E_{mA})
Proof. By induction on k; see the exercises. Δ

The result in Theorem 3.9 is essential to the proof of the next theorem, which guarantees that when successive relations look alike they are identical to E_A.

∇ Theorem 3.10. Given a finite automaton A = <Σ, S, s₀, δ, F>,
(∃m ∈ ℕ ∋ E_{mA} = E_{m+1,A}) ⇒ E_{mA} = E_A
Proof. Assume ∃m ∈ ℕ ∋ E_{mA} = E_{m+1,A}, and let q, r ∈ S:
1.
By Lemma 3.2, q E_A r ⇒ q E_{mA} r.
2. Conversely, assume q E_{mA} r. Then
q E_{mA} r ⇒ (by assumption) q E_{m+1,A} r ⇒ (by Theorem 3.9) (∀j ≥ m)(q E_{jA} r)
Furthermore, by Lemma 3.1, (∀j ≤ m)(q E_{jA} r), and so (∀j ∈ ℕ)(q E_{jA} r); but by the definition of E_A, this implies q E_A r. We have just shown that q E_{mA} r ⇒ q E_A r.
3. Combining (1) and (2), we have (∀q, r ∈ S)(q E_{mA} r ⇔ q E_A r), and so E_{mA} = E_A. Δ

The next theorem guarantees that these relations will eventually look alike (and so, by Theorem 3.10, we are assured that successive computations of E_{iA} will yield an expression representing the relation E_A).

∇ Theorem 3.11. Given a finite automaton A = <Σ, S, s₀, δ, F>, (∃m ∈ ℕ ∋ m ≤ ||S|| ∧ E_{mA} = E_{m+1,A}).
Proof. Assume the conclusion is false; that is, that E_{0A}, E_{1A}, …, E_{||S||A} are all distinct. Since E_{||S||A} ⊆ ⋯ ⊆ E_{1A} ⊆ E_{0A}, the only way for two successive relations to be different is for the number of equivalence classes to increase. Thus,
0 < rk(E_{0A}) < rk(E_{1A}) < rk(E_{2A}) < ⋯ < rk(E_{||S||A})
which means that rk(E_{||S||A}) > ||S||, which is a contradiction (why?). Therefore, not all these relations can be distinct, and so there is some index m for which E_{mA} = E_{m+1,A}. Δ

∇ Corollary 3.5. Given a DFA A = <Σ, S, s₀, δ, F>, there is an algorithm for computing E_A.
Proof. E_A can be found by using Lemma 3.3 to find E_{0A} and computing successive E_{iA} relations using Theorem 3.8 until E_{iA} = E_{i+1,A}; this E_{iA} will equal E_A, and this will all happen before i reaches ||S||, the number of states in S. The procedure is therefore guaranteed to halt. Δ

Since E_A was the key to producing a reduced machine, we now have an algorithm for taking a DFA and finding an equivalent DFA that is reduced. The other necessary step needed to find the minimal machine was to produce a connected DFA from a given automaton. This construction hinged on the calculation of S^c, the set of connected states.
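Before turning to S^c, the procedure of Corollary 3.5 can be sketched in code: start from E_{0A} (Lemma 3.3) and refine with the test of Theorem 3.8 until two successive relations agree. The dict-based encoding and the example DFA below are our own illustration, not the text's.

```python
# Corollary 3.5 as code: compute E_A by refining E_0 with Theorem 3.8.
def state_equivalence(states, sigma, delta, final):
    """Return a dict mapping each state to an identifier of its E_A class."""
    block = {s: (s in final) for s in states}            # E_0 (Lemma 3.3)
    while True:
        # Theorem 3.8: s E_{i+1} t iff s E_i t and, for every letter a,
        # delta(s, a) E_i delta(t, a); so states are regrouped by signature.
        refined = {s: (block[s],
                       tuple(block[delta[(s, a)]] for a in sigma))
                   for s in states}
        # Since each step only splits classes, an unchanged class count
        # means E_{i+1} = E_i, which equals E_A by Theorem 3.10.
        if len(set(refined.values())) == len(set(block.values())):
            return block
        block = refined

# Hypothetical four-state DFA over {0, 1}; q1 and q2 behave identically.
states = {'q0', 'q1', 'q2', 'q3'}
delta = {('q0','0'):'q1', ('q0','1'):'q2', ('q1','0'):'q3', ('q1','1'):'q3',
         ('q2','0'):'q3', ('q2','1'):'q3', ('q3','0'):'q3', ('q3','1'):'q3'}
blocks = state_equivalence(states, '01', delta, {'q3'})
print(blocks['q1'] == blocks['q2'])   # q1 and q2 are E_A-equivalent
```

By Theorem 3.11 the loop runs at most ||S|| times, so the sketch is a genuine algorithm in the chapter's sense: a procedure guaranteed to halt.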
The algorithm suggested by the definition of S^c is by no means the most efficient; it involves checking long strings with the δ̄ function and hence massive duplication of effort. Furthermore, the definition seems to imply that all the strings in Σ* must be checked, which certainly cannot be completed if it is done one string at a time. Theorem 2.7 can be used to justify that it is unnecessary to check any strings longer than ||S|| (see the exercises). Thus S^c = {δ̄(s₀, x) | |x| < ||S||}. While this set, being based on a finite number of words, justifies that there is an algorithm for finding S^c (and hence there exists an algorithm for constructing A^c), it is still a very inefficient way to calculate the set of accessible states.

As with the calculation of E_A, there is a way to avoid using δ̄ to process long strings when computing S^c. In this case, a better strategy is to begin with s₀ and find all the new states that can be reached from s₀ with just one transition. Note that this can be done by simply examining the row of the state transition table corresponding to s₀, and hence the computation can be accomplished quite fast. Each of these new states should then be examined in the same fashion to see if they lead to still more states, and this process can continue until all connected states are found. A sequence of state sets is thereby constructed, in a similar manner to the way successive partial state equivalence relations E_{iA} were built. This approach is reflected in Definition 3.10.

∇ Definition 3.10. Given a finite automaton A = <Σ, S, s₀, δ, F>, the i-th partial state set C_i is defined by the following rules: Let C₀ = {s₀} and recursively define
C_{i+1} = C_i ∪ ⋃_{q ∈ C_i, a ∈ Σ} {δ(q, a)} Δ

C_{||S||} must equal S^c (why?), and we will often arrive at the final answer long before ||S|| iterations have been calculated (see the exercises and refer to the treatment of E_{iA}).
It can also be proved (by induction) that C_i represents the set of all states that can be reached from s₀ by strings of length i or less (see the exercises). Recall that the definition of S^c involved the extended state transition function δ̄. Definition 3.10 instead uses the information found in the previous iteration to avoid calculating paths for long strings. As suggested earlier, there is an even more efficient method of calculating C_{i+1} from C_i, since only paths from the newly added states need be explored anew.

EXAMPLE 3.15
Consider the DFA D given in Figure 3.9. C₀ = {s₀}, and since δ(s₀, a) = s₁ and δ(s₀, b) = s₃,
C₁ = {s₀, s₁, s₃}
Note that there is no need to check s₀ again; examining s₁ and s₃ yields C₂, and checking the two states newly added there generates one more state, giving C₃. Since that state leads to no new states, we have C₄ = C₃; as with E_{iA}, we will now find C₃ = C₄ = C₅ = C₆ = ⋯ = S^c. The exercises will develop the parallels between the generation of the partial state sets C_i and the generation of the partial state equivalence relations E_{iA}.

[Figure 3.9: The DFA D discussed in Example 3.15.]

The procedure for recursively calculating successive C_i's to determine S^c provides the final algorithm needed to efficiently find the minimal machine corresponding to a given automaton A. From A, we use the C_i's to calculate S^c and thereby define A^c. Theorem 3.8 and related results suggest an efficient algorithm for computing E_{A^c}, from which we can construct A^c/E_{A^c}. A^c/E_{A^c} is indeed the minimal machine equivalent to A, as shown by the results in this chapter. Theorems 3.3 and 3.6 show that A^c/E_{A^c} is equivalent to A. By Theorems 3.2, 3.4, and 3.5, this automaton is reduced and connected, and Corollary 3.4 guarantees that A^c/E_{A^c} must therefore be minimal.
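The C_i iteration of Definition 3.10, with the refinement of examining only the newly added states, might look as follows in code. The dict-based encoding and the sample machine are our own illustration, not the text's.

```python
# Definition 3.10 as code: grow C_0 = {s0} one transition at a time until
# C_{i+1} = C_i, examining only the states added in the previous round.
def connected_states(sigma, s0, delta):
    reached = {s0}             # C_0
    frontier = {s0}            # states newly added in the previous round
    while frontier:
        # C_{i+1} = C_i together with all one-letter successors of C_i;
        # successors of older states are already in `reached`.
        frontier = {delta[(q, a)] for q in frontier for a in sigma} - reached
        reached |= frontier
    return reached

# Hypothetical DFA in which s3 is unreachable from s0.
delta = {('s0','a'):'s1', ('s0','b'):'s2', ('s1','a'):'s1', ('s1','b'):'s0',
         ('s2','a'):'s2', ('s2','b'):'s2', ('s3','a'):'s0', ('s3','b'):'s3'}
print(sorted(connected_states('ab', 's0', delta)))   # ['s0', 's1', 's2']
```

The loop stops as soon as a round adds no new state, mirroring the observation that C_{i} stabilizes long before ||S|| iterations in most machines.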
The proof of Theorem 3.7 suggests building a minimal equivalent deterministic finite automaton for A by first shrinking to a connected machine and then reducing modulo the state equivalence relation, that is, by finding A^c/E_{A^c}. Theorem 3.5 assures us that when we reduce a connected machine it will still be connected. An alternate strategy would be to first reduce modulo E_A and then shrink to a connected machine, that is, to find (A/E_A)^c. In this case, we would want to make sure that connecting a reduced machine will still leave us with a reduced machine. It can be shown that if A is reduced then A^c is reduced (see the exercises), and hence this method could also be used to find the minimal equivalent DFA. Finding the minimal equivalent DFA by reducing A first and then eliminating the disconnected states is, however, less efficient than applying the algorithms in the opposite order. Finding the connected set of states is simpler than finding the state equivalence relation, so it is best to eliminate as many states as possible by finding S^c before embarking on the more complex search for the state equivalence relation.

It should be clear that the algorithms in this chapter are presented in sufficient detail to easily allow them to be programmed. As suggested in Chapter 1, the final states can be represented as a set and the transition function as a matrix. The minimization procedures would then return the minimized matrix and new final state set. As a practical matter, then, when generating an automaton to perform a given task, our concern can be limited to defining a machine that works. No further creative insight is then necessary to find the minimal machine. Once a machine that recognizes the desired language is found (however inefficient it may be), the minimization algorithms can then be applied to produce a machine that is both correct and efficient.
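Assuming a dictionary-based encoding of a DFA (our own convention; none of this code is from the text), the recommended connect-then-reduce order can be sketched end to end:

```python
# End-to-end sketch of the recommended order: restrict to the connected
# states S^c first, then reduce modulo the state equivalence relation.
def minimize(states, sigma, s0, delta, final):
    # Step 1: A^c -- keep only states reachable from s0 (Definition 3.10).
    reached, frontier = {s0}, {s0}
    while frontier:
        frontier = {delta[(q, a)] for q in frontier for a in sigma} - reached
        reached |= frontier
    states, final = reached, final & reached

    # Step 2: E_{A^c} -- refine E_0 using Theorem 3.8 until stable.
    block = {s: (s in final) for s in states}
    while True:
        sig = {s: (block[s], tuple(block[delta[(s, a)]] for a in sigma))
               for s in states}
        if len(set(sig.values())) == len(set(block.values())):
            break
        block = sig

    # Step 3: A^c/E_{A^c} -- one quotient state per equivalence class
    # (Definition 3.8), with integer names for the classes.
    names = {b: i for i, b in enumerate(sorted(set(block.values()), key=repr))}
    cls = {s: names[block[s]] for s in states}
    new_delta = {(cls[s], a): cls[delta[(s, a)]] for s in states for a in sigma}
    return set(cls.values()), cls[s0], new_delta, {cls[s] for s in final}

# Hypothetical DFA: s4 is unreachable, and s1 and s2 are E_A-equivalent.
delta = {('s0','a'):'s1', ('s0','b'):'s2', ('s1','a'):'s3', ('s1','b'):'s3',
         ('s2','a'):'s3', ('s2','b'):'s3', ('s3','a'):'s3', ('s3','b'):'s3',
         ('s4','a'):'s0', ('s4','b'):'s4'}
S, q0, d, F = minimize({'s0','s1','s2','s3','s4'}, 'ab', 's0', delta, {'s3'})
print(len(S))   # 3 quotient states: {s0}, {s1, s2}, {s3}
```

Connecting first also pays off here exactly as the text argues: the (cheap) reachability pass shrinks the state set before the (more expensive) partition refinement begins.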
The proof that a reduced and connected machine is the most efficient was based on the properties of the automaton A_{R_L} obtained from the right congruence R_L. This can be proved without relying on the existence of A_{R_L}; we close this chapter with an outline of such a proof. The details are similar to the proofs given in Chapter 7 for finite-state transducers.

Theorem 3.3, which was not based in any way on R_L, implies that a minimal DFA must be connected. Similarly, an immediate corollary of Theorem 3.6 is that a minimal DFA must be reduced. Thus, a minimal machine is forced to be both reduced and connected. We now must justify that a reduced and connected machine is minimal. This result will follow from Corollary 3.3, which can also be proved without relying on A_{R_L}. The implication (A ≅ B ⇒ A is equivalent to B) is due solely to the properties of isomorphisms and is actually true irrespective of any other hypotheses (see the exercises). Conversely, if A is equivalent to B, then the fact that A and B are both reduced and connected allows an isomorphism to be defined from A to B (see the exercises). Corollary 3.3 allows us to argue that any reduced and connected automaton A is isomorphic to a minimal automaton M, and hence A has as few states as M and is minimal. The argument would proceed as follows: Since M is minimal, we already know that Theorems 3.3 and 3.6 imply that M is reduced and connected. Thus, M and A are two reduced and connected equivalent automata, and Corollary 3.3 ensures that A ≅ M. Thus, minimal machines are exactly those that are reduced and connected.

EXERCISES
3.1. Use induction to show (∀s ∈ S)(∀x ∈ Σ*)(μ(δ̄(s, x)) = δ̄_{R_L}(μ(s), x)) for the mapping μ defined in Theorem 3.1. Do not appeal to the results of (5) in the proof of Theorem 3.1.
3.2. Consider the state transition function given in Definition 3.8 and use induction to show
(∀x ∈ Σ*)(∀[s]_{E_A} ∈ S_{E_A})(δ̄_{E_A}([s]_{E_A}, x) = [δ̄(s, x)]_{E_A})
3.3.
Prove that
E_A = E_{0A} ∩ E_{1A} ∩ E_{2A} ∩ E_{3A} ∩ ⋯ ∩ E_{nA} ∩ ⋯ = ⋂_{j=0}^{∞} E_{jA}
3.4. Given a finite automaton A = <Σ, S, s₀, δ, F>, show that the function δ_{E_A} given in Definition 3.8 is well defined.
3.5. Given a finite automaton A = <Σ, S, s₀, δ, F>, show that the set F_{E_A} given in Definition 3.8 is a well-defined set.
3.6. Show that the range of the function δ^c given in Definition 3.7 is contained in S^c.
3.7. Prove Lemma 3.1.
3.8. Prove Lemma 3.3.
3.9. Prove Theorem 3.9.
3.10. Given a homomorphism μ from the finite automaton A = <Σ, S_A, s_{0A}, δ_A, F_A> to the DFA B = <Σ, S_B, s_{0B}, δ_B, F_B>, prove by induction that
(∀s ∈ S_A)(∀x ∈ Σ*)(μ(δ̄_A(s, x)) = δ̄_B(μ(s), x))
3.11. Given a homomorphism μ from the finite automaton A = <Σ, S_A, s_{0A}, δ_A, F_A> to the DFA B = <Σ, S_B, s_{0B}, δ_B, F_B>, prove that L(A) = L(B). As long as it is explicitly cited, the result of Exercise 3.10 may be used without proof.
3.12. (a) Give an example of a DFA for which A is not connected and A/E_A is not connected.
(b) Give an example of a DFA for which A is not connected but A/E_A is connected.
3.13. Given a finite automaton A = <Σ, S, s₀, δ, F> and the state equivalence relation E_A, show there exists a homomorphism from A to A/E_A.
3.14. Given a connected finite automaton A = <Σ, S, s₀, δ, F>, show there exists a homomorphism from A to A_{R_{L(A)}} by:
(a) Define a mapping ψ from A to A_{R_{L(A)}}. (No justification need be given.)
(b) Prove that your ψ is well defined.
(c) Prove that ψ is a homomorphism.
3.15. Give an example to show that there may not exist a homomorphism from A to A_{R_{L(A)}} if A is not connected (see Exercise 3.14).
3.16. Give an example to show that there may still exist a homomorphism from A to A_{R_{L(A)}} even if A is not connected (see Exercise 3.14).
3.17. Give an example to show that, for the relations R and R_L given in Theorem 2.2, there need not exist a homomorphism from A_{R_L} to A_R.
3.18. ≅ is an equivalence relation; in Chapter 2 we saw some relations were also right congruences.
Comment on the appropriateness of asking whether ≅ is a right congruence.

3.19. Is E_A a right congruence? Explain your answer.

3.20. Prove that if A is reduced then A^c is reduced.

3.21. For a homomorphism μ between two finite automata A = <Σ, S_A, s_0A, δ_A, F_A> and B = <Σ, S_B, s_0B, δ_B, F_B>, prove (∀s, t ∈ S_A)(μ(s) E_B μ(t) ⇒ s E_A t).

3.22. Let M be a DFA, and let L = L(M).
(a) Define a mapping ψ from M^c to A_RM. (No justification need be given.)
(b) Prove that your ψ is well defined.
(c) Prove that ψ is a homomorphism.
(d) Prove that ψ is a bijection.
(e) Argue that M^c ≅ A_RM.

3.23. For the machine A given in Figure 3.10a, find:
(a) E_A (list each E_iA)
(b) L(A)
(c) A_RL(A)
(d) R_L(A)
(e) A/E_A

3.24. For the machine B given in Figure 3.10b, find:
(a) E_B (list each E_iB)
(b) L(B)
(c) A_RL(B)
(d) R_L(B)
(e) B/E_B
Note that your answer to part (e) might contain some disconnected states.

3.25. For the machine C given in Figure 3.10c, find:
(a) E_C (list each E_iC)
(b) L(C)
(c) A_RL(C)
(d) R_L(C)
(e) C/E_C
Note that your answer to part (e) might contain some disconnected states.

3.26. For the machine D given in Figure 3.10d, find:
(a) E_D (list each E_iD)
(b) L(D)
(c) A_RL(D)
(d) R_L(D)
(e) D/E_D
Note that your answer to part (e) might contain some disconnected states.

Figure 3.10 (a) The DFA A discussed in Exercise 3.23 (b) The DFA B discussed in Exercise 3.24 (c) The DFA C discussed in Exercise 3.25 (d) The DFA D discussed in Exercise 3.26

3.27. Without relying on A_RL, prove that if A and B are both reduced and connected equivalent DFAs then A ≅ B. Give the details for the following steps:
(a) Define an appropriate function ψ between the states of A and the states of B.
(b) Show that ψ is well defined.
(c) Show that ψ is a homomorphism.
(d) Show that ψ is a bijection.

3.28. In the proof of (6) in Theorem 3.1, one transition only involved ⇒ rather than ⇔.
Show by means of an example that the two expressions involved in this transition are not equivalent.

3.29. Supply reasons for each of the equivalences in the proof of Theorem 3.8.

3.30. Minimize the machine defined in Figure 3.3.

3.31. (a) Give an example of a DFA for which A is not reduced and A^c is not reduced.
(b) Give an example of a DFA for which A is not reduced and A^c is reduced.

3.32. Note that ≅ relates some automata to other automata, and therefore ≅ is a relation over the set of all deterministic finite automata.
(a) For automata A, B, and C, show that if g is an isomorphism from A to B and f is an isomorphism from B to C, then f ∘ g is an isomorphism from A to C.
(b) Prove that ≅ is a symmetric relation; that is, formally justify that if there is an isomorphism from A to B then there is an isomorphism from B to A.
(c) Prove that ≅ is a reflexive relation.
(d) From the results in parts (a), (b), and (c), prove that ≅ is an equivalence relation over the set of all deterministic finite automata.

3.33. Show that homomorphism is not an equivalence relation over the set of all deterministic finite automata.

3.34. For the relations R and R_L given in Theorem 2.2, show that there exists a homomorphism from A_R to A_RL.

3.35. Prove that if there is a homomorphism from A to B then R_A refines R_B.

3.36. Prove that if A is isomorphic to B then R_A = R_B
(a) By appealing to Exercise 3.35.
(b) Without appealing to Exercise 3.35.

3.37. Consider two deterministic finite automata for which A is not homomorphic to B, but R_A = R_B.
(a) Give an example of such automata for which L(A) = L(B).
(b) Give an example of such automata for which L(A) ≠ L(B).
(c) Can such examples be found if both A and B are connected and L(A) = L(B)?
(d) Can such examples be found if both A and B are reduced and L(A) = L(B)?

3.38. Disprove that if A is homomorphic to B then R_A = R_B.

3.39. Prove or give a counterexample [assume L = L(M)].
(a) For any DFA M, there exists a homomorphism ψ from A_RM to M.
(b) For any DFA M, there exists an isomorphism ψ from A_RM to M.
(c) For any DFA M, there exists a homomorphism ψ from M to A_RM.

3.40. Prove that if A is a minimal DFA then R_A = R_L(A).

3.41. Give an example to show that Exercise 3.40 can be false if A is not minimal.

3.42. Give an example to show that Exercise 3.40 may still hold if A is not minimal.

3.43. Definition 3.8 takes an equivalence relation on the set of states S and defines a machine based on that relation. In general, we could choose a relation R on S and define a machine A/R (as we did when we defined A/E_A when the relation R was E_A).
(a) Consider R = E_0A. Is A/E_0A always well defined? Give an example to illustrate your answer.
(b) Assume R is a refinement of E_A. Is A/R always well defined? For the cases where it is well defined, consider the theorems that would correspond to Theorems 3.4, 3.5, and 3.6 if E_A were replaced by such a refinement R. Which of these theorems would still be true?

3.44. Given a DFA M, prove or give a counterexample.
(a) There exists a homomorphism from M/E_M to A_RL(M).
(b) There exists a homomorphism from A_RL(M) to M/E_M.

3.45. Prove that the bound given for Theorem 3.11 can be sharpened: given a finite automaton A = <Σ, S, s₀, δ, F>, (∃m ∈ N ∋ m < ||S|| ∧ E_mA = E_(m+1)A).

3.46. Prove or give a counterexample:
(a) If A and B are equivalent, then A and B are isomorphic.
(b) If A and B are isomorphic, then A and B are equivalent.

3.47. Given a finite automaton A = <Σ, S, s₀, δ, F>, prove that the C_i given in Definition 3.10 are nested: (∀i ∈ N)(C_i ⊆ C_(i+1)).

3.48. Prove (by induction) that C_i does indeed represent the set of all states that can be reached from s₀ by strings of length i or less.

3.49. Prove that, given a finite automaton A = <Σ, S, s₀, δ, F>, (∃i ∈ N ∋ C_i = C_(i+1)) ⇒ (∀k ∈ N)(C_i = C_(i+k)).

3.50.
Prove that, given a DFA A = <Σ, S, s₀, δ, F>, (∃i ∈ N ∋ C_i = C_(i+1)) ⇒ (C_i = S^c).

3.51. Prove that, given a finite automaton A = <Σ, S, s₀, δ, F>, ∃i ∈ N ∋ C_i = C_(i+1).

3.52. Prove that, given a DFA A = <Σ, S, s₀, δ, F>, (∃i ∈ N ∋ i ≤ ||S|| ∧ C_i = S^c).

3.53. Use the results of Exercises 3.47 through 3.52 to argue that the procedure for generating S^c from successive calculations of C_i is correct and is actually an algorithm.

3.54. Give an example of two DFAs A and B that simultaneously satisfy the following three criteria:
1. There is a homomorphism from A to B.
2. There is a homomorphism from B to A.
3. There does not exist any isomorphism between A and B.

3.55. Assume R and Q are both right congruences of finite rank, R refines Q, and L is a union of equivalence classes of Q.
(a) Show that L is also a union of equivalence classes of R.
(b) Show that there exists a homomorphism from A_R to A_Q. (Hint: Do not use the μ given in Theorem 3.1; there is a far more straightforward way to define a mapping.)
(c) Give an example to show that there need not be a homomorphism from A_Q to A_R.

3.56. Prove that A_Q must be connected.

3.57. Prove that if there is an isomorphism from A to B and A is connected then B must also be connected.

3.58. Prove that if there is an isomorphism from A to B and B is connected then A must also be connected.

3.59. Disprove that if there is a homomorphism from A to B and A is connected then B must also be connected.

3.60. Disprove that if there is a homomorphism from A to B and B is connected then A must also be connected.

3.61. Given a DFA A, recall the relation R_A on Σ* induced by A. This relation gives rise to another DFA A_RA [with Q = R_A and L = L(A)]. Consider also the connected version of A, A^c.
(a) Define an isomorphism ψ from A_RA to A^c. (No justification need be given.)
(b) Prove that your ψ is well defined.
(c) Prove that ψ is a homomorphism.
(d) Prove that ψ is an isomorphism.

3.62.
Assume that A and B are connected DFAs. Assume that there exists an isomorphism ψ from A to B and an isomorphism μ from B to A. Prove that ψ = μ⁻¹.

3.63. Assume that A and B are DFAs. Assume that there exists an isomorphism ψ from A to B and an isomorphism μ from B to A. Give an example for which ψ ≠ μ⁻¹.

3.64. Give an example of a three-state DFA for which E_0A has only one equivalence class. Is it possible for E_0A to be different from E_1A in such a machine? Explain.

3.65. Assume A and B are both reduced and connected. If ψ is a homomorphism from A to B, does ψ have to be an isomorphism? Justify your conclusions.

3.66. Prove Corollary 3.2.

3.67. Prove that S^c = {δ̂(s₀, x) | |x| ≤ ||S||}.

3.68. Given a finite automaton A = <Σ, S, s₀, δ, F>, two states s, t ∈ S, and the automata A^t = <Σ, S, t, δ, F> and A^s = <Σ, S, s, δ, F>, prove that s E_A t ⇔ L(A^s) = L(A^t).

3.69. Given a finite automaton A = <Σ, S, s₀, δ, F>, consider the terminal sets T(A, t) = {x | δ̂(t, x) ∈ F} and initial sets I(A, t) = {x | δ̂(s₀, x) = t} for each t ∈ S.
(a) Prove that the initial sets of A must form a partition of Σ*.
(b) Give an example to show that the terminal sets of A might not partition Σ*.
(c) Give an example to show that the terminal sets of A might partition Σ*.

CHAPTER 4

NONDETERMINISTIC FINITE AUTOMATA

A nondeterministic finite automaton, abbreviated NDFA, is a generalization of the deterministic machines that we have studied in previous chapters. Although nondeterministic machines lack some of the restrictions imposed on their deterministic cousins, the class of languages recognized by nondeterministic finite automata is exactly the same as the class of languages recognized by deterministic finite automata. In this sense, the recognition power of nondeterministic finite automata is equivalent to that of deterministic finite automata.
In this chapter we will show the correspondence of nondeterministic finite automata to deterministic finite automata, and we will prove that both types of machines accept the same class of languages. In a later section, we will again generalize our computational model to allow nondeterministic finite automata that make transitions spontaneously, without an input symbol being processed. It will be shown that the class of languages recognized by these new machines is exactly the same as the class of languages recognized by our first type of nondeterministic finite automata and is thus the same as the class of languages recognized by deterministic finite automata.

4.1 DEFINITIONS AND BASIC THEOREMS

Whereas deterministic finite automata are restricted to having exactly one transition from a state for each a ∈ Σ, a nondeterministic finite automaton may have any number of transitions for a given input symbol, including zero transitions. When processing an input string, if an NDFA comes to a state from which there is no transition arc labeled with the next input symbol, the path through the machine which is being followed is terminated. Termination can take the place of the "garbage state" (a permanent rejection state) found in many deterministic finite automata, which is used to reject some strings that are not in the language recognized by the automaton (the state s₇ played this role in Examples 1.11 and 1.9).

EXAMPLE 4.1

Let L = {w ∈ {a, b, c}* | ∃y ∈ {b}* ∋ w = ayc}. We can easily build a nondeterministic finite automaton that accepts this set of words. One such automaton is displayed in Figure 4.1. In this example there are no transitions out of s₀ labeled with either b or c, nor are there any transitions from s₁ labeled with a. From state s₂ there are no transitions at all.
This means that if either b or c is encountered in state s₀, or a is encountered in state s₁, or any input letter is encountered once we reach state s₂, the word on the input tape will not be able to follow this particular path through the machine. Thus, if a word is not fully processed by the NDFA, it will not be considered accepted (even if the state in which it was prematurely "stuck" was a final state).

Figure 4.1 The NDFA described in Example 4.1

An equivalent, although more complicated, deterministic finite automaton is given in Figure 4.2. Note that this deterministic finite automaton requires the introduction of an extra state, a dead state or garbage state, to continue the processing of strings that are not in the language.

Figure 4.2 A deterministic version of the NDFA in Example 4.1

A nondeterministic finite automaton may also have multiple transitions from any state for a given input symbol. For example, consider the following construction of a nondeterministic acceptor for the language L, which consists of all even-length strings along with all strings whose number of 1s is a multiple of 3. That is, L = {x ∈ {0, 1}* | |x| ≡ 0 mod 2 ∨ |x|₁ ≡ 0 mod 3}.

EXAMPLE 4.2

In the NDFA given in Figure 4.3, there are multiple transitions from state s₀: processing the symbol 0 causes the machine to enter states s₁ and s₂, whereas processing a 1 causes the machine to enter both state s₃ and state s₂.

Figure 4.3 The NDFA discussed in Example 4.2

Within a nondeterministic finite automaton there can be multiple paths that are labeled with the components of a string. For example, if we let w = 01, then there are two paths labeled by the components of w: (s₀ → s₁ → s₃) and (s₀ → s₂ → s₄). The second path leads to a final state, s₄, while the first path does not. We will adopt the convention that this word w is accepted by the automaton since at least one of the paths does terminate in a final state.
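The path behavior of Example 4.1 is easy to simulate. The Python sketch below is illustrative rather than the book's own construction: the state names s0, s1, s2 and the transition table are read off the description of Figure 4.1 (a from s0 to s1, b looping on s1, c from s1 to the final state s2), and a pair missing from the table stands for the empty set, i.e., a terminated path.

```python
# Transition table of the NDFA in Figure 4.1 for L = {ayc : y in {b}*}.
# A missing (state, letter) entry means the path simply terminates;
# no garbage state is needed.
delta = {
    ("s0", "a"): {"s1"},
    ("s1", "b"): {"s1"},
    ("s1", "c"): {"s2"},
}

START, FINAL = {"s0"}, {"s2"}

def accepts(word):
    """Track every state the NDFA could currently be in; accept iff
    some fully processed path ends in a final state."""
    active = set(START)
    for letter in word:
        active = set().union(*(delta.get((q, letter), set()) for q in active))
    return bool(active & FINAL)

print(accepts("abbc"))  # True
print(accepts("abca"))  # False: the path dies after s2, so the word is rejected
```

Note how a word such as abca has its only path terminated prematurely, mirroring the text's remark that a word not fully processed is not accepted.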
These concepts will be formalized later in Definition 4.3. This ability to make multiple transitions from a given state can simplify the construction of the machine, but adds no more power to our computational model. The deterministic machine equivalent to Example 4.2 is substantially more complex, and its construction is left as an exercise for the reader.

Another restriction that is relaxed when we talk about nondeterministic finite automata is the number of initial states. While a deterministic machine is constrained to having exactly one start state, a nondeterministic finite automaton may have any number, other than zero, up to ||S||. Indeed, some applications will be seen in which all the states are start states.

EXAMPLE 4.3

We can build a machine that will accept the same language as in Example 4.2, but in a slightly different way. Note that in Figure 4.4 the multiplicity of start states simplifies the construction considerably.

Figure 4.4 The NDFA discussed in Example 4.3

As before, the addition of multiple start states to our computational model facilitates machine construction but adds no more recognition power. We turn now to the formal definition of nondeterministic finite automata.

Definition 4.1. A nondeterministic finite automaton (NDFA) is a quintuple A = <Σ, S, S₀, δ, F> where:
i. Σ is an alphabet.
ii. S is a finite nonempty set of states.
iii. S₀ is a set of initial states, a nonempty subset of S.
iv. δ: S × Σ → p(S) is the state transition function.
v. F is the set of accepting states, a (possibly empty) subset of S.

The input alphabet, the state space, and even the set of final states are the same as for deterministic finite automata. The important differences are contained in the definitions of the initial states and of the δ function. The set of initial states can be any nonempty subset of the state space.
These can be viewed as multiple entry points into the machine, with each start state beginning distinct, although not necessarily disjoint, paths through the machine.

The δ function for nondeterministic finite automata differs from the δ function of deterministic machines in that it maps a single state and a letter to a set of states. In some texts, one will find δ defined as simply a relation with range S and not as a function; without any loss of generality we define δ as a function with range p(S), which makes the formal proofs of relevant theorems considerably easier.

EXAMPLE 4.4

Consider the machine A = <Σ, S, S₀, δ, F>, where

Σ = {a, b}
S = {r, s, t}
S₀ = {r, s}
F = {t}

and δ: {r, s, t} × {a, b} → {∅, {r}, {s}, {t}, {s, t}, {r, s}, {r, t}, {r, s, t}} is given in Figure 4.5.

δ |  a   |   b
r | {s}  |   ∅
s |  ∅   | {r, t}
t |  ∅   | {t}

Figure 4.5 An NDFA state transition diagram corresponding to the formal definition given in Example 4.4

We will see later that this machine accepts strings that begin with alternating as and bs and end with one or more consecutive bs.

Definition 4.2. Given an NDFA A = <Σ, S, S₀, δ, F>, the extended state transition function for A is the function δ̂: S × Σ* → p(S) defined recursively as follows:

(∀s ∈ S) δ̂(s, λ) = {s}
(∀s ∈ S)(∀a ∈ Σ)(∀x ∈ Σ*) δ̂(s, xa) = ⋃_{q ∈ δ̂(s, x)} δ(q, a)

Once again, δ̂(s, x) is meant to represent where we arrive after starting at a state s and processing all the letters of the string x. In the case of nondeterministic finite automata, δ̂ does not map to a single state but to a set of states because of the multiplicity of paths.

EXAMPLE 4.5

Consider again the NDFA displayed in Example 4.4. To find all the places a string such as bb can reach from s, we would first determine what can be reached by the first b.
The reachable states are r and t, since δ̂(s, b) = {r, t}. From these states, we would then determine what could be reached by the second b (from r, no progress is possible, but from t, we can again reach t). These calculations are reflected in the recursive definition of δ̂:

δ̂(s, bb) = ⋃_{q ∈ δ̂(s, b)} δ(q, b) = ⋃_{q ∈ {r, t}} δ(q, b) = δ(r, b) ∪ δ(t, b) = ∅ ∪ {t} = {t}

Because of the multiplicity of initial states and because the δ̂ function is now set valued, it is possible for a nondeterministic finite automaton to be active in more than a single state at one time. Whereas in all deterministic finite automata there is a unique path through the machine labeled with components of w for each w ∈ Σ*, this is not necessarily the case for nondeterministic finite automata. At any point in the processing of a string, the δ function maps the input symbol and the current state to a set of states. This implies that multiple paths through the machine are possible or that the machine can get "stuck" and be unable to process the remainder of the string if there is no transition from a state labeled with the appropriate letter. There is no more than one path for each word if there is exactly one start state and the δ function always maps to a singleton set (or ∅). If we were to further require that the δ function have a defined transition to another state for every input symbol, then the machine that we have would essentially be a deterministic finite automaton. Thus, all deterministic finite automata are simply a special class of nondeterministic finite automata; with some trivial changes in notation, any DFA can be thought of as an NDFA. Indeed, the state transition diagram of a DFA could be a picture of a well-behaved NDFA. Therefore, any language accepted by a DFA can be accepted by an NDFA.

EXAMPLE 4.6

Consider the machine given in Example 4.4 and let x = b; the possible paths through the machine include (1) starting at s and proceeding to r, and (2) starting at s and proceeding to t.
Note that it is not possible to start from t (since t ∉ S₀), and there is no way to proceed with x = b by starting at r, the other start state. Now let x = ba and consider the possibilities. The only path through the machine requires that we start at s, proceed to r, and return to s; starting at s and proceeding to t leaves no way to process the second letter of x. Starting from r is again hopeless (what types of strings are good candidates for starting at r?).

Now let x = bab; the possible paths through the machine include (1) starting at s, proceeding to r, returning to s, and then moving again to r, and (2) starting at s, proceeding to r, returning to s, and then proceeding to t. Note that starting at s and moving immediately to t again leaves us with no way to process the remainder of the string. Both b and bab included paths that terminated at the final state t (among other places). These strings will be said to be recognized by this NDFA (compare with Definition 4.3). ba had no path that led to a final state, and as a consequence we will consider ba to be rejected by this machine.

There are a number of ways in which to conceptualize a nondeterministic finite automaton. Among the most useful are the following two schemes:

1. At each state where a multiple transition occurs, the machine replicates into identical copies of itself, with each copy following one of the possible paths.
2. Multiple states of the machine are allowed to be active, and each of the active states reacts to each input letter.

It happens that the second viewpoint is the most useful for our purposes. From a theoretical point of view, we use this as the basis for proving the equivalence of deterministic and nondeterministic finite automata. It is also a useful model upon which to base the circuits that implement NDFAs.

The concept of a language for nondeterministic finite automata is different from that for deterministic machines.
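The recursive definition of δ̂ and the calculation of Example 4.5 translate directly into code. The following sketch (an illustration, not the book's own notation) uses the transition table of Example 4.4; (state, letter) pairs absent from the dictionary stand for the empty set.

```python
# NDFA of Example 4.4: states r, s, t; start states {r, s}; final state t.
delta = {
    ("r", "a"): {"s"},
    ("s", "b"): {"r", "t"},
    ("t", "b"): {"t"},
}

def delta_hat(state, x):
    """Extended transition function of Definition 4.2:
    delta_hat(s, lambda) = {s};
    delta_hat(s, xa) = union of delta(q, a) over q in delta_hat(s, x)."""
    if x == "":
        return {state}
    result = set()
    for q in delta_hat(state, x[:-1]):
        result |= delta.get((q, x[-1]), set())
    return result

# Example 4.5: delta_hat(s, bb) = delta(r, b) U delta(t, b) = {} U {t} = {t}
print(delta_hat("s", "bb"))  # {'t'}

# Anticipating Definition 4.3: take the union over both start states.
def accepts(w):
    reached = set().union(*(delta_hat(q, w) for q in {"r", "s"}))
    return bool(reached & {"t"})

print(accepts("b"), accepts("ba"), accepts("bab"))  # True False True
```

The three acceptance results reproduce Example 4.6: b and bab each have a path ending in the final state t, while ba does not.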
Recall that the requirement for a word to be contained in the language accepted by a deterministic finite automaton was that the processing of the string would terminate in a final state. This is also the condition for belonging to the language accepted by a nondeterministic finite automaton; however, since the path through a nondeterministic finite automaton is not necessarily unique, only one of the many possible paths need terminate in a final state for the string to be accepted.

Definition 4.3. Let A = <Σ, S, S₀, δ, F> be a nondeterministic finite automaton and w be a word in Σ*. A accepts w iff (⋃_{q ∈ S₀} δ̂(q, w)) ∩ F ≠ ∅.

Again conforming with our previous usage, a word that is not accepted is rejected. The use of the symbol L will be consistent with its usage in previous chapters, although it does have a different formal definition. As before, L(A) is used to designate all those strings that are accepted by a finite automaton A. Since the concept of acceptance must be modified for nondeterministic finite automata, the formal definition of L is necessarily different (contrast Definitions 4.3 and 1.12).

Definition 4.4. Given an NDFA A = <Σ, S, S₀, δ, F>, the language accepted by A, denoted L(A), is {x ∈ Σ* | (⋃_{q ∈ S₀} δ̂(q, x)) ∩ F ≠ ∅}.

Occasionally, it will be more convenient to express L(A) in the following fashion: L(A) = {x ∈ Σ* | ∃t ∈ S₀ ∋ δ̂(t, x) ∩ F ≠ ∅}. The concept of equivalent automata is unchanged: two machines are equivalent iff they accept the same language. Thus, if one or both of the machines happen to be nondeterministic, the definition still applies. For example, the NDFAs given in Figures 4.3 and 4.4 are equivalent. The language recognized by a nondeterministic finite automaton is the set of all words where at least one of the paths through the machine labeled with components of that word ends in a final state.
In other words, the set of terminal states at the ends of the paths labeled by components of a word w must have a state in common with the set of final states in order for w to belong to L(A). As a first example, refer to the NDFA defined in Example 4.4. As illustrated in Example 4.6, this machine accepts strings that begin with alternating as and bs and end with one or more consecutive bs.

EXAMPLE 4.7

For a more concrete example, consider the problem of a ship attempting to transmit data to shore at random intervals. The receiver must continually listen, usually to noise, and recognize when an actual transmission starts so that it can record the data that follow. Let us assume that the start of a transmission is signaled by the string 010010 (in practice, such a signal string should be much longer to minimize the possibility of random noise triggering the recording mechanism). In essence, we wish to build an NDFA that will monitor a bit stream and move to a final state when the substring 010010 is detected (note that nonfinal states correspond to having the recording mechanism off, and final states signify that the current data should be recorded). The reader is encouraged to discover firsthand how hard it is to build a DFA that correctly implements this machine and to contrast that solution with the NDFA T given in Figure 4.6.

Figure 4.6 An NDFA for pattern recognition

Since the transitions leading to higher states are labeled by the symbols in 010010, it is clear that the last state cannot be reached unless the sequence 010010 is actually scanned at some point during the processing of the input string. Thus, the NDFA clearly accepts no word that should be rejected. Conversely, since all possible legal paths are explored by an NDFA, valid strings will find a way to the final state.
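A state-set simulation of the signal detector is only a few lines. The state names s0 through s6 below are hypothetical labels for Figure 4.6, and we assume the final state loops on both symbols so that the recorder stays on once the signal has been seen; the chain transitions follow the symbols of 010010 exactly as the text describes.

```python
PATTERN = "010010"

# s0 loops on 0 and 1 and nondeterministically "guesses" when the signal
# starts; each s_i moves to s_{i+1} on PATTERN[i]; the (assumed) final
# state s6 loops on both symbols.
delta = {("s0", b): {"s0"} for b in "01"}
for i, b in enumerate(PATTERN):
    delta.setdefault((f"s{i}", b), set()).add(f"s{i + 1}")
for b in "01":
    delta[(f"s{len(PATTERN)}", b)] = {f"s{len(PATTERN)}"}

def detects(bits):
    """Keep the set of active states, the second viewpoint from the text."""
    active = {"s0"}
    for b in bits:
        active = set().union(*(delta.get((q, b), set()) for q in active))
    return f"s{len(PATTERN)}" in active

print(detects("110100101"))  # True: 010010 occurs as a substring
print(detects("010100"))     # False
```

Because the machine behaves exactly like a substring search for 010010, the simulation can be cross-checked against Python's own `in` operator.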
It is sometimes helpful to think of the NDFA as remaining in s₀ while the initial part of the input string is being processed and then "guessing" when it is the right time to move to s₁. It is also possible to model an end-of-transmission signal that turns the recording device off (see the exercises). The device would remain in various final states until a valid end-of-transmission string was scanned, at which point it would return to the (nonfinal) start state.

While the NDFA given in Figure 4.6 is very straightforward, it appears to be hard to simulate this nondeterminism in real time with a deterministic computer. It has not been difficult to keep track of the multiple paths in the simple machines seen so far. However, if each state has multiple transitions for a given symbol, the number of distinct paths a single word can take through an NDFA grows exponentially as the length of the word increases. For example, if each transition allowed a choice of three destination states, a word of length m would have 3^m possible paths from one single start state. An improvement can be made by calculating, as each letter is processed, the set of possible destinations (rather than recording all the paths). Still, in an n-state NDFA, there are potentially 2^n such combinations of states. This represents an improvement over the path set, since now the number of state combinations is independent of the length of the particular word being processed; it depends only on the number of states in the NDFA, which is fixed. We will see that keeping track of the set of possible destination states is indeed the best way to handle an NDFA in a deterministic manner. Since we have seen in Chapter 1 that it is easy to implement a DFA, we now explore methods to convert an NDFA to an equivalent DFA.

Suppose that we are given a nondeterministic finite automaton A and that we want to construct a corresponding deterministic finite automaton A^d.
Using the concepts in Definitions 4.1 through 4.4, we can proceed in the following fashion. Our general strategy will be to keep track of all the states that can be reached by some string in the nondeterministic finite automaton. Since we can arbitrarily label the states of an automaton, we let the state space of A^d be the power set of S. Thus, S^d = p(S), and each state in the new machine will be labeled by some subset of S. Furthermore, let the start state of A^d, denoted s₀^d, be labeled by the member of p(S) containing those states that are initial states in A; that is, s₀^d = S₀. Since our general strategy is to "remember" all the states that can be reached for some string, we can define the δ^d function in the following natural manner: for every letter in Σ, let the new state transition function, δ^d, map to the subset of p(S) labeled by the union of all those states that are reached from some state contained in the current state name (according to the old state transition function δ). According to Definition 4.4, for a word to be contained in the language accepted by some nondeterministic finite automaton, at least one of the terminal states was required to be contained in the set of final states. Thus, let the set of final states in the corresponding deterministic finite automaton be labeled by the subsets of S that contain at least one of the accepting states in the nondeterministic counterpart. The formal definition of our corresponding deterministic finite automaton is given in Definition 4.5.

Definition 4.5. Given an NDFA A = <Σ, S, S₀, δ, F>, the corresponding deterministic finite automaton, A^d = <Σ, S^d, s₀^d, δ^d, F^d>, is defined as follows:
S^d = p(S)
s₀^d = S₀
F^d = {Q ∈ S^d | Q ∩ F ≠ ∅}

and δ^d is the state transition function, δ^d: S^d × Σ → S^d, defined by

(∀Q ∈ S^d)(∀a ∈ Σ) δ^d(Q, a) = ⋃_{q ∈ Q} δ(q, a)

δ^d extends to the function δ̂^d: S^d × Σ* → S^d as suggested by Theorem 1.1:

(∀Q ∈ S^d) δ̂^d(Q, λ) = Q
(∀Q ∈ S^d)(∀a ∈ Σ)(∀x ∈ Σ*) δ̂^d(Q, xa) = δ^d(δ̂^d(Q, x), a)

Definition 4.5 describes a deterministic finite automaton that observes the same restrictions as all other deterministic finite automata (a single start state, a finite state set, a well-defined transition function, and so on). The only peculiarity is the labeling of the states. Note that the definition implies that the state labeled by the empty set is never a final state and that all transitions from this state lead back to itself. This is the dead state, which is reached by strings that are always prematurely terminated in the corresponding nondeterministic machine.

EXAMPLE 4.8

Consider the NDFA B given in Figure 4.7. As specified by Definition 4.5, the corresponding DFA B^d would look like the machine shown in Figure 4.8. Note that all the states happen to be accessible in this particular example.

Figure 4.7 The NDFA B discussed in Example 4.8
Figure 4.8 The deterministic equivalent of the NDFA given in Example 4.8

Since the construction of the corresponding deterministic machine involves p(S), it should be obvious to the reader that the size of this deterministic finite automaton can grow exponentially larger as the number of states in the associated nondeterministic finite automaton increases. In general, however, there are often many inaccessible states. Thus, only the states that are found to be reachable during the construction process need to be included. The reader is encouraged to exploit this fact when constructing corresponding deterministic finite automata. The language accepted by the DFA A^d follows the definition given in Chapter 1.
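The construction of Definition 4.5, restricted to the reachable subsets as the text recommends, can be sketched as follows. Since Figure 4.7 is not reproducible here, the machine of Example 4.4 stands in for the NDFA B of Example 4.8.

```python
from collections import deque

def nfa_to_dfa(sigma, delta, starts, finals):
    """Subset construction of Definition 4.5, generating only the
    members of p(S) that are actually reachable from S0."""
    start = frozenset(starts)
    dfa_delta, states = {}, {start}
    queue = deque([start])
    while queue:
        Q = queue.popleft()
        for a in sigma:
            # delta^d(Q, a) = union of delta(q, a) over q in Q
            R = frozenset(set().union(*(delta.get((q, a), set()) for q in Q)))
            dfa_delta[(Q, a)] = R
            if R not in states:
                states.add(R)
                queue.append(R)
    dfa_finals = {Q for Q in states if Q & set(finals)}
    return start, dfa_delta, dfa_finals

# The NDFA of Example 4.4 as input
delta = {("r", "a"): {"s"}, ("s", "b"): {"r", "t"}, ("t", "b"): {"t"}}
start, dd, dfinal = nfa_to_dfa("ab", delta, {"r", "s"}, {"t"})

def run(word):  # deterministic processing, as in Chapter 1
    Q = start
    for a in word:
        Q = dd[(Q, a)]
    return Q in dfinal

print(run("bab"), run("ba"))  # True False
```

The resulting machine includes the empty-set dead state described above, and only the reachable subsets of {r, s, t} appear rather than all eight members of p(S).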
To show that the deterministic finite automaton that we have just defined accepts the same language as the corresponding nondeterministic finite automaton, we must first show that the δ̂^d function behaves in the same manner for strings as the δ^d function does for single letters. For any state Q ∈ S^d, the δ^d function maps this state and an input letter a ∈ Σ according to the mapping of the δ function for each q ∈ Q and the letter a. The following lemma establishes that δ̂^d performs the corresponding mapping for strings.

Lemma 4.1. Let A = <Σ, S, S₀, δ, F> be a nondeterministic finite automaton, and let A^d = <Σ, S^d, s₀^d, δ^d, F^d> represent the corresponding deterministic finite automaton. Then

(∀Q ∈ S^d)(∀x ∈ Σ*)(δ̂^d(Q, x) = ⋃_{q ∈ Q} δ̂(q, x))

Proof. By induction on |x|: Let P(k) be defined by

P(k): (∀Q ∈ S^d)(∀x ∈ Σ^k)(δ̂^d(Q, x) = ⋃_{q ∈ Q} δ̂(q, x))

Basis step: |x| = 0 ⇒ x = λ, and therefore

δ̂^d(Q, λ) = Q = ⋃_{q ∈ Q} {q} = ⋃_{q ∈ Q} δ̂(q, λ)

Inductive step: Suppose that the result holds for all x ∋ |x| = k; that is, P(k) is true. Let y ∈ Σ^(k+1). Then ∃x ∈ Σ^k and ∃a ∈ Σ ∋ y = xa. Then

δ̂^d(Q, y) = (by definition of y)
δ̂^d(Q, xa) = (by Theorem 1.1)
δ^d(δ̂^d(Q, x), a) = (by the induction hypothesis)
δ^d(⋃_{q ∈ Q} δ̂(q, x), a) = (since (∀A, B ∈ p(S))(∀a ∈ Σ)(δ^d(A ∪ B, a) = δ^d(A, a) ∪ δ^d(B, a)))
⋃_{q ∈ Q} δ^d(δ̂(q, x), a) = (by Definition 4.5)
⋃_{q ∈ Q} (⋃_{p ∈ δ̂(q, x)} δ(p, a)) = (by Definition 4.2)
⋃_{q ∈ Q} δ̂(q, xa) = (by definition of y)
⋃_{q ∈ Q} δ̂(q, y)

Therefore, P(k) ⇒ P(k + 1) for all k ≥ 0, and thus by the principle of mathematical induction we can say that the result holds for all x ∈ Σ*.

Having established Lemma 4.1, proving that the language accepted by a nondeterministic finite automaton and the corresponding deterministic machine are the same language becomes a straightforward task. The equivalence of A and A^d is given in the following theorem.

Theorem 4.1. Let A = <Σ,
S, S0, δ, F> be a nondeterministic finite automaton, and let A^d = <Σ, S^d, s0^d, δ^d, F^d> represent its corresponding deterministic finite automaton. Then A and A^d are equivalent; that is, L(A) = L(A^d).

Proof. Let x ∈ Σ*. Then

x ∈ L(A)
⟺ (∪_{s∈S0} δ̄(s, x)) ∩ F ≠ ∅  (by Definition 4.4)
⟺ (∪_{s∈S0} δ̄(s, x)) ∈ F^d  (by Definition 4.5)
⟺ δ̄^d(S0, x) ∈ F^d  (by Lemma 4.1)
⟺ δ̄^d(s0^d, x) ∈ F^d  (by Definition 4.5)
⟺ x ∈ L(A^d)  (by Definition 1.15)  Δ

Now that we have established that nondeterministic finite automata and deterministic finite automata are equal in computing power, the reader might wonder why we bother with nondeterministic finite automata. Even though nondeterministic finite automata cannot recognize any language that cannot be defined by a DFA, they are very useful both in theory and in machine construction (as illustrated by Example 4.7). The following examples further illustrate that NDFAs often yield more natural (and less complex) solutions to a given problem.

EXAMPLE 4.9

Recall the machine from Chapter 1 that accepted a subset of the real constants in scientific notation according to the following BNF:

<sign> ::= + | -
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<natural> ::= <digit> | <digit><natural>
<integer> ::= <natural> | <sign><natural>
<real constant> ::= <integer> | <integer>.<natural> | <integer>.<natural>E<integer>

By using nondeterministic finite automata, it is easy to construct a machine that will recognize this language (compare with the deterministic version given in Example 1.11). One such NDFA is shown in Figure 4.9.

Figure 4.9 The NDFA discussed in Example 4.9

EXAMPLE 4.10

Let L = {x ∈ {a, b}* | x begins with a ∨ x contains ba as a substring}. We can easily build a machine that will accept this language, as illustrated in Figure 4.10. Now suppose we wanted to construct a machine that would accept the reverse of this language, that is, to accept L' = {x ∈ {a, b}* | x ends with a ∨ x contains ab as a substring}.
The machine that will accept this language can be built from the original nondeterministic finite automaton by simply exchanging the initial states and the final states and by reversing the arrows of the δ function. The automaton (definitely an NDFA in this case!) arising in this fashion is shown in Figure 4.11.

Figure 4.10 An NDFA accepting the language given in Example 4.10
Figure 4.11 An NDFA representing the reverse of the language represented in Figure 4.10

It can be shown that the technique employed in Example 4.10, when applied to any automaton, will yield a new NDFA that is guaranteed to accept the reverse of the original language. The material in Chapter 5 will reveal many instances where the ability to define multiple start states and multiple transitions will be of great value.

EXAMPLE 4.11

Assume we wish to identify all words that contain at least one of the three strings 10110, 1010, or 01101 as substrings. Consequently, we let L be the set of all words that are made up of some characters, followed by one of our three target strings, followed by some other characters. That is,

L = {w ∈ {0, 1}* | w = xyz, x ∈ {0, 1}*, y ∈ {10110, 1010, 01101}, z ∈ {0, 1}*}

We can construct a nondeterministic finite automaton that will accept this language as follows. First construct three machines, each of which will accept one of the candidates for y. Next, prepend a single state (s0 in Figure 4.12) that loops on every letter of Σ; make this state an initial state and draw arrows from it that mimic the transitions from each of the other three initial states (as shown in Figure 4.12). Finally, append a single final state that likewise loops on every letter of Σ, and draw arrows to it from each of the final states of the three machines. The machine that accepts this language is given in Figure 4.12.

Figure 4.12 An NDFA for recognizing any of several substrings
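The reversal technique of Example 4.10 (exchange the start and final state sets and reverse every arrow) can be sketched directly. This is a minimal illustration; the dictionary encoding of δ and the sample machine, which accepts words ending in b, are assumptions for the example rather than the book's Figure 4.10 automaton.

```python
def accepts(start, final, delta, w):
    """Simulate an NDFA with multiple start states on the word w."""
    active = set(start)
    for a in w:
        active = set().union(*(delta.get((q, a), set()) for q in active))
    return bool(active & set(final))

def reverse_nfa(start, final, delta):
    """Swap start and final states and reverse every arrow of delta.
    The resulting NDFA accepts the reverse of the original language."""
    rdelta = {}
    for (p, a), targets in delta.items():
        for q in targets:
            rdelta.setdefault((q, a), set()).add(p)
    return set(final), set(start), rdelta

# Illustrative machine over {a, b}: accepts every word that ends in b.
delta = {("s0", "a"): {"s0"}, ("s0", "b"): {"s0", "s1"}}
rstart, rfinal, rdelta = reverse_nfa({"s0"}, {"s1"}, delta)
print(accepts({"s0"}, {"s1"}, delta, "aab"))   # True
print(accepts(rstart, rfinal, rdelta, "baa"))  # True  (reverse of "aab")
```

Note that the reversed machine here has a start state with no outgoing transition on a, which is exactly the kind of "definitely an NDFA" behavior the example points out.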
The reader is encouraged to try to construct a deterministic version of this machine in order to appreciate the simplicity of the above solution.

EXAMPLE 4.12

Recall the application in Chapter 1 involving string searching (Example 1.15). The construction of DFAs involved much thought, but there is an NDFA that solves the problem in an obvious and straightforward manner. For example, an automaton that recognizes all strings over the alphabet {a, b} containing the substring aab might look like the NDFA in Figure 4.13.

Figure 4.13 An automaton recognizing the substring aab

As is the case for this NDFA, it may be impossible for certain sets of states to all be active at once. These combinations can never be achieved during the normal operation of the NDFA, and the DFA states corresponding to them will not lie in the connected part of the deterministic equivalent. Applying Definition 4.5 to find the entire deterministic version and then pruning it down to just the relevant portion is very inefficient. A better solution is to begin at the start state and "follow transitions" to new states until no further new states are uncovered. At this point, the relevant states and their transitions will all have been defined; the remainder of the machine can be safely ignored. For the NDFA in Figure 4.13, the connected portion of the equivalent DFA is shown in Figure 4.14. This automaton is still not reduced; the last three states are all equivalent and can be coalesced to form the minimal machine given in Figure 4.15.

Figure 4.14 The connected portion of the DFA equivalent to the NDFA given in Example 4.12
Figure 4.15 A reduced equivalent of the DFA given in Figure 4.14

The above process can be easily automated; an interesting but frustrating exercise might involve producing an appropriate set of rules for generating, given a specific string y, a DFA that will recognize all strings containing the substring y.
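The "follow transitions until no new states are uncovered" procedure described above can be sketched as a breadth-first search over subset-states. A minimal sketch; the four-state substring machine below is an illustrative assumption in the flavor of Figure 4.13, not a copy of it.

```python
from collections import deque

def connected_subset_dfa(start_states, alphabet, delta):
    """Build only the connected portion of the equivalent DFA by
    exploring subset-states reachable from the start subset."""
    start = frozenset(start_states)
    trans, seen, work = {}, {start}, deque([start])
    while work:
        Q = work.popleft()
        for a in alphabet:
            # Definition 4.5 applied to the subset-state Q.
            R = frozenset().union(*(delta.get((q, a), set()) for q in Q))
            trans[(Q, a)] = R
            if R not in seen:
                seen.add(R)
                work.append(R)
    return start, seen, trans

# Illustrative NDFA: recognizes strings over {a, b} containing "aab".
delta = {("q0", "a"): {"q0", "q1"}, ("q0", "b"): {"q0"},
         ("q1", "a"): {"q2"}, ("q2", "b"): {"q3"},
         ("q3", "a"): {"q3"}, ("q3", "b"): {"q3"}}
start, states, trans = connected_subset_dfa({"q0"}, "ab", delta)
print(len(states))   # 6 subset-states, far fewer than 2**4 = 16
```

Only the reachable subset-states are ever generated, which is exactly the savings the text recommends exploiting.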
Definition 4.5 can be used to generate the appropriate DFA from the obvious NDFA without subjecting the designer to such frustrations!

4.2 CIRCUIT IMPLEMENTATION OF NDFAs

As mentioned earlier, the presence of multiple paths within an NDFA for a single word characterizes the nondeterministic nature of these automata. The most profitable way to view the operation of an NDFA is to consider the automaton as having (potentially) several active states, with each of the active states reacting to the next letter to determine a new set of active states. In fact, by using one D flip-flop per state, this viewpoint can be directly translated into hardware. When a given state is active, the corresponding flip-flop will be on, and when it is inactive (that is, it cannot be reached by the substring that has been processed at this point), it will be off. As a new letter is processed, a state will be activated (that is, placed in the new set of active states) if it can be reached from one of the previously active states. Thus, the state transition function will again determine the circuitry that feeds into each flip-flop.

Following the same conventions given for DFAs, the input tape will be assumed to be bounded by special start-of-string <SOS> and end-of-string <EOS> symbols. The <EOS> character is again used to activate the accept circuitry so that acceptance is not indicated until all letters on the tape have been processed. As before, the <SOS> symbol can be employed at the beginning of the string to ensure that the circuitry begins processing the string from the appropriate start state(s). Alternatively, SR (set-reset) flip-flops can be used to initialize the configuration without relying on the <SOS> convention.

EXAMPLE 4.13

Consider the NDFA D given in Figure 4.16. With the <SOS> and <EOS> transitions illustrated, the complete model would appear as in Figure 4.17.
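The one-flip-flop-per-state viewpoint can also be simulated in software: keep one bit per state, and turn a bit on exactly when its state is reachable from some currently active state on the scanned letter. A minimal sketch; the two-state machine and its transitions are illustrative assumptions rather than the NDFA D of Figure 4.16, and the <SOS>/<EOS> handling is omitted.

```python
# One bit per state: the active-state set of an NDFA as a bit vector,
# updated the way the flip-flop circuitry described above would update it.
STATES = ["s1", "s2"]
delta = {("s1", "a"): {"s1", "s2"},
         ("s2", "a"): {"s1"},
         ("s2", "b"): {"s1"}}

def next_bits(bits, a):
    """A state's bit turns on iff it is reachable on input a from some
    currently active state."""
    active = {s for s, b in zip(STATES, bits) if b}
    reach = set().union(*(delta.get((s, a), set()) for s in active))
    return tuple(1 if s in reach else 0 for s in STATES)

print(next_bits((1, 0), "a"))   # (1, 1)
print(next_bits((0, 0), "a"))   # (0, 0)  -- a stuck configuration stays stuck
```

The stuck configuration (all bits 0) persisting under ordinary input letters is the software analogue of the first rows of a table like Table 4.1 below.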
Figure 4.16 The NDFA discussed in Example 4.13
Figure 4.17 The expanded state transition diagram for the NDFA in Figure 4.16

Two bits of input data (a1 and a2) are required to represent the symbols <EOS>, a, b, and <SOS>. The standard encodings described in Chapter 1 would produce <EOS> = 00, a = 01, b = 10, and <SOS> = 11. If the flip-flop t1 is used to represent the activity of s1, and t2 is used to record the status of s2, then the subsequent activity of the two flip-flops can be determined from the current state activity and the current letter being scanned, as shown in Table 4.1.

TABLE 4.1

t1  t2  a1  a2  t1' t2' accept
0   0   0   0   0   0   0
0   0   0   1   0   0   0
0   0   1   0   0   0   0
0   0   1   1   1   1   0
0   1   0   0   0   1   1
0   1   0   1   1   0   0
0   1   1   0   1   0   0
0   1   1   1   1   1   0
1   0   0   0   1   0   0
1   0   0   1   1   1   0
1   0   1   0   0   0   0
1   0   1   1   1   1   0
1   1   0   0   1   1   1
1   1   0   1   1   1   0
1   1   1   0   1   0   0
1   1   1   1   1   1   0

The first four rows of Table 4.1 reflect the situation in which a string is hopelessly stuck and no states are active. Processing subsequent symbols from Σ will not change this; both t1' and t2' remain 0. The one exception is when the <SOS> symbol is scanned; in this case, each of the start states is activated (t1' = 1 and t2' = 1). This corrects the situation in which both flip-flops happen to initialize to 0 when power is first applied to the circuitry. Scanning the <SOS> symbol changes the state of the flip-flops to reflect the appropriate starting conditions (in this machine, both states are start states, and therefore both should be active as processing begins). Note that each of the rows of Table 4.1 that corresponds to scanning <SOS> shows that t1 and t2 are reset in the same fashion.

Determining the circuit behavior for the symbols in Σ closely parallels the definition of δ^d in Definition 4.5. For example, when state s1 is active but s2 is
inactive (t1 = 1 and t2 = 0) and a is scanned (a1 = 0 and a2 = 1), transitions from s1 cause both states to next be active (t1' = 1 and t2' = 1). The other combinations are calculated similarly. Minimized expressions for the new values of each of the flip-flops and the accept circuitry are

t1' = (t1 ∧ ¬a1) ∨ (t2 ∧ ¬a1 ∧ a2) ∨ (t2 ∧ a1 ∧ ¬a2) ∨ (a1 ∧ a2)
t2' = (t2 ∧ ¬a1 ∧ ¬a2) ∨ (t1 ∧ a2) ∨ (a1 ∧ a2)
accept = t2 ∧ ¬a1 ∧ ¬a2

Since similar terms appear in these expressions, these three subcircuits can "share" the common components, as shown in Figure 4.18.

Figure 4.18 Circuitry for the automaton discussed in Example 4.13

Note that the accept circuitry reflects that a string should be recognized when some final state is active (s2 in this example) and <EOS> is scanned. In more complex machines with several final states, lines leading from each of the flip-flops corresponding to final states would be joined by an OR gate before being ANDed with the <EOS> condition.

An interesting exercise involves converting the NDFA D given in Example 4.13 to the equivalent DFA D^d, which will have four states: ∅, {s1}, {s2}, and {s1, s2}. The deterministic automaton D^d can be realized by a circuit diagram as specified in Chapter 1. This four-state DFA will require 2 bits of state data. If the state encoding conventions ∅ = 00, {s1} = 10, {s2} = 01, and {s1, s2} = 11 are used, the circuitry for the DFA D^d will be identical to that for the NDFA D.

For DFAs, m bits of state data (t1, t2, ..., tm) can encode up to 2^m distinct states. With NDFAs, an n-state machine requires a full n bits of state data (1 bit per state). This apparently "extravagant" use of state data is offset by the fact that an n-state NDFA may require 2^n states to form an equivalent DFA.
This was the case in the preceding example, in which n and m were both equal to 2; the two-state NDFA D required two flip-flops, and the equivalent four-state DFA also required two flip-flops; the savings induced by the DFA state encoding was exactly offset by the multiplicity of states needed by the NDFA. A DFA may turn out to need less hardware than an equivalent NDFA, as illustrated by Example 4.12. The four-state NDFA C needs four flip-flops, and the (nonminimal, 16-state) DFA C^d would also need four. However, the minimal equivalent DFA derived in Example 4.12 has only four states and therefore can be encoded with just 2 bits of state data. Hence only two flip-flops are necessary to implement a recognizer for L(C).

4.3 NDFAs WITH LAMBDA-TRANSITIONS

We now extend our computational model to include nondeterministic finite automata that allow transitions between states to occur "spontaneously," without any input being processed. Transitions that occur without an input symbol being processed are called λ-transitions or lambda-moves. In texts that denote the empty string by the symbol ε, such transitions are usually referred to as epsilon-moves.

Definition 4.6. A nondeterministic finite automaton with λ-transitions is a quintuple A_λ = <Σ, S, S0, δ_λ, F>, where

i. Σ is an alphabet.
ii. S is a finite nonempty set of states.
iii. S0 is a set of initial states, a nonempty subset of S.
iv. δ_λ: (S × (Σ ∪ {λ})) → P(S) is the state transition function.
v. F is the set of accepting states, a (possibly empty) subset of S. Δ

A nondeterministic finite automaton with λ-transitions is very similar in structure to an NDFA that does not have λ-transitions. The only different aspect is the definition of the δ function. Instead of mapping state/letter pairs [from S × Σ] to P(S), it maps pairs consisting of a state and either a letter or the empty string [from S × (Σ ∪ {λ})] to P(S). From any state that has a λ-transition, we adopt the convention that the machine is capable of making a spontaneous transition to the new state specified by that λ-transition without processing an input symbol. However, the machine may also "choose" not to follow this path and instead remain in the original state.

Before we can extend the δ_λ function to operate on strings from Σ*, we need the very useful concept of lambda-closure.

Definition 4.7. Given a nondeterministic finite automaton A_λ = <Σ, S, S0, δ_λ, F> with λ-transitions, the λ-closure of a state t ∈ S, denoted λ(t), is the set of all states that are reachable from t without processing any input symbols. The λ-closure of a set of states T is then λ(T) = ∪_{t∈T} λ(t). Δ

The λ-closure of a state is the set of all the states that can be reached from that state, including itself, by following λ-transitions only. Obviously, one can always reach the state currently occupied without having to move. Consequently, even if there are no explicit arcs labeled by λ going back to state t, t is always in the λ-closure of itself.

EXAMPLE 4.14

Consider the machine given in Figure 4.19, which contains λ-transitions from s0 to s1 and from s1 to s2. By Definition 4.7,

λ(s0) = {s0, s1, s2}
λ(s1) = {s1, s2}
λ(s2) = {s2}
λ(s3) = {s3}

Figure 4.19 An NDFA with lambda-moves

Definition 4.8. Given a nondeterministic finite automaton with λ-transitions, the extended state transition function for A_λ is a function δ̄_λ: S × Σ* → P(S) defined as follows:

i. (∀s ∈ S) δ̄_λ(s, λ) = λ(s)
ii. (∀s ∈ S)(∀a ∈ Σ) δ̄_λ(s, a) = λ(∪_{q∈λ(s)} δ_λ(q, a))
iii. (∀s ∈ S)(∀x ∈ Σ*)(∀a ∈ Σ) δ̄_λ(s, xa) = λ(∪_{q∈δ̄_λ(s,x)} δ_λ(q, a)) Δ

The δ_λ function is not extended in the same way as for the nondeterministic finite automata given in Definition 4.2. Most importantly, due to the effects of the λ-closure, δ̄_λ(s, a) ≠ δ_λ(s, a) in general. Thus, not only does the δ̄_λ
function map to a set of states based on a single letter, but it also includes the λ-closure of those states. This may seem strange for single letters (strings of length 1), but it is required for consistency when the δ̄_λ function is presented with strings of length greater than 1, since at each state along the path there can be λ-transitions. Each λ-transition maps to a new state (which may have λ-transitions of its own) that must be included in this path and processed by the δ̄_λ function.

The nondeterministic finite automaton without λ-transitions that corresponds to a nondeterministic finite automaton with λ-transitions is given in Definition 4.9.

Definition 4.9. Given a nondeterministic finite automaton with λ-transitions, A_λ = <Σ, S, S0, δ_λ, F>, the corresponding nondeterministic finite automaton without λ-transitions, A'_λ = <Σ, S ∪ {q0}, S0 ∪ {q0}, δ'_λ, F'>, is defined as follows:

F' = F, if λ ∉ L(A_λ)
F' = F ∪ {q0}, if λ ∈ L(A_λ)
(∀a ∈ Σ) δ'_λ(q0, a) = ∅
(∀s ∈ S)(∀a ∈ Σ) δ'_λ(s, a) = δ̄_λ(s, a) = λ(∪_{q∈λ(s)} δ_λ(q, a))

δ'_λ is then extended in the "usual" way for nondeterministic finite automata to the function δ̄'_λ: (S ∪ {q0}) × Σ* → P(S ∪ {q0}). Δ

Note that from a state in A_λ several λ-transitions may be taken, then a single letter a may be processed, and then several more λ-moves may occur; all this activity can result from just a single symbol on the input tape being processed. The definition of δ'_λ reflects these types of transitions. The δ'_λ function is defined to be the same as the δ̄_λ function for all single letters (strings of length 1), which adjusts for the λ-closure in A_λ. The δ'_λ function can then be extended in the usual nondeterministic manner.

To account for the case that λ might be in the language accepted by the automaton A_λ, we add an extra start state q0 to the corresponding machine A'_λ, which is disconnected from the rest of the machine. If λ ∈ L(A_λ), we also make q0 a final state.

EXAMPLE 4.15

Let A_λ
represent the NDFA given in Example 4.14. A'_λ would then be given by the NDFA shown in Figure 4.20. This new NDFA does indeed accept the same language as A_λ. To show in general that L(A_λ) = L(A'_λ), we must first show that the respective extended state transition functions behave in similar fashions. However, these two functions can be equivalent only for strings of nonzero length (because of the effects of the λ-closure in the definition of δ̄_λ). This result is established in Lemma 4.2.

Figure 4.20 An NDFA without lambda-moves that is equivalent to the automaton in Figure 4.19

Lemma 4.2. Given a nondeterministic finite automaton A_λ with λ-transitions and the corresponding nondeterministic finite automaton A'_λ without λ-transitions, then

(∀s ∈ S)(∀x ∈ Σ⁺)(δ̄'_λ(s, x) = δ̄_λ(s, x))

Proof. The proof is by induction on |x|; see the exercises.

Once we have shown that the extended state transition functions behave (almost) identically, we can proceed to show that the languages accepted by these two machines are the same.

Theorem 4.2. Given a nondeterministic finite automaton that contains λ-transitions, there exists an equivalent nondeterministic finite automaton that does not have λ-transitions.

Proof. Assume A_λ = <Σ, S, S0, δ_λ, F> is an NDFA with λ-transitions. Construct the corresponding NDFA A'_λ = <Σ, S ∪ {q0}, S0 ∪ {q0}, δ'_λ, F'>, which has no λ-transitions. We will show L(A_λ) = L(A'_λ), and thereby prove that the two machines are equivalent. Because the way A'_λ was constructed limits the scope of Lemma 4.2, the proof is divided into two cases.

Case 1: If x = λ, then by Definition 4.9 (q0 ∈ F' iff λ ∈ L(A_λ)), and so (λ ∈ L(A'_λ) ⟺ λ ∈ L(A_λ)).

Case 2: Assume x ≠ λ. Since there are no transitions leaving q0, it may be disregarded as one of the start states of A'_λ. Then

x ∈ L(A'_λ)
⇒ (∪_{s0∈S0} δ̄'_λ(s0, x)) ∩ F' ≠ ∅  (by definition of L)
⇒ (∪_{s0∈S0} δ̄_λ(s0, x)) ∩ F' ≠ ∅  (by Lemma 4.2)
⇒ (∪_{s0∈S0} δ̄_λ(s0, x)) ∩ F ≠ ∅
(since if q0 were the common element, then x would have to be λ, which violates the assumption)
⇒ x ∈ L(A_λ)  (by definition of L)

Conversely, and for many of the same reasons, we have

x ∈ L(A_λ)
⇒ (∪_{s0∈S0} δ̄_λ(s0, x)) ∩ F ≠ ∅  (by definition of L)
⇒ (∪_{s0∈S0} δ̄'_λ(s0, x)) ∩ F ≠ ∅  (by Lemma 4.2)
⇒ (∪_{s0∈S0} δ̄'_λ(s0, x)) ∩ F' ≠ ∅  (since F ⊆ F')
⇒ x ∈ L(A'_λ)  (by definition of L)

Consequently, (∀x ∈ Σ*)(x ∈ L(A'_λ) ⟺ x ∈ L(A_λ)). Δ

Although nondeterministic finite automata with λ-transitions are no more powerful than nondeterministic finite automata without λ-transitions, and consequently recognize the same class of languages as deterministic finite automata, they have their place in theory and machine construction. Because such machines can be constructed very easily from regular expressions (see Chapter 6), NDFAs are used by the UNIX™ text editor and by lexical analyzer generators such as LEX for pattern-matching applications. Example 4.16 involves the regular expression (a ∪ c)*bc(a ∪ c)*, which describes the set of words composed of any number of as and cs, followed by a single b, followed by a single c, followed by any number of as and cs.
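The two constructions used in this section, the λ-closure of Definition 4.7 and the λ-elimination of Definition 4.9, can be sketched together. Representing λ by Python's None and naming the fresh extra start state "q0" are encoding assumptions made for this illustration.

```python
def lambda_closure(T, ldelta):
    """Definition 4.7: all states reachable from the set T by following
    λ-transitions only (each state is always in its own closure)."""
    closure, stack = set(T), list(T)
    while stack:
        for q in ldelta.get((stack.pop(), None), set()):  # None stands for λ
            if q not in closure:
                closure.add(q)
                stack.append(q)
    return closure

def remove_lambdas(states, starts, finals, ldelta, alphabet):
    """Definition 4.9: an equivalent NDFA without λ-transitions.
    The extra start state q0 (assumed to be a fresh name) becomes
    final exactly when λ is in the original machine's language."""
    new_delta = {}
    for s in states:
        for a in alphabet:
            mid = set().union(*(ldelta.get((q, a), set())
                                for q in lambda_closure({s}, ldelta)))
            new_delta[(s, a)] = lambda_closure(mid, ldelta)
    new_finals = set(finals)
    if lambda_closure(starts, ldelta) & finals:   # λ is accepted
        new_finals.add("q0")
    return states | {"q0"}, set(starts) | {"q0"}, new_finals, new_delta

# λ-transitions s0 -λ-> s1 -λ-> s2, plus one ordinary a-transition.
ldelta = {("s0", None): {"s1"}, ("s1", None): {"s2"}, ("s2", "a"): {"s3"}}
states, starts, finals = {"s0", "s1", "s2", "s3"}, {"s0"}, {"s3"}
_, new_starts, new_finals, new_delta = remove_lambdas(
    states, starts, finals, ldelta, {"a"})
print(sorted(new_delta[("s0", "a")]))   # ['s3']
print("q0" in new_finals)               # False (λ is not accepted here)
```

The inner computation of `new_delta` mirrors δ'_λ(s, a) = λ(∪ δ_λ(q, a) over q ∈ λ(s)): close under λ, step on the letter, then close under λ again.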
Figure 4.22 The modification of the NDFA in Figure 4.21 The previous section outlined how to implement nondeterministic finite automata without A-transitions; accommodating A-movesis in fact quite straightforward. A A-transition from state s to state t indicates that state t should be considered active whenever state s is active. This can be assured by an obvious modification, as shown by the following example. EXAMPLE 4.17 As an illustration of how circuitry can °bedefined for machines with A-transitions, consider theDFA E given in figure 4.23. This machineis similar to theNDFA 0 in Example 4.13, but a A-transition has been added from StOto S2; that is, 8(S1. A)= {S2}' This transition implies that S2 should be considered active °whenever S1 is active. Consequently, the circuit diagram produced in Example 4.13 need only be slightly modified by establishing the. extra connection indicated by the dotted line shown in Figure 4.24. . , In general, the need for such "extra" connections leaving a given flip-flop input t, is determined by examining 8(s;, A), the set of A-transitions for S;. Note that the propagation delay in this circuit has been increased; there are signals that must 140 Nondeterministic Finite Automata Chap. 4 l\ --11----4'"'-'-4 -la,-i---. ....... .. Fi&ure 4.23 A simple NDFA with lambda moves In t,t---.... t"2 acee t Figure 4.24 Circuitry for the automaton discussedin Example 4.17 now propagate through an extra gate during a single clock cycle. The delay will be exacerbated in automata that contain sequences of A-transitions. In such cases, the length of the clock cycle may need to be increased to ensure proper operation. This problem can be minimized by adding all the connections indicated by A(s,), rather than just adding those implied by 8(sj, A). EXERCISES 4.1. Draw the deterministic versions of each of the nondeterministic finite automata shown in Figure 4.25. In each part, assume I = {a, b, e]. 4.2. Consider the automaton given in Example 4.17. 
(a) Convert this automaton into an NDFA without λ-transitions using Definition 4.9.
(b) Convert this NDFA into a DFA using Definition 4.5.

Figure 4.25 Automata for Exercise 4.1

4.3. Consider the automaton given in Example 4.4.
(a) Using the standard encodings, draw a circuit diagram for this NDFA (include neither <SOS> nor <EOS>).
(b) Using the standard encodings, draw a circuit diagram for this NDFA (include both <SOS> and <EOS>).
(c) Convert the NDFA into a DFA using Definition 4.5 (draw only the connected portion of the machine).
(d) Convert the NDFA into a DFA using Definition 4.5 (draw the entire machine, including the disconnected portion).

4.4. Consider the automaton given in Example 4.2.
(a) Using the standard encodings, draw a circuit diagram for this NDFA (include both <SOS> and <EOS>).
(b) Convert the NDFA into a DFA using Definition 4.5.

4.5. Consider the automaton given in Example 4.3.
(a) Using the standard encodings, draw a circuit diagram for this NDFA (include neither <SOS> nor <EOS>).
(b) Using the standard encodings, draw a circuit diagram for this NDFA (include both <SOS> and <EOS>).
(c) Convert the NDFA into a DFA using Definition 4.5 (draw only the connected portion of the machine).
(d) Is this DFA isomorphic to any of the automata constructed in Exercise 4.4?

4.6. Consider the automaton given in Example 4.14.
(a) Using the standard encodings, draw a circuit diagram for this NDFA (include neither <SOS> nor <EOS>).
(b) Using the standard encodings, draw a circuit diagram for the NDFA in part (b) (include neither <SOS> nor <EOS>).

4.7. Consider the automaton given in the second part of Example 4.16.
(a) Using the standard encodings, draw a circuit diagram for this NDFA (include <EOS> but not <SOS>).
(b) Build the equivalent automaton without λ-transitions using Definition 4.9.
(c) Using the standard encodings, draw a circuit diagram for the NDFA in part (b) (include <EOS> but not <SOS>).
(d) Convert the NDFA into a DFA using Definition 4.5 (draw only the connected portion of the machine).

4.8. It is possible to build a deterministic finite automaton Ā such that the language accepted by this machine is the absolute complement of the language accepted by a machine A [that is, L(Ā) = Σ* − L(A)] by simply complementing the set of final states (see Theorem 5.1). Can a similar thing be done for nondeterministic finite automata? If not, why not? Give an example to support your statements.

4.9. Given a nondeterministic finite automaton A without λ-transitions, show that it is possible to construct a nondeterministic finite automaton with λ-transitions A' with the properties (1) A' has exactly one start state and exactly one final state and (2) L(A') = L(A).

4.10. Consider (ii) in Definition 4.8. Can this fact be deduced from parts (i) and (iii)? Justify your answer.

4.11. If we wanted another way to construct a nondeterministic finite automaton without λ-transitions corresponding to one that does have them, we could try the following: Let S' = S, S0' = λ(S0), F' = F, and δ'(s, a) = δ_λ(s, a) for all a ∈ Σ, s ∈ S. Show that this works (or, if it does not work, explain why not and give an example).

4.12. Using nondeterministic machines with λ-transitions, give an algorithm for constructing a λ-NDFA having one start state and one final state that will accept the union of two FAD languages.

4.13. Give an example of an NDFA A for which:
(a) A^d is not connected.
(b) A^d is not reduced.
(c) A^d is minimal.

4.14. Why was it necessary to include an "extra" state q0 in the construction of A'_λ in Definition 4.9? Support your answer with an example.

4.15. (a) Using nondeterministic machines without λ-transitions, give an algorithm for constructing a machine that will accept the union of two languages.
(b) Is this easier or more difficult than using machines with λ-transitions?
(c) Is it possible to ensure that this machine both (i) has exactly one start state and (ii) has exactly one final state?

4.16. Consider the automaton A'_λ given in Example 4.15.
(a) Using the standard encodings, draw a circuit diagram for A'_λ (include neither <SOS> nor <EOS>).
(b) Convert A'_λ into (A'_λ)^d using Definition 4.5 (draw only the connected portion of the machine).

4.17. (a) Prove that for any NDFA without λ-transitions the definitions of δ and δ̄ agree for single letters; that is, (∀s ∈ S)(∀a ∈ Σ)(δ̄(s, a) = δ(s, a)).
(b) Give an example to show that this need not be true for an NDFA with λ-transitions.

4.18. Consider the NDFA that accepts the original language L in Example 4.10.
(a) Using the standard encodings, draw a circuit diagram for this NDFA (include neither <SOS> nor <EOS>).
(b) Convert the NDFA into a DFA using Definition 4.5 (draw only the connected portion of the machine).

4.19. Consider the NDFA that accepts the modified language L' in Example 4.10.
(a) Using the standard encodings, draw a circuit diagram for this NDFA (include neither <SOS> nor <EOS>).
(b) Convert the NDFA into a DFA using Definition 4.5 (draw only the connected portion of the machine).

4.20. Consider the arguments leading up to the pumping lemma in Chapter 2. Are they still valid when applied to NDFAs?

4.21. Consider Theorem 2.7. Does the conclusion still hold if applied to an NDFA?

4.22. Given a nondeterministic finite automaton A (without λ-transitions) for which λ ∉ L(A), show that it is possible to construct a nondeterministic finite automaton (also without λ-transitions) A″ with the properties:
1. A″ has exactly one start state.
2. A″ has exactly one final state.
3. L(A″) = L(A).

4.23. Give an example to show that if λ ∈ L(A) it may not be possible to construct an NDFA without λ-transitions satisfying all three properties listed in Exercise 4.22.

4.24. Prove Lemma 4.2.

4.25.
Given a DFA A, show that it can be thought of as an NDFA A_n and that, furthermore, L(A_n) = L(A). Hint: Carefully define your "new" machine A_n, justify that it is indeed an NDFA, make the appropriate inductive statement, and argue that L(A_n) = L(A).

4.26. Give an example to show that the domain of Lemma 4.2 cannot be expanded to include λ; that is, show that δ̄'_λ(s, λ) ≠ δ̄_λ(s, λ).

4.27. Refer to Definition 4.5 and prove the fact used in Lemma 4.1:

(∀A ∈ P(S))(∀B ∈ P(S))(∀a ∈ Σ)(δ^d(A ∪ B, a) = δ^d(A, a) ∪ δ^d(B, a))

4.28. Recall that if a word can reach several states in an NDFA, some of which are final and some nonfinal, Definition 4.4 requires us to accept that word.
(a) Change the definition of L(A) so that a word is accepted only if every state the word can reach is final.
(b) Change the definition of A^d to produce a deterministic machine that accepts only those words specified in part (a).

4.29. Draw the connected part of T^d, the deterministic equivalent of the NDFA T in Example 4.7.

4.30. Refer to Example 4.7 and modify the NDFA T so that the machine reverts to a nonfinal state (that is, turns the recorder off) when the substring 000111 is detected. Note that 000111 functions as the EOT (end of transmission) signal.

4.31. Consider the automaton A given in Example 4.14.
(a) Draw a diagram of A'_λ.
(c) Draw (A'_λ)^d (draw only the connected portion of the machine).

4.32.
Consider the automaton given in Example 4.7. (a) Using the standard encodings, draw a circuit diagram for this NDFA (include neither <SOS> nor <EOS». (b) Convert the NDFA into a DFA using Definition 4.5 (draw only the connected portion of the machine). 4.34. Consider the automaton given in Example 4.11. (a) Using the standard encodings, draw a circuit diagram for this NDFA (include neither <SOS> nor <EOS». (b) Convert the NDFA into a DFA using Definition 4.5 (draw only the connected portion of the machine). 4.35. Consider the automaton 8 given in Example 4.8. Chap. 4 Exercises 145 (a) Using the standard encodings, draw a circuit diagram for B (include neither <SOS> nor <EOS». (b) Using the standard encodings, draw a circuit diagram for Bd (include neither <SOS> nor <EOS». Encode the states in such a way that your circuit is similar to the one found in part (a). 4.36. Draw a circuit diagram for each NDFA given in Exercise 4.1 (include neither <SOS> nor <EOS». Use the standard encodings. 4.37. Draw a circuit diagram for each NDFA given in Exercise 4.1 (include both <SOS> and <EOS». Use the standard encodings. 4.38. Definition 3.10 and the associated algorithms were used in Chapter 3 for finding the connected portion of a DFA. (a) Adapt Definition 3.10 so that it applies to NDFAs. (b) Prove that there is an algorithm for finding the connected portion of an NDFA. CHAPTER CLOSURE PROPERTIES In this chapter we will look at ways to combine languages that are recognized by finite automata (that is, FAD languages) and consider whether the combinations result in other FAD languages. These results will provide insights into the construction of finite automata and will provide useful information that will have bearing on the topics covered in later chapters. After the properties of the collection of FAD languages have been fully explored, other classes of languages will be investigated. We begin with a review of the concept of closure. 
5.1 FAD LANGUAGES AND BASIC CLOSURE THEOREMS

Notice that many everyday operators, when they combine objects of a given type, produce an object of the same type. In arithmetic, for example, the multiplication of any two whole numbers produces another whole number. Recall that this property is described by saying that the set of whole numbers is closed under the operation of multiplication. In contrast, the quotient of two whole numbers is likely to be a fraction: the whole numbers are not closed under division. The formal definition of closure, both for operators that combine two objects (binary operators) and for those that modify only one object (unary operators), is given below.

Definition 5.1. The set K is closed under the (binary) operator ⊗ iff (∀x, y ∈ K)(x ⊗ y ∈ K).

Definition 5.2. The set K is closed under the (unary) operator η iff (∀x ∈ K)(η(x) ∈ K).

EXAMPLE 5.1

ℕ is closed under + since, if x and y are nonnegative integers, then x + y is another nonnegative integer; that is, if x, y ∈ ℕ, then x + y ∈ ℕ.

EXAMPLE 5.2

ℤ is closed under | | (absolute value), since if x is an integer, |x| is also an integer.

EXAMPLE 5.3

Let ρ = {X | X is a finite subset of ℕ}; then ρ is closed under ∪, since the union of two finite sets is still finite. (If Y and Z are subsets for which ‖Y‖ = n < ∞ and ‖Z‖ = m < ∞, then ‖Y ∪ Z‖ ≤ n + m < ∞. Under what conditions would ‖Y ∪ Z‖ < n + m?)

To show that a set K is not closed under a binary operator ⊗, we must show ¬[(∀x, y ∈ K)(x ⊗ y ∈ K)], which means (∃x, y ∈ K)(x ⊗ y ∉ K).

EXAMPLE 5.4

ℕ is not closed under − (subtraction) since 3 − 5 = −2 ∉ ℕ, even though both 3 ∈ ℕ and 5 ∈ ℕ. Notice that the set as well as the operator is important when discussing closure properties; unlike ℕ, the set of all integers ℤ is closed under subtraction.
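Definitions 5.1 and 5.2 can be exercised with a toy brute-force check over a finite sample (the function names below are ours, not the book's). Since ℕ and ℤ are infinite, such a sketch only probes a finite window: it can exhibit a counterexample, but it can never prove closure.

```python
# A brute-force probe of Definition 5.1 on a finite sample of a set K.
# membership plays the role of the predicate "x is in K".
def closed_on_sample(membership, op, sample):
    """Check that op(x, y) satisfies membership for all x, y in the sample."""
    return all(membership(op(x, y)) for x in sample for y in sample)

is_natural = lambda n: isinstance(n, int) and n >= 0

# The naturals appear closed under + on this sample (Example 5.1) ...
plus_ok = closed_on_sample(is_natural, lambda x, y: x + y, range(10))
# ... but not under subtraction: 3 - 5 = -2 is the counterexample of Example 5.4.
minus_ok = closed_on_sample(is_natural, lambda x, y: x - y, range(10))
```

A single failing pair, such as (3, 5) under subtraction, is exactly the "single counterexample" the text goes on to discuss.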
As with the binary operator in Example 5.4, a single counterexample is sufficient to show that a given set is not closed under a unary operator.

EXAMPLE 5.5

ℕ is not closed under √ (square root) since 7 ∈ ℕ but √7 ∉ ℕ.

We will not be concerned so much with sets of numbers as with sets of languages. As in Example 5.3, the collection will be a set of sets. Of prime concern are those languages that are related to automata.

Definition 5.3. Let Σ be an alphabet. The symbol 𝒟Σ is used to denote the set of all FAD languages over Σ; that is,

𝒟Σ = {L ⊆ Σ* | ∃ deterministic finite automaton M ∋ L(M) = L}

𝒟Σ is the set of all languages that can be recognized by finite automata. In this chapter, it is this set whose closure properties, with respect to various operations on subsets of Σ*, we are most interested in investigating. For example, if there exists a machine that accepts a language K, then there is also a machine that accepts the complement of K. That is, if K is FAD, then −K is FAD: 𝒟Σ is closed under complementation.

Theorem 5.1. For any alphabet Σ, 𝒟Σ is closed under − (complementation).

Proof. Let K ∈ 𝒟Σ. We must show −K ∈ 𝒟Σ also; that is, there is a machine that recognizes −K. But K ∈ 𝒟Σ, and thus there is a deterministic finite automaton that recognizes K: let A = <Σ, S, s0, δ, F> and L(A) = K. Define a new machine A⁻ as follows: A⁻ = <Σ, S⁻, s0⁻, δ⁻, F⁻> = <Σ, S, s0, δ, S − F>, which looks just like A except that the final and nonfinal states have been interchanged. We claim that L(A⁻) = −K. To show this, let x be an arbitrary element of Σ*. Then

x ∈ L(A⁻) ⟺ (by definition of L) δ̂⁻(s0, x) ∈ F⁻ ⟺ (by induction and the fact that δ = δ⁻) δ̂(s0, x) ∈ F⁻ ⟺ (by definition of F⁻) δ̂(s0, x) ∈ S − F ⟺ (by definition of complement) δ̂(s0, x) ∉ F ⟺ (by definition of L) x ∉ L(A) ⟺ (by definition of K) x ∉ K ⟺ (by definition of complement) x ∈ −K

Thus L(A⁻) = −K as claimed, and therefore the complement of a FAD language can also be recognized by a machine and is consequently also FAD. Thus 𝒟Σ is closed under complementation. ∆

It turns out that 𝒟Σ is closed under all the common set operators. Notice that the definition of 𝒟Σ implies that we are working with only one alphabet; if we combine two machines in some way, it is understood that both automata use exactly the same input alphabet. This turns out to be not much of a restriction, however, for if we wish to consider two machines that use different alphabets Σ1 and Σ2, we can simply modify each machine so that it is able to process the new common alphabet Σ = Σ1 ∪ Σ2. It should be clear that this can be done in such a way as not to affect the language accepted by either machine (see the exercises).

We will now prove that the union of two FAD languages is also FAD. This can be shown by demonstrating that, given two automata M1 and M2, it is possible to construct another automaton that recognizes the union of the languages accepted by M1 and M2.

EXAMPLE 5.6

Consider the two machines M1 and M2 displayed in Figure 5.1. These two machines can easily be employed to construct a nondeterministic finite automaton that clearly accepts the appropriate union. We simply need to combine them into a single machine, which in this case will have two start states, as shown in Figure 5.2. The structure inside the dotted box should be viewed as a single NDFA with two start states. Any string that would be accepted by M1 will reach a final state if it starts in the "upper half" of the new machine, while strings that are recognized by M2 will be accepted by the "lower half" of the machine. Recall that the definition of acceptance by a nondeterministic finite automaton implies that the NDFA in Figure 5.2 will accept a string if any path leads to a final state.

Figure 5.1 The two automata discussed in Example 5.6

Figure 5.2 The resulting automaton in Example 5.6
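The swap-the-finals construction of Theorem 5.1 is easy to carry out concretely. The following sketch (the dict-based encoding and all names are ours, not the book's) represents a DFA's transition function as a Python dict and checks the claim L(A⁻) = −L(A) on a few strings.

```python
# Theorem 5.1, sketched: complement a DFA by interchanging final and
# nonfinal states; states, start state, and transitions are untouched.
def complement_dfa(states, delta, start, finals):
    return states, delta, start, set(states) - set(finals)

def dfa_accepts(delta, start, finals, word):
    q = start
    for a in word:
        q = delta[(q, a)]          # deterministic: exactly one next state
    return q in finals

# A two-state DFA over {a, b} accepting words with an even number of a's.
states = {0, 1}
delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}
_, _, _, co_finals = complement_dfa(states, delta, 0, {0})
```

The complemented machine then accepts exactly the words with an odd number of a's; note how essential determinism and completeness of δ are here, a point the NDFA exercises of Chapter 4 return to.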
This new NDFA will therefore accept all the strings that M1 accepted and all the strings that M2 accepted. Furthermore, these are the only strings that will be accepted. This trick is the basis of the following proof, which demonstrates the convenience of using the NDFA concept; a proof involving only DFAs would be both longer and less obvious (see the exercises).

Theorem 5.2. For any alphabet Σ, 𝒟Σ is closed under ∪.

Proof. Let L1 and L2 belong to 𝒟Σ. Then there are nondeterministic finite automata A1 = <Σ, S1, S01, δ1, F1> and A2 = <Σ, S2, S02, δ2, F2> such that L(A1) = L1 and L(A2) = L2 (why?). Define A∪ = <Σ, S∪, S0∪, δ∪, F∪>, where

S∪ = S1 ∪ S2 (without loss of generality, we can assume S1 ∩ S2 = ∅)
S0∪ = S01 ∪ S02
F∪ = F1 ∪ F2

and δ∪: (S1 ∪ S2) × Σ → ρ(S1 ∪ S2) is defined, for all s ∈ S1 ∪ S2 and all a ∈ Σ, by

δ∪(s, a) = δ1(s, a) if s ∈ S1, and δ∪(s, a) = δ2(s, a) if s ∈ S2.

We claim that L(A∪) = L(A1) ∪ L(A2) = L1 ∪ L2. This must be proved using the definition of L from Chapter 4, since A1, A2, and A∪ are all NDFAs.

x ∈ L(A∪) ⟺ (from Definition 4.4) (∃s ∈ S0∪)[δ̂∪(s, x) ∩ F∪ ≠ ∅]
⟺ (by definition of S0∪) (∃s ∈ S01 ∪ S02)[δ̂∪(s, x) ∩ F∪ ≠ ∅]
⟺ (by definition of ∪) (∃s ∈ S01)[δ̂∪(s, x) ∩ F∪ ≠ ∅] ∨ (∃s ∈ S02)[δ̂∪(s, x) ∩ F∪ ≠ ∅]
⟺ (by definition of δ∪ and induction) (∃s ∈ S01)[δ̂1(s, x) ∩ F∪ ≠ ∅] ∨ (∃s ∈ S02)[δ̂2(s, x) ∩ F∪ ≠ ∅]
⟺ (by definition of F∪) (∃s ∈ S01)[δ̂1(s, x) ∩ F1 ≠ ∅] ∨ (∃s ∈ S02)[δ̂2(s, x) ∩ F2 ≠ ∅]
⟺ (from Definition 4.4) x ∈ L(A1) ∨ x ∈ L(A2)
⟺ (by definition of ∪) x ∈ (L(A1) ∪ L(A2)) ⟺ (by definition of L1 and L2) x ∈ L1 ∪ L2

The above "proof" is actually incomplete; the transition from line 4 to line 5 actually depends on the assumed properties of δ̂∪, and not the known properties of δ∪.
A rigorous justification should include an inductive proof of (or at least a reference to) the fact that δ̂∪ reflects the same sort of behavior that δ∪ does; that is, for all s ∈ S1 ∪ S2 and all x ∈ Σ*,

δ̂∪(s, x) = δ̂1(s, x) if s ∈ S1, and δ̂∪(s, x) = δ̂2(s, x) if s ∈ S2.

This rule essentially states that the definition that applies to the single letter a also applies to the string x, and it is easy to prove by induction on the length of x (see the exercises).

The following theorem, which states that 𝒟Σ is closed under ∩, will be justified in two separate ways. The first proof will argue that the closure property must hold due to previous results; no new DFA need be constructed. The drawback to this type of proof is that we have no suitable guide for actually combining two existing DFAs into a new machine that will recognize the appropriate intersection (although, as outlined in the exercises, in this case a construction based on the first proof is fairly easy to generate). Some operators are so bizarre that a nonconstructive proof of closure is the best we can hope for; intersection is definitely not that strange, however. In a second proof of the closure of 𝒟Σ under ∩, Lemma 5.1 will explicitly outline how an intersection machine could be built. When such constructions can be demonstrated, we will say that 𝒟Σ is effectively closed under the operator in question (see Theorem 5.12 for a discussion of an operator that is not effectively closed).

Theorem 5.3. For any alphabet Σ, 𝒟Σ is closed under ∩.

Proof. Let L1 and L2 belong to 𝒟Σ. Then by Theorem 5.1, −L1 and −L2 are also FAD. By Theorem 5.2, −L1 ∪ −L2 is also FAD. By Theorem 5.1 again, −(−L1 ∪ −L2) is also FAD. By De Morgan's law, this last expression is equivalent to L1 ∩ L2, so L1 ∩ L2 is FAD, and thus L1 ∩ L2 ∈ 𝒟Σ. ∆

Note that the above argument could be made to apply to any collection C of sets that were known to be closed under union and complementation.
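The Theorem 5.2 construction can be sketched executably. In the sketch below (encoding ours: transition functions map (state, letter) pairs to sets of states, and states are tagged 1 or 2 to force S1 ∩ S2 = ∅), the union machine simply pools the start states, final states, and transitions of the two NDFAs.

```python
# Theorem 5.2, sketched: the union NDFA pools the (disjointly tagged)
# states, start states, final states, and transitions of A1 and A2.
def union_nfa(m1, m2):
    # each machine is (delta, starts, finals); delta: (state, letter) -> set
    def tag(machine, i):
        delta, starts, finals = machine
        d = {((i, q), a): {(i, s) for s in ss} for (q, a), ss in delta.items()}
        return d, {(i, q) for q in starts}, {(i, q) for q in finals}
    d1, s1, f1 = tag(m1, 1)
    d2, s2, f2 = tag(m2, 2)
    return {**d1, **d2}, s1 | s2, f1 | f2

def nfa_accepts(delta, starts, finals, word):
    current = set(starts)
    for a in word:
        current = {s for q in current for s in delta.get((q, a), set())}
    return bool(current & finals)

# m1 accepts exactly {a}; m2 accepts exactly {bb}.
m1 = ({(0, "a"): {1}}, {0}, {1})
m2 = ({(0, "b"): {1}, (1, "b"): {2}}, {0}, {2})
u_delta, u_starts, u_finals = union_nfa(m1, m2)
```

The simulation in `nfa_accepts` tracks the whole set of reachable states, which is exactly why the δ̂∪ rule discussed above needs its own inductive proof: the per-letter rule must be shown to lift to whole strings.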
A second proof of Theorem 5.3 might rely on the following lemma, using the "direct" method of constructing a deterministic machine that accepts L1 ∩ L2. This would show that 𝒟Σ is effectively closed under the intersection operator.

Lemma 5.1. Given deterministic finite automata A1 = <Σ, S1, s01, δ1, F1> and A2 = <Σ, S2, s02, δ2, F2> such that L(A1) = L1 and L(A2) = L2, define a new DFA A∩ = <Σ, S∩, s0∩, δ∩, F∩>, where

S∩ = S1 × S2
s0∩ = (s01, s02)
F∩ = F1 × F2

and δ∩: (S1 × S2) × Σ → S1 × S2 is defined by

δ∩((s, t), a) = (δ1(s, a), δ2(t, a)) for all s ∈ S1, t ∈ S2, a ∈ Σ.

Then L(A∩) = L1 ∩ L2.

Proof. As usual, the key is to show that x ∈ L(A∩) ⟺ x ∈ L1 ∩ L2. The proof hinges on the inductive statement that δ̂∩ obeys the same rule that defines δ∩; that is, (∀s ∈ S1)(∀t ∈ S2)(∀x ∈ Σ*)(δ̂∩((s, t), x) = (δ̂1(s, x), δ̂2(t, x))). The details are left for the reader (see the exercises). ∆

The idea behind the above construction is to build a machine that "remembers" the state changes that both A1 and A2 make as they each process the same string, and hence the state set consists of all possible pairs of states from A1 and A2. The goal was to design the transition function δ∩ so that being in state (s, t) in A∩ indicates that A1 would currently be in state s and A2 would be in state t. This goal also motivates the definition of the new start state; we want to begin in the start states of A1 and A2, and hence s0∩ = (s01, s02). We only wish to accept strings that are common to both languages, which means that the terminating state in A1 must belong to F1 and the last state reached in A2 must likewise be a final state. This requirement naturally leads to the definition of F∩, where (s, t) is a final state if and only if both s and t were final states in their respective machines.

EXAMPLE 5.7

Consider the two machines A1 and A2 displayed in Figure 5.3. Note that A2 "remembers" whether there have been an even or an odd number of bs, while A1 "counts" the number of letters (mod 3).

Figure 5.3 The automata discussed in Example 5.7
We now demonstrate how the definition in Lemma 5.1 can be applied to form a deterministic machine that accepts the intersection of L(A1) and L(A2). The structure of A∩ would in this case look like the automaton shown in Figure 5.4. Note that A∩ does indeed keep track of the criteria that both A1 and A2 use to accept or reject strings. We will be in a state on the right side of A∩ if an odd number of bs have been seen and on the left side when an even number of bs have been processed. At the same time, we will be in the upper, middle, or lower row of states depending on the total number of letters (mod 3) that have been processed. There is but one final state, corresponding to the situation where we have both an odd number of bs and a letter count of 0 (mod 3).

Figure 5.4 The resulting DFA for Example 5.7

The operations used in the previous three theorems are common to set theory. We now present some new operators that are special to string algebra. We have defined concatenation (·) for individual strings, but there is a natural extension of the definition to languages, as indicated by the next definition.

Definition 5.4. Let L1 and L2 be languages. The concatenation of L1 with L2, written L1·L2, is defined by

L1·L2 = {x·y | x ∈ L1 ∧ y ∈ L2}

EXAMPLE 5.8

If L1 = {λ, b, cc} and L2 = {λ, aa, baa}, then L1·L2 = {λ, b, cc, aa, baa, ccaa, bbaa, ccbaa}. Note that baa qualifies to be in L1·L2 for two reasons: baa = λ·baa and baa = b·aa. Thus we see that the concatenation contains only eight words rather than the expected 9 (= 3·3). In general, L1·L2 consists of all words that can be formed by the concatenation of a word from L1 with a word from L2; for finite sets, concatenating an n-word set with an m-word set results in no more than n·m words. As shown in this example, the number of words can actually be less than n·m.
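For finite languages, Definition 5.4 can be checked directly with a one-line set comprehension (a sketch of ours, with λ represented by the empty string):

```python
# Definition 5.4 for finite languages: every way of gluing a word of L1
# to a word of L2; duplicates such as b.aa = lambda.baa collapse in the set.
def concat_langs(L1, L2):
    return {x + y for x in L1 for y in L2}

# Example 5.8: nine pairs of words, but only eight distinct concatenations.
L1 = {"", "b", "cc"}
L2 = {"", "aa", "baa"}
product = concat_langs(L1, L2)
```

Because the result is a set, the double derivation of baa noted above is counted only once, which is precisely why eight words appear instead of nine.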
Larger languages can be concatenated, also. For example, Σ·Σ* = Σ⁺. The concatenation of two FAD languages is also FAD, as can easily be seen by employing NDFAs with λ-transitions.

EXAMPLE 5.9

Figure 5.5 illustrates two nondeterministic finite automata B1 and B2 that accept the languages L1 and L2 given in Example 5.8. Combining these two machines and linking the final states of B1 to the start states of B2 with λ-transitions yields a new NDFA that accepts L1·L2, as shown in Figure 5.6.

Figure 5.5 Two candidates for concatenation

Figure 5.6 An NDFA which accepts the concatenation of the machines discussed in Example 5.9

EXAMPLE 5.10

Consider the deterministic finite automata A1 and A2 displayed in Figure 5.7. These can similarly be linked together to form an NDFA that accepts the concatenation of the languages accepted by A1 and A2, as shown in Figure 5.8.

Figure 5.7 A pair of candidates for concatenation

Figure 5.8 Concatenation of the machines in Example 5.10 via lambda-moves

It is also possible to directly build a machine for concatenation without using any λ-transitions, although the penalty for limiting our attention to less exotic machines is a loss of clarity in the construction. While the proof of the following theorem does not depend on λ-transitions, the resulting machine is still nondeterministic.

Theorem 5.4. For any alphabet Σ, 𝒟Σ is closed under · (concatenation).

Proof. Let L1 and L2 belong to 𝒟Σ. Then there are deterministic finite automata A1 = <Σ, S1, s01, δ1, F1> and A2 = <Σ, S2, s02, δ2, F2> such that L(A1) = L1 and L(A2) = L2. Define a nondeterministic machine A' = <Σ, S', S0', δ', F'>, where

S' = S1 ∪ S2 (without loss of generality, assume S1 ∩ S2 = ∅)
S0' = {s01}
F' = F2 if λ ∉ L2, and F' = F1 ∪ F2 if λ ∈ L2

and δ': (S1 ∪ S2) × Σ → ρ(S1 ∪ S2) is defined by

δ'(s, a) = {δ1(s, a)} if s ∈ S1 − F1
δ'(s, a) = {δ1(s, a), δ2(s02, a)} if s ∈ F1
δ'(s, a) = {δ2(s, a)} if s ∈ S2

It can be shown that L(A') = L(A1)·L(A2) = L1·L2 (see the exercises). ∆

EXAMPLE 5.11

Consider the deterministic finite automata A1 and A2 in Example 5.10. These can be linked together to form the NDFA A', and the reader can indeed verify that the machine illustrated in Figure 5.9 accepts the concatenation of the languages accepted by A1 and A2. Notice that the new transitions from the final states of A1 mimic the transitions out of the start state of A2. Thus we see that avoiding λ-transitions while defining a concatenation machine is relatively simple. Unfortunately, avoiding the nondeterministic aspects of the construction is relatively impractical and would basically entail re-creating the construction in Definition 4.5 (which outlined the method for converting an NDFA into a DFA). Whereas it was merely convenient (rather than necessary) to employ NDFAs to demonstrate that 𝒟Σ is closed under union, the use of nondeterminism is essential to the proof of closure under concatenation.

Figure 5.9 Concatenation of the machines in Example 5.10 without lambda-moves (Example 5.11)

EXAMPLE 5.12

Consider the nondeterministic finite automata B1 and B2 from Example 5.9. Applying the analog of Theorem 5.4 (see Exercise 5.43) yields the automaton shown in Figure 5.10. Notice that each final state of B1 now mimics the start state of B2, and t0 has become a disconnected state. Both s0 and s1 are still final states since λ ∈ L(B2).

Figure 5.10 An NDFA without lambda-moves which accepts the concatenation of the languages discussed in Example 5.8

EXAMPLE 5.13

Consider the nondeterministic finite automata B1 and B3 shown in Figure 5.11, where B1 is the same as that given in Example 5.9, while B3 differs just slightly from B2 (t0 is no longer a final state). Note that L(B3) = {aa, baa}, and λ ∉ L(B3). Applying Theorem 5.4 in this case yields the automaton shown in Figure 5.12. In this construction, s0 and s1 are no longer final states since the definition of F' must follow a different rule when λ ∉ L(B3). By examining the resulting machine, the reader can verify that having t3 as the only final state is indeed the correct strategy for this case.

Figure 5.11 Candidates for concatenation in which the second machine does not accept λ (Example 5.13)

Figure 5.12 The concatenation of the NDFAs in Example 5.13

Besides concatenation, string algebra allows other new operators on languages. The operators * and +, which have at this point only been defined for alphabets, likewise have natural extensions to languages. Loosely, we would expect L* to consist of all words that can be formed by the concatenation of several words from L.

Definition 5.5. Let L be a language over some alphabet Σ. Define

L⁰ = {λ}, L¹ = L, L² = L·L, L³ = L·L·L

and in general Lⁿ = L·Lⁿ⁻¹, for n = 1, 2, 3, .... Then

L* = ⋃ᵢ₌₀^∞ Lⁱ = L⁰ ∪ L¹ ∪ L² ∪ L³ ∪ ... = {λ} ∪ L ∪ L·L ∪ L·L·L ∪ ...

L⁺ = ⋃ᵢ₌₁^∞ Lⁱ = L¹ ∪ L² ∪ L³ ∪ ... = L ∪ L·L ∪ L·L·L ∪ ...

L* is called the Kleene closure of the language L.

EXAMPLE 5.14

If L = {aa, c}, then L* = {λ, aa, c, aac, caa, aaaa, cc, aaaaaa, aaaac, aacaa, ...}.

EXAMPLE 5.15

If K = {db, b, c}, then K* consists of all words (over {b, c, d}) for which each occurrence of d is immediately followed by (at least) one b.

𝒟Σ is closed under both * and ⁺. The technique for Kleene closure is outlined in Theorem 5.5. The construction for L⁺ is similar (see the exercises).

Theorem 5.5. For any alphabet Σ, 𝒟Σ is closed under * (Kleene closure).

Proof. Let L belong to 𝒟Σ. Then there is a nondeterministic finite automaton A = <Σ, S, S0, δ, F> such that L(A) = L. Define a nondeterministic machine A* as follows.
A* = <Σ, S*, S0*, δ*, F*>, where

S* = S ∪ {q0} (where q0 is some new element; q0 ∉ S)
S0* = S0 ∪ {q0}
F* = F ∪ {q0}

and δ*: (S ∪ {q0}) × Σ → ρ(S ∪ {q0}) is defined by

δ*(s, a) = δ(s, a) if s ∉ F ∪ {q0}
δ*(s, a) = δ(s, a) ∪ (⋃ₜ∈S0 δ(t, a)) if s ∈ F
δ*(s, a) = ∅ if s = q0

We claim that L(A*) = L(A)* = L* (see the exercises). ∆

EXAMPLE 5.16

Consider the nondeterministic finite automaton B displayed in Figure 5.13, which accepts all words that contain exactly two (consecutive) bs. Using the modifications described above, the new NDFA B* would look like the automaton shown in Figure 5.14. Notice that the new automaton does indeed accept L(B)*, the set of all words in which the bs always occur in side-by-side pairs. This example also demonstrates the need for the special extra start (and final) state q0 (see the exercises).

Figure 5.13 The NDFA B in Example 5.16

Figure 5.14 The resulting NDFA for Example 5.16

It is instructive to compare the different approaches taken in the proofs of Theorems 5.4 and 5.5. In both cases, nondeterministic automata were built, but Theorem 5.4 began with deterministic machines, while Theorem 5.5 assumed that an NDFA was provided. Note that, in the construction of δ' in Theorem 5.4, δ1 was a deterministic transition function and as such produced a single state, whereas δ', on the other hand, must adhere to the nondeterministic definition and produce a set of states. As a consequence, the definition of δ' involved expressions like {δ1(s, a)}, which indicated that the single state given by δ1(s, a) should be viewed as a singleton set. By contrast, Theorem 5.5 specified the nondeterministic transition function δ* in terms of δ, which was also assumed to be nondeterministic. This gave rise to definitions of the form δ*(s, a) = δ(s, a). In this case, no set brackets { } were necessary since δ(s, a) by assumption already represented a set (rather than just a single element as in the deterministic case).
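The Theorem 5.5 construction can also be sketched executably (representation ours; the fresh state q0 is assumed not to collide with any existing state name). Since S0* includes all of S0, q0 needs no outgoing transitions: its only job is to make λ accepted.

```python
# Theorem 5.5, sketched: from each final state, add "restart" transitions
# that mimic the start states; a fresh start-and-final state q0 accepts lambda.
def star_nfa(states, delta, starts, finals, alphabet):
    q0 = "q0"                       # assumed fresh: not already a state name
    def from_starts(a):             # union of delta(t, a) over start states t
        return {s for t in starts for s in delta.get((t, a), set())}
    d = {}
    for q in states:
        for a in alphabet:
            step = set(delta.get((q, a), set()))
            if q in finals:
                step |= from_starts(a)   # a final state may also restart
            d[(q, a)] = step
    return {q0} | set(states), d, set(starts) | {q0}, set(finals) | {q0}

def nfa_accepts(delta, starts, finals, word):
    cur = set(starts)
    for a in word:
        cur = {s for q in cur for s in delta.get((q, a), set())}
    return bool(cur & finals)

# A small NDFA accepting exactly {ab}; its star should accept (ab)^n, n >= 0.
_, d, s0, f = star_nfa({0, 1, 2}, {(0, "a"): {1}, (1, "b"): {2}}, {0}, {2}, "ab")
```

The tests below exercise exactly the behavior the proof claims: λ and every repetition of ab are accepted, while fragments such as aba are not.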
The definition of the new set of start states S0* is also affected by the type of machine from which the new NDFA is formed. In reviewing Theorems 5.4 and 5.5, the reader should be able to see the parallel between the differences in the specifications of the δ functions and the differences in the definitions of S0' and S0*. It is also instructive to compare and contrast the proof of Theorem 5.2 with those discussed above.

5.2 FURTHER CLOSURE PROPERTIES

The operators discussed in this section, while not as fundamental as those presented earlier, illustrate some useful techniques for constructing modified automata. Also explored are techniques that provide existence proofs rather than constructive proofs.

Theorem 5.6. For any alphabet Σ, 𝒟Σ is closed under the operator Z, where Z is defined by Z(L) = {x | x is formed by deleting zero or more letters from a word in L}.

Proof. See the exercises and the following example.

EXAMPLE 5.17

Consider the deterministic finite automaton C displayed in Figure 5.15, which accepts the language {aⁿbᵐ | n ≥ 1, m ≥ 1}. Z(L(C)) would then be {aⁿbᵐ | n ≥ 0, m ≥ 0} and can be accepted by modifying C so that every transition in the diagram has a corresponding λ-move (allowing that particular letter to be skipped), as shown in Figure 5.16.

Figure 5.15 The automaton C discussed in Example 5.17

Figure 5.16 An automaton accepting Z(L(C)) in Example 5.17

Figure 5.17 The automaton D discussed in Example 5.18

Theorem 5.7. For any alphabet Σ, 𝒟Σ is closed under the operator Y, where Y is defined by Y(L) = {x | x is formed by deleting exactly one letter from a word in L}.

Proof. See the exercises and the following example.

EXAMPLE 5.18

We need a way to skip a letter as was done in Example 5.17, but we must now skip one and only one letter. The technique for accomplishing this involves using copies of the original machine.
Consider the deterministic finite automaton D displayed in Figure 5.17. We will use λ-moves to mimic normal transitions, but in this case we will move from one copy of the machine to an appropriate state in a second copy. Being in the first copy of the machine will indicate that we have yet to skip a letter, and being in the second copy will signify that we have followed exactly one λ-move and have thus skipped exactly one letter. Hence the second copy will be the only one in which states are deemed final, and the first copy will contain the only start state. The modified machine for this example might look like the NDFA shown in Figure 5.18. The string aba, which is accepted by the original machine, should cause ab, aa, and ba to be accepted by the new machine. Each of these three is indeed accepted, by following the correct λ-move at the appropriate time. A similar technique, with the state transition function slightly redefined, could be used to accept words in which every other letter was deleted. If one wished only to acknowledge every third letter, three copies of the machine could be suitably connected together to achieve the desired result (see the exercises).

Figure 5.18 The modified machine in Example 5.18

While 𝒟Σ is certainly the most important class of languages we have seen so far, we will now consider some other classes whose properties can be investigated. The closure properties of other collections of languages will be considered in the exercises and in later chapters.

Definition 5.6. Let Σ be an alphabet. Then 𝒩Σ is defined to be the set of all languages over Σ recognized by NDFAs; that is, 𝒩Σ = {L ⊆ Σ* | ∃ NDFA N ∋ L(N) = L}.

Lemma 5.2. Let Σ be an alphabet. Then 𝒩Σ = 𝒟Σ.

Proof. The proof follows immediately from Theorem 4.1 and Exercise 4.25. ∆

The reader should note that Lemma 5.2 simply restates in new terms the conclusion reached in Chapter 4, where it was proved that NDFAs were exactly as powerful as DFAs.
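Stepping back to Example 5.18, the two-copy trick is easy to make concrete. The sketch below (our own encoding, with λ-moves written as transitions on the empty string "") builds the two copies for an arbitrary complete DFA and simulates the result with an ε-closure routine; the test machine accepts exactly {aba}, so the Y-machine should accept exactly {ba, aa, ab}.

```python
# Example 5.18, sketched: copy 1 = "no letter skipped yet", copy 2 = "one
# letter skipped"; a lambda-move (symbol "") jumps from copy 1 to copy 2,
# passing over one letter of the original word without consuming input.
def y_machine(states, delta, start, finals, alphabet):
    d = {}
    for q in states:
        for a in alphabet:
            d[((1, q), a)] = {(1, delta[(q, a)])}
            d[((2, q), a)] = {(2, delta[(q, a)])}
        d[((1, q), "")] = {(2, delta[(q, a)]) for a in alphabet}
    return d, {(1, start)}, {(2, q) for q in finals}

def eps_closure(d, cur):
    cur, frontier = set(cur), list(cur)
    while frontier:
        q = frontier.pop()
        for s in d.get((q, ""), set()):
            if s not in cur:
                cur.add(s)
                frontier.append(s)
    return cur

def accepts(d, starts, finals, word):
    cur = eps_closure(d, starts)
    for a in word:
        cur = eps_closure(d, {s for q in cur for s in d.get((q, a), set())})
    return bool(cur & finals)

# A complete DFA accepting exactly {aba} ("X" is its dead state).
Q = {0, 1, 2, 3, "X"}
dd = {(0, "a"): 1, (0, "b"): "X", (1, "a"): "X", (1, "b"): 2,
      (2, "a"): 3, (2, "b"): "X", (3, "a"): "X", (3, "b"): "X",
      ("X", "a"): "X", ("X", "b"): "X"}
yd, ys, yf = y_machine(Q, dd, 0, {3}, "ab")
```

Only copy-2 states are final and only the copy-1 start state is initial, so exactly one λ-move must be taken on every accepting path, matching the "one and only one skipped letter" requirement.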
More specifically, it was shown that any language that could be recognized by an NDFA could also be recognized by a DFA, and conversely.

While every subset of Σ* represents a language, those in 𝒟Σ have exhibited many nice properties owing to the convenient representation afforded by finite automata. We now focus our attention on "the other languages," that is, those that are not in 𝒟Σ.

Definition 5.7. Let Σ be an alphabet. Then 𝒳Σ is defined to be the set of all non-FAD languages over Σ; that is, 𝒳Σ = {L ⊆ Σ* | there does not exist any finite automaton M ∋ L(M) = L}.

𝒳Σ is all the "complicated" languages (subsets) that can be formed from Σ*; that is, 𝒳Σ = ρ(Σ*) − 𝒟Σ. Be careful not to confuse 𝒳Σ with the set of languages that can be recognized by NDFAs (𝒩Σ in Definition 5.6).

Theorem 5.8. Let Σ be an alphabet. Then 𝒳Σ is closed under − (complementation).

Proof (by contradiction). Assume the theorem is not true, which means that there exists a language K for which

K ∈ 𝒳Σ ∧ −K ∉ 𝒳Σ ⟹ (by definition of 𝒳Σ) −K ∈ 𝒟Σ ⟹ (by Theorem 5.1) −(−K) ∈ 𝒟Σ ⟹ (since −(−K) = K) K ∈ 𝒟Σ ⟹ (by definition of 𝒳Σ) K ∉ 𝒳Σ

which contradicts the assumption. Thus the theorem must be true. ∆

Lemma 5.3. 𝒳{a, b} is not closed under ∩.

Proof. Let L1 = {aᵖ | p is prime} and let L2 = {bᵖ | p is prime}. Then L1 ∈ 𝒳{a, b} and L2 ∈ 𝒳{a, b}, but L1 ∩ L2 = ∅ ∉ 𝒳{a, b} (why?). ∆

As another useful example of closure, we consider the transformation of one language to another via a language homomorphism, which represents the process of consistently replacing each single letter ai by a word wi. Such transformations are commonplace in computer science; some applications expect lines in a text file to be delimited with a carriage return/line feed pair, while other applications expect only a carriage return. Stripping away the unwanted line feeds is tantamount to applying a homomorphism that replaces most ASCII characters by the same symbol, but replaces line feeds by λ.
Converting all lowercase letters in a file to uppercase is another common transformation that can be defined by a language homomorphism.

Definition 5.8. Let Σ = {a1, a2, ..., am} be an alphabet and let Γ be a second alphabet. Given words w1, w2, ..., wm over Γ*, define a language homomorphism φ: Σ → Γ* by φ(ai) = wi for each i, which can be extended to ψ: Σ* → Γ* by:

ψ(λ) = λ
(∀a ∈ Σ)(∀x ∈ Σ*)(ψ(a·x) = φ(a)·ψ(x))

ψ can be further extended to operate on a language L by defining

ψ(L) = {ψ(z) ∈ Γ* | z ∈ L}

In this context, ψ: ρ(Σ*) → ρ(Γ*).

EXAMPLE 5.19

Let Σ = {a, b} and Γ = {c, d}, and define φ by φ(a) = cd and φ(b) = d. For K = {λ, ab, bb}, ψ(K) = {λ, cdd, dd}, while for L = {a, b}*, ψ(L) represents all words over {c, d} in which every c is immediately followed by d.

EXAMPLE 5.20

As a second example, let Σ = {(, )} and let Γ be the ASCII alphabet. If μ is defined by μ(() = begin and μ()) = end, then the set M of all strings of matched parentheses maps to K, the set of all matched begin-end pairs.

A general homomorphism ψ maps a language over Σ to a language over Γ. However, to consider the closure of 𝒟Σ, we will restrict our attention for the moment to homomorphisms for which Γ = Σ. It is more generally true, though, that even for language homomorphisms between two different alphabets, if L is FAD, the homomorphic image of L is also FAD (see the exercises).

Theorem 5.9. Let Σ be an alphabet, and let ψ: Σ → Σ* be a language homomorphism. Then 𝒟Σ is closed under ψ.

Proof. See the exercises and the following examples. A much more concise way to handle this transformation will be seen in Chapter 6 when substitutions are explored. ∆

If the homomorphism is length preserving, that is, if it always maps letters to single letters, it is relatively easy to define a new automaton from the old one.
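The extension of φ to words and languages in Definition 5.8 is directly executable for finite languages (a sketch of ours; the dict h plays the role of φ, and λ is the empty string):

```python
# Definition 5.8, sketched: extend a letter-to-word map h to words
# (psi(a1...an) = h[a1]...h[an]) and then to whole finite languages.
def psi_word(h, word):
    return "".join(h[c] for c in word)

def psi_lang(h, language):
    return {psi_word(h, w) for w in language}

# Example 5.19: phi(a) = cd, phi(b) = d applied to K = {lambda, ab, bb}.
h = {"a": "cd", "b": "d"}
image = psi_lang(h, {"", "ab", "bb"})
```

Note that psi_word realizes the recursive clause ψ(a·x) = φ(a)·ψ(x) iteratively, one letter at a time.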
Indeed, the state transition diagram hardly changes at all; all transitions marked b are simply relabeled with ψ(b). For more complex homomorphisms, extra states must be added to accommodate the processing of the surplus letters. The following two examples illustrate the appropriate transformation of the state transition function and suggest a convenient labeling for the new states.

EXAMPLE 5.21

Consider the DFA B displayed in Figure 5.19a. For the homomorphism μ defined by μ(a) = a and μ(b) = a, the automaton that will accept μ(L(B)) is shown in Figure 5.19b. Note that even in simple examples like this one the resulting automaton can be nondeterministic.

Figure 5.19 (a) The automaton discussed in Example 5.21 (b) The resulting automaton for Example 5.21

EXAMPLE 5.22

For the NDFA C displayed in Figure 5.20a and the homomorphism μ defined by μ(a) = cc and μ(b) = a, the automaton that will accept μ(L(C)) is shown in Figure 5.20b. Note that each state of C requires an extra state to accommodate the cc transition.

Figure 5.20 (a) The automaton discussed in Example 5.22 (b) The resulting automaton for Example 5.22

EXAMPLE 5.23

Consider the identity homomorphism μ: Σ → Σ* defined by (∀a ∈ Σ)(μ(a) = a). Since μ(L) = L, any collection of languages, including 𝒳Σ, is clearly closed under this homomorphism. Unlike 𝒟Σ, though, there are many homomorphisms under which 𝒳Σ is not closed.

Lemma 5.4. Let Σ = {a, b}, and let φ: Σ → Σ* be defined by φ(a) = a and φ(b) = a. Then 𝒳Σ is not closed under φ.

Proof. Consider the set L of all strings that have the same number of as as bs. This language is in 𝒳Σ, but φ(L) is the set of all even-length strings of as, which is clearly not in 𝒳Σ. ∆

A rather trivial example involves the homomorphism defined by ψ(a) = λ for every letter a ∈ Σ. Then for all languages L, whether or not L ∈ 𝒳Σ, ψ(L) = {λ}, which is definitely not in 𝒳Σ.

Definition 5.9.
Let ψ: Σ → Σ* be a language homomorphism and consider z ∈ Σ*. The inverse homomorphic image of z under ψ is then

ψ⁻¹(z) = {x ∈ Σ* | ψ(x) = z}

For a language L ⊆ Σ*, the inverse homomorphic image of L under ψ is defined by

ψ⁻¹(L) = {x ∈ Σ* | ψ(x) ∈ L}

Thus, x ∈ ψ⁻¹(L) ⟺ ψ(x) ∈ L. While the image of a string under a homomorphism is a single word, note that the inverse image of a single string may be an entire set of words.

EXAMPLE 5.24

Consider φ from Lemma 5.4, in which φ: Σ → Σ* was defined by φ(a) = a and φ(b) = a. Let z = aa. Since φ(bb) = φ(ba) = φ(ab) = φ(aa) = aa, φ⁻¹(aa) = {bb, ba, ab, aa}. Note that φ⁻¹(ba) = ∅. For L = {x ∈ {a}* | |x| ≡ 0 mod 3}, φ⁻¹(L) = {x ∈ {a, b}* | |x| ≡ 0 mod 3}. Note that this second set is definitely larger, since it also contains words with bs in them.

It can be shown that 𝒟Σ is closed under inverse homomorphism. The trick is to make the state transition function of the new automaton simulate, for a given letter a, the action the old automaton would have taken for the entire string ψ(a). As the following proof will illustrate, the only change that need take place is in the δ function; the newly constructed machine is even deterministic!

Theorem 5.10. Let Σ be an alphabet, and let ψ: Σ → Σ* be a language homomorphism. Then 𝒟Σ is closed under ψ⁻¹.

Proof. Let L ∈ 𝒟Σ. Then there exists a DFA A = <Σ, S, s0, δ, F> such that L(A) = L. Define a new DFA Aψ = <Σ, Sψ, s0ψ, δψ, Fψ> by

Sψ = S
s0ψ = s0
Fψ = F

and δψ is defined by δψ(s, a) = δ̂(s, ψ(a)) for all s ∈ S and a ∈ Σ. Induction can be used to show δ̂ψ(s, x) = δ̂(s, ψ(x)) for all s ∈ S and x ∈ Σ*, and in particular δ̂ψ(s0, x) = δ̂(s0, ψ(x)) for all x ∈ Σ*. Hence L(Aψ) = ψ⁻¹(L(A)). ∆

This theorem makes it possible to extend the range of the pumping lemma (Theorem 2.3) to many otherwise unpleasant problems. The set M given in Example 5.20 can easily be shown to violate Theorem 2.3 and is therefore not FAD.
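Theorem 5.10's construction amounts to precomputing, for each letter a, where the old DFA would land after reading the whole word ψ(a). A sketch (representation ours), using Example 5.24's φ and a three-state DFA for {x ∈ {a}* : |x| ≡ 0 mod 3}:

```python
# Theorem 5.10, sketched: delta_psi(s, a) = delta-hat(s, psi(a)), so the new
# machine is again deterministic, with the same states, start, and finals.
def inverse_hom_dfa(states, delta, h, alphabet):
    def run(q, w):                  # delta-hat: run the old DFA over a word
        for c in w:
            q = delta[(q, c)]
        return q
    return {(q, a): run(q, h[a]) for q in states for a in alphabet}

def dfa_accepts(delta, start, finals, word):
    q = start
    for a in word:
        q = delta[(q, a)]
    return q in finals

# Old DFA over {a}: state i records |x| mod 3, with final state 0.
old = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 0}
# phi(a) = a, phi(b) = a; the new DFA accepts words over {a, b}
# whose length is congruent to 0 mod 3, as in Example 5.24.
new = inverse_hom_dfa({0, 1, 2}, old, {"a": "a", "b": "a"}, "ab")
```

Only the transition table changes; the start state and final states are inherited unchanged, which is why the resulting machine is deterministic.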
The set K given in Example 5.20 is just as clearly not FAD, but this is quite tedious to prove formally by the pumping lemma (the number of choices for u, v, and w is prohibitively large to cover thoroughly). An argument might instead proceed as follows: Assume K were FAD. Then M, being the inverse homomorphic image of a FAD language, must also be FAD. Since M is known (by an easy pumping lemma proof) to be definitely not FAD, the assumption that K is FAD must be incorrect. Thus, K ∈ 𝒩Σ.

∇ Lemma 5.5. Let Σ = {a, b}, and let ξ: Σ → Σ* be defined by ξ(a) = a and ξ(b) = a. Then 𝒩Σ is not closed under ξ⁻¹.

Proof. Consider the set L of all strings that have the same number of a's as b's. This language is in 𝒩Σ, but ξ⁻¹(L) is {λ}, which is clearly not in 𝒩Σ. Δ

We close this chapter by considering two operators for which it is definitely not convenient to modify the structure of an existing automaton to construct a new automaton with which to demonstrate closure.

∇ Theorem 5.11. Let Σ be an alphabet. Define the operator b by

Lᵇ = {x | ∃y ∈ Σ* ∋ (xy ∈ L ∧ |x| = |y|)}

Then 𝒟Σ is closed under the operator b.

Proof. Lᵇ represents the first halves of all the words in L. For example, if K = {ad, abaa, ccccc}, then Kᵇ = {a, ab}. Assume that L is FAD. Then there exists a DFA A = ⟨Σ, S, s0, δ, F⟩ that accepts L. The proof consists of identifying those states q that are "midway" between the start state and a final state; specifically, we need to identify the set of strings for which q is the midpoint. The previous closure results for union, intersection, homomorphism, and inverse homomorphism will be used to construct the language representing Lᵇ. Define the length homomorphism ψ: Σ → {1}* by ψ(a) = 1 for all a ∈ Σ. Note that ψ effectively counts the number of letters in a word: ψ(x) = 1^|x|. The following argument can be applied to each state q to determine the set of strings that use it as a "midway" state.
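A brute-force check of this "midway state" idea on a small machine may help fix the intuition. The Python sketch below is our own illustration (the (states, alphabet, delta, start, finals) encoding is not the book's): a candidate first half x is kept exactly when the state q it reaches admits some equal-length word y leading from q to a final state.

```python
from itertools import product

def delta_hat(delta, s, w):
    # Extended transition function: follow w letter by letter.
    for c in w:
        s = delta[(s, c)]
    return s

def half_language(dfa, max_len):
    """L^b restricted to candidate first halves of length <= max_len."""
    states, sigma, delta, start, finals = dfa
    words = ["".join(p) for n in range(max_len + 1)
             for p in product(sorted(sigma), repeat=n)]
    result = set()
    for x in words:
        q = delta_hat(delta, start, x)          # the "midway" state for x
        if any(len(y) == len(x) and delta_hat(delta, q, y) in finals
               for y in words):
            result.add(x)
    return result

# D accepts the words over {a, b} that end in b.
D = ({0, 1}, {"a", "b"},
     {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}, 0, {1})
halves = half_language(D, 2)
# ab and bb in L(D) give first halves a and b; the empty word is not in L(D).
assert "a" in halves and "b" in halves and "" not in halves
```

The point of Theorem 5.11 is that this search over "equal-length continuations" can be expressed with the length homomorphism ψ and the closure properties already proved, so no unbounded enumeration is actually needed.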
Consider the initial set for q, I(A, q) = {x | δ(s0, x) = q}, and the terminal set for q, T(A, q) = {x | δ(q, x) ∈ F}. We are interested in finding those words in I(A, q) that are the same length as words in T(A, q). ψ(I(A, q)) represents strings of 1's whose lengths are the same as those of words in I(A, q). A similar interpretation can be given for ψ(T(A, q)). Therefore, ψ(I(A, q)) ∩ ψ(T(A, q)) will reflect those lengths that are common to both the initial set and the terminal set. The inverse image under ψ of this set will then reflect exactly those strings in Σ* that are of the correct length to reach q from s0. This set is ψ⁻¹(ψ(I(A, q)) ∩ ψ(T(A, q))). Not all strings of a given length are likely to reach q, though, so this set must be intersected with I(A, q) to correctly describe those strings that are both of the proper length and that reach q from the start state. This set, I(A, q) ∩ ψ⁻¹(ψ(I(A, q)) ∩ ψ(T(A, q))), is thus the set of first halves of all words that have q as their midpoint. This process can be repeated for each of the (finitely many) states in the automaton A, and the union of the resulting sets will form all the first halves of words that are accepted by A; that is, the union will equal Lᵇ. Note that by making q the sole final state and forming the automaton A_q = ⟨Σ, S, s0, δ, {q}⟩, each of the initial sets I(A, q) can be shown to be FAD. Similarly, the automaton obtained by moving the start state to q, ⟨Σ, S, q, δ, F⟩, illustrates that each terminal set T(A, q) must be FAD also. Since Lᵇ has now been shown to be formed from these basic FAD sets by applying homomorphisms, intersections, inverse homomorphisms, and unions, Lᵇ must be FAD, since 𝒟Σ is closed under each of these types of operations. Δ

EXAMPLE 5.25

Consider the automaton A displayed in Figure 5.21. For the state highlighted as q, the quantities discussed in Theorem 5.11 would be as follows:

I(A, q) = {abc, abcabc, abcabcabc, abcabcabcabc, ...}
T(A, q) = {aa, aaaa, aaaaaa, aaaaaaaa, ...
} = {a², a⁴, a⁶, a⁸, a¹⁰, a¹², ...}
ψ(I(A, q)) = {1³, 1⁶, 1⁹, 1¹², 1¹⁵, ...}
ψ(T(A, q)) = {1², 1⁴, 1⁶, 1⁸, 1¹⁰, 1¹², ...}
ψ(I(A, q)) ∩ ψ(T(A, q)) = {1⁶, 1¹², 1¹⁸, ...} = {x ∈ {1}* | |x| ≡ 0 mod 6}
ψ⁻¹(ψ(I(A, q)) ∩ ψ(T(A, q))) = {x ∈ {a, b, c}* | |x| ≡ 0 mod 6} = {aaaaaa, aaaaab, aaaaac, aaaaba, aaaabb, ...}
I(A, q) ∩ ψ⁻¹(ψ(I(A, q)) ∩ ψ(T(A, q))) = {abcabc, abcabcabcabc, ...}

Figure 5.21 The automaton discussed in Example 5.25

Similar calculations would have to be done for each of the other states of A. Once again, 𝒩Σ does not enjoy the same closure properties that 𝒟Σ does.

∇ Lemma 5.6. Let Σ be an alphabet. Then 𝒩Σ is not closed under the operator b.

Proof. Let L = {aⁿbⁿ | n ≥ 0} ∈ 𝒩Σ. Then Lᵇ = {aⁿ | n ≥ 0} ∉ 𝒩Σ. Δ

Other examples that show 𝒩Σ is not closed under the operator b abound. If K = {x ∈ {a, b}* | |x|_a = |x|_b}, then Kᵇ = {a, b}*. The last operator we will cover in this chapter is useful for illustrating closures that may not be effective, that is, for which there may not exist an algorithm for constructing the desired entity.

∇ Definition 5.10. Let L1 and L2 be languages. The quotient of L1 with L2, written L1/L2, is defined by

L1/L2 = {x | ∃y ∈ Σ* ∋ (y ∈ L2 ∧ xy ∈ L1)} Δ

Roughly speaking, the quotient consists of the beginnings of those words in L1 that terminate in a word from L2.

EXAMPLE 5.26

Let Σ = {a, b}. {b², b⁴, b⁶, b⁸, b¹⁰, b¹², ...}/{b} = {b, b³, b⁵, b⁷, b⁹, b¹¹, ...}. Note that {b², b⁴, b⁶, b⁸, b¹⁰, b¹², ...}/{a} = { }.

∇ Theorem 5.12. For any alphabet Σ, 𝒟Σ is closed under quotient.

Proof. Let L1 and L2 belong to 𝒟Σ. Then there is a deterministic finite automaton A1 = ⟨Σ, S1, s01, δ1, F1⟩ such that L(A1) = L1. An automaton that accepts L1/L2 can be defined by A' = ⟨Σ, S1, s01, δ1, F'⟩, where F' is defined to be the set of all states t for which there is a word in L2 that reaches a final state from t. That is, F' = {t | ∃y ∈ Σ* ∋ (y ∈ L2 ∧ δ̄1(t, y) ∈ F1)}.
It can be shown that A' does indeed accept L1/L2, and hence 𝒟Σ is closed under quotient (see the exercises). Δ

Note that the above proof did not mention the automaton associated with the second language L2. Indeed, the definition given for F' is sufficient to argue that the new automaton does recognize the quotient of the two languages. It was not actually necessary to deal with an automaton for L2 in order to argue that there must exist a DFA that recognizes L1/L2. The proof of Theorem 5.12 is thus an existence proof, but it does not indicate whether 𝒟Σ is effectively closed under quotient. Indeed, Theorem 5.12 actually proves that the quotient of a FAD language with any other language (including those in 𝒩Σ) will always be FAD. However, it may be hard to determine just which strings in the other language have the properties we need to define F'; we may not really know which subset of states F' should actually be [after all, we could hardly check the property δ̄1(t, y) ∈ F1, one string at a time, for each of an infinite number of strings y, in a finite amount of time]. Fortunately, it is not necessary to know F' exactly, since there are only a finite number of ways to choose a set of final states in the automaton A', and the proof of Theorem 5.12 assures us that one of those ways must be the correct one that admits the conclusion L(A') = L1/L2. It would, however, be quite convenient to know what F' actually is so that we could construct the automaton that actually accepts the quotient; this seems much more satisfying than just knowing that such a machine must exist! If L2 is FAD, the existence of an automaton A2 = ⟨Σ, S2, s02, δ2, F2⟩ for which L(A2) = L2 does make it possible to calculate F' exactly (see the exercises). Thus, 𝒟Σ is effectively closed under quotient. In later chapters, languages that may make it impossible to determine F' will be studied. We defer the details of such problems until then.

∇ Lemma 5.7.
Let Σ = {a, b}. 𝒩Σ is not closed under quotient.

Proof. Consider the set L of all strings that have a different number of a's than b's. This language is in 𝒩Σ, but L/L = Σ* (why?). Δ

From the exercises it will become clear that 𝒩Σ is not closed under most of the usual (or unusual!) operators. Note that 𝒟Σ is by contrast a very special set, in that it appears to be closed under every reasonable unary and binary operation that we might consider. The question of closure will again arise as more complex classes of machines and languages are presented in later chapters.

EXERCISES

5.1. Let Σ be an alphabet. Define FΣ to be the collection of all finite languages over Σ. Prove or give counterexamples to the following:
(a) FΣ is closed under complementation.
(b) FΣ is closed under union.
(c) FΣ is closed under intersection.
(d) FΣ is closed under concatenation.
(e) FΣ is closed under Kleene closure.
(f) FΣ is closed under relative complement.

5.2. Let Σ be an alphabet. Define CΣ to be the collection of all cofinite languages over Σ (a language is cofinite if it is the complement of a finite language). Prove or give counterexamples to the following:
(a) CΣ is closed under complementation.
(b) CΣ is closed under union.
(c) CΣ is closed under intersection.
(d) CΣ is closed under concatenation.
(e) CΣ is closed under Kleene closure.
(f) CΣ is closed under relative complement.

5.3. Let Σ be an alphabet. Define BΣ = FΣ ∪ CΣ (see Exercises 5.1 and 5.2). Prove or give counterexamples to the following:
(a) BΣ is closed under complementation.
(b) BΣ is closed under union.
(c) BΣ is closed under intersection.
(d) BΣ is closed under concatenation.
(e) BΣ is closed under Kleene closure.
(f) BΣ is closed under relative complement.

Chap. 5 Exercises 171

5.4. Let Σ be an alphabet. Define IΣ to be the collection of all infinite languages over Σ. Note that IΣ = ℘(Σ*) − FΣ (see Exercise 5.1). Prove or give counterexamples to the following:
(a) IΣ is closed under complementation.
(b) IΣ is closed under union.
(c) IΣ is closed under intersection.
(d) IΣ is closed under concatenation.
(e) IΣ is closed under Kleene closure.
(f) IΣ is closed under relative complement.

5.5. Let Σ be an alphabet. Define JΣ to be the collection of all languages over Σ that have infinite complements. Note that JΣ = ℘(Σ*) − CΣ (see Exercise 5.2). Prove or give counterexamples to the following:
(a) JΣ is closed under complementation.
(b) JΣ is closed under union.
(c) JΣ is closed under intersection.
(d) JΣ is closed under concatenation.
(e) JΣ is closed under Kleene closure.
(f) JΣ is closed under relative complement.

5.6. Define E to be the collection of all languages over {a, b} that contain the word abba. Prove or give counterexamples to the following:
(a) E is closed under complementation.
(b) E is closed under union.
(c) E is closed under intersection.
(d) E is closed under concatenation.
(e) E is closed under Kleene closure.
(f) E is closed under relative complement.

5.7. If a collection of languages is closed under intersection, does it have to be closed under union? Prove or give a counterexample.
5.8. If a collection of languages is closed under intersection and complement, does it have to be closed under union? Prove or give a counterexample.
5.9. Show that if a collection of languages is closed under concatenation, it is not necessarily closed under Kleene closure.
5.10. Show that if a collection of languages is closed under Kleene closure, it is not necessarily closed under concatenation.
5.11. Show that if a collection of languages is closed under complementation, it is not necessarily closed under relative complement.
5.12. Give a finite set of numbers that is closed under V.
5.13. Give an infinite set of numbers that is closed under V.
5.14. Given deterministic machines A1 and A2, use the definition of A∪ and Definition 4.5 to describe an algorithm for building a deterministic automaton that will accept L(A1) ∪ L(A2).
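One concrete way to approach Exercise 5.14 is the familiar product construction; the Python sketch below is our own illustration of one possible answer (not necessarily the construction Definition 4.5 intends), running two DFAs in parallel and accepting when either component accepts.

```python
from itertools import product

def union_dfa(A1, A2):
    """Product construction: run A1 and A2 in parallel; accept when
    either component is in a final state."""
    S1, sigma, d1, s1, F1 = A1
    S2, _, d2, s2, F2 = A2
    states = set(product(S1, S2))
    delta = {((p, q), a): (d1[(p, a)], d2[(q, a)])
             for (p, q) in states for a in sigma}
    finals = {(p, q) for (p, q) in states if p in F1 or q in F2}
    return (states, sigma, delta, (s1, s2), finals)

def accepts(dfa, w):
    states, sigma, delta, start, finals = dfa
    s = start
    for c in w:
        s = delta[(s, c)]
    return s in finals

# A1 accepts words over {a, b} with an even number of a's;
# A2 accepts words whose length is divisible by 3.
A1 = ({0, 1}, {"a", "b"},
      {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}, 0, {0})
A2 = ({0, 1, 2}, {"a", "b"},
      {(s, c): (s + 1) % 3 for s in (0, 1, 2) for c in "ab"}, 0, {0})
U = union_dfa(A1, A2)
assert accepts(U, "aa")       # even number of a's
assert accepts(U, "bbb")      # length divisible by 3
assert not accepts(U, "ab")   # neither condition holds
```

Note that the product machine stays deterministic and has n·m states, which is relevant to the state-count comparison asked for in Exercise 5.15(c).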
5.15. Given deterministic machines A1 and A2, and without relying on the construction used in Theorem 5.2:
(a) Build a deterministic automaton A∪ that will accept L(A1) ∪ L(A2).
(b) Prove that your construction behaves as advertised.
(c) If no minimization is performed in Exercise 5.14, how do the numbers of states in the automata produced here and in Exercise 5.14 compare? (Assume A1 has n states and A2 has m states, and give expressions based on these variables.)

5.16. Let Σ be an alphabet. Define the (unary) operator P by
P(L) = {x | ∃y ∈ Σ* ∋ xy ∈ L}  (for any collection of words L)
P(L) then represents all the prefixes of words in L. For example, if K = {a, bbc, dd}, then P(K) = {λ, a, b, bb, bbc, d, dd}. Prove that 𝒟Σ is closed under the operator P.

5.17. Let Σ be an alphabet. Define the (unary) operator S by
S(L) = {x | ∃y ∈ Σ* ∋ yx ∈ L}  (for any collection of words L)
S(L) then represents all the suffixes of words in L. For example, if K = {a, bbc, dd}, then S(K) = {λ, a, c, bc, bbc, d, dd}.
(a) Given an automaton accepting L, describe how to modify it to produce an automaton accepting S(L).
(b) Prove that your construction behaves as advertised.
(c) Argue that 𝒟Σ is closed under the operator S.

5.18. Let Σ be an alphabet. Define the (unary) operator C by
C(L) = {x | ∃y, z ∈ Σ* ∋ yxz ∈ L}  (for any collection of words L)
C(L) then represents all the centers of words in L. For example, if K = {a, bbc, dd}, then C(K) = {λ, a, c, bc, bbc, b, bb, d, dd}.
(a) Given an automaton accepting L, describe how to modify it to produce an automaton accepting C(L).
(b) Prove that your construction behaves as advertised.
(c) Argue that 𝒟Σ is closed under the operator C.

5.19. Let Σ be an alphabet. Define the (unary) operator F by
F(L) = {x | x ∈ L ∧ (if ∃y ∈ Σ* ∋ xy ∈ L, then y = λ)}
F(L) then represents all the words in L that are not the beginnings of other words in L. For example, if K = {ad, ab, abbad}, then F(K) = {ad, abbad}.
Prove that 𝒟Σ is closed under the operator F.

5.20. Let Σ be an alphabet, and for x = a1a2···an−1an ∈ Σ*, define xʳ = anan−1···a2a1. For a language L over Σ, define Lʳ = {xʳ | x ∈ L}. Note that the (unary) reversal operator r is thus defined by Lʳ = {anan−1···a3a2a1 | a1a2a3···an−1an ∈ L}, and Lʳ therefore represents all the words in L written backward. For example, if K = {λ, ad, bbc, bbad}, then Kʳ = {λ, da, cbb, dabb}.
(a) Given an automaton accepting L, describe how to modify it to produce an automaton accepting Lʳ.
(b) Prove that your construction behaves as advertised.
(c) Argue that 𝒟Σ is closed under the operator r.

5.21. Let Σ = {a, b, c, d}. Define the (unary) operator G by
G(L) = {wʳw | w ∈ L}
(see the definition of wʳ in Exercise 5.20). As an example, if K = {λ, ad, bbc, bbad}, then G(K) = {λ, daad, cbbbbc, dabbbbad}.
(a) Prove that 𝒟Σ is not closed under the operator G.
(b) Prove that 𝒩Σ is closed under the operator G.

5.22. Prove that 𝒩Σ is closed under the operator r (see Exercise 5.20).
5.23. Prove that 𝒩Σ is not closed under the operator P (see Exercise 5.16).
5.24. Prove that 𝒩Σ is not closed under the operator S (see Exercise 5.17).
5.25. Prove that 𝒩Σ is not closed under the operator C (see Exercise 5.18).
5.26. Prove that 𝒩Σ is not closed under the operator F (see Exercise 5.19).
5.27. Consider the following alternate "proof" of Theorem 5.1: Let A be an NDFA and define Ā as suggested in Theorem 5.1. Give an example to show that L(Ā) might not be equal to the complement of L(A).
5.28. Complete the proof of Lemma 5.7.
5.29. Give an example of a collection of languages that is closed under union, concatenation, and Kleene closure, but is not closed under intersection.
5.30. If a collection of languages is closed under union, does it have to be closed under intersection? Prove or give a counterexample.
5.31. Refer to the construction in Theorem 5.4 and prove that L(A') = L1·L2. Warning! This involves a lot of tedious details.
5.32.
Refer to the construction in Theorem 5.5 and prove that L(A*) = L*. Warning! This involves a lot of tedious details.
5.33. Amplify the explanations for each of the equivalences in the proof of Theorem 5.2.
5.34. Given a DFA A = ⟨Σ, S, s0, δ, F⟩, define an NDFA that will accept L(A)⁺.
5.35. Given an NDFA A = ⟨Σ, S, s0, δ, F⟩, define an NDFA that will accept L(A)*.
5.36. If L is FAD, is it necessarily true that all subsets of L are FAD? Prove or give a counterexample.
5.37. If L ∈ 𝒟Σ, is it necessarily true that all supersets of L are in 𝒟Σ? Prove or give a counterexample.
5.38. If L ∈ 𝒩Σ, is it necessarily true that all subsets of L are in 𝒩Σ? Prove or give a counterexample.
5.39. If L ∈ 𝒩Σ, is it necessarily true that all supersets of L are in 𝒩Σ? Prove or give a counterexample.
5.40. Explain the purpose of the new start state q0 in the proof of Theorem 5.5.
5.41. Redesign the construction in the proof of Theorem 5.4, making use of λ-transitions where appropriate.
5.42. Redesign the construction in the proof of Theorem 5.5, making use of λ-transitions where appropriate. Do this in such a way as to make the "extra" start state q0 unnecessary.
5.43. Redesign the construction in the proof of Theorem 5.4, assuming that A1 and A2 are NDFAs.
5.44. Redesign the construction in the proof of Theorem 5.5, assuming that A is a DFA.
5.45. How does the right congruence generated by a language L compare to the right congruence generated by the complement of L? Hint: It may be helpful to consider the construction of Ā given in Theorem 5.1 when A is a minimal machine accepting L.
5.46. (a) Give examples of languages L1 and L2 for which R_{L1 ∩ L2} = R_{L1} ∩ R_{L2}.
(b) Give examples of languages L1 and L2 for which R_{L1 ∩ L2} ≠ R_{L1} ∩ R_{L2}. Hint: It may be helpful to consider the construction of A∩ given in Lemma 5.1 to direct your thinking.
5.47.
Consider the following assertion: 𝒟Σ is closed under relative complement; that is, if L1 and L2 are FAD, then L1 − L2 is also FAD.
(a) Prove this by appealing to existing theorems.
(b) Define an appropriate "new" machine.
(c) Prove that the machine constructed in part (b) behaves as advertised.
5.48. Define ℒΣ to be the set of all languages recognized by NDFAs with λ-transitions. What sort of closure properties does ℒΣ have? How does ℒΣ compare to 𝒟Σ?
5.49. (a) Give an example of a language L for which λ ∈ L⁺.
(b) Give three examples of languages L for which L⁺ = L.
5.50. Recall that δ∪: (S1 ∪ S2) × Σ → ℘(S1 ∪ S2) was defined by
δ∪(s, a) = δ1(s, a) if s ∈ S1, and δ∪(s, a) = δ2(s, a) if s ∈ S2  (∀s ∈ S1 ∪ S2, ∀a ∈ Σ)
(a) Prove (by induction) that δ̄∪ conforms to a similar formula:
δ̄∪(s, x) = δ̄1(s, x) if s ∈ S1, and δ̄∪(s, x) = δ̄2(s, x) if s ∈ S2  (∀x ∈ Σ*)
(b) Was this fact used in the proof of Theorem 5.2?
5.51. Let Σ be an alphabet. Prove or give counterexamples to the following:
(a) 𝒩Σ is closed under relative complement.
(b) 𝒩Σ is closed under union.
(c) 𝒩Σ is closed under concatenation.
(d) 𝒩Σ is closed under Kleene closure.
(g) If L ∈ 𝒩Σ, then L⁺ ∈ 𝒩Σ.
5.52. Why was it necessary to require that S1 ∩ S2 = ∅ in the proof of Theorem 5.4? Would any step of the proof be invalid without this assumption? Explain.
5.53. Let Σ be an alphabet. Define E(L) = {z | (∃y ∈ Σ⁺)(∃x ∈ L)(z = yx)}.
(a) Given an automaton accepting L, describe how to modify it to produce an automaton accepting E(L).
(b) Prove that your construction behaves as advertised.
(c) Argue that 𝒟Σ is closed under the operator E.
5.54. Let Σ be an alphabet. Define B(L) = {z | (∃x ∈ L)(∃y ∈ Σ*)(z = xy)}.
(a) Given an automaton accepting L, describe how to modify it to produce an automaton accepting B(L).
(b) Prove that your construction behaves as advertised.
(c) Argue that 𝒟Σ is closed under the operator B.
5.55. Let Σ be an alphabet. Define M(L) = {z | (∃x ∈ L)(∃y ∈ Σ⁺)(z = xy)}.
(a) Given an automaton accepting L, describe how to modify it to produce an automaton accepting M(L).
(b) Prove that your construction behaves as advertised.
(c) Argue that 𝒟Σ is closed under the operator M.
5.56. Refer to the definitions given in Lemma 5.1 and use induction to show that
(∀s ∈ S1)(∀t ∈ S2)(∀x ∈ Σ*)(δ̄∩((s, t), x) = (δ̄1(s, x), δ̄2(t, x)))
5.57. Refer to Lemma 5.1 and prove that L(A∩) = L1 ∩ L2. As long as the reference is explicitly stated, the result in Exercise 5.56 can be used without proof.
5.58. Prove Theorem 5.6.
5.59. Prove Theorem 5.7.
5.60. (a) Cleverly define a machine modification that does not use any λ-moves that could be used to prove Theorem 5.7 (your new machine is still likely to be nondeterministic, however).
(b) Prove that your modified machine behaves as advertised.
5.61. Let W(L) = {x | x is formed by deleting one or more letters from a word in L}.
(a) Given an automaton accepting L, describe how to modify it to produce an automaton accepting W(L).
(b) Prove that your construction behaves as advertised.
(c) Argue that 𝒟Σ is closed under the operator W.
5.62. Let V(L) = {x | x is formed by deleting the odd-positioned letters from a word in L}. [Note: This refers to the first, third, fifth, and so on, letters in a word. For example, if abcdef ∈ L, then bdf ∈ V(L).]
(a) Given an automaton accepting L, describe how to modify it to produce an automaton accepting V(L).
(b) Prove that your construction behaves as advertised.
(c) Argue that 𝒟Σ is closed under the operator V.
5.63. Let U(L) = {x | x is formed by deleting the even-positioned letters from a word in L}. [Note: This refers to the second, fourth, sixth, and so on, letters in a word. For example, if abcdefg ∈ L, then aceg ∈ U(L).]
(a) Given an automaton accepting L, describe how to modify it to produce an automaton accepting U(L).
(b) Prove that your construction behaves as advertised.
(c) Argue that 𝒟Σ is closed under the operator U.
5.64. Let T(L) = {x | x is formed by deleting every third, sixth, ninth, and so on, letter from a word in L}.
[Note: This refers to those letters in a word whose index position is congruent to 0 mod 3. For example, if abcdefg ∈ L, then abdeg ∈ T(L).]
(a) Given an automaton accepting L, describe how to modify it to produce an automaton accepting T(L).
(b) Prove that your construction behaves as advertised.
(c) Argue that 𝒟Σ is closed under the operator T.
5.65. Let P = {x | |x| is prime} and let I(L) be defined by I(L) = L ∩ P.
(a) Show that 𝒟Σ is not closed under I.
(b) Show that FΣ is closed under I (see Exercise 5.1).
(c) Prove or disprove: CΣ is closed under I (see Exercise 5.2).
(d) Prove or disprove: BΣ is closed under I (see Exercise 5.3).
(e) Prove or disprove: IΣ is closed under I (see Exercise 5.4).
(f) Prove or disprove: JΣ is closed under I (see Exercise 5.5).
(g) Prove or disprove: E is closed under I (see Exercise 5.6).
(h) Prove or disprove: 𝒩Σ is closed under I.
5.66. Define C to be the collection of all languages over {a, b} that do not contain λ. Prove or give counterexamples to the following:
(a) C is closed under complementation.
(b) C is closed under union.
(c) C is closed under intersection.
(d) C is closed under concatenation.
(e) C is closed under Kleene closure.
(f) C is closed under relative complement.
(g) If L ∈ C, then L⁺ ∈ C.
5.67. (a) Consider the statement that 𝒟Σ is closed under finite union:
(i) Prove it by existing theorems and induction.
(ii) Prove it by construction.
(b) Prove or disprove that 𝒟Σ is closed under infinite union. Justify your assertions.
5.68. Let Σ = {a, b}.
(a) Give examples of three homomorphisms under which 𝒩Σ is not closed.
(b) Give examples of three homomorphisms under which 𝒩Σ is closed.
5.69. Let Σ = {a}. Can you find two different homomorphisms under which 𝒩Σ is not closed? Justify your conclusions.
5.70. Refer to the construction given in Theorem 5.10.
(a) Prove δ̄ψ(s, x) = δ̄(s, ψ(x)) ∀s ∈ S, ∀x ∈ Σ*.
(b) Complete the proof of Theorem 5.10.
5.71.
Consider the homomorphism ξ given in Lemma 5.4 and the set L of all strings that have the same number of a's as b's.
(a) 𝒟Σ is closed under inverse homomorphism, but ξ(L) is the set of all even-length strings of a's, and it appears that under ξ⁻¹ the FAD language ξ(L) maps to the non-FAD language L. Explain the apparent contradiction. Hint: First compute ξ⁻¹(ξ(L)).
(b) Give an example of a homomorphism for which ψ(ψ⁻¹(L)) ≠ L.
(c) Give an example of a homomorphism for which ψ⁻¹(ψ(L)) ≠ L.
(d) Prove ψ(ψ⁻¹(L)) ⊆ L.
(e) Prove L ⊆ ψ⁻¹(ψ(L)).
5.72. Let Σ be an alphabet. Define the (unary) operator e by
Lᵉ = {x | ∃y ∈ Σ* ∋ (yx ∈ L ∧ |x| = |y|)}
Lᵉ then represents the last halves of all the words in L. For example, if K = {ad, abaa, ccccc}, then Kᵉ = {d, aa}. Prove that 𝒟Σ is closed under the operator e.
5.73. Refer to the proof of Theorem 5.11 and show that there exists an automaton A for which it would be incorrect to try to accept Lᵇ by redefining the set of final states to be the set of "midway" states.
5.74. Consider the sets M and K in Example 5.20. Assume that we have used the pumping lemma to show that M is not FAD. What would be wrong with arguing that, since M is not FAD, its homomorphic image cannot be FAD either, and hence K is therefore not FAD?
5.75. Prove Theorem 5.9.
5.76. Let Σ be the ASCII alphabet. Define a homomorphism that will capitalize all lowercase letters (and does not change punctuation, spelling, and the like).
5.77. Consider the proof of Theorem 5.12.
(a) Show that for A' defined by A' = ⟨Σ, S1, s01, δ1, F'⟩, where F' = {t | ∃y ∈ Σ* ∋ (y ∈ L2 ∧ δ̄1(t, y) ∈ F1)}, L(A') = L1/L2.
(b) Given deterministic finite automata A1 = ⟨Σ, S1, s01, δ1, F1⟩ such that L(A1) = L1 and A2 = ⟨Σ, S2, s02, δ2, F2⟩ for which L(A2) = L2, give an algorithm for computing F' = {t | ∃y ∈ Σ* ∋ (y ∈ L2 ∧ δ̄1(t, y) ∈ F1)}.
5.78.
Given two alphabets Σ1 and Σ2 and a DFA A = ⟨Σ1, S, s0, δ, F⟩:
(a) Define a new automaton A' = ⟨Σ1 ∪ Σ2, S', s0, δ', F'⟩ for which L(A') = L(A).
(b) Prove that A' behaves as advertised.
5.79. Let S be a collection of languages that is closed under union, concatenation, and Kleene closure. Prove or disprove: If S contains an infinite number of languages, every language in S must be FAD.
5.80. Let S be a collection of languages that is closed under union, concatenation, and Kleene closure. Prove or disprove: If S is a finite collection, every language in S must be FAD.
5.81. Let u be a unary language operator that, when composed with itself, yields the identity function. Prove that 𝒩Σ must be closed under u.

CHAPTER 6

REGULAR EXPRESSIONS

In this chapter we will develop a standard notation for denoting FAD languages and thus explore yet another characterization of these languages. The specification of a language by an automaton unfortunately does not provide a convenient summary of those strings that are accepted; it is straightforward to check whether any particular word belongs to the language, but it is often difficult to get an overall sense of the set of accepted words. Were the language finite, the individual words could simply be listed explicitly. The delineation of an infinite set in this manner is clearly impossible. Up to this point, we have relied on English descriptions of the languages under consideration. Natural languages are unfortunately imprecise, and even small machines can have impossibly complex descriptions. The concept of regular expressions provides a clear and concise vehicle for denoting many of the languages we have studied in the previous chapters.

6.1 ALGEBRA OF REGULAR EXPRESSIONS

The definition of set union and the concepts of language concatenation (Definition 5.4) and Kleene closure (Definition 5.5) afford a convenient and powerful method for building new languages from existing ones.
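These three operations are easy to experiment with directly. The following Python sketch is our own illustration (the helper names are ours, and Kleene closure is necessarily truncated to a length bound, since the closure itself is an infinite set):

```python
def concat(L1, L2, max_len=5):
    """Language concatenation, truncated to words of length <= max_len."""
    return {x + y for x in L1 for y in L2 if len(x) + len(y) <= max_len}

def star(L, max_len=5):
    """Kleene closure, truncated likewise; always contains the empty word."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = concat(frontier, L, max_len) - result
        result |= frontier
    return result

# Build ({a} U {lambda}) . {b}*, keeping only words of length <= 5.
L = concat({"a", ""}, star({"b"}))
assert "abbb" in L and "bbbb" in L and "ba" not in L
```

Even this crude finite approximation makes the algebraic character of the operators apparent: sets built this way are exactly the regular sets defined next.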
The expression ({a, b}*·{c})*·{d} is an infinite set built from simple alphabets and the operators presented in Chapter 5. We will see that this type of representation is quite suitable for our purposes and is intimately related to the finite automaton definable languages.

∇ Definition 6.1. Let Σ = {a1, a2, ..., am} be an alphabet. A regular set over Σ is any set that can be formed by a sequence of applications of the following rules:
i. {a1}, {a2}, ..., {am} are regular sets.
ii. { } (the empty set of words) is a regular set.
iii. {λ} (the set containing only the empty word) is a regular set.
iv. If L1 and L2 are regular sets, then so is L1·L2.
v. If L1 and L2 are regular sets, then so is L1 ∪ L2.
vi. If L1 is a regular set, then so is L1*.

EXAMPLE 6.1

Let Σ = {a, b, c}. Each of the following languages is a regular set:
{λ}   {a}*   {b} ∪ {c}   ({a} ∪ {λ})·({b}*)   {a}*·({b} ∪ {c})   { }*   {b·λ}   {c}·{ }

The multitude of set brackets in these expressions is somewhat undesirable; we now present a common shorthand notation to represent such sets. Expressions like {a}* will simply be written as a*, and {a}·{b} will be shortened to ab. The notation we wish to use can be formally defined in the following recursive manner.

∇ Definition 6.2. Let Σ = {a1, a2, ..., am} be an alphabet. A regular expression over Σ is a sequence of symbols formed by repeated application of the following rules:
i. a1, a2, ..., am are all regular expressions, representing the regular sets {a1}, {a2}, ..., {am}, respectively.
ii. ∅ is a regular expression representing { }.
iii. ε is a regular expression representing {λ}.
iv. If R1 and R2 are regular expressions corresponding to the sets L1 and L2, then (R1·R2) is a regular expression representing the set L1·L2.
v. If R1 and R2 are regular expressions corresponding to the sets L1 and L2, then (R1 ∪ R2) is a regular expression representing the set L1 ∪ L2.
vi.
If R1 is a regular expression corresponding to the set L1, then (R1)* is a regular expression representing the set L1*.

EXAMPLE 6.2

Let Σ = {a, b, c}. The regular sets in Example 6.1 can be represented by the following regular expressions:
ε   (a)*   (b ∪ c)   ((a ∪ ε)·(b)*)   (a*·(b ∪ c))   (∅)*   (b·ε)   (c·∅)

Note that each expression consists of the "basic building blocks" given by 6.2i through 6.2iii, connected by the operators ∪, ·, and * according to rules 6.2iv through 6.2vi. Each expression is intended to denote a particular language over Σ. Such representations of languages are by no means unique. For example, (a·(b ∪ c)) and ((a·b) ∪ (a·c)) both represent the same set, {ab, ac}. Similarly, (b·ε) and b both represent {b}. The intention of the parentheses is to prevent ambiguity; a·b ∪ c could mean (a·(b ∪ c)) or ((a·b) ∪ c), and the difference is important: the first expression represents {ab, ac}, while the second represents {ab, c}, which are obviously different languages. To ease the burden of all these parentheses, we will adopt the following simplifying conventions.

Notational Convention: The precedence of the operators, from highest to lowest, shall be *, ·, ∪. When writing a regular expression, parentheses that conform to this hierarchy may be omitted. In particular, the outermost set of parentheses can always be omitted. Juxtaposition may be used in place of the concatenation symbol (·).

EXAMPLE 6.3
Concatenation and union behave much like the algebraic operators multiplication and addition, respectively. Indeed, some texts use + instead of U for union; the symbol for concatenation already agrees with that for multiplication ( .), and we will likewise allow the symbol to be omitted in favor of juxtaposition. The constants 0 and e behave much like the numbers 0 and 1 do in algebra. The common identities x + 0 = x, x*1 = x and x *0 = 0 have parallels in language theory (see Lemma 6.1). Indeed, 0 is the identity for union and e is the identity for concatenation. Thus far we have been very careful to distinguish between the name of an object and the object itself. In algebra, we are used to saying that the symbol 4 equals the string of symbols (that is, the word) 20 -;5; we really mean that both names refer to the same object, the concept we generally call the number/our. (You should be able to think of many more strings that are commonly used as a name for this number, for example, 1111, IV, and 1002, We will be equally inexact here, writing a*(b U c) = (a-b) U (a-c), This will be taken to mean that the sets represented by the two expressions are equal (as is the case here; both equal {ab, ac}) and will not be construed to mean that the two expressions themselves are identical (which is clearly not the case here; the right-hand side has more as, more parentheses, and more concatenation symbols). Sec. 6.1 Algebra of Regular Expressions 181 'il Definition 6.3. Let R be a regular expression. The language represented by R is formally denoted by L(R). Two regular expressions R I and R, will be said to be equivalent if the sets represented by the two expressions are equal, and we will write RI=Rz. Ll Thus, R I and R, are equivalent if L(RI ) = L(Rz), but this is commonly abbreviated R I = Rz. The word "equivalent" has been seen in three different contexts so far: there are equivalent DFAs, equivalent NDFAs, and now equivalent regular expressions. 
In each case, the intent has been to equate constructs that are associated with the same language.

Now that the idea of equality (equivalence) has been established, some general identities can be outlined. The properties given in Lemma 6.1 follow directly from the definitions of the operators.

∇ Lemma 6.1. Let Σ be an alphabet, and let R1, R2, and R3 be regular expressions. Then:

(a) R1 ∪ ∅ = R1
(b) R1·ε = R1 = ε·R1
(c) R1·∅ = ∅ = ∅·R1
(d) R1 ∪ R2 = R2 ∪ R1
(e) R1 ∪ R1 = R1
(f) R1 ∪ (R2 ∪ R3) = (R1 ∪ R2) ∪ R3
(g) R1·(R2·R3) = (R1·R2)·R3
(h) R1·(R2 ∪ R3) = (R1·R2) ∪ (R1·R3)
(i) ε* = ε
(j) ∅* = ε
(k) (R1 ∪ R2)* = (R1* ∪ R2*)*
(l) (R1 ∪ R2)* = (R1*·R2*)*
(m) (R1*)* = R1*
(n) (R1*)·(R1*) = R1*

Furthermore, there are examples of sets for which:

(b') R1 ∪ ε ≠ R1
(d') R1·R2 ≠ R2·R1
(e') R1·R1 ≠ R1
(h') R1 ∪ (R2·R3) ≠ (R1 ∪ R2)·(R1 ∪ R3)
(k') (R1·R2)* ≠ (R1*·R2*)*
(l') (R1·R2)* ≠ (R1* ∪ R2*)*

Proof. Property (h) will be proved here. The remainder are left as exercises.

w ∈ R1·(R2 ∪ R3)
⇔ (by definition of ·) (∃x, y)(y ∈ (R2 ∪ R3) ∧ x ∈ R1 ∧ w = x·y)
⇔ (by definition of ∪) (∃x, y)((y ∈ R2 ∨ y ∈ R3) ∧ (x ∈ R1 ∧ w = x·y))
⇔ (by the distributive law) (∃x, y)(((y ∈ R2) ∧ (x ∈ R1 ∧ w = x·y)) ∨ ((y ∈ R3) ∧ (x ∈ R1 ∧ w = x·y)))
⇔ (by definition of ·) (w = x·y ∈ R1·R2) ∨ (w = x·y ∈ R1·R3)
⇔ (by definition of ∪) w ∈ (R1·R2) ∪ (R1·R3) Δ

Note that identity (c) in Lemma 6.1 implies that {a, b}·∅ = ∅, which follows immediately from the definition of concatenation. If w ∈ {a, b}·∅, then w would have to be of the form x·y, where x ∈ {a, b} and y ∈ ∅; there are clearly no valid choices for y, so {a, b}·∅ is empty.

6.2 REGULAR SETS AS FAD LANGUAGES

Armed with the constructs and properties discussed in the first section, we will now consider what types of languages can actually be defined by regular expressions. How general is this method of expressing sets of words? Can the FAD languages be represented by regular expressions? (Yes).
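Before moving on, identities such as (h) and counterexamples such as (d') can be spot-checked on finite languages, where ∪ is ordinary set union and concatenation pairs every string of the first set with every string of the second. A small sketch (the sample sets are arbitrary choices of ours):

```python
def cat(L1, L2):
    """Concatenation of two finite languages."""
    return {x + y for x in L1 for y in L2}

R1, R2, R3 = {"a", "bb"}, {"c"}, {"", "d"}

# identity (h): R1·(R2 ∪ R3) = (R1·R2) ∪ (R1·R3)
assert cat(R1, R2 | R3) == cat(R1, R2) | cat(R1, R3)

# counterexample (d'): concatenation is not commutative in general
assert cat(R1, R2) != cat(R2, R1)    # {ac, bbc} versus {ca, cbb}
```

Of course, no amount of testing on samples proves an identity; the point is only that the algebraic laws are concrete statements about sets of strings.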
Can all programming languages be represented by regular expressions? (No). Are regular sets always finite automaton definable languages? (Yes). We begin by addressing this last question.

∇ Definition 6.4. Let Σ be an alphabet. ℛΣ is defined to be the set of all regular sets over Σ. Δ

The first question to be considered is, Can every regular set be recognized by a DFA? That is, is ℛΣ ⊆ 𝒟Σ? It is clear that the "basic building blocks" are recognizable. Figure 6.1 shows three NDFAs that accept ∅, {λ}, and {c}, respectively. Recalling the constructions outlined in Chapter 5, it is easy to see how to combine these "basic machines" into machines that will accept expressions involving the operators ∪, ·, and *.

Figure 6.1 NDFAs which recognize regular expressions with zero operators

EXAMPLE 6.4

An NDFA that accepts a ∪ b (as suggested by the proof of Theorem 5.2) is shown in Figure 6.2. Note that it is composed of the basic building blocks for the letters a and b, as suggested by the constructions in Figure 6.1.

Figure 6.2 The NDFA discussed in Example 6.4

EXAMPLE 6.5

An NDFA that accepts (a ∪ b)* is shown in Figure 6.3. The automaton given in Figure 6.2 for (a ∪ b) is modified as suggested by the proof of Theorem 5.5 to produce the Kleene closure of (a ∪ b). Recall that the "extra" state q0 was added to ensure that λ is accepted by the new machine.

Figure 6.3 The NDFA discussed in Example 6.5

EXAMPLE 6.6

An NDFA that accepts c*·(a ∪ b)* (as suggested by the proof of Theorem 5.4) is shown in Figure 6.4.

Figure 6.4 The NDFA discussed in Example 6.6

Note that in this last example q0, t0, and s0 are disconnected states, and r1, s1, and t1 could be coalesced into a single state. The resulting machines are not advertised to be efficient; the main point is that they can be built. The techniques illustrated above are used to prove the following lemma.

∇ Lemma 6.2.
Let Σ be an alphabet and let R be a regular set over Σ. Then there is a DFA that accepts R.

Proof. The proof is by induction on the number of operators in the regular expression describing R (see the exercises). Note that Figure 6.1 effectively illustrates the basis step: those regular expressions with zero operators (∅, ε, a1, a2, ..., am) do indeed correspond to FAD languages. This covers sets generated by rules i, ii, and iii of Definition 6.2. For sets corresponding to regular expressions with a positive number of operators, the outermost operator can be identified, and it will be either ·, ∪, or *, corresponding to an application of rule iv, v, or vi. The induction assumption will guarantee that the subexpressions used by the outermost operator have corresponding DFAs. Theorems 5.2, 5.4, and 5.5 can then be invoked to argue that the entire expression has a corresponding DFA. Δ

∇ Corollary 6.1. Let Σ be an alphabet. Then ℛΣ ⊆ 𝒟Σ.

Proof. The proof follows immediately from Lemma 6.2. Δ

Since we are assured that every regular set can be accepted by a finite automaton, the collection of regular sets is clearly contained in the set of FAD languages. This also means that those languages that cannot be represented by a DFA (that is, those contained in 𝒩Σ) have no chance of being represented by a regular expression.

6.3 LANGUAGE EQUATIONS

The next question we will address is whether 𝒟Σ ⊆ ℛΣ; that is, whether every FAD language can be represented by a regular expression. The reader is invited to take a sample DFA and try to express the language it accepts by a regular expression. You will probably be able to do it, but only by guesswork and trial and error. Our first question appears to have a much more methodical solution: given a regular expression, it was a relatively straightforward task to draw an NDFA (and then a DFA); in fact, we have a set of algorithms for doing just that, and we could program a computer to do the task for us.
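The structural induction behind Lemma 6.2 can be mirrored computationally. The sketch below (the class names and the length bound n are ours, not the text's) follows rules i through vi of Definition 6.2: each constructor case handles one operator, just as the inductive step peels off the outermost operator of the expression.

```python
from dataclasses import dataclass

@dataclass
class Lit:        # rules i-iii: ∅ (None), ε (""), or a single letter
    s: object

@dataclass
class Union:      # rule iv
    l: object
    r: object

@dataclass
class Cat:        # rule v
    l: object
    r: object

@dataclass
class Star:       # rule vi
    e: object

def lang(e, n):
    """All strings of length <= n denoted by e, by induction on operators."""
    if isinstance(e, Lit):
        return set() if e.s is None else {e.s}
    if isinstance(e, Union):
        return lang(e.l, n) | lang(e.r, n)
    if isinstance(e, Cat):
        return {x + y for x in lang(e.l, n) for y in lang(e.r, n) if len(x + y) <= n}
    if isinstance(e, Star):
        base, out, frontier = lang(e.e, n), {""}, {""}
        while frontier:   # keep appending base strings until nothing new fits
            frontier = {x + y for x in frontier for y in base if len(x + y) <= n} - out
            out |= frontier
        return out

# (a ∪ b)* restricted to length 2
print(sorted(lang(Star(Union(Lit("a"), Lit("b"))), 2)))
# → ['', 'a', 'aa', 'ab', 'b', 'ba', 'bb']
```

This computes only a finite approximation of the denoted language, but the recursion has exactly the shape of the induction in the proof: the meaning of a compound expression is assembled from the meanings of its subexpressions.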
This second question does not seem to have an obvious algorithm connected with it, and we will have to attack the problem using a new concept: language equations.

In algebra, we are used to algebraic equations such as 3x + 7 = 19. Recall that a solution to this equation is a numerical value for x that will make the equation true, that is, make both sides equal. In the above example, there is only one choice for x, the unique solution 4. Equations can have two different solutions, like x² = 9, no solutions, like x² = −9, or an infinite number of solutions, like 2(x + 3) = x + 6 + x. In a similar way, set equations can be solved, such as {a, b, c} = {a, b} ∪ X. Here X represents a set, and we are again looking for a value for X that will make the equation true; an obvious choice is X = {c}, but there are other choices, like X = {b, c} (since {a, b, c} = {a, b} ∪ {b, c}). Such equations may likewise have no solutions, like X ∪ {b} = {a, c}, or an infinite number of solutions, such as X ∪ {b} = X (what sorts of sets satisfy this last equation?). We wish to look at set equations where the sets are actually sets of strings, that is, language equations.

The type of equation in which we are most interested has one and only one solution, as outlined in the next theorem. It is very similar in form and spirit to the theorem in algebra that says "For any numbers a and b, where a ≠ 0, the equation ax = b has a unique solution given by x = b ÷ a."

∇ Theorem 6.1. Let Σ be an alphabet. Let E and A be any subsets of Σ*. Then the language equation X = E ∪ A·X admits the solution X = A*·E. Any other solution Y must contain A*·E. Furthermore, if λ ∉ A, X = A*·E is the unique solution.

Proof. First note that the set A*·E is indeed a solution to this equation, since A*·E = E ∪ A·(A*·E) (see the exercises).
Now assume that some set Y is a solution to this equation, and let us investigate some of the properties that Y must have. If Y is a solution, then

Y = E ∪ A·Y
⇒ (by definition of ∪) E ⊆ Y ∧ A·Y ⊆ Y
⇒ (if E ⊆ Y, then A·E ⊆ A·Y) A·E ⊆ A·Y ⊆ Y
⇒ (by substitution) A·A·E ⊆ A·A·Y ⊆ A·Y ⊆ Y
⇒ (by induction) (∀n ∈ ℕ)(Aⁿ·E ⊆ Y)
⇒ (by definition of A*) A*·E ⊆ Y

Thus, every solution must contain all of A*·E, and A*·E is in this sense the smallest solution. This is true regardless of whether or not λ belongs to A.

Now let us assume that λ ∉ A and that we have a solution W that is actually "bigger" than A*·E; we will show that this is a contradiction, and thus all solutions must look exactly like A*·E. If W is a solution, W ≠ A*·E, then there must be some elements in the set W − A*·E; choose a string of minimal length from among these elements and call it z. Thus z ∈ W and z ∉ A*·E, and since E ⊆ A*·E (why?), z ∉ E. Since W is a solution, we have

W = E ∪ A·W
⇒ (since z ∈ W and it cannot be in the E part) z ∈ A·W
⇒ (by definition of ·) (∃x ∈ A)(∃y ∈ W) z = x·y
⇒ (by definition of | |) |z| = |x| + |y|
⇒ (since λ ∉ A and x ∈ A, so x ≠ λ and |x| > 0) |y| < |z|

Note that y cannot belong to A*·E (if y ∈ A*·E, then, since x ∈ A, z (= x·y) ∈ A·(A*·E) ⊆ A*·E, which means that z ∈ A*·E, and we started by assuming that z ∉ A*·E); since y ∈ W, we have y ∈ W − A*·E, and we have produced a string shorter than z which belongs to W − A*·E. This is the contradiction we were looking for, and we can conclude that it is impossible for a solution W to be larger than A*·E. Since we have already shown that no solution can be smaller than A*·E, we now know that the only solution is exactly A*·E. Δ

EXAMPLE 6.7

X = {b, c} ∪ {a}·X does indeed have a solution; X can equal {a}*·{b, c}. Note also that this is the only solution (verify, for example, that X = {a}*·{c} is not a solution).
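Theorem 6.1 can also be checked empirically on finite approximations: truncating every language at some length bound n, the proposed solution A*·E still satisfies X = E ∪ A·X. A sketch, with A and E chosen arbitrarily (the helpers and the bound n are ours):

```python
def cat(L1, L2, n):
    """Concatenation of two finite languages, truncated at length n."""
    return {x + y for x in L1 for y in L2 if len(x + y) <= n}

def star(A, n):
    """Kleene closure of A, truncated at length n."""
    out, frontier = {""}, {""}
    while frontier:
        frontier = cat(frontier, A, n) - out
        out |= frontier
    return out

n = 6
A, E = {"a"}, {"b", "c"}        # λ ∉ A, so Theorem 6.1 promises a unique solution
X = cat(star(A, n), E, n)       # the proposed solution A*·E
assert X == E | cat(A, X, n)    # X = E ∪ A·X holds, up to the length bound
print(sorted(X))
```

Here X comes out as exactly the strings aⁿb and aⁿc, matching Example 6.7's solution {a}*·{b, c}.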
The equation Z = {b, c} ∪ {a, λ}·Z has several solutions; among them are Z = {a}*·{b, c} and Z = {a, b, c}*. It is instructive to explicitly list the first few elements of {a}*·{b, c} and begin to check the validity of the solution to the first equation. If Y is a solution, then the two sides of the equation Y = {b, c} ∪ {a}·Y must be equal. Since both b and c appear on the right-hand side, they must also be on the left-hand side, which clearly means that they have to be in Y. Once b is known to be in Y, it will give rise to a term on the right-hand side due to the presence of {a}·Y. Thus, a·b must also be found on the left-hand side and therefore is in Y, and so on. The resulting sequence of implications parallels the first part of the proof of Theorem 6.1.

To see intuitively why no string other than those found in {a}*·{b, c} may belong to a solution for X = {b, c} ∪ {a}·X, consider a string such as aa. If this were to belong to X, then it would appear on the left-hand side and therefore would have to appear on the right-hand side as well if the two sides were to indeed be equal. On the right-hand side are just the two components, {b, c} and {a}·X. aa is clearly not in {b, c}, so it must be in {a}·X, which does seem plausible; all that is necessary is for a to be in X, and then aa will belong to {a}·X. If a is in X, though, it must also appear on the left-hand side, and so a must be on the right-hand side as well. Again, a is not in {b, c}, so it must be in {a}·X. This can happen only if λ belongs to X so that a·λ will belong to {a}·X. This implies that λ must now show up on both sides, and this leads to a contradiction: λ cannot be on the right-hand side, since λ clearly is not in {b, c}, and it cannot belong to {a}·X either, since all these words begin with an a. This contradiction shows why aa cannot be part of any solution X.
This example illustrates the basic nature of these types of equations: for words that are not in {a}*·{b, c}, the inclusion of that word in the solution leads to the inclusion of shorter and shorter strings, which eventually leads to a contradiction. This property was exploited in the second half of the proof of Theorem 6.1. Rather than finding shorter and shorter strings, though, it was assumed we already had the shortest, and we showed that there had to be a still shorter one; this led to the desired contradiction more directly.

Our main goal will be to solve systems of language equations, since the relationships between the terminal sets of an automaton can be described by such a system. Systems of language equations are similar in form and spirit to systems of algebraic equations, such as

3x1 + x2 = 10
x1 − x2 = 2

which has the unique solution x1 = 3, x2 = 1. We will look at systems of language equations such as

X1 = ε ∪ a·X1 ∪ b·X2
X2 = ∅ ∪ b·X1 ∪ ∅·X2

which has the (unique) solution X1 = (a ∪ bb)*, X2 = b·(a ∪ bb)*. Checking that this is a solution entails verifying that both equations are satisfied if these expressions are substituted for the variables X1 and X2.

The solution of such systems parallels the solution of algebraic equations. For example, the system

3x1 + x2 = 10
x1 − x2 = 2

can be solved by treating the second statement as an equation in just the variable x2 and solving as indicated by the algebraic theorem "For any numbers a and b, where a ≠ 0, the equation ax = b has a unique solution given by x = b ÷ a." The second statement can be written as (−1)x2 = 2 − x1, which then admits the solution x2 = (2 − x1) ÷ (−1), or x2 = x1 − 2. This solution can be inserted into the first equation to eliminate x2 and form an equation solely in x1. Terms can be regrouped and the algebraic theorem can be applied to find x1.
We would have

3x1 + x2 = 10

which becomes

3x1 + (x1 − 2) = 10
or 4x1 − 2 = 10
or 4x1 = 12
or x1 = 12 ÷ 4

yielding x1 = 3. This value of x1 can be back-substituted to find the unique solution for x2: x2 = x1 − 2 = 3 − 2 = 1.

Essentially, the same technique can be applied to any two equations in two unknowns, and formulas can be developed that predict the coefficients for the reduced set of equations. Consider the generalized system of algebraic equations with unknowns x1 and x2, constant terms E1 and E2, and coefficients A11, A12, A21, and A22:

A11x1 + A12x2 = E1
A21x1 + A22x2 = E2

Recall that there are appropriate formulas for reducing this to a single equation of the form Â11x1 = Ê1, where the new coefficients Â11 and Ê1 can be calculated as

Ê1 = E1A22 − E2A12
Â11 = A11A22 − A12A21

A similar technique can be used to eliminate variables when there is a larger number of equations in the system. The following theorem makes similar predictions of the new coefficients for language equations.

∇ Theorem 6.2. Let n ≥ 2 and consider the system of equations in the unknowns X1, X2, ..., Xn given by

X1 = E1 ∪ A11·X1 ∪ A12·X2 ∪ ... ∪ A1(n−1)·Xn−1 ∪ A1n·Xn
X2 = E2 ∪ A21·X1 ∪ A22·X2 ∪ ... ∪ A2(n−1)·Xn−1 ∪ A2n·Xn
⋮
Xn−1 = En−1 ∪ A(n−1)1·X1 ∪ A(n−1)2·X2 ∪ ... ∪ A(n−1)(n−1)·Xn−1 ∪ A(n−1)n·Xn
Xn = En ∪ An1·X1 ∪ An2·X2 ∪ ... ∪ An(n−1)·Xn−1 ∪ Ann·Xn

in which (∀i, j ∈ {1, 2, ..., n})(λ ∉ Aij).

a. This system has a unique solution.

b. Define Êi = Ei ∪ (Ain·Ann*·En) for all i = 1, 2, ..., n − 1 and Âij = Aij ∪ (Ain·Ann*·Anj) for all i, j = 1, 2, ..., n − 1. The solution of the original set of equations will agree with the solution of the following set of n − 1 equations in the unknowns X1, X2, ..., Xn−1:

X1 = Ê1 ∪ Â11·X1 ∪ Â12·X2 ∪ ... ∪ Â1(n−1)·Xn−1
X2 = Ê2 ∪ Â21·X1 ∪ Â22·X2 ∪ ... ∪ Â2(n−1)·Xn−1
⋮
Xn−1 = Ên−1 ∪ Â(n−1)1·X1 ∪ Â(n−1)2·X2 ∪ ... ∪ Â(n−1)(n−1)·Xn−1

c. Once the solution to the above n − 1 equations in (b) is known, that solution can be used to find the remaining unknown:

Xn = Ann*·(En ∪ An1·X1 ∪ An2·X2 ∪ ... ∪ An(n−1)·Xn−1)

Proof.
The proof hinges on the repeated application of Theorem 6.1. The last of the n equations,

Xn = En ∪ An1·X1 ∪ An2·X2 ∪ ... ∪ An(n−1)·Xn−1 ∪ Ann·Xn

can be thought of as an equation in the one unknown Xn, with a coefficient of Ann for Xn and the remainder of the expression a "constant" term not involving Xn. The following parenthetical grouping illustrates this viewpoint:

Xn = (En ∪ An1·X1 ∪ An2·X2 ∪ ... ∪ An(n−1)·Xn−1) ∪ Ann·Xn

Note that for any subscript k, if Ank does not contain λ, neither will Ank·Xk. Theorem 6.1 can therefore be applied to the one equation in the one unknown Xn, with coefficients

E = En ∪ An1·X1 ∪ An2·X2 ∪ ... ∪ An(n−1)·Xn−1 and A = Ann

The solution, A*·E, is exactly as given by part (c) above:

Xn = Ann*·(En ∪ An1·X1 ∪ An2·X2 ∪ ... ∪ An(n−1)·Xn−1)

or

Xn = Ann*·En ∪ Ann*·An1·X1 ∪ Ann*·An2·X2 ∪ ... ∪ Ann*·An(n−1)·Xn−1

If there was a unique solution for the terms X1 through Xn−1, then Theorem 6.1 would guarantee a unique solution for Xn, too.

The solution for Xn can be substituted for Xn in each of the other n − 1 equations. If the kth equation is represented by

Xk = Ek ∪ Ak1·X1 ∪ Ak2·X2 ∪ ... ∪ Akn·Xn

then the substitution will yield

Xk = Ek ∪ Ak1·X1 ∪ Ak2·X2 ∪ ... ∪ (Akn·(Ann*·En ∪ Ann*·An1·X1 ∪ Ann*·An2·X2 ∪ ... ∪ Ann*·An(n−1)·Xn−1))

By using the distributive law, this becomes

Xk = Ek ∪ Ak1·X1 ∪ Ak2·X2 ∪ ... ∪ (Akn·Ann*·En ∪ Akn·Ann*·An1·X1 ∪ Akn·Ann*·An2·X2 ∪ ... ∪ Akn·Ann*·An(n−1)·Xn−1)

Collecting like terms yields

Xk = (Ek ∪ Akn·Ann*·En) ∪ (Ak1·X1 ∪ Akn·Ann*·An1·X1) ∪ (Ak2·X2 ∪ Akn·Ann*·An2·X2) ∪ ... ∪ (Ak(n−1)·Xn−1 ∪ Akn·Ann*·An(n−1)·Xn−1)

or

Xk = (Ek ∪ Akn·Ann*·En) ∪ (Ak1 ∪ Akn·Ann*·An1)·X1 ∪ (Ak2 ∪ Akn·Ann*·An2)·X2 ∪ ... ∪ (Ak(n−1) ∪ Akn·Ann*·An(n−1))·Xn−1

The constant term in this equation is (Ek ∪ Akn·Ann*·En), which is exactly the formula given for Êk in part (b). The coefficient for X1 is seen to be (Ak1 ∪ Akn·Ann*·An1), while the coefficient for X2 is (Ak2 ∪ Akn·Ann*·An2), and so on. The coefficient for Xj would then be Âkj = Akj ∪ (Akn·Ann*·Anj), which also agrees with the formula given in part (b).
This is why the solution of the original set of equations agrees with the solution of the set of n − 1 equations given in part (b). Part (a) is proved by induction on n: the method outlined above can be repeated on the new set of n − 1 equations to eliminate Xn−1, and so on, until one equation in the one unknown X1 is obtained. Theorem 6.1 will guarantee a unique solution for X1, and part (c) can then be used to find the unique solution for X2, and so on. Δ

EXAMPLE 6.8

Consider the system defined before, where

X1 = ε ∪ a·X1 ∪ b·X2
X2 = ∅ ∪ b·X1 ∪ ∅·X2

The proof of Theorem 6.2 implies that the solution for X1 will agree with the solution to the one-variable equation X1 = Ê1 ∪ Â11·X1, where Ê1 = E1 ∪ (A12·A22*·E2) = ε ∪ (b·∅*·∅) = ε ∪ (b·ε·∅) = ε ∪ ∅ = ε, and Â11 = A11 ∪ (A12·A22*·A21) = a ∪ (b·∅*·b) = a ∪ (b·ε·b) = a ∪ bb. Thus we have X1 = ε ∪ (a ∪ bb)·X1, which by Theorem 6.1 has the (unique) solution X1 = Â11*·Ê1 = (a ∪ bb)*·ε. Substituting this into the second equation yields X2 = ∅ ∪ b·(a ∪ bb)* ∪ ∅·X2, which by Theorem 6.1 has the (unique) solution X2 = ∅*·(b·(a ∪ bb)*) = b·(a ∪ bb)*. Note that this expression for X2 could also be found by applying the back-substitution formula given in the proof of Theorem 6.2.

We will now see that the language accepted by a DFA can be equated with the solution of a set of language equations, which will allow us to prove the following important theorem.

∇ Theorem 6.3. Let Σ be an alphabet and let L be an FAD language over Σ. Then L is a regular set over Σ.

Proof. If L is FAD, then there exists an n > 0 and a deterministic finite automaton A = <Σ, {s1, s2, ..., sn}, s1, δ, F> such that L(A) = L. For each i = 1, 2, ..., n, define Xi = {z ∈ Σ* | δ(si, z) ∈ F}; that is, Xi is the set of all strings that, when starting at state si, reach a final state in A. Each Xi then represents the terminal set T(A, si) defined in Chapter 3. Since s1 is the start state of this machine, it should be clear that X1 = L(A) = L.
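The reduction carried out in Example 6.8 is mechanical enough to perform symbolically. The sketch below manipulates regular expressions as plain strings (the helper names and the ∅/ε simplifications are ours); applied to the system of Example 6.8 it reproduces the coefficient a ∪ bb.

```python
def union(r, s):
    """Union of two expressions, simplifying R ∪ ∅ = R (Lemma 6.1a)."""
    if r == "∅": return s
    if s == "∅": return r
    return f"({r} ∪ {s})"

def cat(*parts):
    """Concatenation, simplifying R·∅ = ∅ and R·ε = R (Lemma 6.1b, c)."""
    if "∅" in parts:
        return "∅"
    kept = [p for p in parts if p != "ε"]
    return "".join(kept) or "ε"

def eliminate(E, A):
    """Remove the last unknown from X_i = E_i ∪ ⋃_j A_ij·X_j (Theorem 6.2b)."""
    n = len(E) - 1
    loop = "ε" if A[n][n] == "∅" else f"({A[n][n]})*"   # A_nn*
    for i in range(n):
        E[i] = union(E[i], cat(A[i][n], loop, E[n]))
        for j in range(n):
            A[i][j] = union(A[i][j], cat(A[i][n], loop, A[n][j]))

# Example 6.8:  X1 = ε ∪ a·X1 ∪ b·X2,  X2 = ∅ ∪ b·X1 ∪ ∅·X2
E = ["ε", "∅"]
A = [["a", "b"], ["b", "∅"]]
eliminate(E, A)
print(E[0], A[0][0])    # → ε (a ∪ bb)
```

Without the ∅ and ε simplifications the expressions would still be correct, just cluttered; the cleanup steps are exactly the identities of Lemma 6.1 applied on the fly.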
Define

Ei = ε if si ∈ F, and Ei = ∅ if si ∉ F, for i = 1, 2, ..., n

and

Aij = ⋃ {a ∈ Σ | δ(si, a) = sj}, for i, j = 1, 2, ..., n

That is, Aij represents the set of all letters that cause a transition from state si to state sj. Notice that since λ ∉ Σ, none of the sets Aij contain the empty string, and therefore by Theorem 6.2 there is a unique solution to the system:

X1 = E1 ∪ A11·X1 ∪ A12·X2 ∪ ... ∪ A1n·Xn
X2 = E2 ∪ A21·X1 ∪ A22·X2 ∪ ... ∪ A2n·Xn
⋮

However, these equations exactly describe the relationships between the terminal sets denoted by X1, X2, ..., Xn at the beginning of this proof (compare with Example 6.11), and hence the solution will represent exactly those quantities. In particular, the solution for X1 will be a regular expression for L(A), that is, for L. Δ

EXAMPLE 6.9

Consider the DFA B given by the diagram in Figure 6.5, which accepts all strings with an odd number of b's over {a, b}. This machine generates the following system of language equations:

X1 = ∅ ∪ a·X1 ∪ b·X2
X2 = ε ∪ b·X1 ∪ a·X2

which will have the same solution for X1 as the equation

X1 = Ê1 ∪ Â11·X1

where Ê1 = E1 ∪ (A12·A22*·E2) = ∅ ∪ (b·a*·ε) = b·a* and Â11 = A11 ∪ (A12·A22*·A21) = a ∪ (b·a*·b).

Figure 6.5 The DFA discussed in Example 6.9

Theorem 6.1 predicts the solution for X1 to be (a ∪ (b·a*·b))*·b·a*. It can be verified that this solution describes all those strings with an odd number of b's. X1 is indeed the terminal set for s1, that is, T(B, s1). Likewise, finding X2 yields all strings with an even number of b's, which is the terminal set for s2, T(B, s2).

Nondeterministic finite automata can likewise be represented by language equations, and without the intermediate step of applying Definition 4.5 to acquire a deterministic equivalent. The sets Ei and Aij retain essentially the same definitions
as before: Ei is ε or ∅, depending on whether or not si is a final state, and Aij again represents exactly the set of all letters that cause a transition from state si to state sj. This definition requires a minor cosmetic change for NDFAs, since the state transition function is slightly different:

Aij = ⋃ {a ∈ Σ | sj ∈ δ(si, a)}, for i, j = 1, 2, ..., n

An n-state NDFA therefore gives rise to n equations in n unknowns, which can be solved as outlined by Theorems 6.2 and 6.1.

While Definition 4.5 need not be used as a conversion step, an NDFA with λ-moves will have to be transformed into an equivalent NDFA without λ-moves. An appropriate definition for Aij could be given for the original NDFA, and while the resulting equations would describe the relation between the terminal sets, some Aij set might then contain λ as a member. There are systems of equations arising in this manner that do not have unique solutions (see the exercises). For an NDFA with λ-moves, Definition 4.9 could be applied to find an equivalent NDFA without λ-moves, since Theorems 6.2 and 6.1 specifically prohibit the empty string as a part of a coefficient. However, if the ambiguous equations generated from a machine with λ-moves were solved as suggested in Theorems 6.1 and 6.2, a "minimal" solution would be obtained that would correspond to the desired answer.

EXAMPLE 6.10

Consider again the system described in Example 6.8. This can be thought of as the set of language equations corresponding to the NDFA called B, illustrated in Figure 6.6a. Note that L(B) is indeed the given solution: L(B) = X1 = (a ∪ bb)*.

Figure 6.6 (a) The NDFA B discussed in Example 6.10 (b) The NDFA C discussed in Example 6.10 (c) The NDFA D discussed in Example 6.10

Notice the similarity between B and the machine C shown in Figure 6.6b, which has s2 as the start state.
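Returning to Example 6.9 for a moment, the expression derived there can be checked exhaustively for short strings. In the pattern syntax of Python's re module (an aside; | replaces ∪ and juxtaposition is implicit), (a ∪ b·a*·b)*·b·a* becomes the pattern below, and it should match exactly the strings with an odd number of b's:

```python
import re
from itertools import product

# (a ∪ b·a*·b)*·b·a*  -- the solution for X1 in Example 6.9
odd_bs = re.compile(r"(?:a|ba*b)*ba*")

# check every string over {a, b} up to length 6
for n in range(7):
    for w in map("".join, product("ab", repeat=n)):
        assert bool(odd_bs.fullmatch(w)) == (w.count("b") % 2 == 1)
print("checked all strings up to length 6")
```

An exhaustive check over short strings is not a proof, but it is a quick and effective way to catch an error when solving a system of language equations by hand.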
Note that L(C) is given by X2 = b·(a ∪ bb)*, where X2 was the other part of the solution given in Example 6.8 (verify this). Finally, consider a similar machine D in Figure 6.6c with both s1 and s2 as start states. Can you quickly write a regular expression that describes the language accepted by D?

EXAMPLE 6.11

Regular expressions for machines with more than two states can be found by repeated application of the technique described in Theorem 6.2. For example, consider the three-state DFA given in Figure 6.7. The solution for this three-state machine will be explored shortly. We begin by illustrating the natural relationships between the terminal sets described in Theorem 6.3. First let us note that the language accepted by this machine includes:

Figure 6.7 The DFA discussed in Example 6.11

1. All strings that end with b.
2. Strings that contain no b's, but for which |x|a − |x|c is a multiple of 3.
3. Strings that are concatenations of type (1) and type (2) strings.

According to Theorem 6.3, the equations for this machine are

X1 = ε ∪ b·X1 ∪ a·X2 ∪ c·X3
X2 = ∅ ∪ (b ∪ c)·X1 ∪ ∅·X2 ∪ a·X3
X3 = ∅ ∪ (a ∪ b)·X1 ∪ c·X2 ∪ ∅·X3

which can be simplified to

X1 = ε ∪ b·X1 ∪ a·X2 ∪ c·X3
X2 = (b ∪ c)·X1 ∪ a·X3
X3 = (a ∪ b)·X1 ∪ c·X2

and rewritten as

X1 = ε ∪ b·X1 ∪ a·X2 ∪ c·X3
X2 = b·X1 ∪ c·X1 ∪ a·X3
X3 = a·X1 ∪ b·X1 ∪ c·X2

The equation for X1 admits the following interpretation: recalling that X1 represents all the strings that reach a final state when starting from s1, we see that these can be broken up into four distinct classes:

1. Strings of length 0: (ε).
2. Strings that start with a (and note that a moves the current state from s1 to s2) and then proceed (from s2) to a final state: (a·X2).
3. Strings that start with b and then proceed to a final state: (b·X1).
4. Strings that start with c and then proceed to a final state: (c·X3).

The union of these four classes should equal X1, which is exactly what the first equation states.
X2 = b·X1 ∪ c·X1 ∪ a·X3 can be interpreted similarly; ε does not appear in this equation because there is no way to reach a final state from s2 if no letters are processed. If at least one letter is processed, then that first letter is an a, b, or c. If it is a, then we move from state s2 to s3, and the remainder of the string must take us to a final state from s3 (that is, the remainder must belong to X3). Strings that begin with an a and are followed by a string from X3 can easily be described by a·X3. Similarly, strings that start with b or c must move from s2 to s1 and then be followed by a string from X1. These strings are described by b·X1 and c·X1. The three cases for reaching a final state from s2 that have just been described are exhaustive (and mutually exclusive), and so their union should equal all of X2. This is exactly the relation expressed by the second equation, X2 = b·X1 ∪ c·X1 ∪ a·X3. The last equation admits a similar interpretation.

None of the above observations are necessary to actually solve the system! The preceding discussion is intended to illustrate that the natural relationships between the terminal sets described by Theorem 6.3, and the correspondences we have so laboriously developed here, are succinctly predicted by the language equations. Once the equations are written down, we can simply apply Theorem 6.2 and reduce to a system with only two unknowns. We have

E1 = ε, E2 = ∅, E3 = ∅
A11 = b, A12 = a, A13 = c
A21 = b ∪ c, A22 = ∅, A23 = a
A31 = a ∪ b, A32 = c, A33 = ∅

from which we can compute (eliminating X3):

Ê1 = E1 ∪ A13·A33*·E3 = ε ∪ c·∅*·∅ = ε
Ê2 = E2 ∪ A23·A33*·E3 = ∅
Â11 = A11 ∪ A13·A33*·A31 = b ∪ c·∅*·(a ∪ b) = b ∪ c(a ∪ b)
Â12 = A12 ∪ A13·A33*·A32 = a ∪ c·∅*·c = a ∪ cc
Â21 = A21 ∪ A23·A33*·A31 = (b ∪ c) ∪ a·∅*·(a ∪ b) = b ∪ c ∪ a(a ∪ b)
Â22 = A22 ∪ A23·A33*·A32 = ∅ ∪ a·∅*·c = ac

which gives the following system of equations:

X1 = ε ∪ (b ∪ c(a ∪ b))·X1 ∪ (a ∪ cc)·X2
X2 = ∅ ∪ ((b ∪ c) ∪ a(a ∪ b))·X1 ∪ ac·X2

These two equations can be reduced to a single equation by applying Theorem 6.2 again; writing primes for the twice-reduced coefficients:

Ê′1 = Ê1 ∪ Â12·(Â22)*·Ê2 = ε ∪ (a ∪ cc)·(ac)*·∅ = ε
Â′11 = Â11 ∪ Â12·(Â22)*·Â21 = b ∪ c(a ∪ b) ∪ (a ∪ cc)(ac)*((b ∪ c) ∪ a(a ∪ b))

which yields one equation in one unknown whose solution is

X1 = (Â′11)*·Ê′1 = (b ∪ c(a ∪ b) ∪ (a ∪ cc)(ac)*((b ∪ c) ∪ a(a ∪ b)))*·ε

Since s1 was the only start state, the regular expression given by X1 should describe the language accepted by the original three-state automaton. Returning to our observations above, this expression can be reconciled with our intuitive notion of what the solution "should" look like. Â′11 can be expanded to yield the following form:

Â′11 = b ∪ ca ∪ cb ∪ a(ac)*b ∪ a(ac)*c ∪ a(ac)*ab ∪ a(ac)*aa ∪ cc(ac)*b ∪ cc(ac)*c ∪ cc(ac)*ab ∪ cc(ac)*aa

Observe that each of the 11 subexpressions consists of strings that (1) end with b, or (2) contain no b's, but for which |x|a − |x|c is a multiple of 3. Hence the Kleene closure of this expression, which represents the language accepted by this machine, does indeed agree with our notion of what X1 should describe. Since s1 is also the only final state in this example, it is interesting to note that each of the subexpressions of Â′11 describes strings that, when starting at s1 in the automaton, return you to s1 again for the first time (examine the diagram and verify this).

EXAMPLE 6.12

Consider the automaton shown in Figure 6.8. It is similar to the one in Example 6.11, but it now gives rise to four equations in four unknowns. As these equations are solved, the final coefficient for X1 will again describe strings that, when starting at s1 in the automaton, return you to s1 again for the first time; it will agree with the coefficient Â′11 computed in Example 6.11. The final constant term associated with X1 (that is, Ê1) will represent all those strings that deposit you in a final state from s1 without ever returning to s1. In this automaton, this will be given by Ê1 = d·e*. Â11*·Ê1 therefore represents strings that go from s1 back to s1 any number of times, followed by a string that leaves s1 (for the last time) for a final state.
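The claim that the expression obtained in Example 6.11 describes the machine's language can be tested mechanically for short strings by simulating the three-state DFA directly (the transition table below is read off the equations of Example 6.11, since the figure is not reproduced here) and comparing against Python's re engine:

```python
import re
from itertools import product

# Transition table of the three-state DFA of Example 6.11
delta = {(1, "b"): 1, (1, "a"): 2, (1, "c"): 3,
         (2, "b"): 1, (2, "c"): 1, (2, "a"): 3,
         (3, "a"): 1, (3, "b"): 1, (3, "c"): 2}

def accepts(w):
    state = 1
    for ch in w:
        state = delta[(state, ch)]
    return state == 1                 # s1 is both the start and the final state

# X1 = (b ∪ c(a ∪ b) ∪ (a ∪ cc)(ac)*((b ∪ c) ∪ a(a ∪ b)))*·ε in re syntax
x1 = re.compile(r"(?:b|c(?:a|b)|(?:a|cc)(?:ac)*(?:b|c|a(?:a|b)))*")

for n in range(6):                    # every string over {a, b, c} up to length 5
    for w in map("".join, product("abc", repeat=n)):
        assert bool(x1.fullmatch(w)) == accepts(w)
print("regular expression agrees with the DFA on all strings up to length 5")
```

The agreement on all 364 short strings is reassuring evidence (though again not a proof) that no error crept into the two rounds of elimination.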
Figure 6.8 The DFA discussed in Example 6.12

In general, the final coefficient and constant terms can always be interpreted in this manner. In Example 6.11, the only way to reach a final state from s1 and avoid having to return again to s1 was to not leave in the first place; this was reflected by the fact that Ê1 = ε.

EXAMPLE 6.13

Consider the automaton illustrated in Figure 6.9, which is identical to the DFA in Example 6.11 except for the placement of the final state. Even though the initial system of three equations is now different, we can expect Â11 to compute to the same expression as before. Since Ê1 is supposed to represent all those strings that deposit you in a final state from s1 without ever returning to s1, one should be able to predict that the new final constant term will look like Ê1 = a(ac)* ∪ c(ca)*c. An expression for the language recognized by this automaton would then be given by

Figure 6.9 The DFA discussed in Example 6.13

X1 = (Â11)*·Ê1 = (b ∪ c(a ∪ b) ∪ (a ∪ cc)(ac)*((b ∪ c) ∪ a(a ∪ b)))*·(a(ac)* ∪ c(ca)*c)

It may often be convenient to eliminate a variable other than the one that is numerically last. This can be accomplished by appropriately renumbering the unknowns and applying Theorem 6.2 to the new set of equations. For convenience, we state an analog of Theorem 6.2 that allows the elimination of the mth unknown from a set of n equations in n unknowns. The following lemma agrees with Theorem 6.2 if m = n.

∇ Lemma 6.3. Let n and m be positive integers with m ≤ n. Consider the system of n ≥ 2 equations in the unknowns X1, X2, ..., Xn given by

Xk = Ek ∪ Ak1·X1 ∪ Ak2·X2 ∪ ... ∪ Akn·Xn, for k = 1, 2, ..., n

in which (∀i, j)(λ ∉ Aij). The unknown Xm can be eliminated from this system to form the following n − 1 equations in the unknowns X1, X2, ..., Xm−1, Xm+1, ..., Xn:

Xk = Êk ∪ Âk1·X1 ∪ Âk2·X2 ∪ ... ∪ Âk(m−1)·Xm−1 ∪ Âk(m+1)·Xm+1 ∪ ... ∪ Âkn·Xn, for k = 1, 2, ..., m − 1, m + 1, ..., n

where Êi = Ei ∪ (Aim·Amm*·Em), for all i = 1, 2, ...
, m − 1, m + 1, ..., n, and Âij = Aij ∪ (Aim·Amm*·Amj), for all i, j = 1, 2, ..., m − 1, m + 1, ..., n.

Furthermore, once the solution to the above n − 1 equations is known, that solution can be used to find the remaining unknown:

Xm = Amm*·(Em ∪ Am1·X1 ∪ Am2·X2 ∪ ... ∪ Am(m−1)·Xm−1 ∪ Am(m+1)·Xm+1 ∪ ... ∪ Amn·Xn)

Proof. The proof follows from a renumbering of the equations given in Theorem 6.2. Δ

A significant reduction in the size of the expressions representing the solutions can often be achieved by carefully choosing the order in which to eliminate the unknowns. This situation can easily arise when solving language equations that correspond to finite automata. For example, consider the DFA illustrated in Figure 6.10. The equations for this machine are given by

X1 = ∅ ∪ ∅·X1 ∪ (0 ∪ 1)·X2 ∪ ∅·X3
X2 = ε ∪ 0·X1 ∪ 1·X2 ∪ 0·X3
X3 = ∅ ∪ ∅·X1 ∪ (0 ∪ 1)·X2 ∪ ∅·X3

Figure 6.10 The DFA discussed in Exercise 6.19

Using Theorem 6.2 to methodically solve for X1, X2, and X3 involves eliminating X3 and then eliminating X2. Theorem 6.1 can then be used to solve for X1, and then the back-substitution rules can be employed to find X2 and X3. The regular expressions found in this manner are quite complex. A striking simplification can be made by eliminating X3 and then eliminating X1 (instead of X2). The solution for X2 is quite concise, which leads to simple expressions for X1 and X3 during the back-substitution phase (see Exercise 6.19).

Let A = <Σ, {s1, s2, ..., sn}, s1, δ, F> be a deterministic finite automaton. We have seen that the relationships between the terminal sets T(A, si) described in Chapter 3 give rise to a system of equations. Similarly, the initial sets I(A, si) defined in Chapter 2 are also interrelated. Recall that, for a state si, I(A, si) is comprised of strings that, when starting in the start state, lead to the state si.
That is, I(A, si) = {x | δ̄(s1, x) = si}. The equations we have discussed to this point have been right linear; that is, the unknowns Xi appear to the right of their coefficients. The initial sets for an automaton are also related by a system of equations, but these equations are left linear; the unknowns Yi appear to the left of their coefficients. The solution for sets of left-linear equations parallels that of right-linear systems.

∇ Theorem 6.4. Let n and m be positive integers and let m ≤ n. Consider the system of n ≥ 2 equations in the unknowns Y1, Y2, ..., Yn given by

Y_k = I_k ∪ Y1·B_k1 ∪ Y2·B_k2 ∪ ··· ∪ Yn·B_kn, for k = 1, 2, ..., n

in which (∀i,j)(λ ∉ B_ij).

a. The unknown Y_m can be eliminated from this system to form the following n − 1 equations in the unknowns Y1, Y2, ..., Y_(m−1), Y_(m+1), ..., Yn:

Y_k = Î_k ∪ Y1·B̂_k1 ∪ Y2·B̂_k2 ∪ ··· ∪ Y_(m−1)·B̂_k(m−1) ∪ Y_(m+1)·B̂_k(m+1) ∪ ··· ∪ Yn·B̂_kn, for k = 1, 2, ..., m − 1, m + 1, ..., n

where Î_i = I_i ∪ (I_m·B_mm*·B_im), and B̂_ij = B_ij ∪ (B_mj·B_mm*·B_im), for all i, j = 1, 2, ..., m − 1, m + 1, ..., n.

b. Once the solution to the above n − 1 equations is known, that solution can be used to find the remaining unknown:

Y_m = (I_m ∪ Y1·B_m1 ∪ Y2·B_m2 ∪ ··· ∪ Y_(m−1)·B_m(m−1) ∪ Y_(m+1)·B_m(m+1) ∪ ··· ∪ Yn·B_mn)·B_mm*

c. A single equation Y1 = I1 ∪ Y1·B11 has the unique solution Y1 = I1·B11*.

Proof. The proof is essentially a mirror image of the proofs given in Theorems 6.1 and 6.2. Δ

∇ Lemma 6.4. Let A = <Σ, {s1, s2, ..., sn}, S0, δ, F> be an NDFA. For each i = 1, 2, ..., n, let the initial set I(A, si) = {x | si ∈ δ̄(S0, x)} be denoted by Yi. The unknowns Y1, Y2, ..., Yn satisfy a system of n left-linear equations of the form

Y_k = I_k ∪ Y1·B_k1 ∪ Y2·B_k2 ∪ ··· ∪ Yn·B_kn, for k = 1, 2, ..., n

where the coefficients are given by

I_i = ∅ if si ∉ S0, and I_i = ε if si ∈ S0, for i = 1, 2, ..., n

and

B_ij = ⋃ {a ∈ Σ | si ∈ δ(sj, a)}, for i, j = 1, 2, ..., n

Proof. See the exercises. Δ

In contrast to Theorem 6.3, where A_ij represented the set of all letters that cause a transition from state si to state sj, B_ij represents the set of all letters that cause a transition from state sj to state si. That is, B_ij = A_ji. In the definition in Theorem 6.3, E_i represented the set of all strings of length zero that can reach a final state from si. Compare this with the definition of I_i above, which represents the set of all strings of length zero that can reach si from a start state.

6.4 FAD LANGUAGES AS REGULAR SETS; CLOSURE PROPERTIES

The techniques outlined by Theorems 6.1, 6.2, and 6.3 provide the second half of the correspondence between regular sets and FAD languages. As a consequence, regular expressions and automata characterize exactly the same class of languages.

∇ Corollary 6.2. Let Σ be an alphabet. Then 𝒟_Σ ⊆ ℛ_Σ.

Proof. The proof follows immediately from Theorem 6.3. Δ

∇ Theorem 6.5: Kleene's Theorem. Let Σ be an alphabet. Then 𝒟_Σ = ℛ_Σ.

Proof. The proof follows immediately from Corollaries 6.1 and 6.2. Δ

Thus the terms FAD language and regular set can be used interchangeably, since languages accepted by finite automata can be described by regular expressions, and vice versa. Such languages are often referred to as regular languages. The correspondence will allow, for example, the pumping lemma to be invoked to justify that certain languages cannot be represented by any regular expression. 𝒟_Σ is therefore closed under every operator for which ℛ_Σ is closed. We have now seen two representations for FAD languages, and a third will be presented in Chapter 8. Since there are effective algorithms for switching from one representation to another, we may use whichever vehicle is most convenient to describe a language or prove properties about regular languages. For example, we may use whichever concept best lends itself to the proof of closure properties.
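The elimination-and-back-substitution procedure of Theorems 6.1 and 6.2, just invoked for Corollary 6.2, is mechanical enough to sketch in code. The following Python sketch is ours, not the text's: it stores a right-linear system by its coefficient and constant expressions, eliminates unknowns from the last to the first, and applies the rule X = A*·E for the single remaining equation. Python's `re` syntax stands in for the book's operators ("|" for ∪, juxtaposition for ·, the empty string for ε), and the `re` module is used only to spot-check the resulting expressions semantically.

```python
import re

# A right-linear system  X_k = E_k U A_k1.X1 U ... U A_kn.Xn  is stored as
# coeffs[k][j] = regex for A_kj (None if absent), consts[k] = regex for E_k.

def star(a):   return f"(?:{a})*"
def cat(a, b): return f"(?:{a})(?:{b})"
def union(a, b):
    if a is None: return b
    if b is None: return a
    return f"(?:{a}|{b})"

def solve(coeffs, consts):
    """Eliminate X_(n-1), ..., X_1 (Theorem 6.2), solve X_0 by the rule
    X = A*.E of Theorem 6.1, then back-substitute."""
    n = len(consts)
    coeffs = [row[:] for row in coeffs]
    consts = consts[:]
    for m in range(n - 1, 0, -1):          # eliminate X_m from rows k < m
        amm = star(coeffs[m][m]) if coeffs[m][m] is not None else ""
        for k in range(m):
            akm = coeffs[k][m]
            if akm is None:
                continue
            pre = cat(akm, amm) if amm else akm   # A_km . A_mm*
            if consts[m] is not None:
                consts[k] = union(consts[k], cat(pre, consts[m]))
            for j in range(m):
                if coeffs[m][j] is not None:
                    coeffs[k][j] = union(coeffs[k][j], cat(pre, coeffs[m][j]))
            coeffs[k][m] = None
    sol = [None] * n
    a00 = star(coeffs[0][0]) if coeffs[0][0] is not None else ""
    sol[0] = cat(a00, consts[0]) if a00 else consts[0]
    for m in range(1, n):                  # back-substitution phase
        amm = star(coeffs[m][m]) if coeffs[m][m] is not None else ""
        body = consts[m]
        for j in range(m):
            if coeffs[m][j] is not None:
                body = union(body, cat(coeffs[m][j], sol[j]))
        sol[m] = cat(amm, body) if amm else body
    return sol

# The system of Exercise 6.18:  X0 = (0 U 1)X1,  X1 = epsilon U 1.X0 U 0.X1
r0, r1 = solve([[None, "(?:0|1)"], ["1", "0"]], [None, ""])
assert re.fullmatch(r0, "100") and not re.fullmatch(r0, "")
```

By hand, eliminating X1 here gives X0 = ((0 ∪ 1)0*1)*·(0 ∪ 1)0*; since two syntactically different expressions may denote the same set, the assertions check membership rather than comparing expression strings.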
The justification that ℛ_Σ is closed under union follows immediately from Definition 6.1; much more effort was required in Chapter 5 to prove that the union of two languages represented by DFAs could be represented by another DFA. On the other hand, attempting to justify closure under complementation by using regular expressions is an exercise in frustration. We will now see that closure under substitution is conveniently proved via regular expressions.

A substitution is similar to a language homomorphism (Definition 5.8), in which letters were replaced by single words. Substitutions will denote the methodical replacement of the individual letters within a regular expression with sets of words. The only restriction on these sets is that they must also be regular expressions, though not necessarily over the same alphabet.

∇ Definition 6.5. Let Σ = {a1, a2, ..., am} be an alphabet and let Γ be a second alphabet. Given regular expressions R1, R2, ..., Rm over Γ, define a regular set substitution s: Σ → ℘(Γ*) by s(ai) = Ri for each i = 1, 2, ..., m, which can be extended to s: Σ* → ℘(Γ*) by

s(λ) = ε and (∀a ∈ Σ)(∀x ∈ Σ*)(s(a·x) = s(a)·s(x))

s can be further extended to operate on a language L ⊆ Σ* by defining

s(L) = ⋃_{z ∈ L} s(z)

In this context, s: ℘(Σ*) → ℘(Γ*). Δ

EXAMPLE 6.14

Let Σ = {0} and Γ = {a, b}. Define s(0) = (a ∪ b)·(a ∪ b). From the recursive definition, s(00) = (a ∪ b)·(a ∪ b)·(a ∪ b)·(a ∪ b). Furthermore, the language s(0*) represents all even-length strings over {a, b}.

The definition of s(L) for a language L allows the domain of the substitution to be extended all the way to s: ℘(Σ*) → ℘(Γ*). It can be proven that the image of ℛ_Σ under s is contained in ℛ_Γ (see the exercises); however, the image of 𝒩_Σ under s is not completely contained in 𝒩_Γ. In Example 6.14, the language 0* was regular and so was its image under s. Neither of the sets described in the second example was regular.
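The extension of s from letters to words in Definition 6.5 (s(λ) = ε, s(a·x) = s(a)·s(x)) can be sketched in a few lines of Python, again modeling each target set by a regex string as in Example 6.14. The helper names below are ours, not the text's, and the finite-language case stands in for the general union s(L) = ⋃ s(z).

```python
import re

def substitute_word(s, word):
    """s(lambda) = epsilon; s(a.x) = s(a).s(x): concatenate letter images."""
    return "".join(f"(?:{s[ch]})" for ch in word)

def substitute_language(s, words):
    """s(L) is the union of s(z) over z in L (here L is a finite language)."""
    return "(?:" + "|".join(substitute_word(s, w) for w in words) + ")"

s = {"0": "(?:a|b)(?:a|b)"}   # Example 6.14: s(0) = (a U b)(a U b)
assert re.fullmatch(substitute_word(s, "00"), "abba")     # length-4 words match
assert not re.fullmatch(substitute_word(s, "00"), "aba")  # odd length: rejected
assert re.fullmatch(substitute_language(s, ["", "0", "00"]), "ba")
```

Since s(0) denotes the length-2 words over {a, b}, s(00) denotes exactly the length-4 words, mirroring the computation in Example 6.14.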
It is possible to start with a nonregular set and define a substitution that produces a regular set (see Lemma 6.5), but it is impossible for the image of a regular set to avoid being regular, as shown by the next theorem.

∇ Theorem 6.6. Let Σ be an alphabet, and let s: Σ → ℘(Σ*) be a substitution. Then ℛ_Σ is closed under s.

Proof. Choose an arbitrary regular expression R over Σ. We must show that s(R) represents a regular set. R is an expression made up of the letters in Σ and the characters (, ), ·, ∪, and *. Form the new expression R' by replacing each letter a by s(a). R' is then clearly another regular expression over Σ. In fact, it can be shown that R' represents exactly the words in s(R); this is formally accomplished by inducting on the number of operators in the expression R. To prove this, one must argue that this substitution correspondence is preserved by each of the six rules defining regular expressions.

The basis step of the induction involves all regular expressions with zero operators, that is, those defined by the first three rules for generating a regular expression.

i. The substitution corresponding to any single letter ai is a regular expression corresponding to s(ai), since, by definition of s, s(ai) = Ri.
ii. The substitution corresponding to ∅ is a regular expression corresponding to s(∅), since, by definition of s, s(∅) = ∅.
iii. The substitution corresponding to ε is a regular expression corresponding to s(λ), since, by definition of s, s(λ) = ε.

The inductive step requires an argument that the correspondence is preserved whenever another of the three operators is introduced to form a more complex expression. These assertions involve the final three rules for generating regular expressions.

iv. If R1 and R2 are regular expressions, then the substitution corresponding to (R1·R2) is a regular expression representing the concatenation of the two corresponding substitutions. That is, s(R1·R2) = s(R1)·s(R2).
v. If R1 and R2 are regular expressions, then the substitution corresponding to (R1 ∪ R2) is a regular expression representing s(R1) ∪ s(R2).
vi. If R1 is a regular expression, then the substitution corresponding to (R1)* is a regular expression representing (s(R1))*.

Each of these three assertions follows immediately from the definition of substitution and is left as an exercise. The inductive step guarantees that the substitution correspondence is preserved in any regular expression R, regardless of the number of operators in R. Consequently, R' is indeed a regular expression denoting s(R), and ℛ_Σ is therefore closed under s. Δ

The analogous result does not always hold for the nonregular sets.

∇ Lemma 6.5. Let Σ be an alphabet.

a. There are examples of regular set substitutions s: Σ → ℘(Σ*) for which 𝒩_Σ is not closed under s.
b. There are examples of regular set substitutions t: Σ → ℘(Σ*) for which 𝒩_Σ is closed under t.

Proof. (a) 𝒩_Σ is not closed under some substitutions. Let Σ = {a, b} and define s(a) = (a ∪ b) and s(b) = (a ∪ b). The image of the nonregular set L = {x | |x|a = |x|b} is the set of even-length words, which is regular. Thus L ∈ 𝒩_Σ but s(L) ∉ 𝒩_Σ.

(b) 𝒩_Σ is closed under some substitutions. Some substitutions do preserve nonregularity (such as the identity substitution t, since for any language L, t(L) = L). In this case, (∀L)(L ∈ 𝒩_Σ ⇒ t(L) ∈ 𝒩_Σ), and therefore 𝒩_Σ is closed under t. Δ

Note that a substitution in which each Ri is a single string conforms to Definition 5.8 and represents a language homomorphism.

∇ Corollary 6.3. Let Σ be an alphabet, and let ψ: Σ → Σ* be a language homomorphism. Then ℛ_Σ is closed under ψ.

Proof. The proof follows immediately from Theorem 6.6, since a language homomorphism is a special type of substitution. Δ
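The R' construction in the proof of Theorem 6.6 is a purely textual operation, and can be sketched directly in Python: scan the expression and replace each letter of Σ by a parenthesized copy of its image. This sketch is ours; it assumes the letters of Σ are single characters that collide neither with the regex metacharacters nor with the characters used inside the replacement expressions.

```python
import re

def substitute_regex(regex, s):
    """Form R' by replacing each occurrence of a letter of Sigma by (s(a))."""
    return "".join(f"(?:{s[ch]})" if ch in s else ch for ch in regex)

s = {"0": "(?:a|b)(?:a|b)"}              # the substitution of Example 6.14
r_prime = substitute_regex("(?:0)*", s)  # image of 0*: the even-length words
assert re.fullmatch(r_prime, "abab")
assert re.fullmatch(r_prime, "")
assert not re.fullmatch(r_prime, "aba")
```

Because the replacement happens inside the expression, the result is again a regular expression, which is exactly the content of the theorem.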
As in Chapter 5, this result can also be proved by suitably modifying an appropriate DFA, showing that 𝒟_Σ (= ℛ_Σ) is closed under language homomorphism. It is likewise possible to use machine constructs to show that 𝒟_Σ is closed under substitution, but this becomes much more complex than the argument given for Theorem 6.6. A third characterization of regular languages will be presented in Chapter 8, affording a choice of three distinct avenues for proving closure properties of ℛ_Σ.

EXERCISES

6.1. Let Σ = {a, b}. Give (if possible) a regular expression that describes the set of all even-length words in Σ*.
6.2. Let Σ = {a, b}. Give (if possible) a regular expression that describes the set of all words x in Σ* for which |x| ≥ 2.
6.3. Let Σ = {a, b}. Give (if possible) a regular expression that describes the set of all words x in Σ* for which |x|a = |x|b.
6.4. Let Σ = {a, b, c}. Give a regular expression that describes the set of all odd-length words in Σ* that do not end in b.
6.5. Let Σ = {a, b, c}. Give a regular expression that describes the set of all words in Σ* that do not contain two consecutive cs.
6.6. Let Σ = {a, b, c}. Give a regular expression that describes the set of all words in Σ* that do contain two consecutive cs.
6.7. Let Σ = {a, b, c}. Give a regular expression that describes the set of all words in Σ* that do not contain any cs.
6.8. Let Σ = {0, 1}. Give, if possible, regular expressions that will describe each of the following languages. Try to write these directly from the descriptions (that is, avoid relying on the nature of the corresponding automata).
(a) L1 = {x | |x| mod 3 = 2}
(b) L2 = Σ* − {w | ∃n ∋ w = a1···an ∧ an = 1}
(c) L3 = {y | |y|0 > |y|1}
6.9. Let Σ = {a, b, c}. Give, if possible, regular expressions that will describe each of the following languages. Try to write these directly from the descriptions (that is, avoid relying on the nature of the corresponding automata).
(a) L1 = {x | (|x|a is odd) ∧ (|x|b is even)}
(b) L2 = {y | (|y|c is even) ∨ (|y|b is odd)}
(c) L3 = {z | (|z|a is even)}
(d) L4 = {z | |z|c is a prime number}
(e) L5 = {x | abc is a substring of x}
(f) L6 = {x | acaba is a substring of x}
(g) L7 = {x ∈ {a, b, c}* | |x|a ≡ 0 mod 3}
6.10. Let Σ = {a, b, d}. Give a regular expression that will describe Ψ = {x ∈ Σ* | (x begins with d) ∨ (x contains two consecutive bs)}.
6.11. Let Σ = {a, b, c}. Give a regular expression that will describe Φ = {x ∈ Σ* | every b in x is immediately followed by c}.
6.12. Let Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. Give a regular expression that will describe Γ = {x ∈ Σ* | the number represented by x is evenly divisible by 3} = {λ, 0, 00, 000, ..., 3, 03, 003, ..., 6, 9, 12, 15, ...}.
6.13. Let Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. Give a regular expression that will describe K = {x ∈ Σ* | the number represented by x is evenly divisible by 5}.
6.14. Use the exact constructs given in the theorems of Chapter 5 to build an NDFA that accepts b ∪ a*c (refer to Examples 6.4, 6.5, and 6.6). Do not simplify your answer.
6.15. Give examples of sets that demonstrate the following inequalities listed in Lemma 6.1:
(a) R1 ∪ ε ≠ R1
(b) R1·R2 ≠ R2·R1
(c) R1·R1 ≠ R1
(d) R1 ∪ (R2·R3) ≠ (R1 ∪ R2)·(R1 ∪ R3)
(e) (R1·R2)* ≠ (R1*·R2*)*
(f) (R1·R2)* ≠ (R1* ∪ R2*)*
Find other examples of sets that show the following expressions may be equal under some conditions:
(g) R1 ∪ ε = R1
(h) R1·R2 = R2·R1 (even if R1 ≠ R2)
(i) R1·R1 = R1
(j) R1 ∪ (R2·R3) = (R1 ∪ R2)·(R1 ∪ R3) (even if R1 ≠ R2 ≠ R3 ≠ R1)
(k) (R1·R2)* = (R1*·R2*)* (even if R1 ≠ R2)
(l) (R1·R2)* = (R1* ∪ R2*)* (even if R1 ≠ R2)
6.16. Prove the equalities listed in Lemma 6.1.
6.17. (a) Consider Theorem 6.1. Find examples of sets A and E that will show that A*·E is not a unique solution if λ ∈ A.
(b) Find examples of sets A and E that will show that A*·E can be the unique solution even if λ ∈ A.
6.18.
Solve the following set of language equations for X0 and X1 over {0, 1}*:
X0 = (0 ∪ 1)·X1
X1 = ε ∪ 1·X0 ∪ 0·X1
Do you see any relation between these equations and the DFA A in Example 3.4?
6.19. (a) Solve the following set of language equations for X1, X2, and X3 by eliminating X3 and then eliminating X2. Solve for X1 and then back-substitute to find X2 and X3. Note that these equations arise from the automaton in Figure 6.10.
X1 = ∅ ∪ ∅·X1 ∪ (0 ∪ 1)·X2 ∪ ∅·X3
X2 = ε ∪ 0·X1 ∪ 1·X2 ∪ ∅·X3
X3 = ∅ ∪ ∅·X1 ∪ (0 ∪ 1)·X2 ∪ ∅·X3
(b) Rework part (a) by eliminating X3 and then eliminating X1 (instead of X2).
(c) How does the solution in part (b) compare to the solution in part (a)? Is one more concise? Are they equivalent?
6.20. Prove Lemma 6.2. [Hint: Let P(m) be the statement that "Every regular expression R with m or fewer operators represents a regular set that is FAD," and induct on m.]
6.21. Let Σ = {a, b, c}. Find all solutions to the language equation X = X ∪ {b}.
6.22. Prove that, for any languages A and E, A*·E = E ∪ A·(A*·E).
6.23. Give a regular expression that will describe the intersection of the regular sets (ab ∪ b)*a and (ba ∪ a)*.
6.24. Develop an algorithm that, when applied to two regular expressions, will generate an expression describing their intersection.
6.25. Verify by direct substitution that X1 = (a ∪ bb)* and X2 = b·(a ∪ bb)* is a solution to
X1 = ε ∪ a·X1 ∪ b·X2
X2 = ∅ ∪ b·X1 ∪ ∅·X2
6.26. (a) Find L(D) for the machine D described in Example 6.10.
(b) Generalize your technique: For a machine A with start states si1, si2, ..., sim, L(A) is given by ____?
6.27. Let Σ = {a, b}. Give a regular expression that will describe the complement of the regular set (ab ∪ b)*a.
6.28. Develop an algorithm that, when applied to a regular expression, will generate an expression describing the complement.
6.29. Let Σ = {a, b, c}. Define E(L) = {z | (∃y ∈ Σ*)(∃x ∈ L)(z = yx)}.
Use the regular expression concepts given in this chapter to argue that ℛ_Σ is closed under the operator E (that is, don't build a new automaton; build a new regular expression from the old expression).
6.30. Let Σ = {a, b, c}. Define B(L) = {z | (∃x ∈ L)(∃y ∈ Σ*)(z = xy)}. Use the regular expression concepts given in this chapter to argue that ℛ_Σ is closed under the operator B (that is, don't build a new automaton; build a new regular expression from the old expression).
6.31. Let Σ = {a, b, c}. Define M(L) = {z | (∃x ∈ L)(∃y ∈ Σ*)(z = xy)}. Use the regular expression concepts given in this chapter to argue that ℛ_Σ is closed under the operator M (that is, don't build a new automaton; build a new regular expression from the old expression).
6.32. (a) Let Σ = {a, b, c}. Show that there does not exist a unique solution to the following set of language equations:
X1 = b ∪ ε·X1 ∪ a·X2
X2 = c ∪ ∅·X1 ∪ ε·X2
(b) Does this contradict Theorem 6.2? Explain.
6.33. Solve the following set of language equations for X0 and X1 over {0, 1}*:
X0 = 0*1 ∪ (10)*·X0 ∪ 0(0 ∪ 1)·X1
X1 = ε ∪ 1*01·X0 ∪ 0·X1
6.34. Let Σ = {a, b, c}.
(a) Give a regular expression that describes the set of all words in Σ* that end with c and for which aa, bb, and cc never appear as substrings.
(b) Give a regular expression that describes the set of all words in Σ* that begin with c and for which aa, bb, and cc never appear as substrings.
6.35. Let Σ = {a, b, c}.
(a) Give a regular expression that describes the set of all words in Σ* that contain no more than two cs.
(b) Give a regular expression that describes the set of all words in Σ* that do not have exactly one c.
6.36. Recall that the reverse of a word x, written xʳ, is the word written backward. The reverse of a language is likewise given by Lʳ = {xʳ | x ∈ L}. Let Σ = {a, b, c}.
(a) Note that (R1 ∪ R2)ʳ = (R1ʳ ∪ R2ʳ) for any regular sets R1 and R2. Give similar equivalences for each of the rules in Definition 6.1.
(b) If L were represented by a regular expression, explain how to generate a regular expression representing Lʳ (compare with the technique used in the proof of Theorem 6.6).
(c) Prove part (b) by inducting on the number of operators in the expression.
(d) Use parts (a), (b), and (c) to argue that ℛ_Σ is closed under the operator ʳ.
6.37. Complete the details of the proof of Theorem 6.4.
6.38. Let Σ = {a, b, c}.
(a) Give a regular expression that describes the set of all words in Σ* for which no b is immediately preceded by a.
(b) Give a regular expression that describes the set of all words in Σ* that contain exactly two cs and for which no b is immediately preceded by a.
6.39. Let Σ = {a, b, c}.
(a) Give a regular expression that describes the set of all words in Σ* for which no b is immediately preceded by c.
(b) Give a regular expression that describes the set of all words in Σ* that contain exactly one c and for which no b is immediately preceded by c.
6.40. (a) Use Theorem 6.3 to write the two right-linear equations in two unknowns corresponding to the NDFA given in Figure 6.11.
Figure 6.11 The NDFA for Exercise 6.40
(b) Solve these equations for both unknowns.
(c) Give a regular expression that corresponds to the language accepted by this NDFA.
(d) Rework the problem with two left-linear equations.
6.41. (a) Use Theorem 6.3 to write the four right-linear equations in four unknowns corresponding to the NDFA given in Figure 6.12.
Figure 6.12 The automaton for Exercise 6.41
(b) Solve these equations for all four unknowns.
(c) Give a regular expression that corresponds to the language accepted by this NDFA.
(d) Rework the problem with four left-linear equations.
6.42. (a) Use Theorem 6.3 to write the seven right-linear equations in seven unknowns corresponding to the NDFA given in Figure 6.13.
Figure 6.13 The NDFA for Exercise 6.42
(b) Solve these equations for all seven unknowns.
Hint: Make use of the simple nature of these equations to eliminate variables without appealing to Theorem 6.2.
(c) Give a regular expression that corresponds to the language accepted by this NDFA.
(d) Rework the problem with seven left-linear equations.
6.43. Prove that for any languages A, E, and Y, if E ⊆ Y, then A*·E ⊆ A*·Y.
6.44. Let Σ be an alphabet, and let s: Σ → ℘(Γ*) be a substitution.
(a) Prove that the image of ℛ_Σ under s is contained in ℛ_Γ.
(b) Give an example to show that the image of 𝒩_Σ under s need not be completely contained in 𝒩_Γ.
6.45. Give a detailed proof of Lemma 6.3.
6.46. Let Σ = {a, b} and E = {x ∈ Σ* | x contains (at least) two consecutive bs ∧ x does not contain two consecutive as}. Draw a machine that will accept E.
6.47. Let Σ = {a, b, c}. Give regular expressions that will describe:
(a) {x ∈ {a, b, c}* | every b in x is eventually followed by c}; that is, x might look like baabacaa, or bcacc, and so on.
(b) {x ∈ {a, b, c}* | every b in x is immediately followed by c}.
6.48. Let Σ = {a, b}. Give, if possible, regular expressions that will describe each of the following languages. Try to write these directly from the descriptions (that is, avoid relying on the nature of the corresponding automata).
(a) The language consisting of all words that have neither consecutive as nor consecutive bs.
(b) The language consisting of all words that begin and end with different letters.
(c) The language consisting of all words for which the last two letters match.
(d) The language consisting of all words for which the first two letters match.
(e) The language consisting of all words for which the first and last letters match.
6.49. The set of all valid regular expressions over {a, b} is a language over the alphabet {a, b, (, ), ∪, *, ·, ∅, ε}. Show that this language is not FAD.
6.50. Give regular expressions corresponding to the languages accepted by each of the NDFAs listed in Figure 6.14.
6.51. Complete the details of the proof of Theorem 6.6.
6.52.
Prove Lemma 6.4.
6.53. Corollary 6.3 followed immediately from Theorem 6.6. Show that Theorems 5.2, 5.4, and 5.5 are also corollaries of Theorem 6.6.
Figure 6.14 The automata for Exercise 6.50
6.54. Let F be the collection of languages that can be formed by repeated application of the following five rules:
i. {a} ∈ F and {b} ∈ F
ii. { } ∈ F
iii. {λ} ∈ F
iv. If F1 ∈ F and F2 ∈ F, then F1·F2 ∈ F
v. If F1 ∈ F and F2 ∈ F, then F1 ∪ F2 ∈ F
Describe the class of languages generated by these five rules.

CHAPTER 7

FINITE-STATE TRANSDUCERS

We have seen that finite-state acceptors are by no means robust enough to accept standard computer languages like Pascal. Furthermore, even if a DFA could reliably recognize valid Pascal programs, a machine that only indicates "Yes, this is a valid program" or "No, this is not a valid program" is certainly not all we expect from a compiler. To emulate a compiler, it is necessary to have a mechanism that will produce some output other than a simple yes or no: in this case, we would expect the corresponding machine language code (if the program compiled successfully) or some hint as to the location and nature of the syntax errors (if the program was invalid). A machine that accepts input strings and translates them into output strings is called a sequential machine or transducer. Our conceptual picture of such a device is only slightly different from the model of a DFA shown in Figure 7.1a. We still have a finite-state control and an input tape with a read head, but the accept/reject indicator is replaced by an output tape and writing device, as shown in Figure 7.1b. These machines do not have the power to model useful compilers, but they can be employed in many other areas. Applications of sequential machine concepts are by no means limited to the computer world or even to the normal connotations associated with "read" and "write."
A vending machine is essentially a transducer that interprets inserted coins and button presses as valid inputs and returns candy bars and change as output. Elevators, traffic lights, and many other common devices that monitor and react to limited stimuli can be modeled by finite-state transducers. The vending machine analogy illustrates that the types of input to a device (coins) may be very different from the types of output (candy bars).

Figure 7.1 The difference between an acceptor and a transducer

In terms of our conceptual model, the read head may be capable of recognizing symbols that are different from those that the output head can print. Thus we will have an output alphabet Γ that is not necessarily the same as our input alphabet Σ. Also essential to our model is a rule that governs what characters are printed. For our first type of transducer, this rule will depend on both the current internal state of the machine and the current symbol being scanned by the read head, and will be represented by the function ω. Finally, since we are dealing with translation rather than acceptance/rejection, there is no need to single out accepting states: the concept of final states can be dispensed with entirely.

7.1 BASIC DEFINITIONS

∇ Definition 7.1. A finite-state transducer (FST) or Mealy sequential machine with a distinguished start state is a sextuple <Σ, Γ, S, s0, δ, ω>, where:
i. Σ denotes the input alphabet.
ii. Γ denotes the output alphabet.
iii. S denotes the set of states, a finite nonempty set.
iv. s0 denotes the start (or initial) state; s0 ∈ S.
v. δ denotes the state transition function; δ: S × Σ → S.
vi. ω denotes the output function; ω: S × Σ → Γ. Δ

The familiar state transition diagram needs to be slightly modified to represent these new types of machines.
Since there is one labeled arrow for each ordered pair in the domain of the state transition function, and there is also one output symbol for each ordered pair, we will place the appropriate output symbol by its corresponding arrow, and separate it from the associated input symbol by a slash, /.

EXAMPLE 7.1

Let V = <{n, d, q, b}, {φ, n', d', q', c0, c1, c2, c3, c4}, S, s0, δ, ω> be the FST illustrated in Figure 7.2. V describes the action of a candy machine that dispenses 30¢ Chocolate Explosions. n, d, q denote inputs of nickels, dimes, and quarters (respectively), and b denotes the act of pushing the button to select a candy bar. φ, n', d', q', c0, c1, c2, c3, c4 represent the vending machine's responses to these inputs: it may do nothing, return the nickel that was just inserted, return the dime, return the quarter, or dispense a candy bar with 0, 1, 2, 3, or 4 nickels as change, respectively. Note that the transitions agree with the vending machine model presented in Chapter 1; the new model now specifies the action corresponding to the given input. It is relatively simple to modify the above machine to include a new input r that signifies that the coin return has been activated and a new output a representing the release of all coins that have been inserted (see the exercises).

Figure 7.2 A finite-state transducer model of the vending machine discussed in Example 7.1

Various modern appliances can be modeled by FSTs. Many microwave ovens accept input through the door latch mechanism and an array of keypad sensors, and typical outputs include the control lines to the microwave generator, the elements of a digital display, an interior light, and an audible buzzer. The physical circuitry needed to implement these common machines will be discussed in a later section. We now examine the ramifications of Definition 7.1 by concentrating on the details of a very simple finite-state transducer.
EXAMPLE 7.2

Let B = <Σ, Γ, S, s0, δ, ω> be given by
Σ = {a, b}
Γ = {0, 1}
S = {s0, s1}
with start state s0. The state transition function is defined in Table 7.1a.

TABLE 7.1a
δ    a    b
s0   s0   s1
s1   s0   s1

It can be more succinctly specified by (∀s ∈ S)[δ(s, a) = s0 and δ(s, b) = s1]. Finally, Table 7.1b displays the output function, which can be summarized by (∀c ∈ Σ)[ω(s0, c) = 0 and ω(s1, c) = 1].

TABLE 7.1b
ω    a    b
s0   0    0
s1   1    1

All the information about B is contained in the diagram displayed in Figure 7.3. Consider the input sequence z = abaabbaa. From s0, the first letter of z, that is, a, causes a 0 to be printed, since ω(s0, a) = 0, and since δ(s0, a) = s0, the machine remains in state s0. The second letter b causes a second 0 to be printed since ω(s0, b) = 0, but the machine now switches to state s1 [δ(s0, b) = s1]. The third input letter causes a 1 to be printed [ω(s1, a) = 1], and so on. The entire output string will be 00100110, and the machine, after starting in state s0, will successively assume the states s0, s1, s0, s0, s1, s1, s0, s0 as the input string is processed. We are not currently interested in the terminating state for a given string (s0 in this case), but rather in the resulting output string, 00100110.

Figure 7.3 The state transition diagram for the transducer discussed in Example 7.2

It should be clear that the above discussion illustrates a very awkward way of describing translations. While ω describes the way in which single letters are translated, the study of finite-state transducers will involve descriptions of how entire strings are translated. This situation is reminiscent of the modification of the state transition function δ, which likewise operated on single letters, to the extended state transition function δ̄ (which was defined for strings). Indeed, what is called for is an extension of ω to ω̄, which will encompass the translation of entire strings.
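The letter-by-letter trace in Example 7.2 is exactly what a small simulator does. The following Python sketch is ours (none of these names appear in the text): it tabulates δ and ω for the machine B and computes the whole output string for an input word.

```python
def translate(delta, omega, state, word):
    """Compute the full output string: print omega(s, a), move to delta(s, a)."""
    out = []
    for ch in word:
        out.append(omega[state, ch])   # the symbol printed on this step
        state = delta[state, ch]       # the state assumed for the next step
    return "".join(out)

# The machine B of Example 7.2: delta(s, a) = s0 and delta(s, b) = s1;
# omega(s0, c) = 0 and omega(s1, c) = 1 for every input letter c.
delta = {("s0", "a"): "s0", ("s0", "b"): "s1",
         ("s1", "a"): "s0", ("s1", "b"): "s1"}
omega = {("s0", "a"): "0", ("s0", "b"): "0",
         ("s1", "a"): "1", ("s1", "b"): "1"}
assert translate(delta, omega, "s0", "abaabbaa") == "00100110"
```

Starting the simulator from s1 instead of s0 reproduces the computation ω̄(s1, baa) = 110 carried out in Example 7.3.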
The translation cited in the last example could then be succinctly stated as ω̄(s0, abaabbaa) = 00100110. That is, the notation ω̄(t, y) is intended to represent the output string produced by a transducer (beginning from state t) in response to the input string y. The formal recursive definition of ω̄ will depend not only on ω but also on the state transition function δ (and its extension δ̄). δ̄ retains the same conceptual meaning it had for finite-state acceptors: δ̄(s, x) denotes the state reached when starting from s and processing, in sequence, the individual letters of the string x. Furthermore, the conclusion stated in Theorem 1.1 still holds:

(∀x ∈ Σ*)(∀y ∈ Σ*)(∀s ∈ S)(δ̄(s, yx) = δ̄(δ̄(s, y), x))

A similar statement can be made about ω̄ once it has been rigorously defined.

∇ Definition 7.2. Given a FST A = <Σ, Γ, S, s0, δ, ω>, the extended output function for A, denoted by ω̄, is a function ω̄: S × Σ* → Γ* defined recursively as follows:
i. (∀t ∈ S)(ω̄(t, λ) = λ)
ii. (∀t ∈ S)(∀x ∈ Σ*)(∀a ∈ Σ)(ω̄(t, ax) = ω(t, a)·ω̄(δ(t, a), x)) Δ

EXAMPLE 7.3

Let B = <Σ, Γ, S, s0, δ, ω> be the FST given in Example 7.2. Then
ω̄(s1, baa) = ω(s1, b)·ω̄(δ(s1, b), aa)
= 1·ω̄(s1, aa)
= 1·ω(s1, a)·ω̄(δ(s1, a), a)
= 11·ω̄(s0, a)
= 110

Note that a three-letter input sequence gives rise to exactly three output symbols: ω̄ is length preserving, in the sense that (∀t ∈ S)(∀x ∈ Σ*)(|ω̄(t, x)| = |x|).

The ω̄ function extends the ω function from single letters to words. Whereas the ω function maps a state and a letter to a single symbol from Γ, the ω̄ function maps a state and a word to an entire string from Γ*. It can be deduced from (i) and (ii) (see the exercises) that (iii) (∀t ∈ S)(∀a ∈ Σ)(ω̄(t, a) = ω(t, a)), which is the observation that ω and ω̄ treat single letters the same. The extended output function ω̄ has properties similar to those of δ̄, in that the single letter a found in the recursive definition of ω̄ can be replaced by an entire word y. The analog of Theorem 1.1 is given below.
∇ Theorem 7.1. Let A = <Σ, Γ, S, s0, δ, ω> be a FST. Then:

(∀x ∈ Σ*)(∀y ∈ Σ*)(∀t ∈ S)(ω̄(t, yx) = ω̄(t, y)·ω̄(δ̄(t, y), x))

and

(∀x ∈ Σ*)(∀y ∈ Σ*)(∀s ∈ S)(δ̄(s, yx) = δ̄(δ̄(s, y), x))

Proof. The proof is by induction on |y| (see the exercises and compare with Theorem 1.1). Δ

EXAMPLE 7.4

Let B = <Σ, Γ, S, s0, δ, ω> be the FST given in Example 7.2. Consider the string z = abaabbaa = yx, where y = abaab and x = baa. To apply Theorem 7.1 with t = s0, we first calculate ω̄(s0, y) = ω̄(s0, abaab) = 00100, and δ̄(s0, y) = s1. From Example 7.3, ω̄(s1, baa) = 110, and hence, as required by Theorem 7.1,

00100110 = ω̄(s0, abaabbaa) = ω̄(s0, yx) = ω̄(s0, y)·ω̄(δ̄(s0, y), x) = 00100·110

For a given FST A with a specified start state, the deterministic nature of finite-state transducers requires that each input string be translated into a unique output string; that is, the relation f_A that associates input strings with their corresponding output strings is a function.

∇ Definition 7.3. Given a FST M = <Σ, Γ, S, s0, δ, ω>, the translation function for M, denoted by f_M, is the function f_M: Σ* → Γ* defined by f_M(x) = ω̄(s0, x). Δ

Note that f_M, like ω̄, is length preserving: (∀x ∈ Σ*)(|f_M(x)| = |x|). Consequently, for any n ∈ ℕ, if the domain of f_M were restricted to Σⁿ, then the range of f_M would likewise be contained in Γⁿ.

EXAMPLE 7.5

Let B = <Σ, Γ, S, s0, δ, ω> be the finite-state transducer given in Figure 7.3. Since ω̄(s0, abaab) = 00100, f_B(abaab) = 00100. Similarly, f_B(λ) = λ, f_B(a) = 0, f_B(b) = 0, f_B(aa) = 00, f_B(ab) = 00, f_B(ba) = 01, f_B(bb) = 01. Coupled with these seven base definitions, this particular f_B could be recursively defined by (∀x ∈ Σ*):
f_B(xaa) = f_B(xa)·0
f_B(xab) = f_B(xa)·0
f_B(xba) = f_B(xb)·1
f_B(xbb) = f_B(xb)·1
f_B in essence replaces as with 0s and bs with 1s, and "delays" the output by one letter.
More specifically, the translation function for B takes an entire string, substitutes 0s and 1s for a's and b's (respectively), deletes the last letter of the string, and appends a 0 to the front of the resulting string. The purpose of the two states s₀ and s₁ in the FST B is to remember whether the previous symbol was an a or a b (respectively) and output the appropriate replacement letter. Note that 1s are always printed on transitions leaving s₁, and 0s are printed as we leave s₀.

EXAMPLE 7.6

Let C = ⟨{a, b}, {0, 1}, {t₀, t₁, t₂, t₃}, t₀, δ_C, ω_C⟩ be the FST shown in Figure 7.4. C flags occurrences of the string aab by printing a 1 on the output tape only when the substring aab appears in the input stream.

Figure 7.4  The state transition diagram for the Mealy machine C in Example 7.6

Clearly, not all functions from Σ* to Γ* can be represented by finite-state transducers; we have already observed that functions that are not length preserving cannot possibly qualify. As the function discussed later in Example 7.7 shows, not all length-preserving functions qualify, either.

∇ Definition 7.4. Given a function f: Σ* → Γ*, f is finite transducer definable (FTD) iff there exists a transducer A such that f = f_A. ∆

Due to the deterministic nature of transducers, any two strings that "begin the same" must start being "translated the same." This observation is the basis for the following theorem.

∇ Theorem 7.2. Assume f is FTD. Then

(∀n ∈ ℕ)(∀x ∈ Σⁿ)(∀y ∈ Σ*)(∀z ∈ Σ*)(the first n letters of f(xy) must agree with the first n letters of f(xz))

Proof. See the exercises. ∆

EXAMPLE 7.7

Consider the function g: {a, b, c}* → {0, 1}*, which replaces each input symbol by 0 unless the next letter is c, in which case 1 is used instead. Thus, g(abcaaccb) = 01001100 and g(abb) = 000. With n = 2, choosing x = ab, y = caaccb, and z = b shows that g violates Theorem 7.2, so g cannot be FTD.
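Theorem 7.2's prefix condition is easy to check mechanically. The sketch below (illustrative code, not from the text) implements the look-ahead function g of Example 7.7 together with a prefix-agreement test; g is length preserving yet fails the test, so it cannot be FTD.

```python
def g(w):
    """Example 7.7: output 1 at a position exactly when the NEXT letter
    is c, and 0 otherwise (the last position never sees a next letter)."""
    out = []
    for i, _ in enumerate(w):
        next_is_c = i + 1 < len(w) and w[i + 1] == "c"
        out.append("1" if next_is_c else "0")
    return "".join(out)

def prefix_test(f, x, y, z):
    """One instance of Theorem 7.2: with n = |x|, the first n letters of
    f(xy) and f(xz) must agree if f is to be FTD."""
    n = len(x)
    return f(x + y)[:n] == f(x + z)[:n]
```

Running prefix_test(g, "ab", "caaccb", "b") reproduces the counterexample in the text: the two translations already disagree within their first two letters.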
The necessary condition outlined in the previous theorem is by no means sufficient to guarantee that a function is FTD; other properties, such as a pumping lemma-style repetitiousness of the translation, must also be present (see the exercises).

7.2 MINIMIZATION OF FINITE-STATE TRANSDUCERS

Two transducers that perform exactly the same translation over the entire range of input strings from Σ* will be called equivalent transducers. This is similar in spirit to the way equivalence was defined for deterministic finite automata.

∇ Definition 7.5. Given transducers A = ⟨Σ, Γ, S_A, s_0A, δ_A, ω_A⟩ and B = ⟨Σ, Γ, S_B, s_0B, δ_B, ω_B⟩, A is said to be equivalent to B iff f_A = f_B. ∆

Just as with finite automata, a reasonable goal when constructing a transducer is to produce an efficient machine, and, as before, this will be equated with the size of the finite-state control; given a translation function f, a minimal machine for f is a FST that has the minimum number of states necessary to perform the required translation.

∇ Definition 7.6. Given a finite-state transducer A = ⟨Σ, Γ, S_A, s_0A, δ_A, ω_A⟩, A is the minimal Mealy machine for the translation f_A iff for all finite-state transducers B = ⟨Σ, Γ, S_B, s_0B, δ_B, ω_B⟩ for which f_A = f_B, ‖S_A‖ ≤ ‖S_B‖. ∆

Thus, A is minimal if there is no equivalent machine with fewer states.

EXAMPLE 7.8

The FST C = ⟨{a, b}, {0, 1}, {t₀, t₁, t₂, t₃}, t₀, δ_C, ω_C⟩ given in Figure 7.4 is not minimal. The FST D = ⟨{a, b}, {0, 1}, {q₀, q₁, q₂}, q₀, δ_D, ω_D⟩ given in Figure 7.5 performs the same translation but has only three states.

Figure 7.5  The state transition diagram for the Mealy machine D in Example 7.8

The concept of two transducers being essentially the same except for a trivial renaming of the states will again be formalized through the definition of isomorphism (and homomorphism).
As before, it will be important to match the respective start states and state transitions; but rather than matching up final states (which do not exist in the FST model), we must instead ensure that the output function is preserved by the relabeling process.

∇ Definition 7.7. Given two FSTs A = ⟨Σ, Γ, S_A, s_0A, δ_A, ω_A⟩ and B = ⟨Σ, Γ, S_B, s_0B, δ_B, ω_B⟩, and a function μ: S_A → S_B, μ is called a Mealy machine homomorphism from A to B iff the following three conditions hold:

i. μ(s_0A) = s_0B.
ii. (∀s ∈ S_A)(∀a ∈ Σ)(μ(δ_A(s, a)) = δ_B(μ(s), a)).
iii. (∀s ∈ S_A)(∀a ∈ Σ)(ω_A(s, a) = ω_B(μ(s), a)). ∆

As in Chapter 3, a bijective homomorphism will be called an isomorphism and will signify that the isomorphic machines are essentially the same (except perhaps for the names of the states). The isomorphism is essentially a recipe for renaming the states of one machine to produce identical transducers.

∇ Definition 7.8. Given two FSTs A = ⟨Σ, Γ, S_A, s_0A, δ_A, ω_A⟩ and B = ⟨Σ, Γ, S_B, s_0B, δ_B, ω_B⟩, and a function μ: S_A → S_B, μ is called a Mealy machine isomorphism from A to B iff the following five conditions hold:

i. μ(s_0A) = s_0B.
ii. (∀s ∈ S_A)(∀a ∈ Σ)(μ(δ_A(s, a)) = δ_B(μ(s), a)).
iii. (∀s ∈ S_A)(∀a ∈ Σ)(ω_A(s, a) = ω_B(μ(s), a)).
iv. μ is a one-to-one function from S_A to S_B.
v. μ is onto S_B. ∆

∇ Definition 7.9. If μ: S_A → S_B is an isomorphism between two transducers A = ⟨Σ, Γ, S_A, s_0A, δ_A, ω_A⟩ and B = ⟨Σ, Γ, S_B, s_0B, δ_B, ω_B⟩, then A is said to be isomorphic to B, and we will write A ≅ B. ∆

EXAMPLE 7.9

Consider the two FSTs C = ⟨{a, b}, {0, 1}, {t₀, t₁, t₂, t₃}, t₀, δ_C, ω_C⟩, given in Figure 7.4, and D = ⟨{a, b}, {0, 1}, {q₀, q₁, q₂}, q₀, δ_D, ω_D⟩, displayed in Figure 7.5. The function μ: {t₀, t₁, t₂, t₃} → {q₀, q₁, q₂}, defined by μ(t₀) = q₀, μ(t₁) = q₁, μ(t₂) = q₂, and μ(t₃) = q₀, is a homomorphism between C and D.
Conditions (i) and (ii) are exactly the same criteria used for finite automata homomorphisms and have exactly the same interpretation: the start states must correspond and the transitions must match. The third condition is present to ensure that the properties of the ω function are respected; for example, since t₂ causes 1 to be printed when b is processed, so should the corresponding state in the D machine, which is q₂ = μ(t₂) in this example. Indeed, ω_C(t₂, b) = 1 = ω_D(μ(t₂), b). Such similarities extend to full strings also: note that ω̄_C(t₀, aab) = 001 = ω̄_D(μ(t₀), aab) in this example. These results can be generalized as presented in the next lemma.

∇ Lemma 7.1. If μ: S_A → S_B is a homomorphism between two FSTs A = ⟨Σ, Γ, S_A, s_0A, δ_A, ω_A⟩ and B = ⟨Σ, Γ, S_B, s_0B, δ_B, ω_B⟩, then

(∀s ∈ S_A)(∀x ∈ Σ*)(μ(δ̄_A(s, x)) = δ̄_B(μ(s), x))

and

(∀s ∈ S_A)(∀x ∈ Σ*)(ω̄_A(s, x) = ω̄_B(μ(s), x)).

Proof. The proof is by induction on |x| (see the exercises). ∆

∇ Corollary 7.1. If μ: S_A → S_B is a homomorphism between two FSTs A = ⟨Σ, Γ, S_A, s_0A, δ_A, ω_A⟩ and B = ⟨Σ, Γ, S_B, s_0B, δ_B, ω_B⟩, then A is equivalent to B; that is, f_A = f_B.

Proof. The proof follows immediately from Lemma 7.1 and the definition of the translation function. ∆

In a manner very reminiscent of the approach taken to minimize deterministic finite automata, notions of state equivalence relations, reduced machines, and connectedness can be defined. As was the case in Chapter 3, a reduced and connected machine will be isomorphic to every other equivalent minimal machine. The definition of connectedness is essentially unchanged.

∇ Definition 7.10. A state s in a transducer M = ⟨Σ, Γ, S, s₀, δ, ω⟩ is called accessible iff

(∃x_s ∈ Σ*)(δ̄(s₀, x_s) = s)

The transducer M = ⟨Σ, Γ, S, s₀, δ, ω⟩ is called connected iff

(∀s ∈ S)(∃x_s ∈ Σ*)(δ̄(s₀, x_s) = s) ∆

That is, every state s of S can be reached by some string (x_s) in Σ*; once again, the choice of the state s will have a bearing on which particular string is used as a representative.
States that are not accessible do not affect the translation performed by the transducer; such states can be safely deleted to form a connected version of the machine.

∇ Definition 7.11. Given a FST M = ⟨Σ, Γ, S, s₀, δ, ω⟩, define the transducer M^c = ⟨Σ, Γ, S^c, s₀^c, δ^c, ω^c⟩, called M connected, by

S^c = {s ∈ S | ∃x ∈ Σ* such that δ̄(s₀, x) = s}
s₀^c = s₀

δ^c is essentially the restriction of δ to S^c × Σ: (∀a ∈ Σ)(∀s ∈ S^c)(δ^c(s, a) = δ(s, a)), and ω^c is the restriction of ω to S^c × Σ: (∀a ∈ Σ)(∀s ∈ S^c)(ω^c(s, a) = ω(s, a)). ∆

M^c is, as in Chapter 3, the machine M with the unreachable states "thrown away." As with DFAs, trimming a machine in this fashion has no effect on the operation of the transducer. To formally prove this, the following lemma is needed.

∇ Lemma 7.2. Given transducers M = ⟨Σ, Γ, S, s₀, δ, ω⟩ and M^c = ⟨Σ, Γ, S^c, s₀^c, δ^c, ω^c⟩, the restriction of ω̄ to S^c × Σ* is ω̄^c.

Proof. We must show that (∀y ∈ Σ*)(∀t ∈ S^c)(ω̄^c(t, y) = ω̄(t, y)). This can be done with a straightforward induction on |y|. Let P(n) be defined by (∀y ∈ Σⁿ)(∀t ∈ S^c)(ω̄^c(t, y) = ω̄(t, y)). The basis step is trivial, since ω̄^c(t, λ) = λ = ω̄(t, λ). For the inductive step, assume (∀y ∈ Σᵐ)(∀t ∈ S^c)(ω̄^c(t, y) = ω̄(t, y)), and let t ∈ S^c and z ∈ Σᵐ⁺¹ be given. Then ∃x ∈ Σᵐ, ∃a ∈ Σ for which z = ax, and therefore

ω̄^c(t, z)
= ω̄^c(t, ax)  (by definition of z)
= ω^c(t, a)·ω̄^c(δ^c(t, a), x)  (by Definition 7.2ii)
= ω^c(t, a)·ω̄^c(δ(t, a), x)  (by definition of δ^c)
= ω^c(t, a)·ω̄(δ(t, a), x)  (by the induction assumption)
= ω(t, a)·ω̄(δ(t, a), x)  (by definition of ω^c)
= ω̄(t, ax)  (by Definition 7.2ii)
= ω̄(t, z)  (by definition of z)

Since z was an arbitrary element of Σᵐ⁺¹ and t was an arbitrary state in S^c, (∀y ∈ Σᵐ⁺¹)(∀t ∈ S^c)(ω̄^c(t, y) = ω̄(t, y)), which proves P(m + 1). Hence P(m) ⇒ P(m + 1), and, since m was arbitrary, (∀m ∈ ℕ)(P(m) ⇒ P(m + 1)). By the principle of mathematical induction, P(n) is therefore true for all n, and the lemma is proved. ∆
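The accessible-state set S^c of Definition 7.11 can be computed by a simple graph search from the start state. The sketch below is illustrative; its three-state machine is hypothetical, built so that state "s2" is inaccessible and would be discarded when forming M^c.

```python
from collections import deque

def connected_states(delta, start, alphabet):
    """Breadth-first search over the transition function: returns the set
    of states reachable from the start state (the S^c of Definition 7.11)."""
    reached, frontier = {start}, deque([start])
    while frontier:
        s = frontier.popleft()
        for a in alphabet:
            t = delta[(s, a)]
            if t not in reached:        # a newly accessible state
                reached.add(t)
                frontier.append(t)
    return reached

# Hypothetical machine: s2 has transitions but nothing reaches it.
DELTA = {("s0", "a"): "s0", ("s0", "b"): "s1",
         ("s1", "a"): "s0", ("s1", "b"): "s1",
         ("s2", "a"): "s0", ("s2", "b"): "s2"}
```

Restricting the delta and omega dictionaries to the returned set then yields δ^c and ω^c directly.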
Since ω̄^c agrees with ω̄ on S^c × Σ*, it immediately follows that f_M = f_M^c, and we are therefore assured that the operation of any transducer is indistinguishable from the operation of its connected counterpart.

∇ Theorem 7.3. Given transducers M = ⟨Σ, Γ, S, s₀, δ, ω⟩ and M^c = ⟨Σ, Γ, S^c, s₀^c, δ^c, ω^c⟩, M is equivalent to M^c.

Proof. f_M^c(x) = ω̄^c(s₀^c, x) = ω̄^c(s₀, x) = ω̄(s₀, x) = f_M(x), and hence, by the definition of equivalence of transducers, M is equivalent to M^c. ∆

∇ Corollary 7.2. Given a FTD function f, the minimal machine corresponding to f must be connected.

Proof (by contradiction). Assume the minimal machine M is not connected; then, by Theorem 7.3, f_M^c = f_M = f, and clearly ‖S^c‖ < ‖S‖, and hence M could not be minimal. ∆

While connectedness is a necessary condition for minimality, it is not sufficient, as evidenced by the machine C in Figure 7.4: C was connected, but the FST D in Figure 7.5 was an equivalent but smaller transducer. As was the case with finite automata in Chapter 3, connectedness is just one of the two major requirements for minimality. The other requirement is that no two states behave identically. For DFAs, this translated into statements about acceptance and rejection. For FSTs, this will instead involve the behavior of the output function. The analog to Definition 3.2 is given next.

∇ Definition 7.12. Given a transducer M = ⟨Σ, Γ, S, s₀, δ, ω⟩, the state equivalence relation on M, E_M, is defined by

(∀s ∈ S)(∀t ∈ S)(s E_M t ⇔ (∀x ∈ Σ*)(ω̄(s, x) = ω̄(t, x))) ∆

In other words, we will relate states s and t if and only if it is not possible to determine, by observing only the output, whether we are starting from state s or state t (no matter what input string is used). The more efficient machines will not have such duplication of states and, as with DFAs, will be said to be reduced.

∇ Definition 7.13. A transducer M = ⟨Σ, Γ, S, s₀, δ, ω⟩ is called reduced iff (∀s, t ∈ S)(s E_M t ⇒ s = t). ∆
As before, if M is reduced, E_M must be the identity relation on the set of states S, and each equivalence class must contain only a single element. We defer for the moment the discussion of how E_M can be efficiently calculated. Once the state equivalence relation is known, in a manner that is also analogous to the treatment of finite automata, states related by E_M can be coalesced to form a machine that is reduced.

∇ Definition 7.14. Given a FST M = ⟨Σ, Γ, S, s₀, δ, ω⟩, define M modulo its state equivalence relation, M/E_M, by M/E_M = ⟨Σ, Γ, S_EM, s_0EM, δ_EM, ω_EM⟩, where

S_EM = {[s]_EM | s ∈ S}
s_0EM = [s₀]_EM

δ_EM is defined by (∀a ∈ Σ)(∀[s]_EM ∈ S_EM)(δ_EM([s]_EM, a) = [δ(s, a)]_EM), and ω_EM is defined by (∀a ∈ Σ)(∀[s]_EM ∈ S_EM)(ω_EM([s]_EM, a) = ω(s, a)). ∆

The proof that δ_EM is well defined is similar to that found in Chapter 3. In an analogous fashion, ω_EM must be shown to be well defined (see the exercises). All the properties that one would expect of M/E_M are present, as outlined in the following theorem.

∇ Theorem 7.4. Given a FST M = ⟨Σ, Γ, S, s₀, δ, ω⟩, M/E_M = ⟨Σ, Γ, S_EM, s_0EM, δ_EM, ω_EM⟩ is equivalent to M and is reduced. Furthermore, if M is connected, so is M/E_M.

Proof. The proof that connectedness is preserved is identical to that given for Theorem 3.5; showing that M/E_M is reduced is very similar to the proof of Theorem 3.4. The proof that the two machines are equivalent requires the inductive argument that (∀y ∈ Σ*)(∀t ∈ S)(ω̄(t, y) = ω̄_EM([t]_EM, y)) and is indeed very similar to the proofs of Lemma 7.2 and Theorem 7.3. ∆

An argument similar to that given for Corollary 7.2 shows that being reduced is also a requirement for minimality.

∇ Corollary 7.3. Given a FTD function f, the minimal machine corresponding to f must be reduced.

Proof. The proof is by contradiction; see the exercises. ∆
Being reduced, like connectedness, is a necessary condition for a machine to be minimal, but it is also not sufficient (see the exercises). One would hope that the combination of being reduced and connected would be sufficient to guarantee that the given machine is minimal. This is indeed the case, and one more important result, proved next in Theorem 7.5, is needed to complete the argument: two reduced and connected FSTs are equivalent iff they are isomorphic. Armed with this result, we can also show that a minimal transducer can be obtained from any FST M by reducing and connecting it. As in Chapter 3, connecting and reducing an arbitrary machine M will therefore be guaranteed to produce the most efficient possible machine for that particular function.

∇ Theorem 7.5. Two reduced and connected FSTs, M₁ = ⟨Σ, Γ, S₁, s_01, δ₁, ω₁⟩ and M₂ = ⟨Σ, Γ, S₂, s_02, δ₂, ω₂⟩, are equivalent iff M₁ ≅ M₂.

Proof. By Corollary 7.1, if M₁ ≅ M₂, then M₁ is equivalent to M₂. The converse half of the proof is very reminiscent of that given for Theorem 3.1. We must assume M₁ and M₂ are equivalent and then prove that an isomorphism can be exhibited between M₁ and M₂. A natural way to define such an isomorphism is as follows: given a state s in M₁, choose a string x_s such that δ̄₁(s_01, x_s) = s, and let μ(s) = δ̄₂(s_02, x_s). At least one such string x_s must exist for each state of M₁, since M₁ was assumed to be connected. There may be several choices of x_s for a given state s, but all will yield the same value for δ̄₂(s_02, x_s), and so μ is well defined (see the exercises). The function μ satisfies the three properties of a homomorphism and turns out to be a bijection (see the exercises). Thus M₁ ≅ M₂. As will be clear from the exercises, the hypothesis that M₁ and M₂ are reduced and connected is crucial to the proof of this part of the theorem. ∆
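The connect-then-reduce pipeline can be sketched as a small partition-refinement routine: start from the partition induced by single-letter outputs and refine it with the transition function until it stabilizes, then coalesce each block into one representative state. The sample machine below is hypothetical, built so that states q1 and q2 are state equivalent; all names are illustrative.

```python
def run(delta, omega, s, x):
    """Translate x, Mealy style: print omega(s, a) during each transition."""
    out = []
    for a in x:
        out.append(omega[(s, a)])
        s = delta[(s, a)]
    return "".join(out)

def minimize_mealy(alphabet, delta, omega, s0):
    # Phase 1 (connect): keep only states reachable from s0.
    reached, frontier = {s0}, [s0]
    while frontier:
        s = frontier.pop()
        for a in alphabet:
            t = delta[(s, a)]
            if t not in reached:
                reached.add(t)
                frontier.append(t)
    # Phase 2 (reduce): initial blocks are single-letter output signatures ...
    block = {s: tuple(omega[(s, a)] for a in alphabet) for s in reached}
    while True:
        # ... and each round refines a block by the blocks of its successors.
        refined = {s: (block[s],) + tuple(block[delta[(s, a)]] for a in alphabet)
                   for s in reached}
        if len(set(refined.values())) == len(set(block.values())):
            break                      # no block split: the partition is stable
        block = refined
    # Phase 3 (coalesce): one representative state per equivalence class.
    rep = {}
    for s in sorted(reached):
        rep.setdefault(block[s], s)
    cls = {s: rep[block[s]] for s in reached}
    new_delta = {(cls[s], a): cls[delta[(s, a)]] for s in reached for a in alphabet}
    new_omega = {(cls[s], a): omega[(s, a)] for s in reached for a in alphabet}
    return set(cls.values()), new_delta, new_omega, cls[s0]

# Hypothetical three-state machine in which q1 and q2 behave identically.
D = {("q0", "a"): "q1", ("q0", "b"): "q0",
     ("q1", "a"): "q0", ("q1", "b"): "q2",
     ("q2", "a"): "q0", ("q2", "b"): "q1"}
W = {("q0", "a"): "0", ("q0", "b"): "0",
     ("q1", "a"): "1", ("q1", "b"): "1",
     ("q2", "a"): "1", ("q2", "b"): "1"}
```

The refinement loop stops as soon as a round fails to split any block, mirroring the stabilization argument formalized for the ith state equivalence relations later in this section.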
Note that Theorem 7.5 implies that, as long as we are dealing with reduced and connected machines, f_M1 = f_M2 iff M₁ ≅ M₂. The conclusions discussed earlier now follow immediately from Theorem 7.5.

∇ Corollary 7.4. Given a FST M, a necessary and sufficient condition for M to be minimal is that M is both reduced and connected.

Proof. See the exercises. ∆

∇ Corollary 7.5. Given a FST M, M^c/E_M^c is minimal.

Proof. Let M be a FST and let A be a minimal machine that is equivalent to M. By Corollaries 7.2 and 7.3, A must be both reduced and connected. By Theorems 7.3 and 7.4, M^c/E_M^c is also reduced, connected, and equivalent to M (and hence to A). Theorem 7.5 then guarantees that A and M^c/E_M^c are isomorphic, and therefore they have the same number of states. Since A was assumed to have the minimum possible number of states, M^c/E_M^c also has that property and is thus minimal. ∆

The minimal machine can therefore be found as long as M^c/E_M^c can be computed. Finding S^c (and from that M^c) is accomplished in exactly the same manner as described in Chapter 3. The strategy for generating E_M is likewise quite similar, and again uses the ith state equivalence relation, as outlined below.

∇ Definition 7.15. Given a transducer M = ⟨Σ, Γ, S, s₀, δ, ω⟩ and a nonnegative integer i, define a relation between the states of M called E_iM, the ith state equivalence relation on M, by

(∀s, t ∈ S)(s E_iM t ⇔ (∀x ∈ Σ* such that |x| ≤ i)(ω̄(s, x) = ω̄(t, x))) ∆

Thus E_iM relates states that cannot be distinguished by strings of length i or less, whereas E_M relates states that cannot be distinguished by any string of any length. All the properties attributable to the analogous relations for finite automata (E_iA) carry over, with essentially the same proofs, to the relations for finite-state transducers (E_iM).

∇ Lemma 7.3. Given a transducer M = ⟨Σ, Γ, S, s₀, δ, ω⟩:

a. E_m+1,M is a refinement of E_mM; that is, (∀s, t ∈ S)(s E_m+1,M t ⇒ s E_mM t).

b.
E_M is a refinement of E_mM; that is, (∀s, t ∈ S)(s E_M t ⇒ s E_mM t); hence, E_M ⊆ E_mM.

c. (∃m ∈ ℕ such that E_mM = E_m+1,M) ⇒ (∀k ∈ ℕ)(E_m+k,M = E_mM).

d. (∃m ∈ ℕ such that m ≤ ‖S‖ ∧ E_mM = E_m+1,M).

e. (∃m ∈ ℕ such that E_mM = E_m+1,M) ⇒ E_mM = E_M.

Proof. The proof is similar to the proofs given in Chapter 3 for E_iA (see the exercises). ∆

∇ Lemma 7.4. Given a FST M = ⟨Σ, Γ, S, s₀, δ, ω⟩:

a. E_0M has just one equivalence class, which consists of all of S.

b. E_1M is defined by s E_1M t ⇔ (∀a ∈ Σ)(ω(s, a) = ω(t, a)).

c. For i ≥ 1, E_i+1,M can be computed from E_iM as follows: (∀s ∈ S)(∀t ∈ S)(∀i ≥ 1)(s E_i+1,M t ⇔ s E_iM t ∧ (∀a ∈ Σ)(δ(s, a) E_iM δ(t, a))).

Proof. The proof is similar to the proofs given in Chapter 3 for E_iA (see the exercises). ∆

∇ Corollary 7.6. Given a FST M = ⟨Σ, Γ, S, s₀, δ, ω⟩, there is an algorithm for computing E_M.

Proof. Use Lemma 7.4 to compute successive E_iM relations from E_1M until E_iM = E_i+1,M; by Lemma 7.3, this E_iM will equal E_M, and this will all happen before i reaches ‖S‖, the number of states in S. Thus the procedure is guaranteed to halt. ∆

∇ Corollary 7.7. Given a FST M = ⟨Σ, Γ, S, s₀, δ, ω⟩, there is an algorithm for computing the minimal machine equivalent to M.

Proof. Using the algorithm for computing the set of connected states, M^c can be found. The output function is used to find E_1M^c, and the state transition function is then used to calculate successive relations until E_M^c is found. M^c/E_M^c can then be defined and will be the minimal machine equivalent to M. ∆

7.3 MOORE SEQUENTIAL MACHINES

Moore machines form another class of transducer that is equivalent in power to Mealy machines. They use a less complex output function, but often require more states than an equivalent Mealy machine to perform the same translation. An illustration of the convenience and utility of Moore machines can be found in Example 7.16, which demonstrates that traffic signal controllers can most naturally be modeled by the transducers discussed in this section.

∇ Definition 7.16.
A Moore sequential machine (MSM) with a distinguished start state is a sextuple ⟨Σ, Γ, S, s₀, δ, ω⟩, where:

i. Σ denotes the input alphabet.
ii. Γ denotes the output alphabet.
iii. S denotes the set of states, a finite nonempty set.
iv. s₀ denotes the start (or initial) state; s₀ ∈ S.
v. δ denotes the state transition function; δ: S × Σ → S.
vi. ω denotes the output function; ω: S → Γ. ∆

Note that the only change from Definition 7.1 is the specification of the domain of ω. Conceptually, we will envision the machine printing an output symbol as a new state is reached (rather than during the transition, as was the case for Mealy machines). Note that the output symbol can no longer depend (directly) on the current symbol being scanned; it is solely a function of the current state of the machine. Consequently, the state transition diagrams will list the output symbol next to the state name, separated by a slash. We will adopt the convention that no symbol is printed until the first character is read and a transition is made (an alternate view, not adopted here, is to decree that the machine print the symbol associated with s₀ when the machine is first turned on; in this case, an output string would be one character longer than its corresponding input string).

EXAMPLE 7.10

Let C = ⟨Σ, Γ, S, r₀, δ, ω⟩ be given by

Σ = {a, b}
Γ = {0, 1}
s₀ = r₀

The state transition table is shown in Table 7.2.

TABLE 7.2

δ | a | b
r₀ | r₀ | r₂
r₁ | r₀ | r₂
r₂ | r₁ | r₃
r₃ | r₁ | r₃

Finally, the output function is given by ω(r₀) = 0, ω(r₁) = 1, ω(r₂) = 0, and ω(r₃) = 1, or, more succinctly, ω(rᵢ) = i mod 2 for i = 0, 1, 2, 3. All the above information about C is contained in Figure 7.6. This Moore machine performs the same translation as the Mealy machine B in Example 7.2.

Figure 7.6  The state transition diagram for the transducer discussed in Example 7.10

Results that were targeted toward a FST in the previous sections were specific to Mealy machines.
When the descriptor "transducer" appears in the theorems and definitions presented earlier, the concept or result applies unchanged to both FSTs and MSMs. Most of these results are alluded to but not restated in this section. For example, δ̄ is defined like, and behaves like, the extended state transition functions for DFAs and FSTs. On the other hand, because of the drastic change in the domain of ω, ω̄ must be modified as outlined below in order for ω̄(s, x) to represent the output string produced when starting at s and processing x.

∇ Definition 7.17. Given a MSM A = ⟨Σ, Γ, S, s₀, δ, ω⟩, the extended output function for A, denoted again by ω̄, is a function ω̄: S × Σ* → Γ* defined recursively by:

i. (∀t ∈ S)(ω̄(t, λ) = λ)
ii. (∀t ∈ S)(∀x ∈ Σ*)(∀a ∈ Σ)(ω̄(t, ax) = ω(δ(t, a))·ω̄(δ(t, a), x)) ∆

Note that the domain of the function ω has been extended further than usual: in all previous cases, the domain was enlarged from S × Σ to S × Σ*; in this instance, we are beginning with a domain of only S and still extending it to S × Σ*. The above definition allows the following analog of Theorem 7.1 to remain essentially unchanged.

∇ Theorem 7.6. Let Σ be an alphabet and A = ⟨Σ, Γ, S, s₀, δ, ω⟩ be a Moore sequential machine. Then

(∀x ∈ Σ*)(∀y ∈ Σ*)(∀t ∈ S)(ω̄(t, yx) = ω̄(t, y)·ω̄(δ̄(t, y), x))

Proof. The proof is by induction on |y| (see the exercises and compare with Theorem 1.1). ∆

As before, the essence of a Moore machine is captured in the translation function that the machine describes.

∇ Definition 7.18. Given a MSM M = ⟨Σ, Γ, S, s₀, δ, ω⟩, the translation function for M, denoted by f_M, is the function f_M: Σ* → Γ* defined by f_M(x) = ω̄(s₀, x). ∆

Definition 7.5 applies to Moore machines; two MSMs are equivalent if they define the same translation. Indeed, it is possible for a Mealy machine to be equivalent to a Moore machine, as shown by the transducers in Figures 7.2 and 7.6.
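Definition 7.17's "move first, then print the symbol of the state reached" convention can be sketched directly. The dictionaries below are a transcription of the Moore machine C of Example 7.10 (Table 7.2); treat them as illustrative rather than authoritative.

```python
# Moore machine C of Example 7.10: transition table from Table 7.2,
# output function omega(r_i) = i mod 2.
M_DELTA = {("r0", "a"): "r0", ("r0", "b"): "r2",
           ("r1", "a"): "r0", ("r1", "b"): "r2",
           ("r2", "a"): "r1", ("r2", "b"): "r3",
           ("r3", "a"): "r1", ("r3", "b"): "r3"}
M_OMEGA = {"r0": "0", "r1": "1", "r2": "0", "r3": "1"}

def moore_translate(state, x):
    """Extended output per Definition 7.17: nothing is printed for the
    start state; each letter moves the machine and then prints the
    symbol attached to the state just reached."""
    out = []
    for a in x:
        state = M_DELTA[(state, a)]   # move first ...
        out.append(M_OMEGA[state])    # ... then print the new state's symbol
    return "".join(out)
```

Starting from r0, this reproduces the translation of the Mealy machine B: for example, moore_translate("r0", "abaabbaa") gives 00100110, matching Example 7.4.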
It is easy to turn a Moore machine A = ⟨Σ, Γ, S, s₀, δ, ω⟩ into an equivalent Mealy machine M = ⟨Σ, Γ, S, s₀, δ, ω′⟩. The first five parts of the transducer are unchanged. Only the sixth component (the output function) must be redefined, as outlined below.

∇ Definition 7.19. Given a Moore machine A = ⟨Σ, Γ, S, s₀, δ, ω⟩, the corresponding Mealy machine M is given by M = ⟨Σ, Γ, S, s₀, δ, ω′⟩, where ω′ is defined by

(∀a ∈ Σ)(∀s ∈ S)(ω′(s, a) = ω(δ(s, a))) ∆

Pictorially, all arrows that lead into a given state in the Moore machine should be labeled in the corresponding Mealy machine with the output symbol for that particular state. It follows easily from the definition that the corresponding machines perform the same translation.

∇ Theorem 7.7. Given a Moore machine A = ⟨Σ, Γ, S, s₀, δ, ω⟩, the corresponding Mealy machine M = ⟨Σ, Γ, S, s₀, δ, ω′⟩ is equivalent to A; that is, (∀x ∈ Σ*)(f_M(x) = f_A(x)).

Proof. The proof is by induction on |x| (see the exercises). ∆

EXAMPLE 7.11

Let A = ⟨Σ, Γ, S, r₀, δ, ω⟩ be the Moore machine given in Figure 7.6. The corresponding Mealy machine M = ⟨Σ, Γ, S, s₀, δ, ω′⟩ is then given by Σ = {a, b}, Γ = {0, 1}, s₀ = r₀, and the state transition table and the output function table are specified as in Tables 7.3a and 7.3b.

TABLE 7.3A

δ | a | b
r₀ | r₀ | r₂
r₁ | r₀ | r₂
r₂ | r₁ | r₃
r₃ | r₁ | r₃

TABLE 7.3B

ω′ | a | b
r₀ | 0 | 0
r₁ | 0 | 0
r₂ | 1 | 1
r₃ | 1 | 1

The new Mealy machine is shown in Figure 7.7. Note that the arrow labeled a leaving r₁ now has a 0 associated with it, since the state at which the arrow pointed (r₀/0) originally output a 0.

Figure 7.7  The state transition diagram for the Mealy machine M in Example 7.11

In a similar fashion, an equivalent Moore machine can be defined that corresponds to a given Mealy machine. However, due to the more restricted nature of the output function of the Moore constructs, the new machine will generally need more states to perform the same translation.
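Both conversion directions just discussed can be sketched as dictionary transformations. The runner functions and the sample tables (a transcription of the Moore machine of Example 7.10) are illustrative; the Mealy-to-Moore construct shown pairs each state with an output symbol, in the spirit of the S × Γ definition developed next in the text.

```python
def mealy_run(delta, omega, s, x):
    out = []
    for a in x:
        out.append(omega[(s, a)])      # Mealy: print during the transition
        s = delta[(s, a)]
    return "".join(out)

def moore_run(delta, omega_state, s, x):
    out = []
    for a in x:
        s = delta[(s, a)]              # Moore: move first ...
        out.append(omega_state[s])     # ... then print the new state's symbol
    return "".join(out)

def moore_to_mealy(delta, omega_state):
    """Definition 7.19: omega'(s, a) = omega(delta(s, a)); delta is reused."""
    return delta, {(s, a): omega_state[delta[(s, a)]] for (s, a) in delta}

def mealy_to_moore(delta, omega, gamma, s0, b0):
    """New states are pairs (s, b) in S x Gamma with
    delta'((s, b), a) = (delta(s, a), omega(s, a)) and omega'((s, b)) = b;
    the new start state is (s0, b0) for an arbitrary b0 in Gamma."""
    states = {s for (s, _) in delta}
    new_delta = {((s, b), a): (delta[(s, a)], omega[(s, a)])
                 for (s, a) in delta for b in gamma}
    new_omega = {(s, b): b for s in states for b in gamma}
    return new_delta, new_omega, (s0, b0)

# Moore machine C of Example 7.10, transcribed from Table 7.2.
C_DELTA = {("r0", "a"): "r0", ("r0", "b"): "r2",
           ("r1", "a"): "r0", ("r1", "b"): "r2",
           ("r2", "a"): "r1", ("r2", "b"): "r3",
           ("r3", "a"): "r1", ("r3", "b"): "r3"}
C_OMEGA = {"r0": "0", "r1": "1", "r2": "0", "r3": "1"}
```

Round-tripping C through both conversions leaves its translation unchanged, which is exactly the equivalence claimed by Theorems 7.7 and 7.8.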
The idea behind the construct is to break each state in the Mealy machine into a group of several similar states in the Moore machine, each of which prints a different output symbol. The new transition function mimics the old one: if state r maps to state t in the Mealy machine, then any state in the group corresponding to r will map to one particular state in the group of states corresponding to t. The particular state within the group is chosen in a manner that guarantees that the appropriate output symbol will be printed. This construct is implemented in the following definition.

∇ Definition 7.20. Given a Mealy machine M = ⟨Σ, Γ, S, s₀, δ, ω⟩, the corresponding Moore machine A is given by A = ⟨Σ, Γ, S × Γ, (s₀, a), δ′, ω′⟩, where a is an (arbitrary) member of Γ, δ′ is defined by

(∀s ∈ S)(∀b ∈ Γ)(∀a ∈ Σ)(δ′((s, b), a) = (δ(s, a), ω(s, a)))

and ω′ is defined by (∀s ∈ S)(∀b ∈ Γ)(ω′((s, b)) = b). ∆

∇ Theorem 7.8. Given a Mealy machine M = ⟨Σ, Γ, S, s₀, δ, ω⟩, the corresponding Moore machine A = ⟨Σ, Γ, S × Γ, (s₀, a), δ′, ω′⟩ is equivalent to M; that is, (∀x ∈ Σ*)(f_A(x) = f_M(x)).

Proof. The proof is by induction on |x| (see the exercises). ∆

Since every Mealy machine has an equivalent Moore machine and every Moore machine has an equivalent Mealy machine, either construct can be used as a basis of what was meant by a translation f being finite transducer definable.

∇ Corollary 7.8. A translation f is FTD iff f can be defined by a FST M iff f can be defined by a MSM A.

Proof. The proof is immediate from the definition of FTD and Theorems 7.7 and 7.8. ∆

EXAMPLE 7.12

Consider the Mealy machine B from Figure 7.3. The corresponding Moore machine A = ⟨Σ, Γ, S, q₀, δ, ω⟩ is given by

Σ = {a, b}
Γ = {0, 1}
S = {(s₀, 0), (s₀, 1), (s₁, 0), (s₁, 1)}
q₀ = (s₀, 1)
ω((s₀, 0)) = 0, ω((s₀, 1)) = 1, ω((s₁, 0)) = 0, ω((s₁, 1)) = 1

and the state transition table is specified as in Table 7.4.
TABLE 7.4

δ | a | b
(s₀, 0) | (s₀, 0) | (s₁, 0)
(s₀, 1) | (s₀, 0) | (s₁, 0)
(s₁, 0) | (s₀, 1) | (s₁, 1)
(s₁, 1) | (s₀, 1) | (s₁, 1)

Figure 7.8 displays this new Moore machine. Note that this transducer A, except for the placement of the start state, looks very much like the Moore machine C given in Figure 7.6. Indeed, any ordered pair that is labeled with the original start state would be an acceptable choice for the new start state in the corresponding Moore machine. For example, the automaton A′, which is similar to A but utilizes (s₀, 0) as the new start state, is another Moore machine that is equivalent to the original Mealy machine B. The transition diagram for A′ is shown in Figure 7.9. In fact, by appropriately recasting the definition of isomorphism so that it applies to Moore sequential machines, it can be shown that A′ and C are isomorphic. The definition of isomorphic again guarantees that a renaming of the states can be found that preserves start states, transition functions, and output functions. Indeed, the definition of isomorphism agrees with that of Mealy machines (and of DFAs, for that matter) except in the specification of the correspondence between output functions. The formal definition is given below.

Figure 7.8  The state transition diagram for the Moore machine A in Example 7.12
Figure 7.9  The state transition diagram for the Moore machine A′ in Example 7.12

∇ Definition 7.21. Given two MSMs A = ⟨Σ, Γ, S_A, s_0A, δ_A, ω_A⟩ and B = ⟨Σ, Γ, S_B, s_0B, δ_B, ω_B⟩, and a function μ: S_A → S_B, μ is called a Moore machine isomorphism from A to B iff the following five conditions hold:

i. μ(s_0A) = s_0B.
ii. (∀s ∈ S_A)(∀a ∈ Σ)(μ(δ_A(s, a)) = δ_B(μ(s), a)).
iii. (∀s ∈ S_A)(ω_A(s) = ω_B(μ(s))).
iv. μ is a one-to-one function from S_A to S_B.
v. μ is onto S_B. ∆

EXAMPLE 7.13

The two Moore machines A′ in Figure 7.9 and C in Figure 7.6 are indeed isomorphic.
There is a function μ from the states of A′ to the states of C that satisfies all five properties of an isomorphism. This correspondence is given by μ((s₀, 0)) = r₀, μ((s₀, 1)) = r₁, μ((s₁, 0)) = r₂, and μ((s₁, 1)) = r₃, succinctly defined by μ((sᵢ, j)) = r_{2i+j} for i, j ∈ {0, 1}.

As before, a homomorphism is meant to represent a correspondence between states that preserves the algebraic structure of the transducer without necessarily being a bijection.

∇ Definition 7.22. Given two MSMs A = ⟨Σ, Γ, S_A, s_0A, δ_A, ω_A⟩ and B = ⟨Σ, Γ, S_B, s_0B, δ_B, ω_B⟩, and a function μ: S_A → S_B, μ is called a Moore machine homomorphism from A to B iff the following three conditions hold:

i. μ(s_0A) = s_0B.
ii. (∀s ∈ S_A)(∀a ∈ Σ)(μ(δ_A(s, a)) = δ_B(μ(s), a)).
iii. (∀s ∈ S_A)(ω_A(s) = ω_B(μ(s))). ∆

The isomorphism μ discussed in Example 7.13 is also a homomorphism. Preserving the algebraic structure of the transducer guarantees that the translation is also preserved: if A and B are homomorphic, then they are equivalent. The homomorphism criterion that applies to single letters once again extends to similar statements about strings, as outlined in Lemma 7.5.

∇ Lemma 7.5. If μ: S_A → S_B is a homomorphism between two MSMs A = ⟨Σ, Γ, S_A, s_0A, δ_A, ω_A⟩ and B = ⟨Σ, Γ, S_B, s_0B, δ_B, ω_B⟩, then

(∀s ∈ S_A)(∀x ∈ Σ*)(μ(δ̄_A(s, x)) = δ̄_B(μ(s), x))

and

(∀s ∈ S_A)(∀x ∈ Σ*)(ω̄_A(s, x) = ω̄_B(μ(s), x)).

Proof. The proof is by induction on |x| (see the exercises). ∆

∇ Corollary 7.9. If μ: S_A → S_B is a homomorphism between two MSMs A = ⟨Σ, Γ, S_A, s_0A, δ_A, ω_A⟩ and B = ⟨Σ, Γ, S_B, s_0B, δ_B, ω_B⟩, then A is equivalent to B; that is, f_A = f_B.

Proof. The proof follows immediately from Lemma 7.5 and the definition of f_M. ∆

It is interesting to note that the MSMs A in Figure 7.8 and A′ in Figure 7.9 are not isomorphic. In fact, there does not even exist a homomorphism (in either direction) between A and A′, since the start states print different symbols, and rules (i) and (iii) therefore conflict.
The absence of an isomorphism in this instance illustrates that an analog to Theorem 7.5 cannot be asserted under the definition of Moore sequential machines presented here. Observe that A and A′ are equivalent and both are minimal (four states are necessary in a Moore machine to perform this translation), yet they are not isomorphic. The reader should contrast this failure with the analogous statement about Mealy machines in Theorem 7.5. Producing a comparable result is not possible without a fundamental adjustment of at least one of the definitions.

One possibility is to drop the distinguished start state from the definition of the Moore machine. This removes condition (i) from the isomorphism definition and thereby resolves the conflict between (i) and (iii). We have already noted that many applications do not require a distinguished start state (such as elevators and traffic signal controllers), which makes this adjustment not altogether unreasonable.

A more common alternative is to decree that a Moore sequential machine first print the character specified by the start state upon being turned on (before any of the input tape is read) and then proceed as before. This results in output strings that are always one symbol longer than the corresponding input strings, and the length-preserving property of transducers is thereby lost. A more substantial drawback results from the less natural correspondence between Mealy and Moore machines: no FST can be truly equivalent to any MSM, since translations would not even be of the same length. The advantage of this decree is that machines like A and A′ (from Figures 7.8 and 7.9) would no longer be equivalent, and hence they would not be expected to be isomorphic. Note that equivalence is lost since, under the new decree for translations, they would produce different output when presented with, say, λ as input: A would print 1 while A′ would produce 0.
Our definition of a MSM (Definition 7.16) was chosen to remain compatible with the translations obtained from Mealy machines and to preserve a distinguished state as the start state; these advantages were obtained at the expense of a convenient analog to Theorem 7.5. A third, and perhaps the best, alternative is to modify what we mean by a MSM isomorphism. Definition 7.21 can be rephrased to relax the condition that the start states of the two machines must print the same character. As with Mealy machines, Moore machines can also be minimized, and a reduced and connected MSM is guaranteed to be the smallest MSM that performs its translation. Note that Definitions 7.4 (FTD), 7.5 (equivalence), 7.9 (isomorphic), 7.10 (connected), 7.12 (state equivalence relation), 7.13 (reduced), and 7.15 (ith relation) have been phrased to encompass both forms of transducers. Minor changes (generally involving the domain of the output function) are all that is necessary to make the remaining definitions and results conform to the Moore constructs. We begin with a formal definition of minimality, which is in essence the same as the definitions presented for DFAs and FSTs (Definitions 2.7 and 7.6).

∇ Definition 7.23. Given a MSM A = ⟨Σ, Γ, S_A, s_0A, δ_A, ω_A⟩, A is the minimal Moore machine for the translation f_A iff for all MSMs B = ⟨Σ, Γ, S_B, s_0B, δ_B, ω_B⟩ for which f_A = f_B, ‖S_A‖ ≤ ‖S_B‖. Δ

A connected Moore machine is essential to minimality. The previous definition of connectedness (Definition 7.10) suffices for both FSTs and MSMs and was therefore phrased to apply to all transducers, rather than to one specific type of transducer. For an arbitrary Moore machine, the algorithm for finding the set of accessible states is unchanged; transitions are followed from the start state until no further new states are found.
The connected version of a MSM is again obtained by paring down the state set to encompass only the connected states and restricting the δ and ω functions to the smaller domain.

∇ Definition 7.24. Given a MSM M = ⟨Σ, Γ, S, s_0, δ, ω⟩, define the transducer M^c = ⟨Σ, Γ, S^c, s_0^c, δ^c, ω^c⟩, called M connected, by
S^c = {s ∈ S | (∃x ∈ Σ*)(δ(s_0, x) = s)}
s_0^c = s_0
δ^c is essentially the restriction of δ to S^c × Σ: (∀a ∈ Σ)(∀s ∈ S^c)(δ^c(s, a) = δ(s, a)), and
ω^c is the restriction of ω to S^c: (∀s ∈ S^c)(ω^c(s) = ω(s)). Δ

The concept of a reduced Moore machine and the definition of the state equivalence relation are identical in spirit and in form to those presented for Mealy machines (Definitions 7.12 and 7.13). The definition that outlines how to reduce a Moore machine by coalescing states differs from that given for FSTs (Definition 7.14) only in the specification of the output function. In both Definition 7.14 and the following Moore machine analog, the value ω takes for an equivalence class is determined by the value given for a representative of that equivalence class. As before, this natural definition for the output function can be shown to be well defined (see the exercises).

∇ Definition 7.25. Given a MSM M = ⟨Σ, Γ, S, s_0, δ, ω⟩, define M/E_M, M modulo its state equivalence relation, by M/E_M = ⟨Σ, Γ, S_EM, s_0EM, δ_EM, ω_EM⟩, where
S_EM = {[s]_EM | s ∈ S}
s_0EM = [s_0]_EM
δ_EM is defined by (∀a ∈ Σ)(∀[s]_EM ∈ S_EM)(δ_EM([s]_EM, a) = [δ(s, a)]_EM), and
ω_EM is defined by (∀[s]_EM ∈ S_EM)(ω_EM([s]_EM) = ω(s)). Δ

The Moore machine M/E_M has all the properties attributed to the Mealy version. Without changing the nature of the translation, it is guaranteed to produce a MSM which is reduced.

∇ Theorem 7.9. Given a MSM M = ⟨Σ, Γ, S, s_0, δ, ω⟩:
a. M/E_M = ⟨Σ, Γ, S_EM, s_0EM, δ_EM, ω_EM⟩ is equivalent to M.
b. M/E_M is reduced.
c. If M is connected, so is M/E_M.
d. Given a FTD function f, the minimal Moore machine corresponding to f must be reduced.
Proof. The proof is similar to Theorem 7.4 (see the exercises). Δ
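The accessible-states computation behind Definition 7.24 is a simple graph search from the start state; a minimal sketch, with the tuple-and-dictionary machine encoding being our own convention:

```python
# Sketch of Definition 7.24: paring a Moore machine down to its connected
# (accessible) states. Encoding (our assumption): (states, start, delta, omega).

def connected(machine, sigma):
    states, start, delta, omega = machine
    reachable = {start}
    frontier = [start]
    while frontier:                 # follow transitions until no new
        s = frontier.pop()          # states are found
        for a in sigma:
            t = delta[(s, a)]
            if t not in reachable:
                reachable.add(t)
                frontier.append(t)
    # restrict delta to S^c x Sigma and omega to S^c
    delta_c = {(s, a): delta[(s, a)] for s in reachable for a in sigma}
    omega_c = {s: omega[s] for s in reachable}
    return (reachable, start, delta_c, omega_c)
```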
As mentioned earlier, the definition of a MSM chosen here denies a convenient analog to Theorem 7.5. However, a reduced and connected Moore machine must be minimal.

∇ Theorem 7.10.
(a) Given a MSM M, a necessary and sufficient condition for M to be minimal is that M is both reduced and connected.
(b) Given a MSM M, M^c/E_{M^c} is minimal.
Proof. See the exercises. Δ

The minimal Moore machine corresponding to a MSM M can thus be obtained if the connected state set and the state equivalence relation can be computed. The algorithm for calculating the accessible states is the same as before, and computing the state equivalence relation will again be accomplished using the concept of the ith state equivalence relation (Definition 7.15). All the results proved previously in Lemma 7.3 still hold, showing that successive calculations are guaranteed to halt and produce E_M. All that remains is to specify both a starting point and a way to find the next relation from the current E_iM. With Mealy machines, E_0M consisted of one single equivalence class, since λ could not distinguish between states; all states were therefore related to each other under E_0M. With Moore machines, different states cause different letters to be printed. E_0M can therefore be thought of as grouping together states that print the same symbol.

∇ Lemma 7.6. Given a MSM M = ⟨Σ, Γ, S, s_0, δ, ω⟩:
(a) E_0M is defined by s E_0M t ⇔ (ω(s) = ω(t)).
(b) For i ≥ 0, E_{i+1}M can be computed from E_iM as follows:
(∀s ∈ S)(∀t ∈ S)(∀i ≥ 0)(s E_{i+1}M t ⇔ s E_iM t ∧ (∀a ∈ Σ)(δ(s, a) E_iM δ(t, a))).
Proof. The proof is essentially the same as in Chapter 3 (see Theorem 3.8). Δ

∇ Corollary 7.10. Given a MSM M = ⟨Σ, Γ, S, s_0, δ, ω⟩, there is an algorithm for computing E_M.
Proof. See the exercises. Δ

E_0M will generally have one equivalence class for each symbol in Γ; rk(E_0M) could be less than ‖Γ‖ if some output symbols are not printed by any state (remember that equivalence classes are by definition nonempty).
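Lemma 7.6 translates directly into a partition-refinement loop: start from the output-based partition E_0M and refine until the number of blocks stops growing. A sketch, again with the machine encoding an assumption of ours:

```python
# Sketch of Lemma 7.6: compute E_M for a Moore machine by refining the
# partition that groups states printing the same symbol.

def moore_state_equivalence(machine, sigma):
    states, _, delta, omega = machine
    # E_0M: s E_0M t  iff  omega(s) = omega(t)
    block = {s: omega[s] for s in states}
    while True:
        # s E_{i+1}M t iff s E_iM t and delta(s,a) E_iM delta(t,a) for all a
        refined = {s: (block[s],) + tuple(block[delta[(s, a)]] for a in sigma)
                   for s in states}
        if len(set(refined.values())) == len(set(block.values())):
            return refined      # partition stabilized: this is E_M
        block = refined
```

Two states are equivalent exactly when they receive the same block label; since each pass either stabilizes or strictly increases the number of blocks (bounded by ‖S‖), the loop halts, in accord with Lemma 7.3.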
The rule for computing E_{i+1}M from E_iM is identical to that given for Mealy machines (and DFAs); only the starting point, E_0M, had to be redefined for Moore machines (compare with Lemma 7.4). Lemmas 7.3 and 7.6 imply that there is an algorithm for finding E_M for any Moore machine M; this was the final computation needed to produce M^c/E_{M^c}, which will be the minimal Moore machine equivalent to the MSM M.

∇ Corollary 7.11. Given a MSM M = ⟨Σ, Γ, S, s_0, δ, ω⟩, there is an algorithm for computing the minimal machine equivalent to M.
Proof. See the exercises. Δ

7.4 TRANSDUCER APPLICATIONS AND CIRCUIT IMPLEMENTATION

The vending machine example that began this chapter showed that the transducer is capable of modeling many of the machines we deal with in everyday life. This section gives examples of several types of applications and then shows how to form the circuitry that will implement such transducers. Transducers can be used not only to model physical machinery, but can also form the basis for computational algorithms. The following example is best thought of not as a model of a machine that receives files, but as a model of the behavior of the computer algorithm that specifies how such files are to be received.

EXAMPLE 7.14
The transducer metaphor is often used to succinctly describe the structure of many algorithms commonly used in computer applications, most notably in network communications. Kermit is a popular means of transferring files between mainframes and microcomputers. A transfer is accomplished by the send portion of Kermit on the source host exchanging information with the receive portion of Kermit on the destination host. The two processes communicate by exchanging packets of information; these packets comprise the input alphabet of our model.
When the Kermit protocol was examined in Chapter 1 (Example 1.16), it was noted that a full description of the algorithm must also describe the action taken upon receipt of an incoming packet; these actions comprise the output alphabet of our model. During a file transfer, the states of the receiving portion of Kermit on the destination host are R (awaiting a transfer request), RF (awaiting the name of the file to be transferred), RD (awaiting more data to be placed in the new file), and A (abort due to an unrecoverable error). The set of states will again be {A, R, RD, RF}. Expected inputs are represented by S (an initialization packet, indicating that a transfer is requested), H (a header packet, containing the name of one of the files to be created and opened), D (a data packet), Z (an end-of-file marker, signaling that no more data need be placed in the currently opened file), and B (break, signaling the end of transmission). Unexpected input, representing a garbled transmission, is denoted by X. The input alphabet is therefore Σ = {B, D, H, S, X, Z}. When Kermit receives a recognizable packet, it sends an acknowledgment (ACK) back to the other host. This action will be represented in the output alphabet by the symbol Y. When the receiver expects and gets a valid header packet, it opens the appropriate file and also acknowledges the packet. This pair of actions is represented by the output symbol O. W will denote the writing of the packet contents to the opened file and acknowledgment of the packet, and Ψ will denote that no action is taken. C will indicate that the currently opened file is closed. N will represent the transmission of a NAK (negative acknowledgment), which is used to alert the sender that a garbled packet was detected. The output alphabet is therefore Γ = {C, N, O, W, Y, Ψ}. The complete algorithm is summed up in the state transition diagram given in Figure 7.10. Hardware as well as software can be profitably modeled by finite-state transducers.
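The receive algorithm of Example 7.14 fits a simple dictionary-driven pattern in software. Since Figure 7.10 is not reproduced in the text, the handful of transitions below are illustrative assumptions pieced together from the state and symbol descriptions, not the book's complete table; treating every unexpected packet as a NAK with no state change is likewise our assumption:

```python
# Sketch of the receive side of Kermit as a table-driven transducer.
# The transitions below are ASSUMED from the prose; Figure 7.10 holds
# the authoritative table.

delta = {('R', 'S'): 'RF',    # transfer requested: await filename
         ('RF', 'H'): 'RD',   # header received: open file, await data
         ('RD', 'D'): 'RD',   # data packet: write it, keep waiting
         ('RD', 'Z'): 'RF'}   # end of file: await the next header
omega = {('R', 'S'): 'Y',     # acknowledge
         ('RF', 'H'): 'O',    # open and acknowledge
         ('RD', 'D'): 'W',    # write and acknowledge
         ('RD', 'Z'): 'C'}    # close the file

def receive(packets, state='R'):
    out = []
    for p in packets:
        out.append(omega.get((state, p), 'N'))  # unrecognized packet: NAK
        state = delta.get((state, p), state)    # hypothetical: stay put
    return ''.join(out), state
```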
The column-by-column addition of two binary numbers is quite naturally modeled by a simple two-state FST, since the carry bit is the only piece of previous history needed by the transducer to correctly sum the current column. This discussion will focus on binary numbers in order to keep the alphabets small, but trivial extensions will make the two-state machine apply to addition in any base system.

EXAMPLE 7.15
A computation such as the one shown in Figure 7.11a would be divided up into columns and presented to the FST as indicated in Figure 7.11b (shown in mid-computation). A digit from the first number and the corresponding digit from the second number are presented to the transducer as a single input symbol. With the column pairs represented by standard ordered pairs, the corresponding input tape might appear as in Figure 7.12 (shown at the start of computation).

Figure 7.11 (a) The addition problem 00110 + 00011 discussed in Example 7.15 (b) Conceptual model of the binary adder discussed in Example 7.15
Figure 7.12 The binary adder discussed in Example 7.15

As illustrated by the orientation of the tape, this FST must be set up to process strings in reverse, that is, from right to left, since computations must start with the low-order bits to ensure that the correct answer is always (deterministically) computed. With states C (representing carry) and N (no carry), input alphabet Σ = {⟨0,0⟩, ⟨0,1⟩, ⟨1,0⟩, ⟨1,1⟩} and output alphabet Γ = {0, 1}, this binary adder behaves as shown in the state transition diagram given for B in Figure 7.13. For the problem displayed in Figure 7.11a, the output produced by B would be 01001 (9 in binary), which is the appropriate translation of the addition problem given (6 + 3).
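Since the behavior of B follows from ordinary column addition, the adder can be simulated directly. A minimal sketch; the eight entries of the transition/output table are reconstructed from the arithmetic rather than copied from Figure 7.13:

```python
# The two-state binary adder of Example 7.15: N = no carry, C = carry.
# (state, (x, y)) -> (output bit, next state), derived from column addition.
step = {('N', (0, 0)): (0, 'N'), ('N', (0, 1)): (1, 'N'),
        ('N', (1, 0)): (1, 'N'), ('N', (1, 1)): (0, 'C'),
        ('C', (0, 0)): (1, 'N'), ('C', (0, 1)): (0, 'C'),
        ('C', (1, 0)): (0, 'C'), ('C', (1, 1)): (1, 'C')}

def add(x, y):
    """Add two equal-length binary strings by feeding column pairs to the
    transducer low-order bit first, as in Figure 7.12."""
    state, out = 'N', []
    for a, b in zip(reversed(x), reversed(y)):
        bit, state = step[(state, (int(a), int(b)))]
        out.append(str(bit))
    return ''.join(reversed(out))   # undo the right-to-left processing

# add('00110', '00011') -> '01001', i.e., 6 + 3 = 9
```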
Unfortunately, addition is not truly length preserving; adding the three-digit numbers 110 and 011 produces a binary answer that is four digits long. The adder B defined in Example 7.15 cannot correctly reflect a carry out of the most significant binary position. While the concept of final states is not present in our formal definition of transducers, this FST B provides an example in which it is natural to both produce continuous output and track the terminal state: if a computation ends in state C, then we know that an overflow condition has occurred. B clearly operates correctly on all strings that have been padded with ⟨0,0⟩ as the last (leftmost) symbol; employing such padding is reminiscent of the use of the <EOS> symbol when building circuits for DFAs. Indeed, it might be profitable to specifically include an <EOS> symbol and have the transducer react to <EOS> by printing a y or n to indicate whether or not there was overflow.

Figure 7.13 The state transition diagram for a binary adder modeled as a Mealy machine, as discussed in Example 7.15

While the binary adder is only one small component of a computer, finite-state transducers can be profitably used to model complete systems; one such application involves traffic lights. The controller for a large intersection may handle eight banks of traffic signals for the various straight-ahead and left-turn lanes, as well as four sets of walk lights (see the exercises). Input about the intersection conditions is often fed to the controller from pedestrian walk buttons and metal detectors embedded in the roadway. We will use a simplified intersection to illustrate how to model a traffic controller by a transducer. The simplified example nevertheless incorporates all the essential features of the more intricate intersections; a full-blown model would only require larger alphabets and more states.
EXAMPLE 7.16
Consider a small north-south street that terminates as it meets a large east-west avenue, as shown in Figure 7.14. Due to the heavy traffic along the avenue, the westbound traffic attempting to turn left is governed by a left-turn signal (signal 2 in Figure 7.14). Traffic continuing west is controlled by signal 1, while signal 3 governs eastbound traffic. Vehicles entering the intersection from the south rely on signal 4. The red, yellow, and green lights of these four signals represent the output of the transducer. Protecting westbound traffic while turning left is accomplished by an output configuration of ⟨G, G, R, R⟩, which is meant to indicate that the first two signals are green while the eastbound and northbound lanes have red lights. The output alphabet can thus be represented by ordered foursomes of R, Y, and G (red, yellow, and green). We can succinctly define Γ = {R, Y, G} × {R, Y, G} × {R, Y, G} × {R, Y, G}, though there will be some combinations (like ⟨G, G, G, G⟩) that are not expected to appear in the model.

Figure 7.14 The intersection discussed in Example 7.16

The most prevalent output configuration is expected to be ⟨G, R, G, R⟩, which allows unrestricted flow of the east-west traffic on the avenue. Due to the relatively small amount of traffic on the north-south street, the designers of the intersection chose to embed the sensor α in the left-turn lane and β in the northbound lane and only depart from the ⟨G, R, G, R⟩ configuration when a vehicle is sensed by these detectors. There is therefore a pair of inputs to our transducer, indicating the status of sensor α and sensor β. The four combinations will be represented by ⟨0,0⟩ (no traffic above either sensor), ⟨1,0⟩ (sensor α active), ⟨0,1⟩ (sensor β active), and ⟨1,1⟩ (both detectors have currently sensed vehicles).
The controller is most naturally modeled by a Moore machine, since the state of the system is so intimately tied to the status of the four lights. From the configuration ⟨G, R, G, R⟩, activation of the β sensor signifies that all traffic should be stopped except that governed by signal 4. The output should therefore move through the pattern ⟨Y, R, Y, R⟩ to ⟨R, R, R, G⟩ and remain in that state until the β sensor is deactivated. This and the other transitions are illustrated in Figure 7.15.

Figure 7.15 The state transition diagram for a stoplight modeled as a Moore machine, as discussed in Example 7.16

In actuality, the duration of patterns incorporating the yellow caution light is shorter than others. With the addition of extra states, a clock cycle length on the order of 5 seconds (commensurate with the typical length of a yellow light) could be used to govern the length of the different output configurations. For example, incorporating s_8 as shown in Figure 7.16 guarantees that the output ⟨R, R, R, G⟩ will persist for at least two cycles (10 seconds). From an engineering standpoint, complicating the finite-state control in this manner can be avoided by varying the clock cycle length.

Figure 7.16 The modified controller discussed in Example 7.16

We now discuss some of the hardware that comprises the heart of traffic controllers and vending machines. As was done with deterministic finite automata in Chapter 1 and nondeterministic finite automata in Chapter 4, finite-state transducers can be implemented with digital logic circuits. We again use a clock pulse, D flip-flops, and an encoding for the states. Besides needing an encoding for the input alphabet, it is now necessary to have an encoding for the output alphabet, which will be represented by the bits w_1, w_2, w_3, ....
We again suggest (solely for simplicity and standardization in the exercises) ordering the symbols in Γ alphabetically and assigning binary codes in ascending order, as was recommended earlier for Σ. We must construct a circuit for generating each w_i, in the same manner as we built circuits implementing the accept function for finite automata. Many practical applications of FSTs (such as traffic signals) operate continuously, rather than starting and stopping for one small string. In such cases, an <EOS> symbol is not necessary; the circuit operates until power is shut off. Similarly, an <SOS> symbol is not essential for a traffic signal complex; upon resuming operation after a power failure, it is usually immaterial whether east-west traffic first gets a green light or whether it gets a red light in deference to the north-south traffic. In contrast, it is important for vending machines to initialize to the proper state, or some interesting discounts could be obtained by playing with the power cord.

EXAMPLE 7.17
Consider the FST displayed in Figure 7.17. If <EOS> and <SOS> are unnecessary, then the input alphabet can be represented by a single bit a_1, with a_1 = 0 representing c and a_1 = 1 representing d. Similarly, the output alphabet can be represented by a single bit w_1, with w_1 = 0 representing a and w_1 = 1 representing b. The states can likewise be represented by a single bit t_1, with t_1 = 0 representing s_0 and t_1 = 1 representing s_1.

Figure 7.17 The state transition diagram for the Mealy machine in Example 7.17

As before, we can construct a truth table to represent the state transition function, defining t_1' in terms of t_1 and a_1. The complete table is given in Table 7.5a.

TABLE 7.5a
t_1  a_1  t_1'
 1    1    1
 0    1    0
 1    0    0
 0    0    1

The principal disjunctive normal form for the transition function is therefore seen to be t_1' = (t_1 ∧ a_1) ∨ (¬t_1 ∧ ¬a_1). The output function can be found in a similar manner, as shown in Table 7.5b.
TABLE 7.5b
t_1  a_1  w_1
 1    1    0
 0    1    1
 1    0    1
 0    0    1

Thus, w_1 = ¬(t_1 ∧ a_1). As in Example 1.12, the circuit for t_1' will be fed back into the D flip-flop(s); the circuit for w_1 will form the output for the machine (replacing the acceptance circuit used in DFAs). The complete network is shown in Figure 7.18. Note that we would want the output device to print on the rising edge of the clock cycle, before the new value of t_1 propagates through the circuitry. A larger output alphabet would require an encoding of several bits; each w_i would have its own network of gates, and the complete circuit would then simultaneously generate several bits of output information. As in Chapter 1, additional states or input symbols will add bits to the other encoding schemes and add to the number of rows in the truth tables for δ and ω. Each additional state bit will also require its own D flip-flop and a new truth table for its feedback loop. Each additional state bit doubles the number of states that can be represented, which means that, as was the case with deterministic finite automata, the number of flip-flops grows as the logarithm of the number of states.

Figure 7.18 Circuit diagram for Example 7.17

EXERCISES

7.1. Let A = ⟨Σ, Γ, S, s_0, δ, ω⟩ be a Mealy machine. Prove the following statements from Theorem 7.1:
(a) (∀x ∈ Σ*)(∀y ∈ Σ*)(∀t ∈ S)(ω(t, yx) = ω(t, y)·ω(δ(t, y), x))
(b) (∀x ∈ Σ*)(∀y ∈ Σ*)(∀s ∈ S)(δ(s, yx) = δ(δ(s, y), x))

7.2. Refer to Lemma 7.1 and prove:
(a) (∀s ∈ S_A)(∀x ∈ Σ*)(μ(δ_A(s, x)) = δ_B(μ(s), x))
(b) (∀s ∈ S_A)(∀x ∈ Σ*)(ω_A(s, x) = ω_B(μ(s), x))

7.3. Prove Corollary 7.3.

7.4. Prove Corollary 7.4 by showing that a necessary and sufficient condition for a Mealy machine M to be minimal is that M is both reduced and connected.

7.5. Show that any FTD function f must satisfy a "pumping lemma".
(a) Devise the statement of a theorem that shows that the way any sufficiently long string is translated determines how an entire sequence of longer strings is translated.
(b) Prove the statement made in part (a).

7.6. In each of the following parts, you may assume the results in the preceding parts; for example, you may assume parts (a) and (b) when proving (c).
(a) Prove Lemma 7.3a.
(b) Prove Lemma 7.3b.
(c) Prove Lemma 7.3c.
(d) Prove Lemma 7.3d.
(e) Prove Lemma 7.3e.

7.7. Given a FST M = ⟨Σ, Γ, S, s_0, δ, ω⟩, prove the following statements from Lemma 7.4:
(a) E_0M has just one equivalence class, which consists of all of S.
(b) E_1M is defined by s E_1M t ⇔ (∀a ∈ Σ)(ω(s, a) = ω(t, a)).
(c) (∀s ∈ S)(∀t ∈ S)(∀i ≥ 1)(s E_{i+1}M t ⇔ s E_iM t ∧ (∀a ∈ Σ)(δ(s, a) E_iM δ(t, a))).

7.8. Prove Theorem 7.6 by showing that if A = ⟨Σ, Γ, S, s_0, δ, ω⟩ is a Moore machine, then (∀x ∈ Σ*)(∀y ∈ Σ*)(∀t ∈ S)(ω(t, yx) = ω(t, y)·ω(δ(t, y), x)).

7.9. Prove Theorem 7.7.

7.10. Prove Theorem 7.8.

7.11. Use Lemma 7.6 to find E_C in Example 7.10.

7.12. Show that there is a homomorphism from the machine M in Example 7.11 to the machine B in Example 7.2.

7.13. Prove that, in a FST M = ⟨Σ, Γ, S, s_0, δ, ω⟩, (∀t ∈ S)(∀a ∈ Σ)(ω(t, a) = ω̄(t, a)), where ω̄ denotes the extension of ω to strings.

7.14. Modify the vending machine in Example 7.1 so that it can return all the coins that have been inserted. Let r denote a new input that represents activating the coin return, and let a represent a new output corresponding to the vending machine releasing all the coins in its temporary holding area.

7.15. Given a FST M = ⟨Σ, Γ, S, s_0, δ, ω⟩ and M/E_M = ⟨Σ, Γ, S_EM, s_0EM, δ_EM, ω_EM⟩, show that δ_EM is well defined.

7.16. Given a FST M = ⟨Σ, Γ, S, s_0, δ, ω⟩ and M/E_M = ⟨Σ, Γ, S_EM, s_0EM, δ_EM, ω_EM⟩, show that ω_EM is well defined.

7.17. Give an example that shows that requiring a FST M to be reduced is not a sufficient condition to ensure that M is minimal.

7.18. Show that the function μ defined in the proof of Theorem 7.5 is well defined.

7.19.
Given the function μ defined in the proof of Theorem 7.5, prove that μ is really an isomorphism; that is:
(a) μ(s_01) = s_02.
(b) (∀s ∈ S_1)(∀a ∈ Σ)(μ(δ_1(s, a)) = δ_2(μ(s), a))
(c) (∀s ∈ S_1)(∀a ∈ Σ)(ω_1(s, a) = ω_2(μ(s), a))
(d) μ is a one-to-one function between S_1 and S_2.
(e) μ is onto S_2.

7.20. Consider a transducer that implements a "one-unit delay" over the alphabets Σ = {a, b} and Γ = {a, b, x}. The first letter of the output string should be x, and the nth letter of the output string should be the (n−1)st letter of the input string (for n > 1). Thus, ω(abbab) = xabba, and so on.
(a) Define a sextuple for a Mealy machine that will perform this translation.
(b) Draw a Mealy machine that will perform this translation.
(c) Define a sextuple for a Moore machine that will perform this translation.
(d) Draw a Moore machine that will perform this translation.

7.21. Consider the circuit diagram that would correspond to the vending machine in Example 7.1.
(a) Does there appear to be any reason to use an <EOS> symbol in the input alphabet? Explain.
(b) Does there appear to be any reason to use an <SOS> symbol in the input alphabet? Explain.
(c) How many encoding bits are needed for the input alphabet? Define an appropriate encoding scheme.
(d) How many encoding bits are needed for the output alphabet? Define an appropriate encoding scheme.
(e) How many encoding bits are needed for the state names? Define an appropriate encoding scheme.
(f) Write the truth table and corresponding (minimized) Boolean function for t_2. Try to make the best possible use of the don't-care combinations.
(g) Write the truth table and corresponding (minimized) Boolean function for w_2. Try to make the best possible use of the don't-care combinations.
(h) Define the other functions and draw the complete circuit for the vending machine.

7.22. Consider the vending machine described in Exercise 7.14.
(a) Does there appear to be any reason to use an <EOS> symbol in the input alphabet? Explain.
(b) How many encoding bits are needed for the input alphabet? Define an appropriate encoding scheme.
(c) How many encoding bits are needed for the output alphabet? Define an appropriate encoding scheme.
(d) How many encoding bits are needed for the state names? Define an appropriate encoding scheme.
(e) Write the truth table and corresponding (minimized) Boolean function for t_2. Try to make the best possible use of the don't-care combinations.
(f) Write the truth table and corresponding (minimized) Boolean function for w_3. Try to make the best possible use of the don't-care combinations.
(g) Define the other functions and draw the complete circuit for the vending machine.

7.23. Use the standard encoding conventions to draw the circuit corresponding to the FST defined in Example 7.2.

7.24. Use the standard encoding conventions to draw the circuit corresponding to the FST defined in Example 7.6.

7.25. Use the standard encoding conventions to draw the circuit corresponding to the FST D defined in Example 7.8.

7.26. Give an example that shows that requiring a FST M to be connected is not a sufficient condition to ensure that M is minimal.

7.27. Consider a transducer that implements a "two-unit delay" over the alphabets Σ = {a, b} and Γ = {a, b, x}. The first two letters of the output string should be xx, and the nth letter of the output string should be the (n−2)nd letter of the input string (for n > 2). Thus, ω(abbaba) = xxabba, and so on.
(a) Define a sextuple for a Mealy machine that will perform this translation.
(b) Draw a Mealy machine that will perform this translation.
(c) Define a sextuple for a Moore machine that will perform this translation.
(d) Draw a Moore machine that will perform this translation.

7.28. (a) Give an example that shows that the conclusion of Theorem 7.5 can be false if M_1 is not reduced.
(b) What essential property of the proposed isomorphism μ is now absent?

7.29. (a) Give an example that shows that the conclusion of Theorem 7.5 can be false if M_1 is not connected.
(b) What essential property of the proposed isomorphism μ is now absent?

7.30. (a) Give an example that shows that the conclusion of Theorem 7.5 can be false if M_2 is not reduced.
(b) What essential property of the proposed isomorphism μ is now absent?

7.31. (a) Give an example that shows that the conclusion of Theorem 7.5 can be false if M_2 is not connected.
(b) What essential property of the proposed isomorphism μ is now absent?

7.32. (a) Give an example of a FST A for which A is not reduced and A^c is not reduced.
(b) Give an example of a FST A for which A is not reduced and A^c is reduced.

7.33. Complete the proof of Theorem 7.4 by showing:
(a) (∀y ∈ Σ*)(∀t ∈ S)(ω(t, y) = ω_EM([t]_EM, y)).
(b) M/E_M is equivalent to M.
(c) M/E_M is reduced.
(d) If M is connected, then M/E_M is connected.

7.34. Let Σ = {0, 1} and Γ = {y, n}.
(a) Define f_1(a_1a_2...a_m) = y^m if a_1 = 1, and let f_1(a_1a_2...a_m) = n^m otherwise. Thus, f_1(10) = yy and f_1(0101) = nnnn. Demonstrate that f_1 is FTD.
(b) Define f_2(a_1a_2...a_m) = y^m if a_m = 1, and let f_2(a_1a_2...a_m) = n^m otherwise. Thus, f_2(10) = nn and f_2(0101) = yyyy. Prove that f_2 is not FTD.

7.35. Let Σ = {a, b} and Γ = {0, 1}. Define f_3(a_1a_2...a_m) to be the first m letters of the infinite sequence 01001000100001 0⁵1 0⁶1 0⁷1 0⁸1 ..., in which the kth run of 0s has length k. Thus, f_3(ababababab) = 0100100010 and f_3(abbaa) = 01001. Argue that f_3 is not FTD.

7.36. Assume f is FTD. Prove that (∀x ∈ Σⁿ)(∀y ∈ Σ*)(∀z ∈ Σ*) (the first n letters of f(xy) must agree with the first n letters of f(xz)).

7.37. Consider an elevator in a building with two floors. Floor 1 has an up button u on the wall, floor 2 has a down button d, and there are buttons labeled 1 and 2 inside the elevator itself. The four actions taken by the elevator are close the doors, open the doors, go to floor 1, and go to floor 2.
Assume that an inactive elevator will attempt to close the doors. For simplicity, assume that the model is not to incorporate sensors to test for improperly closed doors, nor are there buttons to hold the doors open, and the like. Also assume that when the elevator arrives on a given floor the call button for that floor is automatically deactivated, rather than modeling the shutoff as a component of the output.
(a) Define the input alphabet for this transducer (compare with Example 7.16).
(b) Define the output alphabet for this transducer.
(c) Define the Mealy sextuple that will model this elevator.
(d) Draw a Mealy machine that will model this elevator.
(e) Define the Moore sextuple that will model this elevator.
(f) Draw a Moore machine that will model this elevator.
(g) Without using <EOS> or <SOS>, draw a circuit that will implement the transducer defined in part (d).

7.38. Build a Mealy machine that will serve as a traffic signal controller for the intersection described in Example 7.16.

7.39. Consider the intersection described in Example 7.16 with walk signals added to the north-south crosswalks (only). As shown in Figure 7.19, there is an additional input sensor ν corresponding to the pedestrian walk button and an additional component of the output that will always be in one of two states (W for walk and D for don't walk). There are walk buttons at each of the corners, but they all trip the same single input sensor; similarly, the output for the walk light is displayed on each corner, but all the lights change at once and can be modeled as a single component. Assume that if the walk button is activated, all traffic but that on the side street is stopped, and the walk lights change from D to W. Further assume that the walk lights revert from W to D before the side street light turns to yellow.
Figure 7.19 The intersection discussed in Exercise 7.39

(a) Define the new input and output alphabets.
(b) Draw a Moore machine that implements this scenario.
(c) Draw a Mealy machine that implements this scenario.

7.40. Consider an intersection similar to that described in Example 7.16, as shown in Figure 7.20. There are now four left-turn signals in addition to the four straight-ahead signals, and additional input sensors ν and η for the other left-turn lanes. Assume that a normal alternation of straight-ahead traffic is carried out, with no left turns indicated unless the corresponding sensor is activated. Further assume that left-turn traffic will be allowed to precede the opposing traffic.

Figure 7.20 The intersection discussed in Exercise 7.40

(a) Define the new input and output alphabets.
(b) Draw a Moore machine that implements this scenario.
(c) Draw a Mealy machine that implements this scenario.

7.41. Consider an adder similar to the one in Example 7.15, but which instead models addition in base 3.
(a) Define the input and output alphabets.
(b) Draw a Mealy machine that performs this addition.
(c) Draw a Moore machine that performs this addition.
(d) Draw a circuit that will implement the transducer built in part (b); use both <EOS> and <SOS>.

7.42. Consider an adder similar to the one in Example 7.15, but which models addition in base 10.
(a) Define the input and output alphabets.
(b) Define the sextuple of a Mealy machine that performs this addition (by indicating the output and transitions by concise formulas, rather than writing out the 200 entries in the tables).
(c) Define the sextuple of a Moore machine that performs this addition.
(d) Draw a circuit that will implement the transducer built in part (b); use both <EOS> and <SOS>.

7.43.
Consider a function f₄ implementing addition in a manner similar to the function described by the transducer in Example 7.15, but that scans the characters (that is, columns of digits) from left to right (rather than right to left as in Example 7.15). Argue that f₄ is not FTD.
7.44. Given a MSM M, prove the following statements from Theorem 7.9:
(a) M/E_M is equivalent to M.
(b) M/E_M is reduced.
(c) If M is connected, so is M/E_M.
7.45. Given a FTD function f, prove that the minimal Moore machine corresponding to f must be reduced.
7.46. Given a MSM M, prove the following statements from Theorem 7.10:
(a) A necessary and sufficient condition for M to be minimal is that M is both reduced and connected.
(b) M_c/E_Mc is minimal.
7.47. Given MSMs A = <Σ, Γ, S_A, s_0A, δ_A, ω_A> and B = <Σ, Γ, S_B, s_0B, δ_B, ω_B>, and a homomorphism μ: S_A → S_B, prove the following statements from Lemma 7.5 and Corollary 7.9:
(a) (∀s ∈ S_A)(∀x ∈ Σ*)(μ(δ_A(s, x)) = δ_B(μ(s), x)).
(b) (∀s ∈ S_A)(∀x ∈ Σ*)(ω_A(s, x) = ω_B(μ(s), x)).
(c) A is equivalent to B; that is, f_A = f_B.
7.48. Prove Corollary 7.10.
7.49. Prove Corollary 7.11.
7.50. Given a FST M = <Σ, Γ, S, s₀, δ, ω> and M/E_M = <Σ, Γ, S_EM, s_0EM, δ_EM, ω_EM> defined by
S_EM = {[s]_EM | s ∈ S}
s_0EM = [s₀]_EM
δ_EM is defined by (∀a ∈ Σ)(∀[s]_EM ∈ S_EM)(δ_EM([s]_EM, a) = [δ(s, a)]_EM)
and ω_EM is defined by (∀a ∈ Σ)(∀[s]_EM ∈ S_EM)(ω_EM([s]_EM, a) = ω(s, a))
(a) Show that δ_EM is well defined.
(b) Show that ω_EM is well defined.
7.51. Given a MSM M = <Σ, Γ, S, s₀, δ, ω> and M/E_M = <Σ, Γ, S_EM, s_0EM, δ_EM, ω_EM> defined by
S_EM = {[s]_EM | s ∈ S}
s_0EM = [s₀]_EM
δ_EM is defined by (∀a ∈ Σ)(∀[s]_EM ∈ S_EM)(δ_EM([s]_EM, a) = [δ(s, a)]_EM)
and ω_EM is defined by (∀[s]_EM ∈ S_EM)(ω_EM([s]_EM) = ω(s))
(a) Show that δ_EM is well defined.
(b) Show that ω_EM is well defined.
7.52. Consider the following assertion: If there is an isomorphism from A to B and A is connected, then B must also be connected.
(a) Prove that this is true for isomorphisms between Mealy machines.
(b) Prove that this is true for isomorphisms between Moore machines.
7.53. Consider the following assertion: If there is an isomorphism from A to B and B is connected, then A must also be connected.
(a) Prove that this is true for isomorphisms between Mealy machines.
(b) Prove that this is true for isomorphisms between Moore machines.
7.54. Consider the following assertion: If there is a homomorphism from A to B and A is connected, then B must also be connected.
(a) Give an example of two Mealy machines for which this assertion is false.
(b) Give an example of two Moore machines for which this assertion is false.
7.55. Consider the following assertion: If there is a homomorphism from A to B and B is connected, then A must also be connected.
(a) Give an example of two Mealy machines for which this assertion is false.
(b) Give an example of two Moore machines for which this assertion is false.
7.56. Assume A and B are connected FSTs and that there exists an isomorphism ψ from A to B and an isomorphism μ from B to A. Prove that ψ = μ⁻¹.
7.57. Assume A and B are FSTs and there exists an isomorphism ψ from A to B and an isomorphism μ from B to A. Give an example for which ψ ≠ μ⁻¹.
7.58. Give an example of a three-state MSM for which E_0A has only one equivalence class. Is it possible for E_0A to be different from E_1A in such a machine? Explain.
7.59. (a) Give an example of a Mealy machine for which M is not connected and M/E_M is not connected.
(b) Give an example of a Mealy machine for which M is not connected but M/E_M is connected.
7.60. (a) Give an example of a Moore machine for which M is not connected and M/E_M is not connected.
(b) Give an example of a Moore machine for which M is not connected but M/E_M is connected.
7.61. For a homomorphism μ: S_A → S_B between two Mealy machines A = <Σ, Γ, S_A, s_0A, δ_A, ω_A> and B = <Σ, Γ, S_B, s_0B, δ_B, ω_B>, prove (∀s, t ∈ S_A)(μ(s) E_B μ(t) ⇒ s E_A t).
7.62.
For a homomorphism μ: S_A → S_B between two Moore machines A = <Σ, Γ, S_A, s_0A, δ_A, ω_A> and B = <Σ, Γ, S_B, s_0B, δ_B, ω_B>, prove (∀s, t ∈ S_A)(μ(s) E_B μ(t) ⇒ s E_A t).
7.63. (a) Give an example of a FST for which A is not reduced and A_c is not reduced.
(b) Give an example of a FST for which A is not reduced and A_c is reduced.
7.64. (a) Give an example of a MSM for which A is not reduced and A_c is not reduced.
(b) Give an example of a MSM for which A is not reduced and A_c is reduced.
7.65. Isomorphism (≅) is a relation in the set of all Mealy machines.
(a) Prove that ≅ is a symmetric relation; that is, formally justify that if there is an isomorphism from A to B then there is an isomorphism from B to A.
(b) Prove that ≅ is a reflexive relation.
(c) Show that if f and g are isomorphisms, then f∘g is also an isomorphism (whenever f∘g is defined).
(d) From the results of parts (a), (b), and (c) given above, prove that ≅ is an equivalence relation over the set of all Mealy machines.
(e) Show that homomorphism is not an equivalence relation over the set of all Mealy machines.
7.66. (a) Prove that ≅ is an equivalence relation in the set of all Moore machines.
(b) Show that homomorphism is not an equivalence relation over the set of all Moore machines.
7.67. Given a Mealy machine M = <Σ, Γ, S, s₀, δ, ω>, prove that there exists a homomorphism μ from M to M/E_M.
7.68. Given a Moore machine M = <Σ, Γ, S, s₀, δ, ω>, prove that there exists a homomorphism μ from M to M/E_M.
7.69. Consider the intersection presented in Example 7.16 and note that the construction presented in Figure 7.15 prevents the transducer from leaving s₂ or s₆ while the appropriate sensor is active. The length of time spent in each output configuration can be limited by replacing s₂ with a sequence of states that ensures that the output configuration will change within, say, three clock cycles (this is similar to the spirit in which s₅ was added).
A similar expansion can be made with regard to s₆. While this would not be a likely problem if the side street were not heavily traveled, higher-traffic situations would require a different solution than that shown in Figure 7.15.
(a) Modify Figure 7.15 so that the output configuration can, if necessary, remain at (R, R, R, G) for three clock cycles, but not for four clock cycles.
(b) Starting with the larger transducer found in part (a), make a similar expansion to s₆.
(c) Starting with the larger transducer found in part (a), make an expansion to s₆ in such a way that the left-turn signal is guaranteed to be green for a minimum of two clock cycles and a maximum of four clock cycles.
7.70. Consider the intersection presented in Example 7.16 and note that the construction presented in Figure 7.15 prevents the transducer from returning to s₀ while either of the sensors is active. Thus, even if the length of time spent in each output configuration was limited (see Exercise 7.69), left-turn and northbound traffic could perpetually alternate without ever allowing the east-west traffic to resume. This would not be a likely problem if the side street were not heavily traveled, but higher-traffic situations would require a different solution than the one presented in Example 7.16.
(a) Without adding any states to Figure 7.15, modify the state transition diagram so that east-west traffic will receive a green light occasionally.
(b) By adding new states to Figure 7.15 (to remember the last lanes that had the right of way), implement a controller that will ensure that no lane will get a second green light if any other lane that has an active sensor has yet to receive a green light. (It may be helpful to think of the east-west traffic as having an implicit sensor that is always actively demanding service.)
7.71. Prove that if two Moore machines are homomorphic then they are equivalent.
7.72. Show that, for any FTD function f: Σ* → Σ*, the class of FAD languages over Σ is closed under f.
CHAPTER 8

REGULAR GRAMMARS

In the preceding chapters, we have seen several ways to characterize the set of FAD languages: via DFAs, NDFAs, right congruences, and regular expressions. In this chapter we will look at still another way to represent this class, using the concept of grammars. This construct is very powerful, and many restrictions must be placed on the general definition of a grammar in order to limit its scope to the FAD languages. The very restrictive regular grammars will be explored in full detail in this chapter. The more robust classes of grammars introduced here will be discussed at length in later chapters.

8.1 OVERVIEW OF THE GRAMMAR HIERARCHY

Much like the rules given in Backus-Naur Form (BNF) in Chapters 0 and 1, the language-defining power of a grammar stems from the generation of strings through the successive replacement of symbols in a partially constructed string. These replacement rules form the foundation for the definition of programming languages and are used in compiler construction not only to determine correct syntax, but also to help determine the meaning of the statements and thereby guide the translation of a program into machine language.

EXAMPLE 8.1

A BNF that describes the set of all valid FORTRAN identifiers is given below. Recall that such identifiers must begin with a letter and be followed by no more than five other letters and numerals. These criteria can be specified by the following set of rules.

S ::= aS₁ | bS₁ | ··· | zS₁ | a | b | ··· | z
S₁ ::= aS₂ | bS₂ | ··· | zS₂ | a | b | ··· | z | 0S₂ | 1S₂ | 2S₂ | ··· | 9S₂ | 0 | 1 | 2 | ··· | 9
S₂ ::= aS₃ | bS₃ | ··· | zS₃ | a | b | ··· | z | 0S₃ | 1S₃ | 2S₃ | ··· | 9S₃ | 0 | 1 | 2 | ··· | 9
S₃ ::= aS₄ | bS₄ | ··· | zS₄ | a | b | ··· | z | 0S₄ | 1S₄ | 2S₄ | ··· | 9S₄ | 0 | 1 | 2 | ··· | 9
S₄ ::= aS₅ | bS₅ | ··· | zS₅ | a | b | ··· | z | 0S₅ | 1S₅ | 2S₅ | ··· | 9S₅ | 0 | 1 | 2 | ··· | 9
S₅ ::= a | b | ··· | z | 0 | 1 | 2 | ··· | 9

The first rule specifies that S can be replaced by any of the 26 letters of the Roman alphabet or by any such letter followed by the token S₁.
These productions (rules) do indeed define the variable names found in FORTRAN programs. Starting with S, a derivation might proceed as S ⇒ sS₁ ⇒ suS₂ ⇒ sum, indicating that sum is a valid FORTRAN identifier. Invalid identifiers, such as 2a, cannot be derived from these productions by starting with S.

EXAMPLE 8.2

The strings used to represent regular sets (see Chapter 6) could have been succinctly specified using BNF. Recall that regular languages over, say, {a, b, c} are described by regular expressions. These regular expressions were strings over the alphabet {∅, λ, a, b, c, ∪, ·, *, ), (}, and the formal definition was quite complex. A regular expression over {a, b, c} was defined to be a sequence of symbols formed by repeated application of the following rules:
i. a, b, c are each regular expressions.
ii. ∅ is a regular expression.
iii. λ is a regular expression.
iv. If R₁ and R₂ are regular expressions, then so is (R₁·R₂).
v. If R₁ and R₂ are regular expressions, then so is (R₁ ∪ R₂).
vi. If R₁ is a regular expression, then so is R₁*.
The conditions set forth above could have instead been succinctly specified by the BNF shown below.

R ::= a | b | c | λ | ∅ | (R·R) | (R∪R) | R*

The following is a typical derivation, culminating in the regular expression (a·(c∪λ))*.

R ⇒ R* ⇒ (R·R)* ⇒ (a·R)* ⇒ (a·(R∪R))* ⇒ (a·(c∪R))* ⇒ (a·(c∪λ))*

Note that in the intermediate steps of the derivation we do not wish to consider strings such as (a·R)* to be valid regular expressions. (a·R)* is not a string over the alphabet {∅, λ, a, b, c, ∪, ·, *, ), (}, and it does not represent a regular language over {a, b, c}. To generate a valid regular expression, the derivation must proceed until all occurrences of R are removed. To differentiate between the symbols that may remain and those that must be replaced, grammars divide the tokens into terminal symbols and nonterminal symbols, respectively.
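Since the BNF above is generative, deciding whether a given string is derivable from R calls for a separate cognitive procedure. The following recursive-descent recognizer is a sketch of such a procedure (it is not from the text), using ASCII stand-ins for the special symbols: `0` for the empty-set symbol, `e` for the empty-string symbol, `.` for concatenation, and `+` for union.

```python
# Sketch: a cognitive (recognizing) counterpart to the generative BNF for
# regular expressions over {a, b, c}.  ASCII stand-ins: '0' = empty set,
# 'e' = empty string, '.' = concatenation, '+' = union.
def is_regex(s: str) -> bool:
    pos = 0

    def parse() -> bool:                      # parse one expression derivable from R
        nonlocal pos
        if pos < len(s) and s[pos] in "abce0":
            pos += 1                          # a | b | c | lambda | empty set
        elif pos < len(s) and s[pos] == "(":
            pos += 1                          # (R.R) or (R+R)
            if not parse():
                return False
            if pos >= len(s) or s[pos] not in ".+":
                return False
            pos += 1
            if not parse():
                return False
            if pos >= len(s) or s[pos] != ")":
                return False
            pos += 1
        else:
            return False
        while pos < len(s) and s[pos] == "*":  # the postfix R* productions
            pos += 1
        return True

    return parse() and pos == len(s)

assert is_regex("(a.(c+e))*")    # the expression derived in Example 8.2
assert not is_regex("(a.R)*")    # R is a nonterminal, not a terminal symbol
```

The recognizer rejects exactly the strings, such as (a.R)*, in which a nonterminal survives; this is the cognitive/generative distinction made in the surrounding discussion.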
The following notational conventions will be used throughout the remainder of the text. Members of Σ will be represented by lowercase roman letters such as a, b, c, and d and will be referred to as terminal symbols. A new alphabet Ω will be introduced; its members will be represented by uppercase roman letters such as A, B, C, and S, and these will be called nonterminal symbols. S will often denote a special nonterminal, called the start symbol. The specification of the production rules will be somewhat different from the BNF examples given above. The common grammatical notation for rules such as S ::= aS₁ and S ::= bS₁ is S → aS₁ and S → bS₁. As with BNF, a convenient shorthand notation for a group of productions involves the use of the | (or) symbol. The productions Z → aaB, Z → ac, Z → cbT, which all denote replacements for Z, could be succinctly represented by Z → aaB | ac | cbT. A production can be thought of as a replacement rule; that is, A → cdba indicates that occurrences of the (nonterminal) A can be replaced by the string cdba. For example, the string abBAdBc can be transformed into the string abBcdbadBc by applying the production A → cdba; we will write abBAdBc ⇒ abBcdbadBc, and say that abBcdbadBc was derived (in one step) from abBAdBc. Productions may be applied in succession; for example, if both A → cdba and B → efB were available, then the following modifications of the string abBAdBc would be possible: abBAdBc ⇒ abBcdbadBc ⇒ abefBcdbadBc ⇒ abefefBcdbadBc, and we might write abBAdBc ⇒̄ abefefBcdbadBc to indicate that abBAdBc can produce abefefBcdbadBc in zero or more steps (three steps in this case). Note that the distinction between ⇒ and ⇒̄ is reminiscent of the difference between the state transition functions δ and δ̄. As with the distinction between the transducer output functions ω and ω̄, the overbar is meant to indicate the result of successive applications of the underlying operation. The symbol ⇒* is often used in place of ⇒̄.
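A single derivation step is ordinary string surgery: locate an occurrence of a production's left side and splice in its right side. The helper below is an illustrative sketch (the function name is ours, and the productions A → cdba and B → efB are used only as examples).

```python
# Sketch of one derivation step: apply a production lhs -> rhs to the leftmost
# occurrence of lhs in a sentential form.
def apply_production(sentential: str, lhs: str, rhs: str) -> str:
    """Replace the leftmost occurrence of lhs in sentential by rhs."""
    i = sentential.find(lhs)
    if i < 0:
        raise ValueError(f"{lhs!r} does not occur in {sentential!r}")
    return sentential[:i] + rhs + sentential[i + len(lhs):]

# A three-step derivation using A -> cdba and then B -> efB twice:
step1 = apply_production("abBAdBc", "A", "cdba")
step2 = apply_production(step1, "B", "efB")
step3 = apply_production(step2, "B", "efB")
assert (step1, step2, step3) == ("abBcdbadBc", "abefBcdbadBc", "abefefBcdbadBc")
```

Note that choosing the leftmost occurrence is only one policy; a derivation may in general rewrite any occurrence of the left side.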
As illustrated by Example 8.1, several nonterminals may be used in the grammar. The set of nonterminals in the grammar given for FORTRAN identifiers was comprised of {S, S₁, S₂, S₃, S₄, S₅}. The start symbol designates which of these nonterminals should always be used to begin derivations. The previous examples discussed in this section have illustrated all the essential components of a grammar. A grammar must specify the terminal alphabet, the set of intermediary nonterminal symbols, and the designated start symbol, and it must also enumerate the set of rules for replacing phrases within a derivation with other phrases. In the above examples, the productions have all involved the replacement of single nonterminals with other strings. In an unrestricted grammar, a general replacement rule may allow an entire string α to be replaced by another string β. Thus, aBcD → bcA would be a legal production, and thus whenever the sequence aBcD is found within a derivation it can be replaced by the shorter string bcA.

∇ Definition 8.1. An unrestricted or type 0 grammar over an alphabet Σ is a quadruple G = <Ω, Σ, S, P>, where:
Ω is a (nonempty) set of nonterminals.
Σ is a (nonempty) set of terminal symbols (and Ω ∩ Σ = ∅).
S is the designated start symbol (and S ∈ Ω).
P is a set of productions of the form α → β, where α ∈ (Ω ∪ Σ)⁺, β ∈ (Ω ∪ Σ)*. Δ

EXAMPLE 8.3

Consider the grammar
G″ = <{A, B, S, T}, {a, b, c}, S, {S → aSBc, S → T, T → λ, TB → bT, cB → Bc}>
A typical derivation, starting from the start symbol S, would be:
S ⇒ aSBc (by applying S → aSBc)
⇒ aaSBcBc (by applying S → aSBc)
⇒ aaTBcBc (by applying S → T)
⇒ aabTcBc (by applying TB → bT)
⇒ aabTBcc (by applying cB → Bc)
⇒ aabbTcc (by applying TB → bT)
⇒ aabbcc (by applying T → λ)
Depending on how many times the production S → aSBc is used, this grammar will generate strings such as λ, abc, aabbcc, and aaabbbccc. The set of strings that can be generated by this particular grammar is {aⁱbⁱcⁱ | i ≥ 0}.
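Because the grammar is unrestricted, there is no obvious membership test, but the words it generates can be enumerated by a bounded breadth-first search over sentential forms. The following is a sketch under our own conventions: productions are pairs of strings, "" stands for the empty string, and the extra slack in the length bound (to leave room for nonterminals in mid-derivation) is an illustrative choice, not part of the text.

```python
from collections import deque

# Bounded breadth-first enumeration of the strings generated by the
# unrestricted grammar of Example 8.3 ("" stands for the empty string).
PRODUCTIONS = [("S", "aSBc"), ("S", "T"), ("T", ""), ("TB", "bT"), ("cB", "Bc")]

def generate(start="S", max_len=6):
    words, seen, queue = set(), {start}, deque([start])
    bound = max_len + 4                      # slack for nonterminals in mid-derivation
    while queue:
        form = queue.popleft()
        for lhs, rhs in PRODUCTIONS:
            i = form.find(lhs)
            while i >= 0:                    # try every occurrence of lhs
                new = form[:i] + rhs + form[i + len(lhs):]
                if len(new) <= bound and new not in seen:
                    seen.add(new)
                    queue.append(new)
                    if new == "" or new.islower():   # keep terminal strings only
                        words.add(new)
                i = form.find(lhs, i + 1)
    return {w for w in words if len(w) <= max_len}

assert generate() == {"", "abc", "aabbcc"}   # the words a^i b^i c^i of length <= 6
```

Every string the search emits is genuinely derivable; the bound only limits how far the enumeration runs, which is the practical face of the cognitive/generative distinction discussed below.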
In this sense, each grammar defines a language. Specifically, we require that derivations start with the designated start symbol and proceed until only members of Σ remain in the resulting string.

∇ Definition 8.2. Given a grammar G = <Ω, Σ, S, P>, the language generated by G, denoted by L(G), is given by L(G) = {x | x ∈ Σ* ∧ S ⇒̄ x}. Δ

A language that can be defined by a type 0 grammar is called a type 0 language. Thus, as shown by the grammar G″ given in Example 8.3, L(G″) = {aⁱbⁱcⁱ | i ≥ 0} is a type 0 language. The way grammars define languages is fundamentally different from the way automata define languages. An automaton is a cognitive device, in that it is used to directly decide whether a given string should be accepted into the language. In contrast, a grammar is a generative device: the productions specify how to generate all the words in the language represented by the grammar, but they do not provide an obvious means of determining whether a given string can be generated by those rules. There are many applications in which it is important to be able to determine whether a given string can be generated by a particular grammar, and the task of obtaining cognitive answers from a generative construct will be addressed at several points later in the text. The reverse transformation, that is, producing an automaton that recognizes exactly those strings that are generated by a given grammar, is addressed in the next section. The distinction between generative and cognitive approaches to representing languages has been explored previously, when regular expressions were considered in Chapter 6. Regular expressions are also a generative construct, in the sense that a regular expression can be used to begin to enumerate the words in the corresponding regular set.
As is the case with grammars, it is inconvenient to use regular expressions in a cognitive fashion: it may be difficult to tell whether a given string is among those represented by a particular regular expression. Chapter 6 therefore explored ways to transform a regular expression into a corresponding automaton. It is likewise feasible to define corresponding automata for certain grammars (see Lemma 8.2). However, Example 8.3 illustrated that some grammars produce non-FAD languages and therefore cannot possibly be represented by deterministic finite automata. The translation from a mechanical representation of a language to a grammatical representation is always successful, in that every automaton has a corresponding grammar (Lemma 8.1). This result is similar to Theorem 6.3, which showed that every automaton has a corresponding regular expression. Note that in Example 8.3 the only production that specified that a string be replaced by a shorter string was T → λ. Consequently, the length of the derived string either increased or remained constant except where this last production was applied. Rules such as aBcD → bcA, in which four symbols are replaced by only three, will at least momentarily decrease the length of the string. Such productions are called contracting productions. Grammars that satisfy the added requirement that no production may decrease the length of the derivation are called context sensitive. Such grammars cannot generate as many languages as the unrestricted grammars, but they have the added advantage of allowing derivations to proceed in a more predictable manner. Programming languages are explicitly designed to ensure that they can be represented by grammars that are context sensitive.

∇ Definition 8.3. A pure context-sensitive grammar over an alphabet Σ is a quadruple G = <Ω, Σ, S, P>, where:
Ω is a (nonempty) set of nonterminals.
Σ is a (nonempty) set of terminal symbols (and Ω ∩ Σ = ∅).
S is the designated start symbol (and S ∈ Ω).
P is a set of productions of the form α → β, where α ∈ (Ω ∪ Σ)⁺, β ∈ (Ω ∪ Σ)⁺, and |α| ≤ |β|. Δ

In a derivation in a context-sensitive grammar, if S ⇒ x₁ ⇒ x₂ ⇒ ··· ⇒ xₙ, then we are assured that 1 = |S| ≤ |x₁| ≤ |x₂| ≤ ··· ≤ |xₙ|. Unfortunately, this means that in a pure context-sensitive grammar it is impossible to begin with the start symbol (which has length 1) and derive the empty string (which is of length 0).

EXAMPLE 8.4

Languages that contain λ, such as {aⁱbⁱcⁱ | i ≥ 0} generated in Example 8.3 by the unrestricted grammar G″, cannot possibly be represented by a pure context-sensitive grammar. However, the empty string is actually the only impediment to finding an alternative collection of productions that all satisfy the condition |α| ≤ |β|. The language {aⁱbⁱcⁱ | i ≥ 1} can be represented by a pure context-sensitive grammar, as illustrated by the following grammar. Let G be given by
G = <{A, B, S, T}, {a, b, c}, S, {S → aSBc, S → aTc, T → b, TB → bT, cB → Bc}>
The derivation to produce aabbcc would now be
S ⇒ aSBc (by applying S → aSBc)
⇒ aaTcBc (by applying S → aTc)
⇒ aaTBcc (by applying cB → Bc)
⇒ aabTcc (by applying TB → bT)
⇒ aabbcc (by applying T → b)
The shortest string derivable by G is obtained by S ⇒ aTc ⇒ abc. In Example 8.3, the shortest derivation was S ⇒ T ⇒ λ. Any pure context-sensitive grammar can be modified to include λ by adding a new start state Z and two new productions Z → λ and Z → S, where S was the original start state.

Such grammars and their resulting languages are generally referred to as type 1 or context sensitive.

∇ Definition 8.4. A context-sensitive or type 1 grammar over an alphabet Σ is either a pure context-sensitive grammar or a quadruple G′ = <Ω ∪ {Z}, Σ, Z, P ∪ {Z → λ, Z → S}>, where G = <Ω, Σ, S, P> is a pure context-sensitive grammar and Z ∉ Ω ∪ Σ.
Δ The only production α → β that violates the condition |α| ≤ |β| is Z → λ, and this production cannot play a part in any derivation other than Z ⇒ λ. From the start symbol Z, the application of Z → λ immediately ends the derivation (producing λ), while the application of Z → S will provide no further opportunity to use Z → λ, since the requirement that Z ∉ Ω ∪ Σ means that the other productions will never allow Z to reappear in the derivation. Thus, G′ enhances the generating power of G only to the extent that G′ can produce λ. Every string in L(G) can be derived from the productions of G′, and G′ generates no new strings besides λ. This argument essentially proves that L(G′) = L(G) ∪ {λ} (see the exercises).

EXAMPLE 8.5

The language generated by G″ in Example 8.3 was L(G″) = {aⁱbⁱcⁱ | i ≥ 0}. Since L(G″) is {aⁱbⁱcⁱ | i ≥ 1} ∪ {λ}, it can therefore be represented by a context-sensitive grammar obtained by modifying the pure context-sensitive grammar in Example 8.4. Let G′ be given by
G′ = <{A, B, S, T, Z}, {a, b, c}, Z, {S → aSBc, S → aTc, T → b, TB → bT, cB → Bc, Z → λ, Z → S}>
The derivation to produce aabbcc would now be
Z ⇒ S (by applying Z → S)
⇒ aSBc (by applying S → aSBc)
⇒ aaTcBc (by applying S → aTc)
⇒ aaTBcc (by applying cB → Bc)
⇒ aabTcc (by applying TB → bT)
⇒ aabbcc (by applying T → b)
This grammar does produce λ, and all other derivations are strictly length-increasing. Note that this was not the case for the grammar G″ in Example 8.3. The last step of the derivation shown there transformed a string of length 7 into a string of length 6. G″ does not satisfy the definition of a context-sensitive grammar; even though only T could produce λ, T could occur later in the derivation. The presence of T at later steps destroys the desirable property of having all other derivations strictly length-increasing at each step. Definition 8.4 is constructed to ensure that the start symbol Z can never appear in a later derivation step.
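The defining length condition of Definitions 8.3 and 8.4 is mechanically checkable. A brief sketch (our own encoding: each production is a pair of strings, with "" standing for λ):

```python
# Sketch: test whether a production set satisfies the pure context-sensitive
# requirement of Definition 8.3, namely 1 <= |alpha| <= |beta| for every
# production alpha -> beta.  Productions are (lhs, rhs) string pairs.
def is_pure_context_sensitive(productions) -> bool:
    return all(1 <= len(lhs) <= len(rhs) for lhs, rhs in productions)

# The unrestricted grammar of Example 8.3 and the pure context-sensitive
# grammar of Example 8.4, encoded as pairs ("" stands for lambda):
G_example_83 = [("S", "aSBc"), ("S", "T"), ("T", ""), ("TB", "bT"), ("cB", "Bc")]
G_example_84 = [("S", "aSBc"), ("S", "aTc"), ("T", "b"), ("TB", "bT"), ("cB", "Bc")]

assert not is_pure_context_sensitive(G_example_83)   # T -> lambda is contracting
assert is_pure_context_sensitive(G_example_84)
```

The check confirms the point just made: replacing T → λ by T → b is exactly what turns the grammar of Example 8.3 into the pure context-sensitive grammar of Example 8.4.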
The restriction of productions to nondecreasing length reduces the number of languages that can be generated; as discussed in later chapters, there exist type 0 languages that cannot be generated by any type 1 grammar. The restriction also allows arguments about the derivation process to proceed by induction on the number of symbols in the resulting terminal string, and it is crucial to the development of normal forms for context-sensitive grammars. We have already seen examples of different grammars generating the same set of words, as in the grammars G″ and G′ from Examples 8.3 and 8.5. The term context sensitive comes from the fact that context-sensitive languages (that is, type 1 languages) can be represented by grammars in which the productions are all of the form αBγ → αβγ, where a single nonterminal B is replaced by the string β in the context of the strings α on the left and γ on the right. Specialized grammars such as these, in which there are restrictions on the form of the productions, are examples of normal forms and are discussed later in the text. If the productions in a grammar all imply that single nonterminals can be replaced without regard to the context, then the grammar is called context free. In essence, this means that all productions are of the form A → β, where the left side is just a single nonterminal and the right side is an arbitrary string. The resulting languages are also called type 2 or context free.

∇ Definition 8.5. A pure context-free grammar over an alphabet Σ is a quadruple G = <Ω, Σ, S, P>, where:
Ω is a (nonempty) set of nonterminals.
Σ is a (nonempty) set of terminal symbols (and Ω ∩ Σ = ∅).
S is the designated start symbol (and S ∈ Ω).
P is a set of productions of the form A → β, where A ∈ Ω, β ∈ (Ω ∪ Σ)⁺. Δ
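The context-free condition is equally easy to test mechanically. In the sketch below (our own encoding again), nonterminals are single uppercase letters, and `.` and `+` stand in for the concatenation and union symbols of the regular-expression BNF of Example 8.2.

```python
# Sketch: test the pure context-free condition of Definition 8.5 — each left
# side is a single nonterminal (here, one uppercase letter) and each right
# side is nonempty.  Productions are (lhs, rhs) string pairs.
def is_pure_context_free(productions) -> bool:
    return all(len(lhs) == 1 and lhs.isupper() and len(rhs) >= 1
               for lhs, rhs in productions)

# The grammar of Example 8.4 and an ASCII encoding of the BNF of Example 8.2:
G_example_84 = [("S", "aSBc"), ("S", "aTc"), ("T", "b"), ("TB", "bT"), ("cB", "Bc")]
R_bnf = [("R", "a"), ("R", "b"), ("R", "c"), ("R", "(R.R)"), ("R", "(R+R)"), ("R", "R*")]

assert not is_pure_context_free(G_example_84)   # TB -> bT and cB -> Bc violate the form
assert is_pure_context_free(R_bnf)
```

As the text notes, failing this syntactic test does not by itself prove that the generated language is not context free; that requires showing no type 2 grammar at all can generate it.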
Note that since the length of the left side of a context-free production is 1 and the right side cannot be empty, pure context-free grammars have no contracting productions and are therefore pure context-sensitive grammars. As with pure context-sensitive grammars, pure context-free grammars cannot generate languages that contain the empty string.

∇ Definition 8.6. A context-free or type 2 grammar over an alphabet Σ is either a pure context-free grammar or a quadruple G′ = <Ω ∪ {Z}, Σ, Z, P ∪ {Z → λ, Z → S}>, where G = <Ω, Σ, S, P> is a pure context-free grammar and Z ∉ Ω ∪ Σ. Δ

Productions of the form C → β are called C-rules. As was done with context-sensitive grammars, this definition uses a new start state Z to avoid all such length-decreasing productions except for a single one of the form Z → λ, which is used only for generating the empty string. Type 2 languages will therefore always be type 1 languages. Note that the definition ensures that the only production that can decrease the length of a derivation must be the Z-rule Z → λ. The grammar corresponding to the BNF given in Example 8.2 would be a context-free grammar, and thus the collection of all regular expressions is a type 2 language. The grammar given in Example 8.4 is not context free due to the presence of the production cB → Bc, but this does not yield sufficient evidence to claim that the resulting language {aⁱbⁱcⁱ | i ≥ 1} is not a context-free language. To support this claim, it must be shown that no type 2 grammar can generate this language. A pumping lemma for context-free languages will be presented in Chapter 9 to provide a tool for measuring the complexity of such languages. Just as there are type 1 languages that are not type 2, there are type 0 languages that are not type 1. Note that even these very restrictive type 2 grammars can produce languages that are not FAD.
As shown in Example 8.2, the language consisting of the collection of all strings representing regular expressions is context free. However, this collection is not FAD, since it is clear that the pumping lemma (Theorem 2.3) would show that a DFA could not hope to correctly match up unlimited pairs of parentheses. Consequently, even more severe restrictions must be placed on grammars if they are to have generative powers similar to the cognitive powers of a deterministic finite automaton. The type 3 grammars explored in the next section are precisely what is required. It will follow from the definitions that all type 3 languages are type 2. It is likewise clear that all type 2 languages must be type 1, and every type 1 language is type 0. Thus, a hierarchy of languages is formed, from the most restrictive type 3 languages to the most robust type 0 languages. The four classes of languages are distinct; there are type 2 languages that are not type 3 (for example, Example 8.2), type 1 languages that are not type 2 (see Chapter 9), and type 0 languages that are not type 1 (see Chapter 12).

8.2 RIGHT-LINEAR GRAMMARS AND AUTOMATA

The grammatical classes described in Section 8.1 are each capable of generating all the FAD languages; indeed, they even generate languages that cannot be recognized by finite automata. This section will explore a class of grammars that generate exactly the class of regular languages: every FAD language can be generated by one of the right-linear grammars defined below, and yet no right-linear grammar can generate a non-FAD language.

∇ Definition 8.7. A right-linear grammar over an alphabet Σ is a quadruple G = <Ω, Σ, S, P>, where:
Ω is a (nonempty) set of nonterminals.
Σ is a (nonempty) set of terminal symbols (and Ω ∩ Σ = ∅).
S is the designated start symbol (and S ∈ Ω).
P is a set of productions of the form A → xB, where A ∈ Ω, B ∈ (Ω ∪ {λ}), and x ∈ Σ*. Δ
Right-linear grammars belong to the class of type 3 grammars and generate all the type 3 languages. Grammars that are right linear are very restrictive; only one nonterminal can appear, and it must appear at the very end of the expression. Consequently, in the course of a derivation, new terminals appear only on the right end of the developing string, and the only time the string might shrink in size is when a (final) production of the form A → λ is applied. A right-linear grammar may have several contracting productions that produce λ and may not strictly conform with the definition of a context-free grammar. However, Corollary 8.3 will show that every type 3 language is a type 2 language. Right-linear grammars generate words in the same fashion as the grammars defined in Section 8.1. The following definition of derivation is tailored to right-linear grammars, but it can easily be generalized to less restrictive grammars (see Chapter 9).

∇ Definition 8.8. Let G = <Ω, Σ, S, P> be a right-linear grammar, y ∈ Σ*, and A → xB be a production in P. We will say that yxB can be directly derived from yA by applying the production A → xB, and write yA ⇒ yxB. Furthermore, if (x₁A₁ ⇒ x₂A₂) ∧ (x₂A₂ ⇒ x₃A₃) ∧ ··· ∧ (xₙ₋₁Aₙ₋₁ ⇒ xₙAₙ), where xᵢ ∈ Σ* for i = 1, 2, ..., n, Aᵢ ∈ Ω for i = 1, 2, ..., n − 1, and Aₙ ∈ (Ω ∪ {λ}), then we will say that x₁A₁ derives xₙAₙ, and write x₁A₁ ⇒* xₙAₙ. Δ

While the symbol ⇒̄ might be more consistent with our previous extension notations, ⇒* is most commonly used in the literature.

EXAMPLE 8.6

Let G₁ = <{T, S}, {a, b}, S, {S → aS, S → bT, T → aa}>. Then S ⇒* aabaa, since by Definition 8.8, with x₁ = λ, x₂ = a, x₃ = aa, x₄ = aab, x₅ = aabaa, A₁ = A₂ = A₃ = S, A₄ = T, and A₅ = λ:
S ⇒ aS (by applying S → aS)
⇒ aaS (by applying S → aS)
8.2 Right-Linear Grammars and Automata 263 ::}aabT(by applying S~ bT) ::}aabaa(by applying T~ aa) Derivations similar to Example 8.1, which begin with only the start symbol Sand end with a string with symbols entirely from I (that is, which do not contain any nonterminals) will be the main ones in which we are interested. As formally stated in Definition 8.2, the set of all strings (in I *) that can be derived from the start symbol form the language generated by the grammar G and will be represented by L(G). In symbols, L(G) == {x Ix EI* 1\ S~x}. EXAMPLES.1 As in Example 8.6, consider Gl==<{T,S},{a,b},S,{S~aS,S~bT,T~aa}>. Then L(G 1) == a*baa == {baa, abaa, aabaa, ... }. Note that each of these words can certainly be produced by G1 ; the number of as at the front of the string is entirely determined by how many times the production S~aS is used in the derivation. Furthermore, no other words in I* can be derived from G1 ; beginning from S, the production S~ as may be used several times, but if no other production is used, a string of the form anS will be produced, and since Sf/=. I, this is not a valid string of terminals. The only way to remove the S is to apply the production S~ bT, which will leave a string of the form anbT, which is also not in I ". The only production that can be applied at this point is T~ aa, deriving a string of the form anbaa. A proof involving induction on n would be required to formally prove that L (G1) == {anbaa In EN} == a*baa. If G contains many productions, such inductive proofs can be truly unpleasant. EXAMPLES.S Consider the grammar Q == <{I,F},{O, 1, .},I,{I~OIllIlo.Fll.F,F~ AIOF!IF}>. L(Q) generates the set of all (terminating) binary numbers including 101.11, 011., 10.0, 0.010, and so on. In a manner similar to that used for automata and regular expressions, we will consider two grammars to be similar in some fundamental sense if they generate the same language. The following definition formalizes this notion. V Definition8.9. 
Two grammars G1 = <Ω1, Σ, S1, P1> and G2 = <Ω2, Σ, S2, P2> are called equivalent iff L(G1) = L(G2), and we will write G2 ≡ G1. Δ

EXAMPLE 8.9

Consider G1 from Examples 8.6 and 8.7, and define the right-linear grammar G5 = <{Z}, {a, b}, Z, {Z → aZ, Z → baa}>. Then L(G5) = a*baa = L(G1), and therefore G5 ≡ G1.

The concept of equivalence applies to all types of grammars, whether or not they are right linear, and hence the grammars G″ and G′ from Examples 8.3 and 8.5 are likewise equivalent. Definition 8.9 marks the fourth distinct use of the operator L and the concept of equivalence. It has previously been used to denote the language recognized by a DFA, the language recognized by an NDFA, and the language represented by a regular expression [although the more precise notation L(R), which is the regular set represented by the regular expression R, has generally been eschewed in favor of the more common convention of denoting both the set and the expression by the same symbol R]. In the larger sense, then, a representation X of a language, regardless of whether X is a grammar, DFA, NDFA, or regular expression, is equivalent to another representation Y iff L(X) = L(Y).

Our first goal in this section is to demonstrate that a cognitive representation of a language (via a DFA) can be replaced by a generative representation (via a right-linear grammar). In the broader sense of equivalence of representations discussed above, Lemma 8.1 shows that any language defined by a DFA has an equivalent representation as a right-linear grammar. We begin with a definition of the class of all type 3 languages.

∇ Definition 8.10. Given an alphabet Σ, 𝒢Σ is defined to be the collection of all languages generated by right-linear grammars over Σ. Δ

The language generated by G1 in Example 8.7 turned out to be FAD. We will now prove that every language in 𝒢Σ is FAD and, conversely, that every FAD language L has (at least one) right-linear grammar that generates L.
This will show that 𝒟Σ = 𝒢Σ. We begin by showing that a mechanical representation A of a language is equivalent to a grammatical representation (denoted by G_A in Lemma 8.1).

∇ Lemma 8.1. Given any alphabet Σ and a DFA A = <Σ, Q, q0, δ, F>, there exists a right-linear grammar G_A for which L(A) = L(G_A).

Proof. Without loss of generality, assume Q = {q0, q1, q2, ..., qm}. Define G_A = <Q, Σ, q0, P_A>, where

P_A = {q → a·δ(q, a) | q ∈ Q, a ∈ Σ} ∪ {q → λ | q ∈ F}.

There is one production of the form s → bt for each transition in the DFA, and one production of the form s → λ for each final state s in F. (It may be helpful to look over Example 8.10 to get a firmer grasp of the nature of P_A before proceeding with this proof.) Note that the set of nonterminals Ω is made up of the names of the states in A, and the start symbol S is the name of the start state of A.

The heart of this proof is an inductive argument, which will show that for any string x = a1a2···an ∈ Σ*,

q0 ⇒ a1·(δ(q0, a1))
⇒ a1a2·(δ(q0, a1a2))
⋮
⇒ a1a2···an−1·(δ(q0, a1a2···an−1))
⇒ a1a2···an·(δ(q0, a1a2···an))

from which it follows that, if δ(q0, a1a2···an) ∈ F, then

q0 ⇒* a1a2···an·δ(q0, a1a2···an) ⇒ a1a2···an.

The actual inductive statement and proof is left as an exercise; given this fact, if x ∈ L(A), then δ(q0, x) ∈ F and there is a corresponding derivation q0 ⇒* x, and so x ∈ L(G_A). Thus L(A) ⊆ L(G_A). A similarly tedious inductive argument will show that if, for some sequence of integers i1, i2, ..., in,

q_{i0} ⇒ a1·q_{i1} ⇒ a1a2·q_{i2} ⇒ ··· ⇒ a1a2···an·q_{in},

then the string a1a2···an will cause the DFA (when starting in state q_{i0}) to visit the states q_{i1}, q_{i2}, ..., q_{in}. Furthermore, if q_{in} ∈ F, then, by applying the production q_{in} → λ,

q_{i0} ⇒* a1a2···an·q_{in} ⇒ a1a2···an.

This will show that valid derivations correspond to strings reaching final states in A, and so L(G_A) ⊆ L(A) (see the exercises). Thus L(G_A) = L(A). Δ
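The construction in this proof is entirely mechanical, and a minimal sketch in Python may help make the production set P_A concrete. The dictionary encoding of δ and the pair encoding of productions (with the empty string standing for λ) are conventions of this sketch, not of the text:

```python
def dfa_to_grammar(states, alphabet, start, delta, finals):
    """Lemma 8.1: build the right-linear grammar G_A from a DFA A.

    delta maps (state, symbol) -> state.  A production pair (q, rhs)
    stands for q -> rhs, with "" standing for lambda.
    """
    productions = []
    for q in states:
        for a in alphabet:
            # one production q -> a.delta(q, a) for each transition
            productions.append((q, a + delta[(q, a)]))
    for q in finals:
        # one production q -> lambda for each final state
        productions.append((q, ""))
    return (set(states), set(alphabet), start, productions)

# The two-state automaton of Example 8.10 (final state T):
delta = {("S", "a"): "T", ("S", "b"): "T",
         ("T", "a"): "S", ("T", "b"): "S"}
grammar = dfa_to_grammar(["S", "T"], ["a", "b"], "S", delta, ["T"])
```

Run on the automaton of Example 8.10, the sketch yields exactly the five productions S → aT, S → bT, T → aS, T → bS, T → λ listed there.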
EXAMPLE 8.10

Let B = <{a, b}, {S, T}, S, δ, {T}>, where

δ(S, a) = T    δ(S, b) = T
δ(T, a) = S    δ(T, b) = S

This automaton is shown in Figure 8.1. Applying the construction in Lemma 8.1, we have Ω = {S, T}, Σ = {a, b}, S = S, and P_B = {S → aT, S → bT, T → aS, T → bS, T → λ}. Note that the derivation S ⇒ bT ⇒ baS ⇒ babT ⇒ bab mirrors the action of the DFA as it processes the string bab, recording at each step of the derivation the string that has been processed so far, followed by the current state of B. Conversely, in trying to duplicate the action of B as it processes the string ab, we have S ⇒ aT ⇒ abS, which cannot be transformed into a string of only as and bs without processing at least one more letter, and hence ab ∉ L(G_B). Since S is not a final state, it cannot be removed from the derivation, corresponding to the rejection of any string that brings us to a nonfinal state. Those strings that are accepted by B are exactly those that end in the state T, and for which we will have the opportunity to use the production T → λ in the corresponding derivation in G_B, which will leave us with a terminal string of only as and bs.

Figure 8.1 The automaton discussed in Example 8.10

Lemma 8.1 showed that a cognitive representation of a finite automaton definable language can be expressed in an appropriate generative form (via a right-linear grammar). There are many practical applications in which it is necessary to test whether certain strings can be generated by a particular grammar. For unrestricted grammars, the answers to such questions can be far from obvious. In contrast, the specialized right-linear grammars discussed in this section can always be transformed into a simple cognitive representation: every right-linear grammar has a corresponding equivalent NDFA.

∇ Lemma 8.2. Let Σ be any alphabet and G = <Ω, Σ, S, P> be a right-linear grammar; then there exists an NDFA A_G (with λ-transitions) for which L(G) = L(A_G).
Proof. Define A_G = <Σ, Q_G, q0_G, δ_G, F_G>, where

Q_G = {<z> | z = λ ∨ z ∈ Ω ∨ (∃y ∈ Σ* and ∃B ∈ Ω such that B → yz is a production in P)}
q0_G = {<S>}
F_G = {<λ>},

and δ_G is comprised of (normal) transitions of the form

δ_G(<w>, a) = {<x> | ∃y ∈ Σ*, ∃B ∈ Ω such that w = ax ∧ B → yw is a production in P}.

δ_G also contains some λ-transitions of the form

δ_G(<B>, λ) = {<v> | B → v is a production in P}.

As in the proof of Lemma 8.1, there is a one-to-one correspondence between paths through the machine and derivations in the grammar. Inductive statements will be the basis from which it will follow that L(A_G) = L(G) (see the exercises). Δ

The following example may be helpful in providing a firmer grasp of the nature of A_G.

EXAMPLE 8.11

Let G1 = <{T, S}, {a, b}, S, {S → aS, S → bT, T → aa}>. Then A_G1 = <{a, b}, {<aS>, <S>, <bT>, <T>, <aa>, <a>, <λ>}, {<S>}, δ_G1, {<λ>}>, where δ_G1 is given by

δ_G1(<S>, λ) = {<aS>, <bT>}
δ_G1(<T>, λ) = {<aa>}
δ_G1(<aS>, a) = {<S>}
δ_G1(<bT>, b) = {<T>}
δ_G1(<aa>, a) = {<a>}
δ_G1(<a>, a) = {<λ>}

and all other transitions are empty [for example, δ_G1(<S>, a) = ∅]. This automaton is shown in Figure 8.2. Note that abaa is accepted by this machine by visiting the states <S>, <aS>, <S>, <bT>, <T>, <aa>, <a>, <λ>, and that the corresponding derivation in G1 is S ⇒ aS ⇒ abT ⇒ abaa.

Figure 8.2 The automaton corresponding to the grammar G1

∇ Theorem 8.1. Given any alphabet Σ, 𝒢Σ = 𝒟Σ.

Proof. Lemma 8.1 guaranteed that every DFA has a corresponding grammar, and so 𝒟Σ ⊆ 𝒢Σ. By Lemma 8.2, every grammar has a corresponding NDFA, and so 𝒢Σ ⊆ 𝒩Σ = 𝒟Σ. Thus 𝒢Σ = 𝒟Σ. Δ

8.3 REGULAR GRAMMARS AND REGULAR EXPRESSIONS

The grammars we have considered so far are called right linear because productions are constrained to have the resulting nonterminal appear to the right of the terminal symbols. We next consider the class of grammars that arises by forcing the lone nonterminal to appear to the left of the terminal symbols.

∇ Definition 8.11.
A left-linear grammar over an alphabet Σ is a quadruple G = <Ω, Σ, S, P>, where:

Ω is a (nonempty) set of nonterminals.
Σ is a (nonempty) set of terminal symbols (and Ω ∩ Σ = ∅).
S is the designated start symbol (and S ∈ Ω).
P is a set of productions of the form A → Bx, where A ∈ Ω, B ∈ (Ω ∪ {λ}), and x ∈ Σ*. Δ

Note that a typical production might now look like A → Bcd, where the nonterminal B occurs to the left of the terminal string cd.

EXAMPLE 8.12

Let G2 = <{A, S}, {a, b}, S, {S → Abaa, A → Aa, A → λ}>. Then L(G2) = a*baa = {baa, abaa, aabaa, ...} = L(G1), and so G2 ≡ G1 (compare with Example 8.7). Note that there does not seem to be an obvious way to transform the right-linear grammar G1 discussed in Example 8.7 into an equivalent left-linear grammar such as G2 (see the exercises).

As was done for right-linear grammars in the last section, we could show that these left-linear grammars also generate the set of regular languages by constructing corresponding machines and grammars (see the exercises). However, we will instead prove that left-linear grammars are equivalent in power to right-linear grammars by applying known results from previous chapters. The key to this strategy is the reverse operator r (compare with Example 4.10 and Exercises 5.20 and 6.36).

∇ Definition 8.12. For an alphabet Σ and x = a1a2···an−1an ∈ Σ*, define xʳ = anan−1···a2a1. For a language L over Σ, define Lʳ = {xʳ | x ∈ L}. For a grammar G = <Ω, Σ, S, P>, define Gʳ = <Ω, Σ, S, Pʳ>, where Pʳ is given by Pʳ = {A → xʳ | A → x was a production in P}. Δ

∇ Lemma 8.3. Let G be a right-linear grammar. Then Gʳ is a left-linear grammar, and L(Gʳ) = L(G)ʳ. Similarly, if G is a left-linear grammar, then Gʳ is a right-linear grammar, and again L(Gʳ) = L(G)ʳ.

Proof. A straightforward induction on the number of productions used to produce a given terminal string (see the exercises). It can be shown that S ⇒* Bx by applying n productions from G iff S ⇒* xʳB by applying n corresponding productions from Gʳ. Δ
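Since each right side is simply reversed, Definition 8.12 is easy to mechanize. A sketch in Python follows; the tuple encoding of grammars, pair encoding of productions, and single-letter nonterminal names are assumptions of this sketch, not of the text:

```python
def reverse_grammar(nonterminals, alphabet, start, productions):
    """Definition 8.12: form G^r by reversing the right side of every
    production.  A right-linear grammar becomes left linear, and
    vice versa (Lemma 8.3)."""
    reversed_productions = [(lhs, rhs[::-1]) for (lhs, rhs) in productions]
    return (nonterminals, alphabet, start, reversed_productions)

# G3 of Example 8.13: S -> abS | cdT, T -> bT | b
g3 = ({"S", "T"}, {"a", "b", "c", "d"}, "S",
      [("S", "abS"), ("S", "cdT"), ("T", "bT"), ("T", "b")])
g3r = reverse_grammar(*g3)
```

The result has the productions S → Sba, S → Tdc, T → Tb, T → b, matching the grammar G3ʳ given in Example 8.13.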
EXAMPLE 8.13

Consider G3 = <{T, S}, {a, b, c, d}, S, {S → abS, S → cdT, T → bT, T → b}>. Then

G3ʳ = <{T, S}, {a, b, c, d}, S, {S → Sba, S → Tdc, T → Tb, T → b}>,
L(G3) = (ab)*cdbb*, and L(G3ʳ) = b*bdc(ba)*.

∇ Theorem 8.2. Let Σ be an alphabet. Then the class of languages generated by the set of all left-linear grammars over Σ is the same as the class of languages generated by the set of all right-linear grammars over Σ.

Proof. Let G be a left-linear grammar. Gʳ is then a right-linear grammar, and L(Gʳ) is therefore FAD by Theorem 8.1. Since 𝒟Σ is closed under the reverse operator r (see Exercise 5.20), L(Gʳ)ʳ is also FAD. But, by Lemma 8.3, L(Gʳ)ʳ = L(G), and so L(G) is FAD. Hence every left-linear grammar generates a member of 𝒟Σ and therefore has a corresponding right-linear grammar. Conversely, if L is generated by a right-linear grammar, then L is a language in 𝒟Σ, and so is Lʳ (as shown by Exercise 5.20 or 6.36). Since 𝒟Σ = 𝒢Σ, there is a right-linear grammar G that generates Lʳ, and hence Gʳ is a left-linear grammar that generates L (why?). Thus every right-linear grammar has a corresponding left-linear grammar. Δ

∇ Definition 8.13. A regular or type 3 grammar is a grammar that is either right-linear or left-linear. Δ

Thus, the languages generated by left-linear (and hence regular) grammars are referred to as type 3 languages. The class of type 3 languages is exactly 𝒟Σ.

∇ Corollary 8.1. The class of languages generated by regular grammars is equal to 𝒟Σ.

Proof. The proof follows immediately from Theorem 8.2. Δ

With the correspondences developed between the grammatical descriptors and the mechanical constructs, it is possible to transform a regular expression into an equivalent grammar by first transforming the representation of the language into an automaton (as described in Chapter 6) and then applying Lemma 8.1 to the resulting machine.
Conversely, the grammar G1 in Example 8.11 gives rise to the seven-state NDFA A_G1 (using Lemma 8.2), which could in turn be used to generate seven equations in seven unknowns. These could then be solved for a regular expression representing L(G1) via Theorems 6.1 and 6.2. A much more efficient method, which generates equations directly from the productions themselves, is outlined in the following theorem.

∇ Theorem 8.3. Let G = <{S1, S2, ..., Sn}, Σ, S1, P> be a right-linear grammar, and for each nonterminal Si define X_Si to be the set of all terminal strings that can be derived from Si by using the productions in P. X_S1 then represents L(G), and these sets satisfy the language equations

X_Sk = Ek ∪ Ak1·X_S1 ∪ Ak2·X_S2 ∪ ··· ∪ Akn·X_Sn,  for k = 1, 2, ..., n,

where Ek is the union of all terminal strings x that appear in productions of the form Sk → x, and Aij is the union of all terminal strings x that appear in productions of the form Si → xSj.

Proof. Since S1 is the start symbol, X_S1 is by definition the set of all words that can be derived from the start symbol, and hence X_S1 = L(G). The relationships between the variables X_Sj essentially embody the relationships enforced by the productions in P. Δ

EXAMPLE 8.14

Consider G1 from Example 8.11, in which G1 = <{T, S}, {a, b}, S, {S → aS, S → bT, T → aa}>. The corresponding equations are

X_S = ∅ ∪ aX_S ∪ bX_T
X_T = aa ∪ ∅X_S ∪ ∅X_T

Eliminating X_T via Theorem 6.2 yields X_S = baa ∪ aX_S. Theorem 6.1 can be applied to this equation to yield X_S = L(G1) = a*baa. Solving these two equations is indeed preferable to appealing to the NDFA resulting from Example 8.11 and solving the corresponding seven equations.

EXAMPLE 8.15

Let Σ = {a, b, c}, and consider the set of all words that end in b and for which every c is immediately followed by a. This can be succinctly described by the grammar G = <{S}, {a, b, c}, S, {S → aS, S → bS, S → caS, S → b}>.
The resulting one equation in the single unknown X_S is X_S = b ∪ (a ∪ b ∪ ca)X_S, and Theorem 6.1 can be applied to yield a regular expression for this language; that is, X_S = (a ∪ b ∪ ca)*b.

Unfortunately, another grammar that generates this same language is G′ = <{S}, {a, b, c}, S, {S → λS, S → aS, S → bS, S → caS, S → b}>. In this case, however, the resulting one equation in the single unknown X_S is X_S = b ∪ (λ ∪ a ∪ b ∪ ca)X_S, and Theorem 6.1 explicitly prohibits λ from appearing in the coefficient of an unknown. The equation no longer has a unique solution; other solutions are now possible, such as X_S = Σ*. Nevertheless, the reduction described by Theorem 6.1 still predicts the correct expression for this language; that is, X_S = (λ ∪ a ∪ b ∪ ca)*b. For equations arising from grammatical constructs, the desired solution will always be the minimal solution predicted by the technique used in Theorem 6.1.

The condition prohibiting λ from appearing in the set A in the equation X = E ∪ AX was required to guarantee a unique solution. Regardless of the nature of the set A, A*E is guaranteed to be a solution, and it will be contained in any other solution, as restated in Lemma 8.4.

∇ Lemma 8.4. Let E and A be any two sets, and consider the language equation X = E ∪ AX. A*E is always a solution for X, and any other solution Y must satisfy the property A*E ⊆ Y.

Proof. Follows immediately from Theorem 6.1. Δ

Consider again the grammar G′ = <{S}, {a, b, c}, S, {S → λS, S → aS, S → bS, S → caS, S → b}>, which generates the language (λ ∪ a ∪ b ∪ ca)*b. The corresponding equation was X = E ∪ AX, where E = b and A = (λ ∪ a ∪ b ∪ ca). Note that E represents the set of terminal strings that can be generated from S using exactly one production, while A·E = (λ ∪ a ∪ b ∪ ca)b is the set of all strings that can be generated from S using exactly two productions.
Similarly, A·A·E represents all terminal strings that can be generated from S using exactly three productions. By induction, it can be shown that Aⁿ⁻¹·E is the set of all strings that can be generated from S using exactly n productions. From this it follows that the minimal solution A*E is indeed the language generated by the grammar.

Clearly, a useless production of the form S → λS in a grammar can simply be removed from the production set without affecting the language that is generated. In the above example, it was the production S → λS that was responsible for λ appearing in the coefficient set A. It is only the nongenerative productions, which do not produce any terminal symbols, that can give rise to a nonunique solution. However, the removal of productions of the form V → λT will require the addition of other productions when T is a different nonterminal than V. Theorem 9.4, developed later, will show that these grammars can be transformed into equivalent grammars that do not contain productions of the form V → λT. Lemma 8.5, below, shows that it is not necessary to perform such transformations before producing equations that will provide equivalent regular expressions: the techniques outlined in Theorem 6.2 can indeed be used to solve systems of equations, even if the coefficients contain the empty word. Indeed, the minimal solution found in this manner will be the regular expression sought. This robustness is similar to that found in Theorem 6.3, which was stated for deterministic finite automata. Regular expressions for nondeterministic finite automata can be generated by transforming the NDFA into a DFA and then applying Theorem 6.3, but it was seen that it is both possible and more efficient to apply the method directly to the NDFA without performing the transformation. The following theorem justifies that a transformation to a well-behaved grammar is an unnecessary step in the algorithm for finding a regular expression describing the language generated by a right-linear grammar.

∇ Lemma 8.5. Consider the system of equations in the unknowns X1, X2, ..., Xn given by

X1 = E1 ∪ A11X1 ∪ A12X2 ∪ ··· ∪ A1(n−1)Xn−1 ∪ A1nXn
X2 = E2 ∪ A21X1 ∪ A22X2 ∪ ··· ∪ A2(n−1)Xn−1 ∪ A2nXn
⋮
Xn−1 = En−1 ∪ A(n−1)1X1 ∪ A(n−1)2X2 ∪ ··· ∪ A(n−1)(n−1)Xn−1 ∪ A(n−1)nXn
Xn = En ∪ An1X1 ∪ An2X2 ∪ ··· ∪ An(n−1)Xn−1 ∪ AnnXn

a. Define Êi = Ei ∪ (Ain·Ann*·En) for all i = 1, 2, ..., n−1 and Âij = Aij ∪ (Ain·Ann*·Anj) for all i, j = 1, 2, ..., n−1. Any solution of the original set of equations will agree with a solution of the following set of n−1 equations in the unknowns X1, X2, ..., Xn−1:

X1 = Ê1 ∪ Â11X1 ∪ Â12X2 ∪ ··· ∪ Â1(n−1)Xn−1
X2 = Ê2 ∪ Â21X1 ∪ Â22X2 ∪ ··· ∪ Â2(n−1)Xn−1
⋮
Xn−1 = Ên−1 ∪ Â(n−1)1X1 ∪ Â(n−1)2X2 ∪ ··· ∪ Â(n−1)(n−1)Xn−1

b. Given a solution to the above n−1 equations in (a), that solution can be used to find a compatible expression for the remaining unknown:

Xn = Ann*·(En ∪ An1X1 ∪ An2X2 ∪ ··· ∪ An(n−1)Xn−1)

c. This system has a unique minimal solution in the following sense: Let W1, W2, ..., Wn denote the solution found by eliminating variables and back-substituting as specified in (a) and (b). If Y1, Y2, ..., Yn is any other solution to the original n equations in n unknowns, then W1 ⊆ Y1, W2 ⊆ Y2, ..., and Wn ⊆ Yn.

Proof. This proof is by induction on the number of equations. Lemma 8.4 proved the basis step for n = 1. As in Theorem 6.2, the inductive step is proved by considering the last of the n equations,

Xn = (En ∪ An1X1 ∪ An2X2 ∪ ··· ∪ An(n−1)Xn−1) ∪ AnnXn

This can be thought of as an equation in the one unknown Xn with a coefficient of Ann for Xn, and the remainder of the expression a "constant" term not involving Xn. For a given solution for X1 through Xn−1, Lemma 8.4 can therefore be applied to the above equation in the one unknown Xn, with coefficients
E = (En ∪ An1X1 ∪ An2X2 ∪ ··· ∪ An(n−1)Xn−1) and A = Ann

to find a minimal solution for Xn for the corresponding values of X1 through Xn−1. This is exactly as given by part (b) above:

Xn = Ann*·(En ∪ An1X1 ∪ An2X2 ∪ ··· ∪ An(n−1)Xn−1)

or

Xn = Ann*·En ∪ Ann*·An1X1 ∪ Ann*·An2X2 ∪ ··· ∪ Ann*·An(n−1)Xn−1

Specifically, if X1 through Xn−1 are represented by a minimal solution W1 through Wn−1, then Lemma 8.4 implies that the inclusion of Wn, given by

Wn = Ann*·En ∪ Ann*·An1W1 ∪ Ann*·An2W2 ∪ ··· ∪ Ann*·An(n−1)Wn−1,

will yield a minimal solution W1 through Wn of the original n equations in n unknowns. The minimal solution for the n−1 equations in the unknowns X1, X2, ..., Xn−1, denoted by W1 through Wn−1, can be found by substituting this particular solution Wn for Xn in each of the other n−1 equations. If the kth equation is represented by

Xk = Ek ∪ Ak1X1 ∪ Ak2X2 ∪ ··· ∪ AknXn

then the substitution will yield

Xk = Ek ∪ Ak1X1 ∪ Ak2X2 ∪ ··· ∪ (Akn·(Ann*·En ∪ Ann*·An1X1 ∪ Ann*·An2X2 ∪ ··· ∪ Ann*·An(n−1)Xn−1))

Due to the nature of union and concatenation, no other solution for Xn can possibly allow a smaller solution for X1, X2, ..., Xn−1 to be found. Specifically, if Yn is a solution satisfying the nth equation, then Lemma 8.4 guarantees that Wn ⊆ Yn, and consequently

Xk = Ek ∪ Ak1X1 ∪ Ak2X2 ∪ ··· ∪ AknWn ⊆ Ek ∪ Ak1X1 ∪ Ak2X2 ∪ ··· ∪ AknYn

Thus, the minimal value for each Xk is compatible with the substitution of Wn defined earlier. Hence, by using the distributive law, the revised equation becomes

Xk = Ek ∪ Ak1X1 ∪ Ak2X2 ∪ ··· ∪ (Akn·Ann*·En ∪ Akn·Ann*·An1X1 ∪ Akn·Ann*·An2X2 ∪ ··· ∪ Akn·Ann*·An(n−1)Xn−1)

Collecting like terms yields

Xk = (Ek ∪ Akn·Ann*·En) ∪ (Ak1X1 ∪ Akn·Ann*·An1X1) ∪ (Ak2X2 ∪ Akn·Ann*·An2X2) ∪ ··· ∪ (Ak(n−1)Xn−1 ∪ Akn·Ann*·An(n−1)Xn−1),

or

Xk = (Ek ∪ Akn·Ann*·En) ∪ (Ak1 ∪ Akn·Ann*·An1)X1 ∪ (Ak2 ∪ Akn·Ann*·An2)X2 ∪ ··· ∪ (Ak(n−1) ∪ Akn·Ann*·An(n−1))Xn−1
The constant term in this equation is (Ek ∪ Akn·Ann*·En), and the coefficient for Xj is Âkj = Akj ∪ (Akn·Ann*·Anj), which agrees with the formula given in (a). The substitution of Xn was shown to yield a minimal set of n−1 equations in the unknowns X1 through Xn−1, and the induction assumption guarantees that the elimination and back-substitution method yields a minimal solution for W1 through Wn−1. Lemma 8.4 then guarantees that the solution for

Wn = Ann*·En ∪ Ann*·An1W1 ∪ Ann*·An2W2 ∪ ··· ∪ Ann*·An(n−1)Wn−1

is minimal, which completes the minimal solution for the original system of n equations. Δ

As with Lemma 8.4, the minimal expressions thus generated describe exactly those terminal strings that can be produced by a right-linear grammar. In an analogous fashion, left-linear grammars give rise to a set of left-linear equations, which can be solved as indicated in Theorem 6.4.

The above discussion describes the transformation of regular grammars into regular expressions. Generating grammars from regular expressions hinges on the interpretation of the six building blocks of regular expressions, as described in Definition 6.2. Since 𝒢Σ is the same as 𝒟Σ, all the closure properties known about 𝒟Σ must also apply to 𝒢Σ, but it can be instructive to reprove these theorems using grammatical constructions. Such proofs will also provide guidelines for directly transforming a regular expression into a grammar without first constructing a corresponding automaton.

∇ Theorem 8.4. Let Σ be an alphabet. Then 𝒢Σ is effectively closed under union.

Proof. Let G1 = <Ω1, Σ, S1, P1> and G2 = <Ω2, Σ, S2, P2> be two right-linear grammars, and without loss of generality assume that Ω1 ∩ Ω2 = ∅. Choose a new nonterminal Z such that Z ∉ Ω1 ∪ Ω2, and consider the new grammar G∪ defined by

G∪ = <Ω1 ∪ Ω2 ∪ {Z}, Σ, Z, P1 ∪ P2 ∪ {Z → S1, Z → S2}>.

It is straightforward to show that L(G∪) = L(G1) ∪ L(G2) (see the exercises).
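This construction is small enough to sketch in a few lines of Python; the tuple encoding of grammars and pair encoding of productions are conventions of this sketch, not of the text, and the assertion enforces the disjointness assumption:

```python
def union_grammar(g1, g2, z="Z"):
    """Theorem 8.4: a right-linear grammar for L(G1) U L(G2).

    Assumes the nonterminal sets are disjoint and that z is unused;
    the fresh start symbol derives either original start symbol.
    """
    (n1, sigma, s1, p1) = g1
    (n2, _, s2, p2) = g2
    assert n1.isdisjoint(n2) and z not in (n1 | n2)
    return (n1 | n2 | {z}, sigma, z, p1 + p2 + [(z, s1), (z, s2)])

# Combining R -> a with T -> b, using A in the role of Z:
g = union_grammar(({"R"}, {"a", "b", "c"}, "R", [("R", "a")]),
                  ({"T"}, {"a", "b", "c"}, "T", [("T", "b")]),
                  z="A")
```

The call shown reproduces the grammar G of Example 8.16 for (a ∪ b): A → R, A → T, R → a, T → b.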
From the start symbol Z there are only two productions that can be applied; if Z → S1 is chosen, then the derivation will have to continue with productions from P1 and produce a word from L(G1) (why can't productions from P2 be applied?). Similarly, if Z → S2 is chosen instead, the only result can be a word from L(G2). Δ

In an analogous fashion, effective closure of 𝒢Σ can be demonstrated for the operators Kleene closure and concatenation. The proof for Kleene closure is outlined below. The construction for concatenation is left for the exercises; the technique is illustrated in Example 8.18.

∇ Theorem 8.5. Let Σ be an alphabet. Then 𝒢Σ is effectively closed under Kleene closure.

Proof. Let G = <Ω, Σ, S, P> be a right-linear grammar. Choose a new nonterminal Z such that Z ∉ Ω, and consider the new grammar G• defined by G• = <Ω ∪ {Z}, Σ, Z, P•>, where

P• = {Z → λ, Z → S} ∪ {A → xB | (x ∈ Σ*) ∧ (A, B ∈ Ω) ∧ (A → xB ∈ P)} ∪ {A → xZ | (x ∈ Σ*) ∧ (A ∈ Ω) ∧ (A → x ∈ P)}.

That is, all productions in P that end in a nonterminal are retained, while all other productions in P are appended with the new symbol Z, and the two new productions Z → λ and Z → S are added. A straightforward induction argument will show that the derivations that use n applications of productions of the form A → xZ generate exactly the words in L(G)ⁿ. Consequently, L(G•) = L(G)*. Δ

∇ Theorem 8.6. Let Σ be an alphabet. Then 𝒢Σ is effectively closed under concatenation.

Proof. See the exercises. Δ

∇ Corollary 8.2. Every regular expression has a corresponding right-linear grammar.

Proof. While this follows immediately from the fact that 𝒢Σ = 𝒟Σ and Theorem 6.1, the previous theorems outline an effective procedure for transforming a regular expression into a right-linear grammar. This can be proved by induction on the number of operators in the regular expression.
The basis step consists of the observation that expressions with zero operators, which must be of the form ∅, λ, or a, can be represented by the right-linear grammars <{S}, Σ, S, {S → S}>, <{S}, Σ, S, {S → λ}>, and <{S}, Σ, S, {S → a}>, respectively. To prove the inductive step, choose an arbitrary regular expression R with m + 1 operators, and identify the outermost operator. R must be of the form R1 ∪ R2 or R1·R2 or R1*, where R1 (and R2) have m or fewer operators. By the induction hypothesis, R1 (and R2) can be represented as right-linear grammars, and therefore by Theorem 8.4, 8.5, or 8.6, R can also be represented by a right-linear grammar. Any regular expression can thus be methodically transformed into an equivalent right-linear grammar. Δ

EXAMPLE 8.16

Let Σ = {a, b, c}, and consider the regular expression (a ∪ b). The grammars G1 = <{R}, {a, b, c}, R, {R → a}> and G2 = <{T}, {a, b, c}, T, {T → b}> can be combined as suggested in Theorem 8.4 (with A playing the role of Z) to form G = <{T, R, A}, {a, b, c}, A, {A → R, A → T, R → a, T → b}>.

EXAMPLE 8.17

Consider the regular expression (a ∪ b)*. The grammar G = <{T, R, A}, {a, b, c}, A, {A → R, A → T, R → a, T → b}> can be modified as suggested in Theorem 8.5 to form G• = <{T, R, A, Z}, {a, b, c}, Z, {Z → λ, Z → A, A → R, A → T, R → aZ, T → bZ}>. G• generates (a ∪ b)*.

EXAMPLE 8.18

Consider the regular expression (a ∪ b)*c. The grammars G• = <{T, R, A, Z}, {a, b, c}, Z, {Z → λ, Z → A, A → R, A → T, R → aZ, T → bZ}> and G3 = <{V}, {a, b, c}, V, {V → c}> can be combined with modified productions to form G′ = <{T, R, A, Z, V, S}, {a, b, c}, S, {S → Z, Z → λV, Z → A, A → R, A → T, R → aZ, T → bZ, V → c}>. G′ generates (a ∪ b)*c.

The previous examples illustrate the manner in which regular expressions can be systematically translated into right-linear grammars. Constructions corresponding to those given in Theorems 8.4, 8.5, and 8.6 can similarly be found for left-linear grammars (see the exercises).

Normal forms for grammars are quite useful in many contexts.
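The modification used in Example 8.17, that is, the construction of Theorem 8.5, can also be sketched in code. Tuple-encoded grammars, pair-encoded productions, and single-letter nonterminal names are conventions of this sketch, not of the text:

```python
def star_grammar(g, z="Z"):
    """Theorem 8.5: a right-linear grammar generating L(G)*.

    Productions ending in a nonterminal are kept; every other
    production A -> x becomes A -> xZ; then Z -> lambda and Z -> S
    are added, with Z the new start symbol.
    """
    (n, sigma, s, p) = g
    assert z not in n
    new_p = [(z, ""), (z, s)]
    for (lhs, rhs) in p:
        if rhs and rhs[-1] in n:
            new_p.append((lhs, rhs))          # ends in a nonterminal: keep
        else:
            new_p.append((lhs, rhs + z))      # append the new symbol Z
    return (n | {z}, sigma, z, new_p)

# Applied to the grammar of Example 8.16 (start symbol A):
g_star = star_grammar(({"T", "R", "A"}, {"a", "b", "c"}, "A",
                       [("A", "R"), ("A", "T"), ("R", "a"), ("T", "b")]))
```

The result is exactly the grammar G• of Example 8.17: Z → λ, Z → A, A → R, A → T, R → aZ, T → bZ.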
A standard representation can be especially useful in proving theorems about grammars. For example, the construction given in Lemma 8.2 would have been more concise and easier to investigate if complex productions such as S → bcaaT could be avoided. Indeed, if all productions in the grammar G had been of the form A → aB or A → λ, both the state set and the state transition function of A_G could have been defined more easily. Other constructions and proofs may also be able to make use of the simpler types of productions in grammars that conform to such normal forms. The following theorem guarantees that a given right-linear grammar has a corresponding equivalent grammar containing only productions that conform to the above standard.

∇ Theorem 8.7. Every right-linear grammar G has an equivalent right-linear grammar G1 in which all productions are of the form A → aB or A → λ.

Proof. Let G be a right-linear grammar. By Lemma 8.2, there exists an NDFA A_G that is equivalent to G. From Chapter 4, A_G has an equivalent deterministic finite automaton, and Lemma 8.1 can be applied to that DFA to form an equivalent right-linear grammar. By the construction given in Lemma 8.1, all the productions in this grammar are indeed of the form A → aB or A → λ. Δ

Note that the proof given is a constructive proof: rather than simply arguing the existence of such a grammar, a method for obtaining G1 is outlined. The above theorem could have been proved without relying on automata constructs. Basically, "long" productions like T → abcR would be replaced by a series of productions involving newly introduced nonterminals, for example, T → aX, X → bY, Y → cR. Similarly, a production like T → aa might be replaced by the sequence T → aB, B → aC, C → λ.
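The production-splitting idea just described can be sketched directly. Here fresh is any source of unused nonterminal names; the pair encoding of productions and single-letter nonterminals are assumptions of this sketch, which handles the production shapes A → x and A → xB with x a nonempty terminal string:

```python
def split_production(lhs, rhs, nonterminals, fresh):
    """Split one right-linear production into a chain of productions
    of the normal form A -> aB / A -> lambda of Theorem 8.7."""
    if rhs and rhs[-1] in nonterminals:
        terminals, tail = rhs[:-1], rhs[-1]   # trailing nonterminal
    else:
        terminals, tail = rhs, None
    prods, current = [], lhs
    for i, a in enumerate(terminals):
        if i == len(terminals) - 1 and tail is not None:
            prods.append((current, a + tail)) # last step reuses the old B
        else:
            nxt = fresh()
            prods.append((current, a + nxt))
            current = nxt
    if tail is None:
        prods.append((current, ""))           # finish with C -> lambda
    return prods

names = iter("XY")
prods = split_production("T", "abcR", {"R", "S", "T"}, lambda: next(names))
```

On T → abcR this produces T → aX, X → bY, Y → cR, exactly the series of productions described above.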
If the existence of such a normal form had been available for the proof of Lemma 8.2, the construction of A_G could have been simplified and the complexity of the proof drastically curtailed. Indeed, the resulting machine would have contained no λ-moves. Only one state per nonterminal would have been necessary, with final states corresponding to nonterminals that had productions of the form A → λ. Productions of the form A → aB would imply that B ∈ δ(A, a).

EXAMPLE 8.19

G = <{S, T, B, C}, {a, b}, S, {S → aS, S → bT, T → aB, B → aC, C → λ}> can be represented by the NDFA shown in Figure 8.3.

Figure 8.3 An automaton corresponding to a grammar in normal form

In practice, given an arbitrary right-linear grammar G, the work associated with finding the complex machine defined in Lemma 8.2 has simply been replaced by the effort needed to transform G into the appropriate normal form. Nevertheless, the guarantee that regular languages have grammars that conform to the above normal form is useful in many proofs, as illustrated above and in Theorem 8.8 below. As with context-free and context-sensitive languages, the contracting productions can be limited to Z → λ, where Z is the start symbol. This is only necessary if λ ∈ L; if λ ∉ L, there need be no contracting productions at all. We wish to show how to produce a grammar with no more than one contracting production. By relying on the existence of the normal form described in Theorem 8.7, this can be done without dealing with right-linear grammars in their full generality.

∇ Theorem 8.8. Every right-linear grammar G has an equivalent right-linear grammar G° in which the start symbol Z never appears on the right in any production, and the only length-contracting production that may appear is Z → λ. Furthermore, all other productions are of the form A → aB or A → a.

Proof. Let G = <Ω, Σ, S, P> be a right-linear grammar.
Without loss of generality, assume that G is of the form specified by Theorem 8.7. (If G were not of the proper form, Theorem 8.7 guarantees that an equivalent grammar that is in the proper form could be found and used in place of G.) Choose a new nonterminal Z such that Z ∉ Ω, and consider the new grammar G° defined by G° = <Ω ∪ {Z}, Σ, Z, P°>, where P° contains Z → S and all productions from P of the form A → xB, where x ∈ Σ* and A, B ∈ Ω. P° also contains the productions in the set

{A → a | (∃B ∈ Ω)(A → aB ∈ P ∧ B → λ ∈ P)}.

Finally, if S → λ was a production in P, then Z → λ is included in P°. Note that no other productions of the form B → λ are part of P°. Other productions have been added to compensate for this loss. Derivations using the productions in P° typically start with Z → S, then proceed with productions of the form A → xB, and terminate with one production of the form A → a. The corresponding derivation in the original grammar G would be very similar, but would start with the old start symbol S and therefore avoid the Z → S application used in G°. The productions of the form A → xB are common to both grammars, and the final step in G° that uses A → a would be handled by two productions in G: A → aB and B → λ. An induction argument on the number of productions in a derivation will show that every derivation from G° has a corresponding derivation in G that produces the same terminal string, and vice versa. Thus, L(G°) = L(G), which justifies that G° is equivalent to G. G° was constructed to conform to the conditions specified by the theorem, and thus the proof is complete. Δ

∇ Corollary 8.3. Every type 3 language is also a type 2 language.

Proof. Let L be a type 3 language. Then there exists a right-linear grammar G that generates L. By Theorem 8.8, there is an equivalent right-linear grammar G° that satisfies the definition of a context-free grammar. Thus, L is context free. Δ
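The construction in the proof of Theorem 8.8 can likewise be sketched in code, assuming the input grammar is already in the A → aB / A → λ form of Theorem 8.7. The tuple and pair encodings are conventions of this sketch, not of the text:

```python
def limit_contracting(grammar, z="Z"):
    """Theorem 8.8: fresh start symbol Z, at most one contracting
    production Z -> lambda, all other new productions A -> a."""
    (n, sigma, s, p) = grammar
    assert z not in n
    erasing = {lhs for (lhs, rhs) in p if rhs == ""}  # nonterminals with B -> lambda
    new_p = [(z, s)]
    for (lhs, rhs) in p:
        if rhs == "":
            continue                          # drop every B -> lambda
        new_p.append((lhs, rhs))              # keep A -> aB
        if rhs[1:] in erasing:
            new_p.append((lhs, rhs[0]))       # add A -> a when B -> lambda in P
    if s in erasing:
        new_p.append((z, ""))                 # Z -> lambda iff lambda in L(G)
    return (n | {z}, sigma, z, new_p)

# The grammar of Example 8.19: S -> aS | bT, T -> aB, B -> aC, C -> lambda
g0 = limit_contracting(({"S", "T", "B", "C"}, {"a", "b"}, "S",
                        [("S", "aS"), ("S", "bT"), ("T", "aB"),
                         ("B", "aC"), ("C", "")]))
```

For the grammar of Example 8.19 the only changes are that C → λ disappears and B → a is added (along with Z → S); since λ is not in the language, no Z → λ is produced.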
Section 8.1 explored several generalizations of the definition of a regular grammar, and, unlike the generalization from DFAs to NDFAs, these generalizations yield new and larger classes of languages. These new types of grammars will be explored in the following chapters, and the corresponding generalized machines will be developed.

EXERCISES

8.1. Can strings like abBAdBc (where B and A are nonterminals) ever be derived from the start symbol S in a right-linear grammar? Explain.

8.2. Given A and G_A as defined in Lemma 8.1, let P(n) be the statement that (∀x ∈ Σⁿ)(∃j ∈ ℕ)[if t₀ ⇒* xtⱼ, then δ_A(t₀, x) = tⱼ]. Prove that P(n) is true for all n ∈ ℕ.

8.3. Give regular expressions that describe the language generated by:
(a) G₄ = <{S, A, B, C, V, W, X}, {a, b, c}, S, {S → abA | bbB | ccV, A → bC | cX, B → ab, C → λ | cS, V → aV | cX, W → aa | aW, X → bV | aaX}>
(b) G₅ = <{S₀, S₁, S₂}, {0, 1}, S₀, {S₀ → λ | 0S₂ | 1S₁, S₁ → 0S₁ | 1S₂, S₂ → 0S₂ | 1S₀}>
(c) G₆ = <{T, Z}, {a, b}, Z, {Z → aZ, Z → bT, T → aZ}>
(d) G₇ = <{S, B, C}, {a, b, c}, S, {S → aS | abB | cC, B → abB | λ, C → cC | ca}>

8.4. Use the inductive fact proved in Exercise 8.2 to formally prove Lemma 8.1.

8.5. Draw the automata corresponding to the grammars given in Exercise 8.3.

8.6. Give, if possible, right-linear grammars that will generate:
(a) All words in {a, b, c}* that do not contain two consecutive bs.
(b) All words in {a, b, c}* that do contain two consecutive bs.
(c) All words in {a, b, c}* that have the same number of as as bs.
(d) All words in {a, b, c}* that have an even number of as.
(e) All words in {a, b, c}* that do not end in the letter b.
(f) All words in {a, b, c}* that do not contain any cs.

8.7. Give left-linear grammars that will generate the languages described in Exercise 8.6.

8.8. Complete the inductive portion of the proof of Theorem 8.8.

8.9. Complete the inductive portion of the proof of Theorem 8.5.

8.10.
Use the more efficient algorithm indicated in Theorem 8.3 to find regular expressions that describe L(G₅), L(G₆), and L(G₇) in Exercise 8.3.

8.11. (a) Restate Theorem 8.3 so that it generates valid language equations for left-linear grammars.
(b) Restate Lemmas 8.4 and 8.5 for these new types of equations.
(c) Use your new methods to find a regular expression for L(G₂) in Example 8.12.

8.12. Consider the grammar Q = <{I, F}, {0, 1, .}, I, {I → 0I | 1I | 0.F | 1.F, F → λ | 0F | 1F}>. L(Q) is the set of all (terminating) binary numerals, including 101.11, 011., 10.0, 0.010, and so on.
(a) Find the corresponding NDFA for this grammar.
(b) Write the right-linear equations corresponding to this grammar.
(c) Solve the equations found in part (b) for both unknowns.

8.13. Find right-linear grammars for:
(a) (a ∪ b)c*(d ∪ (ab)*)
(b) (a ∪ b)*a(a ∪ b)*

8.14. Find left-linear grammars for:
(a) (a ∪ b)c*(d ∪ (ab)*)
(b) (a ∪ b)*a(a ∪ b)*

8.15. (a) Describe an efficient algorithm that will convert a right-linear grammar into a left-linear grammar.
(b) Apply your algorithm to G₄ = <{S, A, B, C, V, W, X}, {a, b, c}, S, {S → abA | bbB | ccV, A → bC | cX, B → ab, C → λ | cS, V → aV | cX, W → aa | aW, X → bV | aaX}>

8.16. Describe an algorithm that will convert a given regular grammar G into another regular grammar G₁ that generates the complement of L(G).

8.17. Without appealing to results from Chapter 12, outline an algorithm that will determine whether the language generated by a given regular grammar G is empty.

8.18. Without appealing to results from Chapter 12, outline an algorithm that will determine whether the language generated by a given regular grammar G is infinite.

8.19. Without appealing to results from Chapter 12, outline an algorithm that will determine whether two right-linear grammars G₁ and G₂ generate the same language.

8.20. Consider the grammar H = <{A, B, S}, {a, b, c}, S, {S → aSBc, S → λ, SB → bS, cB → Bc}>. Determine L(H).

8.21.
What is wrong with proving that ℛ_Σ is closed under concatenation by using the following construction? Let G₁ = <Ω₁, Σ, S₁, P₁> and G₂ = <Ω₂, Σ, S₂, P₂> be two right-linear grammars, and, without loss of generality, assume that Ω₁ ∩ Ω₂ = ∅. Choose a new nonterminal Z such that Z ∉ Ω₁ ∪ Ω₂, and define a new grammar G′ = <Ω₁ ∪ Ω₂ ∪ {Z}, Σ, Z, P₁ ∪ P₂ ∪ {Z → S₁S₂}>. Note: It is straightforward to show that L(G′) = L(G₁)·L(G₂) (see Chapter 9).

8.22. Prove that ℛ_Σ is closed under concatenation by:
(a) Constructing a new grammar G′ with the property that L(G′) = L(G₁)·L(G₂).
(b) Proving that L(G′) = L(G₁)·L(G₂).

8.23. Use the constructs presented in this chapter to solve the following problem from Chapter 4: Given a nondeterministic finite automaton A without λ-transitions, show that it is possible to construct a nondeterministic finite automaton with λ-transitions A′ with the properties that (1) A′ has exactly one start state and exactly one final state, and (2) L(A′) = L(A).

8.24. Complete the proof of Lemma 8.2 by:
(a) Defining an appropriate inductive statement.
(b) Proving the statement defined in part (a).

8.25. Complete the proof of Lemma 8.3 by:
(a) Defining an appropriate inductive statement.
(b) Proving the statement defined in part (a).

8.26. Fill in the details in the second half of the proof of Theorem 8.2 by providing reasons for each of the assertions that were made.

8.27. (a) Refer to Example 8.7 and use induction to formally prove that L(G₁) = {aⁿbaa | n ∈ ℕ}.
(b) Refer to Example 8.9 and use induction to formally prove that L(G₅) = {aⁿbaa | n ∈ ℕ}.

8.28. Notice that regular grammars are defined to have production sets that contain only right-linear-type productions or only left-linear-type productions. Consider the following grammar C, which contains both types of productions: C = <{S, A, B}, {0, 1}, S, {S → 0A | 1B | 0 | 1 | λ, A → S0, B → S1}>. Note that S ⇒ 0A ⇒ 0S0 ⇒ 01B0 ⇒ 01S10 ⇒ 0110.
(a) Find L(C).
(b) Is L(C) FAD?
(c) Should the definition of regular grammars be expanded to include grammars like this one? Explain.

8.29. (a) Why was it important to assume that Ω₁ ∩ Ω₂ = ∅ in the proof of Theorem 8.4? Give an example.
(b) Why was it possible to assume that Ω₁ ∩ Ω₂ = ∅ in the proof of Theorem 8.4? Give a justification.

8.30. Consider the NDFA A_G defined in Lemma 8.2. If A_G is disconnected, what does this say about the grammar G?

8.31. Apply Lemma 8.1 to the automata in Figure 8.4.

Figure 8.4 Automata for Exercise 8.31

8.32. (a) Restate Lemma 8.1 so that it directly applies to NDFAs.
(b) Prove this new lemma.
(c) Assume Σ = {a, b, c} and apply this new lemma to the automata in Figure 8.5.

Figure 8.5 Automata for Exercise 8.32

8.33. Define context-free grammars for the following languages:
(a) L₁ = all words over Σ* for which the last letter matches the first letter.
(b) L₂ = all odd-length words over Σ* for which the first letter matches the center letter.
(c) L₃ = all words over Σ* for which the last letter matches none of the other letters.
(d) L₄ = all even-length words over Σ* for which the two center letters match.
(e) L₅ = all odd-length words over Σ* for which the center letter matches none of the other letters.
(f) Which of the above languages are regular?

8.34. Define context-free grammars for the following languages:
(a) L = {x ∈ {a, b}* | |x|a < |x|b}
(b) G = {x ∈ {a, b}* | |x|a ≤ |x|b}
(c) K = {w ∈ {0, 1}* | w = wʳ}
(d) Φ = {x ∈ {a, b, c}* | ∃j, k, m ∈ ℕ ∋ x = aʲbᵏcᵐ, where j ≥ 3 and k = m}

8.35. Define context-free grammars for the following languages:
(a) L₁ = {x ∈ {a, b}* | |x|a = 2|x|b}
(b) L₂ = {x ∈ {a, b}* | |x|a ≠ |x|b}
(c) The set of all postfix expressions over the alphabet {A, B, +, −}
(d) The set of all parenthesized infix expressions over the alphabet {A, B, +, −, (, )}

8.36.
Define context-sensitive grammars for the following languages:
(a) Γ = {x ∈ {0, 1, 2}* | ∃w ∈ {0, 1}* ∋ x = w2w} = {2, 121, 020, 11211, 10210, ...}
(b) Φ = {x ∈ {b}* | ∃j ∈ ℕ ∋ |x| = 2ʲ} = {b, bb, bbbb, b⁸, b¹⁶, b³², ...}

8.37. Consider the grammar G = <{A, B, S}, {a, b, c}, S, {S → aSBc, S → λ, SB → bS, cB → Bc}>. Show that this context-sensitive grammar is not equivalent to G″ given in Example 8.3, where G″ = <{A, B, S, T}, {a, b, c}, S, {S → aSBc, S → T, T → λ, TB → bT, cB → Bc}>.

8.38. Design context-free grammars that accept:
(a) L₁ = a*(b ∪ c)* ∩ {x ∈ {a, b, c}* | |x|a = |x|b + |x|c}
(b) L₂ = {x ∈ {a, b, c}* | ∃i, j, k ∈ ℕ ∋ x = aⁱbʲcᵏ, where i + j = k}
(c) L₃ = {x ∈ {a, b, c}* | |x|a + |x|b = |x|c}

8.39. Refer to Definition 8.4 and prove that L(G′) = L(G) ∪ {λ}.

8.40. Refer to Definition 8.6 and prove that L(G′) = L(G) ∪ {λ}.

8.41. (a) Show that if G is in the form specified in Theorem 8.8, then so is the grammar constructed in Theorem 8.5.
(b) Give an example showing that, even if G₁ and G₂ are in the form specified in Theorem 8.8, the union grammar described in Theorem 8.4 may not be.
(c) Is your construction for G* in Example 8.22 normal form preserving?

8.42. Given two left-linear grammars G₁ and G₂, give a set of rules to find a new left-linear grammar that will generate:
(a) L(G₁) ∪ L(G₂)
(b) L(G₁)·L(G₂)
(c) L(G₁)*

CHAPTER 9

CONTEXT-FREE GRAMMARS

The preceding chapter explored the properties of the type 3 grammars. The next class of grammars in the language hierarchy, the type 2 or context-free grammars, is central to the linguistic aspects of computer science. Context-free grammars were originally used to help specify natural languages and are thus well suited for defining computer languages. These context-free grammars represent a much wider class of languages than did the regular grammars.
Due to the need for balancing parentheses and matching begin-end pairs (among other things), the language Pascal cannot be specified by a regular grammar, but it can be defined with a context-free grammar. Programming languages are specifically designed to be representable by context-free grammars in order to take advantage of the desirable properties inherent in type 2 grammars. These properties are explored in this chapter, while Chapter 10 investigates the generalized automata corresponding to context-free languages.

9.1 PARSE TREES

Derivations in a context-free grammar are similar to those of regular grammars, and the definition of derivation given below is compatible with that given in Definition 8.8.

∇ Definition 9.1. Let Σ be any alphabet, G = <Ω, Σ, S, P> be a context-free grammar, αAγ ∈ (Σ ∪ Ω)*, and A → β be a production in P. We will say that αβγ can be directly derived from αAγ by applying the production A → β, and write αAγ ⇒ αβγ. Furthermore, if (α₁ ⇒ α₂) ∧ (α₂ ⇒ α₃) ∧ ⋯ ∧ (αₙ₋₁ ⇒ αₙ), then we will say that α₁ derives αₙ and write α₁ ⇒* αₙ. Δ

As with Definition 8.8, α₁ ⇒* α₁ in zero steps. In generating a particular string, regular grammars typically allowed only a single sequence of applicable productions. Context-free grammars are generally more robust, as shown by Example 9.4, which illustrates several derivations of a single string. The special nature of the productions in a context-free grammar, which replace a single nonterminal with a string of symbols, allows derivations to be diagrammed in a treelike structure, much as sentences are diagrammed in English.
For example, the rules of English specify that a sentence is composed of a subject followed by a predicate, which is reflected in the production

<sentence> → <subject> <predicate>

Other rules include

<noun phrase> → <modifier> <noun>

and

<predicate> → <verb> <prepositional phrase>

A specific sequential application of these and other rules to form an English sentence might be diagrammed as shown in Figure 9.1. Such diagrams are called parse trees or derivation trees.

∇ Definition 9.2. A parse tree or derivation tree for a regular or context-free grammar G = <Ω, Σ, S, P> is a labeled, ordered tree in which the root node is labeled S, and the n subtrees of a node labeled A are labeled α₁ through αₙ only if A → α₁α₂⋯αₙ is a production in P, where each αᵢ ∈ (Ω ∪ Σ). However, if B → λ is a production in P, then a node labeled B may instead have a single subtree labeled λ. The parse tree is called complete if no leaf is labeled with a nonterminal. Δ

Recall that for context-free grammars only the start symbol Z can have a production of the form Z → λ; regular grammars are allowed to have several such rules.

EXAMPLE 9.1
As illustrated in Figure 9.1, a parse tree shows a particular sequence of substitutions allowed by a given grammar. A left-to-right rendering of the leaves of this complete parse tree yields the terminal string "the check is in the mail."

Figure 9.1 A parse tree for the English grammar

EXAMPLE 9.2
Parse trees for regular grammars are much more restrictive; at any given level in the tree, only one node can be labeled with a nonterminal. Figure 9.2 shows the parse tree for the word aaabaa in the grammar G₁ = <{T, S}, {a, b}, S, {S → aS, S → bT, T → aa}>.
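The forced shape of these trees can be illustrated programmatically. The sketch below is not the book's construction; it builds the unique parse tree for a word of L(G₁) using an assumed (label, children) tuple encoding of trees.

```python
# Sketch: building the parse tree of Example 9.2 for words of
# G1 = <{S,T},{a,b},S,{S -> aS, S -> bT, T -> aa}>.
# Trees are (label, children) tuples; leaves have empty child lists.
def parse(word):
    if word.startswith("a") and len(word) > 3:   # apply S -> aS
        return ("S", [("a", []), parse(word[1:])])
    if word == "baa":                            # apply S -> bT, T -> aa
        return ("S", [("b", []), ("T", [("a", []), ("a", [])])])
    raise ValueError("not in L(G1) = { a^n b a a }")

def leaves(tree):
    label, kids = tree
    return label if not kids else "".join(leaves(k) for k in kids)
```

A left-to-right rendering of the leaves recovers the original word, mirroring the remark in Example 9.1.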
In general, since productions in a right-linear grammar allow only the rightmost symbol to be a nonterminal, parse trees for right-linear grammars allow only the rightmost child of a node to have a nontrivial subtree.

Figure 9.2 The parse tree discussed in Example 9.2

EXAMPLE 9.3
Given a context-free grammar G, a common task required of compilers is to scan a proposed terminal string x belonging to L(G) and build a parse tree corresponding to x. If G is the "regular expression" grammar defined in Example 8.2,

G = <{R}, {a, b, c, (, ), ε, ∅, ∪, ·, *}, R, {R → a | b | c | ε | ∅ | (R·R) | (R∪R) | R*}>

and x is ((a∪b)*·c), the desired result would be a representation of the tree shown in Figure 9.3. In a perfect world of perfect programmers, it might be appropriate to assume that x can definitely be generated by the productions in G. In our world, however, compilers must unfortunately perform the added task of determining whether it is possible to generate the proposed terminal string x, that is, whether the file presented represents a syntactically correct program. This is typically done as the parse trees are being built, and discrepancies are reported to the user. For the "regular expression" grammar used in Example 9.3, there is an algorithm for scanning the symbols of proposed strings such as ((a∪b)*·c) to determine whether a parse tree can be constructed. In the case of a string like ((a∪*b), no such parse tree exists, and the string therefore cannot be generated by the grammar. If the productions of a grammar follow certain guidelines, the task of finding the correct scanning algorithm is greatly simplified. The desired properties that should be inherent in a programming language grammar are investigated later in the text.
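One plausible scanning algorithm for this grammar is a recursive descent that treats a letter or a parenthesized pair as the core of an R and then absorbs any trailing stars (repeated applications of R → R*). The sketch below is an assumption-laden illustration, not the text's algorithm; for brevity it uses "." and "U" for the · and ∪ operators and omits the ε and ∅ atoms.

```python
# Sketch: deciding whether a string is derivable in
#   R -> a | b | c | (R.R) | (RUR) | R*
def parse_R(s, i=0):
    """Return the index just past one R starting at i, or None."""
    if i < len(s) and s[i] in "abc":             # atomic case
        i += 1
    elif i < len(s) and s[i] == "(":             # (R.R) or (RUR)
        j = parse_R(s, i + 1)
        if j is None or j >= len(s) or s[j] not in ".U":
            return None
        k = parse_R(s, j + 1)
        if k is None or k >= len(s) or s[k] != ")":
            return None
        i = k + 1
    else:
        return None
    while i < len(s) and s[i] == "*":            # R -> R*
        i += 1
    return i

def derivable(s):
    return parse_R(s) == len(s)
```

On the strings discussed above, the scan succeeds for ((aUb)*.c) and fails for ((aU*b), matching the claim that no parse tree exists for the latter.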
In a separate phase, after the parse trees are found, the compiler uses the trees and other constructs to infer the meaning of the program, that is, to generate appropriate machine code that reflects the advertised meaning (that is, the semantics) of the program statements. For example, the parse tree for ((a∪b)*·c) in Figure 9.3 clearly shows both the order in which the operators ∪, ·, and * should be applied and the expressions to which they should be applied. Given a particular complete parse tree for a string x, there may be some freedom in the order in which the associated productions are applied.

Figure 9.3 The parse tree discussed in Example 9.3

EXAMPLE 9.4
For the grammar G = <{R}, {a, b, c, (, ), ε, ∅, ∪, ·, *}, R, {R → a | b | c | ε | ∅ | (R·R) | (R∪R) | R*}>, each of the following is a valid derivation of the string x = ((a∪b)*·c).

Derivation 1: R ⇒ (R·R) ⇒ (R*·R) ⇒ ((R∪R)*·R) ⇒ ((a∪R)*·R) ⇒ ((a∪b)*·R) ⇒ ((a∪b)*·c)

Derivation 2: R ⇒ (R·R) ⇒ (R*·R) ⇒ ((R∪R)*·R) ⇒ ((R∪R)*·c) ⇒ ((R∪b)*·c) ⇒ ((a∪b)*·c)

Derivation 3: R ⇒ (R·R) ⇒ (R·c) ⇒ (R*·c) ⇒ ((R∪R)*·c) ⇒ ((a∪R)*·c) ⇒ ((a∪b)*·c)

Derivation 4: R ⇒ (R·R) ⇒ (R·c) ⇒ (R*·c) ⇒ ((R∪R)*·c) ⇒ ((R∪b)*·c) ⇒ ((a∪b)*·c)

∇ Definition 9.3. A derivation sequence is called a leftmost derivation if at each step in the sequence the leftmost nonterminal is next expanded to produce the following step. A derivation sequence is called a rightmost derivation if at each step in the sequence the rightmost nonterminal is next expanded to produce the following step. Δ

The first of the derivations given in Example 9.4 is a leftmost derivation, since at each step it is always the leftmost nonterminal that is expanded to arrive at the next step. Similarly, the last of these, derivation 4, is a rightmost derivation. There are many other possible derivations, such as derivations 2 and 3, which are neither leftmost nor rightmost.
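A single step of Definition 9.1 underlies every line of these derivations: one occurrence of a nonterminal is spliced out and replaced by the right side of a production. As a minimal sketch (the convention that uppercase letters stand for nonterminals is an assumption made for illustration):

```python
# Sketch of one direct-derivation step: alpha A gamma => alpha beta gamma.
# A sentential form is a string; a production is an (A, beta) pair.
def derive_step(sentential, i, production):
    A, beta = production
    assert sentential[i] == A, "position i must hold the nonterminal A"
    return sentential[:i] + beta + sentential[i + 1:]
```

For instance, applying R → R* at position 1 of (R.R) yields (R*.R), the second step of Derivation 1 (with "." written for ·).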
The restrictions on regular grammars ensure that there is never more than one nonterminal present at any point during a derivation. This linear nature of regular grammars ensures that all derivations of a parse tree follow exactly the same sequence, since there is never a choice of nonterminals to expand. Thus, the rightmost derivation of a parse tree in a regular grammar is always the same as its leftmost derivation. Parse trees in context-free grammars are generally more robust, allowing several different derivation sequences to correspond to the same tree. For a given parse tree, though, there is only one leftmost derivation. In Figure 9.4, the nodes in the parse tree for ((a∪b)*·c) are numbered to show the order in which they would be visited by a preorder traversal. Note that the sequence in which the nonterminals would be expanded in a leftmost derivation corresponds to the order in which they appear in the preorder traversal.

Figure 9.4 The preorder traversal of the parse tree

9.2 AMBIGUITY

Whereas each tree corresponds to a unique leftmost derivation, it is possible for a terminal string to have more than one leftmost derivation. This happens whenever a string x corresponds to more than one parse tree, that is, whenever there are truly distinct ways of applying the productions of the grammar to form x. Grammars for which this can happen are called ambiguous.

∇ Definition 9.4. A grammar G = <Ω, Σ, S, P> is called ambiguous if there exists a string x ∈ Σ* that corresponds to two distinct parse trees. A grammar that is not ambiguous is called unambiguous. Δ

EXAMPLE 9.5
Consider the grammar G₂ = <{S, A}, {a}, S, {S → AA, A → aSa, A → a}>. Figure 9.5 shows the two distinct parse trees associated with the word aaaaa. Note that the leftmost derivations corresponding to these trees are indeed different:
S ⇒ AA ⇒ aSaA ⇒ aAAaA ⇒ aaAaA ⇒ aaaaA ⇒ aaaaa

is the sequence indicated by the parse tree in Figure 9.5a, while

S ⇒ AA ⇒ aA ⇒ aaSa ⇒ aaAAa ⇒ aaaAa ⇒ aaaaa

corresponds to Figure 9.5b.

Figure 9.5 (a) A parse tree for aaaaa in Example 9.5 (b) An alternate parse tree for aaaaa

Recall that context-free grammars are used to inspect statements within a computer program and determine corresponding parse trees. Such ambiguity is undesirable in a grammar that describes a programming language, since it would be unclear which of the trees should be used to infer the meaning of the string. Indeed, this ambiguity would be intolerable if a statement could give rise to two trees that implied different meanings, as illustrated in Example 9.6 below. It is therefore of practical importance to avoid descriptions of languages that entail this sort of ambiguity. The language defined by the grammar G₂ in Example 9.5 is actually quite simple. Even though G₂ is not a regular grammar, it can easily be shown that L(G₂) is the regular set {a², a⁵, a⁸, a¹¹, a¹⁴, ...}. The ambiguity is therefore not inherent in the language, but is rather a consequence of the needlessly complex grammar used to describe the language. A much simpler context-free grammar is given by G₃ = <{T}, {a}, T, {T → aaaT, T → aa}>. This grammar happens to be right linear and is definitely not ambiguous.

EXAMPLE 9.6
The following sampling from a potential programming language grammar illustrates the semantic problems that can be caused by ambiguity. Consider the grammar Gₛ = <{<expression>, <identifier>}, {a, b, c, d, −}, <expression>, P>, where P consists of the productions

<expression> → <identifier>
<expression> → <identifier> − <expression>
<expression> → <expression> − <identifier>
<identifier> → a
<identifier> → b
<identifier> → c
<identifier> → d

L(Gₛ)
then contains the string a − b − d, which can be generated by two distinct parse trees, as shown in Figure 9.6. Figure 9.6a corresponds to the following leftmost derivation:

<expression> ⇒ <expression> − <identifier>
⇒ <identifier> − <expression> − <identifier>
⇒ a − <expression> − <identifier>
⇒ a − <identifier> − <identifier>
⇒ a − b − <identifier>
⇒ a − b − d

Figure 9.6b corresponds to a different leftmost derivation, as shown below:

<expression> ⇒ <identifier> − <expression>
⇒ a − <expression>
⇒ a − <identifier> − <expression>
⇒ a − b − <expression>
⇒ a − b − <identifier>
⇒ a − b − d

If the productions of Gₛ were part of a grammatical description of a programming language, there are obvious semantics associated with the productions involving the − operator. The productions <expression> → <identifier> − <expression> and <expression> → <expression> − <identifier> indicate that two values should be combined using the subtraction operator to form a new value. The compiler would be responsible for generating code that carries out the appropriate subtraction. Unfortunately, the two parse trees give rise to functionally different code. For the parse tree in Figure 9.6a, the subtraction is performed left to right, while in the parse tree in Figure 9.6b the ordering of the operators is right to left. Subtraction is not an associative operation, and the expression (a − b) − d will usually produce a different value than a − (b − d). Ambiguity can thus be a fatal flaw in a grammar describing a programming language.

Figure 9.6 (a) A parse tree for a − b − d in Example 9.6 (b) An alternate parse tree for a − b − d
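The semantic stakes are easy to see numerically. With illustrative values for the identifiers (a = 8, b = 3, d = 1, assumptions made purely for this example), the two parse trees of Figure 9.6 evaluate to different results:

```python
# Sketch: the two groupings imposed by the two parse trees of a - b - d.
a, b, d = 8, 3, 1                 # illustrative values, not from the text
left_to_right = (a - b) - d       # grouping of Figure 9.6a
right_to_left = a - (b - d)       # grouping of Figure 9.6b
```

With these values the left-to-right tree yields 4 while the right-to-left tree yields 6, so code generated from the two trees would compute different answers.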
In the language L(Gₛ) discussed in Example 9.6, the ambiguity is again not inherent in the language itself, but is rather a consequence of the specific productions of the grammar Gₛ describing the language. In most programming languages, the expression a − b − d is allowed and has a well-defined meaning. Most languages decree that such expressions be evaluated from left to right, and hence a − b − d would be interpreted as (a − b) − d. This interpretation can be enforced by simply removing the production <expression> → <identifier> − <expression> from Gₛ to form the new grammar Gₘ = <{<expression>, <identifier>}, {a, b, c, d, −}, <expression>, P′>, where P′ consists of the productions

<expression> → <identifier>
<expression> → <expression> − <identifier>
<identifier> → a
<identifier> → b
<identifier> → c
<identifier> → d

It should be clear that Gₛ and Gₘ are equivalent, and both generate the regular language ((a∪b∪c∪d)·−)*·(a∪b∪c∪d). Gₘ gives rise to unique parse trees and is therefore unambiguous. It should be noted that the language could have been defined with a single nonterminal; a simpler grammar equivalent to Gₘ is <{T}, {a, b, c, d, −}, T, {T → a | b | c | d | T−T}>. However, since this simpler grammar is ambiguous, it is much more difficult to work with than Gₘ. The pair of nonterminals <expression> and <identifier> is used to circumvent the ambiguity problem in this language. In the grammar Gₘ, the production <expression> → <expression> − <identifier> contains the nonterminal <expression> to the left of the subtraction token and <identifier> to the right of the −. Since <identifier> can only be replaced by a terminal representing a single variable, the resulting parse tree ensures that the entire expression to the left of the − is evaluated before the operation corresponding to the current subtraction token is performed. In this fashion, the distinction between the two nonterminals forces a left-to-right evaluation sequence.
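Because Gₘ derives exactly an <identifier> followed by zero or more "− <identifier>" tails, its parse trees correspond to a left fold over the identifiers. The sketch below mirrors that evaluation order; the identifier values are illustrative assumptions, not part of the grammar.

```python
# Sketch: left-to-right evaluation, matching the unique Gm parse trees.
VALUES = {"a": 8, "b": 3, "c": 2, "d": 1}   # illustrative assignments

def evaluate(expr):
    tokens = expr.split("-")        # Gm derives <id> (- <id>)*
    result = VALUES[tokens[0]]
    for t in tokens[1:]:            # fold left, as the grammar dictates
        result -= VALUES[t]
    return result
```

Evaluating "a-b-d" this way computes (a − b) − d, the interpretation Gₘ was designed to enforce.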
In fact, a more robust language with other operators like × and ÷ will require more nonterminals to enforce the default precedence among these operators. Most modern programming languages employ a solution to the ambiguity problem that differs from the one just described. Programmers generally do not want to be constrained by operators that can only be evaluated from left to right, and hence matched parentheses are used to indicate an order of evaluation that may differ from the default. Thus, unambiguous grammars are sought that correctly reflect the meaning of expressions like d − (b − c) or even (a) − ((c − (d))).

EXAMPLE 9.7
The following grammar Gₚ allows expressions with parentheses, minus signs, and single-letter identifiers to be uniquely parsed. Gₚ = <{<expression>, <identifier>}, {a, b, c, d, −, (, )}, <expression>, P″>, where P″ consists of the productions

<expression> → (<expression>)
<expression> → <expression> − (<expression>)
<expression> → <identifier>
<expression> → <expression> − <identifier>
<identifier> → a
<identifier> → b
<identifier> → c
<identifier> → d

The first two productions in P″, which were not present in P′, are designed to handle the balancing of parentheses. The first rule allows superfluous sets of parentheses to be correctly recognized. The second rule ensures that an expression surrounded by parentheses is evaluated before the operator outside those parentheses is evaluated. In the absence of parentheses, the left-to-right ordering of the operators is maintained. Figure 9.7 illustrates the unique parse tree for the expression (a) − ((c − (d))). L(Gₚ) is a context-free language that is too complex to be regular; the pumping lemma for regular sets (Theorem 2.3) can be used to show that it is impossible for a DFA to keep track of an unlimited number of corresponding balanced parentheses. This language, and the others discussed so far, can all be expressed by unambiguous grammars.
It should be clear that every language generated by a grammar can also be generated by ambiguous grammars, since an unambiguous grammar can always be modified to become ambiguous. What is not immediately clear is whether there are languages that can only be generated by ambiguous grammars.

∇ Definition 9.5. A context-free language L is called inherently ambiguous if every grammar that generates L is ambiguous. A context-free language that is not inherently ambiguous is called unambiguous. Δ

∇ Definition 9.6. Let the class of context-free languages over the alphabet Σ be denoted by 𝒞_Σ. Let the class of unambiguous context-free languages be denoted by 𝒰_Σ. Δ

Figure 9.7 The parse tree discussed in Example 9.7

∇ Theorem 9.1. There are context-free languages that are inherently ambiguous; that is, 𝒰_Σ is properly contained in 𝒞_Σ.

Proof. The language L = {aⁿbⁿcᵐdᵐ | n, m ∈ ℕ} ∪ {aⁱbʲcʲdⁱ | i, j ∈ ℕ} is a context-free language (see the exercises). L is also inherently ambiguous, since there must exist two parse trees for some of the strings in the intersection of the two sets {aⁿbⁿcᵐdᵐ | n, m ∈ ℕ} and {aⁱbʲcʲdⁱ | i, j ∈ ℕ}. The proof of this last statement is tedious to formalize; the interested reader is referred to [HOPC]. Δ

Theorem 9.1 states that there exist inherently ambiguous type 2 languages. No type 3 language is inherently ambiguous. Even though there are regular grammars that are ambiguous, every regular grammar has an equivalent grammar that is unambiguous. This assertion is supported by the following examples and results.

EXAMPLE 9.8
Consider the following right-linear grammar Gₙ:

Gₙ = <{S, A, C}, {a, b, c}, S, {S → Abc, S → abC, A → a, C → c}>

Only one terminal string can be derived from Gₙ, but this word has two distinct derivation trees, as shown in Figure 9.8. Thus, there are regular grammars that are ambiguous.

∇ Theorem 9.2.
Given any right-linear grammar G = <Ω, Σ, S, P>, there exists an equivalent right-linear grammar that is unambiguous.

Figure 9.8 The parse trees discussed in Example 9.8

Proof. Let G′ = G(A(G)). That is, beginning with the right-linear grammar G, use the construction outlined in Lemma 8.2 to find the corresponding automaton A_G. Use Definition 4.9 to remove the lambda-transitions and Definition 4.5 to produce a deterministic machine, and then apply the construction outlined in Lemma 8.1 to form the new right-linear grammar G′. By Lemma 8.2, Theorem 4.2, Theorem 4.1, and Lemma 8.1, the language defined by each of these constructs is unchanged, so G′ is equivalent to G. Due to the deterministic nature of the machine from which this new grammar was built, the resulting parse tree for a given string must be unique, since only one production is applicable at any point in the derivation. A formal inductive statement of this property is left as an exercise. Δ

∇ Corollary 9.1. The class ℛ_Σ of languages generated by regular grammars is properly contained in 𝒰_Σ.

Proof. Containment follows immediately from Theorem 9.2. Proper containment is demonstrated by the language and grammar discussed in Example 9.3. Δ

EXAMPLE 9.9
The right-linear grammar Gₙ = <{S, B, C}, {a, b, c}, S, {S → aB, S → abC, B → bc, C → c}> in Example 9.8 can be transformed, as outlined in Theorem 9.2, into an unambiguous grammar. The automaton corresponding to Gₙ, found by applying the technique given in Lemma 8.2, is shown in Figure 9.9a. The version of this automaton without lambda-moves (with the inaccessible states not shown) is illustrated in Figure 9.9b. The deterministic version, with the disconnected states again removed, is given in Figure 9.9c. For simplicity, the states are relabeled in Figure 9.9d. The corresponding grammar specified by Lemma 8.1 is G′ = <{S₀, S₁,
S₂, S₃, S₄}, {a, b, c}, S₀, {S₀ → aS₁ | bS₄ | cS₄, S₁ → aS₄ | bS₂ | cS₄, S₂ → aS₄ | bS₄ | cS₃, S₃ → λ | aS₄ | bS₄ | cS₄, S₄ → aS₄ | bS₄ | cS₄}>

Figure 9.9 (a) The automaton discussed in Example 9.9 (b) The simplified automaton discussed in Example 9.9 (c) The deterministic automaton discussed in Example 9.9 (d) The final automaton discussed in Example 9.9

The orderly nature of this resulting type of grammar easily admits the specification of an algorithm that scans a proposed terminal string and builds the corresponding parse tree. The partial parse tree for a string such as abb would be as pictured in Figure 9.10a. This would clearly be an invalid string, since S₄ cannot be replaced by λ. By contrast, the word abc would produce a complete parse tree, and it is instructive to step through the process by which it is built. The root of the tree must be labeled S₀, and scanning the first letter of the word abc is sufficient to determine that the first production to be applied is S₀ → aS₁ (since no other S₀-rule immediately produces an a). Scanning the next letter provides enough information to determine that the next rule used must be S₁ → bS₂, and the third letter admits the production S₂ → cS₃ and no other. Recognizing the end of the string causes a check of whether the current nonterminal can produce the empty string. Since S₃ → λ is in the grammar, the string abc is a valid terminal string and corresponds to the parse tree shown in Figure 9.10b.

Figure 9.10 (a) The partial parse tree for the string abb (b) The parse tree for the string abc

Grammars that admit scanning algorithms like the one outlined above are called LL0 grammars, since the parse tree can be deduced using a left-to-right scan of the proposed string while looking ahead 0 symbols to produce a leftmost derivation.
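Since exactly one production of G′ applies at each symbol, the scan just described amounts to running a DFA whose states are the nonterminals of G′; the dictionary encoding below is an illustrative assumption. A word yields a complete parse tree exactly when the scan ends on a nonterminal that has a λ-production.

```python
# Sketch of the LL0 scan of Example 9.9: one production per symbol.
DELTA = {("S0", "a"): "S1", ("S0", "b"): "S4", ("S0", "c"): "S4",
         ("S1", "a"): "S4", ("S1", "b"): "S2", ("S1", "c"): "S4",
         ("S2", "a"): "S4", ("S2", "b"): "S4", ("S2", "c"): "S3",
         ("S3", "a"): "S4", ("S3", "b"): "S4", ("S3", "c"): "S4",
         ("S4", "a"): "S4", ("S4", "b"): "S4", ("S4", "c"): "S4"}
HAS_LAMBDA = {"S3"}            # only S3 -> lambda is in the grammar

def scan(word):
    state = "S0"
    for ch in word:
        state = DELTA[(state, ch)]
    return state in HAS_LAMBDA  # can the parse tree be completed?
```

The scan accepts abc (ending on S₃) and rejects abb (ending on S₄, which cannot be replaced by λ), matching Figures 9.10a and 9.10b.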
That is, the production that produces a given symbol can be immediately determined without regard to the symbols that follow. Note that the grammar G3 = <{T}, {a}, T, {T → aaaT, T → aa}> is LL2; that is, upon seeing a, the scanner must look ahead two symbols to see whether the end-of-string marker is imminent. In this grammar, a may be produced by either of the two T-rules; the letters following this symbol in the proposed string are an important factor in determining which production must be applied. The language described by G3 is simple enough to be defined by a grammar that is LL0, since every regular grammar can be transformed as suggested by the proof of Theorem 9.2. The deterministic orderliness of LL0 grammars may be generally unattainable, but it represents a desirable goal that a compiler designer strives to approximate when specifying a grammatical model of a programming language. When a grammar is being defined to serve as a guide for constructing a compiler, an LL0 grammar is clearly the grammar of choice. Indeed, if even a portion of a context-free grammar conforms to the LL0 property, this is of considerable benefit. Whereas the technique outlined in Theorem 9.2 can be applied to any regular language to find a hospitable LL0 grammar, programming languages are generally more complex than regular languages, and they are unlikely to have LL0 models. For context-free languages, it is much more likely that it will not be possible to determine which production (or sequence of productions) will produce the symbol currently being scanned. In such cases, it is necessary to look ahead to successive symbols to make this determination. A classic example of the need to look ahead in parsing programming languages is reflected in the following FORTRAN statement:

DO77I=1.5

Since FORTRAN allows blanks within identifiers, this is a valid statement and should cause the variable DO77I to be assigned the value 1.5.
On the other hand, the statement

DO77I=1,5

specifies a "do" loop and has an entirely different meaning. A lexical analyzer that sees the three characters 'DO ' cannot immediately determine whether this represents a token for a do loop or is instead part of a variable identifier. It may have to wait until well after the equal sign is scanned to correctly identify the tokens.

9.3 CANONICAL FORMS

The definition of a context-free grammar was quite broad, and it is desirable to establish canonical forms that restrict the type of productions that can be employed. Unrestricted context-free grammars do not admit very precise relationships between the strings generated by the grammar and the production sequences generating those strings. In particular, the length of a terminal string may bear very little relation to the number of productions needed to generate that string.

EXAMPLE 9.10

A string of length 18 can be generated with only three applications of productions from the grammar

<{S}, {a, b, c}, S, {S → abcabcS, S → abcabc}>

A string of length 1 can be generated by no fewer than five productions in the grammar

<{S1, S2, S3, S4, S5}, {a, b, c}, S1, {S1 → S2, S2 → S3, S3 → S4, S4 → S5, S5 → a}>

It should be clear that even more extreme examples can be defined, in which the number of terminal symbols markedly dominates the number of productions, and vice versa. The pumping theorem for context-free grammars (Theorem 9.7) and other theorems hinge on a more precise relationship between the number of terminal symbols produced and the number of productions used to produce those symbols. Grammars whose production sets satisfy more rigorous constraints are needed if such relationships are to be guaranteed. The constraints should not be so severe that some context-free languages cannot be generated by a set of productions that conform to the restrictions. In other words, some well-behaved normal forms are sought. A practical step toward that goal is the abolition of productions that cannot participate in valid derivations.
The algorithm for identifying such productions constitutes an application of the algorithms developed previously for finite automata. The following definition formally identifies productions that cannot participate in valid derivations.

∇ Definition 9.7. A production A → β in a context-free grammar G = <Ω, Σ, S, P> is useful if it is part of a derivation beginning with the start symbol and ending with a terminal string. That is, the A-rule A → β is useful if there is a derivation S ⇒* αAω ⇒ αβω ⇒* x, where x ∈ Σ*. A production that is not useful is called useless. A nonterminal that does not appear in any useful production is called useless. A nonterminal that is not useless is called useful. Δ

EXAMPLE 9.11

Consider the grammar with productions

S → gAe, S → aYB, S → CY
A → bBY, A → ooC
B → dd, B → D
C → jVB, C → gi
D → n
U → kW
V → baXXX, V → oV
W → c
X → srV
Y → Yhm

This grammar illustrates the three basic ways a nonterminal can qualify as useless.

1. For the nonterminal W above, it is impossible to find a derivation from the start symbol S that produces a sentential form containing W. U also lacks this quality.
2. No derivation containing the nonterminal Y can produce a terminal string. X and V are likewise useless for the same reason.
3. B is produced only in conjunction with useless nonterminals, and it is therefore useless also. Once B is judged useless, D is seen to be useless for similar reasons.

∇ Theorem 9.3. Every nonempty context-free language L can be generated by a context-free grammar that contains no useless productions and no useless nonterminals.

Proof. Note that if L were empty the conclusion would be impossible to attain: the start symbol would be useless, and every grammar by definition must have a start symbol. Assume that L is a nonempty context-free language. By Definition 8.6, there is a context-free grammar G = <Ω, Σ, S, P> that generates L.
The desired grammar G^U can be formed from G by removing the useless productions from P and the useless nonterminals from Ω. The new grammar G^U is equivalent to G, since the lost items were by definition unable to participate in significant derivations. G^U will then obviously contain no useless productions and no useless nonterminals. A grammar with the desired properties must therefore exist, but the outlined argument does not indicate how to identify the items that must be removed. The following algorithm, based on the procedures used to investigate finite automata, shows how to effectively transform a context-free grammar G into an equivalent context-free grammar G^U with no useless items.

Several nondeterministic finite automata over the (unrelated) alphabet {1} will be considered, each identical except for the placement of the start state. The states of the NDFA correspond to nonterminals of the grammar, and one extra state, denoted by ω, is added to serve as the only final state. A transition from A to C will arise if a production in P allows A to be replaced by a string containing the nonterminal C. States corresponding to nonterminals that directly produce terminal strings will also have transitions to the sole final state ω. Formally, for the grammar G = <Ω, Σ, S, P> and any nonterminal B ∈ Ω, define the NDFA A_B = <{1}, Ω ∪ {ω}, B, δ, {ω}>, where δ is defined by δ(ω, 1) = ∅, and for each A ∈ Ω, let

δ(A, 1) = {C | C ∈ Ω ∧ (∃α, γ ∈ (Ω ∪ Σ)*)(A → αCγ ∈ P)} ∪ {ω}   if (∃α ∈ Σ*)(A → α ∈ P)

and

δ(A, 1) = {C | C ∈ Ω ∧ (∃α, γ ∈ (Ω ∪ Σ)*)(A → αCγ ∈ P)}   otherwise.

Note that, for any two nonterminals R and Q in Ω, A_R and A_Q are identical except for the specification of the start state. The previously presented algorithms for determining the set of connected states in an automaton can be applied to these new automata to identify the useless nonterminals.
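For readers who prefer a direct implementation, the connectivity checks that these automata perform can be reproduced with two fixed-point passes: first find the nonterminals that can derive a terminal string (nonterminals of the second type fail here), then find those reachable from S through rules mentioning no failed nonterminals (the first and third types fail here). The Python sketch below is illustrative only; its grammar encoding (uppercase letters as nonterminals, lowercase as terminals) and the spelling of the X-rule from Example 9.11 are assumptions of this sketch.

```python
# Identify the useful nonterminals of a CFG given as a dict
# {nonterminal: [right-hand-side string, ...]}.  Uppercase letters are
# nonterminals, lowercase letters are terminals (assumption of sketch).
def useful_nonterminals(productions, start):
    # Pass 1: nonterminals that can derive some terminal string.
    generating, changed = set(), True
    while changed:
        changed = False
        for lhs, rhss in productions.items():
            if lhs not in generating and any(
                    all(s.islower() or s in generating for s in rhs)
                    for rhs in rhss):
                generating.add(lhs)
                changed = True
    # Pass 2: of those, the ones reachable from the start symbol using
    # only rules whose nonterminals are all generating.
    reachable, frontier = set(), {start} & generating
    while frontier:
        a = frontier.pop()
        reachable.add(a)
        for rhs in productions[a]:
            if all(s.islower() or s in generating for s in rhs):
                frontier |= {s for s in rhs if s.isupper()} - reachable
    return reachable

# The grammar of Example 9.11 (single-letter names; X-rule as assumed).
G = {"S": ["gAe", "aYB", "CY"], "A": ["bBY", "ooC"],
     "B": ["dd", "D"], "C": ["jVB", "gi"], "D": ["n"],
     "U": ["kW"], "V": ["baXXX", "oV"], "W": ["c"],
     "X": ["srV"], "Y": ["Yhm"]}
```

On this grammar the function returns {S, A, C}, matching the pared-down grammar obtained in Example 9.12.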
As noted before, there are three basic ways a nonterminal can qualify as useless. The inaccessible states in the NDFA A_S correspond to nonterminals of the first type and can be eliminated from both the grammar and the automata. For each remaining nonterminal B, if the final state ω is not accessible in A_B, then B is a useless nonterminal of the second type and can be eliminated from further consideration in both the grammar and the automata. Checking for disconnected states in the pared-down version of A_S will identify useless nonterminals of the third type. The process can be repeated until no further disconnected states are found. Δ

EXAMPLE 9.12

Consider again the grammar introduced in Example 9.11. The structure of each of the automata is similar to that of A_S, shown in Figure 9.11a. Note that the disconnected states are indeed W and U, which can be eliminated from the state transition table. Checking the accessibility of ω in A_S, A_A, A_B, A_C, and A_D results in no changes, but V, X, and Y are eliminated when A_V, A_X, and A_Y are examined, resulting in the automaton displayed in Figure 9.11b. Eliminating the transitions associated with the corresponding useless productions yields the automaton shown in Figure 9.11c. Checking for disconnected states in this machine reveals the remaining inaccessible states. Thus, the equivalent grammar G^U with no useless nonterminals contains only the productions S → gAe, A → ooC, and C → gi. Note that the actual language accepted by the NDFA A_S is of no consequence; indeed, no finite automaton may be capable of accepting the context-free language in question. However, the above method illustrates that the tools developed for automata can be brought to bear in areas that do not directly apply to FAD languages. A more efficient algorithm for identifying useless nonterminals can be
found in [HOPC].

[Figure 9.11: (a) The automaton discussed in Example 9.12 (b) The simplified automaton discussed in Example 9.12 (c) The final automaton discussed in Example 9.12]

If computerized, such a tailored algorithm would consume less CPU time than the automata modules described above. In terms of the programming effort required, though, it is often more advantageous to adhere to the "toolbox approach" and adapt existing tools to new situations. Note that the algorithm developed in Theorem 9.3 relied on connectedness, and hence the specification of the final states was unimportant in this approach. With ω as the lone final state, some of the decision algorithms developed in Chapter 12 could have been used in place of the connectivity and accessibility checks.

Example 9.12 illustrates the simplification that can be attained by the elimination of useless productions. Further convenience is afforded by the elimination of nongenerative rules of the form A → B. Recall that in the grammar <{S1, S2, S3, S4, S5}, {a, b, c}, S1, {S1 → S2, S2 → S3, S3 → S4, S4 → S5, S5 → a}>, all the nonterminals were useful, but the production set was still needlessly complex.

∇ Definition 9.8. A production of the form A → B, where A, B ∈ Ω, is called a unit production or a nongenerative production. Δ

As with the elimination of useless nonterminals, unit productions can be removed with the help of automata constructs. The interested reader is referred to [DENN] for the constructive proof. The proof given below indicates the general algorithmic approach.

∇ Theorem 9.4. Every pure context-free language L can be generated by a pure context-free grammar that contains no useless nonterminals and no unit productions. Every context-free language L' can be generated by a context-free grammar that contains no useless nonterminals and no unit productions except perhaps the Z-rule Z → S, where Z is the new start symbol.

Proof.
If the first statement of the theorem is proved, the second will follow immediately from Definition 8.6. If L is a pure context-free language, then by Definition 8.5 there is a pure context-free grammar G = <Ω, Σ, S, P> that generates L. Divide the production set into P^u and P^n, the set of unit productions and the set of nonunit productions, respectively. For each nonterminal B found in P^u, find B^U = {C | B ⇒* C}, the unit closure of B. The derivations sought must all come from the (finite) set P^u, and there is clearly an algorithm that correctly calculates B^U. In fact, B^U is represented by the set of accessible states in a suitably defined automaton (see the exercises). Define a new grammar G' = <Ω, Σ, S, P'>, where P' = P^n ∪ {B → α | B is a nonterminal in P^u ∧ C ∈ B^U ∧ C → α ∈ P^n}. A straightforward induction argument shows that G' is equivalent to G, and G' contains no unit productions. Note that if G is pure, so is G'. G' is likely to contain useless nonterminals, even if all the productions in G were useful (see Example 9.13). However, the algorithm from Theorem 9.3 can now be applied to G' to eliminate useless nonterminals. Since that algorithm creates no new productions, the resulting grammar will still be free of unit productions. Δ

EXAMPLE 9.13

Consider again the pure context-free grammar

<{S1, S2, S3, S4, S5}, {a, b, c}, S1, {S1 → S2, S2 → S3, S3 → S4, S4 → S5, S5 → a}>

The production set is split into P^n = {S5 → a} and P^u = {S1 → S2, S2 → S3, S3 → S4, S4 → S5}. The unit-closure sets are

S1^U = {S1, S2, S3, S4, S5}
S2^U = {S2, S3, S4, S5}
S3^U = {S3, S4, S5}
S4^U = {S4, S5}
S5^U = {S5}

Since S5 → a and S5 ∈ S3^U, for example, the production S3 → a is added to P'. The full set of productions is P' = {S1 → a, S2 → a, S3 → a, S4 → a, S5 → a}. The elimination of useless nonterminals and productions results in the grammar <{S1}, {a, b, c}, S1, {S1 → a}>.
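The unit-closure computation of Theorem 9.4 is equally mechanical. In the Python sketch below (illustrative; one-character nonterminal names are an assumption of the sketch, with S1, …, S5 of Example 9.13 renamed A, …, E), a right-hand side consisting of a single nonterminal is a unit production, and each nonterminal inherits the non-unit rules of everything in its unit closure.

```python
# Remove unit productions by the unit-closure method of Theorem 9.4.
# Grammar format: {nonterminal: [rhs string, ...]}; a rhs that is
# itself a nonterminal name marks a unit production.
def remove_unit_productions(productions):
    nts = set(productions)

    def closure(b):  # unit closure B^U = {C | B =>* C via unit rules}
        seen, frontier = {b}, [b]
        while frontier:
            c = frontier.pop()
            for rhs in productions.get(c, []):
                if rhs in nts and rhs not in seen:
                    seen.add(rhs)
                    frontier.append(rhs)
        return seen

    # Each nonterminal keeps only the non-unit rules of its closure.
    return {a: sorted(rhs for c in closure(a)
                      for rhs in productions.get(c, [])
                      if rhs not in nts)
            for a in productions}

# Example 9.13 with S1..S5 renamed A..E.
G = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["E"], "E": ["a"]}
```

remove_unit_productions(G) gives each of A, …, E the single rule → a, after which the useless-nonterminal algorithm of Theorem 9.3 would pare the grammar down to the start symbol's rule alone, as in the example.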
EXAMPLE 9.14

Consider the context-free grammar with productions

Z → S, Z → λ
S → CBh, S → D
A → aaC
B → Sf, B → ggg
C → cA, C → d, C → C
D → E, D → SABC
E → be

The unit closures of each of the appropriate nonterminals and the new productions they imply are shown below. Note that Z → S is not considered and that the productions suggested by C → C are already present.

S → D implies S → SABC
S → E implies S → be
D → E implies D → be
C → C suggests C → cA, C → d (already present)

The new set of productions is therefore

Z → S, Z → λ
S → SABC, S → be, S → CBh
A → aaC
B → Sf, B → ggg
C → cA, C → d
D → be, D → SABC

Note that D is now useless and can be eliminated.

The assurance that every context-free grammar corresponds to an equivalent grammar with no unit productions is helpful in many situations. In particular, it is instrumental in the proof showing that the following restrictive type of grammar is indeed a canonical form for context-free languages.

∇ Definition 9.9. A pure context-free grammar G = <Ω, Σ, S, P> is in pure Chomsky normal form (PCNF) if P contains only productions of the form A → BC and A → d, where B and C are nonterminals and d ∈ Σ. A context-free grammar G = <Ω, Σ, Z, P> is in Chomsky normal form (CNF) if the Z-rules Z → S and Z → λ are the only allowable productions involving the start symbol Z, and all other productions are of the form A → BC and A → d, where B and C are nonterminals and d ∈ Σ. Δ

Thus, in PCNF the grammatical rules are limited to producing exactly two nonterminals or one terminal symbol. Few of the grammars discussed so far have met the restricted criteria required by Chomsky normal form. However, every context-free grammar can be transformed into an equivalent CNF grammar, as indicated in the following proof. The basic strategy is to add new nonterminals and replace undesired productions such as A → JKcb by a set of equivalent productions in the proper form, such as A → JY11, Y11 → KY12, Y12 → XcXb, Xc → c, Xb → b, where Y11, Y12, Xc, and Xb are new nonterminals.

∇ Theorem 9.5.
Every pure context-free language L can be generated by a pure Chomsky normal form grammar. Every context-free language L' can be generated by a Chomsky normal form grammar.

Proof. Again, if the first statement of the theorem is proved, the second will follow immediately from Definition 8.6. If L is a pure context-free language, then by Definition 8.5 there is a pure context-free grammar G = <Ω, Σ, S, P> that generates L. Theorem 9.4 shows that without loss of generality we may assume that P contains no unit productions. We construct a new grammar G' = <Ω, Σ, S, P'> in the following manner. Number the productions in P, and consider each production in turn. If the right side of the kth production consists of only a single symbol, then it must be a terminal symbol, since there are no unit productions. No modifications are necessary in this case, and the production is retained for use in the new set of productions P'. The same is true if the kth production consists of two symbols and both are nonterminals. If one or both of the symbols is a terminal, then the rule must be modified by replacing any terminal symbol a with a new nonterminal Xa. Whenever such a replacement is made, a production of the form Xa → a must also be included in the new set of productions P'. If the kth production is A → a1a2a3···an, where the number of (terminal and nonterminal) symbols is n > 2, then new nonterminals Yk1, Yk2, …, Yk(n−2) must be introduced and the rule must be replaced by the set of productions A → a1Yk1, Yk1 → a2Yk2, Yk2 → a3Yk3, …, Yk(n−2) → a(n−1)an. Again, if any ai is a terminal symbol such as a, it must be replaced as indicated earlier by the nonterminal Xa. Each new set of rules is clearly capable of producing the same effect as the rule that was replaced. Each nonterminal Yki is used in only one such replacement set to ensure that the new rules do not combine in unexpected new ways.
Tedious but straightforward inductive proofs will justify that L(G) = L(G'). Δ

EXAMPLE 9.15

The grammar discussed in Example 9.14 can be transformed into CNF by the algorithm given in Theorem 9.5. After elimination of the unit productions and the consequent useless productions, the productions (suitably numbered) that must be examined are

1. S → SABC
2. S → be
3. S → CBh
4. A → aaC
5. B → Sf
6. B → ggg
7. C → cA
8. C → d

In the corresponding lists given below, notice that only production 8 is retained; the others are replaced by

S → SY11, Y11 → AY12, Y12 → BC
S → XbXe
S → CY31, Y31 → BXh
A → XaY41, Y41 → XaC
B → SXf
B → XgY61, Y61 → XgXg
C → XcA
C → d

and the terminal productions Xb → b, Xe → e, Xh → h, Xa → a, Xf → f, Xg → g. Since d did not appear as part of a two-symbol production, the rule Xd → d was not needed. The above rules, with S as the start symbol, form a pure Chomsky normal form grammar. The new start symbol Z and the productions Z → S and Z → λ would be added to this pure context-free grammar to obtain the required CNF.

Grammars in Chomsky normal form allow an exact correspondence to be made between the length of a terminal string and the length of the derivation sequence that produces that string. If the empty string can be derived, the production sequence consists of exactly one rule application (Z → λ). A simple inductive argument shows that, if a string of length n > 0 can be derived, the derivation sequence must contain exactly 2n steps. In the grammar derived in Example 9.15, for example, the following terminal string of length 5 is generated in exactly ten productions:

Z ⇒ S ⇒ CY31 ⇒ dY31 ⇒ dBXh ⇒ dSXfXh ⇒ dXbXeXfXh ⇒ dbXeXfXh ⇒ dbeXfXh ⇒ dbefXh ⇒ dbefh

Other useful properties are also assured for grammars in Chomsky normal form. When a grammar is in CNF, all parse trees can be represented by binary trees, and upper and lower bounds on the depth of a parse tree for a string of length n can be found (see the exercises).
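The replacement scheme of Theorem 9.5 can itself be sketched in Python. The sketch below is an illustration under simplifying assumptions: nonterminals are uppercase letters, terminals are lowercase, unit and lambda rules have already been removed, and the fresh nonterminals are numbered sequentially (Y1, Y2, …) rather than with the book's double subscripts Yki.

```python
# Convert a production set (free of unit and lambda rules) toward
# Chomsky normal form, following the scheme of Theorem 9.5.
def to_cnf(productions):
    new, fresh = {}, [0]

    def term(sym):  # replace terminal a by nonterminal Xa, adding Xa -> a
        if sym.islower():
            xa = "X" + sym
            new.setdefault(xa, set()).add(sym)
            return xa
        return sym

    def add(lhs, rhs):
        new.setdefault(lhs, set()).add(rhs)

    for a, rhss in productions.items():
        for rhs in rhss:
            syms = list(rhs)
            if len(syms) == 1:            # lone symbol: a terminal, keep it
                add(a, rhs)
            elif len(syms) == 2:
                add(a, term(syms[0]) + term(syms[1]))
            else:                         # long rule: build a chain of Yk's
                head = a
                for s in syms[:-2]:
                    fresh[0] += 1
                    y = "Y%d" % fresh[0]
                    add(head, term(s) + y)
                    head = y
                add(head, term(syms[-2]) + term(syms[-1]))
    return new
```

Applied to production 4 of Example 9.15, A → aaC, it produces the chain A → XaY1, Y1 → XaC together with Xa → a, matching the book's A → XaY41, Y41 → XaC up to the numbering of the new nonterminals.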
The derivational relationship between the number of production steps used and the number of terminals produced implies that CNF grammars generate an average of one terminal every two productions. The following canonical form requires every production to contain at least one terminal symbol, and grammars in this form must produce strings of length n (> 0) in no more than n steps.

∇ Definition 9.10. A pure context-free grammar G = <Ω, Σ, S, P> is in pure Greibach normal form (PGNF) if P contains only productions of the form A → dα, where α ∈ (Ω ∪ Σ)* and d ∈ Σ. A context-free grammar G = <Ω, Σ, Z, P> is in Greibach normal form (GNF) if the Z-rules Z → S and Z → λ are the only allowable productions involving the start symbol Z, and all other productions are of the form A → dα, where α ∈ (Ω ∪ Σ)* and d ∈ Σ. Δ

In pure Greibach normal form, the grammatical rules are limited to producing at least one terminal symbol as the first symbol. The original grammar in Example 9.9 is a PGNF grammar, but few of the other grammars presented in this chapter meet the seemingly mild restrictions required for Greibach normal form. The main obstacle to obtaining a GNF grammar is the possible presence of left recursion. A nonterminal A is called left recursive if there is a sequence of one or more productions for which A ⇒+ Aβ for some string β. Greibach normal form disallows such occurrences, since no production may produce a string starting with a nonterminal.

Replacing productions involved with left recursion is complex, but every context-free grammar can be transformed into an equivalent GNF grammar, as shown by Theorem 9.6. Two techniques will be needed to transform the productions into the appropriate form, and the following lemmas ensure that the grammatical transformations leave the language unchanged.
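As a preview of the second of these techniques, the replacement performed by Lemma 9.2 (trading the immediately left-recursive X-rules for a new right-recursive nonterminal Y) can be sketched in a few lines of Python. This is an illustration only; right-hand sides are modeled as tuples of symbols, a representation assumed for this sketch.

```python
# Remove immediate left recursion from the X-rules, as in Lemma 9.2:
# recursive rules X -> X a1 | ... | X am and nonrecursive rules
# X -> g1 | ... | gn are replaced using a new nonterminal y.
def remove_left_recursion(x_rules, x, y):
    """x_rules: list of right-hand sides, each a tuple of symbols."""
    alphas = [rhs[1:] for rhs in x_rules if rhs[:1] == (x,)]  # the ai
    gammas = [rhs for rhs in x_rules if rhs[:1] != (x,)]      # the gi
    new_x = gammas + [g + (y,) for g in gammas]   # X -> gi | gi Y
    new_y = alphas + [a + (y,) for a in alphas]   # Y -> ai | ai Y
    return new_x, new_y
```

Feeding it the S1-rules reached midway through Example 9.16 (S1 → S1S2c, S1 → S1S1ebS3, S1 → debS3) with the new nonterminal S4 reproduces the rule sets S1 → debS3 | debS3S4 and S4 → S2c | S1ebS3 | S2cS4 | S1ebS3S4 obtained there.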
The first indicates how to remove an X-rule that begins with an undesired nonterminal; Lemma 9.1 specifies a new set of productions that compensate for the loss.

∇ Lemma 9.1. Let G = <Ω, Σ, S, P> be a context-free grammar, and assume there is a string α and nonterminals X and B for which X → Bα ∈ P. Further assume that the set of all B-rules is given by {B → β1, B → β2, …, B → βm}, and let G' = <Ω, Σ, S, P'>, where P' = P ∪ {X → β1α, X → β2α, …, X → βmα} − {X → Bα}. Then L(G) = L(G').

Proof. Let each nonterminal A be associated with the set X_A of sentential forms that A can produce. That is, let X_A = {x ∈ (Σ ∪ Ω)* | A ⇒* x}. The nonterminals then denote variables in a set of language equations that reflect the productions in P. These equations will generally not be linear; several variables may be concatenated together within a single term. Since the set of all B-rules is B → β1, B → β2, …, B → βm, X_B satisfies the equation

X_B = β1 ∪ β2 ∪ … ∪ βm

Similarly, if the X-rules other than X → Bα are X → γ1, X → γ2, …, X → γn, then X_X satisfies the equation

X_X = γ1 ∪ γ2 ∪ … ∪ γn ∪ X_B α

Substituting for X_B in the X_X equation yields

X_X = γ1 ∪ γ2 ∪ … ∪ γn ∪ (β1 ∪ β2 ∪ … ∪ βm)α

which by the distributive law becomes

X_X = γ1 ∪ γ2 ∪ … ∪ γn ∪ β1α ∪ β2α ∪ … ∪ βmα

This shows why the productions X → β1α, X → β2α, …, X → βmα can replace the rule X → Bα. Δ

The type of replacement justified by Lemma 9.1 will not eliminate left recursion. The following lemma indicates a way to remove all the left-recursive X-rules by introducing a new right-recursive nonterminal.

∇ Lemma 9.2. Let G = <Ω, Σ, S, P> be a context-free grammar, and choose a nonterminal X ∈ Ω. Denote the set of all recursive X-rules by X^r = {X → Xα1, X → Xα2, …, X → Xαm} and the set of all nonrecursive X-rules by X^n = {X → γ1, X → γ2, …, X → γn}. Choose a new nonterminal Y ∉ Ω, and let G'' = <Ω ∪ {Y}, Σ, S, P''>, where P'' = P ∪ {X → γ1Y, X → γ2Y, …, X → γnY} ∪ {Y → α1, Y → α2, …
, Y → αm} ∪ {Y → α1Y, Y → α2Y, …, Y → αmY} − X^r. Then L(G) = L(G'').

Proof. As in Lemma 9.1, let each nonterminal A be associated with the set X_A of sentential forms that A can produce, and consider the set of language equations generated by P. The X_X equation is

X_X = γ1 ∪ γ2 ∪ … ∪ γn ∪ X_X α1 ∪ X_X α2 ∪ … ∪ X_X αm

Solving by the method indicated in Theorem 6.4c for an equivalent expression for X_X shows that

X_X = (γ1 ∪ γ2 ∪ … ∪ γn)(α1 ∪ α2 ∪ … ∪ αm)*

In the new set of productions P'', the equations of interest are

X_X = γ1 ∪ γ2 ∪ … ∪ γn ∪ γ1X_Y ∪ γ2X_Y ∪ … ∪ γnX_Y
X_Y = α1 ∪ α2 ∪ … ∪ αm ∪ α1X_Y ∪ α2X_Y ∪ … ∪ αmX_Y

Factoring each equation produces

X_X = (γ1 ∪ γ2 ∪ … ∪ γn) ∪ (γ1 ∪ γ2 ∪ … ∪ γn)X_Y
X_Y = (α1 ∪ α2 ∪ … ∪ αm) ∪ (α1 ∪ α2 ∪ … ∪ αm)X_Y

and the second can be solved for an equivalent expression for X_Y, yielding

X_Y = (α1 ∪ α2 ∪ … ∪ αm)*(α1 ∪ α2 ∪ … ∪ αm)

Substituting this expression for X_Y in the X_X equation produces

X_X = (γ1 ∪ … ∪ γn) ∪ (γ1 ∪ … ∪ γn)(α1 ∪ … ∪ αm)*(α1 ∪ … ∪ αm)

which by the distributive law becomes

X_X = (γ1 ∪ … ∪ γn)(λ ∪ (α1 ∪ … ∪ αm)*(α1 ∪ … ∪ αm))

Using the fact that λ ∪ B*B = B*, this simplifies to

X_X = (γ1 ∪ … ∪ γn)(α1 ∪ … ∪ αm)*

Therefore, X_X produces exactly the same strings as before. This indicates why the productions in the sets {X → γ1Y, …, X → γnY} ∪ {Y → α1, …, Y → αm} ∪ {Y → α1Y, …, Y → αmY} can replace the recursive X-rules X → Xα1, X → Xα2, …, X → Xαm. Δ

Note that the new production set eliminates all recursive X-rules and does not introduce any new recursive productions. The techniques discussed in Lemmas 9.1 and 9.2, when applied in the proper order, will transform any context-free grammar into one that is in Greibach normal form. The appropriate sequence is given in the next theorem.

∇ Theorem 9.6. Every pure context-free language L can be generated by a pure Greibach normal form grammar.
Every context-free language L' can be generated by a Greibach normal form grammar.

Proof. Because of Definition 8.6, the second statement will follow immediately from the first. If L is a pure context-free language, then by Definition 8.5 there is a pure context-free grammar G = <{S1, S2, …, Sr}, Σ, S1, P> that generates L. We construct a new grammar by applying the transformations discussed in the previous lemmas.

Phase 1: The replacements suggested by Lemmas 9.1 and 9.2 will be used to ensure that the increasing condition is met: if Si → Sjα belongs to the new grammar, then i > j. We transform the Sk-rules for k = r, r−1, …, 2, 1 (in that order), considering the productions for each nonterminal in turn. At the end of the ith iteration, the top i nonterminals will conform to the increasing condition. After the final step, all nonterminals (including any newly introduced ones) will conform, all left recursion will have been eliminated, and we can proceed to phase 2. The procedure for the ith iteration is: If an Si-rule of the form Si → Sjα is found where i < j, eliminate it as specified in Lemma 9.1. This may introduce other rules of the form Si → Sj'α', in which i is still less than j'. Such new rules will likewise have to be eliminated via Lemma 9.1, but since the offending subscript decreases each time, this process will eventually terminate. Si-rules of the form Si → Sjα where i = j can then be eliminated according to Lemma 9.2. This will introduce some new nonterminals, which can be given new, higher-numbered subscripts. Lemma 9.2 is designed so that the new rules automatically satisfy the increasing condition specified earlier. The remaining Si-rules must then conform to the increasing condition. The process continues with lower-numbered rules until all the rules in the new production set conform to the increasing condition.
Phase 2: At this point, S1 conforms to the increasing condition, and since there are no nonterminals with subscripts less than 1, all the S1-rules must begin with terminal symbols, as required by GNF. The only S2-rules that may not conform to GNF are those of the form S2 → S1α, and Lemma 9.1 can eliminate such rules by replacing them with the S1-rules. Since all the S1-rules now begin with terminal symbols, all the new S2-rules will have the same property. This process is applied to the Sk-rules for increasing k until the entire production set conforms to GNF. The resulting context-free grammar is in GNF, and since all modifications were of the type allowed by Lemmas 9.1 and 9.2, the new grammar is equivalent to the original. Δ

EXAMPLE 9.16

Consider the pure context-free grammar

<{S1, S2, S3}, {a, b, c, d, e}, S1, {S1 → S1S2c, S1 → S3bS3, S2 → S1S1, S2 → d, S3 → S2e}>

If the given subscript ordering is not the most convenient, the nonterminals can be renumbered. The current ordering minimizes the number of transformations needed to produce Greibach normal form, since the only production that does not conform to the increasing condition is S1 → S3bS3. Thus, the first and second steps of phase 1 are trivially completed; no substitutions are necessary. In the third step, Lemma 9.1 allows the offending production S1 → S3bS3 to be replaced by S1 → S2ebS3. The new production produces the smaller-subscripted nonterminal S2, but the new rule still does not satisfy the increasing condition. Replacing S1 → S2ebS3 as indicated by Lemma 9.1 yields the two productions S1 → S1S1ebS3 and S1 → debS3. At this point, the grammar contains the productions

S1 → S1S2c, S1 → S1S1ebS3, S1 → debS3, S2 → S1S1, S2 → d, S3 → S2e

The first nonterminal has left-recursive rules that must be eliminated by introducing the new nonterminal S4. In the notation of Lemma 9.2, n = 1, m = 2, γ1 = debS3, α1 = S2c, and α2 = S1ebS3.
Eliminating S1 → S1S2c and S1 → S1S1ebS3 introduces the new nonterminal Y = S4 and the productions

S1 → debS3S4, S4 → S2c, S4 → S1ebS3, S4 → S2cS4, S4 → S1ebS3S4

Phase 1 is now complete. All left recursion has been eliminated, and the grammar now contains the productions

S1 → debS3S4, S1 → debS3
S2 → S1S1, S2 → d
S3 → S2e
S4 → S2c, S4 → S1ebS3, S4 → S2cS4, S4 → S1ebS3S4

all of which satisfy the increasing condition. The grammar is now set up for phase 2, in which the substitutions specified by Lemma 9.1 ensure that every rule begins with a terminal. The S1-rules are in acceptable form, as is the S2-rule S2 → d. The other S2-rule, S2 → S1S1, is replaced via Lemma 9.1 with S2 → debS3S4S1 and S2 → debS3S1. Replacement of the S3-rule then yields S3 → debS3S4S1e, S3 → debS3S1e, and S3 → de. The S4-rules are treated similarly. The final set of productions at the completion of phase 2 contains

S1 → debS3S4, S1 → debS3
S2 → debS3S4S1, S2 → debS3S1, S2 → d
S3 → debS3S4S1e, S3 → debS3S1e, S3 → de
S4 → dc, S4 → debS3S4S1c, S4 → debS3S1c, S4 → debS3S4ebS3, S4 → debS3ebS3, S4 → debS3S4S1cS4, S4 → dcS4, S4 → debS3S1cS4, S4 → debS3S4ebS3S4, S4 → debS3ebS3S4

In this grammar, S2 is now useless and can be eliminated.

Greibach normal form is sometimes considered to require all productions to be of the form A → dα, where α ∈ Ω* and d ∈ Σ. Such rules must produce exactly one leading terminal symbol; the rest of the string must be exclusively nonterminals. It should be clear that this extra restriction can always be enforced by a technique similar to the one employed for Chomsky normal form. The above conversion process would be extended to phase 3, in which unwanted terminals such as e are replaced by a new nonterminal Xe, and new productions such as Xe → e are introduced. For the grammar in Example 9.16, the first production might then look like S1 → dXeXbS3S4.

9.4 PUMPING THEOREM

As was the case with type 3 languages, some languages are too complex to be defined by a context-free grammar.
To prove a language L is context-free, one need only define a grammar that generates L. By contrast, to prove L is not context-free, one must effectively argue that no context-free grammar can possibly generate L. The pumping lemma for deterministic finite automata (Theorem 2.3) showed that the repetition of patterns within strings accepted by a DFA was a consequence of the finite nature of the description. The finiteness of grammatical descriptions likewise implies a pumping theorem for languages represented by context-free grammars. The proof is greatly simplified by the properties implied by the existence of canonical forms for context-free grammars.

∇ Theorem 9.7. Let L be a context-free language over Σ*. Then

(∃n ∈ N)(∀z ∈ L with |z| ≥ n)(∃u, v, w, x, y ∈ Σ*) such that z = uvwxy, |vwx| ≤ n, |vx| ≥ 1, and (∀i ∈ N)(uv^i wx^i y ∈ L)

Proof. Given a context-free language L, there must exist a PCNF grammar G = <Ω, Σ, S, P> generating L − {λ}. Let k = ||Ω||. The parse tree generated by this PCNF grammar for any word z ∈ L is a binary tree with each (terminal) symbol in z corresponding to a distinct leaf in the tree. Let n = 2^(k+1). Choose a string z generated by G of length at least n (if there are no strings in L that are this long, then the theorem is vacuously true, and we are done). The binary parse tree for any such string z must have depth at least k + 1, which implies the existence of a path involving at least k + 2 nodes, beginning at the root and terminating with a leaf. The labels on the k + 1 interior nodes along the path must all be nonterminals, and since ||Ω|| = k, they cannot all be distinct. Indeed, the repetition must occur within the "bottom" k + 1 interior nodes along the path. Call the repeated label R (see Figure 9.12), and note that there must exist a derivation for the parse tree that looks like

S ⇒* uRy ⇒* uvRxy ⇒* uvwxy

where u, v, w, x, and y are all terminal strings and z = uvwxy.
That is, the productions in P allow the derivations R ⇒* vRx and R ⇒* w. Since S ⇒* uRy and R ⇒* w, S ⇒* uwy is a valid derivation, and uwy is therefore a word in L. Similarly, S ⇒* uRy ⇒* uvRxy ⇒* uvvRxxy ⇒* uvvwxxy, and so u v^2 w x^2 y ∈ L. Induction shows that each of the strings u v^i w x^i y belongs to L for i = 0, 1, 2, .... If both v and x were empty, these strings would not be distinct words in L. This case cannot arise, as shown next, and thus the existence of z implies that there is an infinite sequence of strings that must belong to L.

[Figure 9.12: The parse tree discussed in the proof of Theorem 9.7; the subtree rooted at the upper occurrence of R has height at most k + 1 and derives vwx.]

The two occurrences of R were in distinct places in the parse tree, and hence at least one production was applied in deriving uvRxy from uRy. Since the PCNF grammar G contains neither contracting productions nor unit productions, the sentential form uvRxy must be of greater length than uRy, and hence |v| + |x| > 0. Furthermore, the subtree rooted at the higher occurrence of R was of height k + 1 or less, and hence accounts for no more than 2^(k+1) (= n) terminals. Thus, |vwx| ≤ n. All the criteria described in the pumping theorem are therefore met. Since a context-free language must be generated by a CNF grammar with a finite number of nonterminals, there must exist a constant n (such as n = 2^(‖Ω‖+1)) for which the existence of a string of length at least n implies the existence of an infinite sequence of distinct strings that must all belong to L, as stated in the theorem. Δ

As with the pumping lemma, the pumping theorem is usually applied to justify that certain languages are complex (by proving that the language does not satisfy the pumping theorem and is thus not context free). Such proofs naturally employ the contrapositive of Theorem 9.7, which is stated next.

∇ Theorem 9.8. Let L be a language over Σ*. If

(∀n ∈ ℕ)(∃z ∈ L with |z| ≥ n) such that (∀u, v, w, x, y ∈ Σ* with z = uvwxy, |vwx| ≤ n, |vx| ≥ 1)(∃i ∈ ℕ such that u v^i w x^i y ∉ L)

then L is not context free.

Proof.
See the exercises. Δ

Examples 8.5 and 9.17 show that there are context-sensitive languages that are not context free.

EXAMPLE 9.17

The language L = {a^k b^k c^k | k ∈ ℕ} is not a context-free language. Let n be given, and choose z = a^n b^n c^n. Then z ∈ L and |z| = 3n ≥ n. If L were context free, there must be choices for u, v, w, x, and y satisfying the pumping theorem. Every possible choice of these strings leads to a contradiction, and hence L cannot be context free. A sampling of the various cases is outlined below. If the strings v and x contain only one type of letter (for example, c), then u v^2 w x^2 y will contain more cs than as or bs, and thus u v^2 w x^2 y ∉ L. If v were, say, all bs and x were all cs, then u v^2 w x^2 y would contain too few as and would again not be a member of L. If v were to contain two types of letters, such as v = aabb, then u v^2 w x^2 y = uvvwxxy = u·aabbaabb·wxxy, which represents a string that has some bs preceding some as, and again u v^2 w x^2 y ∉ L. All other cases are similar to these, and they collectively imply that L is not a context-free language.

Example 9.17 illustrates one major inconvenience of the pumping theorem: the inability to specify which portion of the string is to be "pumped." With the pumping lemma in Chapter 2, variants were explored that allowed the first n letters to be pumped or the last n letters to be pumped. Indeed, any n consecutive letters in a word from an FAD language can be pumped. For context-free languages, such precision is more elusive. The uncertainty as to where the vwx portion of the string lay in Example 9.17 led to many subcases, since all combinations of u, v, w, x, and y had to be shown to lead to contradictions. The following result, a variant of Ogden's lemma, allows some choice in the placement of the portion of the string to be pumped in a "long" word from a context-free language.

∇ Theorem 9.9. Let L be a context-free language over Σ*.
Then

(∃n ∈ ℕ)(∀z ∈ L with |z| ≥ n and any n or more positions of z marked as distinguished)(∃u, v, w, x, y ∈ Σ*) such that z = uvwxy, vwx contains no more than n distinguished positions, vx contains at least one distinguished position, w contains at least one distinguished position, and (∀i ∈ ℕ)(u v^i w x^i y ∈ L)

Proof. Given a context-free language L, there must exist a PCNF grammar G = <Ω, Σ, S, P> generating L − {λ}. Let n = 2^(‖Ω‖+1). The proof is similar to that given for the pumping theorem (Theorem 9.7); the method for choosing the path now depends on the placement of the distinguished positions. A suitable path is constructed by beginning at the root of the binary parse tree and, at each level, descending to the right or left to lengthen the path. The decision to go right or left is determined by observing the number of distinguished positions generated in the right subtree and the number of distinguished positions generated in the left subtree. The path should descend into the subtree that has the larger number of distinguished positions; ties can be broken arbitrarily. The resulting path will terminate at a leaf corresponding to a distinguished position, will be of sufficient length to guarantee a repeated label R within the bottom ‖Ω‖ + 1 interior nodes, and so on. The conclusions now follow in much the same manner as those given in the pumping theorem. Δ

EXAMPLE 9.18

For the language L = {a^k b^k c^k | k ∈ ℕ} investigated in Example 9.17, Ogden's lemma could be applied with the first n letters of a^n b^n c^n as the distinguished positions. Since w must have at least one distinguished letter (that is, an a), and u and v must precede w, the u and v portions of the string would then be required to be all as. This greatly reduces the number of cases that must be considered. Note that more than n letters can be chosen as the distinguished positions, and they need not be consecutive.
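The case analysis of Example 9.17 can also be confirmed by exhaustion: for z = a^n b^n c^n, every decomposition z = uvwxy meeting the side conditions of Theorem 9.7 fails for some i. The brute-force sketch below (the helper names are illustrative, and n is fixed at a small value; checking i = 0 and i = 2 is enough to refute each split) mirrors that argument:

```python
# Example 9.17 by exhaustion: no decomposition of z = a^n b^n c^n with
# |vwx| <= n and |vx| >= 1 keeps every pumped string u v^i w x^i y in L.

def in_L(s):
    """Membership in {a^k b^k c^k | k in N}."""
    k = len(s) // 3
    return s == 'a' * k + 'b' * k + 'c' * k

def some_split_pumps(z, n):
    for i in range(len(z)):                              # start of vwx
        for j in range(i + 1, min(i + n, len(z)) + 1):   # end of vwx, |vwx| <= n
            for a in range(i, j + 1):                    # v = z[i:a]
                for b in range(a, j + 1):                # w = z[a:b], x = z[b:j]
                    u, v, w, x, y = z[:i], z[i:a], z[a:b], z[b:j], z[j:]
                    if v + x and all(in_L(u + v * k + w + x * k + y)
                                     for k in (0, 2)):
                        return True                      # this split pumps
    return False

n = 4
z = 'a' * n + 'b' * n + 'c' * n
assert not some_split_pumps(z, n)   # no split survives, as the text argues
```

Because |vwx| ≤ n confines v and x to at most two of the three letter blocks, no split can add as, bs, and cs in equal numbers, which is exactly the contradiction derived case by case in the text.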
9.5 CLOSURE PROPERTIES

Recall that 𝒞Σ represents the class of context-free languages over Σ. The applications of the pumping theorem show that not every language is context free. The ability to show that specific languages are not context free makes it feasible to decide which language operators preserve context-free languages. The context-free languages are closed under most of the operators considered in Chapter 5; the major exceptions are complement and intersection. We begin with a definition of substitution for context-free languages.

∇ Definition 9.11. Let Σ = {a1, a2, ..., am} be an alphabet and let Γ be a second alphabet. Given context-free languages L1, L2, ..., Lm over Γ, define a substitution s: Σ → ℘(Γ*) by s(ai) = Li for each i = 1, 2, ..., m, which can be extended to s: Σ* → ℘(Γ*) by s(λ) = {λ} and

(∀a ∈ Σ)(∀x ∈ Σ*)(s(a·x) = s(a)·s(x))

s can be further extended to operate on a language L ⊆ Σ* by defining s: ℘(Σ*) → ℘(Γ*), where

s(L) = ⋃_{z ∈ L} s(z)    Δ

A substitution is similar to a language homomorphism (Definition 5.8), in which letters were replaced by single words, and to the regular set substitution given by Definition 6.5. For context-free languages, substitution denotes the consistent replacement of the individual letters within each word of a context-free language with sets of words. Each such set of words must also be a context-free language, although not necessarily over the original alphabet.

EXAMPLE 9.19

Let L = L(Gt), where

Gt = <{T}, {a, b, c, d, −}, T, {T → a | b | c | d | T−T}>

Let L1 denote the set of all valid FORTRAN identifiers. Let L2 denote the set of all strings denoting integer constants. Let L3 denote the set of all strings denoting real constants. Let L4 denote the set of all strings denoting double-precision constants. If the substitution s were defined by s(a) = L1, s(b) = L2, s(c) = L3, s(d) = L4, then
s(L) would represent the set of all unparenthesized FORTRAN expressions involving only the subtraction operator. In this example, s(L) is a language over a significant portion of the ASCII alphabet, whereas the original alphabet consisted of only five symbols. The result is still context free, and this can be proved for all substitutions of context-free languages into context-free languages.

In Example 9.19, the languages L1 through L4 were not only context free but were in fact regular. There are clearly context-free grammars defining each of them, and it should be obvious how to modify Gt to produce a grammar that generates s(L). If G1 = <Ω1, Γ1, S1, P1> is a grammar generating L1, for example, then occurrences of a in the productions of Gt should simply be replaced by the start symbol S1 of G1, and the productions of P1 added to the new grammar that will generate s(L). This is essentially the technique used to justify Theorem 9.10. The closure theorem is stated for substitutions that do not modify the terminal alphabet, but it is also true in general, as a trivial modification of the following proof would show.

∇ Theorem 9.10. Let Σ be an alphabet, and let s: Σ → ℘(Σ*) be a substitution into context-free languages. Then 𝒞Σ is closed under s.

Proof. Let Σ = {a1, a2, ..., am}. If L is a context-free language, then there is a context-free grammar G = <Ω, Σ, S, P> that generates L. If s is a substitution satisfying Definition 9.11, then for each letter ak ∈ Σ there is a corresponding grammar Gk = <Ωk, Σ, Sk, Pk> for which L(Gk) = s(ak). Since nonterminals can be freely renamed, we may assume that Ω, Ω1, Ω2, ..., Ωm have no common symbols. s(L) will be generated by the context-free grammar G' = <Ω ∪ Ω1 ∪ Ω2 ∪ ··· ∪ Ωm, Σ, S, P' ∪ P1 ∪ P2 ∪ ··· ∪ Pm>, where P' consists of the rules of P with each appearance of ak replaced by Sk.
From the start symbol S, the rules of P' can be used as they were in the original grammar G, producing strings with the start symbol of the kth grammar where the kth terminal symbol would be. Since the nonterminal sets were assumed to be pairwise disjoint, only the rules in Pk can be used to expand Sk, resulting in the desired terminal strings from s(ak). It follows that L(G') = s(L), and thus s(L) is context free. Δ

∇ Theorem 9.11. Let Σ be an alphabet, and let ψ: Σ → Σ* be a homomorphism. Then 𝒞Σ is closed under ψ.

Proof. Languages that consist of single words are obviously context free. Hence Theorem 9.10 applies when single words are substituted for letters. Since homomorphisms are therefore special types of substitutions, 𝒞Σ is closed under homomorphism. Δ

Many of the other closure properties of the collection of context-free languages follow immediately from the result for substitution. Closure under union could be proved by essentially the same method presented in Theorem 8.4. An alternate proof, based on Theorem 9.10, is given next.

∇ Theorem 9.12. Let Σ be an alphabet, and let L1 and L2 be context-free languages over Σ. Then L1 ∪ L2 is context free. Thus, 𝒞Σ is closed under union.

Proof. Assume L1 and L2 are context-free languages over Σ. The grammar

U = <{S}, {a, b}, S, {S → a, S → b}>

clearly generates the context-free language {a, b}. The substitution defined by s(a) = L1 and s(b) = L2 gives rise to the language s({a, b}), which obviously equals L1 ∪ L2. By Theorem 9.10, this language must be context free. Δ

A similar technique can be used for concatenation and Kleene closure. It is relatively easy to directly construct appropriate new grammars that combine the generative powers of the original grammars. The exercises explore constructions that prove these closure properties without relying on Theorem 9.10.

∇ Theorem 9.13. Let Σ be an alphabet, and let L1 and L2 be context-free languages over Σ. Then L1·L2 is context free.
Thus, 𝒞Σ is closed under concatenation.

Proof. Let L1 and L2 be context-free languages over Σ. The pure context-free grammar

C = <{S}, {a, b}, S, {S → ab}>

generates the language {ab}. The substitution defined by s(a) = L1 and s(b) = L2 gives rise to the language s({ab}) = L1·L2. By Theorem 9.10, L1·L2 must therefore be context free. Δ

Closure under Kleene closure could be justified by Theorem 9.10 in a similar manner, since the context-free grammar

K = <{Z, S}, {a, b}, Z, {Z → λ, Z → S, S → aS, S → a}>

generates the language a*. The substitution defined by s(a) = L1 gives rise to the language s(a*), which is L1*, and so L1* is also context free. The proof of Theorem 9.14 instead illustrates how to modify an existing grammar.

∇ Theorem 9.14. Let Σ be an alphabet, and let L1 be a context-free language over Σ. Then L1* is context free. Thus, 𝒞Σ is closed under Kleene closure.

Proof. If L1 is a context-free language, then there is a pure context-free grammar G1 = <Ω1, Σ, S1, P1> that generates L1 − {λ}. Choose nonterminals Z' and S' such that Z' ∉ Ω1 and S' ∉ Ω1, and define a new grammar

G' = <Ω1 ∪ {S', Z'}, Σ, Z', P1 ∪ {Z' → λ, Z' → S', S' → S'S1, S' → S1}>

A straightforward induction shows that L(G') = L(G1)*. Δ

Thus, many of the closure properties of the familiar operators investigated in Chapter 5 for regular languages carry over to the class of context-free languages. Closure under intersection does not extend, as the next result shows.

∇ Lemma 9.3. 𝒞{a,b,c} is not closed under intersection.

Proof. The languages L1 = {a^i b^i c^j | i, j ∈ ℕ} and L2 = {a^n b^m c^m | n, m ∈ ℕ} are context free (see the exercises), and yet L1 ∩ L2 = {a^k b^k c^k | k ∈ ℕ} was shown in Example 9.17 to be a language that is not context free. Hence 𝒞{a,b,c} is not closed under intersection. Δ

The exercises show that 𝒞Σ is not closed under intersection for any alphabet Σ with two or more letters.
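The grammar combinations behind Theorems 9.12 and 9.13 can be tested concretely on small grammars. The sketch below is illustrative, not from the text: grammars are dicts of nonterminal to right-hand sides, the combined grammars are built with a fresh start symbol Z, and the enumerator assumes no lambda productions (so a sentential form never shrinks and a simple length bound prunes the search):

```python
from collections import deque

def generate(grammar, start, max_len):
    """All terminal words of length <= max_len derivable from start.
    Assumes no lambda productions, so sentential forms never shrink."""
    words, seen = set(), set()
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        nts = [i for i, sym in enumerate(form) if sym in grammar]
        if not nts:
            words.add(''.join(form))
            continue
        i = nts[0]                                  # expand leftmost nonterminal
        for rhs in grammar[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return words

g1 = {'S1': [('a', 'S1', 'b'), ('a', 'b')]}         # generates {a^n b^n | n >= 1}
g2 = {'T': [('c', 'T'), ('c',)]}                    # generates {c^m | m >= 1}

union = {'Z': [('S1',), ('T',)], **g1, **g2}        # the idea of Theorem 9.12
concat = {'Z': [('S1', 'T')], **g1, **g2}           # the idea of Theorem 9.13

print(sorted(generate(union, 'Z', 6)))
print(sorted(generate(concat, 'Z', 5)))
```

The first enumeration yields the short words of {a^n b^n} ∪ {c^m}, and the second the short words of {a^n b^n}·{c^m}, as the theorems predict; the disjointness of the nonterminal sets is what keeps the two sublanguages from interfering.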
It was noted in Chapter 5 that De Morgan's laws imply that any collection of languages that is closed under union and complementation must also be closed under intersection. It therefore follows immediately that 𝒞{a,b,c} cannot be closed under complementation either.

∇ Lemma 9.4. 𝒞{a,b,c} is not closed under complementation.

Proof. Assume that 𝒞{a,b,c} is closed under complementation. Then any two context-free languages L1 and L2 would have context-free complements ~L1 and ~L2. By Theorem 9.12, ~L1 ∪ ~L2 is context free, and the assumption would imply that its complement is also context free. But ~(~L1 ∪ ~L2) = L1 ∩ L2, which would contradict Lemma 9.3 (for example, if L1 were {a^i b^i c^j | i, j ∈ ℕ} and L2 were {a^n b^m c^m | n, m ∈ ℕ}). Hence the assumption must be false, and 𝒞{a,b,c} cannot be closed under complementation. Δ

Thus, the context-free languages do not enjoy all of the closure properties that the regular languages do. However, the distinction between a regular language and a context-free language is lost if the underlying alphabet contains only one letter, as shown by the following theorem. The proof demonstrates that there is a certain regularity in the lengths of the words of any context-free language. It is the relationships between the different letters in the words of a context-free language that give it the potential for being non-FAD. If L is a context-free language over the singleton alphabet {a}, then no such complex relationships can exist; the character of a word is determined solely by its length.

∇ Theorem 9.15. 𝒞{a} = 𝒟{a}; that is, every context-free language over a single-letter alphabet is regular.

Proof. Let L be a context-free language over the singleton alphabet {a}, and assume the CNF grammar G = <Ω, Σ, S, P> generates L. Let n = 2^(‖Ω‖+1). Consider the words in L that are of length n or greater, choose the smallest such word, and denote it by a^j1. Since j1
≥ n, the pumping theorem can be applied to this word, and hence a^j1 can be written as uvwxy, where u = a^p1, v = a^q1, w = a^r1, x = a^s1, and y = a^t1. Let i1 = q1 + s1. Note that |vwx| ≤ n implies that i1 ≤ n. The pumping theorem then implies that all strings in the set L1 = {a^(j1+k·i1) | k = 0, 1, 2, ...} must belong to L. These account for many of the large words in L. If there are other large words in L, choose the next smallest word a^j2 that is of length greater than n and belongs to L but is not already in the set L1. By a similar argument, there is an integer i2 ≤ n for which all strings in the set L2 = {a^(j2+k·i2) | k = 0, 1, 2, ...} must also belong to L. Note that if i1 happens to equal i2, then j2 − j1 cannot be a multiple of i1, or else a^j2 would belong to L1. That is, j1 and j2 must in this case belong to different equivalence classes modulo i1. While large words remain unaccounted for, we continue choosing the next smallest word a^j(m+1) that is of length greater than n and belongs to L but is not already in the set L1 ∪ L2 ∪ ··· ∪ Lm. Since each ik ≤ n, there are only n choices for the iks, and only n different equivalence classes modulo ik into which the jks may fall, totaling n^2 different combinations. Thus, all the long words in L will be accounted for by the time m reaches n^2. The words in L of length less than n constitute a finite set F, which is regular. Each Lk is represented by the regular expression a^jk·(a^ik)*, and there are fewer than n^2 of these expressions, so L is the finite union of regular sets and is therefore regular. Δ

If a regular language is intersected with a context-free language, the result may not be regular, but it will be context free. The proof that 𝒞Σ is closed under intersection with a regular set will use the tools developed in Chapter 10. The constructs in Chapter 10 will also allow us to show that 𝒞Σ is closed under inverse homomorphism.
Such results are useful in showing closure under other operators and will also be useful in identifying certain languages as non-context-free. These conclusions will be based on a more powerful type of machine, called a pushdown automaton. The context-free languages will correspond to the languages that can be represented by such recognizers.

EXERCISES

9.1. Characterize the nature of parse trees of left-linear grammars.

9.2. Give context-free grammars for the following languages:
(a) {a^n b^n c^m d^m | n, m ∈ ℕ}
(b) {a^i b^j c^j d^i | i, j ∈ ℕ}
(c) {a^n b^n c^m d^m | n, m ∈ ℕ} ∪ {a^i b^j c^j d^i | i, j ∈ ℕ}

9.3. (a) Find, if possible, unambiguous context-free grammars for each of the languages given in Exercise 9.2.
(b) Prove or disprove: If L1 and L2 are unambiguous context-free languages, then L1 ∪ L2 is also an unambiguous context-free language.
(c) Is the class of unambiguous context-free languages over Σ closed under union?

9.4. State and prove the inductive result needed in Theorem 9.2.

9.5. Consider the proof of Theorem 9.4. Let G = <Ω, Σ, S, P> be a context-free grammar, with the production set divided into P^u and P^n (the set of unit productions and the set of nonunit productions, respectively). Devise an automaton-based algorithm that correctly calculates B^U = {C | B ⇒* C} for each nonterminal B found in P^u.

9.6. (a) What is wrong with proving that 𝒞Σ is closed under concatenation by using the following construction? Let G1 = <Ω1, Σ, S1, P1> and G2 = <Ω2, Σ, S2, P2> be two context-free grammars, and without loss of generality assume that Ω1 ∩ Ω2 = ∅. Choose a new nonterminal Z such that Z ∉ Ω1 ∪ Ω2, and define a new grammar Gc = <Ω1 ∪ Ω2 ∪ {Z}, Σ, Z, P1 ∪ P2 ∪ {Z → S1S2}>. Note: It is straightforward to show that L(Gc) = L(G1)·L(G2).
(b) Modify Gc so that it reflects an appropriate valid context-free grammar. (Hint: Pay careful attention to the treatment of lambda productions.)
(c) Prove that 𝒞Σ is closed under concatenation by using the construction defined in part (b).

9.7. Let Σ = {a, b, c}.
Show that {a^i b^j c^k | i, j, k ∈ ℕ and i + j = k} is context free.

9.8. (a) Show that the following right-linear grammar is ambiguous.
G = <{S, A, B}, {a}, S, {S → A, S → B, A → aaA, A → λ, B → aaaB, B → λ}>
(b) Use the method outlined in Theorem 9.2 to remove the ambiguity in G.

9.9. The regular expression grammar discussed in Example 9.3 produces strings with needless outermost parentheses, such as ((a∪b)·c).
(a) Define a grammar that generates all the words in this language and also the strings that are stripped of (only) the outermost parentheses, as in (a∪b)·c.
(b) Define a grammar that generates all the words in this language and also allows extraneous sets of parentheses, such as ((((a)∪b))·c).

9.10. For the regular expression grammar discussed in Example 9.3:
(a) Determine the leftmost derivation for ((a*·b)∪(c·d)*).
(b) Determine the rightmost derivation for ((a*·b)∪(c·d)*).

9.11. Consider the grammars G and G' in the proof of Theorem 9.5. Induct on the number of steps in a derivation in G to show that L(G) = L(G').

9.12. For a grammar G in Chomsky normal form, prove by induction that for any string x ∈ L(G) other than x = λ, the number of productions applied to derive x is 2|x| − 1.

9.13. (a) For a grammar G in Chomsky normal form and a string x ∈ L(G), state and prove a lower bound on the depth of the parse tree for x.
(b) For a grammar G in Chomsky normal form and a string x ∈ L(G), state and prove an upper bound on the depth of the parse tree for x.

9.14. Convert the following grammars to Chomsky normal form.
(a) <{S, B, C}, {a, b, c}, S, {S → aB, S → abC, B → bc, C → c}>
(b) <{S, A, B}, {a, b, c}, S, {S → cBA, S → B, A → cB, A → AbbS, B → aaa}>
(c) <{R}, {a, b, c, (, ), λ, ∅, ∪, ·, *}, R, {R → a | b | c | λ | ∅ | (R·R) | (R∪R) | R*}>
(d) <{T}, {a, b, c, d, −, +}, T, {T → a | b | c | d | T−T | T+T}>

9.15. Convert the following grammars to Greibach normal form.
(a) <{S1, S2}, {a, b, c, d, e}, S1, {S1 → S2S1e, S1 → S2b, S2 → S1S2, S2 → c}>
(b) <{S1, S2, S3}, {a, b, c, d, e}, S1, {S1 → S3S1, S1 → S2a, S2 → bc, S3 → S2c}>
(c) <{S1, S2, S3}, {a, b, c, d, e}, S1, {S1 → S1S2c, S1 → dS3, S2 → S1S1, S2 → a, S3 → S3e}>

9.16. Let G be a context-free grammar, and obtain G' from G by adding rules of the form A → λ. Prove that there is a context-free grammar G'' that is equivalent to G'. That is, show that apart from the special rule Z → λ, all other lambda productions are unnecessary.

9.17. Prove the following generalization of Lemma 9.1. Let G = <Ω, Σ, S, P> be a context-free grammar, and assume there are strings α and γ and nonterminals X and B for which X → γBα ∈ P. Further assume that the set of all B-rules is given by {B → β1, B → β2, ..., B → βm}, and let G' = <Ω, Σ, S, P'>, where P' = P ∪ {X → γβ1α, X → γβ2α, ..., X → γβmα} − {X → γBα}. Then L(G) = L(G').

9.18. Let P = {y ∈ {d}* | ∃ a prime p such that y = d^p} = {dd, ddd, ddddd, d^7, d^11, d^13, ...}.
(a) Prove that P is not context free by directly applying the pumping theorem.
(b) Prove that P is not context free by using the fact that P is known to be a nonregular language.

9.19. Let Γ = {x ∈ {0, 1, 2}* | ∃w ∈ {0, 1}* such that x = w·2·w} = {2, 121, 020, 11211, 10210, ...}. Prove that Γ is not context free.

9.20. Let Ψ = {x ∈ {0, 1}* | ∃w ∈ {0, 1}* such that x = w·w} = {λ, 00, 11, 0000, 1010, 1111, ...}. Prove that Ψ is not context free.

9.21. Let E = {x ∈ {b}* | ∃j ∈ ℕ such that |x| = 2^j} = {b, bb, bbbb, b^8, b^16, b^32, ...}. Prove that E is not context free.

9.22. Let Φ = {x ∈ {a}* | ∃j ∈ ℕ such that |x| = j^2} = {λ, a, aaaa, a^9, a^16, a^25, ...}, and let Φ' = {x ∈ {b, c, d}* | |x|_b ≥ 1 ∧ |x|_c = (|x|_d)^2}.
(a) Prove that Φ is not context free.
(b) Use the conclusion of part (a) and the properties of homomorphism to prove that Φ' is not context free.
(c) Use Ogden's lemma to directly prove that Φ' is not context free.
(d) Is it possible to use the pumping theorem to directly prove that Φ' is not context free?

9.23.
Consider L = {y ∈ {0, 1}* | |y|_0 = |y|_1}. Prove or disprove that L is context free.

9.24. Refer to the proof of Theorem 9.9.
(a) Give a formal recursive definition of the path by (1) stating the boundary conditions and (2) giving a rule for choosing the next node on the path.
(b) Show that the conclusions of Theorem 9.9 follow from the properties of this path.

9.25. Show that 𝒞Σ is closed under ∪ by directly constructing a new context-free grammar with the appropriate properties.

9.26. Let 𝒩Σ be the set of all languages over Σ that are not context free. Determine whether or not:
(a) 𝒩Σ is closed under union.
(b) 𝒩Σ is closed under complement.
(c) 𝒩Σ is closed under intersection.
(d) 𝒩Σ is closed under Kleene closure.
(e) 𝒩Σ is closed under concatenation.

9.27. Let Σ be an alphabet, and x = a1a2···a(n−1)an ∈ Σ*; define x^r = an a(n−1)···a2 a1. For a language L over Σ, define L^r = {x^r | x ∈ L}. Note that the (unary) reversal operator r is thus defined by L^r = {an a(n−1)···a3 a2 a1 | a1 a2 a3···a(n−1) an ∈ L}, and L^r therefore represents all the words in L written backward. Show that 𝒞Σ is closed under the operator r.

9.28. Let Σ = {a, b, c, d}. Define the (unary) operator T by T(L) = {w^r·w | w ∈ L} (see the definition of w^r in Exercise 9.27). Prove or disprove that 𝒞Σ is closed under the operator T.

9.29. Prove or disprove that 𝒞{a,b} is closed under relative complement; that is, if L1 and L2 are context free, then L1 − L2 is also context free.

9.30. (a) Prove that 𝒞{a,b} is not closed under intersection, nor is it closed under complement.
(b) By defining an appropriate homomorphism, argue that whenever Σ has more than one symbol, 𝒞Σ is not closed under intersection, nor is it closed under complement.

9.31. Consider the iterative method discussed in the proof of Theorem 9.3. Outline an alternative method based on an automaton with states labeled by the sets in ℘(Ω).

9.32.
Consider grammars in Greibach normal form that also satisfy one of the restrictions of Chomsky normal form; that is, no production has more than two symbols on the right side.
(a) Show that this is not a "normal form" for context-free languages by demonstrating that there is a context-free language that cannot be generated by any grammar in this form.
(b) Characterize the languages generated by grammars that can be represented by this restrictive form.

9.33. Let L be any collection of words over a single-letter alphabet. Prove that L* must be regular.

9.34. If ‖Σ‖ = 1, prove or disprove that 𝒞Σ is closed under complementation.

9.35. Prove that {a^n b^n c^m | n, m ∈ ℕ} is context free.

9.36. Use Ogden's lemma to prove that {a^k b^n c^m | (k ≠ n) ∧ (n ≠ m)} is not context free.

CHAPTER 10

PUSHDOWN AUTOMATA

In the earlier part of this text, the representation of languages via regular grammars was a generative construct equivalent to the cognitive power of deterministic finite automata and nondeterministic finite automata. Chapter 9 showed that context-free grammars have more generative potential than regular grammars and thus define a significantly larger class of languages. This chapter and the next explore generalizations of the basic automaton construct introduced in Chapter 1. In Chapter 4, we discovered that adding nondeterminism did not enhance the language capabilities of an automaton. It seems that more powerful automata will need the ability to store more than a finite amount of state information, and machines with the ability to write to and read from an indefinitely long tape will now be considered. Automata that allow unrestricted access to all portions of the tape are the subject of Chapter 11. Such machines are regarded as being as powerful as a general-purpose computer. This chapter deals with automata with restricted access to the auxiliary tape. One such device is known as a pushdown automaton and is strongly related to the context-free languages.
10.1 DEFINITIONS AND EXAMPLES

A language such as {a^n b^n | n ≥ 1} can be shown to be non-FAD by the pumping lemma, which uses the observation that a finite-state control cannot distinguish between an unlimited number of essentially different situations. Deterministic finite automata could at best "count" modulo some finite number; unlimited matching was one of the many things beyond the capabilities of a finite-state control. One possible enhancement would be to augment the automaton with a single integer counter, which could be envisioned as a sack in which stones can be placed (or removed) in response to input. The automaton would begin with one stone in the sack and process input much as a nondeterministic finite automaton would. With each transition, the machine would not only choose a new state but also choose to add another stone to the sack, remove an existing stone from the sack, or leave the contents unchanged. The δ function is independent of the status of the sack; the sack is used only to determine whether the automaton should continue to process input symbols. Perhaps some sort of weight sensor would be used to detect when there were stones in the sack; the device would continue to operate as long as stones were present and would halt when the sack became empty. If all the symbols on the input tape happen to have been consumed at the time the sack empties, the input string is accepted by the automaton. Such devices are called counting automata and are general enough to recognize many non-FAD languages. A device to recognize {a^n b^n | n ≥ 1} would need three states. The start state will transfer control to a second state when an a is read, leaving the sack contents unchanged. The start state will have no valid moves for b, causing words that begin with b to be rejected, since the input tape will not be completely consumed.
The automaton will remain in the second state in response to each a, adding a stone to the sack each time an a is processed. The second state will transfer control to the third state upon receipt of the symbol b and withdraw a stone from the sack. The third state has no moves for a and remains in that state while removing a stone for each b that is processed. For this device, only words of the form a^n b^n will consume all the input just when the sack becomes empty.

Another type of counting automaton handles acceptance in the same manner as nondeterministic finite automata. That is, if there is a sequence of transitions that consumes all the input and leaves the device in a final state, the input word is accepted (irrespective of the sack contents). As with NDFAs, the device may halt prematurely if there are no applicable transitions (or if the sack empties). These counting automata are not quite general enough to recognize all context-free languages. More than one type of "stone" is necessary in order for such an automaton to emulate the power of context-free grammars, at which point the order of the items becomes important. Thus, the sack is replaced by a stack, a last-in, first-out (LIFO) list. The most recently added item is positioned at the end called the top of the stack. A newly added item is placed above the current top and becomes the new top item as it is pushed onto the stack. The action of the finite-state control can be influenced by the type of item that is on the top of the stack. Only the top (that is, the most recently placed) item can affect the state transition function; the device has no ability to reexamine items that have previously been deleted (that is, have been popped). The next item below the top of the stack cannot be examined until the top item is popped (and that popped item thereby becomes unavailable for later reinspection).
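Looking back at the simpler device for a moment, the three-state counting automaton for {a^n b^n | n ≥ 1} described above can be simulated directly. The following sketch (the numeric state names are illustrative, not from the text) accepts exactly when the input is exhausted at the moment the sack empties:

```python
# The three-state counting automaton for {a^n b^n | n >= 1}: the machine
# starts with one stone in the sack and accepts a word only if the input
# is consumed exactly as the sack becomes empty.

def counting_accepts(word):
    state, stones = 1, 1            # start state, one stone in the sack
    for symbol in word:
        if stones == 0:             # sack empty: the device has already halted
            return False
        if state == 1 and symbol == 'a':
            state = 2               # stone count left unchanged
        elif state == 2 and symbol == 'a':
            stones += 1             # add a stone for each additional a
        elif state == 2 and symbol == 'b':
            state, stones = 3, stones - 1
        elif state == 3 and symbol == 'b':
            stones -= 1             # remove a stone for each b
        else:
            return False            # no valid move: input not fully consumed
    return stones == 0              # accept iff the sack empties with the input
```

For example, `counting_accepts('aabb')` succeeds, while `'aab'` leaves a stone in the sack and `'abb'` empties the sack before the input is consumed, so both are rejected.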
As with counting automata, an empty stack will halt the operation of this type of automaton, called a pushdown automaton.

∇ Definition 10.1. A (nondeterministic) pushdown automaton (NPDA or just PDA) is a septuple P = <Σ, Γ, S, s0, δ, B, F>, where

Σ is the input alphabet.
Γ is the stack alphabet.
S is a finite nonempty set of states.
s0 is the start state (s0 ∈ S).
δ is the state transition function, δ: S × (Σ ∪ {λ}) × Γ → the set of finite subsets of S × Γ*.
B is the bottom-of-the-stack symbol (B ∈ Γ).
F is the set of final states (F ⊆ S). Δ

By the definition of alphabet (Definition 1.1), both Σ and Γ must be nonempty. Figure 10.1 presents a conceptualization of a pushdown automaton.

[Figure 10.1: A model of a pushdown automaton, showing the input tape with its read head, the stack tape with its read/write head, and the finite-state control.]

As with an NDFA, there is a finite-state control and a read head for the input tape, which only moves forward. The auxiliary tape also has a read/write head, which not only moves forward but can move backward when an item is popped. The state transition function is meant to signify that, given a current state, the input symbol currently being scanned, and the current top stack symbol, the automaton may choose
The auxiliary tape differs from that of an FST in that the current symbol from Γ on the tape is sensed by the stack read/write head and can affect the subsequent operation of the automaton. If no symbols are written to the tape during a transition, the tape head drops back one position and will then be scanning the previous stack symbol. In essence, a state transition is initiated by the currently scanned symbols on both the input tape and the stack tape and begins with the stack symbol being popped from the stack; the state transition is accompanied by a push operation, which writes a new string of stack symbols on the stack tape. If several symbols are written, the auxiliary read/write head will move ahead an appropriate amount, and the head will be positioned over the last of the symbols written. Thus, if exactly one symbol is written, the stack tape head does not move, and the effect is that the old top-of-stack symbol is overwritten by the new symbol. When the empty string is to be written, the effect is a pop followed by a push of no letters, and the stack tape head retreats one position. If the only remaining stack symbol is removed from the stack in this fashion, the stack tape head moves off the end of the tape. It would then no longer be scanning a valid stack symbol, so no further transitions are possible, and the device halts. It is possible to manipulate the stack and change states without consuming an input letter, which is the intent of the λ-moves in the state transition function. Since at most one symbol can be removed from the stack as a result of a transition, λ-moves allow the stack to be shortened by several symbols before the next input symbol is processed. Acceptance can be defined by requiring the stack to be empty after the entire input tape is consumed (as was the case with counting automata) or by requiring that the automaton be in a final state after all the input is consumed.
The nondeterminism may allow the device to react to a given input string in several distinct ways. As with NDFAs, the input word is considered accepted if at least one of the possible reactions satisfies the criteria for acceptance. For a given PDA, the set of words accepted by the empty-stack criterion will likely differ from the set of words accepted by the final-state condition.

EXAMPLE 10.1

Consider the PDA defined by P1 = <{a, b}, {A, B}, {q, r}, q, δ, B, ∅>, where δ is defined by

δ(q, a, B) = {(q, A)}       δ(r, a, B) = { }
δ(q, a, A) = {(q, AA)}      δ(r, a, A) = { }
δ(q, b, B) = { }            δ(r, b, B) = { }
δ(q, b, A) = {(r, λ)}       δ(r, b, A) = {(r, λ)}

Note that since the set of final states is empty, no strings are accepted by final state. We wish to consider the set of strings accepted by empty stack. In general, when the set of final states is nonempty, the PDA will designate a machine designed to accept by final state; F = ∅ will generally be taken as an indication that acceptance is to be by empty stack. The action of the state transition function can be displayed much like that of finite-state transducers. Transition arrows are no longer labeled with just a symbol from the input alphabet, since both a stack symbol and an input symbol now govern the action of the automaton. Thus, arrows are labeled by ordered pairs from Σ × Γ. As with FSTs, this is followed by the output caused by the transition. The diagram corresponding to P1 is shown in Figure 10.2.

Figure 10.2 The PDA discussed in Example 10.1

The reaction of P1 to the string aabb is the sequence of moves displayed in Figure 10.3. Initially, the heads of the two tapes are positioned as shown in Figure 10.3a, with the (current) initial state highlighted. Since the state is q, the input symbol is a, and the stack symbol is B, the first transition rule δ(q, a, B) = {(q, A)} applies; P1 remains in state q, and the popped stack symbol B is replaced by a single A.
Figure 10.3b shows the new state of the automaton. The stack read/write head is in the same position, since the length of the stack did not change. The input read head moves on to the next letter, since the first input symbol was consumed. The second rule now applies, and the single A is replaced by the pair AA as P1 returns to q again, as shown in Figure 10.3c. Note that the stack tape head advanced as the topmost symbol was written. The rule δ(q, b, A) = {(r, λ)} now applies, and the state of the machine switches to r as the (topmost) A is popped and replaced with an empty string, leaving the stack shorter than before. This is shown in Figure 10.3d. The last of the eight transition rules now applies, leaving the automaton in the configuration shown by Figure 10.3e.

Figure 10.3 (a-e) Walkthrough of the pushdown automaton discussed in Example 10.1

Since the stack is now empty, no further moves are possible. However, since the read head has reached the end of the input string, the word aabb is accepted by P1. The word aab would be rejected by P1, since the automaton would run out of input in a configuration similar to that of Figure 10.3d, in which the stack is not yet empty. The word aabbb would not be accepted because the stack would empty prematurely, leaving P1 stuck in a configuration similar to that of Figure 10.3e, but with the input string incompletely consumed. The word aaba would likewise be rejected because there would be no move from the state r with which to process the final input symbol a. As with deterministic finite automata, once an input symbol is consumed, it has no further effect on the operation of the pushdown automaton. The current state of the device, the remaining input symbols, and the current stack contents form a triple that describes the current configuration of the PDA.
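This walkthrough is easy to reproduce mechanically. In the sketch below (the dictionary encoding of δ is our own convention, not the text's), the stack is kept as a string with the topmost symbol leftmost; each step pops the top symbol and pushes the string chosen by δ. P1 happens to offer at most one applicable move per configuration, so a simple loop suffices.

```python
# Nonempty transitions of P1 from Example 10.1; missing keys denote { }.
DELTA1 = {
    ('q', 'a', 'B'): [('q', 'A')],
    ('q', 'a', 'A'): [('q', 'AA')],
    ('q', 'b', 'A'): [('r', '')],    # '' plays the role of the empty string
    ('r', 'b', 'A'): [('r', '')],
}

def trace(word):
    """Return the configurations (state, unread input, stack) that P1
    passes through, with the topmost stack symbol written leftmost."""
    state, stack = 'q', 'B'
    configs = [(state, word, stack)]
    while word and stack:
        moves = DELTA1.get((state, word[0], stack[0]), [])
        if not moves:
            break                     # no applicable transition: the device halts
        state, pushed = moves[0]      # P1 never offers more than one choice
        stack = pushed + stack[1:]    # pop the top symbol, push the new string
        word = word[1:]               # consume one input letter
        configs.append((state, word, stack))
    return configs
```

Running `trace('aabb')` reproduces the five configurations of the figure, ending with both the input and the stack empty; `trace('aab')` ends with an A still on the stack, and `trace('aabbb')` ends with input left over.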
The triple (q, bb, AA) thus describes the configuration of the PDA in Figure 10.3c. When processing aabb, P1 moved through the following sequence of configurations:

(q, aabb, B)
(q, abb, A)
(q, bb, AA)
(r, b, A)
(r, λ, λ)

Successive configurations followed from their predecessors by applying a single rule from the state transition function. These transitions will be described by the operator ⊢.

∇ Definition 10.2. The current configuration of a pushdown automaton P = <Σ, Γ, S, s₀, δ, B, F> is described by a triple (s, x, α), where
s is the current state.
x is the unconsumed portion of the input string.
α is the current stack contents (with the topmost symbol written as the leftmost).
An ordered pair (t, γ) within the finite set of objects specified by δ(s, a, A) can cause a move in the pushdown automaton P from the configuration (s, ay, Aβ) to the configuration (t, y, γβ). This transition is denoted as (s, ay, Aβ) ⊢ (t, y, γβ). A sequence of successive moves in which (s₁, x₁, α₁) ⊢ (s₂, x₂, α₂), (s₂, x₂, α₂) ⊢ (s₃, x₃, α₃), ..., (s_{m-1}, x_{m-1}, α_{m-1}) ⊢ (s_m, x_m, α_m) is denoted by (s₁, x₁, α₁) ⊢* (s_m, x_m, α_m). Δ

The operator ⊢* reflects the reflexive and transitive closure of ⊢, and thus we also have (s₁, x₁, α₁) ⊢* (s₁, x₁, α₁), and clearly (s₁, x₁, α₁) ⊢ (s₂, x₂, α₂) implies (s₁, x₁, α₁) ⊢* (s₂, x₂, α₂).

EXAMPLE 10.2

For the pushdown automaton P1 in Example 10.1, (q, aabb, B) ⊢* (r, λ, λ) because (q, aabb, B) ⊢ (q, abb, A) ⊢ (q, bb, AA) ⊢ (r, b, A) ⊢ (r, λ, λ).

∇ Definition 10.3. For a pushdown automaton P = <Σ, Γ, S, s₀, δ, B, F>, the language accepted via final state by P, L(P), is

{x ∈ Σ* | ∃r ∈ F, ∃α ∈ Γ* ∋ (s₀, x, B) ⊢* (r, λ, α)}

The language accepted via empty stack by P, A(P), is

{x ∈ Σ* | ∃r ∈ S ∋ (s₀, x, B) ⊢* (r, λ, λ)} Δ

EXAMPLE 10.3

Consider the pushdown automaton P1 in Example 10.1. Since only strings of the form aⁱbⁱ (for i ≥ 1) allow (q, aⁱbⁱ, B) ⊢* (r, λ, λ), it follows that A(P1) = {aⁿbⁿ | n ≥ 1}.
However, F = ∅ and thus L(P1) is clearly ∅. The pushdown automaton P1 in Example 10.1 was deterministic in the sense that there will never be more than one choice that can be made from any configuration. The following example illustrates a pushdown automaton that is nondeterministic.

EXAMPLE 10.4

Consider the pushdown automaton defined by P2 = <{a, b}, {S, C}, {t}, t, δ, S, ∅>, where δ is defined by

δ(t, a, S) = {(t, SC), (t, C)}      δ(t, λ, S) = { }
δ(t, a, C) = { }                    δ(t, λ, C) = { }
δ(t, b, S) = { }
δ(t, b, C) = {(t, λ)}

In this automaton, there are two distinct courses of action when the input symbol is a and the top stack symbol is S, which leads to several possible options when trying to process the word aabb. One option is to apply the first move whenever possible, which leads to the sequence of configurations (t, aabb, S) ⊢ (t, abb, SC) ⊢ (t, bb, SCC). Since there are no λ-moves and δ(t, b, S) = { }, there are no further moves that can be made, and the input word cannot be completely consumed in this manner. Another option is to choose the second move exclusively, leading to the abortive sequence (t, aabb, S) ⊢ (t, abb, C); δ(t, a, C) = { }, and processing again cannot be completed. A mixture of the first and second moves results in the sequence (t, aabb, S) ⊢ (t, abb, SC) ⊢ (t, bb, CC) ⊢ (t, b, C) ⊢ (t, λ, λ), and aabb is thus accepted by P2. Further experimentation shows that A(P2) = {aⁿbⁿ | n ≥ 1}. To successfully empty its stack, this automaton must correctly "guess" when the last a is being read and choose the second transition pair, placing only C on the stack.

∇ Definition 10.4. Two pushdown automata M1 = <Σ₁, Γ₁, S₁, s₀₁, δ₁, B₁, F₁> and M2 = <Σ₂, Γ₂, S₂, s₀₂, δ₂, B₂, F₂> are called equivalent iff they accept the same language. Δ

The pushdown automaton P1 from Example 10.1 is therefore equivalent to P2 in Example 10.4.
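Once a PDA offers genuine choices, acceptance becomes a search over all possible move sequences. The minimal sketch below is our own encoding (P2 has no λ-moves, so none are searched); it reports whether some branch of P2's behavior consumes the whole input and empties the stack.

```python
# delta for P2 of Example 10.4: two choices on (t, a, S), so the machine
# must "guess" when it is reading the last a.
DELTA2 = {
    ('t', 'a', 'S'): [('t', 'SC'), ('t', 'C')],
    ('t', 'b', 'C'): [('t', '')],
}

def accepts_empty_stack(word, delta, start='t', bottom='S'):
    """True iff some sequence of moves consumes all of word and empties
    the stack (topmost stack symbol kept leftmost)."""
    frontier, seen = [(start, word, bottom)], set()
    while frontier:
        state, rest, stack = frontier.pop()
        if rest == '' and stack == '':
            return True                  # one successful branch suffices
        if rest == '' or stack == '' or (state, rest, stack) in seen:
            continue                     # this branch halts prematurely
        seen.add((state, rest, stack))
        for nxt, pushed in delta.get((state, rest[0], stack[0]), []):
            frontier.append((nxt, rest[1:], pushed + stack[1:]))
    return False
```

The search confirms that aabb has an accepting branch while aab and abb do not, in agreement with A(P2) = {aⁿbⁿ | n ≥ 1}.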
The concept of equivalence will apply even if one device accepts via final state and the other accepts via empty stack. In keeping with the previous broad use of the concept of equivalence, if any two finite descriptors define the same language, those descriptors will be called equivalent. Thus, if a PDA M happens to accept the language described by a regular expression R, we will say that R is equivalent to M.

EXAMPLE 10.5

The following pushdown automaton illustrates the use of λ-moves and acceptance by final state for the language {aⁿbᵐ | n ≥ 1 ∧ (n = m ∨ n = 2m)}. Let P3 = <{a, b}, {A}, {s₀, s₁, s₂, s₃, s₄}, s₀, δ, A, {s₂, s₄}>, where δ is defined by

δ(s₀, a, A) = {(s₀, AA)}            δ(s₂, a, A) = { }      δ(s₄, a, A) = { }
δ(s₀, b, A) = { }                   δ(s₂, b, A) = { }      δ(s₄, b, A) = {(s₃, λ)}
δ(s₀, λ, A) = {(s₁, A), (s₃, A)}    δ(s₂, λ, A) = { }      δ(s₄, λ, A) = { }
δ(s₁, a, A) = { }                   δ(s₃, a, A) = { }
δ(s₁, b, A) = {(s₁, λ)}             δ(s₃, b, A) = { }
δ(s₁, λ, A) = {(s₂, A)}             δ(s₃, λ, A) = {(s₄, λ)}

The finite-state control for this automaton is diagrammed in Figure 10.4. Note that the λ-move from state s₃ is not responsible for any nondeterminism in this machine. From s₃, only one move is permissible: the λ-move to s₄. On the other hand, the λ-move from state s₁ does allow a choice of moving to s₂ (without moving the read head) or staying at s₁ while consuming another input symbol. The choice of moves from state s₀ also contributes to the nondeterminism; the device must "guess" whether the number of bs will equal the number of as or whether there will be half as many, and at the appropriate time transfer control to s₁ or s₃, respectively. Notice that the moves defined by states s₃ and s₄ allow two stack symbols to be removed for each b consumed. Furthermore, a string like aab can transfer control to s₃ as the final b is processed, but the λ-move can then be applied to reach s₄ even though there are no more symbols on the input tape.

Figure 10.4 The PDA discussed in Example 10.5

Since A was the only stack symbol in P3, the language could as easily have been described by the sack-and-stone counting device described at the beginning of the section. It should be clear that counting automata are essentially pushdown automata with a singleton stack alphabet. Pushdown automata with only one stack symbol cannot recognize all the languages that a PDA with two symbols can [DENN]. However, it can be shown that using more than two stack symbols does not contribute to the generative power of a PDA; for example, a PDA with Γ = {A, B, C, D} can be converted into an equivalent machine with Γ' = {0, 1} and the occurrences of the old stack symbols replaced by the encodings A = 01, B = 001, C = 0001, and D = 00001. Every NDFA can be simulated by a PDA that simply ignores its stack. In fact, every NDFA has an equivalent counting automaton, as shown in the following theorem.

∇ Theorem 10.1. Given any alphabet Σ, and an NDFA A:
1. There is an equivalent pushdown automaton (counting automaton) A' for which L(A) = L(A').
2. There is an equivalent pushdown automaton (counting automaton) A'' for which L(A) = A(A'').

Proof. The results for pushdown automata will actually follow from the results of the next section, since pushdown automata can define all the context-free languages, and the regular language defined by the NDFA A must be context free. The following constructions will use only the one stack symbol ¢, and hence A' and A'' are actually counting automata for which L(A) = L(A') and L(A) = A(A''). While the construction of a PDA from an NDFA is straightforward, the inductive proofs are simplified if we appeal to Theorem 4.1 and assume that the given automaton is actually a DFA A = <Σ, S, s₀, δ, F>.
Define the PDA A' = <Σ, {¢}, S, s₀, δ', ¢, F>, where δ' is defined by (∀s ∈ S)(∀a ∈ Σ)(δ'(s, a, ¢) = {(δ(s, a), ¢)}) and (∀s ∈ S)(δ'(s, λ, ¢) = { }). That is, the PDA makes the same transitions that the DFA does and replaces the ¢ with the same symbol on the stack at each move. The proof that A and A' are equivalent is by induction on the length of the input string, where P(n) is the statement that

(∀x ∈ Σⁿ)(δ(s₀, x) = t ⟺ (s₀, x, ¢) ⊢* (t, λ, ¢))

The PDA with a single stack symbol that accepts L via empty stack is quite similar; final states are simply given the added option of removing the only symbol on the stack. That is, A'' = <Σ, {¢}, S, s₀, δ'', ¢, ∅>, where δ'' is defined by

(∀s ∈ S)(∀a ∈ Σ)(δ''(s, a, ¢) = {(δ(s, a), ¢)})

while

(∀s ∈ F)(δ''(s, λ, ¢) = {(s, λ)})

and

(∀s ∈ S − F)(δ''(s, λ, ¢) = { })

The same type of inductive statement proved for A' holds for A'', and it therefore will follow that exactly those words that terminate in what used to be final states empty the stack, and thus L(A) = A(A''). Δ

10.2 EQUIVALENCE OF PDAs AND CFGs

In this section, it will be shown that if L is accepted by a PDA, then L can be generated by a CFG, and, conversely, every context-free language can be recognized by a PDA. We will also show that the class of pushdown automata that accept by empty stack defines exactly the same languages as the class of pushdown automata that accept by final state. In each case, the languages defined are exactly the context-free languages.

∇ Definition 10.5. For a given alphabet Σ, let

ℰΣ = {L ⊆ Σ* | ∃ PDA P ∋ L = A(P)}
ℱΣ = {L ⊆ Σ* | ∃ PDA P ∋ L = L(P)} Δ

Recall that 𝒞Σ was defined to be the collection of context-free languages. We begin by showing that 𝒞Σ ⊆ ℰΣ. To do this, we must show that, given a language L generated by a context-free grammar G, there is a PDA P_G that recognizes exactly those words that belong to L. The pushdown automaton given in the next definition simulates leftmost derivations in G.
That is, as the symbols on the input tape are scanned, the automaton guesses at the production that produced that letter and remembers the remainder of the sentential form by pushing it on the stack. P_G is constructed in such a way that, when the stack contents are checked against the symbols on the input tape, wrong guesses are discovered and the device halts. Wrong guesses, corresponding to inappropriate or impossible derivations, are thereby prevented from emptying the stack, and yet each word that can be generated by G will be guaranteed to have a sequence of moves that results in acceptance by empty stack.

∇ Definition 10.6. Given a context-free grammar G = <Ω, Σ, S, P> in pure Greibach normal form, the single-state pushdown automaton corresponding to G is the septuple P_G = <Σ, Ω ∪ Σ, {s}, s, δ_G, S, ∅>, where δ_G is defined by

δ_G(s, a, Ψ) = {(s, γ) | Ψ → aγ ∈ P}   if Ψ ∈ Ω, a ∈ Σ, γ ∈ (Ω ∪ Σ)*
δ_G(s, a, Ψ) = {(s, λ)}                if Ψ ∈ Σ ∧ Ψ = a   Δ

EXAMPLE 10.6

Consider the pure Greibach normal form grammar

G = <{S}, {a, b}, S, {S → aSb, S → ab}>

which is perhaps the simplest grammar generating {aⁿbⁿ | n ≥ 1}. The automaton P_G is then

P_G = <{a, b}, {S, a, b}, {s}, s, δ_G, S, ∅>

where δ_G is defined by

δ_G(s, a, S) = {(s, Sb), (s, b)}
δ_G(s, a, a) = {(s, λ)}
δ_G(s, a, b) = { }
δ_G(s, b, S) = { }
δ_G(s, b, a) = { }
δ_G(s, b, b) = {(s, λ)}

This automaton contains no λ-moves and is essentially the same as P2 in Example 10.4, with the state t now relabeled as s, the stack symbol b now playing the role of C, and the unused stack symbol a added to Γ. The derivation S ⇒ aSb ⇒ aabb corresponds to the successful move sequence (s, aabb, S) ⊢ (s, abb, Sb) ⊢ (s, bb, bb) ⊢ (s, b, b) ⊢ (s, λ, λ). The exact correspondence between derivation steps and move sequences is illustrated in the next example.

EXAMPLE 10.7

For a slightly more complex example, consider the pure Greibach normal form grammar

G = <{R}, {a, b, c, (, ), ε, ∅, ∪, ∘, *}, R, {R → a | b | c | ε | ∅ | (R∘R) | (R∪R) | (R)*}>

(here ε and ∅ are terminal symbols denoting the corresponding regular expressions).
The automaton P_G is then <{a, b, c, (, ), ε, ∅, ∪, ∘, *}, {R, a, b, c, (, ), ε, ∅, ∪, ∘, *}, {s}, s, δ_G, R, ∅>, where δ_G is comprised of the following nonempty transitions:

δ_G(s, (, R) = {(s, R∘R)), (s, R∪R)), (s, R)*)}
δ_G(s, a, R) = {(s, λ)}
δ_G(s, b, R) = {(s, λ)}
δ_G(s, c, R) = {(s, λ)}
δ_G(s, ε, R) = {(s, λ)}
δ_G(s, ∅, R) = {(s, λ)}
δ_G(s, a, a) = {(s, λ)}
δ_G(s, b, b) = {(s, λ)}
δ_G(s, c, c) = {(s, λ)}
δ_G(s, ε, ε) = {(s, λ)}
δ_G(s, ∅, ∅) = {(s, λ)}
δ_G(s, ∪, ∪) = {(s, λ)}
δ_G(s, ∘, ∘) = {(s, λ)}
δ_G(s, *, *) = {(s, λ)}
δ_G(s, ), )) = {(s, λ)}
δ_G(s, (, () = {(s, λ)}

In this grammar, it happens that the symbol ( is never pushed onto the stack, and so the last transition is not utilized. Transitions not listed are empty; that is, they are of the form δ_G(s, d, A) = { }. Consider the string (a∪(b∘c)), which has the following (unique) derivation:

R ⇒ (R∪R) ⇒ (a∪R) ⇒ (a∪(R∘R)) ⇒ (a∪(b∘R)) ⇒ (a∪(b∘c))

P_G simulates this derivation with the following steps:

(s, (a∪(b∘c)), R) ⊢ (s, a∪(b∘c)), R∪R))
⊢ (s, ∪(b∘c)), ∪R))
⊢ (s, (b∘c)), R))
⊢ (s, b∘c)), R∘R)))
⊢ (s, ∘c)), ∘R)))
⊢ (s, c)), R)))
⊢ (s, )), )))
⊢ (s, ), ))
⊢ (s, λ, λ)

Figure 10.5 illustrates the state of the machine at several points during the move sequence. At each point when an R is the top stack symbol and the input tape head is scanning a (, there are three choices of productions that might have generated the opening parenthesis, and consequently the automaton has three choices with which to replace the R on the stack. If the wrong choice is taken, P_G will halt at some future point. For example, if the initial move guessed that the first parenthesis was due to a concatenation operation, the move sequence would be

(s, (a∪(b∘c)), R) ⊢ (s, a∪(b∘c)), R∘R)) ⊢ (s, ∪(b∘c)), ∘R))

Since there are no λ-moves and the entry for δ_G(s, ∪, ∘) is empty, this attempt can go no further.
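Definition 10.6 is mechanical enough to automate. The sketch below is our own encoding (productions as (left side, right side) pairs, the single state named 's'): it builds δ_G from a pure Greibach normal form grammar and then tests empty-stack acceptance by exhaustive search. Applied to the grammar S → aSb | ab of Example 10.6, it recognizes {aⁿbⁿ | n ≥ 1}.

```python
def gnf_to_pda(productions, terminals):
    """Definition 10.6: from a pure GNF grammar (every production A -> a + gamma,
    with a a terminal) build the one-state PDA's transition table."""
    delta = {}
    for lhs, rhs in productions:          # rhs[0] is the terminal, rhs[1:] is gamma
        delta.setdefault(('s', rhs[0], lhs), []).append(('s', rhs[1:]))
    for a in terminals:                   # a terminal on the stack must match the input
        delta[('s', a, a)] = [('s', '')]
    return delta

def accepts(word, delta, bottom):
    """Empty-stack acceptance by exhaustive search (no lambda-moves arise here)."""
    frontier, seen = [('s', word, bottom)], set()
    while frontier:
        state, rest, stack = frontier.pop()
        if rest == '' and stack == '':
            return True
        if rest == '' or stack == '' or (state, rest, stack) in seen:
            continue
        seen.add((state, rest, stack))
        for nxt, pushed in delta.get((state, rest[0], stack[0]), []):
            frontier.append((nxt, rest[1:], pushed + stack[1:]))
    return False

# The grammar of Example 10.6: S -> aSb | ab, start symbol S.
delta_g = gnf_to_pda([('S', 'aSb'), ('S', 'ab')], 'ab')
```

As the text observes for Example 10.6, the resulting table is essentially P2 of Example 10.4 with b in the role of C (plus the unused stack symbol a).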
A construction such as the one given in Definition 10.6 can be shown to produce the desired automaton for any context-free grammar in Greibach normal form.

Figure 10.5 (a-f) Walkthrough of the pushdown automaton discussed in Example 10.7

∇ Theorem 10.2. Given any alphabet Σ, 𝒞Σ ⊆ ℰΣ. In particular, for any context-free grammar G, there is a pushdown automaton that accepts (via empty stack) the language generated by G.

Proof. Let G' be any context-free grammar. Theorem 9.6 guarantees that there is a pure Greibach normal form grammar G = <Ω, Σ, S, P> for which L(G) = L(G') − {λ}. If λ ∉ L(G'), the PDA P_G from Definition 10.6 can be used directly. If λ ∈ L(G'), then there is a Greibach normal form grammar G'' = <Ω ∪ {Z}, Σ, Z, P ∪ {Z → S, Z → λ}>, which generates L(G'), and the state transition function should then include the move δ_G(s, λ, Z) = {(s, S), (s, λ)} to reflect the two Z-rules. The bottom-of-the-stack symbol would then be Z, the new start symbol. In either case, induction on the number of moves in a sequence will show that

(∀x ∈ Σ*)(∀β ∈ (Σ ∪ Ω)*)((s, x, S) ⊢* (s, λ, β) iff S ⇒* xβ as a leftmost derivation)

Note that xβ is likely to be a sentential form that still contains nonterminals. The words x that result in an empty stack (β = λ) will then be exactly those words that produce an entire string of terminal symbols from the start symbol S (or Z in the case where the grammar contains the two special Z-rules). In other words, L(G') = A(P_G). Δ

Given a context-free grammar, the definition of an equivalent PDA is easy once an appropriate GNF grammar is in hand. In Example 10.6, the grammar was already in Greibach normal form.
To find a PDA for the grammar in Chapters 8 and 9 that generates regular expressions, the grammar

<{R}, {a, b, c, (, ), ε, ∅, ∪, ∘, *}, R, {R → a | b | c | ε | ∅ | (R∘R) | (R∪R) | R*}>

would have to be converted to Greibach normal form. The offending left-recursive production R → R* would have to be replaced, resulting in an extra nonterminal and about three times as many productions. The definition of the PDA for this grammar would be correspondingly more complex (see the exercises). Since every context-free language can be represented by a pushdown automaton with only one state, one might suspect that more complex PDAs with extra states may be able to define languages that are more complex than those in 𝒞Σ. It turns out that extra states yield no more cognitive power; the information stored within the finite-state control can effectively be stored on the stack tape. This will follow from the fact that the converse of Theorem 10.2, namely that every pushdown automaton has an equivalent context-free grammar, is also true. Defining a context-free grammar based on a pushdown automaton is not as elegant as the construction presented in Definition 10.6, but the idea is to have the leftmost derivations in the grammar correspond to successful move sequences in the PDA.

∇ Definition 10.7. Let P = <Σ, Γ, S, s₀, δ, B, ∅> be a pushdown automaton. Define the grammar G_P = <Ω, Σ, Z, P_P>, where

Ω = {Z} ∪ {A^{st} | A ∈ Γ; s, t ∈ S}

and

P_P = {Z → B^{s₀t} | t ∈ S}
    ∪ {A^{sq} → aA₁^{rt₁}A₂^{t₁t₂}...A_m^{t_{m-1}q} | A ∈ Γ, a ∈ Σ ∪ {λ}, (r, A₁A₂...A_m) ∈ δ(s, a, A), s, q, r, t₁, t₂, ..., t_{m-1} ∈ S}
    ∪ {A^{sr} → a | s, r ∈ S, A ∈ Γ, a ∈ Σ ∪ {λ}, (r, λ) ∈ δ(s, a, A)} Δ

Note that when m = 1, the transition (r, A₁) ∈ δ(s, a, A) gives rise to a rule of the form A^{sq} → aA₁^{rq} for each state q ∈ S.
EXAMPLE 10.8

Consider again the pushdown automaton from Example 10.4, defined by P2 = <{a, b}, {S, C}, {t}, t, δ, S, ∅>, where δ is given by

δ(t, a, S) = {(t, SC), (t, C)}      δ(t, b, S) = { }
δ(t, a, C) = { }                    δ(t, b, C) = {(t, λ)}

Since there is but one state and two stack symbols, the nonterminals for the corresponding grammar G_P2 are Ω = {Z, S^{tt}, C^{tt}}. P_P2 can be calculated as follows. Z → S^{tt} is the only rule arising from the first criterion for productions. Since δ(t, a, S) contains (t, SC), a move that produces two stack symbols, m = 2 and the resulting production is S^{tt} → aS^{tt}C^{tt}. The only other rule due to the second criterion arises because δ(t, a, S) contains (t, C), which with m = 1 yields S^{tt} → aC^{tt}. Finally, (t, λ) ∈ δ(t, b, C) causes C^{tt} → b to be added to the production set. The resulting grammar is therefore

G_P2 = <{Z, S^{tt}, C^{tt}}, {a, b}, Z, {Z → S^{tt}, S^{tt} → aS^{tt}C^{tt}, S^{tt} → aC^{tt}, C^{tt} → b}>

and G_P2 does indeed generate {aⁿbⁿ | n ≥ 1} and is therefore equivalent to P2.

EXAMPLE 10.9

Now consider the pushdown automaton from Example 10.1, defined by

P1 = <{a, b}, {A, B}, {q, r}, q, δ, B, ∅>

where the nonempty transitions were

δ(q, a, B) = {(q, A)}
δ(q, a, A) = {(q, AA)}
δ(q, b, A) = {(r, λ)}
δ(r, b, A) = {(r, λ)}

Since there are two stack symbols and two choices for each of the state superscripts, the nonterminal set for the grammar G_P1 is Ω = {Z, B^{qq}, B^{qr}, B^{rq}, B^{rr}, A^{qq}, A^{qr}, A^{rq}, A^{rr}}, although some of these will turn out to be useless. P_P1 contains the Z-rules Z → B^{qq} and Z → B^{qr} from the first criterion for productions. The transition δ(q, a, B) = {(q, A)} accounts for the productions B^{qr} → aA^{qr} and B^{qq} → aA^{qq}. δ(q, a, A) = {(q, AA)} gives rise to the A^{qq}-rules A^{qq} → aA^{qq}A^{qq} and A^{qq} → aA^{qr}A^{rq}, and the A^{qr}-rules A^{qr} → aA^{qq}A^{qr} and A^{qr} → aA^{qr}A^{rr}. δ(q, b, A) = {(r, λ)} accounts for another A^{qr}-rule, A^{qr} → b. Finally, the transition δ(r, b, A) = {(r, λ)} generates the only A^{rr}-rule, A^{rr} → b.
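The rule-by-rule calculation carried out in these two examples can be mechanized. In the sketch below (the tuple encoding of the superscripted nonterminals is our own convention), each nonterminal A^{st} becomes the triple (A, s, t), and the clauses of Definition 10.7 each contribute their productions; applied to P1, it yields exactly the rules listed above.

```python
from itertools import product

def pda_to_cfg(states, delta, start, bottom):
    """Definition 10.7 applied to a PDA accepting by empty stack.
    delta maps (state, letter, stack symbol) to a list of
    (next state, pushed string) pairs, with '' denoting lambda."""
    prods = set()
    for t in states:                               # Z -> B^{s0 t} for every t
        prods.add(('Z', ((bottom, start, t),)))
    for (s, a, A), moves in delta.items():
        for r, pushed in moves:
            if pushed == '':                       # (r, lambda) in delta(s, a, A)
                prods.add(((A, s, r), tuple(a)))   # gives A^{sr} -> a
            else:                                  # A^{sq} -> a A1^{r t1} ... Am^{t_{m-1} q}
                for ts in product(states, repeat=len(pushed)):
                    body, prev = [], r
                    for sym, t in zip(pushed, ts):
                        body.append((sym, prev, t))
                        prev = t
                    prods.add(((A, s, ts[-1]), tuple(a) + tuple(body)))
    return prods

# The nonempty transitions of P1 from Example 10.1.
DELTA1 = {('q', 'a', 'B'): [('q', 'A')],
          ('q', 'a', 'A'): [('q', 'AA')],
          ('q', 'b', 'A'): [('r', '')],
          ('r', 'b', 'A'): [('r', '')]}

RULES = pda_to_cfg({'q', 'r'}, DELTA1, 'q', 'B')
```

The ten productions produced here are the two Z-rules, the two B-rules, the four rules arising from δ(q, a, A) = {(q, AA)}, and the two terminal rules A^{qr} → b and A^{rr} → b.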
Note that some of the potential nonterminals (such as B^{rq} and B^{rr}) are never generated, and others (A^{qq}, B^{qq}) cannot produce terminal strings. The resulting grammar, with useless items deleted, is given by

G_P1 = <{Z, B^{qr}, A^{qr}, A^{rr}}, {a, b}, Z, {Z → B^{qr}, B^{qr} → aA^{qr}, A^{qr} → aA^{qr}A^{rr}, A^{qr} → b, A^{rr} → b}>

and G_P1 generates the language P1 recognizes: {aⁿbⁿ | n ≥ 1}. Notice that the move sequence

(q, aaabbb, B) ⊢ (q, aabbb, A) ⊢ (q, abbb, AA) ⊢ (q, bbb, AAA) ⊢ (r, bb, AA) ⊢ (r, b, A) ⊢ (r, λ, λ)

corresponds to the leftmost derivation

Z ⇒ B^{qr} ⇒ aA^{qr} ⇒ aaA^{qr}A^{rr} ⇒ aaaA^{qr}A^{rr}A^{rr} ⇒ aaabA^{rr}A^{rr} ⇒ aaabbA^{rr} ⇒ aaabbb

Note the relationship between the sequence of stack configurations and the nonterminals in the corresponding sentential form. For example, when aaa has been processed by P1, AAA is on the stack, and when the leftmost derivation has produced aaa, the remaining nonterminals are also three A-based symbols (A^{qr}A^{rr}A^{rr}). A^{qr} denotes a nonterminal (which corresponds to the stack symbol A) that will eventually produce a terminal string as the stack shrinks below the current size during a sequence of transitions that lead from state q to state r. This first happens in the step where aaaA^{qr}A^{rr}A^{rr} ⇒ aaabA^{rr}A^{rr}. A^{rr}, by contrast, denotes a nonterminal (again corresponding to the stack symbol A) that will produce a terminal string as the stack shrinks in size during transitions from state r back to state r. In this example, this occurs in the last two steps. The initial stack-symbol position held by B is finally vacated during a sequence of transitions from q to r, and hence B^{qr} appears in the leftmost derivation. On the other hand, it was not possible to vacate B's position during a sequence of moves from q to q, so B^{qq} consequently does not participate in significant derivations. The strong correspondence between profitable move sequences in P and valid leftmost derivations in G_P forms the cornerstone of the following proof.

∇ Theorem 10.3.
Given any alphabet Σ, ℰΣ ⊆ 𝒞Σ. In particular, for any pushdown automaton P, there is a context-free grammar G_P for which L(G_P) = A(P).

Proof. Let P = <Σ, Γ, S, s₀, δ, B, ∅> be a pushdown automaton, and let G_P be the grammar given in Definition 10.7. The key to the proof is to show that all words accepted by empty stack in the PDA P can be generated by G_P and that only such words can be generated by G_P. That is, we wish to show that the automaton halts in some state t with an empty stack after processing the terminal string x exactly when there is a leftmost derivation of the form Z ⇒ B^{s₀t} ⇒* x. That is,

(∀x ∈ Σ*)(Z ⇒ B^{s₀t} ⇒* x ⟺ (s₀, x, B) ⊢* (t, λ, λ))

The desired conclusion, that L(G_P) = A(P), will follow immediately from this equivalence. The equivalence does not easily lend itself to proof by induction on the length of x; indeed, to progress from the mth to the (m+1)st step, a more general statement involving more of the nonterminals of G_P is needed. The following statement can be proved by induction on the number of moves and leads to the desired conclusion when s = s₀ and A = B:

(∀x ∈ Σ*)(∀A ∈ Γ)(∀s ∈ S)(∀t ∈ S)(A^{st} ⇒* x ⟺ (s, x, A) ⊢* (t, λ, λ))

The resulting grammar will then generate A(P), but G_P may not be a strict context-free grammar; λ-moves may result in some productions of the form A^{sr} → λ, which will then have to be "removed," as specified by Exercise 9.16. Δ

Thus, ℰΣ = 𝒞Σ. Furthermore, only one state in a PDA is truly necessary, as noted in the following corollary. In essence, this means that for PDAs that accept by empty stack, any state information can be effectively encoded with the information on the stack.

∇ Corollary 10.1. For every PDA P that accepts via empty stack, there is an equivalent one-state PDA P' that also accepts via empty stack.

Proof. Let P be a PDA that accepts via empty stack.
Let P' = P_{G_P}. That is, from the original PDA P, find the corresponding context-free grammar G_P. By Theorem 10.3, this grammar is equivalent to P. However, by Theorem 10.2, the grammar G_P has an equivalent one-state PDA, which must also be equivalent to P. Δ

Unlike the pushdown automata discussed in this section, PDAs that accept via final state cannot always make do with a single state. As the exercises will make clear, at least one final and one nonfinal state are necessary. Unlike DFAs, PDAs with only one state can accept some nontrivial languages, since selected words can be rejected because there is no appropriate move sequence. However, a single final state and a single nonfinal state are sufficient, as shown in the following section.

10.3 EQUIVALENCE OF ACCEPTANCE BY FINAL STATE AND EMPTY STACK

In this section, we explore the ramifications of accepting words according to the criterion that a final state can be reached after processing all the letters on the input tape, rather than the criterion that the stack is emptied. Theorem 10.4 will show that any language that can be accepted via empty stack can also be accepted via final state. In terms of Definition 10.5, this means that ℰΣ ⊆ ℱΣ. Since ℰΣ = 𝒞Σ, this means that every context-free language can be accepted by a PDA via final state. Theorem 10.5 ensures that no "new" languages can be produced by pushdown automata that accept via final state; ℱΣ ⊆ ℰΣ, and so ℱΣ = ℰΣ = 𝒞Σ. As in the last section, the key to the correspondence is the definition of an appropriate translation from one finite representation to another. We first consider a scheme for modifying a PDA so that the new device can transfer to a final state whenever the old device was capable of emptying its stack. To do this, we need to place a "buffer" symbol at the bottom of the stack, which will appear when the original automaton would have emptied its stack.
The new machine operates in almost the same fashion as the original automaton; the differences amount to an additional transition at the start of operation to install the new buffer symbol and an extra move at the end of operation to transfer to the (new) final state.

∇ Theorem 10.4. Every pushdown automaton P that accepts via empty stack has an equivalent two-state pushdown automaton P_f that accepts via final state.

Proof. Corollary 10.1 guaranteed that every pushdown automaton that accepts via empty stack has an equivalent one-state pushdown automaton that also accepts via empty stack. Without loss of generality, we may therefore assume that P = <Σ, Γ, {s}, s, δ, B, ∅>. Define P_f by choosing a new state f and two new stack symbols Y and Z such that Y, Z ∉ Γ, and let P_f = <Σ, Γ ∪ {Y, Z}, {s, f}, s, δ_f, Z, {f}>, where δ_f is defined by:

1. δ_f(s, λ, Z) = {(s, BY)}
2. (∀a ∈ Σ)(∀A ∈ Γ)(δ_f(s, a, A) = δ(s, a, A))
3. (∀A ∈ Γ)(δ_f(s, λ, A) = δ(s, λ, A))
4. δ_f(s, λ, Y) = {(f, Y)}
5. (∀a ∈ Σ)(δ_f(s, a, Z) = { } ∧ δ_f(s, a, Y) = { })
6. (∀a ∈ Σ ∪ {λ})(∀A ∈ Γ ∪ {Y, Z})(δ_f(f, a, A) = { })

Notice that rules 2 and 3 imply that, while the original stack symbols appear on the stack, the machine moves exactly as the original PDA. Rules 5 and 6 indicate that no letters can be consumed while there is a Y or Z on top of the stack, and no moves are possible once the final state f is reached. Since the bottom-of-the-stack symbol is now the new letter Z, rule 1 is the only rule that initially applies; its application results in a configuration very much like that of the old PDA, with the symbol Y underneath the old bottom-of-the-stack symbol B. P_f now simulates P until the Y is uncovered (that is, until a point is reached at which the old PDA would have emptied its stack). In such cases (and only in such cases), rule 4 applies, control can be transferred to the final state f, and P_f must then halt.
By inducting on the number of moves in a sequence, it can be shown for any α, β ∈ Γ* that

(∀x, y ∈ Σ*)((s, xy, α) ⊢* (s, y, β) in P ⇔ (s, xy, αY) ⊢* (s, y, βY) in P_f)

From this, with y = β = λ and α = B, it follows that

(∀x ∈ Σ*)((s, x, B) ⊢* (s, λ, λ) in P ⇔ (s, x, BY) ⊢* (s, λ, Y) in P_f)

Consequently, since δ_f(s, λ, Z) = {(s, BY)} and δ_f(s, λ, Y) = {(f, Y)},

(∀x ∈ Σ*)((s, x, B) ⊢* (s, λ, λ) in P ⇔ (s, x, Z) ⊢* (f, λ, Y) in P_f)

which implies that (∀x ∈ Σ*)(x ∈ A(P) ⇔ x ∈ L(P_f)), as was to be proved. Δ

Thus, every language that is A(P) for some PDA can be recognized by a PDA that accepts via final state, and this PDA need only employ one final and one nonfinal state; the empty-stack class is therefore contained in the final-state class. One might conjecture that the final-state class is actually larger, since some added capability might arise if more than two states are used in a pushdown automaton that accepts via final state. This is not the case, as demonstrated by the following theorem. Once again, the information stored in the finite control can effectively be transferred to the stack; only one final and one nonfinal state are needed to accept any context-free language via final state, and context-free languages are the only type accepted via final state.

∇ Theorem 10.5. Every pushdown automaton P that accepts via final state has an equivalent pushdown automaton P_λ that accepts via empty stack.

Proof. Assume that P = <Σ, Γ, S, s₀, δ, B, F>. Define P_λ by choosing new stack symbols Y and Z such that Y, Z ∉ Γ and a new state e such that e ∉ S, and let P_λ = <Σ, Γ ∪ {Y, Z}, S ∪ {e}, s₀, δ_λ, Z, ∅>, where δ_λ is defined by:

1. δ_λ(s₀, λ, Z) = {(s₀, BY)}
2. (∀a ∈ Σ)(∀A ∈ Γ)(∀s ∈ S)(δ_λ(s, a, A) = δ(s, a, A))
3. (∀A ∈ Γ)(∀s ∈ S − F)(δ_λ(s, λ, A) = δ(s, λ, A))
4. (∀A ∈ Γ)(∀f ∈ F)(δ_λ(f, λ, A) = δ(f, λ, A) ∪ {(e, λ)})
5. (∀A ∈ Γ)(δ_λ(e, λ, A) = {(e, λ)})
6. δ_λ(e, λ, Y) = {(e, λ)}

The first rule guards against P_λ
inappropriately accepting if P simply empties its stack (the stack is padded with the new buffer symbol Y). The intent of rules 2 through 4 is to arrange for P_λ to simulate the moves of P and to allow P_λ to enter the state e whenever a final state can be reached. The state e does not allow any further input symbols to be processed, but does allow the stack contents (including the new buffer symbol) to be emptied via rules 5 and 6. Thus, P_λ has a sequence of moves for input x that empties the stack exactly when P has a sequence of moves that leads to a final state. By inducting on the number of moves in a sequence, it can be shown for any α, β ∈ Γ* that

(∀x, y ∈ Σ*)(∀s, t ∈ S)((s, xy, α) ⊢* (t, y, β) in P ⇔ (s, xy, αY) ⊢* (t, y, βY) in P_λ)

From this, with y = λ, α = B, and t ∈ F, it follows that

(∀x ∈ Σ*)(∀t ∈ F)((s₀, x, B) ⊢* (t, λ, β) in P ⇔ (s₀, x, BY) ⊢* (t, λ, βY) in P_λ)

Consequently, since δ_λ(s₀, λ, Z) = {(s₀, BY)} and δ_λ(t, λ, A) contains (e, λ), repeated application of rules 5 and 6 implies

(∀x ∈ Σ*)(∀t ∈ F)((s₀, x, B) ⊢* (t, λ, β) in P ⇔ (s₀, x, Z) ⊢* (e, λ, λ) in P_λ)

This shows that (∀x ∈ Σ*)(x ∈ A(P_λ) ⇔ x ∈ L(P)). Δ

Thus, the final-state class is contained in the empty-stack class, and so acceptance by final state yields exactly the same class of languages as acceptance by empty stack. This class of languages, described by these cognitive constructs, has been encountered before and can be defined by the generative constructs that comprise the context-free grammars. Note that since the type 3 languages are contained in the type 2 languages, the portion of Theorem 10.1 dealing with pushdown automata follows immediately from the results in this and the previous section.
10.4 CLOSURE PROPERTIES AND DETERMINISTIC PUSHDOWN AUTOMATA

Since the collection of languages recognized by pushdown automata is exactly the collection of context-free languages, the results in Chapter 9 show that 𝒫_Σ is closed under substitution, homomorphism, union, concatenation, and Kleene closure. Results for context-free languages likewise imply that 𝒫_Σ is not closed under complement or intersection. It is hard to imagine a method that would combine two context-free grammars to produce a new context-free grammar generating the intersection of the original languages. The constructs for regular expressions and regular grammars likewise did not lend themselves to such methods, and yet it was possible to find appropriate constructs that did represent intersections. As presented in Chapter 5, this was possible by turning to the cognitive representation for that class of languages, the deterministic finite automata. It is instructive to recall the technique that allowed two DFAs A₁ and A₂ to be combined to form a new DFA A∩ that accepts the intersection of the languages accepted by the original devices, and to see why this same method cannot be adapted to pushdown automata. The automaton A∩ used a cross product of the states of A₁ and A₂ to simultaneously keep track of the progress of both DFAs through an appropriate revamping of the state transition function. A∩ accepted only strings that would have reached final states in both A₁ and A₂. Two pushdown automata P₁ and P₂ might be combined into a new PDA P∩ using the cross-product approach, but the transition function for this composite PDA cannot be reliably defined. A problem arises since the δ function depends on the top stack symbol, and it is impossible to keep track of both of the original stacks through any type of stack encoding, since the stack size of P₁ might be increasing while the stack size of P₂ is decreasing.
A device like the one depicted in Figure 10.6 would be capable of recognizing the intersection of two context-free languages, but such a machine is inherently more powerful than PDAs. The language {aⁿbⁿcⁿ | n ≥ 0} is not context free, yet a two-tape automaton could recognize this set of words by storing the initial a's on the first stack tape, matching them against the incoming b's while storing those b's on the second tape, and then matching the c's against the b's on the second tape (see the exercises). If one were to attempt to intersect a context-free language with a regular language, one would expect the result to be context free, since the corresponding cross-product construct would need only one tape. This is indeed the case, as shown by the following theorem.

∇ Theorem 10.6. 𝒞_Σ is closed under intersection with a regular set. That is, if L₁ is context free and R₂ is regular, L₁ ∩ R₂ is always context free.

[Figure 10.6 A model of a "pushdown automaton" with two tapes]

Proof. Let L₁ be a context-free language and let R₂ be a regular set. Since 𝒞_Σ = 𝒫_Σ, there must be a PDA P₁ = <Σ, Γ₁, S₁, s₀₁, δ₁, B₁, F₁> for which L₁ = L(P₁). Let A₂ = <Σ, S₂, s₀₂, δ₂, F₂> be a DFA for which R₂ = L(A₂). Define P∩ = <Σ, Γ₁, S₁ × S₂, (s₀₁, s₀₂), δ∩, B₁, F₁ × F₂>, where δ∩ is defined by:

1. (∀s₁ ∈ S₁)(∀s₂ ∈ S₂)(∀a ∈ Σ)(∀A ∈ Γ₁)
   (δ∩((s₁, s₂), a, A) = {((t₁, t₂), β) | (t₁, β) ∈ δ₁(s₁, a, A) ∧ t₂ = δ₂(s₂, a)})
2. (∀s₁ ∈ S₁)(∀s₂ ∈ S₂)(∀A ∈ Γ₁)
   (δ∩((s₁, s₂), λ, A) = {((t₁, t₂), β) | (t₁, β) ∈ δ₁(s₁, λ, A) ∧ t₂ = s₂})

As with the constructions in the previous sections, induction on the number of moves exhibits the desired correspondence between the behaviors of the machines.
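The definition of δ∩ can itself be sketched mechanically. The following is a hedged illustration only: the dictionary encodings of the PDA and DFA transition functions are assumptions of this sketch, and a separate PDA simulator (not shown) would be needed to actually run the product machine.

```python
# Hedged sketch of the Theorem 10.6 cross-product construction: pair every
# PDA move on a letter with the unique DFA move on that letter (rule 1);
# lambda-moves leave the DFA component unchanged (rule 2).  Encodings are
# illustrative assumptions: the PDA delta is keyed by (state, letter-or-"",
# stack top); the DFA delta is keyed by (state, letter).

LAMBDA = ""

def intersect_pda_dfa(pda_delta, dfa_delta):
    """Build the transition function of the product machine."""
    dfa_states = {s for (s, _) in dfa_delta} | set(dfa_delta.values())
    product = {}
    for (s1, a, A), moves in pda_delta.items():
        for s2 in dfa_states:
            if a == LAMBDA:
                # rule 2: the DFA component stays put on a lambda-move
                product[((s1, s2), LAMBDA, A)] = {
                    ((t1, s2), push) for t1, push in moves}
            elif (s2, a) in dfa_delta:
                # rule 1: both components advance on the letter a
                t2 = dfa_delta[(s2, a)]
                product[((s1, s2), a, A)] = {
                    ((t1, t2), push) for t1, push in moves}
    return product
```

The final states of the product would be F₁ × F₂ and the start state (s₀₁, s₀₂), exactly as in the theorem.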
In particular, it can be shown for any α, β ∈ Γ₁* that

(∀x, y ∈ Σ*)(∀s₁, t₁ ∈ S₁)(∀s₂, t₂ ∈ S₂)(((s₁, s₂), xy, α) ⊢* ((t₁, t₂), y, β) in P∩ ⇔ (((s₁, xy, α) ⊢* (t₁, y, β) in P₁) ∧ (t₂ = δ₂(s₂, x))))

From this, with α = B₁ and the observation that (t₁, t₂) ∈ F₁ × F₂ iff t₁ ∈ F₁ ∧ t₂ ∈ F₂, it follows that (∀x ∈ Σ*)(x ∈ L(P∩) ⇔ (x ∈ L(P₁) ∧ x ∈ L(A₂))). Therefore, L(P∩) = L(P₁) ∩ L(A₂) = L₁ ∩ R₂. Since P∩ is a PDA accepting L₁ ∩ R₂, L₁ ∩ R₂ must be context free. Δ

Closure properties such as this are quite useful in showing that certain languages are not context free. Consider the set L = {x ∈ {a, b, c}* | |x|_a = |x|_b = |x|_c}. Since the letters in a word can occur in any order, a pumping theorem proof is less straightforward than for the set {aⁿbⁿcⁿ | n ≥ 0}. However, if L were context free, then L ∩ a*b*c* would also be context free. But L ∩ a*b*c* = {aⁿbⁿcⁿ | n ≥ 0}, and thus L cannot be context free. The exercises suggest other occasions for which closure properties are useful in showing certain languages are not context free.

For the machines discussed in the first portion of this text, it was seen that nondeterminism did not add to the computing power of DFAs. This is not the case for pushdown automata. There are languages that can be accepted by nondeterministic pushdown automata that cannot be accepted by any deterministic pushdown automaton. The following is the broadest definition of what can constitute a deterministic pushdown automaton.

∇ Definition 10.8. A deterministic pushdown automaton (DPDA) is a pushdown automaton P = <Σ, Γ, S, s₀, δ, B, F> with the following restrictions on the state transition function δ:

1. (∀a ∈ Σ)(∀A ∈ Γ)(∀s ∈ S)(δ(s, a, A) is empty or contains just one element).
2. (∀A ∈ Γ)(∀s ∈ S)(δ(s, λ, A) is empty or contains just one element).
3. (∀A ∈ Γ)(∀s ∈ S)(δ(s, λ, A) ≠ ∅ ⇒ (∀a ∈ Σ)(δ(s, a, A) = ∅)).
Δ

Rule 1 states that, for a given input letter and top-of-stack symbol, a deterministic pushdown automaton cannot have two different choices of destination state or two different choices of strings to place on the stack. Rule 2 ensures that there is no choice among λ-moves either. Furthermore, rule 3 guarantees that there will never be a choice between a λ-move and a transition that consumes a letter; a state that has a λ-move for a given top-of-stack symbol can have only that one move for that symbol. Thus, for any string, there is never more than one path through the machine. Unlike deterministic finite automata, deterministic pushdown automata may not always completely process the strings in Σ*; a given string may reach a state that has no further valid moves, or a string may prematurely empty the stack. In each case, the DPDA would halt without processing any further input.

EXAMPLE 10.10

The automaton P₁ in Example 10.1 was deterministic. The PDAs in Examples 10.4 and 10.5 were not deterministic. The automaton P_G derived in Example 10.7 was not deterministic because there were three possible choices of moves listed for δ_G(s, (, R): {(s, R•R)), (s, R∪R)), (s, R)*)}. These choices corresponded to the three different operators that might have generated the open parenthesis.

Pushdown automata provide an appropriate mechanism for parsing sentences in programming languages. The regular expression grammar in Example 10.7 is quite similar to the arithmetic expression grammar that describes expressions in many programming languages. Indeed, the transitions taken within the corresponding PDA give an indication of which productions in the underlying grammar were used; such information is of obvious use in compiler construction. A nondeterministic pushdown automaton is at best a very inefficient tool for parsing; a DPDA is much better suited to the task.
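The three restrictions of Definition 10.8 can be tested mechanically. A minimal sketch, assuming (as in the earlier sketches, and not taken from the text) that the transition function is stored as a dictionary from (state, letter-or-λ, stack top) to a set of moves:

```python
# Hedged sketch of Definition 10.8's determinism conditions for a PDA
# transition function delta encoded as a dict mapping
# (state, letter-or-"", stack_top) -> set of (next_state, push_string).

LAMBDA = ""

def is_deterministic(delta):
    """Check the three DPDA restrictions on delta."""
    for moves in delta.values():
        if len(moves) > 1:          # rules 1 and 2: at most one move per key
            return False
    for (s, a, A), moves in delta.items():
        # rule 3: a lambda-move on (s, A) forbids letter-moves on (s, A)
        if a == LAMBDA and moves:
            for (s2, b, A2), other in delta.items():
                if s2 == s and A2 == A and b != LAMBDA and other:
                    return False
    return True
```

Applied to the choices in Example 10.10, a move set such as {(s, R•R)), (s, R∪R)), (s, R)*)} immediately violates rule 1.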
As mentioned in the proof of Theorem 10.2, each leftmost derivation in G has a corresponding sequence of moves in P_G. If G is ambiguous, then there is at least one word with two distinct leftmost derivations, and hence if that word appeared on the input tape of P_G, there would be two distinct move sequences leading to acceptance. In this case, P_G cannot possibly be deterministic. On the other hand, if P_G is nondeterministic, this does not mean that G is ambiguous, as demonstrated by Example 10.7. In parsing a string in that automaton, it may not be immediately obvious which production to use (and hence which transition to take), but for any string there is at most one correct choice; each word has a unique parse tree and a unique leftmost derivation. The grammar in Example 10.7 is not ambiguous, even though the corresponding PDA was nondeterministic.

EXAMPLE 10.11

The following Greibach normal form grammar is similar to the one used to construct the PDA in Example 10.7, but with the different operators paired with unique delimiters. Let

G = <{R}, {a, b, c, (, ), {, }, [, ], ε, ∅, •, ∪, *}, R, {R → a | b | c | ε | ∅ | (R•R) | [R∪R] | {R}*}>

The automaton P_G is then <{a, b, c, (, ), {, }, [, ], ε, ∅, •, ∪, *}, {R, a, b, c, (, ), {, }, [, ], ε, ∅, •, ∪, *}, {s}, s, δ_G, R, ∅>, where δ_G is comprised of the following nonempty transitions:

δ_G(s, (, R) = {(s, R•R))}
δ_G(s, [, R) = {(s, R∪R])}
δ_G(s, {, R) = {(s, R}*)}
δ_G(s, a, R) = {(s, λ)}
δ_G(s, b, R) = {(s, λ)}
δ_G(s, c, R) = {(s, λ)}
δ_G(s, ε, R) = {(s, λ)}
δ_G(s, ∅, R) = {(s, λ)}
δ_G(s, a, a) = {(s, λ)}
δ_G(s, b, b) = {(s, λ)}
δ_G(s, c, c) = {(s, λ)}
δ_G(s, ∅, ∅) = {(s, λ)}
δ_G(s, ε, ε) = {(s, λ)}
δ_G(s, ∪, ∪) = {(s, λ)}
δ_G(s, •, •) = {(s, λ)}
δ_G(s, *, *) = {(s, λ)}
δ_G(s, ), )) = {(s, λ)}
δ_G(s, ], ]) = {(s, λ)}
δ_G(s, }, }) = {(s, λ)}
δ_G(s, (, () = {(s, λ)}
δ_G(s, [, [) = {(s, λ)}
δ_G(s, {, {) = {(s, λ)}

All other transitions are empty; that is, they are of the form δ_G(s, d, A) = { }.
The resulting PDA is clearly deterministic, since there are no λ-moves and the other transitions are all singleton sets or are empty. It is instructive to step through the transitions in P_G for a string such as [{(a•b)}*∪c]. Upon encountering a delimiter while scanning a prospective string, the parser immediately knows which operation gave rise to that delimiter, and need not "guess" at which of the three productions might have been applied. Note that G was an LL(0) grammar (as defined in Section 9.2), and the properties of G resulted in P_G being a deterministic device. An efficient parser for this language follows immediately from the specification of the grammar, whereas the grammar in Example 10.7 gave rise to a nondeterministic device. Programmers would not be inclined to tolerate remembering which delimiters should be used in conjunction with the various operators, and hence programming language designers take a slightly different approach to the problem.

The nondeterminism in Example 10.7 may only be an effect of the particular grammar chosen and not inherent in the language itself. Note that the language {aⁿbⁿ | n ≥ 1} had a grammar that produced a nondeterministic PDA (Example 10.4), but it also had a grammar that corresponded to a DPDA (Example 10.1). In compiler construction, designers lean toward syntax that is compatible with determinism, and they seek grammars for the language that reflect that determinism.

EXAMPLE 10.12

Consider again the language discussed in Example 10.7, which can also be expressed by the following grammar

H = <{S, T}, {a, b, c, (, ), ε, ∅, ∪, •, *}, S, {S → (ST | a | b | c | ε | ∅, T → •S) | ∪S) | )*}>

The automaton P_H is then <{a, b, c, (, ), ε, ∅, ∪, •, *}, {S, T, a, b, c, (, ), ε, ∅, ∪, •, *}, {t}, t, δ_H, S, ∅>,
where each production of H gives rise to the following transitions in δ_H:

δ_H(t, (, S) = {(t, ST)}
δ_H(t, a, S) = {(t, λ)}
δ_H(t, b, S) = {(t, λ)}
δ_H(t, c, S) = {(t, λ)}
δ_H(t, ε, S) = {(t, λ)}
δ_H(t, ∅, S) = {(t, λ)}
δ_H(t, •, T) = {(t, S))}
δ_H(t, ∪, T) = {(t, S))}
δ_H(t, ), T) = {(t, *)}

While the formal definition of δ_H specifies several transitions of the form δ_H(t, d, d) = {(t, λ)}, by observing what can be put on the stack by the above productions, it is clear that the only remaining useful transitions in δ_H are

δ_H(t, *, *) = {(t, λ)}   and   δ_H(t, ), )) = {(t, λ)}

Thus, even though the PDA P_G in Example 10.7 turned out to be nondeterministic, this was not a flaw in the language itself, since P_H is an equivalent DPDA. Notice that the grammar G certainly appears to be more straightforward than H: G had fewer nonterminals and fewer productions, and it is a bit harder to understand the relationships between the nonterminals of H. Nevertheless, the LL(0) grammar H led to an efficient parser and G did not. To take advantage of the resulting reduction in complexity, all major programming languages are designed to be recognized by DPDAs. These constructs naturally lead to a mechanical framework for syntactic analysis. In Example 10.12, the application of the production T → ∪S) [that is, the use of the transition δ_H(t, ∪, T) = {(t, S))}] signifies that the previous expression and the expression to which S will expand are to be combined with the union operator. It should be easy to see that a similar grammar and DPDA for arithmetic expressions (using +, −, *, and / rather than ∪, •, and *) would provide a guide for converting such expressions into their equivalent machine code.

Deterministic pushdown automata have some surprising properties. Recall that 𝒞_Σ was not closed under complementation, and since 𝒫_Σ = 𝒞_Σ, there must be some PDAs that define languages whose complements cannot be recognized by any PDA.
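The deterministic parse performed by a machine like P_H in Example 10.12 amounts to a loop that repeatedly pops the stack and consults δ_H. In the sketch below, the ASCII characters '.', 'U', 'e', and '0' stand in for the book's •, ∪, ε, and ∅, and the transition table maps (input letter, stack top) directly to the string to push; both conventions are assumptions of this illustration, not the book's notation.

```python
# Hedged sketch of a deterministic, single-pass parse in the style of the
# DPDA P_H from Example 10.12.  DELTA_H maps (input letter, stack top) to
# the string to push (leftmost symbol ends up on top); an absent key means
# the parse is stuck and must reject.  ASCII stand-ins: '.' for the dot
# operator, 'U' for union, 'e' for epsilon, '0' for the empty-set symbol.
DELTA_H = {
    ('(', 'S'): 'ST',
    ('a', 'S'): '', ('b', 'S'): '', ('c', 'S'): '',
    ('e', 'S'): '', ('0', 'S'): '',
    ('.', 'T'): 'S)',   # production T -> .S)
    ('U', 'T'): 'S)',   # production T -> US)
    (')', 'T'): '*',    # production T -> )*
    ('*', '*'): '',     # the two "useful" matching transitions
    (')', ')'): '',
}

def parse(word):
    """Accept iff the whole word is consumed and the stack empties,
    mirroring acceptance via empty stack."""
    stack = ['S']                       # top of stack is the end of the list
    for ch in word:
        if not stack:
            return False                # input left over after stack emptied
        top = stack.pop()
        push = DELTA_H.get((ch, top))
        if push is None:
            return False                # no move available: reject
        stack.extend(reversed(push))    # leftmost pushed symbol becomes top
    return not stack
```

Each input letter triggers exactly one move, so the parse is linear in the length of the word, which is precisely the efficiency advantage of a DPDA over its nondeterministic counterpart.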
However, it can be shown that any language accepted by a DPDA must have a complement that can also be recognized by a DPDA. The construction used to prove this statement, in which final and nonfinal states are interchanged in a DPDA that accepts via final state, is similar to the approach used in Theorem 5.1 for deterministic finite automata. It is useful to recall why it was crucial in the proof of Theorem 5.1 to begin with a DFA when interchanging states, rather than using an NDFA. Strings that have multiple paths in an NDFA leading to both final and nonfinal states would be accepted in the original automaton and also in the machine with the states interchanged. Furthermore, some strings may have no complete paths through the NDFA and be rejected in both the original and new automata. The problem of multiple paths does not arise with DPDAs, since by definition no choice of moves is allowed. However, strings that do not get completely consumed would be rejected in both the original DPDA and the DPDA with final and nonfinal states interchanged. Thus, the proof of closure under complement for DPDAs is not as straightforward as for DFAs. There are three ways an input string might not be completely consumed: the stack might empty prematurely, there may be no transition available at some point, or there might only be a cycle of λ-moves available that consumes no further input. The exercises indicate that it is possible to avoid these problems by padding the stack with a new bottom-of-the-stack symbol and adding a "garbage state" to which strings that are hopelessly stuck would transfer.

∇ Theorem 10.7. If L is a language recognized by a deterministic pushdown automaton, then ~L can also be recognized by a DPDA.

Proof. See the exercises. Δ

∇ Definition 10.9. Given any alphabet Σ, let 𝒟_Σ represent the collection of all languages recognized by deterministic pushdown automata.
If L ∈ 𝒟_Σ, then L is said to be a deterministic context-free language (DCFL). Δ

Theorem 10.7 shows that, unlike 𝒫_Σ, 𝒟_Σ is closed under complementation. This divergent behavior has some immediate consequences, as stated below.

∇ Theorem 10.8. Let Σ be an alphabet. If |Σ| = 1, then ℱ_Σ = 𝒟_Σ = 𝒫_Σ. If |Σ| > 1, then ℱ_Σ is properly contained in 𝒟_Σ, which is properly contained in 𝒫_Σ.

Proof. For every alphabet Σ, examining the proof of Theorem 10.1 shows that every finite automaton has an equivalent deterministic pushdown automaton, and thus it is always true that ℱ_Σ ⊆ 𝒟_Σ. Definition 10.8 implies that 𝒟_Σ ⊆ 𝒫_Σ. If |Σ| = 1, then Theorem 9.15 showed that ℱ_Σ = 𝒞_Σ (= 𝒫_Σ), from which it follows that ℱ_Σ = 𝒟_Σ = 𝒫_Σ. If |Σ| > 1, an example such as {aⁿbⁿ | n ≥ 1} shows that ℱ_Σ is properly contained in 𝒟_Σ (see the exercises). Since 𝒫_{a,b} and 𝒟_{a,b} have different closure properties, they cannot represent the same collection, and 𝒟_Σ ⊆ 𝒫_Σ implies that the containment must be proper. Δ

In the proof of Theorem 10.6, it is easy to see that if P₁ is deterministic, then P∩ will be a DPDA also. Hence 𝒟_Σ, like 𝒫_Σ, is closed under intersection with a regular set. Also, the exercises show that both 𝒟_Σ and 𝒫_Σ are closed under difference with a regular set. However, the closure properties of 𝒟_Σ and 𝒫_Σ disagree in just about every other case. The languages L₁ = {aⁿbᵐ | (n ≥ 1) ∧ (n = m)} and L₂ = {aⁿbᵐ | (n ≥ 1) ∧ (n = 2m)} are both DCFLs, and yet L₁ ∪ L₂ = {aⁿbᵐ | (n ≥ 1) ∧ (n = m ∨ n = 2m)} is not a DCFL (see the exercises). Thus, unlike 𝒫_Σ, 𝒟_Σ is not closed under union if Σ is comprised of at least two symbols (recall that since ℱ_{a} = 𝒟_{a} = 𝒫_{a}, 𝒟_{a} would be closed under union). If 𝒟_Σ were closed under intersection, then, since it is closed under complement, De Morgan's law would imply that 𝒟_Σ is closed under union.
Hence, 𝒟_Σ cannot be closed under intersection. The language {caⁿbᵐ | (n ≥ 1) ∧ (n = m)} ∪ {aⁿbᵐ | (n ≥ 1) ∧ (n = 2m)} is definitely a DCFL, and yet a simple homomorphism (one that erases the marker c) can transform it into {aⁿbᵐ | (n ≥ 1) ∧ ((n = m) ∨ (n = 2m))} (see the exercises). Thus, 𝒟_Σ is not closed under homomorphism. Since homomorphisms are special cases of substitutions, 𝒟_Σ is not closed under substitution either. 𝒟_Σ is also the only collection of languages discussed in this text that is not closed under reversal; {caⁿbᵐ | (n ≥ 1) ∧ (n = m)} ∪ {aⁿbᵐ | (n ≥ 1) ∧ (n = 2m)} is a DCFL, but its reversal {bᵐaⁿc | (n ≥ 1) ∧ (n = m)} ∪ {bᵐaⁿ | (n ≥ 1) ∧ (n = 2m)} is not. These properties are summed up in the following statements.

∇ Theorem 10.9. Given any alphabet Σ, 𝒟_Σ is closed under complement. 𝒟_Σ is also closed under union, intersection, and difference with a regular set. That is, if L₁ is a DCFL and R₂ is a FAD language, then the following are deterministic context-free languages:

~L₁    L₁ ∩ R₂    L₁ ∪ R₂    L₁ − R₂    R₂ − L₁

Proof. The proof follows from the above discussion and theorems and the exercises. Δ

∇ Lemma 10.1. Let Σ be an alphabet comprised of at least two symbols. Then 𝒟_Σ is not closed under union, intersection, concatenation, Kleene closure, homomorphism, substitution, or reversal. That is, there are examples of deterministic context-free languages L₁ and L₂, a homomorphism h, and a substitution s for which the following are not DCFLs:

L₁ ∪ L₂    L₁ ∩ L₂    L₁L₂    L₁*    h(L₁)    s(L₁)    L₁ʳ

Proof. The proof follows from the above discussion and theorems and the exercises. Δ

EXAMPLE 10.13

These closure properties can often be used to justify that certain languages are not DCFLs. For example, the language

L = {x ∈ {a, b, c}* | |x|_a = |x|_b} ∪ {x ∈ {a, b, c}* | |x|_b = |x|_c}

can be recognized by a PDA but not by a DPDA. If L were a DCFL, then ~L = {x ∈ {a, b, c}* | |x|_a ≠ |x|_b} ∩ {x ∈ {a, b, c}* | |x|_b ≠ |x|_c} would also be a DCFL.
However, ~L ∩ a*b*c* = {aᵏbⁿcᵐ | (k ≠ n) ∧ (n ≠ m)}, which would then also be a DCFL. Ogden's lemma shows that this language is not even a CFL (see the exercises), and hence the original hypothesis that L was a DCFL must be false. The interested reader is referred to similar discussions in [HOPC] and [DENN].

The restriction that the head scanning the stack tape can only access the symbol at the top of the stack imposes limitations on the cognitive power of this class of automata. While the current contents of the top of the stack can be stored in the finite-state control and remembered after the stack is popped, only a finite number of such pops can be recorded within the states of the PDA. At some point, seeking information farther down on the stack will cause an irretrievable loss of information. One might suspect that if popped items were not erased (so that they could be revisited and reviewed at some later point), a wider class of languages might be recognizable. Generalized automata that allow such nondestructive "backtracking" are called Turing machines and form a significantly more powerful class of automata. These devices and their derivatives are the subject of the next chapter.

EXERCISES

10.1. Refer to Theorem 10.1 and use induction to show

(∀x ∈ Σ*)(δ(s₀, x) = t ⇔ (s₀, x, ¢) ⊢* (t, λ, ¢))

10.2. Define a deterministic pushdown automaton P₁ with only one state for which A(P₁) = {aⁿbⁿ | n ≥ 1}.

10.3. Consider the pushdown automaton defined by P₂ = <{a, b}, {S, C}, {t}, t, δ, S, {t}>, where δ is defined by

δ(t, a, S) = {(t, SC), (t, C)}
δ(t, a, C) = { }
δ(t, b, S) = { }
δ(t, b, C) = {(t, λ)}

(a) Give an inductive proof that (∀i ∈ ℕ)((t, aⁱ, S) ⊢* (t, λ, α) ⇒ (α = SCⁱ ∨ α = Cⁱ))
(b) Give an inductive proof that (∀i ∈ ℕ)((t, x, Cⁱ) ⊢* (t, λ, λ) ⇒ (x = bⁱ))
(c) Find L(P₂); use parts (a) and (b) to rigorously justify your statements.

10.4. Let L = {aⁱbʲcᵏ | i, j, k ∈ ℕ and i + j = k}.
(a) Find a pushdown automaton (which accepts via final state) that recognizes L.
(b) Find a pushdown automaton (which accepts via empty stack) that recognizes L.
(c) Is there a counting automaton that accepts L?
(d) Is there a DPDA that accepts L?
(e) Use Definition 10.7 to find a grammar equivalent to the PDA in part (a).

10.5. Let L = {x ∈ {a, b, c}* | |x|_a + |x|_b = |x|_c}.
(a) Find a pushdown automaton (which accepts via final state) that recognizes L.
(b) Find a pushdown automaton (which accepts via empty stack) that recognizes L.
(c) Is there a counting automaton that accepts L?
(d) Is there a DPDA that accepts L?
(e) Use Definition 10.7 to find a grammar equivalent to the PDA in part (a).

10.6. Prove or disprove that:
(a) 𝒫_Σ is closed under inverse homomorphism.
(b) 𝒟_Σ is closed under inverse homomorphism.

10.7. Give an example of a finite language that cannot be recognized by any one-state PDA that accepts via final state.

10.8. Let L = {aⁿbⁿcᵐdᵐ | n, m ∈ ℕ}.
(a) Find a pushdown automaton (which accepts via final state) that recognizes L.
(b) Find a pushdown automaton (which accepts via empty stack) that recognizes L.
(c) Is there a DPDA that accepts L?
(d) Is there a counting automaton that accepts L?
(e) Use Definition 10.7 to find a grammar equivalent to the PDA in part (b).

10.9. Refer to Theorem 10.2 and use induction on the number of moves in a sequence to show that

(∀x ∈ Σ*)(∀β ∈ (Σ ∪ N)*)((s, x, S) ⊢* (s, λ, β) iff S ⇒* xβ as a leftmost derivation)

10.10. Consider the grammar <{R}, {a, b, c, (, ), ε, ∅, ∪, •, *}, R, {R → a | b | c | ε | ∅ | (R•R) | (R∪R) | R*}>
(a) Convert this grammar to Greibach normal form, adding the new nonterminal Y.
(b) Use Definition 10.6 on part (a) to find the corresponding PDA.
(c) Use the construct suggested by Theorem 10.4 on part (b) to find the corresponding PDA that accepts via final state.

10.11. Let L = {aⁱbʲcʲdⁱ | i, j ∈ ℕ}.
(a) Find a pushdown automaton (which accepts via final state) that recognizes L.
(b) Find a pushdown automaton (which accepts via empty stack) that recognizes L.
(c) Is there a DPDA that accepts L?
(d) Is there a counting automaton that accepts L?
(e) Use Definition 10.7 to find a grammar equivalent to the PDA in part (b).

10.12. Consider the PDA P₃ in Example 10.5. Use Definition 10.7 to find G_{P₃}.

10.13. Refer to Theorem 10.3 and use induction to show

(∀x ∈ Σ*)(∀A ∈ Γ)(∀s ∈ S)(∀t ∈ S)(Aₛₜ ⇒* x ⇔ (s, x, A) ⊢* (t, λ, λ))

10.14. Let L = {aⁿbⁿcᵐdᵐ | n, m ∈ ℕ} ∪ {aⁱbʲcʲdⁱ | i, j ∈ ℕ}.
(a) Find a pushdown automaton (which accepts via final state) that recognizes L.
(b) Find a pushdown automaton (which accepts via empty stack) that recognizes L.
(c) Is there a DPDA that accepts L?
(d) Is there a counting automaton that accepts L?
(e) Use Definition 10.7 to find a grammar equivalent to the PDA in part (b).

10.15. Consider the PDA P_G in Example 10.6. Use Definition 10.7 to find G_{P_G}.

10.16. Refer to Theorem 10.4 and use induction to show

(∀α, β ∈ Γ*)(∀x, y ∈ Σ*)((s, xy, α) ⊢* (s, y, β) in P ⇔ (s, xy, αY) ⊢* (s, y, βY) in P_f)

10.17. Refer to Theorem 10.5 and use induction to show

(∀α, β ∈ Γ*)(∀x, y ∈ Σ*)(∀s, t ∈ S)((s, xy, α) ⊢* (t, y, β) in P ⇔ (s, xy, αY) ⊢* (t, y, βY) in P_λ)

10.18. Prove that {x ∈ {a, b, c}* | |x|_a = |x|_b ∧ |x|_b > |x|_c} is not context free. (Hint: Use closure properties.)

10.19. (a) Give an appropriate definition for the state transition function of the two-tape automaton pictured in Figure 10.6, stating the new domain and range.
(b) Define a two-tape automaton that accepts {aⁿbⁿcⁿ | n ≥ 1} via final state.

10.20. (a) Prove that {aⁿbⁿcⁿ | n ≥ 1} is not context free.
(b) Prove that {x ∈ {a, b, c}* | |x|_a = |x|_b = |x|_c} is not context free. [Hint: Use closure properties and apply part (a).]

10.21. (a) Find a DPDA that accepts {caⁿbᵐ | (n ≥ 1) ∧ (n = m)} ∪ {aⁿbᵐ | (n ≥ 1) ∧ (n = 2m)}
(b) Define a homomorphism that transforms the language in part (a) into a language that is not a DCFL.

10.22.
Use Ogden's lemma to show that {aᵏbⁿcᵐ | (k ≠ n) ∧ (n ≠ m)} is not a context-free language.

10.23. Refer to Theorem 10.6 and use induction to show

(∀α, β ∈ Γ*)(∀x, y ∈ Σ*)(∀s₁, t₁ ∈ S₁)(∀s₂, t₂ ∈ S₂)(((s₁, s₂), xy, α) ⊢* ((t₁, t₂), y, β) in P∩ ⇔ (((s₁, xy, α) ⊢* (t₁, y, β) in P₁) ∧ (t₂ = δ₂(s₂, x))))

10.24. Assume that P is a DPDA. Prove that there is an equivalent DPDA P′ (which accepts via final state) for which:
(a) P′ always has a move for all combinations of states, input symbols, and stack symbols.
(b) P′ never empties its stack.
(c) For each input string presented to P′, P′ always scans the entire input string.

10.25. Assume the results of Exercise 10.24, and show that 𝒟_Σ is closed under complementation. (Hint: Exercise 10.24 almost allows the trick of switching final and nonfinal states to work; the main remaining problem involves handling the case where a series of λ-moves may cycle through both final and nonfinal states.)

10.26. Give an example that shows that 𝒟_Σ is not closed under concatenation.

10.27. Give an example that shows that 𝒟_Σ is not closed under Kleene closure.

10.28. Show that {caⁿbᵐ | (n ≥ 1) ∧ (n = m)} ∪ {aⁿbᵐ | (n ≥ 1) ∧ (n = 2m)} is a DCFL.

10.29. (a) Modify the proof of Theorem 10.6 to show that if L₁ is context free and R₂ is regular, L₁ − R₂ is always context free.
(b) Prove the result in part (a) by instead appealing to closure properties for complement and intersection.

10.30. (a) Modify the proof of Theorem 10.6 to show that if L₁ is context free and R₂ is regular, L₁ ∪ R₂ is always context free.
(b) Prove the result in part (a) by instead appealing to closure properties for complement and intersection.

10.31. Argue that if L₁ is a DCFL and R₂ is regular, R₂ − L₁ is always a DCFL.

10.32. (a) Prove that {w2wʳ | w ∈ {0, 1}*} is a DCFL.
(b) Prove that {wwʳ | w ∈ {0, 1}*} is not a DCFL.

10.33. Give examples to show that, even if L₁ and L₂ are DCFLs:
(a) L₁L₂ need not be a DCFL.
(b) L₁ ∩ L₂ need not be a DCFL.
(c) L₁* need not be a DCFL.
(d) L₁ʳ need not be a DCFL.

10.34. Consider the quotient operator / given by Definition 5.10. Prove or disprove that:
(a) 𝒫_Σ is closed under quotient.
(b) 𝒟_Σ is closed under quotient.

10.35. Consider the operator b defined in Theorem 5.11. Prove or disprove that:
(a) 𝒫_Σ is closed under the operator b.
(b) 𝒟_Σ is closed under the operator b.

10.36. Consider the operator Y defined in Theorem 5.7. Prove or disprove that:
(a) 𝒫_Σ is closed under the operator Y.
(b) 𝒟_Σ is closed under the operator Y.

10.37. Consider the operator P given in Exercise 5.16. Prove or disprove that:
(a) 𝒫_Σ is closed under the operator P.
(b) 𝒟_Σ is closed under the operator P.

10.38. Consider the operator F given in Exercise 5.19. Prove or disprove that:
(a) 𝒫_Σ is closed under the operator F.
(b) 𝒟_Σ is closed under the operator F.

CHAPTER 11

TURING MACHINES

In the preceding chapters, we have seen that DFAs and NDFAs represent the type 3 languages and pushdown automata represent the type 2 languages. In this chapter we will explore the machine analogs of the type 1 and type 0 grammars. These devices, called Turing machines, are the most powerful automata known and can recognize every language considered so far in this text. We will also encounter languages that are too complex to be recognized by any Turing machine. Indeed, we will see that any other such (finite) scheme for the representation of languages is likewise forced to be unable to represent all possible languages over a given alphabet. Turing machines provide a gateway to undecidability, discussed in the next chapter, and to the general theory of computational complexity, which is rich enough to warrant much broader treatment than would be possible here.
11.1 DEFINITIONS AND EXAMPLES

Pushdown automata turned out to be the appropriate cognitive devices for the type 2 languages, but further enhancements in the capabilities of the automaton model are necessary to achieve the generality inherent in type 0 and type 1 languages. A (seemingly) minor modification will be all that is required.

Turing machines are built from the familiar components that have already been used in previous classes of automata. As with the earlier constructions, the heart of the device is a finite-state control, which reacts to information scanned by the tape head(s). As with finite-state transducers and pushdown automata, information can be written to the tape as transitions between states are made. Unlike FSTs and PDAs, Turing machines have only one tape with which to work, which serves both the input and the output needs of the device. Note that with finite-state transducers the presence of a second tape was purely for convenience; a single tape, with input symbols overwritten by the appropriate output symbol as the read head progressed, would have sufficed. Whereas a pushdown automaton could write an entire string of symbols to the stack, a Turing machine is constrained to print a single letter at a time. These new devices would therefore be of less value than PDAs were they not given some other capability. In all previous classes of automata, the read head was forced to move one space to the right on each transition (or, in the case of λ-moves, remain stationary). On each transition, the Turing machine tape head has the option of staying put, moving right, or moving left. The ability to move back to the left and review previously written information accounts for the added power of Turing machines.

It is possible to view a Turing machine as a powerful transducer of computable functions, with an associated function defined much like those for FSTs.
That is, as with finite-state transducers, each word that could be placed on an otherwise blank tape is associated with the word formed by allowing the Turing machine to operate on that word. With FSTs, this function was well defined; the machine would process each letter of the word in a unique way, the read head would eventually find the end of the word (that is, it would scan a blank), and the device would then halt. With Turing machines, there is no built-in guarantee that the device will always halt; since the tape head can move both right and left, it is possible to define a Turing machine that would reverberate back and forth between two adjacent spaces indefinitely. A Turing machine is also not constrained to halt when it scans a blank symbol; it may overwrite the blank and/or continue moving right indefinitely.

Rather than viewing a Turing machine as a transducer, we will primarily be concerned with employing it as an acceptor of words placed on the tape. Some variants of Turing machines are defined with a set of final states, and the criterion for acceptance would then be that the device both halt and be in a final state. For our purposes, we will employ the writing capabilities of the Turing machine and simply require that acceptance be indicated by printing a Y just prior to halting. If such a Y is never printed or the machine does not halt, the word will be considered rejected. It may be that there are words that might be placed on the input tape that would prevent the machine from halting, which is at best a serious inconvenience; if the device has been operating for an extraordinary amount of time, we may not be able to tell whether it will never halt (and thus reject the word), or whether we simply need to be patient and wait for it to eventually print the Y. This uncertainty can in some cases be avoided by finding a superior design for the Turing machine, which would always halt, printing N when a word is rejected and Y when a word is accepted.
This is not always a matter of being clever in defining the machine; we will see that there are some languages that are inherently so complex that this goal is impossible to achieve.

A conceptual model of a Turing machine is shown in Figure 11.1. Note that the tape head is capable of both reading and overwriting the currently scanned symbol. As before, the tape is composed of a series of cells, with one symbol per cell.

[Figure 11.1  A model of a Turing machine]

The tape head will also be allowed to move one cell to either the left or right during a transition. Note that, unlike all previous automata, the tape does not have a "left end"; it extends indefinitely in both directions. This tape will be used for input, output, and as a "scratch pad" for any intermediate calculations. At the start of operation of the device, all but a finite number of contiguous cells are blank. Also, unlike our earlier devices, the following definition implies that Turing machines may continue to operate after scanning a blank.

∇ Definition 11.1. A Turing machine that recognizes words over an alphabet Σ is a quintuple M = <Σ, Γ, S, s₀, δ>, where

Σ is the input alphabet, Γ is the auxiliary alphabet, and Σ, Γ, and {L, R} are pairwise disjoint sets of symbols.
S is a finite nonempty set of states (and S ∩ (Σ ∪ Γ) = ∅).
s₀ is the start state (s₀ ∈ S).
δ is the state transition function δ: S × (Σ ∪ Γ) → (S ∪ {h}) × (Σ ∪ Γ ∪ {L, R}).

The auxiliary alphabet always includes the blank symbol (denoted by #), and neither Σ nor Γ includes the special symbols L and R, which denote moving the tape head left and right, respectively. The state h is a special halt state, from which no further transitions are possible; h ∉ S. Δ

The alphabet Σ is intended to denote the nonblank symbols that can be expected to be initially present on the input tape.
By convention, it is assumed that the tape head is positioned over the leftmost nonblank symbol (in the case of the empty string, though, the head will be scanning a blank). In Definition 11.1, the state transition function is deterministic; for every state in S and every tape symbol scanned, exactly one destination state is specified, and one action is taken by the tape head. The tape head may either:

1. Overprint the cell with a symbol from Σ or Γ (and thus a blank may be printed).
2. Move one cell left (without printing).
3. Move one cell right (also without printing).

In the case where a cell is overprinted, the tape head remains positioned on that cell.

The above definition of a Turing machine is compatible with the construct implemented by Jon Barwise and John Etchemendy in their Turing's World software package for the Apple Macintosh. The Turing's World program allows the user to interactively draw a state transition diagram of a Turing machine and watch it operate on any given input string. As indicated by the next example, the same software can be used to produce and test state transition diagrams for deterministic finite automata.

EXAMPLE 11.1

The following simple Turing machine recognizes the set of even-length words over {a, b}. The state transition diagram for this device is shown in Figure 11.2 and conforms to the conventions introduced in Chapter 7. Transitions between states are represented by arrows labeled by the symbol that caused the transition. The symbol after the slash denotes the character to be printed or, in the case of L and R, the direction to move the tape head. The quintuple is <{a, b}, {#, Y, N}, {s₀, s₁}, s₀, δT>, where δT is given by

δT(s₀, a) = (s₁, R)
δT(s₀, b) = (s₁, R)
δT(s₀, #) = (h, Y)
δT(s₁, a) = (s₀, R)
δT(s₁, b) = (s₀, R)
δT(s₁, #) = (h, N)

This particular Turing machine operates in much the same way as a DFA would, always moving right as it scans each symbol of the word on the input tape.
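To make Definition 11.1 and Example 11.1 concrete, here is a small simulator sketch. The encoding is our own, not the book's: δ is a dictionary from (state, scanned symbol) to (next state, action), where the action is 'L', 'R', or a symbol to overprint, and 'h' is the halt state.

```python
# A minimal sketch of a Definition 11.1-style Turing machine simulator.
# Encoding (ours): delta maps (state, symbol) -> (state or 'h', action),
# where the action 'L'/'R' moves the head and any other action overprints.

BLANK = '#'

def run(delta, start, word, max_steps=10_000):
    """Run the machine on word; return the nonblank tape contents at halt,
    or None if the machine has not halted within max_steps."""
    tape = dict(enumerate(word))       # sparse, two-way-infinite tape
    head, state = 0, start
    for _ in range(max_steps):
        if state == 'h':               # the special halt state
            cells = sorted(tape) or [0]
            return ''.join(tape.get(i, BLANK)
                           for i in range(cells[0], cells[-1] + 1))
        state, action = delta[(state, tape.get(head, BLANK))]
        if action == 'R':
            head += 1
        elif action == 'L':
            head -= 1
        else:                          # overprint; the head stays put
            tape[head] = action
    return None

# delta_T of Example 11.1: accepts (prints Y after) even-length words over {a, b}.
delta_T = {
    ('s0', 'a'): ('s1', 'R'), ('s0', 'b'): ('s1', 'R'), ('s0', BLANK): ('h', 'Y'),
    ('s1', 'a'): ('s0', 'R'), ('s1', 'b'): ('s0', 'R'), ('s1', BLANK): ('h', 'N'),
}

print(run(delta_T, 's0', 'abab'))   # ababY
print(run(delta_T, 's0', 'aba'))    # abaN
```

The sparse-dictionary tape lets the head wander left of cell 0, mirroring the two-way-infinite tape of the definition.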
[Figure 11.2  The state transition diagram of the Turing machine discussed in Example 11.1]

When it reaches the end of the word (that is, when it first scans a blank), it prints Y or N, depending on which state it is in, and halts. It differs from a DFA in that the accept/reject indication is printed on the tape at the right end of the word. Figure 11.3 shows an alternative way of displaying this machine, in which the halt state is not explicitly shown. Much like the straight start state arrow that denotes where the automaton is entered, the new straight arrows show how the machine is left. This notation is especially appropriate for submachines.

As with complex programs, a complex Turing machine may be comprised of several submodules. Control may be passed to a submachine, which manipulates the input tape until it halts. Control may then be passed to a second submachine, which then further modifies the tape contents. When this submachine halts, control may be passed on to a third submachine, or back to the first submachine, and so on. The straight arrows leaving the state transition diagram can be thought of as exit arrows for a submachine, and they function much like a return statement in many programming languages. Example 11.4 illustrates a Turing machine that employs submachines.

[Figure 11.3  An alternate depiction of the Turing machine discussed in Example 11.1]

We will see that any DFA can be emulated by a Turing machine in the manner suggested by Example 11.1. The following example shows that Turing machines can recognize languages that are definitely not FAD. In fact, the language accepted in Example 11.2 is not even context free.

EXAMPLE 11.2

The Turing machine M illustrated in Figure 11.4 operates on words over {a, b, c}. When started at the leftmost end of the word, it is guaranteed to halt at the rightmost end and print Y or N.
It happens to overwrite the symbols comprising the input word as it operates, but this is immaterial. In fact, it is possible to design a slightly more complex machine that restores the word before halting (see Example 11.11). The quintuple is <{a, b, c}, {#, X, Y, N}, {s₀, s₁, s₂, s₃, s₄, s₅, s₆}, s₀, δ>, where δ is as indicated in the diagram in Figure 11.4. It is intended to recognize the language {x ∈ {a, b, c}* | |x|a = |x|b = |x|c}. One possible procedure for processing a string to check whether it has the same number of as, bs, and cs is given by the pseudocode below.

    while an a remains do
    begin
        replace a by X
        return to leftmost symbol
        find b; if none, halt and print N
        replace b by X
        return to leftmost symbol
        find c; if none, halt and print N
        replace c by X
        return to leftmost symbol
    end
    halt and print Y if no more bs nor cs remain

States s₀ and s₁ in Figure 11.4 check the while condition, and states s₂ through s₆ perform the body of the loop. On each iteration, beginning at the leftmost symbol, state s₀ moves the tape head right, checking for symbols that have not been replaced by X. If it reaches the end of the word (that is, if it scans a blank), the as, bs, and cs all matched, and it halts, printing Y. If b or c is found, state s₁ searches for as; if the end of the string is reached without finding a corresponding a, the machine halts with N, since there were an insufficient number of as. From either s₀ or s₁, control passes to s₂ when an a is scanned, and that a is replaced by X. State s₂, like s₄ and s₆, returns the tape head to the leftmost character. This is done by scanning left until a blank is found and then moving right as control is passed on to the next state. State s₃ searches for b, halting with N if none is found. The first b encountered is otherwise replaced by X, and the Turing machine enters s₄, which then passes control on to s₅ after returning to the leftmost symbol.
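The cross-off procedure in the pseudocode above can be checked directly in ordinary code. This sketch is our own, not the book's; a mutable list stands in for the tape, and `X` plays the same crossing-off role as in the machine.

```python
# A plain-Python sketch of the cross-off procedure from Example 11.2:
# repeatedly replace one a, one b, and one c by X; accept only if every
# symbol ends up matched. The list plays the role of the tape.

def equal_abc(word):
    """Return True iff word contains equally many a's, b's, and c's."""
    tape = list(word)
    while 'a' in tape:                 # the while condition (states s0/s1)
        tape[tape.index('a')] = 'X'    # replace the leftmost a by X
        for ch in 'bc':                # find and cross off a matching b, then c
            if ch not in tape:
                return False           # halt and print N
            tape[tape.index(ch)] = 'X'
    # halt and print Y only if no unmatched b's or c's remain
    return 'b' not in tape and 'c' not in tape

print(equal_abc('babcca'))   # True
print(equal_abc('ac'))       # False
```

The `tape.index(...)` calls correspond to the leftward/rightward sweeps of the tape head; what the machine does with state changes, the sketch does with ordinary searches.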
State s₅ operates much like s₃, searching for c this time, and s₆ returns the tape head to the extreme left if the previous a and b have been matched with c. The process then repeats from s₀.

[Figure 11.4  The Turing machine M discussed in Example 11.2]

To see exactly how the machine operates, it is useful to step through the computation for an input string such as babcca. To do this, conventions to designate the status of the device are quite helpful. Like the stack in a PDA, the tape contents may change as transitions occur, and the notation for the configuration of a Turing machine must reflect those changes. Steps in a computation will be represented according to the following conventions.

∇ Definition 11.2. Let M = <Σ, Γ, S, s₀, δ> be a Turing machine that is operating on a tape containing ...###αbβ###..., currently in state t with the tape head scanning the b, where α, β ∈ (Σ ∪ Γ)*, α contains no leading blanks, and β has no trailing blanks. This configuration will be represented by αtbβ. γ ⊢ ψ will be taken to mean that the configuration denoted by ψ is reached in one transition from γ. The symbol ⊢* will denote the reflexive and transitive closure of ⊢. Δ

That is, the symbol representing the state will be embedded within the string, just to the left of the symbol being scanned. If δ(t, b) = (s, R), then αtbβ ⊢ αbsβ. The new placement of the state label within the string indicates that the tape head has indeed moved right one symbol. The condition S ∩ (Σ ∪ Γ) = ∅ ensures that there is no confusion as to which symbol in the configuration representation denotes the state. As with PDAs, γ ⊢* ψ means that γ produces ψ in zero or more transitions. Note that the leading and trailing blanks are not represented, but α and β may contain blanks. Indeed, b may be a blank.
The representation ac###t# indicates that the tape head has moved past the word ac and is scanning the fourth blank to the right of the word (α = ac###, b = #, β = λ). At the other extreme, t##ac shows the tape head two cells to the left of the word (α = λ, b = #, β = #ac). A totally blank tape is represented by t#.

∇ Definition 11.3. For a Turing machine M = <Σ, Γ, S, s₀, δ>, the language accepted by M, denoted by L(M), is

L(M) = {x ∈ Σ* | s₀x ⊢* xhY}

A language accepted by a Turing machine is called a Turing-acceptable language. Δ

It is generally convenient to assume that the special symbol Y is not part of the input alphabet. Note that words can be rejected if the machine does not print a Y or if the machine never halts.

Several reasonable definitions of acceptance can be applied to Turing machines. One of the most common specifies that the language accepted by M is the set of all words for which M simply halts, irrespective of what the final tape contents are. It might be expected that this more robust definition of acceptance might lead to more (or at least different) languages being recognized. However, this definition turns out to yield a device with the same cognitive power as specified by Definition 11.3, as indicated below. More precisely, let us define

L₁(A) = {x ∈ Σ* | ∃α, β ∈ (Σ ∪ Γ)* such that s₀x ⊢* αhβ}

L₁(A) is thus the set of all words that cause A to halt. Let L be a language for which L = L₁(B) for some Turing machine B. It can be shown that there exists another Turing machine C that accepts L according to Definition 11.3; that is, L₁(B) = L(C) for some C. The converse is also true: any language of the form L(M) is L₁(A) for some Turing machine A.
Other possible definitions of acceptance include

L₂(M) = {x ∈ Σ* | ∃α, β ∈ (Σ ∪ Γ)* such that s₀x ⊢* αhYβ}

and

L₃(M) = {x ∈ Σ* | ∃α ∈ (Σ ∪ Γ)* such that s₀x ⊢* αhY}

These distinguish all words that halt with Y somewhere on the tape and all words that halt with Y at the end of the tape, respectively. It should be clear that a Turing machine A₁ accepting L = L₁(A₁) has an equivalent Turing machine A₂ for which L = L₂(A₂). A₂ can be obtained from A₁ by simply adding a new state and changing the transitions to the halt state so that they now all go to the new state. The new state prints Y wherever the tape head is and then, upon scanning that Y, halts. Similarly, a Turing machine A₃ can be obtained from A₁ by instead requiring the new state to scan right until it finds a blank. It would then print Y and halt, and L₂(A₂) = L₃(A₃). The technique for modifying such an A₃ to obtain A₄ for which L₃(A₃) = L(A₄) is discussed in the next section and illustrated in Example 11.11.

EXAMPLE 11.3

Consider again the machine M in Example 11.2 and the input string babcca. By the strict definition of acceptance given in Definition 11.3, L(M) = {λ}, since λ is the only word that does not get destroyed by M. Using the looser criteria for acceptance yields a more interesting language. The following steps show that s₀babcca ⊢* XXXXXXhY.

s₀babcca ⊢ bs₁abcca ⊢ bs₂Xbcca ⊢ s₂bXbcca ⊢ s₂#bXbcca ⊢ s₃bXbcca ⊢ s₄XXbcca ⊢ s₄#XXbcca ⊢ s₅XXbcca ⊢ Xs₅Xbcca ⊢ XXs₅bcca ⊢ XXbs₅cca ⊢ XXbs₆Xca ⊢ XXs₆bXca ⊢ Xs₆XbXca ⊢ s₆XXbXca ⊢ s₆#XXbXca ⊢ s₀XXbXca ⊢ Xs₀XbXca ⊢ XXs₀bXca ⊢ XXbs₁Xca ⊢* s₂#XXbXcX ⊢* XXs₃bXcX ⊢* s₄#XXXXcX ⊢* XXXXs₅cX ⊢* s₆#XXXXXX ⊢* XXXXXXs₀ ⊢ XXXXXXhY

The string babcca is therefore accepted. ac is rejected since s₀ac ⊢* XchN. Further analysis shows that L₃(M) is exactly {x ∈ {a, b, c}* | |x|a = |x|b = |x|c}. Since the only place Y is printed is at the end of the word on the tape, L₃(M) = L₂(M). Every word eventually causes M to halt with either Y or N on the tape, and so L₁(M) = Σ*.
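The A₁-to-A₂ conversion described above is mechanical, so it can be sketched in code. The dictionary encoding and the state name `qY` are our own conventions, not the book's: every transition into h is redirected to a new state that prints Y over the current cell and halts only upon rescanning that Y.

```python
# Sketch of the A1 -> A2 construction from the text: reroute each transition
# into the halt state 'h' through a new state that prints Y wherever the
# head is, and halts upon scanning that Y. Encoding (ours): delta maps
# (state, symbol) -> (state or 'h', action); actions 'L'/'R' move, others print.

BLANK = '#'

def to_L2_acceptor(delta, tape_symbols, new_state='qY'):
    delta2 = {}
    for (state, sym), (target, action) in delta.items():
        delta2[(state, sym)] = (new_state if target == 'h' else target, action)
    for sym in set(tape_symbols) | {BLANK, 'Y'}:
        # print Y over the current cell; once a Y is scanned, halt
        delta2[(new_state, sym)] = ('h', 'Y') if sym == 'Y' else (new_state, 'Y')
    return delta2

def run(delta, start, word, max_steps=10_000):
    """Minimal runner for these transition tables."""
    tape, head, state = dict(enumerate(word)), 0, start
    for _ in range(max_steps):
        if state == 'h':
            cells = sorted(tape) or [0]
            return ''.join(tape.get(i, BLANK) for i in range(cells[0], cells[-1] + 1))
        state, action = delta[(state, tape.get(head, BLANK))]
        if action in ('L', 'R'):
            head += 1 if action == 'R' else -1
        else:
            tape[head] = action
    return None

# The machine of Example 11.1 halts on every input (printing Y or N), so its
# L1 language is all of {a, b}*; after the conversion every input halts with Y.
delta_T = {
    ('s0', 'a'): ('s1', 'R'), ('s0', 'b'): ('s1', 'R'), ('s0', BLANK): ('h', 'Y'),
    ('s1', 'a'): ('s0', 'R'), ('s1', 'b'): ('s0', 'R'), ('s1', BLANK): ('h', 'N'),
}
delta2 = to_L2_acceptor(delta_T, {'a', 'b', 'N'})
print(run(delta2, 's0', 'aba'))   # abaY: ends in Y even though the original printed N
```

The A₃ variant would differ only in having `qY` move right over nonblanks before printing its Y.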
EXAMPLE 11.4

The composite Turing machine shown in Figure 11.5 employs several submachines and is based on the parenthesis checker included as a sample in the Turing's World software. The machine will search for correctly matched parentheses, restoring the original string and printing Y if the string is syntactically correct, and leaving a $ to mark the offending position if the string has mismatched parentheses. Asterisks are recorded to the left of the string as left parentheses are found, and these are erased as they are matched with right parentheses. Figure 11.5a shows the main architecture of the Turing machine. The square nodes represent the submachines illustrated in Figures 11.5b and 11.5c. When s₀ encounters a left parenthesis, it marks the occurrence with $ and transfers control to the submachine S₁. S₁ moves the read head to the left end of the string and deposits one * there. The cells to the left of the original string serve as a scratch area; the asterisks record the number of unmatched left parentheses encountered thus far. Submachine S₁ then scans right until the $ is found; it then restores the original left parenthesis. At this point, no further internal moves can be made in S₁, and the arrow leaving s₁₂ indicates that control should be returned to the parent automaton. The transition leaving the square S₁ node in Figure 11.5a now applies, the tape head moves to the right of the left parenthesis that was just processed by S₁, and control is returned to s₀. s₀ continues to move right past the symbols a and b, uses S₁ to process subsequent left parentheses, and transfers control to the submachine S₂ whenever a right parenthesis is encountered. Submachine S₂ attempts to match a right parenthesis with a previous left parenthesis. As control was passed to S₂, the right parenthesis was replaced by $ so that this spot on the tape can be identified later. The transitions in state s₂₀ move the tape head left until a blank cell is scanned.
If the cell to the right of this blank does not contain an asterisk, s₂₁ has no moves and control is passed back to the parent Turing machine, which will enter S₄ and move right past all the symbols in the word, printing N as it halts. The absence of the asterisk implies that no previous matching left parenthesis had been found, so halting with N is the appropriate action. If an asterisk had been found, s₂₁ would have replaced it with a blank, would then have no further moves, and the return arrow would be followed. The blank that is now under the tape head will cause the parent automaton to pass control to S₃, which will move right to the $, and the $ is then restored to ). Control returns to s₀ as the tape head moves past this parenthesis. The start state continues checking the remainder of the word in this fashion.

[Figure 11.5  (a) The Turing machine discussed in Example 11.4; (b) submachine S₁; (c) submachine S₂]

When the end of the word is reached, S₆ is used to examine the left end of the string;
Turing machines can therefore accept some languages that PDAs cannot, and we will see that they can recognize every context-free language. We began with DFAs, which were then extended to the more powerful PDAs, which have now been eclipsed by the Turing machine construct. Each of these classes of automata has been substantially more general than the previous class. If this text were longer, one might wonder when the next class of superior machines would be introduced. Barring the application of magic or divine intuition, there does not seem to be a "next class." That is, any machine that is constrained to operate algorithmically by a well-defined set of rules appears to have no more computing power than do Turing machines. This constraint, "to behave in an algorithmic fashion," is an intuitive notion without an obvious exact formal expression. Indeed, "behaving like a Turing machine" is generally regarded as the best way to express this notion! A discussion of how Turing machines came to be viewed in this manner is perhaps in order. An excellent in-depth treatment of their history can be found in [BARW]. At the beginning of the twentieth century, mathematicians were searching for a universal algorithm that could be applied to mechanically prove any well-stated mathematical formula. This naturally focused attention on the manipulation of symbols. In 1931, Godel showed that algorithms of this sort cannot exist. Since this implied that there were classes of problems that could not have an algorithmic solution, this then led to attempts to characterize those problems that could be effectively "computed." In 1936, Turing introduced his formal device for symbol manipulation and suggested that the definition of an algorithm be based on the Turing machine. He also outlined the halting problem (discussed later), which demonstrated a problem to which no Turing machine could possibly provide the correct answer in all instances. 
The search for a better, perhaps more powerful characterization of what constitutes an algorithm continued. While it cannot be proved that it is impossible to find a better formalization that is truly more powerful, on the basis of the accumulating evidence, no one believes that a better formulation exists. For one thing, other attempts at formalization, including grammars, λ-calculus, μ-recursive functions, and Post systems, have all turned out to yield exactly the same computing power as Turing machines. Second, all attempts at "improving" the capabilities of Turing machines have not expanded the class of languages that can be recognized. Some of these possible improvements will be examined in the next section.

We close this section by formalizing what Example 11.1 probably made clear: every DFA can be simulated by a Turing machine.

∇ Theorem 11.1. Every FAD language is Turing acceptable.

Proof. We show that given any DFA A = <Σ, S, s₀, δ, F>, there is a Turing machine MA that is equivalent to A. Define MA = <Σ, {#, Y, N}, S, s₀, δA>, where δA is defined by

(∀s ∈ S)(∀a ∈ Σ)(δA(s, a) = (δ(s, a), R))
(∀s ∈ F)(δA(s, #) = (h, Y))
(∀s ∈ S − F)(δA(s, #) = (h, N))

A simple inductive argument on |x| shows that, with δ* denoting the extended transition function of A,

(∀x ∈ Σ*)(∀α, β ∈ (Σ ∪ Γ)*)(αtxβ ⊢* αxqβ iff δ*(t, x) = q)

From this it follows that

(∀x ∈ Σ*)(s₀x ⊢* xq# iff δ*(s₀, x) = q)

Therefore,

(∀x ∈ Σ*)(s₀x ⊢* xhY iff δ*(s₀, x) ∈ F)

which means that L(MA) = L(A). Δ

This result actually follows trivially from the much stronger results presented later. Not only is every type 3 language Turing acceptable, but every type 0 language is Turing acceptable (as will be shown by Theorem 11.2). The above proof presents the far more straightforward conversion available to type 3 languages and illustrates the flavor of the inductive arguments needed in other proofs concerning Turing machines.
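The construction of δA in the proof of Theorem 11.1 can be generated mechanically from a DFA's transition table. The following sketch uses our own dictionary encoding (with 'h' for the halt state), not notation from the book.

```python
# Sketch of Theorem 11.1's construction: from a DFA <Sigma, S, s0, delta, F>
# build a Turing machine table that mimics the DFA while sweeping right and,
# on reaching the first blank, halts printing Y (state in F) or N (otherwise).

BLANK = '#'

def dfa_to_tm(sigma, states, dfa_delta, final):
    """dfa_delta maps (state, symbol) -> state; the result maps
    (state, symbol) -> (state or 'h', action)."""
    tm = {}
    for s in states:
        for a in sigma:
            tm[(s, a)] = (dfa_delta[(s, a)], 'R')   # mimic the DFA, moving right
        tm[(s, BLANK)] = ('h', 'Y' if s in final else 'N')
    return tm

# The even-length DFA of Example 11.5 yields the machine of Example 11.1:
dfa = {('s0', 'a'): 's1', ('s0', 'b'): 's1',
       ('s1', 'a'): 's0', ('s1', 'b'): 's0'}
tm = dfa_to_tm({'a', 'b'}, {'s0', 's1'}, dfa, final={'s0'})
print(tm[('s0', BLANK)])   # ('h', 'Y'): the empty (even-length) word is accepted
```

Running the resulting table on any word reproduces the DFA's sweep-right behavior, with the verdict printed at the right end of the word.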
By using this conversion, the Turing's World software can be employed to interactively build and test deterministic finite automata on a Macintosh.

EXAMPLE 11.5

Consider the DFA T shown in Figure 11.6, which recognizes all words of even length over {a, b}. The corresponding Turing machine is illustrated in Example 11.1 (see Figure 11.2).

[Figure 11.6  The DFA T discussed in Example 11.5]

11.2 VARIANTS OF TURING MACHINES

There are several ways in which the basic definition of the Turing machine can be modified. For example, Definition 11.1 disallows the tape head from both moving and printing during a single transition. It should be clear that if such an effect were desired at some point, it could be effectively accomplished under the more restrictive Definition 11.1 by adding a state to the finite-state control. The desired symbol could be printed as control is transferred to the new state. The transition out of the new state would then move the tape head in the appropriate fashion, thus accomplishing in two steps what a "fancier" automaton might do in one step. While this modification might be convenient, the ability of Definition 11.1-style machines to simulate this behavior makes it clear that such modified automata are no more powerful than those given by Definition 11.1. That is, every such modified automaton has an equivalent Turing machine.

It is also possible to examine machines that are more restrictive than Definition 11.1. If the machine were constrained to write on only a fixed, finite amount of the tape, this would seriously limit the types of languages that could be recognized. In fact, only the type 3 languages can be accepted by such machines. Linear bounded automata, which are Turing machines constrained to write only on the portion of the tape containing the original input word, are also less powerful than unrestricted Turing machines and are discussed in a later section.
Having an unbounded area in which to write is therefore an important factor in the cognitive power of Turing machines, but it can be shown that the tape need not be unbounded in both directions. That is, Turing machines that cannot move left of the cell the tape head originally scanned can perform any calculation that can be carried out by the less restrictive machines given by Definition 11.1 (see the exercises).

In deciding whether a Turing machine can simulate the modified machines suggested below, it is important to remember that the auxiliary alphabet Γ can be expanded as necessary, as long as it remains finite. In particular, it is possible to expand the information content of each cell by adding a second "track" to the tape. For example, we may wish to add check marks to certain designated cells, as shown in Figure 11.7. The lower track would contain the original symbols, and the upper track may or may not have a check mark.

[Figure 11.7  A Turing machine with a two-track tape]

This can be accomplished by doubling the combined size of the alphabets Σ and Γ to include all symbols without check marks and the same symbols with check marks. The new symbols can be thought of as ordered pairs, and erasing a check mark then amounts to rewriting a pair such as (a, √) with (a, #). A scheme such as this could be used to modify the automaton in Example 11.2. Rather than replacing designated symbols with X, a check could instead be placed over the original symbol. Just prior to acceptance, each check mark could be erased, leaving the original string to the left of the Y (see Example 11.11). The foregoing discussion justifies that a Turing machine with a tape head capable of reading two tracks can be simulated by a Definition 11.1-style Turing machine; indeed, it is a Turing machine with a slightly more complex alphabet.
When convenient, then, we may assume that we have a Turing machine with two tracks. A similar argument shows that, for any finite number k, a k-track machine has an equivalent one-track Turing machine with an expanded alphabet. The symbols on the other tracks can be more varied than just √ and #; any finite number of symbols may appear on any of the tracks. Indeed, a Turing machine may initially make a copy of the input string on another track to use in a later calculation and/or to restore the tape to its original form. The ability to preserve the input word in this manner illustrates why each language L = L₃(A) for some Turing machine A must be Turing acceptable; that is, L = L₃(A) implies that there is a multitrack Turing machine M for which L = L(M).

EXAMPLE 11.6

Conceptualizing the tape as being divided into tracks simplifies many of the arguments concerning modification of the basic Turing machine design. For example, a modified Turing machine might have two heads that move independently up and down a single tape, both scanning symbols to determine what transition should be made and both capable of moving in either direction (or remaining stationary and overwriting the current cell) as each transition is carried out. Such machines would be handy for recognizing certain languages. The set {aⁿbⁿ | n ≥ 1} can be easily recognized by such a machine. If both heads started at the left of the word, one head might first scan right to the first b encountered. The two heads could then begin moving in unison to the right, comparing symbols as they progressed, until the leading head encounters a blank and/or the trailing head scans its first b. If these two events occurred on the same move, the word would be accepted. A single-head Turing machine would have to travel back and forth across the word several times to ascertain whether it contained the same number of as as bs.
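The two-headed strategy for {aⁿbⁿ | n ≥ 1} is essentially a two-pointer scan. This sketch is ours, not the book's; two indices stand in for the trailing and leading heads on the same tape.

```python
# Sketch of the two-headed recognizer for {a^n b^n | n >= 1}: two indices
# play the roles of the trailing and leading heads on a single tape.

def is_anbn(word):
    lead = 0
    while lead < len(word) and word[lead] == 'a':
        lead += 1                 # leading head scans right to the first b
    if lead == 0 or lead == len(word):
        return False              # no a's at all, or no b's at all
    trail = 0
    # both heads now move right in unison, comparing symbols
    while lead < len(word):
        if word[trail] != 'a' or word[lead] != 'b':
            return False
        trail += 1
        lead += 1
    # accept iff the trailing head reaches its first b exactly as the
    # leading head falls off the right end of the word
    return word[trail] == 'b' and trail * 2 == len(word)

print(is_anbn('aabb'), is_anbn('aab'))   # True False
```

The single pass here mirrors the two heads moving in lockstep; a one-headed machine would instead zigzag, crossing off one a and one b per round trip.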
The ease with which the two-headed mutation accomplished the same task might make one wonder whether such a modified machine can recognize any languages that the standard Turing machine cannot. To justify that a two-headed Turing machine is no more powerful than the type described by Definition 11.1, we must show that any two-headed machine can be simulated by a corresponding standard Turing machine.

[Figure 11.8  Emulating a two-headed Turing machine with a three-track tape]

As suggested by Figure 11.8, a three-track Turing machine will suffice. The original information would remain on the first track, and check marks will be placed on tracks 2 and 3 to signify the simulated locations of the two heads. Several moves of the single head will be necessary to simulate just one move of the two-headed variant, and the finite-state control must be replicated and augmented to keep track of the stages of the computation. Each simulated move will begin with the single tape head positioned over the leftmost check mark. The tape contents are scanned, and the symbol found is remembered by the finite-state control. The tape head then moves right until the second check mark is found. At this point, the device will have available the input symbols that would have been scanned by both heads in the two-headed variant, and hence it can determine what action each of the heads would have taken. The rightmost check mark would then be moved left or right, or the current symbol on track 1 overwritten, whichever is appropriate. The single tape head would then scan left until the other check mark is found, which would then be similarly updated. This would complete the simulation of one move, and the process would then repeat.
Various special cases must be dealt with carefully, such as when both heads would be scanning the same symbol and when the heads "cross" to leave a different head as the leftmost. These cases are tedious but straightforward to sort out, and thus any language that can be recognized by a two-headed machine can be recognized by a standard Turing machine. Similarly, a k-headed Turing machine can be simulated by a machine conforming to Definition 11.1. The number of tracks required would then be k + 1, and the set of states must expand so that the device can count the number of check marks scanned on the left and right sweeps of the tape. Multihead Turing machines are therefore fundamentally no more powerful than the single-head variety. This means that whenever we need to justify that some task can be accomplished by a Turing machine we may employ a variant with several heads whenever this is convenient. We have seen that this variant simplified the justification that {a^n b^n | n ≥ 1} was Turing acceptable. It can also be useful in showing that other variants are no more powerful than the type of machines given by Definition 11.1, as illustrated in the next example. Sec. 11.2 Variants of Turing Machines EXAMPLE 11.7 Consider now a device employing several independent tapes with one head for each tape, as depicted in Figure 11.9. If we think of the tapes as stationary and the heads mobile, it is easy to see that we could simply glue the tapes together into one thick tape with several tracks, as indicated in Figure 11.10. The multiple heads would now scan an entire column of cells, but a head would ignore the information on all but the track for which it was responsible. In this fashion, a multitape Turing machine can be simulated by a multihead Turing machine, which can in turn be simulated by a standard Turing machine. Thus, multitape machines are no more powerful than the machines considered earlier.
Figure 11.9 A three-tape Turing machine. Figure 11.10 Emulating a three-tape Turing machine with a single three-track tape. One of the wilder enhancements involves the use of a two-dimensional tape, which would actually be a surface on which the tape head can move not only left and right, but also up and down to adjacent squares. With some frantic movement of the tape head on a one-dimensional tape, two-dimensional Turing machines can be successfully simulated. Indeed, k-dimensional machines (for finite k) are no more powerful than a standard Turing machine. The interested reader is referred to [HOPC]. EXAMPLE 11.8 A potentially more interesting question involves the effects that nondeterminism might have on the computational power of a Turing machine. With finite automata, it was seen that NDFAs recognized exactly the same class of languages as DFAs. However, deterministic pushdown automata accepted a distinctly smaller class of languages than their nondeterministic cousins. It is consequently hard to develop even an intuition for what "should" happen when nondeterminism is introduced to the Turing machine construct. Before we can address this question, we must first define what we mean by a nondeterministic Turing machine. As with finite automata and pushdown automata, we may wish to allow a choice of moves from a given configuration, leading to several disparate sequences of moves for a given input string. Like NDFAs and NPDAs, we will consider a word accepted if there is at least one sequence of moves that would have resulted in a Y being printed. Simulating such machines with deterministic Turing machines is more involved than it may at first seem. If each possible computation were guaranteed to halt, it would be reasonable to try each sequence of moves, one after the other, halting only when a Y was found.
If one sequence led to an N being printed, we would then move on to the next candidate. Since there may be a countable number of sequences to try, this process may never end. This is not really a problem, since if a sequence resulting in a Y exists, it will eventually be found and tried, and the machine will halt and accept the word. If no such sequence resulting in a Y exists, and there are an infinite number of negative attempts to be checked, the machine will never halt. By our original definition of acceptance, this will result in the word being rejected, which is the desired result. The trouble arises in trying to simulate machines that are not guaranteed to halt under all possible circumstances. This is not an inconsequential concern; in Chapter 12, we will identify some languages that are so complex that their corresponding Turing machines cannot halt for all input strings. A problem then arises in trying to switch from one sequence to the next. If, say, the first sequence we tried did not halt and instead simply continued operation without ever producing Y or N, we would never get the chance to try other possible move sequences. Since the machine will not halt, the word will therefore be rejected, even if some later sequence would have produced Y. Simulating the nondeterministic machine in this manner will not be guaranteed to recognize the same language, and an alternative method must be used. This problem is avoided by simulating the various computations in the following (very inefficient) manner. We begin by simulating the first move of the first sequence. We then start over with the first move of the second sequence, and then begin again and simulate two moves in the first sequence. On the next pass, we simulate the first move of the third sequence, then two moves of the second sequence, and then three moves of the first sequence. 
On each pass, we start computing a new sequence and move a little further along on the sequences that have already been started. If any of these sequences results in Y, we will eventually simulate enough of that sequence to discover that fact and accept the word. In this way, we avoid getting trapped in a dead end with no opportunity to pursue the alternatives. Implementing the above scheme will produce a deterministic Turing machine that is equivalent to the original nondeterministic machine. It remains to be shown that the Turing machine can indeed start over as necessary, and that the possible move sequences can be enumerated in a reasonable fashion so that they can be pursued according to the pattern outlined above. A three-tape (deterministic) Turing machine will suffice. The first tape will keep an inviolate copy of the input string, which will be copied onto the second tape each time a computation begins anew. A specific sequence of steps will be carried out on this second scratch tape, after which the presence of Y will be determined. The third tape is responsible for keeping track of the iterations and generating the appropriate sequences to be employed. Enumerating the sequences is much like the problem of generating words over some alphabet in lexicographic order (see the exercises). Methods for generating the "directing sequences" can be found in both [LEWI] and [HOPC]. These references also propose a more efficient approach to the whole simulation, which is based on keeping track of the sets of possible configurations, much as was done in Theorem 4.5 for nondeterministic finite automata. Thus, neither nondeterminism nor any of the enhancements considered above improves the computational power of these devices. As mentioned previously, no one has yet been able to find any mechanical enhancement that does yield a device that can recognize a language that is not Turing acceptable.
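The directing-sequence idea can be sketched in a few lines: try every finite sequence of choices in order of increasing length, following each one for at most its own length, so that no single nonhalting branch can monopolize the simulation. The dictionary encoding of the nondeterministic transition relation and the arbitrary depth bound are our own assumptions, not the book's notation:

```python
from itertools import product

def run_nondet(delta, tape, start, max_depth=12, blank='#'):
    """Deterministically simulate a nondeterministic TM by trying every
    finite 'directing sequence' of choices, shortest first.  Each sequence
    is followed for at most len(sequence) moves, so every individual trial
    halts even if some branch of the machine would run forever.  Finds any
    accepting computation of length <= max_depth."""
    max_branch = max((len(v) for v in delta.values()), default=1)
    for depth in range(1, max_depth + 1):
        for choices in product(range(max_branch), repeat=depth):
            cells = dict(enumerate(tape))
            state, pos, accepted = start, 0, False
            for c in choices:
                options = delta.get((state, cells.get(pos, blank)), [])
                if c >= len(options):
                    break                  # this directing sequence is invalid
                state, write, move = options[c]
                cells[pos] = write
                pos += {'L': -1, 'R': 1, 'S': 0}[move]
                if state == 'h':
                    accepted = (write == 'Y')
                    break
            if accepted:
                return True
    return False   # no accepting sequence found within the search bound
```

A toy machine for "words containing an a" (nondeterministically guess where to stop) shows the scheme in action: the bbb branch runs right forever on a's absence, yet the simulation still terminates.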
Attempts at producing completely different formal systems have fared no better, and there is little cause to believe that such systems exist. We now turn to characterizing what appears to be the largest class of algorithmically definable languages. In the next section, we will see that the Turing-acceptable languages are exactly the type 0 languages introduced in Chapter 8. ∇ Definition 11.4. For a given alphabet Σ, let 𝒯_Σ be the collection of all Turing-acceptable languages, and let 𝒢_Σ be the collection of all type 0 languages. Δ The freedom to use several tapes and nondeterminism makes it easier to explore the capabilities of Turing machines and relate 𝒯_Σ to the previous classes of languages encountered. It is now trivial to justify that every PDA can be simulated by a nondeterministic Turing machine with two tapes. The first tape will hold the input, which will be scanned by the first tape head, which will only have to move right or, at worst, remain stationary and reprint the same character it was scanning. The second tape will function as the stack, with strings pushed or symbols popped in correspondence with what takes place in the PDA. Since a Turing machine can only print one symbol at a time, some new states may be needed in the finite-state control to simulate pushing an entire string, but the translation process is quite direct. ∇ Lemma 11.1. Let Σ be an alphabet. Then 𝒫_Σ ⊂ 𝒯_Σ. That is, every context-free language is Turing acceptable, and the containment is proper. Proof. Containment follows from the formalization of the above discussion (see the exercises). Example 11.3 presented a language over {a, b, c} that is Turing acceptable but not context free. While the distinction between the regular and context-free languages disappeared for singleton alphabets, proper containment remains between 𝒫_{a} and 𝒯_{a}, as shown by languages such as {a^n | n is a perfect square}.
Δ In the next section, an even stronger result is discussed, which shows that the class of Turing-acceptable languages includes much more than just the context-free languages. Lemma 11.1 is actually an immediate corollary of Theorem 11.2. The next section also explores the formal relationship between Turing machines and context-sensitive languages. 11.3 TURING MACHINES, LBAs, AND GRAMMARS The previous sections have shown that the class of Turing-acceptable languages properly contains the type 2 languages. We now explore how the type 0 and type 1 languages relate to Turing machines. Since the preceding discussions mentioned that no formal systems have been found that surpass Turing machines, one would expect that every language generated by a grammar can be recognized by a Turing machine. This is indeed the case, as indicated by the following theorem. ∇ Theorem 11.2. Let Σ be an alphabet. Then 𝒢_Σ ⊆ 𝒯_Σ. That is, every type 0 language is Turing acceptable. Proof. We justify that, given any type 0 grammar G = <Σ, Γ, S, P>, there must be a Turing machine T_G that is equivalent to G. As with the suggested conversion of a PDA to a Turing machine, T_G will employ two tapes and nondeterminism. The first tape again holds the input, which will be compared to the sentential form generated on the second tape. The second tape begins with only the start symbol on an otherwise blank tape. The finite-state control is responsible for nondeterministically guessing the proper sequence of productions to apply, and with each guess the second tape is modified to reflect the new sentential form. If at some point the sentential form agrees with the contents of the first tape, the machine prints Y and halts. A guess will consist of choosing both an arbitrary position within the current sentential form and a particular production to attempt to substitute for the substring beginning at that position.
Only words that can be generated by the grammar will have a sequence of moves that produces Y, and no word that cannot be generated will be accepted. Thus, the new Turing machine is equivalent to G. Δ Sec. 11.3 Turing Machines, LBAs, and Grammars EXAMPLE 11.9 Consider the context-sensitive grammar G = <{a, b, c}, {Z, S, A, B, C}, Z, P>, where P contains the productions 1. Z → λ 2. Z → S 3. S → SABC 4. S → ABC 5. AB → BA 6. BA → AB 7. CB → BC 8. BC → CB 9. CA → AC 10. AC → CA 11. A → a 12. B → b 13. C → c It is quite easy to show that L(G) = {x ∈ {a, b, c}* | |x|_a = |x|_b = |x|_c} by observing that no production changes the relative numbers of (lowercase and capital) As, Bs, and Cs, and the six context-sensitive rules allow them to be arbitrarily reordered. One of the attempted "guesses" made by the Turing machine T_G concerning how the productions might be applied is: Use (2) beginning at position 1. Use (4) beginning at position 1. Use (6) beginning at position 2. ... This would lead to a failed attempt, since it corresponds to Z ⇒ S ⇒ ABC, and the substring BC beginning at position 2 does not match BA, the left side of rule 6. On the other hand, there is a pattern of guesses that would cause the following sequence of symbols to appear on the second tape: Z ⇒ S ⇒ ABC ⇒ BAC ⇒ BCA ⇒ BcA ⇒ Bca ⇒ bca This would lead to a favorable comparison if bca were the word on the input tape. Note that the Turing machine may have to handle shifting over existing symbols on the scratch tape to accommodate increases in the size of the sentential form. Since type 0 grammars allow length-reducing productions, the machine may also be required to shrink the sentential form when a string of symbols is replaced by a smaller string. A rather nice feature of type 1 languages is that the length of the sentential form can never decrease (except perhaps for the application of the initial production Z → λ), and hence sentential forms that become longer than the desired word are known to be hopeless.
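The guessing that T_G performs can be imitated by a bounded breadth-first search over sentential forms. The pruning below is safe for Example 11.9's grammar precisely because, apart from Z → λ, it has no contracting productions; the string encoding of symbols is our own convenience:

```python
from collections import deque

# Productions of Example 11.9's grammar (left side, right side);
# "" plays the role of the empty word lambda.
P = [("Z", ""), ("Z", "S"), ("S", "SABC"), ("S", "ABC"),
     ("AB", "BA"), ("BA", "AB"), ("CB", "BC"), ("BC", "CB"),
     ("CA", "AC"), ("AC", "CA"), ("A", "a"), ("B", "b"), ("C", "c")]

def derives(word, start="Z"):
    """Breadth-first search over sentential forms, mimicking the Turing
    machine's guesses of (position, production).  Forms longer than the
    target word are discarded as hopeless, which is valid here because
    only Z -> lambda ever shrinks a form."""
    max_len = max(len(word), 1)
    seen, queue = {start}, deque([start])
    while queue:
        form = queue.popleft()
        if form == word:
            return True
        for lhs, rhs in P:
            for i in range(len(form) - len(lhs) + 1):
                if form[i:i + len(lhs)] == lhs:
                    new = form[:i] + rhs + form[i + len(lhs):]
                    if len(new) <= max_len and new not in seen:
                        seen.add(new)
                        queue.append(new)
    return False
```

With the length bound in place, the search space is finite, so the search always terminates, accepting or rejecting.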
All context-sensitive (that is, type 1) languages can therefore be recognized by a Turing machine that uses an amount of tape proportional to the length of the input string, as outlined below. ∇ Definition 11.5. A linear bounded automaton (LBA) is a nondeterministic Turing machine that recognizes words over an alphabet Σ, given by the quintuple M = <Σ, Γ, S, s0, δ>, where Σ is the input alphabet. Γ is the auxiliary alphabet containing the special markers < and >, and Σ, Γ, and {L, R} are pairwise disjoint sets (and thus <, > ∉ Σ). S is a finite nonempty set of states (and S ∩ (Σ ∪ Γ) = ∅). s0 is the start state (s0 ∈ S). δ is the state transition function δ: S × (Σ ∪ Γ) → (S ∪ {h}) × (Σ ∪ Γ ∪ {L, R}), where (∀s ∈ S)(δ(s, <) = (q, R) for some q ∈ S ∪ {h}), and (∀s ∈ S)(δ(s, >) = (q, L) for some q ∈ S ∪ {h}, or δ(s, >) = (h, Y), or δ(s, >) = (h, N)). That is, the automaton cannot move left of the symbol < nor overwrite it; the LBA likewise cannot move right of the symbol >, and it can only overwrite it with Y or N just prior to halting. The symbols #, L, R, Y, and N retain their former meaning, although # can be dropped from Γ since it will never be scanned. As implied by the following definition, the special markers < and > are intended to delimit the input string, and Definition 11.5 ensures that the automaton cannot move past these limits. As has been seen, the use of several tracks can easily multiply the amount of information that can be stored in a fixed amount of space, and thus the restriction is essentially that the amount of available tape is a linear function of the length of the input string. In practice, any Turing machine variant for which each tape head is constrained to operate within an area that is a multiple of the length of the input string is called a linear bounded automaton. ∇ Definition 11.6. For a linear bounded automaton M = <Σ, Γ, S, s0, δ>, the language accepted by M, denoted by L(M), is L(M) = {x ∈ Σ* | <s0x> ⊢* <xhY}.
A language accepted by a linear bounded automaton is called a linear bounded language (LBL). Δ Note that while the endmarkers must enclose the string x, it is the word x (rather than <x>) that is considered to belong to L(M). As before, other criteria for acceptance are equivalent to Definition 11.6. The set of all words for which a LBA merely halts can be shown to be a LBL according to the above definition. The following example illustrates a linear bounded automaton that is intended to recognize all words that cause the machine to print Y at the end of the (obliterated) word. Example 11.13 illustrates a general technique for restoring the input word, producing an LBA that accepts according to Definition 11.6. EXAMPLE 11.10 Consider the machine L shown in Figure 11.11 and the input string babcca. The following steps show that <s0babcca> ⊢* <XXXXXXhY. <s0babcca> ⊢ <bs0abcca> ⊢ <bs1Xbcca> ⊢ <s2bXbcca> ⊢ s2<bXbcca> ⊢ <s3bXbcca> ⊢ <s4XXbcca> ⊢ s4<XXbcca> ⊢ <s5XXbcca> ⊢ <Xs5Xbcca> ⊢ <XXs5bcca> ⊢ <XXbs5cca> ⊢ <XXbs6Xca> ⊢ <XXs6bXca> ⊢ <Xs6XbXca> ⊢ <s6XXbXca> ⊢ s6<XXbXca> ⊢ <s0XXbXca> ⊢ <Xs0XbXca> ⊢ <XXs0bXca> ⊢ <XXbs0Xca> ⊢* s2<XXbXcX> ⊢* <XXs3bXcX> ⊢* s4<XXXXcX> ⊢* <XXXXs5cX> ⊢* s6<XXXXXX> ⊢* <XXXXXXs0> ⊢ <XXXXXXhY Figure 11.11 The Turing machine discussed in Example 11.10 ∇ Definition 11.7. For a given alphabet Σ, let ℒ_Σ be the collection of all linear bounded languages, and let 𝒞_Σ be the collection of all context-sensitive (type 1) languages. Δ The proof of Theorem 11.2 can be modified to show that all context-sensitive languages can be recognized by linear bounded automata. Since context-sensitive languages do not contain contracting productions, no sentential forms that are longer than the desired word need be considered. Consequently, the two-tape Turing machine in Theorem 11.2 can operate as a linear bounded automaton.
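A deterministic instance of Definition 11.5 can be simulated directly; the assertions below enforce the two endmarker rules, and the transition table in the usage example (a machine accepting even-length words over {a}) is invented purely for illustration:

```python
def run_lba(delta, word, start='s0', max_steps=10000):
    """Simulate a (deterministic) linear bounded automaton per Definition
    11.5.  delta[(state, symbol)] = (new_state, action), where an action
    is 'L', 'R', or a symbol to print.  The assertions check that '<' is
    never overwritten and that '>' is only overwritten with Y or N as the
    machine halts (enters state h)."""
    tape = ['<'] + list(word) + ['>']
    state, pos = start, 1                 # <s0 x>: head on the first letter
    for _ in range(max_steps):
        if state == 'h':
            return tape[-1] == 'Y'        # accepted iff '>' became Y
        a = tape[pos]
        state, act = delta[(state, a)]
        if act == 'L':
            pos -= 1
        elif act == 'R':
            pos += 1
        else:
            assert a != '<', "an LBA may never overwrite '<'"
            assert a != '>' or (state == 'h' and act in 'YN')
            tape[pos] = act
    raise RuntimeError("no verdict within the step bound")
```

The step bound stands in for the fact that a deterministic LBA has only finitely many configurations; this sketch does not restore the input word, so it checks acceptance by the printed verdict alone.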
The first tape, with the input word, never changes and thus satisfies the boundary restriction, while the finite-state control can simply abort any computation on the second tape that violates the length restriction. Just as Theorem 11.2 showed that 𝒢_Σ ⊆ 𝒯_Σ, we now have a relationship between another pair of cognitive and generative classes. ∇ Theorem 11.3. Let Σ be an alphabet. Then 𝒞_Σ ⊆ ℒ_Σ. That is, every type 1 language is a LBL. Proof. The proof follows from the formalization of the above discussion (see the exercises). Δ We have argued that every type 0 grammar must have an equivalent Turing machine, and it can conversely be shown that every Turing-acceptable language can be generated by a type 0 grammar. To do this, it is most convenient to use the very restrictive criteria for a Turing-acceptable language given in Definition 11.3, in which the original input string is not destroyed. For Turing machines that behave in this fashion, the descriptions of the device configurations bear a remarkable resemblance to the derivations in a grammar. EXAMPLE 11.11 Consider again the language {x ∈ {a, b, c}* | |x|_a = |x|_b = |x|_c}. As discussed in Example 11.3, the Turing machine in Figure 11.4 destroys the word originally on the input tape. Figure 11.12 depicts a slightly more complex Turing machine that restores the original word just prior to acceptance. It will (fortunately) not generally be necessary for our purposes to restore rejected words, since there are intricate languages for which this is not always possible. The modified quintuple is T = <{a, b, c}, {#, A, B, C, Y, N}, {s0, s1, s2, s3, s4, s5, s6, s7, s8}, s0, δ>, where δ is as indicated in the diagram in Figure 11.13. "Saving" the original input string is accomplished by replacing occurrences of the different letters by distinct symbols and restoring them later. The implementation reflects one of the first uses suggested for multiple-track machines: using the second track to check off input symbols.
For legibility, an a with a check mark above it is denoted by A, while an a with no check mark remains an a. Figure 11.12 The Turing machine discussed in Example 11.11. Figure 11.13 The state transition diagram discussed in Example 11.11. Similarly, checked bs are represented by B and checked cs by C. Thus, if the string BAbCca were on a two-track tape employing check marks, it would look like
✓✓ ✓
babcca
The additional states s7 and s8 essentially erase the check marks just before halting by replacing A with a, B with b, and C with c. Consider again the input string babcca processed by the Turing machine in Example 11.3. It is also accepted by this Turing machine, because the following steps show that s0babcca ⊢* babccahY. Note how closely the steps correspond with those in Example 11.3. The sequence below also illustrates how s7 converts the string back to lowercase, after which s8 returns the tape head to the right for acceptance. s0babcca ⊢ bs0abcca ⊢ bs1Abcca ⊢ s2bAbcca ⊢ s2#bAbcca ⊢ s3bAbcca ⊢ s4BAbcca ⊢ s4#BAbcca ⊢ s5BAbcca ⊢ Bs5Abcca ⊢ BAs5bcca ⊢ BAbs5cca ⊢ BAbs6Cca ⊢ BAs6bCca ⊢ Bs6AbCca ⊢ s6BAbCca ⊢ s6#BAbCca ⊢ s0BAbCca ⊢ Bs0AbCca ⊢ BAs0bCca ⊢ BAbs0Cca ⊢* s2#BAbCcA ⊢* BAs3bCcA ⊢* s4#BABCcA ⊢* BABCs5cA ⊢* s6#BABCCA ⊢* BABCCAs0 ⊢ BABCCs7A ⊢ BABCCs7a ⊢ BABCs7Ca ⊢ BABCs7ca ⊢* s7babcca ⊢ s7#babcca ⊢ s8babcca ⊢ bs8abcca ⊢* babccas8 ⊢ babccahY If occurrences of the machine transition symbol ⊢ are replaced by the derivation symbol ⇒, the above sequence would look remarkably like a derivation in a type 0 grammar. Indeed, we would like to construct a grammar in which sentential forms like bs0abcca could be derived from s0babcca in one step. Since the machine changed configurations because of the transition rule δ(s0, b) = (s0, R), this transition should have a corresponding production of the form s0b → bs0.
Each transition in the Turing machine will be responsible for similar productions. Unfortunately, the correspondence between transition rules and productions is complicated by the fact that the tape head may occasionally scan blank cells, which must then be added to the sentential form. The special characters [ and ] will bracket the sentential form throughout this stage of the derivation and will indicate the current left and right limits of the tape head travel, respectively. Attempting to move left past the conceptual position of [ (or right past the position of ]) will result in the addition of a blank symbol to the sentential form. To generate the words accepted by a Turing machine, our grammar will randomly generate a word over Σ, delimit it by brackets, and insert the symbol for the start state at the left edge. The rules derived from the transitions should then be able to transform a string such as [s0babcca#] into [#babccahY#]. Since only the letters in Σ will be considered terminal symbols, the symbols [, ], #, and Y are nonterminals, and the derivation will not yet be complete. To derive terminal strings for just the accepted words, the presence of Y will allow further productions to delete the remaining nonterminals. ∇ Definition 11.8. Given a Turing machine M = <Σ, Γ, S, s0, δ>, the grammar corresponding to M, G_M, is given by G_M = <Σ, Γ ∪ S ∪ {h} ∪ {Z, W, U, V, [, ]}, Z, P_M>, where P_M contains the following classes of productions: 1. Z → [W#] ∈ P_M (∀a ∈ Σ)([W → [Wa ∈ P_M) W → s0 ∈ P_M 2.
Each printing transition gives rise to a production rule as follows: (∀s ∈ S)(∀t ∈ S ∪ {h})(∀a, b ∈ Σ ∪ Γ)(if δ(s, a) = (t, b), then sa → tb ∈ P_M) Each move right gives rise to a production rule as follows: (∀s, t ∈ S)(∀a ∈ Σ ∪ Γ)(if δ(s, a) = (t, R), then sa → at ∈ P_M) If a = #, an additional production is needed: (∀s, t ∈ S)(if δ(s, #) = (t, R), then s] → #t] ∈ P_M) Each move left gives rise to a production rule as follows: (∀s, t ∈ S)(∀a ∈ Σ ∪ Γ)(if δ(s, a) = (t, L), then [sa → [t#a ∈ P_M ∧ (∀d ∈ Σ ∪ Γ)(dsa → tda ∈ P_M)) 3. hY → U ∈ P_M U# → U ∈ P_M U] → V ∈ P_M (∀a ∈ Σ)(aV → Va ∈ P_M) #V → V ∈ P_M [V → λ ∈ P_M The rules in class 1 are intended to generate all words of the form [s0x#], where x is an arbitrary member of Σ*. The remaining rules are defined in such a way that only those strings x that are recognized by M can successfully produce a terminal string. Note that once W is replaced by s0, neither Z nor W can appear in a later sentential form. After s0 is generated, the rules in class 2 may apply. It can be inductively argued that the derivations arising from the application of these rules directly reflect the changes in the configuration of the Turing machine (see Theorem 11.4). None of the class 3 productions can be used until the point at which the halt state would be reached in the corresponding computation. Since h ∉ S, none of the class 2 productions can then be used. Only if Y was written to tape as the Turing machine halted will the production hY → U be applicable. U will then delete the trailing blanks and ] from the sentential form, and then V will percolate to the left, removing the leading blanks and the final nonterminal [, leaving only the terminal string x in the (completed) sentential form. The following example illustrates a derivation stemming from a typical Turing machine.
EXAMPLE 11.12 Consider the Turing machine T in Figure 11.13 and the corresponding grammar G_T. Among the many possible derivations involving the class 1 productions is Z ⇒ [W#] ⇒ [Wa#] ⇒ [Wca#] ⇒ [Wcca#] ⇒ [Wbcca#] ⇒ [Wabcca#] ⇒ [Wbabcca#] ⇒ [s0babcca#] Only class 2 productions apply at this point, and there is exactly one production applicable at each step in the following sequence. [s0babcca#] ⇒ [bs0abcca#] ⇒ [bs1Abcca#] ⇒ [s2bAbcca#] ⇒ [s2#bAbcca#] ⇒ [#s3bAbcca#] ⇒ [#s4BAbcca#] ⇒ [s4#BAbcca#] ⇒ [#s5BAbcca#] ⇒ [#Bs5Abcca#] ⇒ [#BAs5bcca#] ⇒ [#BAbs5cca#] ⇒ [#BAbs6Cca#] ⇒ [#BAs6bCca#] ⇒ [#Bs6AbCca#] ⇒ [#s6BAbCca#] ⇒ [s6#BAbCca#] ⇒ [#s0BAbCca#] ⇒ [#Bs0AbCca#] ⇒ [#BAs0bCca#] ⇒ [#BAbs0Cca#] ⇒* [s2#BAbCcA#] ⇒* [#BAs3bCcA#] ⇒* [s4#BABCcA#] ⇒* [#BABCs5cA#] ⇒* [s6#BABCCA#] ⇒* [#BABCCAs0#] ⇒ [#BABCCs7A#] ⇒ [#BABCCs7a#] ⇒ [#BABCs7Ca#] ⇒ [#BABCs7ca#] ⇒* [#s7babcca#] ⇒ [s7#babcca#] ⇒ [#s8babcca#] ⇒ [#bs8abcca#] ⇒* [#babccas8#] ⇒ [#babccahY] In Turing machines where the tape head travels further afield, there may be many more blanks enclosed within the brackets. At this point, the class 3 productions take over to tidy up the string: [#babccahY] ⇒ [#babccaU] ⇒ [#babccaV ⇒ [#babccVa ⇒ [#babcVca ⇒ [#babVcca ⇒ [#baVbcca ⇒ [#bVabcca ⇒ [#Vbabcca ⇒ [Vbabcca ⇒ babcca As expected, babcca ∈ L(G_T). It is interesting to observe that the only stage at which a choice of productions is available is during the replacement of the nonterminal W. Once a candidate string is so chosen, the determinism of the Turing machine forces the remainder of the derivation to be unique. This is true even for strings that were not accepted by the Turing machine: if class 2 productions are applied to [s0baa#], there is exactly one derivation sequence for this sentential form, and it leads to [#BAas5#] and then [#BAahN]. No productions apply to this sentential form, and thus no terminal string will be generated.
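The class 2 rules of Definition 11.8 are mechanical enough to generate by program. In the sketch below, productions are pairs of token lists (a representation chosen so that multicharacter symbols such as s0 need no parsing), and the dictionary encoding of δ is our own assumption, not the book's notation:

```python
def class2_productions(delta, tape_symbols):
    """Generate the class 2 productions of Definition 11.8 from a
    deterministic transition table delta[(s, a)] = (t, x), where x is a
    tape symbol, 'L', or 'R'.  Each production is a pair (left, right) of
    symbol-token lists."""
    P = []
    for (s, a), (t, x) in delta.items():
        if x == 'R':
            P.append(([s, a], [a, t]))                    # sa -> at
            if a == '#':
                P.append(([s, ']'], ['#', t, ']']))       # s] -> #t]
        elif x == 'L':
            P.append((['[', s, a], ['[', t, '#', a]))     # [sa -> [t#a
            for d in tape_symbols:
                P.append(([d, s, a], [t, d, a]))          # dsa -> tda
        else:
            P.append(([s, a], [t, x]))                    # printing: sa -> tx
    return P
```

For instance, the transition δ(s0, b) = (s0, R) from Example 11.11 yields exactly the production s0b → bs0 discussed above.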
The relationship between strings accepted by the Turing machine and the strings generated by the corresponding grammar is at the heart of the following theorem. ∇ Theorem 11.4. Let Σ be an alphabet. Then 𝒯_Σ ⊆ 𝒢_Σ. That is, every Turing-acceptable language can be generated by a type 0 grammar. Proof. Let M be a Turing machine M = <Σ, Γ, S, s0, δ>, and let L(M) = {x ∈ Σ* | s0x ⊢* xhY}, as specified in the most restrictive sense of a Turing-acceptable language (Definition 11.3). Consider the grammar G_M corresponding to M, as given in Definition 11.8. The previous discussion of G_M provided a general sense of the way in which the productions could be used and justified that they could not be combined in unexpected ways. A rigorous proof requires an explicit formal statement of the general properties that have been discussed. A trivial induction on the length of x shows that, by using just the productions in class 1, (∀x ∈ Σ*)(Z ⇒* [s0x#]) Another induction argument establishes the correspondence between sequences of applications of the class 2 productions and sequences of moves in the Turing machine. Specifically, by inducting on the number of transitions, it can be shown that (∀s, t ∈ S ∪ {h})(∀α, β, γ, ω ∈ (Σ ∪ Γ)*)(αsβ ⊢* γtω iff (∃i, j, m, n ∈ ℕ)([#^i αsβ#^j] ⇒* [#^m γtω#^n])) The actual number of padded blanks is related to the extent of the tape head movement, but this is not important for our purposes. The essential observation is that a move sequence in M is related to a derivation sequence in G_M, with perhaps some change in the number of blanks at either end. The above statement was made in full generality to facilitate the induction proof (see the exercises). We need to apply it only in a very limited sense, as stated below. Observe that the productions in class 3 cannot be used unless hY appears on the tape after a finite number of steps. As discussed earlier, the presence of hY triggers the class 3 productions, which remove all the remaining nonterminals.
Thus, (∀x ∈ Σ*)(s0x ⊢* xhY iff Z ⇒* [s0x#] ⇒* [#^m xhY#^n] ⇒* x), which implies that L(M) = L(G_M). Δ Since every Turing machine has an equivalent type 0 grammar and every type 0 grammar generates a Turing-acceptable language, we have two ways of representing the same class of languages. ∇ Corollary 11.1. The class of languages generated by type 0 grammars is exactly the Turing-acceptable languages. That is, 𝒢_Σ = 𝒯_Σ. Proof. The proof follows immediately from Theorems 11.2 and 11.4. Δ As will be seen in Chapter 12, the linear bounded languages are a distinctly smaller class than the Turing-acceptable languages. Theorem 11.3 showed that 𝒞_Σ ⊆ ℒ_Σ, and a technique similar to that used in Theorem 11.4 will show that ℒ_Σ ⊆ 𝒞_Σ. That is, we can show that every linear bounded automaton has an equivalent context-sensitive grammar. Note that the class 1 and class 2 productions in Definition 11.8 contained no contracting productions; it was only when the class 3 productions were applied that the sentential form might shrink. When dealing with linear bounded automata, the tape head is restricted to the portion of the tape containing the input string, so there will be no extraneous blanks to delete. The input word on the tape of a linear bounded automaton is bracketed by the distinct symbols < and >, which might be used in the corresponding grammar in a fashion similar to [ and ]. These would be immovable in the sense that no new blanks would be inserted between them and the rest of the bracketed word. Unfortunately, in Definition 11.8 the delimiters [ and ] must eventually disappear, shortening the sentential form. No such shrinking can occur if we hope to produce a context-sensitive grammar. To overcome this difficulty, it is useful to imagine a three-track tape with the input word on the middle track and a delimiter on the upper track of the tape above the first symbol of the word. Another will occur on the lower track below the last character of the input string.
These markers will serve as guides to prevent the tape head from moving past the limits of the input word. For example, if the linear bounded automaton contained the word <babcca> on its input tape, the tape for the corresponding three-track automaton would be as pictured in Figure 11.14a. If the word were accepted, the tape would eventually reach the configuration shown in Figure 11.14b as it halted, printing Y on the lower track. It is a relatively simple task to convert a linear bounded automaton into a three-track automaton, where the tape head never moves left of the tape cell with the marker in the upper track, and never moves right of the cell with the marker in the lower track (see the exercises). We will refer to such an automaton as a strict linear bounded automaton. Figure 11.14 (a) A three-track Turing machine employing delimiters (b) An accepting configuration. The definitions used will depend on the upper and lower track markers occurring in different cells, which makes the representation of words of length less than two awkward. Since this construct is motivated by a need to find a context-sensitive grammar, we will simply modify the resulting grammar to explicitly generate any such short words and not rely on the above formalism. EXAMPLE 11.13 Consider the linear bounded automaton discussed in Example 11.10, which accepted {x ∈ {a, b, c}* | |x|_a = |x|_b = |x|_c}. As suggested by the exercises, this can be modified to form the three-track strict linear bounded automaton shown in Figure 11.15, which accepts {x ∈ {a, b, c}* | |x| ≥ 2 ∧ |x|_a = |x|_b = |x|_c}. To avoid explicitly mentioning the three tracks, a cell containing b on the middle track and a marker on the upper track is denoted by the single symbol b̄, a cell containing A on the middle track and a marker on the lower track is shown as A̲, and so on.
Thus, the six original symbols in {a, b, c, A, B, C} give rise to six other symbols employing the overbar (ā, b̄, ..., C̄), six more using the underscore (a̲, b̲, ..., C̲), and some symbols indicating acceptance (or possibly rejection), such as a_Y (or C_N). For clarity, only those combinations that can actually occur in a transition sequence are shown in Figure 11.15. The sequence of moves that would transform the tape from the configuration shown in Figure 11.14a to that of Figure 11.14b is shown below.

s₀b̄abcca̲ ⊢ b̄s₁abcca̲ ⊢ b̄s₂Abcca̲ ⊢ s₂b̄Abcca̲ ⊢ s₃b̄Abcca̲ ⊢ s₄B̄Abcca̲ ⊢ s₅B̄Abcca̲ ⊢ B̄s₅Abcca̲ ⊢ B̄As₅bcca̲ ⊢ B̄Abs₅cca̲ ⊢ B̄Abs₆Cca̲ ⊢ B̄As₆bCca̲ ⊢ B̄s₆AbCca̲ ⊢ s₆B̄AbCca̲ ⊢ s₀B̄AbCca̲ ⊢ B̄s₀AbCca̲ ⊢ B̄As₀bCca̲ ⊢ B̄Abs₁Cca̲ ⊢* s₂B̄AbCcA̲ ⊢* B̄As₃bCcA̲ ⊢* s₄B̄ABCcA̲ ⊢* B̄ABCs₅cA̲ ⊢* s₆B̄ABCCA̲ ⊢* B̄ABCs₇ca̲ ⊢* s₇B̄abcca̲ ⊢ s₈b̄abcca̲ ⊢ b̄s₈abcca̲ ⊢* b̄abccs₈a̲ ⊢ b̄abcc h a_Y

Consider implementing a grammar similar to that given in Definition 11.8, but applied to a strict linear bounded automaton incorporating the two delimiting markers on separate tracks. The new symbols will eliminate the need for [ and ] and avoid the contracting productions that were required to delete [ and ] from the sentential form. The class 3 productions would simply replace a symbol such as a_Y with a and b̄ with b. Unfortunately, it will not be possible to explicitly use distinct symbols to keep track of the state and the placement of the tape head, as was done with s₀, s₁, ..., sₙ, and h in the previous production sets. This extraneous symbol will also have to disappear to form a terminal string, and this must be done in a way that does not use contracting productions. As with the underscore and overbar, the state name will be encoded as a subscript attached to one symbol in the sentential form. Thus, each original symbol d, which has already given rise to the additional nonterminals d̄ and d̲, will also require nonterminals such as d₀, d₁, ..., dₙ to be added to Γ.
The inclusion of dᵢ within a sentential form will reflect that the tape head is currently scanning this d while the finite-state control is in state sᵢ. Further symbols will also be needed; d̄ᵢ indicates that the tape head is scanning the leftmost symbol, which happens to be d, while the finite-state control is in state sᵢ, and d̲ᵢ indicates a similar situation involving the rightmost symbol. This plethora of nonterminals can be used to define a context-sensitive grammar that generates the language recognized by a strict linear bounded automaton. For the automaton given in Example 11.13, generating the terminal string babcca will begin with the random generation of the six-symbol sentential form b̄₀abcca̲ with the class 1 productions, which will be transformed into b̄abcc a_Y by the class 2 productions, and finally into babcca via the class 3 productions.

[Figure 11.15 The Turing machine discussed in Example 11.13]

In the following definition, note that by the conditions placed on a strict linear bounded automaton, Γ already contains symbols of the form Ā and A̲, and hence so will Γ_B. For simplicity, the state set is required to be of the form {s₀, s₁, ..., sₙ}, but clearly the state names of any automaton could be renumbered sequentially to fit the given definition.

∇ Definition 11.9. Given a strict linear bounded automaton B = <Σ, Γ, {s₀, s₁, ..., sₙ}, s₀, δ>, the context-sensitive grammar corresponding to B, G_B, is given by G_B = <Σ, Γ_B, Z, P_B>, where Γ_B is given by

Γ_B = Γ ∪ {dᵢ | d ∈ Σ ∪ Γ, i = 0, 1, ..., n, or i = Y} ∪ {Z, S, W}

P_B contains the following classes of productions:

1. If λ ∈ L(B), then Z → λ ∈ P_B
Z → S ∈ P_B
(∀d ∈ Σ)(if d ∈ L(B), then S → d ∈ P_B)
(∀d ∈ Σ)(S → Wd̲ ∈ P_B)
(∀d ∈ Σ)(W → Wd ∈ P_B)
(∀d ∈ Σ)(W → d̄₀ ∈ P_B)

2.
Each printing transition gives rise to a production rule as follows:

(∀sᵢ, sⱼ ∈ S)(∀a, b ∈ Σ ∪ Γ)(if δ(sᵢ, a) = (sⱼ, b), then aᵢ → bⱼ ∈ P_B)

Each move right gives rise to production rules as follows:

(∀sᵢ, sⱼ ∈ S)(∀a ∈ Σ ∪ Γ)(if δ(sᵢ, a) = (sⱼ, R), then (∀d ∈ Σ ∪ Γ)(aᵢd → adⱼ ∈ P_B))

Each move left gives rise to production rules as follows:

(∀sᵢ, sⱼ ∈ S)(∀a ∈ Σ ∪ Γ)(if δ(sᵢ, a) = (sⱼ, L), then (∀d ∈ Σ ∪ Γ)(daᵢ → dⱼa ∈ P_B))

Each halt with acceptance gives rise to a production rule as follows:

(∀sᵢ ∈ S)(∀b ∈ Σ ∪ Γ)(∀a ∈ Σ)(if δ(sᵢ, b) = (h, a_Y), then bᵢ → a_Y ∈ P_B)

3. (∀a, b ∈ Σ)(b a_Y → b_Y a ∈ P_B)
(∀a, b ∈ Σ)(b̄ a_Y → ba ∈ P_B)

Δ

EXAMPLE 11.14

Consider again the strict linear bounded automaton B given in Figure 11.15 and the corresponding context-sensitive grammar G_B. The following derivation sequences show that babcca ∈ L(G_B):

Z ⇒ S ⇒ Wa̲ ⇒ Wca̲ ⇒ Wcca̲ ⇒ Wbcca̲ ⇒ Wabcca̲ ⇒ b̄₀abcca̲

At this point, only the class 2 productions can be employed, yielding:

b̄₀abcca̲ ⇒ b̄a₁bcca̲ ⇒ b̄A₂bcca̲ ⇒ b̄₂Abcca̲ ⇒ b̄₃Abcca̲ ⇒ B̄₄Abcca̲ ⇒ B̄₅Abcca̲ ⇒ B̄A₅bcca̲ ⇒ B̄Ab₅cca̲ ⇒ B̄Abc₅ca̲ ⇒ B̄AbC₆ca̲ ⇒ B̄Ab₆Cca̲ ⇒ B̄A₆bCca̲ ⇒ B̄₆AbCca̲ ⇒ B̄₀AbCca̲ ⇒ B̄A₀bCca̲ ⇒ B̄Ab₀Cca̲ ⇒* B̄₂AbCcA̲ ⇒* B̄₄ABCcA̲ ⇒* B̄₆ABCCA̲ ⇒* B̄ABCCA̲₀ ⇒ B̄ABCCa̲₇ ⇒ B̄ABCC₇a̲ ⇒ B̄ABCc₇a̲ ⇒* B̄₇abcca̲ ⇒ b̄₈abcca̲ ⇒ b̄a₈bcca̲ ⇒* b̄abcca̲₈

Finally, since δ(s₈, a̲) = (h, a_Y), the class 3 productions now apply:

b̄abcca̲₈ ⇒ b̄abcc a_Y ⇒ b̄abc c_Y a ⇒ b̄ab c_Y ca ⇒ b̄a b_Y cca ⇒ b̄ a_Y bcca ⇒ babcca

Once again, the grammars springing from Definition 11.9 can generate sentential forms corresponding to any string in Σ*, as long as the length of the string is at least two. As with the grammars arising from Definition 11.8, only strings that would have been accepted by the original machine will lead to a terminal string. If the productions of this example were applied to the sentential form b̄₀aa̲, at each step there will be exactly one choice of applicable production, until eventually the form B̄Aa̲₅ is obtained.
At this step, no production will apply, and therefore a terminal string cannot be generated from b̄₀aa̲. This correspondence between words accepted by the machine B and words generated by the context-sensitive grammar G_B given in Definition 11.9 is the foundation of the following theorem.

∇ Theorem 11.5. Let Σ be an alphabet. Then every linear bounded language over Σ can be generated by a type 1 grammar.

Proof. Any linear bounded language can be recognized by a strict linear bounded automaton (see the exercises). Hence, if L is a linear bounded language, there exists a strict linear bounded automaton B = <Σ, Γ, {s₀, s₁, ..., sₙ}, s₀, δ> which accepts exactly the words in L by printing Y on the lowest of the three tracks after restoring the original word to the middle track. We will employ the grammar G_B corresponding to B, as given in Definition 11.9. Example 11.14 illustrated that these productions can be used in a manner similar to those of Definition 11.8, and it is easy to justify that they cannot be combined in unexpected ways. Induction on the length of x will show that, by using just the productions in class 1,

(∀x ∈ Σ*)(∀a, b ∈ Σ)(Z ⇒* ā₀xb̲)

The correspondence between sequences of applications of the class 2 productions and sequences of moves in B follows as in Theorem 11.4. Due to the myriad positions that the integer subscript can occupy, and the special cases caused by the presence of the overbars and underscores, the general induction statement is quite tedious to state and is left as an exercise. The statement will again be applied to the special case in which we are interested, as stated below.

(∀x ∈ Σ*)(∀a, b ∈ Σ)(s₀āxb̲ ⊢* āxhb_Y iff ā₀xb̲ ⇒* āxb_Y)

A final induction argument will show that āxb_Y ⇒* axb. Thus,

(∀x ∈ Σ*)(∀a, b ∈ Σ)(s₀āxb̲ ⊢* āxhb_Y iff Z ⇒* ā₀xb̲ ⇒* āxb_Y ⇒* axb)

This establishes the correspondence between words of length at least two accepted by B and those generated by G_B.
Definition 11.9 included specific productions of the form Z → λ and S → d to ensure that words of length 0 and 1 also corresponded. This implies that L(B) = L(G_B), as was to be shown. Δ

The proof of Theorem 11.5 argues that there exists a context-sensitive grammar G_B for each strict linear bounded automaton B, and it certainly appears that, given an automaton B, we can immediately write down all the productions in P_B, as specified by Definition 11.9. However, some of the class 1 productions may cause some trouble. For example, determining whether the production Z → λ is included in P_B depends on whether the automaton halts with Y when presented with a blank tape. In the next chapter, we will see that even this simple question cannot be effectively answered for arbitrary Turing machines! That is, it is impossible to find an algorithm that, when presented with the state diagram of a Turing machine, can reliably determine whether or not the machine accepts the empty string. It will be shown that any such proposed algorithm is guaranteed to give the wrong answer for some Turing machines. Similarly, it now seems that there might be some uncertainty about which members of Σ give rise to productions of the form S → d. The productions specified by Definition 11.9 were otherwise quite explicit; only the productions relating to the immediate generation of a single character or the empty string were in any way questionable. There are only ||Σ|| + 1 such productions, and some combination of them has to be the correct set of productions to include in P_B. Thus, as stated in the theorem, we are assured that a context-sensitive grammar does exist, even if we are unclear as to exactly which productions it should contain. As will be seen in Chapter 12, it is possible to determine which words are accepted (and which are rejected) by linear bounded automata. Unlike unrestricted Turing machines, there is only a finite span of tape upon which symbols can be placed.
Furthermore, there are only a finite number of characters that can appear in those cells, a finite number of positions the tape head can be in, and a finite number of states to consider. The limited number of configurations makes it possible to determine exactly which words of a given size are recognized by the LBA. We have seen that every linear bounded automaton is equivalent to a strict linear bounded automaton, and these have equivalent type 1 grammars. Conversely, every type 1 grammar generates a linear bounded language, which implies there is another correspondence between a generative construct and a cognitive construct.

∇ Corollary 11.2. The class of languages generated by context-sensitive grammars is exactly the linear bounded languages.

Proof. The proof follows immediately from Theorems 11.3 and 11.5. Δ

11.4 CLOSURE PROPERTIES AND THE HIERARCHY THEOREM

Finally, we consider some of the closure properties of the classes of languages explored in this chapter. Since the Turing-acceptable languages are exactly the type 0 languages, we may use either cognitive or generative constructs for this class, whichever is most convenient. The fact that the linear bounded languages are exactly the type 1 languages will allow the same choice for that class. The next theorem illustrates a case in which the grammatical construct is the easier to use.

∇ Theorem 11.6. Let Σ be an alphabet. Then the class of Turing-acceptable languages over Σ is closed under union.

Proof. If L₁ and L₂ are two Turing-acceptable languages, then by Theorem 11.4 there are type 0 grammars G₁ = <Σ, Γ₁, S₁, P₁> and G₂ = <Σ, Γ₂, S₂, P₂> that generate L₁ and L₂. Without loss of generality, assume that Γ₁ ∩ Γ₂ = ∅. Choose a new nonterminal Z such that Z ∉ Γ₁ ∪ Γ₂, and consider the new type 0 grammar G∪ defined by G∪ = <Σ, Γ₁ ∪ Γ₂ ∪ {Z}, Z, P₁ ∪ P₂ ∪ {Z → S₁, Z → S₂}>. Clearly, L(G∪) = L(G₁) ∪ L(G₂). By Theorem 11.2, there is a Turing machine equivalent to G∪, and hence L₁ ∪ L₂ is Turing acceptable.
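The union construction just described is mechanical enough to prototype. In the sketch below, a grammar is modeled as a triple (nonterminals, start symbol, productions), with each production a pair of symbol tuples; this representation, and the names used, are our own illustrative assumptions rather than the book's formal tuples.

```python
# A minimal sketch of the union construction of Theorem 11.6. A "grammar"
# here is a (nonterminals, start, productions) triple over a shared terminal
# alphabet; productions are (left-side, right-side) pairs of symbol tuples.

def union_grammar(g1, g2):
    n1, s1, p1 = g1
    n2, s2, p2 = g2
    # The proof assumes the nonterminal sets are disjoint (rename otherwise).
    assert not (set(n1) & set(n2)), "nonterminal sets must be disjoint"
    z = "Z"                                   # a fresh start symbol
    assert z not in n1 and z not in n2, "Z must be a new nonterminal"
    # P1 ∪ P2 ∪ {Z -> S1, Z -> S2}
    prods = list(p1) + list(p2) + [((z,), (s1,)), ((z,), (s2,))]
    return (set(n1) | set(n2) | {z}, z, prods)
```

A derivation in the combined grammar must begin with Z → S₁ or Z → S₂, after which only the productions of the chosen grammar can ever apply, which is why the languages simply union.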
Δ

Theorem 11.6 could be proved directly by constructing a new Turing machine from Turing machines T₁ and T₂ accepting L₁ and L₂. It is a bit harder to give a concrete proof, and care must be taken to avoid inappropriate constructions. For example, it would be incorrect to build the new machine in such a way that it first simulates T₁, halting with Y if T₁ does, and then simulating T₂ if T₁ would have halted with N. It must be remembered that there is no guarantee that a Turing machine will ever halt for a given word. The above construction would incorrectly reject words that could be recognized by T₂ but which were rejected by T₁ because T₁ never halted; the new machine would never get a chance to simulate T₂. One valid construction involves a two-tape Turing machine, which immediately copies the input word onto the second tape. By using a cross product of the states of T₁ and T₂ and appropriate transitions, the action of both machines can be simultaneously simulated, and the new machine would accept as soon as either simulation indicated that the word should be accepted. A slight modification of this construct would show that the Turing-acceptable languages are also closed under intersection, but the next theorem outlines a superior method.

∇ Theorem 11.7. Let Σ be an alphabet. Then the class of Turing-acceptable languages over Σ is closed under intersection.

Proof. Suppose L₁ and L₂ are two Turing-acceptable languages recognized by the Turing machines T₁ and T₂, respectively. We build a new Turing machine T∩ with T₁ and T₂ as submachines. T∩ first transfers control to the submachine T₁. If T₁ never halts, the input will be rejected, which is the desired result. If T₁ halts, T∩ erases the Y, moves the tape head back to the leftmost character, and transfers control to the submachine T₂. T∩ will halt if T₂ does, and if T₂ also accepts, Y will be left in the proper place on the tape. T∩ therefore accepts if and only if both T₁ and T₂ accept, and hence L₁ ∩ L₂ is Turing acceptable.
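The pitfall in the sequential union construction discussed above can be made concrete. In the following sketch, "machines" are modeled as Python generators (purely a stand-in for Turing machines, and our own assumption) that yield None while still running and finally yield True (accept) or False (reject); a diverging machine never yields a verdict. Running the two machines in lockstep, as the cross-product construction does, accepts as soon as either accepts, even if the other never halts. The step cap is a practical concession absent from the theory.

```python
from itertools import count

def diverging():                 # a machine that never halts
    for _ in count():
        yield None

def accept_after(n):             # a machine that halts with acceptance after n steps
    for _ in range(n):
        yield None
    yield True

def lockstep_union(m1, m2, max_steps=10_000):
    """Accept iff either simulation accepts within max_steps."""
    done1 = done2 = False
    for _ in range(max_steps):
        if not done1:
            r = next(m1)
            if r is True:
                return True
            if r is False:
                done1 = True      # m1 halted rejecting; keep running m2
        if not done2:
            r = next(m2)
            if r is True:
                return True
            if r is False:
                done2 = True
        if done1 and done2:
            return False          # both halted without accepting
    return None                   # undetermined within the step budget

# A word on which T1 diverges but T2 accepts is still recognized:
# lockstep_union(diverging(), accept_after(5)) -> True
```

A sequential simulation, by contrast, would call next(m1) forever on the diverging machine and never reach m2, which is exactly the flaw the text warns against.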
Δ

Note that it was important that, except for the presence of Y after the input word, T₁ left the tape in the same condition in which it was found, with the input string intact for T₂. As with type 3 and type 2 grammars, there is no pleasant way to combine type 0 grammars to produce a grammar that generates the intersection of type 0 languages, although Theorem 11.7 guarantees that such a grammar must surely exist.

∇ Theorem 11.8. Let Σ be an alphabet. Then the class of Turing-acceptable languages over Σ is closed under reversal, homomorphism, inverse homomorphism, substitution, concatenation, and Kleene closure.

Proof. The proof for reversal is almost trivial; it is almost as simple as replacing every transition that moves the tape head to the right with a transition to the left, and likewise making left moves into right moves. This will yield a mirror-image machine, which, when started at the rightmost character, will print Y just past the leftmost character. We therefore have to modify this machine by adding preliminary states that will move the tape head from its traditional leftmost starting position to the opposite end of the word. Similarly, just before the Y would be printed, we must again move the tape head to the right. The description of the modifications necessary to convert a type 0 grammar into one that generates the reverse of the original is even more succinct: each rule in the original grammar is modified by writing the characters to the left of the production symbol → backward, and similarly reversing the string on the right of →. That is, a production like Dc → ABc would become cD → cBA. A relatively trivial induction on the number of steps in a derivation proves that the new grammar generates the reverse of the original language. The proofs of closure under the remaining operators are left for the exercises. Δ

As shown in Chapter 12, there are some operators under which the Turing-acceptable languages are not closed. Complementation is perhaps the most glaring exception.
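The grammar half of the reversal argument is short enough to state directly as code. As before, a production is modeled here as a pair of symbol tuples; this encoding is an assumption made for illustration.

```python
# The grammar-reversal construction of Theorem 11.8: each production
# alpha -> beta becomes reverse(alpha) -> reverse(beta).

def reverse_grammar(productions):
    return [(tuple(reversed(lhs)), tuple(reversed(rhs)))
            for lhs, rhs in productions]

# The text's example: Dc -> ABc becomes cD -> cBA.
```

Because reversal neither lengthens nor shortens either side of a production, the same one-liner also preserves the non-contracting property needed for the type 1 case in Theorem 11.9.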
The closure properties of the type 1 languages are very similar to those of the Turing-acceptable languages. In most cases, slight modifications of the above proofs carry over to the type 1 languages.

∇ Theorem 11.9. Let Σ be an alphabet. Then the class of type 1 (linear bounded) languages over Σ is closed under reversal, homomorphism, inverse homomorphism, substitution, concatenation, union, and intersection.

Proof. Both proofs given for reversal carry over without modification. In the cognitive approach, the states added to the mirror-image Turing machine keep the tape head within the confines of the input word, and hence if the original machine was an LBA, the new version will also be an LBA. In the generative approach, reversing the characters in type 1 productions still results in a type 1 grammar. That is, if the original grammar had no contracting productions, neither will the new grammar. Proving that the union of two type 1 languages is type 1 is similar to the proof given in Theorem 11.6, although care must be taken to avoid extraneous productions of the form Z → λ. Building an intersection machine from two linear bounded automata can be done exactly as described in Theorem 11.7. The remaining closure properties are left for the exercises. Δ

It is clear from our definitions that every type 1 language is a type 0 language, but we have yet to prove that the inclusion is proper. That the type 0 languages form a truly larger class than the type 1 languages will be shown to be a consequence of the material considered in Chapter 12. Apart from this one missing piece, we have over the course of several chapters encountered the major components of the following hierarchy theorem.

∇ Theorem 11.10. Let Σ be an alphabet for which ||Σ|| ≥ 2. Then

(DFA languages) = (NDFA languages) = (regular sets) = (type 3 languages) ⊂ (unambiguous context-free languages) ⊂ (type 2 languages) = (pushdown automaton languages) ⊂ (type 1 languages) = (linear bounded languages) ⊂ (type 0 languages) = (Turing-acceptable languages)

Proof. The cognitive power of deterministic and nondeterministic finite automata was shown to be equivalent in Chapter 4, and their relation to regular expressions was investigated in Chapter 6. These were all shown to describe the type 3 languages in Chapter 8.
In Chapter 9, Theorem 9.1 and Corollary 9.1 showed that the context-free languages (over alphabets with at least two symbols) properly contained the unambiguous context-free languages, which in turn properly contained the regular languages. In Chapter 10, the (nondeterministic) pushdown automata were shown to recognize exactly the type 2 languages. The context-sensitive language {x ∈ {a, b, c}* | |x|_a = |x|_b = |x|_c} is not context free, so the type 1 languages properly contain the type 2 languages. In this chapter, the linear bounded automata were shown to recognize exactly the type 1 languages, and Turing machines were shown to accept the type 0 languages. Corollary 12.4 will show that the type 1 languages are properly included in the type 0 languages. Δ

EXERCISES

11.1. By making the appropriate analogies for states and input, answer the musical question "How is a Turing machine like an elevator?" What essential (missing) component prevents an elevator from modeling a general computing device?

11.2. Let Σ = {a, b, c} and let L = {w | w = wʳ}.
(a) Explicitly define a deterministic, one-tape, one-head Turing machine that will recognize L.
(b) Justify that there exists a linear bounded automaton that accepts L.
(c) Describe how nondeterminism or additional tapes and heads might be employed to recognize L.

11.3. Let Σ = {a}. Explicitly define a deterministic, one-tape, one-head Turing machine that will recognize {aⁿ | n is a perfect square}.

11.4. Let Σ = {a, b, c}.
(a) Explicitly define a deterministic, one-tape, one-head Turing machine that will recognize {aᵏbⁿcᵐ | (k ≠ n) ∧ (n ≠ m)}.
(b) Explicitly define a deterministic, one-tape, one-head Turing machine that will recognize {x ∈ {a, b, c}* | |x|_a ≠ |x|_b ∧ |x|_b ≠ |x|_c}.

11.5. (a) Recall that there are several common definitions of acceptance that can be applied to Turing machines. Design a machine M for which L(M) = L₁(M) = L₂(M) = L₃(M) = {x ∈ {a, b, c}* | |x|_a = |x|_b = |x|_c}.
(b) For any Turing-acceptable language L, is it always possible to find a corresponding machine for which L(M) = L₁(M) = L₂(M) = L₃(M) = L? Justify your answer.

11.6. Let L = {ww | w ∈ {a, b, c}*}.
(a) Explicitly define a deterministic, one-tape, one-head Turing machine that will recognize L.
(b) Justify that there exists a linear bounded automaton that accepts L.
(c) Describe how nondeterminism or additional tapes and heads might be employed to recognize L.

11.7. Given an alphabet Σ = {a₁, a₂, a₃, ..., aₙ}, associate each word with the base n number derived from the subscripts. Thus, a₃a₂a₄ is associated with 324, a₁ with 1, and λ with 0. These associated numbers then imply a lexicographic ordering of Σ*.
(a) Given an alphabet Σ, build a Turing machine that, given an input word x, will replace that word with the string that follows x in lexicographic order.
(b) Using the machine in part (a) as a submachine, build a Turing machine that will start with a blank tape and sequentially generate the words in Σ* in lexicographic order, erasing the previous word as the following word is generated.
(c) Using the machine in part (a) as a submachine, build a Turing machine that will start with a blank tape and sequentially enumerate the words in Σ* in lexicographic order, placing each successive word to the right of the previous word on the tape, separated by a blank.
(d) Explain how these techniques can be used in building a deterministic version of a nondeterministic Turing machine.

11.8. Define a semi-infinite tape as one that has a distinct left boundary but extends indefinitely to the right, such as those employed by DFAs.
(a) Given a Turing machine satisfying Definition 11.1, define an equivalent two-track Turing machine with a semi-infinite tape.
(b) Prove that your construction is equivalent to the original.

11.9. Let Σ = {a}. Explicitly define a deterministic, one-tape, one-head Turing machine that will recognize {aⁿ | n is a power of 2} = {a, aa, aaaa, ...}.
11.10. Define a three-head Turing machine that accepts {x ∈ {a, b, c}* | |x|_a = |x|_b = |x|_c}. Assume that all three heads start on the leftmost character. Is there any need for any of the heads to ever move left?

11.11. Let Σ be an alphabet. Prove that every context-free language is Turing-acceptable by providing the details for the construction discussed in Lemma 11.1.

11.12. Let Σ be an alphabet. Prove that every type 1 language is an LBL by providing the details for the construction discussed in Theorem 11.3.

11.13. Let M = <Σ, Γ, S, s₀, δ> be a linear bounded automaton. Show how to convert M into a three-track automaton that never scans any cells but those containing the original word by:
(a) Explicitly defining the new alphabets.
(b) Explicitly defining the new transitions from the old. (Hint: From any state, an old transition "leaving" the word to scan one of the delimiters must return to the word in a unique manner.)
(c) Prove that for words of length at least 2 your new strict linear bounded automaton accepts exactly when M does.

11.14. By adding appropriate new symbols (of a form carrying both the overbar and the underscore markers) and suitable transitions:
(a) Modify the strict linear bounded automaton defined in Exercise 11.13 so that it correctly handles strings of length 1.
(b) Assume that a strict LBA that initially scans a blank is actually scanning an empty tape. If we expect to handle the empty string, we cannot insist that a strict linear bounded automaton never scan a cell that is not part of the input string, since the tape head must initially look at something. If we instead require that the tape head of a strict LBA may never actively move to a cell that is not part of the input string, then the dilemma is solved. Show that such a strict LBA can be found for any type 1 language.

11.15.
Refer to Theorem 11.4 and show, by inducting on the number of transitions, that

(∀s, t ∈ S ∪ {h})(∀α, β, γ, ω ∈ (Σ ∪ Γ)*)(αsβ ⊢* γtω iff (∃i, j, m, n ∈ ℕ)([#ⁱαsβ#ʲ] ⇒* [#ᵐγtω#ⁿ]))

11.16. State and prove the general induction statement needed to rigorously prove Theorem 11.5.

11.17. If G = <Σ, Γ, Z, P> is a grammar for a type 0 language:
(a) Explain why the following construction may not generate L(G)*: Choose a new start symbol W, and form G* = <Σ, Γ ∪ {W}, W, P ∪ {W → λ, W → WW, W → Z}>.
(b) Give an example of a grammar that illustrates this flaw.
(c) Given a type 0 grammar G = <Σ, Γ, Z, P>, define an appropriate grammar G* that should generate the Kleene closure of L(G).
(d) Prove that the construction defined in part (c) has the property that L(G*) = L(G)*.

11.18. Let Σ be an alphabet. Prove that the class of Turing-acceptable languages over Σ is closed under:
(a) Homomorphism
(b) Inverse homomorphism
(c) Concatenation
(d) Substitution

11.19. (a) Show that any Turing machine A₁ accepting L = L₁(A₁) has an equivalent Turing machine A₂ for which L = L₂(A₂) by explicitly modifying the quintuple for A₁ and proving that your construction behaves as desired.
(b) Show that any Turing machine A₂ accepting L = L₂(A₂) has an equivalent Turing machine A₃ for which L = L₃(A₃) by explicitly modifying the quintuple for A₂ and proving that your construction behaves as desired.

11.20. Let Σ be an alphabet. Prove that the class of type 1 languages over Σ is closed under:
(a) Homomorphism
(b) Inverse homomorphism
(c) Concatenation
(d) Substitution

CHAPTER 12

DECIDABILITY

In this chapter, the nature and limitations of algorithms are explored. We will first look at the general properties that can be ascertained about finite automata and FAD languages. For example, we might like to be able to enter the state transition table of a DFA into a suitably sized array and then run a program that determines whether the DFA was connected. An algorithm for checking this property was outlined in Chapter 3.
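The connectedness check alluded to here can be sketched as a simple reachability search over the transition table. The encoding of the DFA as nested dictionaries is our own assumption, made for illustration.

```python
# A sketch of the Chapter 3 connectedness test: a DFA is connected iff every
# state is reachable from the start state s0 via the transition function.

def is_connected(states, sigma, delta, s0):
    reached = {s0}
    frontier = [s0]
    while frontier:                  # graph search from s0
        s = frontier.pop()
        for a in sigma:
            t = delta[s][a]
            if t not in reached:
                reached.add(t)
                frontier.append(t)
    return reached == set(states)    # connected iff every state was reached
```

Since the number of states is finite and each state is visited at most once, this procedure always halts, which is exactly what qualifies it as an algorithm in the sense defined below.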
Similarly, we have seen that it is possible to write a program to check whether an arbitrary DFA is minimal. We know this property can be reliably checked because we proved that the algorithms in Chapter 3 could be applied to ascertain the correct answer for virtually every conceivable DFA. There are an infinite number of DFAs about which the question can be posed, and yet our algorithm decides the question correctly in all cases. In the following section we consider questions that can be asked about more complex languages and machines. In the latter part of this chapter, we will see that, unlike the questions in Sections 12.1 and 12.2, there are some questions that are in a fundamental sense unanswerable in the general case. That is, there cannot exist an algorithm that correctly answers such a question in all cases. These questions will be called undecidable. An undecidable question about Pascal programs is considered in detail in Section 12.3 and is independent of advanced machine theory. The concept of undecidability is addressed formally in Section 12.4, and other undecidable problems are also presented.

12.1 DECIDABLE QUESTIONS ABOUT REGULAR LANGUAGES

Recall that a procedure is a finite set of instructions that unambiguously specifies deterministic, discrete steps for performing some task. In this chapter, the task will generally involve providing the correct answer to some yes-no question. Most questions that involve a numerical answer can be rephrased as a yes-no question of similar complexity. For example, the question "What is the minimum number of states necessary for a DFA to accept the language represented by the regular expression R?" has the yes-no analog "Does there exist a DFA with fewer than, say, five states that accepts the language represented by the regular expression R?" Clearly, if we can answer the first question, the second question is easy to answer.
Conversely, if questions like the second one can be answered for any number we wish (rather than just five), then the answer to the first question can be deduced. Recall also that an algorithm is a procedure that is guaranteed to halt in all instances. Note that "guaranteed to halt" does not mean that there is a fixed time limit on how long it may take to finish the procedure for all inputs; some instances may take far longer than others. For example, the question "Does there exist a DFA with fewer than ten states that accepts the language represented by ab(b ∪ c)*?" will probably take less time to answer than "Does there exist a DFA with fewer than ten states that accepts the language represented by a*b((b*d ∪ c*b)d ∪ e)*?" It is important to keep in mind that algorithms are intended to provide a general solution to a vast array of similar problems and are (usually) not limited to a single specific instance. As an example, consider the task of sorting a file containing the three names:

Williams
Jones
Smith

A variety of sorting algorithms, when applied to this file, will produce the correct output. It is also possible to write a program that ignores its input and always prints the lines

Jones
Smith
Williams

This program does yield the correct answer for the particular problem we wished to solve, and indeed it solves the sorting problem for all files that contain exactly these three particular names in some arbitrary order (there are six such files). Thus, this trivial program is an algorithm that solves the sorting problem for these six specific instances. A slightly more complex program might be capable of printing two or three distinct answers, depending on the input, and thus solve the sorting problem for an even larger (but still finite) class of instances. It should be clear that producing an algorithm that solves a finite set of instances is no great accomplishment, since these algorithms are guaranteed to exist.
Such an algorithm could be programmed as one big case statement, which identifies the particular input instance and produces the corresponding output for that instance. Algorithms that apply to an infinite set of instances are of much more theoretical and practical interest.

∇ Definition 12.1. Given a set of instances and a yes-no question that can be applied to those instances, we will say that the question is decidable if there is an algorithm for determining in each instance the (correct) answer to the question. Δ

A more precise definition of decidability is presented in Section 12.4, based on the perceived relationship between Turing machines and algorithms. As mentioned earlier, if the set of instances is finite, an algorithm is guaranteed to exist, no matter how complex the question appears to be.

EXAMPLE 12.1

A typical set of instances might be the set of all deterministic finite automata over a given alphabet Σ; a typical question might be whether a given automaton accepts at least one string in Σ*.

It is possible to devise an algorithm to correctly answer the question posed in Example 12.1 for every finite automaton A = <Σ, S, s₀, δ, F>. The first idea that might come to mind is to simply look at strings from Σ* in an orderly manner and use δ̄ to determine whether each string is accepted by A; if we find a string that does reach a final state, it is clear that the answer to the question should be "YES: L(A) ≠ ∅," while if we never find a string that is accepted, the answer should be "NO: L(A) = ∅." This procedure is guaranteed to halt and give the correct answer if the language is indeed nonempty. However, the procedure will never halt and answer NO (in a finite amount of time) because there are an infinite number of strings in Σ* that must be checked. A modification of this basic idea is necessary to produce a procedure that will halt under all circumstances (that is, to produce an algorithm).
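The naive procedure just described can be made concrete: enumerate strings in order of length and test each with the extended transition function. The explicit length bound below is our own addition to keep the sketch from running forever on an empty language; without it, the loop would never terminate in the NO case, which is precisely the defect the text points out. The encoding of the DFA as nested dictionaries is likewise an assumption for illustration.

```python
from itertools import product

def find_accepted_string(sigma, delta, s0, finals, max_len):
    """Search for a witness that L(A) is nonempty, among strings up to max_len."""
    for length in range(max_len + 1):
        for word in product(sigma, repeat=length):
            s = s0
            for a in word:                # compute delta-bar(s0, word)
                s = delta[s][a]
            if s in finals:
                return "".join(word)      # witness found: L(A) is nonempty
    return None                           # inconclusive without a justified bound
```

Theorem 12.1 supplies exactly the missing ingredient: a bound (the number of states) beyond which no further strings need to be examined, turning this procedure into an algorithm.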
∇ Theorem 12.1. Given any alphabet Σ and a DFA A = <Σ, S, s₀, δ, F>, it is decidable whether L(A) = ∅.

Proof. Let n = ||S||. Since both Σ and S are finite sets, B = {λ} ∪ Σ ∪ Σ² ∪ ··· ∪ Σⁿ⁻¹ is a finite set, and we can examine each string of this set and still have a procedure that halts. There is clearly an algorithm for determining the set C of all states that are reached by these few strings. Specifically, C = {δ̄(s₀, x) | x ∈ Σ* ∧ |x| < n} = {δ̄(s₀, x) | x ∈ B}. Note that Theorem 2.7 implies that if a string (of any length) is accepted by A, then there is another string of length less than n that is also accepted by A. Consequently, it is sufficient to examine only the "short" strings contained in B rather than examine all of Σ*. If any of the strings in B lead to a final state (that is, if C ∩ F ≠ ∅), then the answer to the question is clearly "NO: L(A) is not empty," while if C ∩ F = ∅, then Theorem 2.7 guarantees that "YES: L(A) is empty" is the correct answer. We have therefore constructed an algorithm (which computes C and then examines C ∩ F, both of which can be done in a finite amount of time) for determining whether the language accepted by a given machine is empty. Δ

The definition of C does not suggest the most efficient algorithm for calculating the set C; better strategies are available. The technique is similar to that employed to find the state equivalence relation E_A. C is actually the set of connected states Sᶜ, which can be calculated recursively as indicated in Definition 3.10. Note that Theorem 12.1 answers the question posed in Example 12.1. The set of instances to which this question applies can easily be expanded. It can be shown that it is decidable whether L(A) = ∅ for any NDFA A by first employing Definition 4.5 to find the equivalent DFA A_d and then applying the method outlined in Theorem 12.1 to that machine.
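The emptiness test of Theorem 12.1, phrased in the more efficient connected-states form suggested after the proof, can be sketched as follows. As before, the nested-dictionary encoding of the DFA is our own representational assumption.

```python
# Decide whether L(A) is empty by computing the set C of connected states
# (every state reachable from s0) and intersecting it with the final states.

def is_empty(states, sigma, delta, s0, finals):
    reached = {s0}
    frontier = [s0]
    while frontier:                      # graph search from s0
        s = frontier.pop()
        for a in sigma:
            t = delta[s][a]
            if t not in reached:
                reached.add(t)
                frontier.append(t)
    return reached.isdisjoint(finals)    # empty iff no final state is connected
```

Each state enters the frontier at most once, so the search halts after at most ||S|| · ||Σ|| transition lookups, far fewer than the number of strings in B.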
It is possible to find a much more efficient algorithm for answering this question that does not rely on the conversion to a DFA (see the exercises). Just as the algorithm for converting an NDFA into a DFA allows the emptiness question to be answered for NDFAs, the techniques in Chapter 6 justify that the similar question for regular expressions is decidable. That is, since every regular expression has an equivalent DFA, the question of whether a regular expression describes any strings is clearly decidable. Similar extensions can be applied to most of the results in this section.

Just as we can decide whether a DFA A accepts any strings, we can also decide whether A accepts an infinite number of strings, as shown by Theorem 12.2. This can be proved by a related appeal to Theorem 12.1, but an efficient algorithm for answering this question depends on the following lemma.

∇ Lemma 12.1. Let Σ be an alphabet, A = <Σ, S, s0, δ, F> be a finite automaton, n = ||S||, and M = {x | x ∈ L(A) ∧ |x| ≥ n}. Then, if M ≠ ∅, M must contain a string of minimal length (call it x_m), and furthermore |x_m| < 2n.

Proof. The proof is obtained by repeated application of the pumping lemma with i = 0 (see the exercises and Theorem 2.7). Δ

A question similar to the one posed in Theorem 12.1 is "Does a given DFA accept a finite or an infinite number of strings?" This is also a decidable question, as demonstrated by the following theorem. The proof is based on the observation that a DFA A that accepts no strings of length greater than some fixed constant must by definition recognize a finite set, while the pumping lemma implies that if L(A) contains a sufficiently long string, then L(A) must contain an infinite number of related strings.

∇ Theorem 12.2. Given any alphabet Σ and a DFA A = <Σ, S, s0, δ, F>, it is decidable whether L(A) is an infinite set.

Proof.
Let n = ||S||. Clearly, if A accepts no strings of length n or greater, then L(A) is finite. From the pumping lemma, we know that if A accepts even one string of length equal to or greater than n, then A must accept an infinite number of strings. We still cannot check all the strings of length greater than n and have a procedure that halts, so Lemma 12.1 will be invoked to argue that if a long string is accepted by A, then a string whose length is in the range n ≤ |x| < 2n must be accepted, and it is therefore sufficient to check the strings in this limited range. Thus, our algorithm will consist of computing the intersection of {δ̄(s0, y) | y ∈ Σ* ∧ n ≤ |y| < 2n} and F. L(A) is infinite iff this intersection is nonempty. Δ

If we were to write a program that consulted the matrix containing the state transition table for A to actually determine {δ̄(s0, y) | y ∈ Σ* ∧ n ≤ |y| < 2n}, it would be very inefficient to implement this computation as implied by the definition. Repeatedly looking up entries in the state transition table to determine δ̄ for each word in this large class of specified strings would involve an enormous duplication of effort. It is far better to recursively calculate R_i = {δ̄(s0, x) | x ∈ Σ^i}, which represents the set of all states that can be reached by strings of length exactly i. This can be easily computed by defining R_0 = {s0} and using the recursive formula

R_{i+1} = {δ(s, a) | a ∈ Σ, s ∈ R_i}.

Successive sets can thereby be calculated from R_0. When R_n is reached, it is checked against F, and the algorithm halts and returns Yes if they have a common state. Otherwise, R_{n+1} through R_{2n−1} are checked, and No is returned if no final state appears in this group. This method is easily adaptable to nondeterministic finite automata by setting R_0 to be the set of all start states and adjusting the definition of R_{i+1} to conform to NDFA notation.
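The R_i recursion can be sketched directly. The dictionary encoding and the two-state example machine over {a} are assumptions made for illustration:

```python
def dfa_is_infinite(sigma, start, delta, final):
    """Decide whether L(A) is infinite by iterating R_0 = {s0},
    R_{i+1} = {delta(s, a) | a in sigma, s in R_i}, and checking
    R_n .. R_{2n-1} against F (Theorem 12.2 and Lemma 12.1)."""
    n = len({q for (q, a) in delta})       # n = ||S||
    R = {start}                            # R_0
    for i in range(1, 2 * n):
        R = {delta[(s, a)] for s in R for a in sigma}
        if i >= n and not R.isdisjoint(final):
            return True                    # some accepted x has n <= |x| < 2n
    return False

# A hypothetical DFA over {a} accepting exactly the strings of even length
delta = {('e', 'a'): 'o', ('o', 'a'): 'e'}
print(dfa_is_infinite({'a'}, 'e', delta, {'e'}))   # True
```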
The involved arguments presented in Lemma 12.1 and the proof of Theorem 12.2 are necessary to justify that the above efficient recursive algorithm correctly answers the question of whether a finite automaton accepts an infinite number of strings. However, if we were simply interested in justifying that it is decidable whether L(A) is infinite, it would have been much more convenient to simply adapt the result of Theorem 12.1. In particular, we could have easily built a DFA that accepts all strings of length at least n, formed the "intersection" machine, and applied Theorem 12.1 to the new machine. Specifically, if A is an n-state deterministic finite automaton, consider the DFA

A_n = <Σ, {r_0, r_1, r_2, ..., r_n}, r_0, δ_n, {r_n}>,

where δ_n is defined by

(∀i = 0, 1, ..., n)(∀a ∈ Σ)(δ_n(r_i, a) = r_min{i+1, n}).

It is easy to show that L(A_n) = {x ∈ Σ* | |x| ≥ n}, and building the intersection machine Â from A and A_n as specified in Lemma 5.1 produces a DFA for which L(Â) = {x ∈ L(A) | |x| ≥ n}. The question of whether L(A) is infinite now becomes the question of whether L(Â) is nonempty, which was shown to be decidable by Theorem 12.1. An indication of the nature of the automaton A_n is given in Figure 12.1.

The above argument provides a much shorter and clearer proof of Theorem 12.2, but it should not be construed to be the basis of an efficient algorithm. Forming the intersection of A and A_n involves well over n² states, and thus applying the technique described in Theorem 12.1 to Â may involve more than n² iterations. For our purposes, we will henceforth be content to discover whether various tasks are merely possible and not be concerned with efficiency.

Figure 12.1 The automaton A_n

The following theorem answers a major question about DFAs: "Are two given deterministic finite automata equivalent?"
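The machine A_n is simple enough to construct programmatically. This sketch, with an assumed dictionary encoding and integer state names r_0..r_n, just confirms the claim that L(A_n) = {x ∈ Σ* : |x| ≥ n}:

```python
def make_A_n(sigma, n):
    """Build A_n with states r_0..r_n (represented as the integers
    0..n), start r_0, final set {r_n}, and delta_n(r_i, a) = r_min(i+1, n)."""
    delta = {(i, a): min(i + 1, n) for i in range(n + 1) for a in sigma}
    return delta, 0, {n}

def accepts(delta, start, final, word):
    """Run a DFA given as a transition dictionary on a word."""
    s = start
    for a in word:
        s = delta[(s, a)]
    return s in final

delta, r0, F = make_A_n({'a', 'b'}, 3)
print(accepts(delta, r0, F, 'ab'))     # False: |x| < 3
print(accepts(delta, r0, F, 'babab'))  # True:  |x| >= 3
```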
At first glance, this appears to be a hard question; an initial strategy might be to check longer and longer strings, and answer "No, they are not equivalent" if a string is found that is accepted by one machine but not by the other. As in the proofs of Theorems 12.1 and 12.2, we would again be faced with the task of determining when we could confidently stop checking strings and answer "Yes, they are equivalent." Such a strategy can be made to work, but an easier method is again available. We are essentially checking whether the start state of the first machine treats strings differently than does the start state of the second machine. This problem was addressed in Chapter 3, and an algorithm that accomplishes this sort of checking has already been presented. This observation provides the basis for the proof of the following theorem.

∇ Theorem 12.3. Given any alphabet Σ and two DFAs A1 = <Σ, S1, s01, δ1, F1> and A2 = <Σ, S2, s02, δ2, F2>, it is decidable whether L(A1) = L(A2).

Proof. Without loss of generality, assume that S1 ∩ S2 = ∅, and construct a new DFA defined by A = <Σ, S1 ∪ S2, s01, δ, F1 ∪ F2>, where

(∀s ∈ S1 ∪ S2)(∀a ∈ Σ)  δ(s, a) = δ1(s, a) if s ∈ S1, and δ(s, a) = δ2(s, a) if s ∈ S2.

Corollary 3.5 outlines the algorithm for constructing E_A for this machine, and it should be clear from the definition of A that s01 E_A s02 ⇔ L(A1) = L(A2). Δ

EXAMPLE 12.2

Consider the two machines A1 and A2 displayed in Figure 12.2. The machine A constructed according to Theorem 12.3 would look like the diagram inside the dotted box shown in Figure 12.3. This new machine is very definitely disconnected, and in this example s01 is not related to s02 by E_A since these two states treat ab differently (ab is accepted by A1 and rejected by A2). The reader is encouraged to generate another example using two equivalent machines and verify that the two original start states would indeed be related by E_A.
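The refinement of E_A on the union machine can be sketched as follows. The particular partition-refinement bookkeeping and the one-state versus two-state example machines are illustrative assumptions; the book's own algorithm is the one in Corollary 3.5:

```python
def dfa_equivalent(sigma, delta, s01, s02, final):
    """Decide L(A1) = L(A2) in the style of Theorem 12.3: delta is the
    union machine's transition function (S1 and S2 disjoint), and the
    state equivalence E_A is found by refining the final/nonfinal split."""
    states = {q for (q, a) in delta}
    block = {s: (s in final) for s in states}    # E_0 separates F from S - F
    while True:
        # two states stay related iff they are currently related and every
        # input symbol sends them to related states
        sig = {s: (block[s], tuple(block[delta[(s, a)]] for a in sorted(sigma)))
               for s in states}
        labels = {v: i for i, v in enumerate(sorted(set(sig.values()), key=repr))}
        new = {s: labels[sig[s]] for s in states}
        if len(set(new.values())) == len(set(block.values())):
            return new[s01] == new[s02]          # is s01 E_A s02 ?
        block = new

# Hypothetical example: a one-state machine and a two-state machine,
# both accepting every string over {a}
delta = {('t0', 'a'): 't0',                        # A1
         ('u0', 'a'): 'u1', ('u1', 'a'): 'u0'}     # A2
print(dfa_equivalent({'a'}, delta, 't0', 'u0', {'t0', 'u0', 'u1'}))  # True
```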
Figure 12.2 The automata discussed in Example 12.2

Figure 12.3 The composite machine discussed in Example 12.2

The following theorem explores the relationship between the complexity of a given regular expression and the size of the corresponding minimal DFA.

∇ Theorem 12.4. Given any alphabet Σ and a regular expression R over Σ, it is decidable whether there exists a DFA with fewer than five final states that accepts the language described by R.

Proof. Given R, Lemma 6.2 indicates the algorithm (generated by the constructions presented in Theorems 5.2, 5.4, and 5.5) for building some NDFA that accepts the regular set corresponding to R. Definition 4.5 outlines the algorithm for converting this NDFA into a DFA. Theorem 3.7 and Corollary 3.5 indicate the algorithms for minimizing this DFA. Counting the number of final states in this minimal machine will allow the question to be answered correctly. Δ

The careful reader may have noticed that the minimal machine described in Chapters 2 and 3 was only advertised to have the minimum total number of states and has not yet been guaranteed to have the smallest number of final states (perhaps there is an equivalent machine with many more nonfinal states but fewer final states). An investigation of the relationship between the final states of the minimal machine and the equivalence classes comprising the right congruence generated by this language will show that no equivalent machine can have fewer final states than the minimal machine has (see the exercises).

The proofs of Theorems 12.3 and 12.4 are good examples of using existing algorithms to build new algorithms. This technique should be applied whenever possible in the following exercises. It is certainly useful in resolving the following question about grammars. Given two right-linear grammars G1 = <Ω1, Σ, S1, P1> and G2 = <Ω2, Σ, S2, P2>, it is clearly decidable whether G2 is equivalent to G1.
An algorithm can be formed that simply:

1. Uses the construction presented in Lemma 8.2 to find A_G1 and A_G2.
2. Converts these NDFAs to two DFAs called A1 and A2.
3. Appeals to the algorithm presented in Theorem 12.3 to correctly answer the question.

A trivial extension of this idea proves the following theorem.

∇ Theorem 12.5. It is decidable whether two given regular grammars G1 = <Ω1, Σ, S1, P1> and G2 = <Ω2, Σ, S2, P2> are equivalent.

Proof. See the exercises. Δ

Most of the decidability questions we have asked about languages recognized by finite automata or described by regular expressions can also be answered for languages generated by grammars through a similar transformation of existing algorithms. Such algorithms are generally not the most efficient ones available, and it can often be instructive to develop a new method from scratch. This is especially true of the following question, which has no analog in the realms of finite automata or regular expressions.

∇ Theorem 12.6. It is decidable whether a given right-linear grammar G = <Ω, Σ, S, P> contains any useless nonterminals.

Proof. Recall that a nonterminal is useless if it can never appear in the derivation of any valid terminal string. Essentially, only two things can prevent a nonterminal X from being effectively used somewhere in a valid derivation: either X can never appear as part of a partial derivation that begins with the start symbol (no matter how many productions we apply), or, once X is generated, it can never lead to a valid terminal string.

Finding the members of Ω that can be produced from S is a simple recursive procedure: Begin with Z0 = {S} and form Z1 by adding to Z0 all the nonterminals that appear on the right side of productions that are used to replace S. Then form Z2 by adding to Z1 all the nonterminals that appear on the right side of productions that are used to replace members of Z1, and so on.
More formally: Z0 = {S} and, for i ≥ 0,

Z_{i+1} = Z_i ∪ {Y ∈ Ω | (∃x ∈ Σ*)(∃T ∈ Z_i)(T → xY is a production in P)}.

Clearly, Z0 ⊆ Z1 ⊆ ··· ⊆ Z_i ⊆ ··· ⊆ Ω, and as was shown for similar collections of nested entities (such as E_0^A, E_1^A, ... in Chapter 3), after a finite number of steps we will reach the point where Z_m = Z_{m+1}, and Z_m will then represent the set of all nonterminals that can be reached from the start symbol S.

In a similar fashion, we can generate another nested sequence of sets W_0, W_1, ..., where W_i represents the set of all nonterminals that can produce a terminal string in i or fewer steps. We are again guaranteed to reach a point where W_n = W_{n+1}, and W_n will indeed be the set of all nonterminals that can ever produce a valid terminal string. Z_m ∩ W_n is thus the set of all useful members of Ω, and Ω − (Z_m ∩ W_n) is therefore the set of all useless nonterminals. Δ

EXAMPLE 12.3

G4 = <{S, A, B, C, V, W, X}, {a, b, c}, S, {S → abA | bbB | ccV, A → aC | cX, B → ab, C → λ | cS, V → aV | cX, W → aa | aW, X → bV | aaX}>

contains three useless nonterminals: V, W, and X. Recursively calculating the sets described in the above proof yields:

Z0 = {S}
Z1 = {S, A, B, V}
Z2 = {S, A, B, V, C, X}
Z3 = {S, A, B, V, C, X}

W0 = { }
W1 = {C, B, W}
W2 = {C, B, W, A, S}
W3 = {C, B, W, A, S}

Thus W cannot be generated from the start symbol, and V and X cannot produce terminal strings. The useful symbols are Z2 ∩ W2 = {S, A, B, C}.

The techniques employed here should look somewhat familiar. They involve iteration methods similar to those developed in Chapter 3. In fact, it is possible to apply the connectivity algorithms for nondeterministic finite automata to this problem by transforming the right-linear grammar G into the NDFA A_G, as defined in the proof of Lemma 8.1. The automaton corresponding to the grammar in Example 12.3 is shown in Figure 12.4. Note that the state labeled <W> is inaccessible, which means that it cannot be reached from <S>.
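The Z and W computations of Theorem 12.6 can be sketched and checked against Example 12.3. The dictionary encoding of G4, with the λ-production written as an empty string, is an assumption of this sketch:

```python
def useless_nonterminals(omega, sigma, start, P):
    """Return Omega - (Z intersect W) for a right-linear grammar, where
    Z is the set of nonterminals reachable from S and W is the set of
    nonterminals that derive some terminal string (Theorem 12.6)."""
    Z, changed = {start}, True
    while changed:                       # Z_0 <= Z_1 <= ... stabilizes
        changed = False
        for T in list(Z):
            for rhs in P.get(T, []):
                for Y in rhs:
                    if Y in omega and Y not in Z:
                        Z.add(Y)
                        changed = True
    W, changed = set(), True
    while changed:                       # W_0 <= W_1 <= ... stabilizes
        changed = False
        for T, rhss in P.items():
            if T not in W and any(all(c in sigma or c in W for c in rhs)
                                  for rhs in rhss):
                W.add(T)
                changed = True
    return omega - (Z & W)

# G4 from Example 12.3; the production C -> lambda is written as ""
P = {'S': ['abA', 'bbB', 'ccV'], 'A': ['aC', 'cX'], 'B': ['ab'],
     'C': ['', 'cS'], 'V': ['aV', 'cX'], 'W': ['aa', 'aW'],
     'X': ['bV', 'aaX']}
print(useless_nonterminals(set('SABCVWX'), set('abc'), 'S', P))
# the three useless nonterminals: V, W, and X
```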
This indicates that there is no sequence of productions starting with the start symbol S that will produce a string containing W. Checking whether a nonterminal such as V can produce a terminal string is tantamount to checking whether the language accepted by Ã is nonempty, where Ã is A_G with the start state moved to the state labeled <V>. Since the languages accepted from <V> and from <X> are both empty, V and X are useless.

Figure 12.4 The automaton discussed in Example 12.3

12.2 OTHER DECIDABLE QUESTIONS

It is fairly easy to find succinct algorithms that answer most of the reasonable questions one might ask about representations of regular languages. For each of the more complex classes of languages, there are many reasonable questions that are not decidable. Several of these will be presented in the following sections. In this section, we consider some of the answerable questions that can be asked about the more robust machines and grammars.

∇ Theorem 12.7. Given any context-free grammar G, it is decidable whether L(G) = ∅.

Proof. In Theorem 9.3, a scheme was presented that specified how to build several automata that would be used to identify the useless nonterminals in G = <Σ, Γ, S, P>. Since L(G) = ∅ iff the start symbol S is useless, there is an algorithm for testing whether L(G) = ∅. Δ

It is also possible to tell whether a context-free grammar generates a finite or an infinite number of distinct words. The proof is based on the same principle that was employed in the pumping theorem proof: the presence of long strings implies that some useful nonterminal A must derive a nontrivial sentential form containing A. The start symbol S must produce a useful sentential form containing A, and A can then be used to generate an infinite series of distinct strings.

∇ Theorem 12.8. Given any context-free grammar G, it is decidable whether L(G) is infinite.

Proof. Let G = <Σ, Γ, S, P> be a context-free grammar. By Theorem 9.5, there exists a Chomsky normal form grammar G' = <Σ, Γ', S, P'> that is equivalent to G. Let n = 2^{|Γ'|}. By Theorem 9.7, any string in L(G) of length n or greater can be pumped, which implies that L(G) is infinite. An argument similar to that of Lemma 12.1 will show that it is sufficient to check strings in the set {y | y ∈ Σ* ∧ n ≤ |y| < 2n} for membership in L(G). There are only a finite number of derivation sequences that can produce words in this range. The algorithm for determining whether L(G) is infinite will check whether {y | y ∈ L(G) ∧ n ≤ |y| < 2n} is empty; if so, L(G) is finite, and L(G) is infinite otherwise. Δ

The exercises explore more efficient methods for searching for a string that can be pumped. The intimate correspondence between context-free grammars and pushdown automata guarantees that similar questions about PDAs are decidable.

∇ Corollary 12.1. Given any pushdown automaton P, it is decidable whether:

a. L(P) is empty.
b. L(P) is finite.
c. L(P) is infinite.

Proof. By Theorem 10.3, every PDA P has a corresponding context-free grammar G_P. The algorithms described in Theorems 12.7 and 12.8 can be applied to G_P to determine the nature of L(G_P), and since L(P) = L(G_P), the same questions about P are likewise decidable. Δ

Given a particular word x and a context-free grammar G, it is decidable whether x can be generated by G. In fact, this question can be decided for context-sensitive grammars, too. The proof relies heavily on the fact that no sentential form longer than x can possibly generate x in the absence of contracting productions in the grammar.

∇ Theorem 12.9. Given any context-sensitive grammar G and any word x, it is decidable whether x ∈ L(G).

Proof. Let G = <Σ, Γ, S, P> be a context-sensitive grammar and let x ∈ Σ*. It is possible to construct a (finite) graph and apply existing algorithms from graph theory to answer the question of whether G generates x.
The nodes of the graph will correspond to the strings from (Σ ∪ Γ)* of length n or less, where n = |x|. The (directed) edges from a node representing a given sentential form w lead to the strings (of length n or less) that can be generated from w by applying a single production from P. Both the sentential form x and S will appear in this graph, and the question of whether x ∈ L(G) is equivalent to the question of whether there is a path from S to x. There are many standard algorithms for determining paths and components in a graph, and thus the question of whether x ∈ L(G) is decidable. Δ

The generation of all the edges in the graph generally involves more effort than is needed to answer the question. A more efficient method is similar to the recursive calculations used to find the set of connected states in a DFA. Beginning with {S}, the production set P can be consulted to determine the labels of nodes that can be derived from S in one step. These new labels can be added to the set of accessible sentential forms, and the added nodes can be checked until no new labels are found. The set of sentential forms will then consist of {w ∈ (Σ ∪ Γ)* | S ⇒* w ∧ |w| ≤ n} and contain all words in L(G) of length ≤ n. If we are only interested in the specific word x, then the algorithm can return Yes as soon as x appears in the set of accessible sentential forms and would return No if x did not appear by the time the set stopped growing.

The above algorithm will suffice for any grammar that does not contain contracting productions, but it can clearly give wrong answers when applied to type 0 grammars. Since the length of sentential forms can both grow and shrink in unrestricted grammars, the word x may actually be generated by a sequence of productions that at some point generates a sentential form longer than x. Such a sequence would not be considered by the method outlined in Theorem 12.9, and the algorithm might answer No when the correct answer is Yes.
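The bounded search just described can be sketched as a breadth-first exploration of sentential forms of length at most |x|. The pair encoding of productions and the toy non-contracting grammar S → aSb | ab are assumptions of this sketch:

```python
def csg_membership(start, productions, x):
    """Decide x in L(G) for a grammar with no contracting productions
    by exploring sentential forms of length <= |x| (Theorem 12.9).
    productions: (alpha, beta) pairs with len(alpha) <= len(beta)."""
    n = max(len(x), 1)
    seen = {start}
    frontier = [start]
    while frontier:
        w = frontier.pop()
        if w == x:
            return True
        for alpha, beta in productions:
            i = w.find(alpha)
            while i != -1:             # rewrite every occurrence of alpha
                v = w[:i] + beta + w[i + len(alpha):]
                if len(v) <= n and v not in seen:
                    seen.add(v)
                    frontier.append(v)
                i = w.find(alpha, i + 1)
    return False                       # the set of forms stopped growing

# Toy non-contracting grammar: S -> aSb | ab, so L(G) = {a^k b^k : k >= 1}
P = [('S', 'aSb'), ('S', 'ab')]
print(csg_membership('S', P, 'aabb'))   # True
print(csg_membership('S', P, 'aab'))    # False
```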
We could define a procedure that looked at larger and larger graphs (consisting of more and longer sentential forms), which would halt and answer Yes if a derivation sequence for x was discovered. If x actually can be generated by G, this method will eventually uncover the appropriate sequence. We therefore have a procedure that will reliably tell us if a word can be generated by an unrestricted grammar. Unless we include a specification of when to stop and answer No, however, this procedure is not an algorithm. In later sections, we will see that it is impossible to determine, for an arbitrary type 0 grammar G, if an arbitrary word x is not generated by G. The question of whether x ∈ L(G) is not decidable for arbitrary grammars.

It turns out that there are many reasonable questions such as this one that cannot be determined algorithmically. We begin our overview of undecidable problems with an analysis of a very reasonable question concerning Pascal programs. Subsequent sections consider undecidable questions concerning the grammars and machines covered in this text.

12.3 AN UNDECIDABLE PROBLEM

Having now developed a false sense of security about our ability to produce algorithms for determining many properties of machines and languages, we now step back and see whether there is anything we cannot do algorithmically. A simple counting argument will show that there are too many things to calculate and not enough algorithms with which to calculate them all. It may be helpful to review the section on cardinality in Chapter 0 and recall that there are different orders of infinity. A diagonalization argument showed that the natural numbers cannot be put in one-to-one correspondence with the real numbers; there are simply too many real numbers to allow such a matching to occur. A similar mismatch occurs when comparing the (countable) number of algorithms to the (uncountable) number of possible yes-no functions.
By definition, an algorithm is a finite list of instructions, written over some finite character set. As such, there are only a countable number of different algorithms that can be written. It may be helpful to consider the set of all Pascal programs and view each file that contains the ASCII code for a program, which is essentially a sequence of zeros and ones, as one very long binary integer. Clearly, an infinite number of Pascal programs can be written, but no more programs than there are binary integers, so the number of such files is indeed countable.

Now consider the possible lists of answers that could be given to questions involving a countable number of instances. We will argue that there are an uncountable number of yes-no patterns that might describe the answers to such questions. Notice that the descriptions for automata, grammars, and the like are also finite, and thus there are a countable number of DFAs, a countable number of grammars, and so on, that can be described. The questions we asked in the previous sections were therefore applied to a countable number of instances, and these instances could be ordered in some well-defined way, much as the natural numbers are ordered. If we think of a yes response corresponding to the digit 1 and a no response corresponding to 0, then the corresponding series of answers to a particular question can be thought of as an unending sequence of 0s and 1s. By placing a binary point at the beginning of the sequence, each such pattern can be thought of as a binary fraction, representing a real number between .00000... = 0 and .11111... = 1. Conversely, each such real number in this range represents a sequence of yes-no answers to some question. Since there are an uncountable number of real numbers between 0 and 1, there are an uncountable number of answer patterns that might be of interest to us. Some of these patterns cannot be produced by algorithms, since there are not enough algorithms to go around.
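The diagonalization recalled above can be made concrete: given any enumeration of yes-no answer sequences, one can construct a sequence that differs from the i-th enumerated sequence in position i, so the enumeration cannot be complete. The particular enumeration below is a toy assumption used only to exercise the construction:

```python
def diagonal(enumeration, k):
    """Cantor-style diagonal: flip the i-th answer of the i-th sequence,
    producing a yes-no pattern (1 = yes, 0 = no) that disagrees with
    every enumerated sequence somewhere."""
    return [1 - enumeration(i)[i] for i in range(k)]

# Toy enumeration: sequence i answers "yes" exactly on multiples of i + 1
def enum(i):
    return [1 if j % (i + 1) == 0 else 0 for j in range(10)]

d = diagonal(enum, 10)
print(all(d[i] != enum(i)[i] for i in range(10)))   # True
```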
Thus, there must be many questions that are not decidable. It is not immediately apparent that the existence of undecidable questions is much of a drawback; perhaps all the "interesting" questions are decidable. After all, there are an uncountable number of real numbers, yet all computers and many humans seem to make do with just the countable number of rational numbers. Unfortunately, there are many simple and meaningful questions that are undecidable. We discuss one such question now; others are considered in the next section.

Just about every programmer has had the experience of running a program that never produces any output and never shows any sign of halting. For programs that are fairly short, this is usually not a problem. For major projects that are expected to take a very long time, there comes an agonizing moment when we have to give up hope that the program is on the verge of producing a useful answer and stop it on the assumption that it has entered an infinite loop. While it would be very nice to have a utility that would look over a program and predict how long it would run, most of us would settle for a device that would simply predict whether or not it will ever halt.

It's a good bet that you have never used such a device, which may at first seem strange, since a solution to the halting problem would certainly provide information that would often be useful. If you have never thought about this before, you might surmise that the scarcity of such programs is a consequence of any one of several limiting factors. Perhaps they are inordinately expensive to run, or no one has taken the time to implement an existing scheme, or perhaps no one has yet figured out how to develop the appropriate algorithms. In actuality, no one is even looking for a "halting" algorithm, since no such algorithm can possibly exist. Let us consider the implications that would arise if such an algorithm could be programmed in, say, Pascal.
We can consider such an algorithm to be implemented as a Boolean function called HALT, which looks at whatever program happens to be in the file named data.p and returns the value TRUE if that program will halt, and returns FALSE if the program in data.p would never halt. Perhaps this function is general enough to look at source code for many different languages, but we will see that it is impossible for it to simply respond correctly even when looking solely at Pascal programs. The programmer of the function HALT would likely have envisioned it being used in a program such as CHECK, shown in Figure 12.5. We will use it in a slightly different way and show that a contradiction arises if HALT really did solve the halting problem. Our specific assumptions are that:

1. HALT is written in Pascal.
2. HALT always gives a correct answer to the halting problem, which means:
   a. It always returns an answer after a finite amount of time.
   b. The answer returned is FALSE if the Pascal program in data.p would never halt.
   c. The answer returned is TRUE if the Pascal program in data.p would halt (or if the program in data.p will not compile).

program CHECK;  { envisioned usage of HALT }

  function HALT: boolean;
  begin
    { marvelous code goes here }
  end; { HALT }

begin { CHECK }
  if HALT
    then writeln('The program in file data.p will halt')
    else writeln('The program in file data.p will not halt')
end. { CHECK }

Figure 12.5 A possible usage of HALT

Consider the program TEST in Figure 12.6, which is structured so that it will run forever if the function HALT indicates that the program in the file data.p would halt, and simply quits if HALT indicates that the program in data.p would not halt. Some interesting things happen if we run this program after putting a copy of the source code for TEST in data.p. If HALT does not produce an answer, then HALT certainly does not behave as advertised, and we have an immediate contradiction. HALT is supposed to be an algorithm, so it must eventually return with an answer. Since HALT is a Boolean function, we have only two cases to consider.

program TEST;  { to be placed in the file data.p }

  var FOREVER: boolean;

  function HALT: boolean;
  begin
    { marvelous code goes here }
  end; { HALT }

begin { TEST }
  FOREVER := false;
  if HALT
    then
      repeat
        FOREVER := false
      until FOREVER
    else
      writeln('This program halts')
end. { TEST }

Figure 12.6 Another program incorporating HALT

Case 1: HALT returns a value of TRUE to the calling program TEST. This has two consequences, the first of which is implied by the asserted behavior of HALT.

i. If HALT does what it is supposed to do, this means that the program in data.p halts. We ran this program with the source code for TEST in data.p, so TEST must actually halt.

The second consequence comes from examining the code for TEST and noting what happens when HALT returns TRUE.

ii. The if statement in the program TEST then causes the infinite loop to be entered, and TEST runs forever, doing nothing particularly useful.

Our two consequences are that TEST halts and TEST does not halt. This is a clear contradiction, and so case 1 never occurs.

Case 2: HALT returns a value of FALSE to the calling program TEST. This likewise has two consequences. Considering the advertised behavior of HALT, this must mean that the program in data.p, namely TEST, does not halt. However, the code for TEST shows that if HALT returns FALSE, we execute the else clause, write one line, and then stop. TEST therefore halts. TEST must again both halt and not halt.

Whichever way we turn, we reach a contradiction. The only possible conclusion is that the function HALT does not behave as advertised. It must either return no answer or give an incorrect answer. It should be clear that the problem cannot be fixed by having the programmer who proposed the function HALT fiddle with the code; the above contradiction will be reached regardless of what code appears between the begin and end statements in the function HALT. We have shown that any such proposed function is guaranteed to behave inappropriately when fed a program such as TEST. In actuality, there are an infinite number of programs that cause HALT to misbehave, but it was sufficient to demonstrate just one failure to justify that no such function can solve the general problem.

The above argument demonstrates that the halting problem for Pascal programs is undecidable, or unsolvable. That is, there does not exist a Pascal program that can always decide correctly, when fed an arbitrary Pascal program, whether that program halts. If we were to define an algorithm as "something that can be programmed in Pascal," we would have shown that there is no algorithm for deciding whether an arbitrary Pascal program halts. One might suspect that this is therefore not a very satisfying definition of what an algorithm is, since we have a concise, well-stated problem that cannot be solved using Pascal. It is generally agreed that the problem does not lie with some overlooked feature that was inadvertently omitted from Pascal. Clearly, all programming languages suffer from similar inadequacies. For example, an argument similar to the one presented for Pascal would show that no C program can be devised that can tell which C programs halt. Thus, no other programming language can provide a more robust definition of what an algorithm is.

There are variations on this theme that likewise lead to contradictions. Might there be a Pascal program that can check which C programs halt?
If you believe that every Pascal program can be rewritten as an equivalent C program, the answer is definitely no; a Pascal program that checks C programs could then be rewritten as a C program (which checks C programs), and we again reach a contradiction. It is generally agreed that the limitations do not arise from some correctable inadequacy in our current methods of implementing algorithms. That is, the limitations of algorithmic solutions seem to be inherent in the nature of algorithms. Programming languages, Turing machines, grammars, and all other proposed systems for implementing algorithms have been shown to be subject to the same limitations in computational power. The use of Turing machines to implement algorithms has several implications that apply to the theory of languages. These are explored in the following sections.

12.4 TURING DECIDABILITY

In the previous section, we saw that no Pascal program could always correctly predict when another Pascal program would halt. A similar statement was true for C programs, and Turing machines, considered as computing devices, are no different; no Turing machine solves the halting problem. Each of us is probably familiar with the way in which a Pascal program reads a file, and hence it is not hard to imagine a Pascal program that reacts to the code for another Pascal program. As long as the input alphabet contains at least two symbols, encodings can be defined for the structure of a Turing machine, which allows the blueprint for its finite-state control to be placed on the input tape of another Turing machine. A binary encoding might be given for the number of states, followed by codes that enumerate the moves from each of the states. Just as we are not presently concerned about the exact ASCII codes that define the individual characters in a file containing a Pascal program, we need not be concerned with the specific representation used to encode a Turing machine on an input tape.
Consider input tapes that contain the encoding of a Turing machine, followed by some delimiter, followed by an input word. Assume there exists a Turing machine H that, given such an encoding of an arbitrary machine and an input word, always correctly predicts whether the Turing machine represented by that encoding halts for that particular word. This assumption leads to a contradiction exactly as shown in the last section for Pascal programs. We would be able to use the machine H as a submachine in another Turing machine that halts exactly when it is not supposed to halt, and thereby show that H cannot possibly behave properly.

▽ Theorem 12.10. Given a Turing machine M and a word w, it is undecidable whether M halts when the string w is placed on the input tape.

Proof. The proof is essentially the same argument that was presented in the last section. △

We will see that the unsolvability of the halting problem implies that it is not decidable whether a given string will cause a Turing machine to halt and print Y. If a word is accepted, this fact can eventually be discovered, but we cannot reliably tell which words are rejected by an arbitrary Turing machine. If we could, we would have an algorithm for computing the complement of any Turing-acceptable language. In the next section, we will show that there are Turing-acceptable languages whose complements are not Turing acceptable, which means that a general algorithm for computing complements cannot exist. A problem equivalent to the halting problem involves the question of whether an arbitrary type 0 grammar generates a given word. This can be seen to be almost the same question as was asked of Turing machines.

▽ Theorem 12.11. Given a type 0 grammar G and a word w, it is undecidable whether G generates w.

Proof. If this question were decidable, it would provide an algorithm for solving the halting problem, which is known to be undecidable.
That is, if there existed an algorithm for deciding whether w ∈ L(G), there would also be an algorithm for deciding whether w is accepted by a Turing machine. The Turing machine algorithm would operate as follows: Given an arbitrary Turing machine M, modify M to produce M′, an equivalent machine that halts only when it accepts. Use Definition 11.8 to find the corresponding type 0 grammar GM′, which is also equivalent to M. The algorithm that predicts whether w ∈ L(GM′) can now be used to decide whether M halts on input w. This scheme would therefore solve the halting problem for an arbitrary Turing machine, and hence the algorithm that predicts whether w ∈ L(G) cannot exist. Thus, w ∈ L(G) is undecidable for arbitrary type 0 grammars. △

Given the intimate correspondence between Turing machines and type 0 grammars, it is perhaps not surprising that it is just as hard to solve the membership question for type 0 grammars as it was to solve the halting problem for Turing machines. We now consider a question that may initially appear to be more tractable than the halting problem. However, it will be shown to be unsolvable by the same reasoning used in the last theorem: if this question could be decided, then the halting problem would be decidable.

▽ Theorem 12.12. Given an arbitrary Turing machine T, it is undecidable whether T accepts λ.

Proof. Assume that there exists an algorithm for deciding whether T accepts λ. That is, assume that there exists a Turing machine X that, when fed an encoding of any Turing machine T, halts with Y when T would accept λ and halts with N whenever T rejects λ. X can then be used to determine whether an arbitrary Turing machine M would accept an arbitrary word x. Given a machine M and a string x, it is easy to modify M to produce a new Turing machine TMx, which accepts λ exactly when M accepts x.
TMx is formed by adding a new start state that checks whether the read head is initially scanning a blank (that is, whether λ is on the input tape). If not, control remains in this state, and TMx never halts. However, if the initially scanned symbol is a blank, new states are used to write x on the input tape and return the read head to the leftmost symbol of x. Control then passes to the original start state of M. In this manner, TMx accepts λ exactly when M accepts x. This correspondence makes it possible to use the Turing machine X as a submachine in another Turing machine XH that solves the halting problem. That is, given an input tape with an encoding of a machine M followed by the symbols for a word x, XH can easily be programmed to modify the encoding of M to produce the encoding of TMx and leave this new encoding on the input tape before passing control to the submachine X. XH then accepts exactly when TMx accepts λ, which happens exactly when M halts on input x. XH would therefore represent an algorithm for solving the halting problem, which we know cannot exist. The portion of the machine that modifies the encoding of M is quite elementary, so it must be the submachine X that cannot exist. Thus, there is no algorithm that can accomplish the task for which X was designed, that is, determining whether an arbitrary Turing machine T accepts the empty string. △

The conclusion that X was the portion of XH that behaves improperly is akin to the observation in the previous section that the main part of the Pascal program TEST was valid, and hence it must be the function HALT that behaves incorrectly.

12.5 TURING-DECIDABLE LANGUAGES

We now consider languages whose criterion for membership is related to the halting problem. Define the language D to be the set of words that either are not encodings of Turing machines or are encodings of machines that would halt with Y when presented with their own encoding on their input tape.
The language D is Turing acceptable, since a multitape machine could copy the input word to a second tape, check whether the encoding truly represents a valid Turing machine, and then use the "directions" on the second tape to simulate the action of the encoded machine on the original input. The multitape machine would halt with Y if the encoding was invalid or if the simulated machine ever accepts. On the other hand, the complement of D is not Turing acceptable. Let U be the set of all valid encodings of Turing machines that do not halt when fed their own encodings. Then U = ~D, and there does not exist a machine T for which L(T) = U. If such a machine existed, it would have an encoding, and this leads to the same problem encountered with the HALT function in Pascal. The encoding of T is either a word in U or is not a word in U; both cases lead to contradictions. If the encoding of T belongs to U, then by the definition of U it does not halt when fed its own encoding. But the assumption that L(T) = U requires that T halt with Y for all encodings belonging to U, which means T must halt when fed its own encoding. A similar contradiction is also reached if the encoding of T does not belong to U. Therefore, no such Turing machine T can exist, and U is an example of a language that is not Turing acceptable.

We have finally found a language that is not type 0. A counting argument would have shown that, since there are only a countable number of type 0 grammars and an uncountable number of subsets of Σ*, there had to be many languages over Σ that are not in 𝒯Σ (the class of type 0 languages). We now see that some of these unrepresentable languages are meaningful sets that it would be quite desirable to be able to recognize or generate.

▽ Theorem 12.13. If ‖Σ‖ ≥ 2, then 𝒯Σ is not closed under complementation.

Proof. Encodings of arbitrary Turing machines can be effectively accomplished with only two distinct symbols in the alphabet.
The Turing-acceptable language D described above has a complement U that is not Turing acceptable. △

Our original criterion for belonging to the language L accepted by a Turing machine M implied that M would eventually halt when presented with any word in L, but we had no guarantees about how M would behave when presented with a word that is not in L. M may halt with N on the tape, or M may run forever. Indeed, we have just seen a Turing-acceptable language for which this is the best we can hope for. Turing machines therefore embody procedures, which are essentially deterministic sets of step-by-step instructions. We now consider the languages accepted by the subclass of Turing machines that correspond to algorithms: procedures that are guaranteed to eventually halt under all circumstances.

▽ Definition 12.2. Let Σ be an alphabet. Define ℋΣ to be the collection of all languages that can be recognized by Turing machines that halt on all input. △

Languages in ℋΣ are called Turing-decidable languages. A trivial modification shows that L ∈ ℋΣ if there exists a Turing machine that not only halts upon placing a Y after the input word on an otherwise blank tape for accepted words, but similarly preserves the input word and prints N for each rejected string. Such devices will be referred to as halting Turing machines.

▽ Theorem 12.14. ℋΣ is closed under complementation.

Proof. Let L be a Turing-decidable language. Then there must exist a Turing machine H for which L(H) = L and that halts with Y or N for all strings in Σ*. The finite-state control of H can easily be modified to produce a Turing machine H′ for which L(H′) = ~L. All that is required is to replace every transition in H that prints N with a similar transition that prints Y, and likewise make sure that N is printed by H′ whenever H prints Y. △

This result has some immediate consequences.

▽ Corollary 12.2.
There is a language that is Turing acceptable but not Turing decidable. That is, ℋΣ ⊊ 𝒯Σ.

Proof. Definition 12.2 implies that ℋΣ ⊆ 𝒯Σ. By Theorems 12.13 and 12.14, these two classes have different closure properties, and thus they cannot be equal. Therefore, ℋΣ ⊊ 𝒯Σ. △

Actually, we have already seen a language that is Turing acceptable but not Turing decidable. D was shown to be Turing acceptable, but if D were Turing decidable, then its complement would be Turing decidable by Theorem 12.14. However, ~D = U, and U is definitely not Turing decidable, since it is not even Turing acceptable. 𝒪Σ, the class of context-sensitive languages, is another subclass of 𝒯Σ. It is possible to determine how ℋΣ relates to 𝒪Σ and thereby insert ℋΣ into the language hierarchy.

▽ Corollary 12.3. Every context-sensitive language is Turing decidable. That is, 𝒪Σ ⊆ ℋΣ.

Proof. This is actually a corollary of Theorem 12.9. Given a type 1 language L, there is a context-sensitive grammar G that generates L. The proof of Theorem 12.9 presented an algorithm for determining whether an arbitrary word can be generated by G. This algorithm can be implemented as a Turing machine TG that determines whether a given word can be generated by G and always halts with the correct answer. Thus, L is Turing decidable. △

These implications provide the missing element in the proof of Theorem 11.10, as stated in the next corollary.

▽ Corollary 12.4. The class of context-sensitive languages is properly contained in the class of Turing-acceptable languages. That is, 𝒪Σ ⊊ 𝒯Σ.

Proof. By the previous corollaries, 𝒪Σ ⊆ ℋΣ and ℋΣ ⊊ 𝒯Σ. △

Actually, the context-sensitive languages are properly contained in ℋΣ. This will be shown by exhibiting a language that is recognized by a Turing machine that halts on all inputs, but that cannot be generated by any context-sensitive grammar. The following proof, based on diagonalization, should by now look familiar.

▽ Theorem 12.15.
Let Σ be an alphabet for which ‖Σ‖ ≥ 2. There is a language that is Turing decidable but not context sensitive. That is, 𝒪Σ ⊊ ℋΣ.

Proof. By Corollary 12.3, 𝒪Σ ⊆ ℋΣ, and it remains to be shown that there is a member of ℋΣ that is not a member of 𝒪Σ. By the technique described in Theorem 12.9, every context-sensitive grammar can be represented by a halting Turing machine, and each such Turing machine has an encoding of its finite-state control. Define L to be the set of all encodings of Turing machines that:

1. Represent context-sensitive grammars.
2. Reject their own encoding.

Providing a scheme for encoding the quadruple for a context-sensitive grammar is left for the exercises. Any reasonable encoding scheme will make it a simple task to determine whether a candidate string represents nonsense or a valid context-sensitive grammar. L can therefore be recognized by a halting Turing machine that:

1. Checks whether the string on the input tape represents the encoding of a valid context-sensitive grammar.
2. Calculates the encoding of the corresponding Turing machine.
3. Simulates that Turing machine being fed its own encoding.

This process is guaranteed to halt, since the Turing machine being simulated is known to be a halting Turing machine. Thus, L ∈ ℋΣ. However, if L ∈ 𝒪Σ, we find ourselves in a familiar dilemma. If there is a context-sensitive grammar GL that generates L, then this grammar would have a corresponding Turing machine TL, which would have an encoding xL. If xL did not belong to L, then by the definition of L it would be an encoding of a machine (TL) that did not reject its own encoding (xL). Thus, TL recognizes xL, and therefore the corresponding grammar GL must generate xL. But then xL ∈ L(GL) = L, contradicting the assumption that xL did not belong to L. If, on the other hand, xL belongs to L, then, by the definition of L, TL must reject its own encoding (xL), and thus xL ∉
L(TL) = L(GL) = L, which is another contradiction. Thus, no such context-sensitive grammar GL can exist, and L is not a context-sensitive language. △

The above diagonalization technique can be generalized: given any enumerable class 𝒞 of languages whose members are all represented by halting Turing machines, there must exist a halting Turing machine that recognizes a language not in 𝒞 (see the exercises). The following theorem summarizes how the other classes of languages discussed in the text fit into the language hierarchy.

▽ Theorem 12.16. Let Σ be an alphabet for which ‖Σ‖ ≥ 2. Then

𝒟Σ = 𝒲Σ = ℛΣ = 𝒢Σ ⊊ 𝒜Σ ⊊ 𝒞Σ = 𝒫Σ ⊊ ℒΣ = 𝒪Σ ⊊ ℋΣ ⊊ 𝒯Σ ⊊ 𝒫(Σ*)

Proof. The relationship between the type 0, type 1, type 2, and type 3 languages was outlined in Theorem 11.10. Theorem 10.8 showed that 𝒜Σ properly lies between the type 3 and type 2 languages. Corollary 12.2 and Theorem 12.15 show that ℋΣ properly lies between the type 1 and type 0 languages, and also show that the type 1 languages are a proper subset of the type 0 languages. The existence of languages that are not Turing acceptable shows that 𝒯Σ is properly contained in 𝒫(Σ*). A counting argument shows that the proper containment of 𝒯Σ in 𝒫(Σ*) also holds even if Σ is a singleton set. △

The relationships between six distinct and nontrivial classes of languages are characterized by Theorem 12.16. Each of these classes is defined by a particular type of automaton. The trivial class of all languages, 𝒫(Σ*), was shown to have no mechanical counterpart. We have seen that type 3 languages appear in many useful applications. Program design, lexical analysis, and various engineering problems are aided by the use of finite automaton concepts. Programming languages are always defined in such a way that they belong to the class 𝒜Σ, since compilers should operate deterministically.
The theory of compiler construction builds on the material presented here; syntactic analysis, the translation from source code to machine code, is guided by the generation of parse trees for the sentences in the program, which in turn give meaning to the code. The type 0 languages represent the fundamental limits of mechanical computation. The concepts presented in this text provide a foundation for the study of computational complexity and other elements of computation theory.

EXERCISES

12.1. Verify the assertions made in the proof of Theorem 12.1 concerning Theorem 2.7.
12.2. Prove Lemma 12.1.
12.3. Given an FAD language L, the minimal DFA accepting L, and another machine B for which L(B) = L, prove that the number of nonfinal states in the minimal machine must be equal to or less than the number of nonfinal states in B.
12.4. Given two DFAs A1 = <Σ, S1, s01, δ1, F1> and A2 = <Σ, S2, s02, δ2, F2>, show that it is decidable whether L(A1) ⊆ L(A2).
12.5. Given any alphabet Σ and a DFA A = <Σ, S, s0, δ, F>, show that it is decidable whether L(A) is cofinite. (Note: A set L is cofinite iff its complement is finite, that is, iff Σ* − L is finite.)
12.6. Given any alphabet Σ and a DFA A = <Σ, S, s0, δ, F>, show that it is decidable whether L(A) contains any string of length greater than 1228.
12.7. Given any alphabet Σ and a DFA A = <Σ, S, s0, δ, F>, show that it is decidable whether A accepts any even-length strings.
12.8. Given any alphabet Σ and regular expressions R1 and R2 over Σ, show that it is decidable whether R1 and R2 represent languages that are complements of each other.
12.9. Given any alphabet Σ and regular expressions R1 and R2 over Σ, show that it is decidable whether R1 and R2 describe any common strings.
12.10. Given any alphabet Σ and a regular expression R1 over Σ, show that it is decidable whether there is a DFA with less than 31 states that accepts the language described by R1.
12.11.
Given any alphabet Σ and a regular expression R1 over Σ, show that it is decidable whether there is a DFA with more than 31 states that accepts the language described by R1. (You should be able to argue that there is a one-step algorithm that always supplies the correct yes-no answer to this question.)
12.12. Given any alphabet Σ and a regular expression R over Σ, show that it is decidable whether there exists an NDFA (with λ-moves) with at most one final state that accepts R.
12.13. Given any alphabet Σ and a DFA A = <Σ, S, s0, δ, F>, show that it is decidable whether there exists an NDFA (without λ-moves) with at most one final state that accepts the same language A does.
12.14. Given any alphabet Σ and regular expressions R1 and R2 over Σ, show that it is decidable whether R1 = R2.
12.15. Given any alphabet Σ and regular expressions R1 and R2 over Σ (which represent languages L1 and L2, respectively), show that it is decidable whether they generate the same right congruences (that is, whether R_L1 = R_L2).
12.16. Prove Theorem 12.5.
12.17. Outline an efficient algorithm for computing {δ̄(s0, y) | y ∈ Σ* ∧ n ≤ |y| < 2n} in the proof of Theorem 12.2, and justify why your procedure always halts.
12.18. Consider intersecting the set {δ̄(s0, y) | y ∈ Σ* ∧ 5n ≤ |y| < 6n} with F to answer the question posed in Theorem 12.2. Would this strategy always produce the correct answer? Justify your claims.
12.19. Show that it is decidable whether two Mealy machines are equivalent.
12.20. Show that it is decidable whether two Moore machines are equivalent.
12.21. Given any alphabet Σ and a regular expression R, show that it is decidable whether R represents any strings of length greater than 28. Give an argument that does not depend on finite automata or grammars.
12.22. Given any alphabet Σ and a right-linear grammar G, show that it is decidable whether L(G) contains any string of length greater than 28.
Give an argument that does not depend on finite automata or regular expressions.
12.23. Refer to the proof of Theorem 12.6 and prove that Z0 ⊆ Z1 ⊆ ⋯ ⊆ Zn ⊆ ⋯ ⊆ N, where N is the set of nonterminals.
12.24. Refer to the proof of Theorem 12.6 and prove that if (∃m ∈ ℕ)(Zm = Zm+1), then Zm will represent the set of all nonterminals that can be reached from the start symbol S.
12.25. Refer to the proof of Theorem 12.6 and prove that (∃m ∈ ℕ)(Zm = Zm+1).
12.26. (a) Refer to the proof of Theorem 12.6 and give a formal definition of Wi. (b) Prove that W0 ⊆ W1 ⊆ ⋯ ⊆ Wn ⊆ ⋯ ⊆ N.
12.27. Refer to the proof of Theorem 12.6 and prove that if (∃m ∈ ℕ)(Wm = Wm+1), then Wm will represent the set of all nonterminals that can produce valid terminal strings.
12.28. Refer to the proof of Theorem 12.6 and prove that (∃m ∈ ℕ)(Wm = Wm+1).
12.29. Let A be an arbitrary NDFA (with λ-moves). A string processed by A may successfully find several paths through the machine; it is also possible that a string will be rejected because there are no complete paths available.
(a) Show that it is decidable whether there exists a string with no complete path in A.
(b) Show that it is decidable whether there exists a string that has at least one path through A that leads to a nonfinal state.
(c) Show that it is decidable whether there exists a string accepted by A for which all complete paths lead to final states.
(d) Show that it is decidable whether all strings accepted by A have the property that all their complete paths lead to final states.
(e) Show that it is decidable whether all strings have unique paths through A.
12.30.
Given two DFAs A1 = <Σ, S1, s01, δ1, F1> and A2 = <Σ, S2, s02, δ2, F2>:
(a) Show that it is decidable whether there exists a homomorphism between A1 and A2.
(b) Show that it is decidable whether there exists an isomorphism between A1 and A2.
(c) Show that it is decidable whether there exist more than three isomorphisms between A1 and A2. (Note: There are examples of disconnected DFAs for which more than three isomorphisms do exist!)
12.31. Given any alphabet Σ and a regular expression R1 over Σ, show that it is decidable whether R1 describes an infinite number of strings. Do this by developing an algorithm that does not depend on the construction of a DFA, that is, does not depend on Theorem 12.2.
12.32. Given a Mealy machine M and a Moore machine A, show that it is decidable whether M is equivalent to A.
12.33. Given any alphabet Σ and regular expressions R1 and R2 over Σ, show that it is decidable whether the language represented by R2 properly contains that of R1.
12.34. It can be shown that it is decidable whether L(A) = ∅ for any NDFA A by first finding the equivalent DFA Ad and applying Theorem 12.1 to that machine.
(a) Give an efficient method for answering this question that does not rely on the conversion to a DFA.
(b) Give an efficient method for testing whether L(A) is infinite for any NDFA A. Your method should likewise not rely on the conversion to a DFA.
12.35. Given a DPDA M, show that it is decidable whether L(M) is a regular set.
12.36. (a) Refer to Theorem 12.9 and outline an appropriate algorithm for determining paths in the graphs discussed. (b) Give the details for a more efficient recursive algorithm.
12.37. Prove that ℋΣ is closed under: (a) Union (b) Intersection (c) Concatenation (d) Reversal
12.38. Let L = L2(T) for some Turing machine T that halts on all inputs. That is, let L consist of all strings that cause T to halt with Y somewhere on the tape. Prove that there exists a halting Turing machine T′ for which L = L2(T′) = L(T′).
T′ must:
1. Halt on all input.
2. Place a Y after the input word on an otherwise blank tape for accepted words.
3. Place an N after the input word on an otherwise blank tape for rejected words.
12.39. (a) Assume there is a Turing machine MX that determines whether an encoding of a Turing machine T belongs to some set X. Let the class of languages recognized by Turing machines with encodings in X be denoted by 𝒞. Prove that if every encoding in X represents a halting Turing machine, then there must exist a halting Turing machine that recognizes a language not in 𝒞.
(b) Apply part (a) to prove Theorem 12.15.
12.40. (a) Outline a scheme for encoding the quadruple of context-sensitive grammars suitable for use by a Turing machine. You may assume that there are exactly two terminal symbols, but note that your scheme must be able to handle an unrestricted number of nonterminals.
(b) Outline the algorithm that a Turing machine might use to decide whether an input string represents the encoding of a valid context-sensitive grammar.
12.41. Show that it is undecidable whether L(X) = ∅ for: (a) Arbitrary Turing machines X (b) Arbitrary halting Turing machines X (c) Arbitrary context-sensitive grammars X (d) Arbitrary linear bounded automata X
12.42. Show that it is undecidable whether L(X) = Σ* for: (a) Arbitrary Turing machines X (b) Arbitrary halting Turing machines X (c) Arbitrary context-sensitive grammars X (d) Arbitrary linear bounded automata X (e) Arbitrary context-free grammars X (f) Arbitrary pushdown automata X
12.43. Consider the set E of all encodings of Turing machines that halt on input λ. Prove or disprove: (a) E ∈ 𝒯Σ (b) E ∈ ℋΣ (c) E ∈ 𝒪Σ
12.44. Consider the set N of all encodings of Turing machines that do not halt on input λ. Prove or disprove: (a) N ∈ 𝒯Σ (b) N ∈ ℋΣ (c) N ∈ 𝒪Σ
INDEX

𝒜Σ, 358
Absorption laws, 3
Accept, 31, 35, 122
  by empty stack, 330, 335
  by final state, 41
  circuitry, 51-53
  Pascal implementation, 37
Acceptor. (see Finite automaton)
Accepting state, 23
Accessible state, 87
Addition automaton, 238-40
Algorithm, 86, 406
  for finding connected states, 100, 107-8, 114
  for finding the state equivalence relation: in a DFA, 103-7; in a FST, 224-25; in a MSM, 236
  for reducing a DFA, 107
  for reducing a transducer, 225, 236
  for solving language equations, 185, 189
Alphabet, 24
  auxiliary, 366, 384
  binary, 24
  input, 30, 119, 211, 329, 366, 384
  output, 211
  stack, 329
Ambiguous context-free grammar, 290
Ambiguous context-free language, inherently, 296
Applications: of finite automata, 54, 123, 129-31, 138-39; of finite-state transducers, 237
ASCII alphabet, 41-42, 47, 51
Associative laws, 3, 181
Automaton: induced by a right congruence, 69, 71-72 (see Finite automaton, Linear bounded automaton, Mealy sequential machine, Moore sequential machine, Pushdown automaton, Turing machine)
Auxiliary tape of a PDA, 330
Backus-Naur form, 18, 44, 127, 253
Balanced parentheses language, 372
Basic machines, 182-83
Basis step, 16
Biconditional, 2
Bijection, 11, 94
Binary alphabet, 24
Binary operator, 146
Black box model, 28
Blank symbol, 366
BNF.
(see Backus-Naur form) Bottom of the stack, 328 symbol, 329 '€1:,296 Clanguage,40,421 Canonical form, 260, 276-78, 301 Chomsky normal form, 308 Greibach normal form, 310 principle disjunctive, 22 Cardinality, 13 Cartesian product, 5, 151,352 Characteristic function, 9 Chomsky normal form, 308 Circuit diagram, 3, 48, 50, 53, 133,140,244 Circuit implementation: of deterministic finite automata,46 of nondeterministic finite automata,131 of nondeterministic finite automata with lambda-moves, 139-40 of finite-state transducers, 242-44 Closure, 146-47 effective, 151 Kleene, 158 of language classes, 146,202-3, 274-75,319-23,352-60, 399-401 reflexive and transitive, 255, 262,335,370 unit, 306 Closure properties: of context-free languages, 319-23 of deterministic context-free languages, 359-60 of finite automaton definable languages, 148-69 of Turing-acceptable languages, 399-401 under language homomorphism, 164-65,320, 359-60,399-400 CNF. (see Chomsky normal form) Coalescence of states, 88, 222, 235 Codomain of a function, 8 Cofinite, 21, 170 Cognitive device, 257 Comment recognition, 56 Commutative laws, 3, 181 Complementation, 142, 148, 162, 322,359,425 Complete parse tree, 285 Composition of functions, 11 Concatenation: of languages, 153-57,275,321, 400 of strings, 25 Configuration: of a pushdown automaton, 335 of a Turing machine, 370 Congruence. (see Right congruence) Congruence modulo n, 6 mod 2 languages, 43 modular arithmetic, 78 Connected machine, 87, 220 Connected version ofa DFA, 98 of a transducer, 220 Connectives, 1 Context-free grammar, 260 ambiguous, 290 decidable questions, 414 equivalence with pushdown automata, 339 unambiguous, 296 pure, 260 Context-free language, 260, 284 closure properties, 319-23 deterministic, 358 inherently ambiguous, 296 unambiguous, 296 Context-sensitive grammar, 258 decidable questions, 416 433 434 Context-sensitive grammar (cont.) 
pure, 258 Context-sensitive language, 258-59,386 Contracting production, 257 Converse relation, 12 Correspondence, one-to-one, 11, 94 Countable set, 15,418 Countably infinite set, 14 Counting automaton, 328-29, 338 Cross product, 5, 151,352 2/)>;,147 DCFL. (see Deterministic context-free languages) Dead state, 117 Decidability, 405, 407 of equivalence of DFAs, 410 of equivalence of regular grammars, 412 of emptiness of regular set, 407 of context-free language, 414 of finiteness of regular set, 409 of context-free language, 414 of membership in CSL, 416 8 function. (seeExtended state transition function) 8 function. (see State transition function) DeMorgan's laws, 3, 151 Denumerable, 15 Derivation, 262, 284-85 leftmost, 289 rightmost, 289 Derivation Tree, 285 Deterministic context-free languages,358 closure properties, 359-60 Deterministic finite automaton, 28 circuit implementation, 46 homomorphism, 93 induced by a right congruence, 69,71-72 isomorphism, 87,94 minimization, 97 software implementation, 32-40 Deterministic pushdown automaton, 354 Deterministic Turing machine, 366,380 DFA. (see Deterministic finite automaton) D flip-flop, 47, 131 Diagonalization technique, 418, 427 Directly derives, 254, 262 Disjoint sets, 7 Disjunctive normal form. (see Principle disjunctive normal form) Distinguishable states, 88-89, 222 Distributive laws, 3, 181 DO loop lookahead, 301 Domain of a function, 8 DPDA. 
(see Deterministic pushdown automaton) Duality, principle of, 3 Editors, text, 54-56 Effective closure, 151 Empty language, 41 Empty set, 41,179 Empty stack criteria for PDA acceptance, 330, 335 Empty string, 27 Empty word, 27 Encoding of states and alphabets, 47-52 End-of-file (EOF) packet, 57-58 End-of-string <EOS>, 47-49, 131 End markers for LBA, 384 Enumerate, 15 Enumeration of move sequences, 381 EOF packet, 57-58 <EOS>, 47-49,131 e, 27, 179 e-move,134 Equality: of sets, 4 of strings, 26 Equation system, 187 algebraic, 185, 188 derived from automaton, 191, 200 derived from grammars, 26970 solution of, 185, 189, 198, 199 Equipotent, 13 Equivalence. (seeEquivalent) Equivalence class, 6, 66 Equivalence relation, 5, 65 rank of, 68 right congruence, 65-68 between states (see State equivalence relation) Equivalent CFG corresponding to PDA,346 Equivalent CSG corresponding to a strict LBA, 395 Equivalent DFA corresponding to an NDFA, 124-25, 130-31 Equivalent finite automata, 74, 97, 123, 127 Index Equivalent finite-state transducers, 217-23, 228-31, 232-36 Equivalent FST corresponding to a MSM, 228 Equivalent grammars, 263 Equivalent logical expressions, 2, 16 Equivalent MSM corresponding to a FST, 229-30 Equivalent NDFA without lambda-moves, 136 Equivalent PDA corresponding to CFG,339-40 Equivalent pushdown automata, , 336,338,349,351 Equivalent regular expressions, 181 Equivalent representations, 264 Equivalent states, 88-89, 222 Equivalent Turing machine corresponding to a type 0 grammar, 391 Equivalent type 0 grammar corresponding to a Turing machine, 389 Evaluation of computer performance,56-57 Existential quantifier, 4 Extended output function, 214, 227 Extended state transition function, 33, 120, 136, 214, 227 C implementation, 40 Pascal implementation, 35 »; 339 Factorial function, 17 FAD. 
(see Finite automaton definable language)
Fetching instructions, 56
Final state, 30, 119, 211, 329
  criteria for PDA acceptance, 335
Finite automata, equivalent, 74, 97, 123, 127
Finite automaton, 23, 28
  C implementation, 40
  derived from regular expression, 182-84
  deterministic, 28, 30
  minimal deterministic, 87
  nondeterministic, 119
  Pascal implementation, 35
Finite automaton definable language, 41
  closure properties, 148-69
Finite rank, of a relation, 68, 70, 80
Finite set, 14, 170, 209, 409, 416
Language (cont.)
  hierarchy, 261, 401, 427
  infinite, 171
  inherently ambiguous, 296
  homomorphism, 163, 164-65, 320, 359-60, 399-400
  operators, 148-69, 201
  parsing, 285-301, 355
  programming, 32-40, 56, 60, 63, 65, 138, 163, 355, 417-22
  regular, 179
  reverse of, 82, 128-29, 172, 206-7
  Turing-acceptable, 257, 370
  Turing-decidable, 424-25
  type k, 257, 259, 260, 269
  unrestricted, 257
  (see Context-free language, Context-sensitive language, Regular language, Turing-acceptable language)
Language acceptor. (see Finite automaton)
Language generator. (see Grammar)
LBA. (see Linear bounded automaton)
Left-linear grammar, 267
Left-linear set equations, 199
Leftmost derivation, 289
Left recursion, 310
  elimination of, 311-15
Length:
  of a derivation, 310
  of a path, 75
  of a string, 25
    with respect to b, 26
  preservation, 164, 214
Letter. (see Symbol)
LEX, 138
Lexicographic order, 381, 402
Linear bounded automaton, 376, 384
  strict, 392
LL(k) grammar, 301, 356-57
Logic gates, 2, 46
Lookahead. (see LL(k) grammar)
M modulo its state equivalence relation, 100, 222, 235
Machine. (see Automaton)
Mathematical induction. (see Induction)
Mealy sequential machine. (see Finite-state transducer)
Minimal deterministic finite automaton, 54, 74, 86-87, 95
Minimization of deterministic finite automata, 97
Minimization of finite-state transducers, 217
Minimization of logic circuitry, 2, 50
Minimization of Moore sequential machines, 234-36
Minimum-state machine, 102, 225, 236
Modulo.
(see Congruence modulo n, M modulo its state equivalence relation)
Moore sequential machine, 225
  minimization, 234-36
MSM. (see Moore sequential machine)
Multihead Turing machine, 377-78
Multiple transitions, 118, 380
Multitape Turing machine, 379
Multitrack Turing machine, 376-77
Mutual exclusion laws, 3
𝒩, 162
Natural numbers, 4, 15
NDFA. (see Nondeterministic finite automaton)
NDPA. (see Nondeterministic pushdown automaton)
Nerode's Theorem, 70
Nondeterministic finite automaton, 116, 266, 303-6, 338
  circuit implementation, 131, 139-40
  with lambda transitions, 134
Nondeterministic pushdown automaton, 329
Nondeterministic Turing machine, 380
Nongenerative production, 306
Nonterminal, 255
  left-recursive, 310
  useful, 302
  useless, 302
Normal forms, 260, 276-78, 308, 310 (see also Canonical forms)
𝒪, 386
Ogden's lemma, 318
Ω. (see Output alphabet)
ω̂ function. (see Extended output function)
ω function. (see Output function)
One-to-one correspondence, 11, 94
One-to-one function, 10
Onto function, 10
Opcodes, 56
Operator:
  binary, 146
  language, 148-69, 201
  unary, 147
Order, lexicographic, 381, 402
Ordered pair, 335
Ordered quadruple, 256, 258, 260, 262, 267
Ordered quintuple, 30, 119, 366
Ordered septuple, 329
Ordered sextuple, 211, 226
Ordered triple, 31, 335
Output alphabet, 211
Output circuitry, 244
Output function, 211
  extended, 214, 227
𝒫, 339
Packets, in Kermit protocol, 57
Pairwise disjoint, 7
Palindrome, 82
  center-marked, 82
Parenthesis checker, 372
Parse trees, 284
  complete, 285
Parsing, 285-301, 355
Partial state equivalence relation, 103, 224, 236
Partition, 7, 70
  induced, 7
  induced by a DFA, 68
Pascal, 65, 417-21
  begin-end pairs, 163
  comments, 56, 63
  DFA implementation, 32-39
Path:
  in the graph corresponding to a CSG, 416
  through an automaton, 75-77
Pattern matching, 55-56, 123, 129, 130
PCNF. (see Pure Chomsky normal form)
PDA. (see Pushdown automaton)
PDNF. (see Principle disjunctive normal form)
Performance evaluation, 56-57
PGNF.
(see Pure Greibach normal form)
Pop off stack, 328
Postfix arithmetic language, 63
Post system, 374
Power set, 13, 124
Precedence rules, 180
Predicate, 3-4
Prefetching instructions, 56-57
Prefix language, 172
Preorder traversal of parse tree, 289-90
Prime length language, 82, 175
Finite state control, 28-29, 211, 366
Finite-state transducer, 211
  circuit implementation, 242-44
  homomorphism, 218, 232
  isomorphism, 218, 232
  minimization, 217
Finite transducer definable, 216
Flip-flop:
  D, 47, 131
  SR, 131
∀, 4
Formula, statement, 1, 16-17
FORTRAN identifier grammar, 253-54
FORTRAN identifier language, 41-42
FORTRAN lookahead problem, 301
FST. (see Finite-state transducer)
FTD. (see Finite transducer definable)
Function, 8
  codomain, 8
  composition, 11
  characteristic, 9
  domain, 8
  factorial, 17
  one-to-one, 10
  onto, 10
  homomorphism, 93, 218, 232
  identity, 13, 65, 81, 165, 203
  injective, 10
  inverse, 12
  isomorphism, 94, 218, 232
  output, 211
    extended, 214, 227
  recursive, 17, 33, 374
  range, 10
  state transition, 30, 33, 119, 211, 366, 384
    extended, 33, 120, 136, 214, 227
  surjective, 10
  translation, for transducer, 215
  well-defined, 8, 69, 96
𝒢, 264
Garbage state, 117
Gates, logic, 2, 46
GCD. (see Greatest common divisor)
Generation of a language, 257
Generative device, 257
GNF.
(see Greibach normal form)
Gödel, K., 374
Grammar, 253
  ambiguous, 290
  context-free, 260
  context-sensitive, 258
  decidable questions, 412-16
  hierarchy, 261, 401, 427
  left-linear, 267
  LL(k), 301, 356-57
  pure, 258, 260
  regular, 267, 269
  right-linear, 262
  unambiguous, 290
  unrestricted, 256
Graph, 31, 416
Greibach normal form, 310
ℋ, 425
Halting problem, 417
  in C, 421
  in Pascal, 419
  for Turing machines, 374
Halt state, 366
Head:
  multiple, 377-78
  read, 28, 210
  read/write, 330, 365
  write, 210
Head recursion, 58-59
Height of a parse tree, 316
Hierarchy:
  of grammars, 253
  of languages, 261, 401, 427
Homomorphism:
  between deterministic finite automata, 93
  between finite-state transducers, 218
  between Moore sequential machines, 232
  language, 163
    closure properties, 164-65, 320, 359-60, 399-400
Identity element, 180
Identity function, 13, 65, 81, 165, 203
Identity relation, 5, 91
Ill-defined. (see Well-defined)
Implementation:
  of deterministic finite automata:
    with hardware, 46
    with software, 32-40
  of nondeterministic finite automata, 131
  of nondeterministic finite automata with lambda-moves, 139-40
  of finite-state transducers, 242-44
Implies, 4
Increasing condition, 313
Induced partition, 7
Induced relation, 67
Induction, 15, 33-34, 126
  strong, 21
Inductive step, 16
Infinite automata, 80
Infinite set, 14, 409, 416
Infix language, 84
Inherently ambiguous language, 296
Initial set, 68, 90, 115, 199
Initial state, 30, 119, 211
Injective function, 10
Input alphabet, 30, 119, 211, 329
Input tape, 28
Instance of a question, 407
Integer, 4
Intersection:
  of two languages, 151-53, 163, 322, 359, 399-400
  with a regular set, 352-53
Inverse function, 12
Inverse homomorphism, 165-67
Isomorphic automata, 95-97, 219
Isomorphism:
  between deterministic finite automata, 87, 94
  between finite-state transducers, 218
  between Moore sequential machines, 232
ith partial state equivalence relation, 103, 224, 236
ith partial state set relation, 107
Kermit protocol, 57-58, 237-38
Kleene closure, 158, 274, 321,
359-60, 400
ℒ, 386
λ. (see Empty string)
λ-calculus, 374
λ-closure, 135
λ-move, 134
λ-transition, 134
Language, 37
  accepted by a deterministic finite automaton, 41
  accepted by a nondeterministic finite automaton, 122
  accepted by a pushdown automaton:
    via empty stack, 335
    via final state, 335
  ambiguous, 290
  cofinite, 170
  context-free, 260, 284
  context-sensitive, 257
  equations, 185, 269-74
  FAD, 41
  finite, 170, 209
  generation of, 257
Principle disjunctive normal form, 2, 50
Procedure, 86, 405 (see Algorithm)
Production, 18, 254-55
  λ-rule, 261
  contracting, 257
  nongenerative, 306
  unit, 306
  useful, 302
  useless, 302
Programming language, 32-40, 56, 60, 63, 65, 138, 163, 355, 417-22
Protocol, Kermit, 57-58, 237-38
Pumping lemma, 75
Pumping theorem, 315
Pure:
  Chomsky normal form, 308
  context-free grammar, 260
  context-sensitive grammar, 258
  Greibach normal form, 310
Pushdown automaton, 327
  closure properties, 352-60
  configuration, 335
  decidable questions, 416
  deterministic, 354-60
  equivalence with context-free grammars, 339, 346
  equivalence with other PDAs, 336, 338, 349, 351
  nondeterministic, 329
  two-stack, 352
Pushdown stack, 328-29
Push onto stack, 328
Quadruple, 256, 258, 260, 262, 267
Question, 407
Quintuple, 30, 119, 366
Quotient:
  of two languages, 169-70
  with a regular language, 169
ℛ, 182
Range of a function, 10
Rank of a relation, 68
Rational number, 4
Read head, 28, 210
Read/write head, 330, 365
Real number, 4
Recognizer.
(see Finite automaton)
Recursion, 17, 58-59
  left, 310
    elimination of, 311-15
Recursive function, 17, 33, 374
Reduced deterministic finite automaton, 91
Reduced machine, 91, 222
Reduced transducer, 222
Reduced version:
  of a DFA, 100
  of a transducer, 222, 235
Refinement, 7-8, 71, 104, 224-25
Reflexive relation, 5, 65-66
Reflexive and transitive closure, 255, 262, 335, 370
Regular expression, 179
  identities, 181
Regular expression grammar, 254-55, 286-87, 340, 345
  with unique delimiters, 355
  deterministic, 356-57
Regular grammar, 267, 269 (see Right-linear grammar, Left-linear grammar)
Regular language, 201 (see Regular set)
Regular set, 179
  closure properties, 201-3
  decidability problems, 405-13
  derived from DFA, 184
Reject, 31, 35, 122
Relabelling of states, 91 (see Homomorphism, Isomorphism)
Relation, 5
  converse, 12
  equivalence, 5, 65
  identity, 5, 91
  induced by a language, 67
  induced by a machine, 68
  refinement, 7-8, 71, 104, 224-25
  reflexive, 5, 65-66
  rank, 68
  right congruence, 66
  state equivalence, 88-89, 222
    ith partial, 103, 224, 236
  symmetric, 5, 65-66
  transitive, 5, 65-66
Relatively prime language, 78
Replacement rule. (see Production)
Reset circuitry, 51
Reverse:
  of a language, 128-29, 172, 206-7
  of a string, 82, 172
  tape processing for addition, 239
Right congruence, 65
  corresponding to a DFA, 71-72
  induced by a language, 67
  induced by a DFA, 68
Right-linear grammar, 261
  constructed from a DFA, 264
  equivalence of left-linear grammar, 269
  yielding a NDFA, 266
Right-linear set equations, 185-99
Rightmost derivation, 289
Roman numeral language, 64
Sack and stone automaton, 328
Scientific notation language, 43-46
Semantics, 287
Sentence, 25
Septuple, 329
Sequential machine. (see Finite-state transducer)
Set, 4
  cardinality, 13
  countable, 15, 418
  denumerable, 15
  empty, 41, 179
  equations (see Language equations)
  finite, 14
  infinite, 14
  uncountable, 15, 418
Sextuple, 211, 226
Ship transmission example, 123
Σ.
(see Alphabet)
Σᵏ, 27
Σ⁺, 27-28
Σ*, 27-28
Simulating machine behavior, 376-81
<SOS>, 51, 131
Stack, 328-29
  alphabet, 329
  bottom symbol, 329
Start-of-string <SOS>, 51, 131
Start state, 30, 119, 211, 329, 366, 384
Start symbol, 255
State:
  accessible, 87
  accepting, 23
  active, 122, 131
  dead, 117
  disconnected, 88
  distinguishable, 88-89, 222
  final, 30, 119, 211, 329
  garbage, 117
  halt, 366
  inaccessible, 88
  initial, 30, 119, 211
  reachable, 87
  start, 30, 119, 211, 329, 366, 384
  unreachable, 88
State equivalence relation:
  decidability application, 410
  for DFAs, 88-89
  for transducers, 222
  partial, 103, 224, 236
Statement, logical, 1
State transition diagram, 31-32
State transition function, 30, 33, 119, 211, 366, 384
  extended, 33, 120, 136, 214, 227
State transition table, 31-32
Stone and sack automaton, 328
String, 25
  concatenation, 25
  empty, 27
  matching, 55-56, 123, 129, 130
  reverse of, 82
Submachine, 368, 372
Subset, 4
  proper, 5
Substitution:
  regular set, 201
    closure properties, 202-3, 320, 359-60, 399-400
  context-free language, 319
Substring, 27
Subtraction grammar, 291-96, 319
Suffix language, 172
Surjective function, 10
Symbol, 24, 28
  blank, 366
  buffer, 349
  end markers for LBA, 384
  nonterminal, 255
  terminal, 255
Symmetric relation, 5, 65-66
Syntax, 18
  correctness, 287
  diagrams, 18
𝒯, 381
Tail recursion, 33, 58-59
Tape:
  auxiliary, 330
  blank, 366
  input, 28
  multitrack, 376-77
  output, 211
  stack, 328
  two-dimensional, 379
Tape head, 28, 210, 330
Terminal set, 90, 115, 187, 191, 199
Terminal symbol, 255
∃, 4
Top of a stack, 328
Traffic signal emulation, 240-42, 248-49
Transducer. (see Finite-state transducer, Moore sequential machine)
Transition function. (see State transition function)
Transitive relation, 5, 65-66
Translation function for transducer, 215
Tree.
(see Derivation tree, Parse trees)
Truth tables, 1-2, 50, 53, 132, 243-44
Turing-acceptable language, 257, 370
  closure properties, 399-401
Turing-decidable language, 424-25
Turing, A., 374
Turing machine, 366
  acceptance criteria, 370-71
  blank symbol, 366
  bounded:
    on one end, 376
    on both ends (see Linear bounded automaton)
  configuration, 370
  corresponding grammar, 389
  deterministic, 366
  encoding, 422
  halt state, 366
  halting problem, 374
  linearly bounded, 376, 384
    strict, 392
  moves, 366-67
  multihead, 377-78
  multitape, 379
  multitrack, 376-77
  nondeterministic, 380
  submachines, 368, 372
  two-dimensional, 379
  undecidable problems, 422
Turing's World, 24, 367, 372, 375
Type 0 grammar. (see Unrestricted grammar)
Type 0 language. (see Turing-acceptable language)
Type 1 grammar. (see Context-sensitive grammar)
Type 1 language. (see Context-sensitive language)
Type 2 grammar. (see Context-free grammar)
Type 2 language. (see Context-free language)
Type 3 grammar. (see Regular grammar)
Type 3 language. (see Regular set)
𝒰, 296
Unambiguous context-free grammar, 296
Unary operator, 147
Uncountable, 15, 418
Undecidable problems:
  Pascal halting problem, 417-22
  Turing machine halting problem, 422
  word acceptance by a Turing machine, 423-24
  word generation in a type 0 grammar, 423
Union, 148-50, 274, 321, 359, 399-400
Unique minimum-state machine, 102, 225, 236
Unit closure, 306
Unit production, 306
Universal quantifier, 4
UNIX, 138
Unrestricted grammar, 256
Unsolvable problem. (see Undecidable problems)
Useful nonterminal, 302
Useful production, 302
Useless nonterminal, 302
Useless production, 302
Vending machine, 29, 54, 212
𝒲, 162
Well-defined function, 8, 69, 96
Word. (see String)
Write head, 210
𝒴, 325
Yield in one step, 254, 262
Yield in k steps, 255, 262
𝒵,