Simple Is Not Easy

Edison Barrios

Forthcoming in Synthese. DOI 10.1007/s11229-015-0843-9

Abstract

I review and challenge the views on simplicity and its role in linguistics put forward by Ludlow (2011). In particular, I criticize the claim that simplicity, in the sense pertinent to science, is nothing more than ease of use or "user-friendliness", motivated by economy of (cognitive) labor. I argue that Ludlow's discussion fails to do justice to the diversity of factors that are relevant to simplicity considerations. This, in turn, leads to the neglect of crucial cases in which the rationale for simplification is unmistakably epistemic, as well as instances where simplicity is part of the content of substantive, empirical hypotheses. I illustrate these points with examples from the history of generative linguistics, such as: (a) the shaping influence exerted by simplicity, via its involvement in the notion of "linguistically significant generalization", (b) its methodological and substantive contribution to the goal of explanatory adequacy, and (c) its central role in the Minimalist Program's search "beyond explanatory adequacy".

1 Overview and introduction

Simplicity is very often cited as a factor in theory choice in linguistics, as it is in many other sciences. This practice raises important questions regarding the nature, role and rationale of simplicity considerations. Ludlow (2011, ch. 7) has recently proposed specific answers to these and other related questions, in the context of a critical discussion of "best-theory" criteria in linguistics and in science in general. There Ludlow expresses skepticism about substantive views of simplicity in science, that is, approaches that construe simplicity as a property that can be legitimately ascribed to a science's subject matter. More generally, he finds fault with views according to which simplicity can be assessed in an objective, non-question-begging manner.
Instead, Ludlow argues that "simplicity is in the eye of the theorist" and that its role in science is eminently practical and its value purely instrumental. Thus, he concludes that the rationale behind its pursuit is, in essence, the reduction of cognitive burden (151).[1] In its defense, Ludlow advertises his view (which he sees as prefigured in Peirce's work, among others) as "the most reasonable way to think about simplicity" and claims that the "ease of use" notion of simplicity is in fact the one that "is at play in the more established sciences" (158). Moreover, Ludlow denies that there is any genuine notion of simplicity in science apart from the notion of being "simple to use" (159). Thus, simplicity considerations (or at least those that are legitimate) are exhausted by what we could call ergonomical matters. This certainly disqualifies aesthetical views of simplicity as well as a priori metaphysical views that portray it as truth-conducive because of some presumed feature of the world as a whole. However, it also implicitly rules out views that conceive of simplicity as constitutive of the very notion of systematization and theorization, and thus of the basic goals of rational inquiry, as well as views in which simplicity is hypothesized to be truth-conducive relative to a specific subject matter, due to local, empirical assumptions about the domain of phenomena in question.

[1] Page references are to Ludlow (2011) unless otherwise noted.

A small but significant qualification to these claims is that they are only meant to apply to those informal notions of simplicity that are appealed to in arguments for theory choice among competing frameworks (152).
Ludlow distinguishes these from technical, and usually formalized, simplicity-complexity metrics that draw on the resources of a well-defined theoretical system, such as the ones that, in the early days of the generative program, were part of the "evaluation procedure" for ranking grammars, and which will be discussed in section 4.2 below. He exempts the latter group from his criticisms, since "within a particular well-defined theory it makes perfect sense to offer objective criteria for measuring the simplicity of the theoretical machinery" (153). These simplicity measures are objective but carry no pretensions of neutrality, and so are unsuited for intertheoretical comparison, whereas the more widely applicable, informal notions of simplicity typically involved in such comparisons are supposed to be vague and subjective. In any case, Ludlow assumes that the two kinds of notions are sufficiently distinct to merit completely different treatments.

In this essay I offer an assessment and a critique of Ludlow's proposals on the nature of simplicity and the rationale behind its pursuit. My targets are two interrelated theses, corresponding to what I take to be the two main themes of Ludlow's discussion. The first one, in order of presentation, forms the basis of Ludlow's critique of "best-theory" arguments from simplicity, namely, that simplicity criteria are inherently subjective and non-neutral. This means that simplicity is not a property of the object of study, but is tied to the idiosyncrasies of the researcher, and in consequence, can legitimately vary among scientific communities. The second one, to which I devote more space, is that the rationale behind simplicity in science is (exclusively) the reduction of cognitive burden, and so that the only genuine notion of simplicity in a scientific context is that of being "simple for us to use".
I argue that Ludlow's discussion misconstrues the role and significance of simplicity in science, and in generative linguistics in particular. The source of this is a pervasive neglect of the diversity of factors involved in simplicity ascriptions, which is itself a direct result of trying to reduce simplicity to ease of use. For instance, against the first thesis, I will point towards examples of simplicity properties that play explanatory roles in scientific theories, in virtue of which they enjoy the same claim to objectivity as any other posited explanatory factor. Against the second, I will stress the fundamental epistemic and conceptual nature of several simplicity-related desiderata, which can be independent of, and sometimes even adverse to, the "usability" of theories. I will also argue that simplicity is a constitutive part of the very enterprise of scientific theorization, not a bonus or a mere tie-breaker. Moreover, the same factors that resist Ludlow's treatment have proven pivotal in the development of generative linguistics, the very discipline whose workings Ludlow wants to elucidate. Thus, I will show that, throughout the history of generative linguistics, key theoretical developments have arisen, to a great extent, out of simplicity considerations of a non-instrumental sort. Consequently, skepticism about the substantive or epistemic significance of simplicity must lead to a corresponding skepticism about many of the basic theoretical commitments of the program.

In what follows, I will start by reviewing Ludlow's criticism of simplicity-based "best theory" arguments in linguistics. This step is necessary because a great part of Ludlow's views seems to be driven by his objections to this kind of case. Subsequently, I will present his views about the nature and role of simplicity in science.
The remainder of the paper will consist of a critical examination of these proposals, starting with a few preliminary remarks, as well as a series of critical distinctions that must be kept in mind in discussing the nature and use of simplicity (section 3). Next (section 4), I will provide illustrations of the centrality of simplicity in the development of generative linguistics, through its role in the formulation and pursuit of adequacy criteria for theories. I will then proceed to examine three of Ludlow's main claims (section 5), after which I will concentrate on his "user-friendliness" proposal about the nature of simplicity, emphasizing the challenges it faces in accommodating some of the most important roles played by simplicity in scientific decision-making.

2 Ludlow's Targets and Proposals

The critical part of Ludlow's discussion has two main targets. One of them is a certain "common wisdom" about simplicity in science, constituted by views concerning its nature, its function, and its rationale. Among these are portrayals of simplicity as an aesthetical attribute or as an indicator of truth, probability or some other like notion. The other target comprises arguments that enlist simplicity as a reason for preferring one theory over its rivals, and which typically adhere to this pattern: (i) theory T is empirically equivalent to its competitors, but (ii) T has less theoretical machinery than them, therefore, (iii) there are strong prima facie reasons for favoring T, or at least for placing the burden of proof squarely on its opponents' shoulders.

2.1 "Best theory" arguments

The main example that Ludlow discusses is an argument by Postal (1972) in favor of an alternative to the "Extended Standard Theory" (EST) of generative-transformational grammar (see Chomsky, 1965, 1970). Postal referred to his proposal as "The Best Theory", though the "official" name he gave it was "Homogenous I".
The argument is premised on an apparently unobjectionable methodological maxim, concerning "a priori logical and conceptual properties" of theories. The maxim says that: "[w]ith everything held constant, one must always pick as the preferable theory that proposal which is most restricted conceptually and most constrained in the theoretical machinery it offers [...]" (Postal, 1972, p. 153). The second premise is the claim that Homogenous I is more "conceptually restricted" than EST, because it eliminates one of the levels of representation contained in the EST architecture (namely, "Deep Structure"), and also posits only one kind of rule, thereby dispensing with a whole class of rules for mapping semantic representations onto syntactic ones (i.e. deep structure). The conclusion, which Postal characterizes as based on "a priori", "methodological" grounds, is that this theory should be preferred, and that "it should be abandoned, if at all, only under the strongest pressures of empirical disconfirmation". Because of this, Postal's theory is supposed to have "[...] a rather special logical position vis-à-vis its possible competitors within the generative framework, a position that makes the choice of this theory obligatory in the absence of direct empirical disconfirmation" (153).

Ludlow correctly notes that, although the methodological maxim sounds reasonable in the abstract, its application to concrete cases is far from straightforward. Any such application requires a criterion of simplicity and a method for measuring it, which in turn involves a series of crucial decisions and assumptions at a theoretical level, concerning the identification, individuation and weighting of "machinery" components. However, Ludlow doubts that such decisions can be based on neutral grounds. First, he argues that there is no standard, neutral way of identifying theoretical machinery, since "machinery can be defined any way you choose" (156).
Second, there is no neutral way of counting theoretical components, since in any particular instance it may be unclear whether we are in the presence of one or more of them. Third, there is no neutral way of weighing components so as to measure the increase in machinery incurred by different proposals. Moreover, for any metric that ranks a given theory as simpler than a rival, it is possible to come up with an alternative metric that inverts the ranking. Ludlow concludes, then, that there is no fact of the matter as to which strategy leads to a more parsimonious theory, and thus, that "[t]here is no answer to the question of which has the most machinery" (156). In short, simplicity comparisons are crucially dependent on stipulation, and thus have a significant, and ineliminable, amount of arbitrariness. Therefore, Ludlow says, any argument predicated on such grounds must be inconclusive.

2.2 Subjectivity and ease of use

Should we then renounce all hopes for an objective notion of simplicity? Ludlow says 'yes', and goes on to claim that "simplicity is not a genuine property of the object under investigation [...] but is a property that is entirely relative to the investigator" and can vary through time and across scientific communities (153, emphasis mine). In consequence, he objects to viewing simplicity as an objective feature of reality or as a truth-conducive property of theories, since we lack satisfactory reasons for thinking that "reality is simple, or eschews machinery, etc." (159).[2] Instead, he argues that "[s]implicity is in the eye of the theorist", where by "theorist" he means, not an individual researcher, but a whole scientific community (161). In addition, Ludlow has little sympathy for vindications of simplicity in terms of beauty or elegance, since aesthetical judgment is a matter of taste and "there is no accounting for aesthetic preferences" (154).
Ludlow's own view is that the justification of simplicity in scientific inquiry is eminently practical, and its role instrumental: we want simpler theories in order to ease the (cognitive) burden involved in scientific research. Thus, Ludlow proposes that simplicity is the property of being "simple to use and understand" (152), a property that depends exclusively on what investigators find "perspicuous and user friendly" (153). Moreover, in his view "we cannot suppose that there is a genuine notion of simplicity apart from the notion of 'simple for us to use'" (159). According to Ludlow, this position is preferable to its competitors because it affords us the clearest way of making sense of simplicity and also reflects "a notion that is at play in the most established sciences" (158). In this last respect, Ludlow recognizes the relevance of episodes of scientific change in which (according to traditional historical accounts, including the scientists' own testimonies) simplicity considerations of a metaphysical, epistemological or aesthetic kind seem to have played a key role, but claims that these episodes can be seen, on closer inspection, to have been motivated by practical concerns such as the simplification of calculations or the streamlining of notation.

[2] Other global justifications of simplicity do not associate it with truth but with other desirable properties, such as likelihood (Wrinch and Jeffreys, 1921), empirical content and falsifiability (Popper, 1959), or systematization (Goodman, 1943; 1951; 1958), for instance.

In presenting his position, Ludlow approvingly cites C.S. Peirce, who wrote that "the simplest hypotheses are those of which the consequences are more readily deduced and compared with observation" and that "the best theory is that [which] allows for the simplest calculations and the greatest ease of use" (quoted in Ludlow, 158; emphasis mine).
In this same spirit, Ludlow characterizes simple theories as ones that are "perspicuous enough to know what is predicted", or that provide a notation that is "easy for us to use" (160), or which allow us "to simplify our calculations and theorizing" (162, emphasis mine). He also mentions, as a precursor of his view, the "economy notion of simplicity", proposed by Lindsay (1937), according to which the simplest theories are those whose manipulation "leads in minimum time to successful [...] prediction" (p. 166, emphasis mine).

Thus, in the view proposed by Ludlow, the question 'which theory is simpler?' is answered by determining which theory is easier to use, where the relevant considerations relate to notational perspicuity, computational expedience, learning difficulty, etc. Moreover, the answer must be relativized to users. In short: simplicity is a community-relative dimension of appraisal of scientific tools and products (including conceptual ones), whose value is purely instrumental, and whose import concerns operative-methodological issues, rather than substantive ones. Interestingly, the claims endorsed by Ludlow in the last section are strongly reminiscent of those historically put forward by instrumentalist authors, such as Ernst Mach (even though Ludlow explicitly wants to dissociate himself from that kind of view).

I will devote section 4 to a discussion of the role played by simplicity in generative research throughout the discipline's history, and section 5 to the discussion of Ludlow's two main theses mentioned above (in section 1); but first, in section 3, I will offer some remarks concerning the way I will approach discussions about simplicity.

3 Varieties of Simplicity

3.1 Basic attributes

Here I will propose a series of distinctions to which our discussion of simplicity should be sensitive.[3] Let us start with a claim of the sort "x is simple". In a scientific context, this claim prompts further questions, such as the following:

1.
What sort of thing is this x that is being evaluated for its simplicity?
2. What kind of property is the simplicity that is being attributed to x?

Now, suppose that x (that is, what is being directly evaluated for simplicity) is a theory, or a hypothesis. Then we could also pose the following questions:

3. Is the attribution claim intended to say something about the reality that x is about? Or is the focus on x's simplicity, qua structure or system, regardless of its content?
4. If the degree of simplicity of x is being used as a reason for some theoretical choice regarding x, then what desiderata are being served by that choice?

[3] There is no dearth of distinctions regarding simplicity in the literature, so I will only introduce those distinctions that are necessary for the purposes of this paper. My classification scheme has some commonalities with distinctions proposed in the literature (such as Rudner (1961); Bunge (1961, 1962); Hesse (1967); Baker (2011); Schulz (2012), among others), but given the different purposes and contexts, it is clearly different from them.

I will use these questions as a guide to our discussion. The first question has to do with the bearers of simplicity, that is, the objects to which simplicity is directly attributed.[4] These include:

a) objects that constitute the scientist's vehicles of theorization (i.e. description, prediction and explanation), or simply 'vehicles', including:
i) notational schemes and particular formulations of theoretical contents, and
ii) theoretical/conceptual elements (theories, hypotheses, laws, postulates, concepts, etc.);
b) the domain of phenomena that constitute the subject matter of theories, that is, natural objects, systems, properties or processes (as opposed to scientific products or artifacts involved in such study), including hypothetical posits as well as observable phenomena.
When simplicity properties are directly attributed to entities in (b), what is at issue is substantive or subject matter simplicity. To use a (literally) toy example, this is the sense at issue when we judge that a column consisting of a single Lego piece is simpler than a column built out of two pieces. The same sense is at issue when two columns are composed of the same number of Lego pieces but the first contains more piece types; in that case the second would likely count as simpler than the first. Likewise, we say that a motorcycle is more complex than a skateboard, because it contains more parts, and a homogeneous pile of sand would typically be judged less complex than one which has the same number of parts, but is more heterogeneous (such as a mix of sand, pebbles and shells). We can also cite less trivial examples. For instance, all the vertebrae in the vertebral column of a fish, from one end to the other, are similar to each other. However, in the mammalian column there are five different types (cervical, thoracic, lumbar, sacral and caudal). So, in this dimension one can say that the column of a fish is simpler (i.e. less complex) than the mammalian column (McShea and Brandon, 2010).

Historically significant examples of substantive simplicity are not hard to find either in science or in philosophy,[5] and of course, attributions of subject matter simplicity are also commonplace in contemporary science. In contemporary evolutionary biology, for instance, some researchers have treated the simplicity/complexity axis as an objective, substantive dimension. Thus, McShea and Brandon (2010) have argued that increase in complexity (understood as the "number of part types or degree of differentiation among parts") has the status of a "first law" of biology, analogous to Newton's First Law in physics. In the study of cognition, phenomena of perceptual organization, such as Gestalt laws, have also been interpreted as revealing a preference for simplicity.
This can be formulated as a "Simplicity Principle", which says that "from among all possible interpretations of a stimulus, the visual system selects the one defined by a minimum number of parameters" (Van der Helm, 2014, ch. 2). Indeed, a bias towards simplicity has been hypothesized to be a widespread feature of cognition, whenever a task involves finding patterns in stimuli (Chater and Vitányi, 2003).

On the other hand, the bearer of simplicity may fall in category (a), that is, "vehicles of theorization". Here, then, we are dealing with vehicle simplicity. At this point (and leaving question 2 for later) we can ask question 3 from our previous list. If the focus is the vehicle per se (such as a notation scheme or a formulation of a theory) rather than its represented contents, then we have a case of (mere) formal simplicity ("mere" because it's not necessarily reflected in any corresponding subject matter simplicity). We can also think of this as a special case of subject matter simplicity, when the vehicle itself is taken to be the subject matter of the discussion. Thus, suppose that we claim that a given coordinate system simplifies (e.g. by making it more succinct) the statement of a geometrical function, or that a given formulation of classical mechanics is simpler than its alternatives with respect to its application to a particular problem.
These claims are distinctively concerned with the features of the notations or formulations themselves, where such features do not necessarily correspond to "content" complexity: a circle will continue being a type of ellipse, regardless of notational choice, and the same relations between force, mass, and motion will be expressed by different formulations.

[4] The range of entities for which simplicity may be a pertinent evaluation criterion is quite wide, and includes, inter alia, "theories" (and the kinds of things that go under this label are also diverse, ranging from very general conceptions about a subject matter to specialized systems of equations and computational models), as well as particular formulations of theoretical ideas, hypotheses, computational methods, data analysis techniques, notational systems, measurement systems, data-recording instruments and all sorts of artifacts associated with scientific research, regardless of their ontological status and epistemic role.

[5] Plato, Descartes, Leibniz, Locke and Kant provide multiple instances, too many to cite here. A casual look at the OED also reveals examples of this use among botanists and physicists, such as the following: "Simple Leaf ... is that which is not divided to the middle in several Parts, each resembling a Leaf it self, as in a Dock" (1793. N. Bailey Universal Etymol. Eng. Dict. II) or this: "Simple Stem, one that is undivided; or, only sending out small branches," (1796 W. Withering Arrangem. Brit. Plants (ed. 3) I. 82). We can also include this one, from Newton: "The Light whose Rays are all alike Refrangible, I call Simple, Homogeneal and Similar." (1704 I. Newton Opticks i. i. 3).

Alternatively, our interest in the simplicity of a vehicle, such as a theory, can be driven by our interest in what the theory says about the world. In this case our ultimate focus is not on the simplicity of the theory itself, but on the kind and degree of simplicity it portrays its domain as having.
This is what we can call attributed simplicity. Again, this can be conceived as the simplicity of a certain subject matter according to theory T, or alternatively, the simplicity a domain of phenomena would have if T were true. This is a case that combines features of the two previous cases, in that the immediate bearer of simplicity is the vehicle, but the ultimate import of such simplicity is substantive. For example, in phylogenetic systematics (the study of evolutionary relations among groups of living beings) the number of required evolutionary events is an important aspect of a hypothesis, expressed as a phylogenetic tree, in that, given two proposals that account for the data (i.e. the distribution of characters across groups) equally well, the one that postulates the smaller number of events tends to be preferred (Sober, 1988; Haber, 2008).[6]

Another distinction that is often made, especially in discussions of generative linguistics, is that between so-called "theory-internal" and "theory-external" criteria of simplicity. Typically, theory-internal metrics are based on technical, explicitly defined notions that intuitively capture some aspect of some simplicity property (such as compactness, description size, number of free parameters, etc.). They are usually framed in a canonical notation that permits the unambiguous individuation and counting of theoretical components, as well as a unique method for ranking hypotheses. Thus, they allow for objective, quantitative comparisons along well-defined dimensions. Such measures, however, are only defined for hypotheses expressed in a particular formalism or notational system, or, at the very least, only apply to proposals that embody certain nontrivial theoretical assumptions. In consequence they cannot be used to compare different frameworks. A clear example is the simplicity criterion used for ranking rule systems that was proposed by Chomsky in the 1950s, which will be discussed in section 4.2.
As we'll see, this criterion is a formal, notation-dependent one, where 'formal' alludes both to the fact that the direct bearer of simplicity is a vehicle, and to the criterion's dependence on a formal notational system.[7] Theory-external notions of simplicity, on the other hand, tend to be informal and format-independent. These comprise the notions of simplicity that are overtly or tacitly at work in theory choice in general, along with other usual desiderata, such as predictive accuracy, explanatory power and heuristic potential, among others.[8]

Now let's address question 2. Up until now we have only considered cases in which a property called 'simplicity' is being predicated of a given object. However, we must consider the different "kinds" of simplicity, or, rather, the different yet related properties to which that label is attached. I will refer to these properties as "simplicity properties" (or "S-properties", for short). These include parsimony and uniformity, among others.[9] Uniformity obtains whenever it is possible to view a range of apparently heterogeneous phenomena as being fundamentally of the same kind. This is usually the case when the phenomena across the domain tend to partake in a relatively small set of relevant properties, which makes them susceptible to the same (or similar) kind of theoretical treatment. Uniformity is typically mentioned in the context of attributed simplicity. For example, Newton's universal gravitation portrayed its domain as highly uniform, by bringing together ostensibly disjoint categories of phenomena (the orbits of the planets, the trajectories of comets, the tides, the precession of the equinoxes, the motions of pendulums, and others) as instances of the same general (type of) phenomenon. Another important property is parsimony, usually discussed in the guise of Ockham's Razor, the injunction not to postulate more entities than is necessary to explain the phenomena at hand.
[6] This is not to say that there aren't alternative measures of parsimony of phylogenetic hypotheses (such as the number of parameters in a model) which, moreover, may yield incompatible results. Different measures may be appropriate for different purposes, or may incorporate different assumptions, but in any case they all aim to implement some version of Ockham's Razor.

[7] Nevertheless, the notions of formality and "theory-internality" are not identical. Simplicity criteria may be theory-internal to the degree that their applicability or relevance depends on certain theoretical assumptions, which may or may not be directly expressed in terms of formal characteristics or notational features.

[8] It should be pointed out that the external-internal dimension must be to some degree relativized to particular theoretical contexts: "theory-external" does not necessarily mean external to or independent from any theory, but rather "external to the theory(ies) at issue".

[9] One can also find different aspects of simplicity discussed under the various headings of (e.g.) "cohesiveness", "elegance", "economy", "optimality" and "symmetry" (which are not always used everywhere in the same sense). Determining exactly what the members of the list are and how they are related to each other are interesting issues, but I won't deal with them here.

Again, under this characterization, matters of parsimony, construed as a measure of explanatory economy, arise in the context of attributed simplicity, since we are talking about the relation between a theoretical proposal and a domain of phenomena to be explained. The degree of parsimony of a theory or notation can depend on a variety of factors: it might be expressed in terms of the number of causes or components posited by a theory, or it might be, in the case of a formalized theory, expressed in terms of the number of axioms or primitives.
We have already mentioned examples of parsimony attributions, in our discussion of the criteria for choosing phylogenetic trees.

In sum, given the differences highlighted by the previous discussion, there is no a priori reason to expect that we will find a single, universal characterization of simplicity. This is not only applicable to specific notions, but also to very general ones, from "being conceptually restricted" and "having less machinery" to "easy to use" and "easy to understand".

3.2 Simplicity and Theory Choice: Patterns Involving Simplicity

Nevertheless, by focusing on the different kinds of simplicity properties, we can identify certain recurring types or patterns of reasoning. These types capture crucial features of paradigmatic applications of simplicity properties, and provide the kind of answer we might be looking for in question 4 above, that is, about how the simplicity of a theory, typically of the attributive kind, is used in decision making. We can think of these patterns as types of theoretical developments in which an S-property plays a central role. A characterization of each of these patterns must mention the simplicity property S, as well as the states of affairs that accompany the attainment of S, the kinds of contexts in which S is attainable and the reasons why S would be desirable in such contexts. The latter constitutes the generic rationale for the pursuit of S. It must make reference to a set of standard theoretical desiderata and indicate how the attainment of S leads to their satisfaction. The pattern is also characterized by the availability of certain kinds of simplicity claims, involving some particular kind of bearer and import, and it is in many cases associated with a programmatic role for S.
Correspondingly, the pertinence of S in any particular scientific situation can be motivated by the match between the goals of the program at hand and the outcomes typically associated with S, and by the extent to which the situation constitutes a context in which S is feasible.

The patterns are always imperfectly instantiated in actual episodes of scientific research: some cases may not show all of the characteristics of a particular pattern, and the patterns may overlap in practice. Nonetheless, they constitute useful idealizations, in that, by singling out clusters of features that distinctively co-occur with specific simplicity properties, they exhibit certain regularities that throw light on the role of appeals to simplicity. Also, the use of a particular pattern may characterize the practice and development of particular traditions, programs or theories, to the extent that its significant episodes of theoretical change contain key features that match those in the pattern. Furthermore, we can say that the patterns also characterize certain styles of theorization exhibited by different programs at different stages, in that the goals and moves associated with the pattern are also reflected in the program's leading research strategies. Of course, the application of the pattern (and, in consequence, the adoption of associated styles) comes with no guarantee of success, since the ultimate benefits (e.g. greater explanatory power) will only obtain provided the theoretical framework is sound, the empirical status of the theories involved is well-founded, and the theoretical situation is actually one in which the pattern is appropriate. Thus, there are delicate matters involved, which require sober judgment, and there is no safeguard against (e.g.) spurious uniformity or parsimony.

Here I will only focus on two frequently exemplified patterns, the Uniformity-Unification Pattern (U-Pattern), and the Parsimony-Superfluity Pattern (P-Pattern).
Each one is characterized (nonexclusively) by theoretical moves involving a specific (kind of) S-property, which lead to particular outcomes, and whose relevance and adequacy tend to receive a similar kind of justification.

Before we delve into the discussion of the patterns, I would like to call attention to another aspect that is quite relevant to our topic: the general role or function that simplicity is called upon to play in scientific decision-making. This aspect depends on the relative centrality assigned to simplicity considerations, the stage at which they are invoked, and, in general, the degree to which they shape the (theoretical) outcomes of research. One kind of role that is frequently alluded to in the literature is typified by indifference scenarios. Here simplicity functions as a last court of appeal, with the duty of adjudicating among competing theories, once all substantive considerations have been exhausted and no reason has been found to prefer one over the others. Given this hypothetical deadlock, it is allegedly necessary to invoke non-empirical methodological arbiters, such as simplicity (in the form of, say, Ockham's Razor). In this scenario simplicity plays a rather peripheral role, in that it enters the picture under special circumstances, and at a relatively advanced stage in the life of a theory or hypothesis, namely, when it has already been subjected to substantial amounts of empirical testing, conceptual evaluation and elaboration. Here the contribution of simplicity is reduced to arbitration at an advanced stage, and its role in shaping the outcomes of research is relatively marginal.
This situation is different from one in which simplicity plays a more central, directive role in the development of theories within a particular research program.10 In these cases simplicity is an important guiding factor in the formulation of hypotheses, in the determination of their content and testing priority, and in their acceptance, rejection or elaboration. In some instances simplicity is incorporated into the program's choice of explanatory targets, into the shape of its explanatory ideals and patterns, and into its implicit criteria of intelligibility, among other distinctive thematic elements. In this case, simplicity would play a programmatic role.

3.2.1 Uniformity-Unification Pattern

At first phenomena of nature were roughly divided into classes, like heat, electricity, mechanics, [. . . ], etc. However, the aim is to see complete nature as different aspects of one set of phenomena. That is the problem [. . . ] today-to find the laws behind experiment to amalgamate these classes.
Richard Feynman, Six Easy Pieces

The kind of simplicity involved in this pattern is the uniformity of phenomena in a given domain. This obtains when the phenomena in question (which may be quite diverse in appearance) tend to share certain theoretically significant attributes, and in consequence can be treated, for explanatory purposes, as equivalent or similar. The pursuit of this kind of simplicity is motivated whenever systematization, generality and explanatory depth are priorities, and its feasibility depends on the assumption that there are significant patterns and invariances to be found in the phenomena. This, however, is nothing more than the assumption that the domain is susceptible to theorization. The outcomes that are associated with successful instances of this pattern are multiple and densely interconnected, and some can only be appreciated at certain levels and from certain perspectives.
For instance, some outcomes have to do with properties of the phenomena themselves, whereas others are better characterized as involving properties of theories or relations among them. The pattern we are discussing can be called "uniformity-unification" (or 'U-pattern', for short) because of the central, intertwined roles these two elements play. For instance, theoretical unification can increase the uniformity ascribed to the phenomena in a given field. The process of unification, and its results, can be described either in terms of relations among "conceptual entities" (theories, laws, hypotheses, generalizations, concepts, posits, etc.) or in terms of phenomena.11 On the conceptual side, unification may consist in an episode of nomic subsumption of a series of generalizations under a more general theory. The clearest examples of this kind of subsumption are those in which the generalizations are shown to be particular consequences of the more general law.12 Unification, from the perspective of phenomena, can be of the ontological and causal varieties. In the ontological case, apparently dissimilar kinds of phenomena are amalgamated into a unified super-domain, or become assimilated as subtypes of more general categories. In the causal version, the uniformity derives from the fact that a range of different phenomena are caused or otherwise explained by a smaller, more basic, set of factors.

10 For instance, simplicity may be one of the factors that determine the viability of a hypothesis or how seriously it is to be considered (see Weinberg (1992), Quine (1966, ch. 24), Hesse (1974, ch. 11), Harman (1999)).
11 I am using this term in a very general way to designate all sorts of entities, properties, processes or regularities comprising the subject matter of the field.
12 Derivability is the case to which philosophers of science have devoted the most attention, especially in the context of intertheoretical reduction, though it is not necessarily the only one.
As the number and variety of phenomena treated as equivalent increase, the uniformity or homogeneity in the domain also grows, as does the proportion of facts under general nomic coverage. This also entails an overall decrease in "bruteness", in that previously unexplained, brute facts are shown to be nomically necessitated. This agrees with a conception in which one of the aims of explanation is "reducing the total number of independent phenomena that we have to accept as ultimate or given", because "[a] world with fewer independent phenomena is, other things equal, more comprehensible than one with more" (Friedman, 1974, p. 15, emphasis mine). See also Kitcher (1981). Likewise, this process can be accompanied by an expansion in the explanatory power and overall empirical support of the subsuming theory. Finally, a notable result of all these changes is that, as the domain of phenomena in a given field becomes more uniform, the level of systematization achievable in the field increases, as does the generality of its leading explanatory theories, while the availability of more fundamental principles and nomic interrelationships makes it possible to obtain a more illuminating account of the field's domain. Among the cases (processes, episodes, outcomes) that instantiate this pattern are "textbook examples", such as Newton's universal gravitation, mentioned above, Maxwell's unification of magnetism, electricity and light, through the re-conception of light as a kind of electromagnetic radiation, and Mendeleev's periodic law in chemistry. We can identify a style of theory construction and development, which we can call 'U-theorization', in which uniformity is a central explanatory ideal, and where the search for unification constitutes a research-guiding goal of the highest priority.
The benefits of significant unification and uniformity may certainly include a reduction of cognitive burden (cf. Jones, 2008), but they go beyond that, as they involve concerns and outcomes that are irreducibly epistemic. That is, they are conducive to the fulfillment of epistemic desiderata, such as explanatory depth, theoretical unification, or in general a greater understanding of one's subject matter, without necessarily furthering aesthetic, practical or other kinds of desiderata. For instance, we saw how the drive towards uniformity and parsimony can rid the theory of excessive ad hocness, redundancy, stipulations and "brute facts", and place tighter constraints on theories, even when this leads to theories that are more abstract and more removed from observation, and less "friendly" when it comes to relating them to data (because the path from theory to data is more intricate, for instance).

3.2.2 Parsimony-Superfluity Pattern

Another kind of path of theoretical development in which simplicity plays a key role is the search for parsimonious theories, in the context of what we can call the Parsimony-Superfluity Pattern (P-pattern). Some, but not necessarily all, applications of "Ockham's Razor" (or Principle of Parsimony) conform to this pattern, since the use of Ockham's Razor may have different kinds of justifications in different kinds of situations.13 However, the kind of rationale I will focus on here is based on the assumption that parsimonious theories are less likely to contain superfluous elements (see Barnes, 2000). Here, 'superfluity' means explanatory superfluity, where an element in a theory T is superfluous in this sense if it contributes nothing to T's explanatory capacity. One illustrative type of application of this pattern, though not the only one, is in the investigation of causal processes. For example, suppose that the goal of research under a given theory T is to identify a set of significant causal factors behind a given phenomenon.
So, if e is among the initial theoretical posits in the explanation of a phenomenon x, and the subtraction of e does not affect the theory's ability to explain x, then e is an idle component within T, with respect to x. In this case it is likely that e does not belong to the set of relevant causal factors.

13 Ockham's Razor is usually stated as an injunction against multiplying entities beyond necessity. Some readings of the maxim emphasize its quantitative aspect (i.e. 'do not multiply entities'), whereas in other readings the focus is on the injunction against going beyond what is strictly essential. Relatedly, Barnes (2000) recognizes two different principles associated with Ockham's Razor: the "anti-quantity" principle and the "anti-superfluity" principle, corresponding to the two readings just mentioned. The type of theoretical move that is relevant for our purposes is the one that appeals, explicitly or implicitly, to the second principle. Also, notice that freedom from superfluous posits is different from sheer paucity of posits, even if the respective injunctions to eliminate superfluity and to reduce the number of posits often have identical implications. Thus, consider a theory A, containing n basic posits, all of which are essential, and another theory B with m < n posits, some of which turn out not to be essential. There is one important sense, then, in which A is more parsimonious than B (see Barnes, 2000, 354ff).

At least under some views about the relation between explanation and confirmation (e.g. those that may be appealed to in a defense of "inference to the best explanation")14 theoretical posits are thought to derive empirical support from the phenomena in whose explanation they participate. Going back to our example above, there is no reason to think that e (and the truth of the hypotheses in which it plays a central part) derives any empirical support from the theory T's ability to explain x.
A generalization that identifies e as a causal factor will be inaccurate, inasmuch as e is erroneously portrayed as making a difference in the causal history of x. In cases like these, considerations of parsimony are frequently expressed in terms of hypotheses or theoretical proposals (number of free parameters, predicates, etc.), but those considerations quite often have substantive import, given the ontological commitments associated with the hypotheses. Determining superfluity is no trivial matter, of course, nor is it always straightforward to determine the contribution of each of a theory's components.15 Nevertheless, and no matter how difficult its application may be for intertheory comparisons, parsimony has an important programmatic role to play. In this role, the pursuit of parsimony helps shape theories, as reflected in certain policies or strategies of theory development, such as those in which the elimination of unnecessary components is a priority. If an element is superfluous, then it can be eliminated without loss of explanatory power or descriptive coverage. For instance, we can have cases in which two components of a theory overlap in some manner, such as accounting for a particular phenomenon. Then we have an undesired redundancy (unless we have very good evidence that the phenomenon in question does involve such redundancy in reality, as biological systems often do, or unless we have no qualms about overdetermination). Sometimes the decision will depend on which component has more independent support, favoring posits that are needed to account for other phenomena. There are many episodes in the history of generative linguistics that illustrate this sort of reasoning.
One of them arose out of the redundancy between the Phrase Structure component of the grammar (broadly speaking, the syntactic rules used to generate sentences, which were of the form A → B C, read as "rewrite A as B followed by C") and the lexicon (see Lasnik and Uriagereka (1988)). It is common for linguists to distinguish between the lexicon and the grammar, as two different components of linguistic competence. The second is taken to express the regular patterns in language, whereas the first contains the particular features of individual linguistic expressions. In the 60's, the base of the grammar was given by phrase structure rules, which were of the form 'rewrite A as B followed by C', or, alternatively, 'A consists of B followed by C', where A, B, C stood for phrases or lexical items. The phrase structure component contained rules such as:

(1) A verb phrase consists of a verb
(2) A verb phrase consists of a verb followed by a noun phrase

These were supposed to be at work in the generation of sentences such as, respectively:

(3) Al sleeps
(4) Al loves Beth

The lexicon is composed of entries that contain information about specific lexical items (such as words), including how the item is pronounced, which grammatical category it belongs to (verb, noun, etc.) and what it means. Thus, the lexicon will contain a specification of verbs such as 'sleep' or 'love'. Such a specification will say that 'sleep' is an intransitive verb.16 Now, notice that such a specification will entail the availability of structures like the ones that (1) generates, including the one corresponding to (3). So it looks like the rule isn't adding anything new, and the same problem extends to any phrase rule we would care to examine.

14 See Lipton (1991).
15 Under which conditions can we say that an element is "superfluous"? First, a particular decision will depend on the relevant context, such as the purpose at hand. Suppose that the goal is to design a circuit. Then, given that the NAND operator is functionally complete (which is not true of the other truth functional connectives, except NOR) and that NAND gates are relatively easy or convenient to implement, then, in the Boolean (combinational) logic used to represent the circuit, all the other operators would be strictly speaking superfluous (you won't gain any processing speed, reliability or computational power by including them). Now, if we had a very good psychosemantic theory that was committed to negation, conjunction and operation all being "psychologically real", irreducible, basic operations, then, if our goal were to provide a faithful characterization of human semantic competence, they would not be superfluous. So, from an electronics perspective the system would have unjustified redundancies, but from the psychological one those redundancies would be warranted.
16 That is, a verb that does not take any objects, in contrast with 'love', which does take direct objects, as illustrated by (4) in the text.

We have, then, an unnecessary duplication, which, in the absence of cogent reasons for countenancing this kind of redundancy in the theory of language, seems to offend against parsimony. So the most obvious approach would be to get rid of one of the redundant components. Should it be the lexicon or the phrase structure rules? To learn English, speakers minimally have to learn the lexical properties of the language's expressions, so it seems that we can't eliminate lexical entries. It follows that the lexicon is independently needed. Moreover, the phrase structure rules are unsuited for taking over the lexicon's job, since trying to capture all of this idiosyncrasy in the phrase structure rules would lead to a senseless multiplication of rather ad hoc rules.
So, if we have to choose between discarding the lexicon and discarding the phrase structure rules, the second makes a much more likely candidate.17 In sum, in this episode a redundancy was detected, one of the components was declared superfluous, and was, in consequence, undermined in its theoretical status. This elimination of redundancy is an aspect of the search for economy under the banner of parsimony, and has always been one of the main themes in the development of the generative program. More examples of this tendency will be offered later in this paper.

3.2.3 Two points and a reply

Before we finish this section we must consider two important points, as well as a possible reply. The first point is that, as already mentioned, the patterns can overlap or succeed each other in different situations, and research in Generative Grammar, as we'll see below, provides examples of this. The second important point to note is that simplicity considerations, far from being "supraempirical" or merely accessory, are seamlessly integrated with substantive matters of explanation and prediction, to the extent that the border between the two kinds of consideration can be hard to draw. Thus, the search for uniformity is at the same time the search for generality, depth and systematization, whereas the pursuit of parsimony is also the pursuit of accuracy and explanatoriness.18

Now to a possible reply. Earlier on we remarked on the Machian flavor of Ludlow's proposals on simplicity. A follower of Mach may protest that the Principle of Economy can furnish a more down-to-earth account of the rationale for unification and uniformity. Thus, these properties may be advantageous because they afford us a more comprehensive summary of the relevant facts, and this allows us to expand the range of phenomena susceptible to description and prediction with a minimum of cognitive effort.
Thus, a more unified theoretical apparatus and a more uniform view of a domain may lead to more efficient predictive devices (cf. Mach, 1907; 1898; Jones, 2008, 493–4). It cannot be doubted that unification and uniformity often save precious cognitive resources. But that does not mean that the former are exclusively sought for the sake of economizing time, energy or resources, much less that such economization alone can provide a legitimate justification. This is because the desirable outcomes of the U-pattern cannot be reduced to data compression and increased efficiency in information management, since they often result in a qualitative change in our picture of the world. For one thing, significant unification and uniformity tend to bring new information into the picture, which at the very least includes knowledge of hitherto unnoticed or unexplored commonalities and connections in the domain of phenomena, as well as novel perspectives and conceptual systems that differ significantly (sometimes to the point of incompatibility) from those previously available. Thus, they contribute additional information, not merely a more efficient way of deploying or accessing old information.19 But here the Machian could dig in his heels and protest that it may also be possible to conceive of the benefits provided by those additional conceptual elements in terms of facilitation of use and understanding or reduction of cognitive labor. I will explore this possibility in section 5.2.1 below, but I will first proceed to re-examine Ludlow's conclusions against the background provided by the discussion in this section. I will argue that Ludlow's claims about the subjectivity of simplicity considerations founder on the abundance of simplicity appeals of a clearly substantive, empirical nature. In addition, it will become evident that the significance of simplicity properties such as uniformity and parsimony (in some versions) cannot be accommodated within Ludlow's account, both because their instantiation may occasion a decrease in overall user-friendliness, and, more importantly, because they involve goals and outcomes of an irreducibly epistemic nature, that is, those pertaining essentially to matters of explanation and understanding, which cannot be adequately captured by considerations of user-friendliness (so long as the latter notion is used in any of its familiar senses). In what follows I will discuss the uses of simplicity in generative grammar, so as to provide a useful background for our discussion. Then, I will directly address some of Ludlow's theses.

17 Especially since quite general regularities in phrase structure can already be captured by X-bar theory, which will be discussed in section 4.3 below.
18 This point has often been made by Nelson Goodman. For instance: "Nothing could be much more mistaken than the traditional idea that we first seek a true system and then, for the sake of elegance alone, seek a simple one. We are inevitably concerned with simplicity as soon as we are concerned with system at all [. . . ] Thus simplicity [. . . ] is not a consideration applicable after truth is determined but is one of the standards of validity that are applied in the effort to discover truth" (Goodman, 1958, p. 1064).

4 Simplicity and Generative Grammar

The motives for seeking economy in the basis of a system are much the same as the motives for constructing the system itself.
Nelson Goodman, 'On the Simplicity of Ideas'

4.1 Simplicity and the goals of linguistic theory

Simplicity has always played both a methodological and a substantive role in generative linguistics. Its methodological/heuristic role is reflected in the particular conception of scientific explanation and progress that underlies the generative program.
This approach prizes economical theories, as well as general, unified accounts, which yield a uniform picture of the phenomena in the domain. This is the approach that we earlier called 'U-theorization', of which the generative transformational tradition is a clear example. I will first comment on the methodological significance of simplicity, and then briefly discuss its substantive repercussions. The reason why simplicity and explanation in generative linguistics have been intimately associated is that simplicity, under a variety of manifestations, has played a determining role in both the content and structure of linguistic explanation, by being a central component of the program's explanatory ideals and a fertile source of hypotheses.

19 These kinds of changes are related to the ones mentioned by Whewell (1840/1984; 1858) in his discussion of the "colligation of facts", such as those leading to the consilience of inductions effected by Kepler's and Newton's laws. Thus, in colligation: "Facts are not only brought together, but seen in a new point of view. A new mental element is superinduced; and a peculiar constitution and discipline of mind are requisite in order to make this Induction" (Whewell, 1858, 71). Furthermore, "[T]he Facts that the planets revolve around the sun in certain periodic times and at certain distances, are included and connected in Kepler's Law, by means of such Conceptions as the squares of numbers, the cubes of distances, and the proportionality of these quantities. Again the existence of this proportion in the motion of any two planets, forms a set of Facts which may all be combined by means of the Conception of a certain central accelerating force, as was proved by Newton" (Whewell, 1840/1984, 206; emphasis in the original).

In what follows I will briefly identify some of the changing conceptions and uses of simplicity properties in generative research, and how they relate to the explanatory agendas of its successive stages.
I will also identify the commitments, regarding the relation between simplicity and explanation, that have remained constant throughout both substantive and methodological changes, and which have contributed to the thematic and stylistic identity of the discipline. My goal here is not to provide a general account of the field's development, but to trace the evolution of uses of simplicity in the pursuit of linguistically significant generalization, explanatory adequacy, and principled explanation, respectively. These last terms have a specific, technical meaning in the context of generative linguistics, so a few words about so-called levels of adequacy are in order. The weakest requirement on grammars is observational adequacy (see Chomsky, 1964). This means that a grammar of a language L must be able to generate exactly the set of grammatical sentences of L, as they are represented in observed corpora (in this case, this could in principle be trivially achieved by listing the recorded sentences). This basic requirement can be strengthened, of course, by treating corpora as samples of languages (as populations of sentences), which poses the additional challenge of projecting to the right infinite set. In any case, the main constraint observational adequacy imposes on grammars is the avoidance of both over- and under-generation. Observational adequacy provides the minimum requirement on grammars, but it is descriptive and explanatory adequacy that have posed more significant challenges for generative research. In his classic statement of the aims of linguistic theory, Chomsky (1965) proposed that a successful grammar or linguistic theory must be both descriptively and explanatorily adequate. Descriptive adequacy is met by a grammar that "correctly describes the intrinsic competence of the idealized competent speaker" (p. 24).
This is achieved when "[t]he structural descriptions assigned to sentences by the grammar, the distinctions that it makes between well-formed and deviant, and so on must correspond to the linguistic intuition of the native speaker (whether or not he may be immediately aware of this) in a substantial and significant degree of cases" (p.24).20 For instance, a theory that merely records the fact that sentences (5) and (6) below are part of the speaker's language could qualify as observationally adequate, but would fail to account for the intuitive connection between the sentences, and would thus be missing an important generalization. (5) Al loves Beth (6) Beth is loved by Al A descriptively adequate grammar, however, would accommodate this phenomenon by exhibiting the relationship between their underlying forms. Explanatory adequacy requires answering what Chomsky (1986) dubbed "Plato's Problem". This problem was posed by the contrast between: (a) the impoverished-even distorted-character of the evidence to which children are exposed in acquiring a language, and (b) the complexity and abstractness of the outcome of the acquisition process, as well as the regular schedule exhibited by that process. Plato's problem, then, consists in explaining how children manage to bridge the gap between the poverty of the available primary linguistic data and the richness and intricacy of mature grammatical competence.21 Once it was thought that the conceptual resources for a solution for the problem of explanatory adequacy were available, theorists began to set their sights beyond explanatory adequacy, seeking to provide a principled explanation for the basic features exhibited by human languages, by exploring the possible role of optimality and least-effort considerations in the design of the faculty of language. Throughout the history of the field, considerations of simplicity have arisen in two forms. 
One is an "imprecise but not vacuous" notion of simplicity that is shared by all branches of empirical inquiry. This is the notion that is at stake when we say that a given theory, of any topic, is more or less simple (Chomsky, 1995, p. 8). The other is referred to as "theory-internal", where simplicity is regarded as a technical notion, defined only for a certain kind of formalism. The second form received much attention during the early stages of the generative program, since it was part of the "evaluation procedure" for grammars, as systems of rules (though of course, the first notion was also assumed to be important at the level of the general theory of linguistic form, as would be the case with any scientific theory). However, as the Principles and Parameters approach began to take hold, the theory-internal notion started to lose relevance, since the study of language-specific rule systems (the things the simplicity criterion was designed to evaluate) was no longer part of the agenda. Still, general, intuitive considerations of simplicity were a central concern during that period, since the replacement of rule systems with abstract principles was itself perceived as a simplification. Nevertheless, with the advent of the Minimalist program, what seem like theory-internal simplicity comparisons began to reappear, alongside the usual theory-external ones, only now in the guise of economy considerations on derivations, by barring "non-optimal" ones (again, in a theory-internal sense of "optimal"), or on representations (see Chomsky, 1995, pp. 8-9).

20 Correspondingly, a linguistic theory or theory of grammar is descriptively adequate if it makes "descriptively adequate grammars available for each language" (p. 24).
21 Chomsky (1976, 1980, 1986); Hornstein and Lightfoot (1981); Guasti (2004); Crain et al. (2005); Berwick et al. (2011).
Now, although the theory-internal and theory-external strands of simplicity are clearly distinct, they are, at bottom, manifestations of the same kind of theoretical concerns. I will show that simplicity notions (whether internal or external, substantive or methodological) have not been relegated to the role of accessory considerations, to be taken into account after matters of description and explanation proper have been addressed, but are central to such matters. (cf. Goodman (1943, 1951, 1955, 1958); Chomsky (1951, 1975/1955)). Moreover, the pursuit of simplicity was not necessarily motivated by the reduction of effort. Thus, given the deep influence exerted by simplicity considerations on the content of linguistic proposals, skepticism about the legitimacy of those considerations leads to a corresponding skepticism about generative-transformational theories in general. So, in what follows I will discuss some of the deep interconnections among the notions of simplicity, description and explanation that have always played a prominent part in generative research. In particular, I will focus on these aspects: First, on how the theory-internal notion of simplicity at play in the 50's and 60's was inseparable from considerations of linguistically significant generalization, and was itself motivated and shaped by the same kinds of concerns that drove the search for "theory-external" simplicity. Thus, I will devote the first section to the relationship between simplicity, generality and systematization, as reflected in early works in which formal simplicity criteria are discussed, such as The Logical Structure of Linguistic Theory (LSLT) (Chomsky, 1975/1955), The Morphophonemics of Modern Hebrew (MMH)(Chomsky, 1951), Aspects of the Theory of Syntax (ATS), (Chomsky, 1965) and The Sound Pattern of English (SPE)(Chomsky and Halle, 1968). 
Second, I will focus on the way in which concerns for "external" simplicity were essential to the formulation and proposed solution of the problem of explanatory adequacy. Here I will discuss the role of simplicity considerations in the search for explanatory adequacy, as witnessed in the process of transition from the Standard Theory (Chomsky, 1965) to the "Government and Binding" version of the P&P approach (Chomsky, 1980, 1981, 1986). Third, I will discuss how the simplifying developments, both "external" and "internal", that took place during the early phases of the Principles and Parameters approach are intimately tied to the kinds of questions pursued within the Minimalist Program: questions of principled explanation that take us "beyond explanatory adequacy" (Chomsky, 1995).

4.2 Simplicity and Linguistically Significant Generalizations

In the early stages of generative grammar, research goals were articulated at two distinct yet interdependent levels. One level concerned the construction of grammars of individual languages (such as English).22 The other level concerned the development of a general theory of linguistic structure (Chomsky, 1975/1955, 63), which was to provide an abstract characterization of the notions of grammar and linguistic structure. It was to do so by supplying the descriptive apparatus available for the characterization of linguistic structure (thereby determining the kinds of permissible grammars) and an "evaluation procedure", used to decide among grammars that were "equally consistent with the data" (e.g. a corpus) (Chomsky, 1975/1955, 63). A theory-internal notion called 'simplicity' was defined within this framework, and it referred to a formal property of grammars, qua rule systems (see MMH, LSLT (ch. 4), and SPE (ch. 8), and also Halle, 1961). The simplicity of a grammar G was defined as inversely related to the number of symbols in the statement of G, when G was expressed in a minimal normal form (LSLT, 123-4; SPE, 335).
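The symbol-counting idea behind this definition can be made concrete with a small sketch. What follows is only a toy illustration, not the evaluation measure of LSLT or SPE: the rule encoding, the two miniature grammars, the function names and the choice of 1/n as the "inverse" relation are all assumptions introduced here, and no attempt is made to model a minimal normal form. The sketch merely shows how two grammars that generate the same sentences can differ in the number of symbols needed to state them.

```python
from itertools import product

# Toy encoding (an assumption of this sketch): a grammar is a list of
# (left-hand side, right-hand side) rewrite rules, and every category
# or word occurrence in a rule counts as one symbol.

def symbol_count(grammar):
    """Number of symbols needed to state all the rules of the grammar."""
    return sum(1 + len(rhs) for _lhs, rhs in grammar)

def g_simplicity(grammar):
    """Hypothetical score; any function decreasing in symbol count would serve."""
    return 1 / symbol_count(grammar)

def sentences(grammar, start="S"):
    """All sentences generated from `start` (the grammar must be non-recursive)."""
    rules = {}
    for lhs, rhs in grammar:
        rules.setdefault(lhs, []).append(rhs)

    def expand(symbol):
        if symbol not in rules:  # a word: nothing left to rewrite
            return {(symbol,)}
        results = set()
        for rhs in rules[symbol]:
            for parts in product(*(expand(s) for s in rhs)):
                results.add(tuple(word for part in parts for word in part))
        return results

    return {" ".join(words) for words in expand(start)}

# G states one general rule for verb phrases.
G = [
    ("S", ["NP", "VP"]),
    ("VP", ["V", "NP"]),
    ("NP", ["Al"]),
    ("NP", ["Beth"]),
    ("V", ["loves"]),
    ("V", ["sees"]),
]

# G_PRIME lists each verb phrase separately, missing the generalization.
G_PRIME = [
    ("S", ["NP", "VP"]),
    ("VP", ["loves", "Al"]),
    ("VP", ["loves", "Beth"]),
    ("VP", ["sees", "Al"]),
    ("VP", ["sees", "Beth"]),
    ("NP", ["Al"]),
    ("NP", ["Beth"]),
]

print(sentences(G) == sentences(G_PRIME))         # True: same eight sentences
print(symbol_count(G), symbol_count(G_PRIME))     # 14 19
print(g_simplicity(G) > g_simplicity(G_PRIME))    # True
```

In this toy case the grammar that states the verb phrase pattern once needs fewer symbols than the one that lists every verb phrase, even though both generate the same sentences, so a measure of this kind ranks it higher.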
This notion must be distinguished from the more general notions of simplicity that were used to judge the theory of linguistic form, which are the same as those that apply to any scientific theory. In order to distinguish this theory-internal notion from other more general "external" notions, let us call it g-simplicity. The notion of g-simplicity, then, although well-defined, was quite narrow in scope, since it was only applicable to the grammars allowed by a particular theory of linguistic structure, and thus was not intended to be directly used in comparisons between different theoretical frameworks. Nevertheless, g-simplicity was clearly intended as a domestic, well-behaved analogue of the more general and less clearly defined notions of simplicity, such as uniformity and cohesiveness. Thus, g-simplicity was in part motivated by fundamentally epistemic reasons, since those simplicity properties of which it was an analogue were closely linked with the conception of what it meant to give a systematic account. Such motivation would be unintelligible from a viewpoint in which descriptive correctness and explanatoriness are orthogonal to simplicity, and in which the evaluation of the latter comes after the evaluation of the former (cf. Harris, 1951, p. 9, n. 8). Instead, we are dealing here with a conception of linguistic (and in general scientific) theorization in which simplicity is embedded in the descriptive and explanatory goals of theorization, as a constituent of, or requirement for, a satisfactory account of linguistic phenomena.

22 Taking a language L to be a set of finite strings ("grammatical sentences"), a grammar G is a set of rules that specifies the sentences of L, and assigns to each a structural description, that is, a full account of its elements and their organization.
This is particularly clear when we reflect on the role of g-simplicity (and its informal counterparts) in the explication and implementation of the notion of linguistically significant generalization (LSG) (see MMH, LSLT ch. 4). The ability to yield grammars with LSG's when confronted with data was itself a requirement on an adequate theory of linguistic structure and thus was part of the goals of linguistic theory. Indeed, simplicity was claimed to play a more central role within the general linguistic theory itself, with substantive consequences, since many of the theoretical notions defined within the theory, such as morpheme or linguistic categories (like Noun Phrase) were themselves said to be defined by reference to their role in the simplest grammar from a certain set of alternatives.23 Thus, the goal of arriving at the simplest grammar was not motivated or justified (primarily at least) by considerations of expediency or reduction of labor; instead, it was directly motivated by considerations of descriptive adequacy (see LSLT, 118).24 The link between simplicity and LSG was mediated by a series of nontrivial goals, assumptions and hypotheses. The first component, of a clearly epistemological sort, is the view that generality and systematicity, as opposed to ad hoc-ness and accidental generalizations, are essential elements in any descriptive and explanatory effort. The other components concern the way in which descriptive generality is reflected in rule systems, thus determining which kinds of grammars the evaluation procedure will rank the highest. One of these assumptions is the following. Take two equally observationally adequate grammars G, G′ of the same language L. Suppose that G succeeds at displaying general invariances in the data (say, by describing patterns in the distribution of the formal elements of sentences in a corpus), whereas G′ fails to reflect such patterns, since it contains less general, more ad hoc rules.
In this case, the rules of G, but not those of G′, will "say similar things about elements of various sorts" (Chomsky, 1975/1955, p. 26). In consequence, the rules of G will exhibit a higher degree of formal similarity among themselves than the rules of G′, in that there will be discernible structural parallelisms in the form of their statements. Having assumed an evidential link between the generality of grammars and their internal formal similarity (that is, cohesiveness, or homogeneity among their rules), we need an objective way of measuring this property, a procedure that will systematically associate degrees of similarity with numbers. The metric proposed in LSLT was based on the assumption that structurally homogeneous grammars can be expressed more compactly than their more heterogeneous counterparts. This is because, to the extent that the rules have elements and forms in common, such commonalities can be extracted ("factored out"), since they only have to be expressed once, and thus the redundant symbols can be eliminated. This yields a grammar with fewer, more general rules. At the same time, this formulation will contain fewer symbols than the original one.25 The result is that "grammars with a greater degree of similarity among rules become, literally, shorter than others which express the same mapping [. . . ]" (LSLT, 26-7, italics mine). This suggests, then, that the length of a grammar (once it has been stated in its minimal form) is correlated with its potential for consolidation or amalgamation (i.e. homogeneity or compactness). But grammar length can be expressed in terms of the number of symbols in a grammar's minimal form, thus suggesting a way in which the degree of formal similarity among rules in a system can be mapped, in an appropriate (order preserving) way, into the set of positive integers, thus providing a way of measuring that property.26 The overall idea was, then, that grammars that succeed at capturing generalizations (LSG's) are representable in a more economical way than those that fail in this respect.27

So, how does this idea affect the evaluation procedure, qua explication/implementation of the notion of linguistically significant generalization? The result was that the evaluation procedure depended on the hypothesis that the degree to which a grammar captured LSG's was inversely related to the length of its minimal statement. This hypothesis was itself based on a series of assumptions about the relationship between linguistic phenomena (their intelligibility, by virtue of basic, detectable, patterns of organization), the form exhibited by grammars that capture generalizations, and the notational conventions and transformations that allow the desired simplicity properties to be reflected in a quantitative measure (i.e. one in which the distribution of grammar lengths is not an artifact of the notation or the operations on it).28 29

Thus, at different points we see that the rationale for a g-simplicity metric, in the context of an evaluation procedure, responds to the same concerns that motivate the search for simplicity in theorization in general. The views on the relations among description, explanation, generality and simplicity that motivate the pursuit of LSG's are the same views that underlie the search for uniformity and unity in any scientific account. It would be a mistake, then, to regard the LSG-criterion as a purely methodological tie-breaker, reflecting differences in elegance or notational compendiousness. Although it is clear that simplicity is here ascribed to specific formulations, and determined by formal considerations (e.g. counting symbols), its ultimate import goes beyond that. This is because certain formal features of rules are tied to certain properties of their subject matter, regarding the extent to which the language in question exhibits systematic patterns of organization. Moreover, the specific shape taken by the metric is in great part influenced by hypotheses about the best way to give formal expression to certain pre-systematic, general simplicity properties. Thus, the process of rule consolidation is a miniature (local), well-behaved counterpart of the unification of particular generalizations by more abstract and general ones. Finally, the motivation for economy of representation as the target simplicity property is the assumption that such economy is aligned with (and is an indication of) simplicity considerations of descriptive and explanatory import. The notational transformations themselves are not merely motivated by a priori matters of mathematical elegance or technological issues regarding the relative usability of different notations. Rather, they depend on empirical considerations, such as the hypothesis that the chosen method is conducive to authentic linguistic generalizations. But the method will lead to such generalizations only if human languages have certain characteristics that allow for a compressed statement when expressed in the notation, characteristics that they will only possess as a matter of contingent fact.

23 This idea is clearly put forward in early writings: "[. . . ] one of the considerations involved in setting up linguistic elements in a particular way, and consequently, in determining what are in fact the grammatical sentences, will be the total simplicity of the grammar in which these elements appear" (MMH, 3). "It seems reasonable [. . . ] to inquire into the possibility of defining linguistic notions in the general theory partly in terms of such properties of grammar as simplicity" (LSLT, 114).

24 Thus, Chomsky points out that "[. . . ] it is important to recognize that we are not interested in reduction of the lengths of grammars for its own sake. Our aim is rather to permit just those reductions in length which reflect real simplicity, that is, which will turn simpler grammars (in some partially understood pre-systematic sense of this notion) into shorter grammars" (LSLT 118, emphasis mine).

25 For reasons of convenience, I've been talking of grammars as if they were sets of rules. However, this is incorrect, since the same set of rules but with different order of application can constitute different grammars. Thus, they are more accurately characterized as sequences of rules (see LSLT, 125 ff).

26 "What lies behind a generalization or a regularity in a set of data is a pattern that permits a 'compression' in the description of that set, beyond a simple list of its members." (Berwick, 1985, 225)

27 In Chomsky's own summary of the role and nature of simplicity in the context of evaluation procedures: "The obvious means for selecting among grammars is in terms of the degree of significant generalization that they achieve. In the conventional sense of the term, a generalization is a single rule about many elements. Generalizing this notion, we might measure the degree of generalization attained by a grammar in terms of the formal similarity among its generative rules, the extent to which they say similar things about elements of various sorts." And this in its turn required: "[...] a system of amalgamating similar rules, so that grammars with a greater degree of similarity among rules become, literally, shorter than others which express the same mapping from morphophonemic to phonetic representation. The system for amalgamating rules expresses a hypothesis as to the relations among rules that constitute linguistically significant generalizations." (LSLT, 26-7)

28 "[...] the evaluation procedure proposed in MMH (similarly, ATS, SPE and other work), constitutes an empirical hypothesis with regard to the "essence of human language", specifically, with regard to the principles of organization that are taken to be fundamental in that the generalizations that express them are defined as "linguistically significant" and contribute to the selection of grammars" (LSLT, 28).

29 "Given criteria of adequacy for grammars of certain languages we can arrive empirically at notations with the property that the grammars meeting the criteria of adequacy are in fact the shortest, given these notations. In other words, we define simplicity so that, in certain clear cases, the simplest grammars are in fact the correct ones" (LSLT 118). Furthermore: "[...] this procedure [i.e. developing notations that relate theoretical adequacy to simplicity] is no stranger than attempting to define "morpheme" in such a way that what we know to be morphemes in some language turn out to be morphemes when we apply the theory to a corpus of utterances in this language. This can be realized in a nontrivial fashion if we can give a general and abstract definition of "simplicity" (just as of "morpheme") which in the case of particular languages leads to adequate grammars, and if the general theory of grammatical structure in which this definition appears meets certain considerations of significance that apply to any scientific theory" (118-9).

Take the example of Chomsky's (1957; 1965) account of the English auxiliary system, described by the set of rules in (7).30
(7) Aux → Tense
    Aux → Tense Modal
    Aux → Tense Perfect
    Aux → Tense Progressive
    Aux → Tense Modal Perfect
    Aux → Tense Modal Progressive
    Aux → Tense Perfect Progressive
    Aux → Tense Modal Perfect Progressive

The choice of notational elements in the system used for stating grammars in their minimal normal form included: (a) connectives, such as braces, brackets, curly brackets and the like; (b) vocabulary items, such as NP, [+ vocalic], Infl, etc., and (c) a set of rules for expanding or compressing expressions built out of the previous elements. The notational conventions introduced in LSLT (pp. 120-4) allow us to reduce (7) to the compact form in (8).

(8) Aux → Tense (Modal) (Perfect) (Progressive)

In (8), the parentheses indicate optionality, and position in the sequence indicates the linear order of the elements in the rules. Notice that the set of rules in (7), which has 20 symbols, can be compressed by means of the notational conventions into (8), which has 4 symbols (in neither case are we taking Aux into account). The choice of notational system matters because it plays a role in what is judged simple or not simple, and because certain conventions allow us to express generalizations that we would otherwise be unable to capture (and, correspondingly, make it impossible for other kinds of generalizations to be expressed). Now consider the set in (9):

(9) Aux → Tense Modal Perfect Progressive
    Aux → Modal Perfect Progressive Tense
    Aux → Perfect Progressive Tense Modal
    Aux → Progressive Tense Modal Perfect

(9) cannot be compressed with the notation in use. Nevertheless, this does not mean that it cannot be abbreviated at all, since it can be compressed into the form expressed in (10), provided we adopt a different set of notational conventions:

(10) Aux → (Tense Modal Perfect Progressive)

30 The arrow means 'rewrite as'.
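The symbol counts at issue in (7)-(10) can be checked mechanically. What follows is only a minimal sketch, not a reconstruction of the LSLT conventions themselves: the function names (`expand_optional`, `cyclic_permutations`, `symbol_count`) are hypothetical, and, as in the text, the left-hand symbol Aux is left out of the count.

```python
from itertools import product


def expand_optional(obligatory, optional):
    """Expand a rule with parenthesized (optional) symbols into plain rules.

    Each optional symbol is either kept or dropped; linear order is preserved.
    """
    expansions = []
    for choices in product([False, True], repeat=len(optional)):
        kept = [sym for sym, keep in zip(optional, choices) if keep]
        expansions.append(tuple(list(obligatory) + kept))
    return expansions


def cyclic_permutations(symbols):
    """Expand a sequence into all of its cyclic permutations."""
    return [tuple(symbols[i:] + symbols[:i]) for i in range(len(symbols))]


def symbol_count(right_hand_sides):
    """Count right-hand-side symbols only (Aux is not counted)."""
    return sum(len(rhs) for rhs in right_hand_sides)


# (8) read with parentheses as optionality expands to the eight rules in (7).
rules_7 = expand_optional(["Tense"], ["Modal", "Perfect", "Progressive"])

# (10) read with parentheses as cyclic permutation expands to the rules in (9).
rules_9 = cyclic_permutations(["Tense", "Modal", "Perfect", "Progressive"])

print(len(rules_7), symbol_count(rules_7))  # 8 rules, 20 symbols
print(len(rules_9), symbol_count(rules_9))  # 4 rules, 16 symbols
```

Both compressed statements, (8) and (10), contain four symbols. So the counting procedure by itself does not favor one convention over the other. Which rule sets come out as short depends entirely on which abbreviatory convention is admitted.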
For instance, we may use the parentheses to indicate sets of cyclical permutations, and such a notation would allow us to capture the generalization implicit in (9). This shows that the first notation is not the only formal possibility. From a purely mathematical point of view, the adoption of this notational convention would be unobjectionable, and so the choice of the first notational system is due to empirical reasons (see Berwick, 1985, 220 ff, and Lasnik, 2000, 41-2). The first kind of notational convention accords with the claim that there is a linguistically significant generalization in (7) but not in (9), and it commits the theorist to the hypothesis that regularities of the type exhibited in (7) can be found in natural languages, unlike those of the type to which (9) belongs. Thus, this use of the parenthesis notation means that:

[. . . ] the difference between four and twenty symbols is a measure of the degree of linguistically significant generalization achieved in a language that has the forms given in list (16) [i.e. our (7) here], for the Auxiliary Phrase, as compared with a language that has, for example, the forms given in list (17) [i.e. (9) here] as the representatives of this category (Chomsky, 1965, 43).31

Thus, if a child is exposed to the sentence 'yesterday John arrived' and to 'John arrived yesterday', he or she will not be tempted to conclude that 'arrived yesterday John' is also possible. (Chomsky, 1965, 43-4).

In sum, the use of simplicity considerations in this era, when we restrict ourselves to the theory-internal notion, points at least to an epistemic dimension that is irreducible to considerations of ease of use.32 In LSLT and Syntactic Structures (SS) the import of the simplicity metric seemed to be mostly epistemic and methodological (but see Chomsky, 2012). However, with the advent of the early "cognitive turn" in linguistics, which crystallized in ATS, the evaluation procedure was to acquire a new dimension.
Thus, whereas in Syntactic Structures and LSLT it is portrayed mostly as a theoretical and methodological tool (to be used by the linguist, in the course of the theory-building process), in ATS it was postulated that the evaluation procedure was itself (a description of) part of the innate endowment of the child, and thus that something like the simplicity metric discussed above was part of the actual process of language acquisition (though this was the result of elaborating on, and emphasizing, themes that were already alluded to in previous work, such as the analogy between the language learner and the linguist devising a grammar).

31 The same point made in more cognitive terms: It is clear, then, that choice of notations and other conventions is not an arbitrary or "merely technical" matter, if length is to be taken as the measure of valuation for a grammar. The criteria and notation are not a matter of a priori, topic-neutral methodological strictures, but make certain assumptions about the nature of human languages (as opposed to languages in general): human languages are such that the statement of their grammars can be considerably shortened by this particular set of conventions (we could imagine languages that do not yield to this treatment). Thus, when particular notational devices are incorporated into a linguistic theory of the sort we are discussing, a certain empirical claim is made, implicitly, concerning natural languages. It is implied that a person learning a language will attempt to formulate generalizations that can easily be expressed (that is, with few symbols) in terms of the notations of this theory [. . . ] (Chomsky, 1965, 45).

32 Simplicity can occasionally be an indicator of descriptive adequacy, via the notion of LSG, particularly in cases in which competing grammars don't have the same level of descriptive adequacy. In these cases, descriptively adequate rule systems will tend to be more compact, since they capture certain patterns ("felt relations") that would otherwise look accidental or would have to be accommodated in an ad hoc fashion.

4.3 Simplicity and Explanatory Adequacy

Throughout the next period simplicity played important roles in the pursuit of explanatory adequacy, and, in general, the field witnessed the increased prominence of the unification pattern, in the context of "theory-external" simplicity. As mentioned in section 4.1, the problem of "explanatory adequacy" in generative linguistics consists in accounting for the acquirability of human languages, given certain constraints set by the output of the acquisition, as well as by the acquisition process itself (i.e. its uniformity across individual speakers), and the noisy and impoverished circumstances in which such a process takes place. The main problem at this stage was to find a way to achieve explanatory adequacy without sacrificing descriptive adequacy. The proposed solution to this problem, embodied in the Principles and Parameters program (see Chomsky, 1981, 1982; Chomsky and Lasnik, 1995), involved essential simplifications at various levels and in various respects, as we'll see below.

The hypothesis that determined the course of inquiry was that the gap between the data and the outcome of acquisition was bridged by the child's possession of an innate, species-specific endowment ("Universal Grammar", or UG) which guided and circumscribed the acquisition process. Thus, in the framework of the Standard Theory and its direct descendants the road to explanatory adequacy passed through the specification of a procedure by which the learner could select the right grammar on the basis of primary linguistic data (see Chomsky, 1965, 25-7). The earlier models portrayed UG as a kind of format or template that defined the notions of possible grammar and possible rule.
This was a legacy from the approach described in the previous section, in which the theory of linguistic form, the ancestor of (the theory of) UG, provided a specification of the form of possible grammars, their elements, operations, etc. Under this conception, the goal for the child was to find the optimal instantiation of this format consistent with the available primary linguistic data. The tasks carried out by the child (or rather, by his or her Language Acquisition Device) involved: (a) putting forward possible grammars as hypotheses about the target language, and (b) ranking those grammars in order of some technical notion of simplicity. Notice that such a notion was to be a "psychologized" descendant (a cognitive/biological interpretation) of the methodologically oriented one that we earlier on dubbed 'g-simplicity', which shows that simplicity had by this time acquired a clear substantive role in the theory. Once the ranking was in order, the child was to pick the highest-ranked grammar. Thus, acquisition involved a computational procedure for arriving at the simplest instantiation of the format that fit the linguistic data (see Chomsky, 2012, ch. 13). This procedure involved taking into consideration the whole class of possible grammars compatible with the data. However, the vastness of such a class rendered the computational task intractable. In consequence, it wasn't clear how the child could possibly manage to implement the evaluation procedure. The most obvious kind of solution consisted in narrowing down the space of possible hypotheses, and this in fact became the immediate goal for much generative theorizing. (See the introduction in Chomsky (1975/1955), as well as Chomsky (1986, 2012)). However, in the context of the "format/template" view of UG, this strategy turned out to introduce further complications, brought about by efforts to make the format less permissive.
This involved incorporating restrictions into the format itself, in the guise of additional specifications, so that the innate component became more and more elaborate and specific.33 But this was undesirable for at least two reasons. The first, of more immediate significance, was that the specificity in the format made it difficult to capture the variety observed across the world's various languages, and this was detrimental to descriptive adequacy. The second was that it seemed to offend against theoretical simplicity as a general scientific desideratum. This situation has also been described as evincing a prima facie "tension" between the goals of descriptive and explanatory adequacy (Chomsky (1986), Hornstein and Lightfoot (1981)), in the sense that these criteria of adequacy seem to impose conflicting demands on linguistic theory, which must describe a system that is (a) simple enough to be acquired by a normal child in a haphazard linguistic environment-that is, the structures postulated must be meager enough so that it is clear how language is learnable-and (b) capable of developing in ways that are as numerous and varied as are human languages, which means that descriptive resources must be rich enough to be able to account for the variety observed. As Fukui and Zushi put it in their introduction to Chomsky (2004): "descriptive adequacy points to complexities, but explanatory adequacy demands simplicity " (p. 7). The way out of this impasse involved the simplification of UG achieved by the Principles and Parameters (P & P) approach. This proposal was partly the result of the tendency, during much of the 70's and 80's, to factor common elements out of sets of language or construction-specific rules, so as to obtain principles of greater generality. These principles were more abstract than the rules they replaced, as well as more concise, and fewer in number. 
This search for greater generality also offers very clear illustrations of the search for parsimony (frequently referred to as "economy") as a key engine driving theoretical change. One of these examples is provided by the development of X-bar theory, which was one of the factors that precipitated the elimination of phrase structure rules. Thus, the phrase-structure rules in Aspects and earlier allowed for independent series of phrase rules:34

NP → N
NP → N PP
NP → N S
etc.

VP → V
VP → V PP
VP → V NP
VP → V S
etc.

AP → A
AP → A NP
etc.

33 Though a crucial goal of this strategy was a reduction in elaborateness and specificity of the transformations themselves (see Chomsky, 1973).

34 'N', 'V', 'A', 'P', 'S' can be read as 'noun', 'verb', 'adjective', 'preposition' and 'sentence', respectively. 'NP' is 'noun phrase', and the same principle is applied to 'VP', 'AP', and 'PP'. Again, the arrow is to be read as 'rewrite as', or in the opposite direction as 'is a'.

In hindsight, these systems seem comparatively ad hoc, since they account for the phenomena in piecemeal fashion, by having each rule more or less closely delineate attested structures. The worry at issue was not one of lack of empirical coverage, but of lack of system. To begin with, the system is too unrestricted, as it fails to rule out all sorts of structures that are not present in English, such as the following:

(11) VP → NP S
(12) VP → S V

This shows that we need a general account of phrase structure. In addition, there are some facts that the formulation misses, but which can be gleaned from the rules. There are striking parallels among the rules in terms of their internal structure, which suggest the existence of a more abstract generalization. For instance, all types of phrases are associated with a sort of nuclear lexical element, and, correspondingly, each of these elements seems to "project" to its particular kind of phrase. This nuclear element is what is called the "head" of the phrase.
Thus, one generalization we are missing is the following: every phrase has a head, and every head has a maximal projection (the phrase it projects). This constitutes the "endocentricity" constraint on phrases (Chomsky, 1970). We can also distinguish other structural positions shared among the rules: for instance, each head can be followed (in English) by a complement (comp), typically a phrase of another category.35 However, in the phrase structure rule formulation, there is no principled way of ruling out headless phrases (such as example (11) above). Moreover, the fact that English verbs, prepositions, nouns and adjectives all tend to precede their complements seems to be a mere coincidence, and there are no principled reasons for ruling out sentences in which this doesn't happen (such as (12)), since what we have is a series of rules tailored to a particular subset of the phenomena. The proposed solution was in some way dictated by the form of the problem: factor structural positions and regularities out of particular rule statements. In this way, we eliminate our panoply of rules in favor of an abstract general schema, called "X-bar theory":

XP → X′
X′ → X0 COMP

Here X is a variable ranging over categories, XP is a maximal projection, X0 is the projection's head and X′ is an intermediate projection.36 Moreover, in taking the X-bar schema to be a part of the learner's initial endowment we are committing ourselves to theses about the nature of language per se. Among other things we predict that there will be no headless phrase (in any language). We also gain a systematic way of characterizing some of the differences between languages: for instance, in English the heads (whether nouns or verbs or any other) tend to precede their complements, whereas in Japanese exactly the opposite obtains. Here, characteristically, the goal was to increase systematicity and generality, and reduce ad hocness.
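To make the contrast with the rule lists concrete, here is a rough sketch of how a single schema plus an ordering setting does the work of the separate NP, VP and AP rules. It rests on simplifying assumptions that are not part of the theory as stated: rules are treated as flat (the intermediate X′ level, specifiers and adjuncts are ignored), a phrase label is taken to be its head category followed by 'P', and the function name `licensed` and its `head_initial` flag are hypothetical.

```python
def licensed(mother, daughters, head_initial=True):
    """Check a flat rule such as VP -> V NP against a simplified X-bar schema.

    Assumptions (illustrative only): the head of 'XP' is the category 'X';
    a phrase consists of its head plus at most one complement; the flag
    `head_initial` fixes whether the head precedes or follows the complement.
    """
    if len(mother) < 2 or not mother.endswith("P"):
        return False
    head = mother[:-1]
    if len(daughters) == 1:
        # Bare head, e.g. NP -> N
        return daughters[0] == head
    if len(daughters) == 2:
        head_slot = 0 if head_initial else 1
        complement = daughters[1 - head_slot]
        # Endocentricity: the head must occupy the slot fixed by the setting.
        return daughters[head_slot] == head and complement != head
    return False


examples = [
    ("NP", ["N", "PP"]),
    ("VP", ["V", "NP"]),
    ("AP", ["A", "NP"]),
    ("VP", ["NP", "S"]),  # (11): headless
    ("VP", ["S", "V"]),   # (12): head follows its complement
]
for mother, daughters in examples:
    print(mother, "->", " ".join(daughters), licensed(mother, daughters))
```

On these assumptions (11) is rejected under either setting, since it contains no head. By contrast, (12) is rejected with `head_initial=True` but accepted with `head_initial=False`, which mirrors the English/Japanese contrast described above.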
Missing a generalization is missing out on a more uniform conception of the domain of phenomena, and on a more economical set of basic principles. Thus, in general, this period saw the replacement of language-specific rule systems and constraints with the postulation of general, unifying, principles and parameters (more on this below) of the language faculty, as well as the replacement of characterizations of phenomena (such as constructions, for instance) in terms of ad hoc rules, with "modular" explanations, that is, in terms of the interactions among a few components (modules) comprising UG (Universal Grammar).

35 Another kind of position, not illustrated by the structures above, is what is called specifier, or spec.

36 Actually, the X-bar schema is a little more elaborate (X′ includes positions for adjuncts, and allows for recursion):

XP → SPEC X′
X′ → X′
X′ → X0 COMP

Chomsky (1970) and Jackendoff (1977) are the loci classici for X-bar theory.

This led to the "Principles and Parameters" (P & P) approach to the architecture of the language faculty. In this approach UG contains: (a) a series of principles which characterize human language, and (b) a series of parameters consisting of binary (on/off) options, whose interaction gives rise to the attested linguistic variety. Thus, the work of the more profligate and ad hoc rule systems was now done by those principles and constraints, in tandem with a modular organization of grammar. This brought about a reduction of the redundancy and adhocness that rule systems had gradually accumulated. This "thinning out" took place in two respects: on one side, much of the empirical burden borne by stipulated, grammar-specific constraints was loaded off into the innate component, in the form of general principles of UG. In another respect, there began a process of simplification of the innate component itself, which combined with the effort of finding a principled explanation of the structure of UG.
This eventually led to a higher-level pruning of the innate endowment, in which much of the work which was supposed to be done by language-specific principles was taken out of the language faculty and reassigned to a set of extra-linguistic factors (see Chomsky, 2012, 2005). The P & P approach addressed both of the problems associated with the template view. In the first place, it simplified the architecture of UG. The principles of UG were conceived as highly abstract and general, as well as Spartan in their formulation, yet far-reaching in terms of their influence.37 Similar considerations applied to parameters, which were furthermore hypothesized as binary. An example of this is the so-called head-directionality parameter, which regulated the relative positions of the HEAD and COMP positions in particular languages, and could be set in either of two values: head-initial (as in English) and head-final (as in Japanese) (see Baker, 2001, ch. 2). Second, it provided a way of accommodating the conflicting demands of explanatory and descriptive adequacy, thanks to the greater flexibility associated with the simplified UG, and the descriptive and explanatory powers resulting from the interaction of a few basic components. These developments were to a great extent made possible by the postulation of unifying (and thus simplifying) grammatical principles that connect apparently disconnected facts, such as the structural relation c-command, which ties together a multiplicity of phenomena, such as variable binding, scope interactions among operators, licensing of polarity items, and others. Other notable unifying posits included the principles of Binding Theory, such as Principle C (which plays important roles in syntax and semantics across a variety of languages) and some of the most prominent of the parameters that were proposed, such as the pro-drop (or null subject), head and subjacency parameters, among many other similar posits.
Furthermore, acquirability, the key concern during this period, also pointed in the direction of unifying principles, since, without the aid of such principles, children would be forced to learn the target grammar piecemeal (Crain and Pietroski, 2006).

This also illustrates how generative research fits within the Uniformity-Unification pattern of theorization. Earlier on we introduced the notion of U-theorization, as a style of research that is driven by the ideals of simplicity, in particular in the form of unification, uniformity and coherence, whose role is manifest at each stage of research. In this conception, then, S-properties are tightly woven into the fabric of the aims, standards and methods of scientific inquiry, as progress and explanatory power are sought through greater systematicity, depth and generality.38 These are also the ideals to which generative linguists aspire, and such aspirations are reflected in the practice of the discipline, in terms of its most prominent desiderata and the strategies employed to achieve them. Thus, the initial task of extracting structural invariances from various linguistic phenomena, with an eye to formulating accurate generalizations, is in itself a task of simplification, since it aims at a more uniform and patterned conception of the domain. If successful, this in its turn provides input to the search for more fundamental invariances, which will ideally culminate in the subsumption of empirical generalizations under more abstract and inclusive ones, representing more of linguistic phenomena.39 In this way, the expectation is that, as larger and more diverse classes of phenomena are subsumed under fewer and more systematically interrelated basic categories, we will encounter deeper and more significant uniformities, resulting from fundamental properties of language. In addition, the amount of "bruteness" in the domain will be diminished, as will be the stock of primitive or ad hoc postulations.

In this period we can see that theory-external considerations of simplicity played a significant role in the direction taken by theorization. Moreover, the link assumed between simplicity and learnability also shows the substantive import of simplicity properties in the search for explanatory adequacy.

37 Also, parameters are small in number, whereas the number of rules is open-ended, parameters have binary settings, parameters (but not rules) are plausibly seen as innate, etc. (See Baker (2001), Roberts and Holmberg (2005); but also Newmeyer (2004), for a contrasting view).
38 Thus, it's no surprise that Chomsky, in an early discussion of the role of simplicity, expresses the following:
Such considerations are in general not trivial or 'merely esthetic'. It has been recognized of philosophical systems, and it is, I think, no less true of grammatical systems, that the motives behind the demand for economy are in many ways the same as those behind the demand that there be a system at all (Chomsky, 1951).
Also, in The Logical Structure of Linguistic Theory (Chomsky, 1975/1955):
It has been remarked in the case of philosophical systems that the motives for the demand for economy are in many ways the same as those behind the demand that there is a system at all [. . . ] It seems to me that the same is true of grammatical systems, and of the special sense of simplicity that will concern us directly. (Chomsky, 1975/1955, p.114)

4.4 Simplicity beyond explanatory adequacy

Once the Principles and Parameters model offered a plausible framework for achieving explanatory adequacy, certain issues started to receive more attention. Among them were questions like: why is UG the way it is, and not otherwise? Why does it incorporate the principles and parameters it does, as opposed to other possible ones?
This signaled the demand for a principled explanation of the nature of the human language faculty, thereby posing a deeper level of adequacy for linguistic theory, inviting researchers to try to go "beyond explanatory adequacy", by accounting for the characteristics of language in terms of more general, non-linguistic factors. Thus, just as much of the explanatory burden of language-specific grammars and construction-specific rules was reassigned to the universal innate component, now the goal was to shift the latter's burden over to extra-linguistic factors, such as properties of the interface systems and general considerations of computational efficiency. One way of pruning UG was to determine the minimal empirical conditions of adequacy on linguistic theories, so as to ascertain which posits were actually motivated by such conditions, and which ones were purely stipulative, theory-internal or technologically-driven. The minimal conditions were set by a series of "big facts" which any theory of language must account for. These are: (i) sentences are the basic linguistic units; (ii) sentences are pairings of sounds and meanings; (iii) the number of sentences is potentially infinite; (iv) sentences are made up of phrases; (v) the diversity of languages is the result of interactions among principles and parameters; (vi) sentences exhibit displacement properties. The requirements posed by the need to account for these phenomena-which seem to be essential, unavoidable features of human languages-define a domain of "virtual conceptual necessity" (Chomsky, 1995; see also Boeckx, 2006, Hornstein, 2001). Relatedly, there was an increased focus on general criteria of theory evaluation, such as parsimony, naturalness, economy and elegance. This posed the task of turning these vaguely formulated criteria into concrete research objectives. All of these changes gave rise to the Minimalist Program (Chomsky, 1995), in all its different manifestations. 
There are different kinds of "minimalist" concerns. Those related to general criteria such as parsimony, elegance and naturalness have been called "methodological"-or "methodological economy"-while those having to do with principled explanation are "substantive"-i.e. "substantive economy"-(Chomsky, 2004; Hornstein, 2001; Hornstein et al., 2005). The search for substantively minimalistic theories has been led, to a great extent, by heuristic least effort principles, which furnish the program with a unifying theme. These "least effort" guidelines are the source for what appear to be "theory-internal" notions of simplicity, whereas concerns of methodological economy are intended to be more of a general or "theory-external" nature. Among the main least effort-inspired computational principles are: "Short steps preclude long strides", "Derivations where fewer rules apply are preferred to those which require more", "Movement only applies when it must", "No expressions occur idly in grammatical representation" (i.e. Full Interpretation) (for discussion see, among many others, Boeckx (2006); Hornstein (2001)).

The interplay of substantive and methodological economy led to the reappraisal of the merits of the different theoretical components of the Government and Binding (GB) model (Chomsky, 1981, 1982, 1986)-the last pre-minimalist version of the P&P framework-in the light of the "big facts", thematic least-effort principles, and general considerations of parsimony, elegance, etc.40 Now, given that language must connect sound and meaning, the semantic and phonetic interfaces (such as LF and PF) seemed obligatory for any viable account, in contrast with other levels (such as DS and SS) whose motivations were largely theory-internal.41 Thus, the "big facts" provided a lower bound on the complexity of the theory, whereas parsimony and least effort considerations provided an upper bound.

Here we see again a crucial role being played by simplicity properties in the content of linguistic theory. Much of the motivation for appealing to virtual conceptual necessity, the postulation of economy conditions on derivations and representations, and the role of computational efficiency (e.g., Chomsky (1998, 2002), Piattelli-Palmarini and Uriagereka (2004), Boeckx (2006)) is the hypothesis that language may be an "optimal" or near-optimal solution to mapping between sound and meaning, in the sense that its organization tends to the maximization of resources, to the absence of redundancy and overdetermination in linguistic principles, and similar factors. This could be considered the main working hypothesis (or bet) of the Minimalist approach. This hypothesis leads to the following research-guiding question: "to what extent does language approximate an optimal solution to conditions that it must satisfy to be usable at all, given extralinguistic structural architecture?" (Chomsky, 2005).

But what is perhaps most significant for our purposes is that, given the main working hypothesis underlying the Minimalist Program, the respective goals of substantive and methodological minimalism converge, at least in the sense of pointing to the same kinds of theoretical decisions. Thus, economy-driven hypotheses about structure-building operations find their rationale in subject matter-specific assumptions about the design principles behind the language faculty. Nevertheless, it is plausible that general considerations of scientific decision-making will also dictate a preference-other things equal-for theories that entail, say, shorter derivations than their alternatives. Likewise, a parsimonious picture of the structure of UG is motivated both by subject matter-general methodological considerations and by subject matter-specific hypotheses about the nature of the linguistic faculty, which, in their turn, provide concrete goals for the pursuit of principled explanation. Finally, both substantive and methodological reasons conspire to motivate the effort to unify various superficially diverse phenomena (e.g. by reducing construal to internal merge, as in Hornstein (2001), among many other cases). This shows how simplicity considerations have become more and more closely entwined with the content of linguistic theory and with the very criteria for a good linguistic explanation.

At this point, one could imagine a reply along these lines: "You have discussed two kinds of simplicity considerations: some theory-internal and some theory-external. There is no problem with the first, since, although arbitrary, they involve technical notions and criteria that are only defined for a specific theoretical vocabulary, and thus are not meant as a basis for inter-theoretical comparisons. With respect to general, "theory-external" notions of simplicity, the earlier objections still stand. To begin with, they are too vague to serve as working criteria for comparing different theoretical proposals. However, when they are concretized by reference to a particular theoretical dimension-such as "amount of machinery"-arbitrariness and subjectivity crop up at every step. Thus, the prospect of using simplicity as a factor for choosing among different theories remains as hopeless as ever. Still, simplicity considerations remain legitimate as long as they are motivated by "usability" considerations, of a nature that may not transcend specific scientific communities or theoretical orientations."

Here we see again two issues, having to do with Ludlow's main claims, namely the criticism of uses of simplicity as a criterion for choosing among theories, and the conception of simplicity as "ease of use". The first issue has to do with the arbitrariness and lack of neutrality of simplicity considerations, which (unsurprisingly) affect the theory-internal sort, but are more obviously relevant for the theory-external ones, since they are at least candidates for being used for inter-theoretical comparisons. The second is related to usability considerations.

39 As Crain and Pietroski (2006) put it:
Linguists often start by proposing a constrained grammar for a certain range of facts concerning adult competence. Then they try to abstract out regularities across lots of particular examples, across many languages. Then they try to explain these cross-linguistic phenomena in the simplest way. And so on. (65)
40 GB assumed what was called the "T model" of the architecture of the language faculty, which consisted in a series of levels of representation (four, in total), along with the constraints that they placed on structures, the permissible operations, etc. The levels comprised two interfaces, called L(ogical)F(orm) and P(honetic)F(orm)-which interacted with the conceptual/intentional systems and the articulatory/perceptual systems, respectively-as well as two "syntax internal" levels, called D-Structure and S-Structure (earlier known as 'deep' and 'surface' structure). See Lasnik and Uriagereka (1988) and Haegeman (1994) for overviews of the GB approach.
41 Similar re-evaluations applied to other aspects of the theory, including its objects, operations and relations (see Hornstein et al., 2005).

Now, simplicity criteria are not neutral, but that doesn't mean they have to be arbitrary, even in cases where there are trade-offs among different simplicity properties and each of the options may reasonably be said to involve some kind of simplification. In this case, one's appeals to simplicity in favor of a particular decision may be understood as packing the following kinds of claims, say: 1.
Alternative A is preferable to alternative B because it is simpler, when simplicity is understood as x (even though B may be simpler in other senses or respects). 2. Maximizing simplicity, understood as x, is a greater priority-in this context, and other things being equal-than achieving simplicity in other competing senses or respects. 3. Maximizing simplicity, understood as x, is a greater priority than achieving simplicity in other competing senses or respects because it brings us closer to achieving a central goal or is a better instance of the program's explanatory ideals or characteristic, unifying thematic content.42 Assumptions 2 and 3 make explicit a kind of rationale for simplicity appeals that would seem arbitrary when claims like 1 are considered in isolation. This sort of consideration may not be enough to license a meaningful direct numerical comparison (say, by counting the number n of posits of kind k in each theory among different theories). However, it does provide principled grounds for focusing on simplicity in respect x, and therefore for preferring theories that exhibit x-simplicity. The factors involved will be in some cases difficult to assess, since they will involve basic choices about what to study and how to study it. However, the x-simplicity motivated judgment will be justified to the extent that one is justified in adopting those basic theoretical commitments, and thus such judgments are not qualitatively different from other judgments with methodological or substantive import. So, there can be rational but non-neutral uses of simplicity, at least for justifying one's simplicity choices. There are several examples in linguistics in which competing simplicity claims have depended on underlying views about subject matter and explanation, rather than on arbitrary appeals to Ockham's Razor or analogous methodological dicta. 
Thus, in opposition to Postal's (1972) goal of minimizing the number of components of a grammar, Chomsky (1972) argues, on grounds of acquirability, that what must be minimized is the class of possible grammars (see section 4.3 above). The Minimalist Program, for reasons of principled explanation, among others (as we have seen in this section) seeks to simplify the set of primitives in UG, whereas (e.g.) on Culicover and Jackendoff's (2005; 2006) "simpler syntax" proposal, the goal is to minimize the "amount of structure generated by the grammar" (i.e. its structural descriptions)43 for reasons of processability and integrability with psycholinguistics and computational linguistics, among other reasons. Within the Minimalist Program, some theoretical projects, based on theoretical considerations, seek to maximize derivational economy, whereas others pursue economy of representation, etc. Also, what goes for inter-theoretical judgments applies (perhaps even more clearly) to intra-theoretical decisions, or choices about the direction in which to take one's theory (though, in this case, it is not too clear exactly when two different proposals can be considered as sharing the same theoretical framework, and when they constitute two different theories).

Now for the second issue. One of the main reasons I had for reviewing the use of technical, "theory-internal", simplicity criteria is the light they throw, even if indirectly, on the conceptions of simplicity at play in generative research, and their role in shaping the development of theories. The notion of simplicity as a measure of "linguistically significant generalization"-in abstraction from its specific formal implementation-although narrower and more focused than "theory-external" ones, is far from disconnected from the latter. In both theory-external and theory-internal cases we can see motivations that are independent of reduction of labor, and which have genuinely epistemic or empirical justifications.
First, with respect to the simplicity criteria in LSLT and SPE, they are not simply "technical"-in the sense of being merely technological proposals-and the notions they implement are not so "internal" that they do not reflect the same kind of motivations as "external" notions of simplicity. So, they shouldn't be seen as discontinuous with the external ones, since they say much about the motivations and conceptions of simplicity in generative linguistics. In particular, they reflect how simplicity is conceived in non-usability terms, that is, as having genuinely epistemic or conceptual significance, or as forming part of substantive empirical proposals.

42 The same considerations are relevant in cases where simplicity, in the relevant respect, is being played against other potentially conflicting desiderata-such as predictive accuracy, applicability in a certain context, or compatibility with a certain discipline or methodology.
43 Where such complexity involves "the extent to which constituents contain subconstituents, and the extent to which there is invisible structure" (Culicover and Jackendoff, 2006, 414).

5 Ludlow's Claims

5.1 The subjectivity of simplicity

Let us start with the first of the theses I am targeting here:

Simplicity is not a genuine property of the object under investigation [. . . ] but is rather a property that is entirely relative to the investigator.
The claim seems to be that simplicity obtains only relative to some user or judge (the investigator, in this case)-I assume that by 'genuine property' Ludlow means something like 'intrinsic', or 'inherent' property-since, given the relational nature of simplicity, any sensible simplicity claim must be relativized to some judge or user.44 This characterization fits attributions of simplicity of use very well, involving tools, tasks or activities, since those claims cannot be made in the absence of an intended user, and cannot be justified in the absence of knowledge about the user's background and skills. Indeed, Ludlow's view seems to derive intuitive plausibility from consideration of these kinds of cases. However, these sorts of considerations are inapplicable to the instances, already discussed, in which the bearer belongs to the subject matter and thus may not be regarded as a tool or instrument, or to those cases where the import of a claim is substantive.

Consider the kinds of substantive, "theory-internal" simplicity claims we have already mentioned, both in linguistics and elsewhere. In those cases the hypothesis that x has the relevant simplicity property S is an empirical claim, and is of the same type as hypotheses that attribute (say) chemical, biological, geological, or other such properties to natural systems. When we consider substantive simplicity properties we must assign the same degree of objectivity to simplicity as to any other property of the subject matter contemplated by the theory. This status is underwritten by the causal profile accorded to the simplicity property. In linguistics, simplicity properties such as the size of the search space (in the early "format" view) and the binary character of parameters were supposed to make the acquisition process smoother overall, thus contributing towards a solution to the problem of explanatory adequacy.
More generally, to the extent that simplicity considerations impact issues such as representability, acquirability, evolvability or usability, we must give the simplicity properties in question the same status vis-a-vis subjectivity/objectivity as other postulated causal factors.

When we consider "theory-external" simplicity conceptions, we find notions and justifications that depend crucially-even if implicitly-on the features of rational inquiry and its products. Simplicity properties (such as parsimony and unification) play an epistemic role here, and are directly attributed to theories and similar entities, and the direct import of the attributions is epistemic or methodological. In these cases simplicity properties may be thought of as inherent goals or features of systematization, and therefore of theorization, or as ingredients in any adequate understanding of phenomena, or perhaps as an epistemic value or ideal to which theorists can aspire.

5.2 Simplicity as ease of use

We cannot suppose that there is a genuine notion of simplicity apart from the notion of 'simple for us to use'.

Ludlow's treatment of simplicity seems to presuppose that all of the diverse uses of simplicity in science can be coherently unified under a single, underlying, global conception, which, in his story, amounts to "ease of use" or "user-friendliness". There is a corresponding account of the motivation for seeking simpler theories. This motivation is, without exception, the reduction of the cognitive burden involved in doing research.

Can there be a coherent global notion of simplicity? At least on the surface, this seems unlikely, given our previous discussion of varieties of simplicity that play important roles in science, but that appear to be markedly different from usability. Some processes and outcomes that are regarded as simplifications fit the ease-of-use construal very well.
Chief among these are some of Ludlow's examples, such as the streamlining of the derivation of empirical consequences, the reduction of time and effort involved in making calculations and other considerations of this nature. In these contexts, what is simplified is a task, and the simplification is attained through the adoption of a new tool or procedure or by the modification of those already in use.45 But, as our previous discussion reveals, there are strong reasons to think that there are cases in which reduction of effort is not what is at stake, and in these cases the significance of simplicity cannot be fully appreciated from the user-friendliness point of view.

44 I.e., that an attribution of simplicity to an object x may be true when evaluated with respect to a given agent (e.g. a community) and false when applied to others, without x having undergone any change.

In previous sections we discussed the "U-pattern" as a type of application of simplicity properties, as well as the theoretical strategies and research styles associated with it. In particular, we saw that the drive towards unification and uniformity often leads to the formulation of theories that are more general and also more abstract. Highly abstract accounts are typically more removed from empirical data than local, less abstract ones. Because of this, it may be less obvious in the first case how the theory bears on the data, and the derivation of empirical consequences may also be less direct than in the case of its less abstract counterpart. So, this is a case in which we have an increase in simplicity, brought about by a more unified picture of the world, but without a gain in usability or expedience. For instance, consider two ways of accounting for the same domain of phenomena.
The first one, U, involves a small number of empirically adequate, tightly interconnected lawlike generalizations of a very broad scope and high level of abstraction, but which can only be related to empirical data via a complex web of theoretical assumptions, abstractions and idealizations. The second option, V, consists of a larger collection of lawlike generalizations, where each of its members (in comparison to those of U) is: (a) largely independent from-or at best loosely connected to-the rest and (b) local in scope of application. Moreover, each of them: (c) is observationally adequate (i.e. it "saves the phenomena"), (d) is couched in phenomenological terms that are low in abstraction, (e) allows observable predictions to be derived in a relatively straightforward fashion.

According to the Peirce-Lindsay-Ludlow school of thought, we ought to say that V is simpler than U. This is because the hypotheses in V are "perspicuous enough to know what is predicted", and in comparison with U, they are "more readily deduced and compared with observation". In consequence, they "simplify our calculations" and allow "for the greatest ease of use". In other words, V minimizes cognitive labor, among other valuable resources, since it "leads in minimum time to success in [. . . ] prediction" (see pp. 160-1, among other places). Furthermore, if we subscribe to the view that there is no "genuine notion of simplicity apart from the notion of 'simple for us to use'" then we must conclude that V is simpler than U in every sense relevant to science, and is thus preferable on grounds of simplicity. However, there obviously is a legitimate sense in which U provides a simpler account of the phenomena, because of the greater unity, uniformity and systematicity it confers on the phenomena under study.
It is also the case that U and the approach it represents possess important similarities with instances of the U-Pattern, and U is in that respect a candidate for achieving its associated desiderata (whereas V is not). So, we seem to be faced with a trade-off between two desirable states of affairs that can be described as "simple". Nevertheless, we should not construe the opposing virtues of U and V as the outcomes of two mutually incompatible strategies for maximizing a single commodity called 'simplicity', understood as the potential for economization of labor. (For one thing, the adoption of U seems to make empirical testing more intricate, and thus more labor-intensive). The potential trade-offs indicate something else, namely, that very dissimilar notions-different S-properties-go by the name 'simplicity', and that their distinctness is evidenced by the conflicting demands posed by each. Seen in their proper contexts, both theoretical frameworks provide something that we can, with perfect naturalness, call 'simplicity'. This situation, however, should not be surprising, since there is no good reason to think that there is a unitary, if protean, desideratum called 'simplicity', which would be sensitive to such diverse considerations as ease of learning and computational expediency as well as (e.g.) uniformity and explanatory depth.

5.2.1 A reply from usability

Ludlow will perhaps reply that his views on simplicity can account for the desirability, not only of notational perspicuity, but also of uniformity, and that the objections presented thus far rest on an excessively narrow reading of "simple-to . . ." and user-friendliness. This reply is very hard to evaluate, because it is not obvious how the account would have to be developed to meet the challenge.
More specifically, there is no obvious way of cashing out talk of "ease of use" in a way that captures the kind of considerations at issue while also retaining the initial selling points of the view (i.e. its relative clarity and its deflationary character) and without watering down the notion of ease of use to the point of dissolution.

45 Understanding 'tool' in a sense broad enough to include, for instance, notational systems, experimental procedures and theoretical models.

Ludlow's account of his preferred interpretation of simplicity talk in science involves a use of 'simple' which is close in meaning to 'easy', and has 'difficult' (or 'complicated', and sometimes 'complex') as antonym. This sense applies primarily to tasks or actions, and derivatively to things that are characteristically involved in the task.46 However, the uses of 'simplicity' in the claims made in the context of U-theories make reference to simplicity properties that are not necessarily associated with specific tasks. Thus, to apply Ludlow's "ease of use" and "reduction of cognitive labor" notions of simplicity to any particular situation, we must first identify the target task, as well as the relevant simplification criteria. Also, we must be clear about the relevant kinds of simplicity bearers and about how they fit into the tasks in question. More specifically, we need answers to these questions, among others:

1. What is the nature of the item x (the bearer) whose use is being discussed?
2. For which kind of tasks is x used, and which goals do those tasks subserve?
3. What are the specific uses to which x is put in the tasks at issue?
4. What does the simplification of the task consist in, and which were the factors contributing to such an outcome?

The clues given in Ludlow's text, by way of examples and elaborations of the key notions, are not terribly helpful in answering these questions.
Thus, we are told that simplicity, as a property, "turns on the kinds of elements that the investigator finds perspicuous and 'user-friendly' " (153, emphasis mine). This seems pertinent to (1), but a proper answer must specify the relevant kinds of "elements". On this score, Ludlow suggests that simplicity is associated with "representations and rules that are perspicuous to the theorist" (158, emphasis mine), and that simplicity criteria relate to certain desirable traits of a theory, namely "that it be perspicuous enough to know what is predicted, and that its notation be easy to use" (160, emphasis mine). In several places throughout the text it is unclear whether Ludlow is referring to properties of notations, of formulations, or of theories, and he does not sufficiently distinguish the diverse S-properties that different entities can exhibit. But we must avoid conflating theories-theoretical ideas or conceptual elements-with their formulations, that is, the specific ways in which the conceptual material can be encoded and articulated (even if the two are clearly not independent). There are many considerations that enter into the choice of a particular notation or formulation- including the purpose it is meant to serve, whether (e.g.) computational, pedagogical or heuristic- that do not find an obvious counterpart in the choice of a system of ideas. Thus, one and the same theory can be expressed in different ways, depending on the aspect being emphasized, the problem being addressed and the tools available, the intended audience, the discipline's traditions, and many other sundry considerations. As I already noted, these distinctions among simplicity bearers are not clearly observed in Ludlow's discussion, and this makes it hard to pin down his proposals. At any rate, in the kind of case we are considering, what is at issue is not a choice among notations, or even among formulations, but among conceptual frameworks. 
This is because U and V-and those actual cases that are relevantly similar-were not, by hypothesis, "two ways of saying the same thing", that is, they constitute neither notational variants, nor alternative encodings of the same conceptual system. Rather, we supposed that there were substantial differences among them, involving different conceptual elements. It may be replied that theories, even if distinct from their canonical or standard formulations, are often strongly associated with them in their users' minds. In addition, it can be pointed out that theories often suggest or "come bundled" with previously unavailable notational resources. Nevertheless, there is no reason to think that in all (or even most) of the cases we are concerned with, the theory is chosen in virtue of the usability of its notational apparatus or the handiness of some of its formulations.47 So, we must bear in mind that the crucial factor here is simplicity, as attributed to theories themselves.

46 E.g. "The questions in the exam were quite simple" (to answer, solve, etc.). Note that the adjective 'simple', when used in this sense, can be modified by an infinitival clause, so as to form complex (often hyphenated) adjectives, as in: 'simple-to-follow instructions', 'simple-to-operate machinery', etc. This is clearly not the case with other uses or senses of 'simple'.
47 Though this might well be the case in other kinds of contexts, in which theories are employed as instruments in the pursuit of practical (as opposed to theoretical) goals.

Things are not any easier when it comes to tasks, goals and outcomes. Thus, we are told that "the simplest theory is that theory which they [the scientists] find easier to use for constructing and evaluating hypotheses" (161).
But "constructing and evaluating hypotheses" can in principle fit everything from daydreaming (as in Kekulé's anecdotal account of his "discovery" of the ring shape of the benzene molecule)48 to the detection of subatomic particles with the use of high-energy particle accelerators. So, we need specific activities, tasks or outcomes to anchor the discussion. Thus, the target tasks cannot be computation and calculation in the service of prediction, since we assumed that U was not particularly conducive to expediency. Other candidate tasks mentioned by Ludlow include learning the theory (one of the possible senses of "easy to understand") and describing and explaining phenomena by means of it. Given the breadth and complexity of considerations introduced by the latter, I will first consider ease of learning. It is possible that, given a theory that is particularly hard to master, someone can devise a way of formulating it that is didactically superior to the ones available (say, by eschewing details or suggesting intuitive analogies). But again, this concerns the advantages of a formulation of (part of) the theory, not of the formulated theory itself. So, is it that the simplicity of theories (and the reason for adopting them) consists in the theories themselves being easier to understand? This may be a psychological factor contributing to the theory's acceptance, but the ease with which a theory can be grasped is independent of its truth, empirical adequacy and explanatory force. Thus, if a gentler learning curve is all that recommends U-theories over their alternatives, then it is far from evident how their adoption could by itself constitute epistemic progress, when the latter is conceived in terms of truth, accuracy, explanatoriness and similar aims.
But the "understandability" view has deeper problems, since it often happens that the concepts involved in unifying theories are quite abstract and removed from ordinary experience, and their consequences counterintuitive and hard to grasp. So, let us suppose that "easy to understand"-more in keeping with the aims of scientific theorization- is not intended to apply (exclusively) to the intelligibility of theories per se, but to the intelligibility they confer on their respective subject matters (that is, it has to do with what we earlier called "attributive simplicity"). With this understanding in mind, it may be proposed that the effect of unifying theories is to facilitate the description, understanding or explanation of phenomena. In this case, no particular task or outcome can be singled out as the distinct target of simplification, because the increase of simplicity is distributed among all sorts of activities and outcomes and it is their combined effect that makes itself felt in the general simplification of description and explanation. Nevertheless, is it really the case that unifying theories, in virtue of their comparative simplicity, make it easier to describe or explain phenomena in the world? And if so, in which sense? Episodes are not rare in which unifying theories, far from making reality easier to understand, actually tend to produce the opposite effect. These theories may ascribe to the world features that fly in the face of intuition and common sense, and the result is that the world, in a very important sense, has become more difficult for us to comprehend. But there are more significant issues involved. Characterizing the outcome of theoretical change in terms of relative difficulty seems to assume that before the unifying theory became available people already had an explanatory account of the phenomena, and that the only difference is that explanation (or understanding) was more effort-involving prior to the adoption of the U -theory. 
In many important cases, though, the situation seems rather different, in that the outcome concerns the quality of the understanding. For instance, once a more unified, simpler account has been reached, the greater complexity of previous accounts (disconnected and less uniform) may suggest that the phenomena had not really been previously understood, and that the new theoretical perspective has rendered them intelligible. Thus, saying that the theory made it easier for people to understand the phenomena would be a bit like saying that the invention of the telephone made it easier for people to speak to each other across the Atlantic, or, closer to our topic, that the availability of the microscope and the telescope afforded people an easier way to observe microorganisms and celestial bodies, respectively. But these would be misleading ways of describing the significance of these developments. In the case of the telescope the advantages conferred had to do with improved accuracy and range of visibility, rather than ease or convenience, and in the case of the microscope, they involved a whole realm becoming visible for the first time.
48 See Hempel (1966).
It is possible that the notions of "easy to use" and "easy to understand" can be made inclusive enough to cover the problematic examples. But this would lead, one can surmise, to either of two possible outcomes. In the best case, after being thus stretched, the notion would become so hazy that it would no longer live up to its promise of furnishing the clearest account of simplicity. In the worst case, the notion would be spread so thin that it would be too insubstantial to support any meaningful account. But perhaps the greatest disadvantage of focusing on ease-notions in the discussion of simplicity is that, in so doing, we risk losing sight of crucial matters.
Among these are the extent to which simplicity is central in scientific description and explanation, and the fact that, rather than being a desirable but optional feature, it is often an inherent (even if often implicit) goal of rational inquiry. These considerations have great repercussions. For instance, Ludlow (159, footnote 4) claims that his instrumentalism about simplicity does not entail instrumentalism about scientific theories in general, and that his view is consistent with scientific realism. Nevertheless, and in light of the deep, constitutive connections between simplicity and other fundamental properties of scientific theories, it is difficult to see how instrumentalism about simplicity can be kept isolated from instrumentalism about the goals of science in general. Such separation only seems plausible when we see simplicity as orthogonal to explanatoriness, as a "supra empirical" property that is only invoked to break ties between otherwise equally successful theories. In any case, retreating into instrumentalism wouldn't be an option, either. Scientific research, even when undertaken in a descriptive or taxonomic spirit, or when its nature is construed in an instrumentalist fashion, will always have generalization and systematization as inherent goals. But generalizing and systematizing involve all sorts of operations and outcomes that would count, on any reasonable point of view, as objective simplifications, even if it's only data compression that is at issue. So, even the Machian instrumentalist will have to acknowledge that some theory-external, objective notion of simplicity is at the heart of scientific description and explanation. However, unlike the Machian, we argued that the motivation for such simplicity isn't always the reduction of effort, since it often aims at furthering description and explanation. 
Neglect of these aspects is particularly serious when it comes to generative grammar, whose development has been largely driven by simplicity, encompassing distinct, though not unconnected, S-properties.
6 Conclusion
There is no doubt that the minimization of effort has always played an important role in the practice of science, as it has done in any other complex human endeavor. It is also true that occurrences in scientific documents of 'simple' and its cognates often allude to 'ease of use' (as in "to simplify, we will . . . "). Nevertheless, it is clear that, pace Ludlow, there are genuine notions of simplicity apart from the notion of "simple for us to use", which are not pursued for the sake of reducing labor. In this paper I focused on two cases involving such notions. The first consisted of simplicity appeals that carry substantive import, regardless of whether the simplicity-property is directly imputed to entities in the theory's domain or whether the substantive implications arise indirectly, through the simplicity of theories or hypotheses. I also mentioned other types of simplicity ascriptions, which involve other S-properties and may be local in scope, but which are also substantive in their import, such as optimality, minimality, least effort and symmetry, among others. Among the examples of this kind of simplicity appeal we found the ascription (and pursuit) of uniformity to entities in a domain and the use of parsimony, as the absence of superfluity, in the delineation of causal factors or processes. The second relevant kind of use was that in which simplicity plays a distinctively epistemic role that bears on crucial aspects of the activity and products of theorization. We saw, for instance, that simplicity was of a piece with systematization, in general, and with scientific theory building, in particular, inasmuch as the former is a defining trait of the latter.
It was also noted that simplicity was a requirement for our understanding of phenomena, by making previously opaque phenomena intelligible or by being an essential factor in the search for deeper explanations and more unified accounts. Furthermore, we also saw that none of these kinds of feature is either reducible to ease of use or necessarily motivated by considerations of reduction of labor.49 Likewise, it is not always the case that simplicity is "in the eye of the researcher", in any special or interesting way. These inadequacies are compounded by the fact that many of those S-properties that do not fit comfortably in Ludlow's scheme, such as uniformity, (non-practical) economy and cohesion, have always played a key role in the development of generative grammar, the discipline that Ludlow seeks to characterize. Moreover, it seems that skepticism about the simplicity-based patterns of reasoning that linguists frequently employ must entail a corresponding skepticism about the conclusions drawn with the aid of these arguments, that is, about the theories and results that constitute linguistic theory in the generative-transformational tradition.50
49 There are many views and topics related to simplicity that we didn't discuss here because they were not immediately relevant to our concerns. Among the excluded views are some which assign a global methodological import to simplicity, such as the views that it plays a fundamental part in induction (see Kemeny 1953, Harman 1999), increases prior probability (Jeffreys 1957, Hesse 1974), or promotes testability (Popper 1959). We also left out proposed simplicity metrics (such as those involving the number of free or adjustable parameters in a hypothesis), and we didn't use curve-fitting as a central illustration of the role of simplicity in hypothesis testing and formulation, as many discussions of simplicity do. I have also discussed uniformity not as a possible precondition for the practice of induction (as in Newton (1687/1999), Principles 2 and 3, and Hume (1740/2000), book 1 part 3, sect 6 and Abstract, for instance), but only as a desirable outcome of theorization.
50 I would like to thank Howard Lasnik and Paul Pietroski for discussion, and Howard Lasnik for reading parts of the manuscript. I would also like to thank Matt Haber, Melinda Fagan, Ewan Dunbar and Shannon Barrios, for their help with various parts of this paper.
References
Baker, Alan. 2011. "Simplicity." In Edward N. Zalta (ed.), The Stanford Encyclopedia of Philosophy. Summer 2011 edition.
Baker, Mark. 2001. The Atoms of Language. New York: Basic Books.
Barnes, Eric. 2000. "Ockham's razor and the anti-superfluity principle." Erkenntnis 53:353–374.
Berwick, Robert. 1985. The Acquisition of Syntactic Knowledge. MIT Press.
Berwick, Robert C., Pietroski, Paul, Yankama, Beracah, and Chomsky, Noam. 2011. "Poverty of the Stimulus Revisited." Cognitive Science 35.
Boeckx, Cedric. 2006. Linguistic Minimalism: Origins, Concepts, Methods and Aims. Oxford: Oxford University Press.
Bunge, Mario. 1961. "The weight of simplicity in the construction and assaying of scientific theories." Philosophy of Science 28:120–149.
-. 1962. "The complexity of simplicity." The Journal of Philosophy 59:113–135.
Chater, Nick and Vitányi, Paul. 2003. "Simplicity: A unifying principle in cognitive science?" Trends in Cognitive Sciences 7:19–22.
Chomsky, Noam. 1951. The Morphophonemics of Modern Hebrew. Master's thesis, University of Pennsylvania, New York.
-. 1957. Syntactic structures. The Hague: Walter de Gruyter.
-. 1964. "The Logical Basis of Linguistic Theory." In Proceedings of the ninth international congress of linguists, 914–978. The Hague: Mouton and Co.
-. 1965. Aspects of the Theory of Syntax. Cambridge, MA: The MIT Press.
-. 1970. "Remarks on Nominalization." In Peter S. Rosenbaum (ed.), Readings in English Transformational Grammar, 184–221. Georgetown Univ School of Language.
-. 1972. Studies on Semantics in Generative Grammar. The Hague: Mouton.
-. 1973. "Conditions on Transformations." In A Festschrift for Morris Halle, 232–286. New York: Holt, Rinehart & Winston.
-. 1975/1955. The Logical Structure of Linguistic Theory. New York: Plenum Press.
-. 1976. Reflections On Language. Temple Smith.
-. 1980. Rules and Representations. London: Blackwell.
-. 1981. Lectures on Government and Binding. Dordrecht: Foris.
-. 1982. Some Concepts and Consequences of the Theory of Government and Binding. Cambridge, MA: The MIT Press.
-. 1986. Knowledge of Language. London: Praeger.
-. 1995. The Minimalist Program. Cambridge, MA: The MIT Press.
-. 1998. "Some Observations on Economy in Generative Grammar." In Pilar Barbosa (ed.), Is the best good enough?: Optimality and competition in syntax, 115–127. Cambridge, MA: MIT Press.
-. 2002. On Nature and Language. Cambridge: Cambridge University Press.
-. 2004. The Generative Enterprise Revisited: Discussions with Riny Huybregts, Henk van Riemsdijk, Naoki Fukui and Mihoko Zushi. Walter de Gruyter.
-. 2005. "Three Factors in Language Design." Linguistic Inquiry 36:1–22.
-. 2012. The Science of Language: Interviews with James McGilvray. Cambridge: Cambridge University Press.
Chomsky, Noam and Halle, Morris. 1968. The Sound Pattern of English. Harper and Row.
Chomsky, Noam and Lasnik, Howard. 1995. "The Theory of Principles and Parameters." In Noam Chomsky (ed.), The Minimalist Program. Cambridge, MA: The MIT Press.
Crain, Stephen, Gualmini, Andrea, and Pietroski, Paul M. 2005. "Brass Tacks in Linguistic Theory: Innate Grammatical Principles." In The Innate Mind: Structure and Contents. New York: Oxford University Press.
Crain, Stephen and Pietroski, Paul. 2006. "Is Generative Grammar deceptively simple or simply deceptive?" Lingua 116:64–68.
Culicover, Peter and Jackendoff, Ray. 2005. Simpler Syntax. Oxford: Oxford University Press.
-. 2006. "The simpler syntax hypothesis." Trends in Cognitive Sciences 10:413–418.
Friedman, Michael. 1974. "Explanation and Scientific Understanding." Journal of Philosophy 71:5–19.
Goodman, Nelson. 1943. "On the simplicity of ideas." The Journal of Symbolic Logic 8:107–121.
-. 1951. The Structure of Appearance. Bobbs-Merrill.
-. 1955. "Axiomatic measurement of simplicity." The Journal of Philosophy 709–722.
-. 1958. "The Test of Simplicity." Science 128:1064–1069.
Guasti, Maria Teresa. 2004. Language Acquisition: The Growth of Grammar. Cambridge, MA: The MIT Press.
Haber, Matt. 2008. "Phylogenetic inference." In Aviezer Tucker (ed.), A Companion to the Philosophy of History and Historiography, 231–242. Oxford, UK: Wiley-Blackwell.
Haegeman, Liliane. 1994. Introduction to Government and Binding Theory. Oxford: Blackwell, 2nd edition.
Halle, Morris. 1961. "On the Role of Simplicity in Linguistic Descriptions." In Roman Jakobson (ed.), Structure of Language and its Mathematical Aspects, 89–94. American Mathematical Society.
Harman, Gilbert. 1999. "Simplicity as a Pragmatic Criterion for Deciding what Hypotheses to Take Seriously." In Gilbert Harman (ed.), Reasoning, Meaning and Mind, 75–92. Oxford: Oxford University Press.
Harris, Zellig S. 1951. Methods in structural linguistics. Chicago, IL: University of Chicago Press.
Hempel, Carl Gustav. 1966. Philosophy of natural science. Prentice-Hall Englewood Cliffs, NJ.
Hesse, Mary. 1967. "Simplicity." In P. Edwards (ed.), The Encyclopedia of Philosophy, Vol. 7, 445–448. New York: Macmillan.
Hesse, Mary B. 1974. The structure of scientific inference. University of California Press.
Hornstein, Norbert. 2001. Move! A Minimalist Theory of Construal. Wiley-Blackwell.
Hornstein, Norbert and Lightfoot, David. 1981. "Introduction." In Norbert Hornstein and David Lightfoot (eds.), Explanation in Linguistics, 9–31. New York: Longman.
Hornstein, Norbert, Nunes, Jairo, and Grohmann, Kleanthes K. 2005. Understanding Minimalism. Cambridge: Cambridge University Press.
Hume, David. 1740/2000. A Treatise of Human Nature. Oxford: Oxford University Press.
Jackendoff, Ray. 1977. X-Bar Syntax: A Study of Phrase Structure. Cambridge, MA: MIT Press.
Jeffreys, Harold. 1957. Scientific Inference. Cambridge: Cambridge University Press.
Jones, Todd. 2008. "Unification." In Stathis Psillos and Martin Curd (eds.), The Routledge Companion to Philosophy of Science, 489–497. London: Routledge.
Kemeny, John G. 1953. "The use of simplicity in induction." The Philosophical Review 391–408.
Kitcher, Philip. 1981. "Explanatory Unification." Philosophy of Science 48:507–531.
Lasnik, Howard. 2000. Syntactic Structures Revisited. Cambridge, MA: The MIT Press.
Lasnik, Howard and Uriagereka, Juan. 1988. A course in GB syntax: Lectures on binding and empty categories. Cambridge, MA: The MIT Press.
Lindsay, Robert. 1937. "The meaning of simplicity in physics." Philosophy of Science 4:151–167.
Lipton, Peter. 1991. Inference to the Best Explanation. London: Routledge.
Ludlow, Peter. 2011. The Philosophy of Generative Linguistics. Oxford: Oxford University Press.
Mach, Ernst. 1898. Popular scientific lectures. Open Court Publishing Company.
-. 1907. The science of mechanics: A critical and historical account of its development. Open Court Publishing Company.
McShea, Daniel W and Brandon, Robert N. 2010. Biology's first law: the tendency for diversity and complexity to increase in evolutionary systems. University of Chicago Press.
Newmeyer, Frederick J. 2004. "Against a parameter-setting approach to typological variation." Linguistic Variation Yearbook 4.
Newton, Isaac. 1687/1999. Mathematical Principles of Natural Philosophy (Principia Mathematica). Berkeley, California: University of California Press.
Piattelli-Palmarini, Massimo and Uriagereka, Juan. 2004. "The Immune Syntax: The Evolution of the Language Virus." In L. Jenkins (ed.), Variation and Universals in Biolinguistics. Oxford: Elsevier.
Popper, Karl. 1959. The logic of scientific discovery. New York: Basic Books.
Postal, Paul. 1972. "The Best Theory." In Stanley Peters (ed.), Goals of linguistic theory, 131–70. Prentice Hall.
Quine, W. V. 1966. The Ways of Paradox and Other Essays. New York, NY: Random House.
Roberts, Ian and Holmberg, Anders. 2005. "On the role of parameters in Universal Grammar: A reply to Newmeyer." In Hans Broekhuis (ed.), Organizing Grammar: Linguistic Studies in Honor of Henk van Riemsdijk, 538–553. Berlin: Mouton de Gruyter.
Rudner, Richard. 1961. "An introduction to simplicity." Philosophy of Science 28:109–119.
Schulz, Daniel. 2012. Simplicity in science. Ph.D. thesis, University of Iowa.
Sober, Elliott. 1988. Reconstructing the past: parsimony, evolution, and inference. Cambridge, MA: The MIT Press.
Van der Helm, Peter A. 2014. Simplicity in vision: A multidisciplinary account of perceptual organization. Cambridge University Press.
Weinberg, Steven. 1992. Dreams of a final theory: The scientist's search for the ultimate laws of nature. New York, NY: Random House Digital, Inc.
Whewell, William. 1840/1984. "History of the Inductive Sciences." In Yehuda Elkana (ed.), Selected Writings on the History of Science. University of Chicago Press.
-. 1858. Novum Organon Renovatum. JW Parker and son.
Wrinch, Dorothy and Jeffreys, Harold. 1921. "On certain fundamental principles of scientific inquiry." The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 42:369–390.