Choosing Short: An Explanation of the Similarities and Dissimilarities in the Distribution Patterns of Binding and Covaluation

DRAFT

Covaluation is the generalization of coreference introduced by Tanya Reinhart. Covaluation distributes in patterns that are very similar, yet not entirely identical, to those of binding. On a widespread view, covaluation and binding distribute similarly because binding is defined in terms of covaluation. Yet on Reinhart's view, binding and covaluation are not related that way: binding pertains to syntax, covaluation does not. Naturally, the widespread view can easily explain the similarities between binding and covaluation, whereas Reinhart can easily explain the dissimilarities. Conversely, the widespread view finds it harder to explain the dissimilarities, whereas Reinhart finds it harder to explain the similarities. Reinhart and others have proposed more than one explanation of the similarities, but as I argue, these explanations do not work. Hence, although I adopt Reinhart's view, I propose a new explanation of the similarities and dissimilarities between binding and covaluation: while Reinhart has invoked semantic structure only to explain dissimilarities, I do so to explain both similarities and dissimilarities at once. Finally, I examine in light of this approach the topics of language acquisition, only-constructions, the identity predicate, the Partee/Bach/Higginbotham problem, and the Dahl puzzle together with its recent versions by Roelofsen.

1 Introduction

It often seems that we must distinguish between binding and covaluation. Take, for instance, (1):

(1) John thinks he is a great swimmer, and so does Jack. (where he refers to John) [1]

We can read (1) in two ways. On reading (2a), John thinks himself a great swimmer and Jack thinks himself a great swimmer; on reading (2b), both John and Jack think John a great swimmer.
Hence in (2a) he seems to be bound, whereas in (2b) it seems to pick up reference freely from the context: [2]

(2) (a) (Binding) [ John 1 [ t1 thinks he1 is a great swimmer ] ∧ Jack 2 [ t2 thinks he2 is a great swimmer ] ] [3]
    (b) (Covaluation) [ John 1 [ t1 thinks he3 is a great swimmer ] ∧ Jack 2 [ t2 thinks he3 is a great swimmer ] ], where he3 refers to John.

[1] For further examples, see Gareth Evans (?:356–357).
[2] This is not uncontroversial; see, e.g., Fiengo and May (?:129–189). Nevertheless, it doesn't matter for now whether we can render the two readings of (1) as (2a) and (2b). All I am aiming at is to illustrate the distinction between binding and covaluation, and here we are served quite well by the difference between (2a) and (2b) themselves.
[3] I borrow the notational conventions for logical form from Irene Heim and Angelika Kratzer, ?.

Here, in short, is how binding and covaluation differ. Binding occurs when two DPs are coindexed and one c-commands the other. [4] When this happens, the bound DP must (arguably) take the same semantic value as its binder. [5] Covaluation, on the other hand, occurs when two DPs receive the same semantic value although neither binds the other (?:301). A special case of covaluation is coreference, which obtains between DPs that refer to the same entity. Covaluation, however, can also involve non-referential expressions such as quantifier traces and even bound pronouns (although, by definition, the latter must be bound by something other than the DPs they are covalued with).

Distinct as they may be, binding and (intended) covaluation distribute in remarkably similar ways. For one thing, we are usually unable to read pronouns as covalued with c-commanding DPs that are prohibited by Condition B from binding them:

(3) * Lucy saw her. (where her refers to Lucy)

And for another thing, we are usually unable to read R-expressions as covalued with c-commanding DPs, and this mirrors binding Condition C:

(4) * She saw Lucy.
(where she refers to Lucy)

[4] This is the concept of binding at S-structure, stemming from ?; a version of this concept, called 'syntactic binding,' is found in ?:260–62. A more recent development identifies a distinct but similar and closely allied type of binding at LF, often called 'semantic binding' (?:115–23; ?:300). Note that whereas Reinhart proposes to replace the old concept with the new one, Heim and Kratzer consider them complementary.
[5] We are not concerned with whether this generalization is exceptionless; rather, the only kind of binding we are interested in is the kind that does necessitate sameness of semantic value.

Let us call these distribution similarities Convergence:

(Convergence) When binding is disallowed by Binding Theory, covaluation is usually also disallowed.

Notwithstanding Convergence, it is well known that covaluation, even when intended, may stray from the distribution patterns of binding. Take for instance (5), where the R-expression Jane may be covalued with this, although it may not be bound by it:

(5) (Introducing Jane) This is Jane.

Let us call the distribution dissimilarities between binding and covaluation Divergence:

(Divergence) When binding is disallowed by Binding Theory, covaluation is sometimes nevertheless allowed.

It is a challenging problem to explain both Convergence and Divergence at once. And although linguists and philosophers have tried for a few decades, we still have no complete solution to this problem, which is essential to the study of anaphora. But while we have no complete solution, we do have two promising approaches: I will call them the Prevailing View [6] and (Tanya) Reinhart's Thesis.

On the Prevailing View, binding and covaluation are facets of one and the same phenomenon; indeed, the former is defined in terms of the latter. (More exactly, covaluation is modeled by coindexation, in terms of which binding is defined.)
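The parenthetical definition above (covaluation modeled by coindexation, binding defined as coindexation plus c-command) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Node` class, the toy trees, the bracketing of the examples, and in particular the simplified notion of c-command (the node immediately above the binder must dominate the bindee). It is not a rendering of any particular version of Binding Theory.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """A node in a toy constituency tree (hypothetical, for illustration)."""
    label: str
    index: Optional[int] = None            # referential index, if the node is a DP
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def __post_init__(self) -> None:
        # Record the parent link so that dominance can be computed upward.
        for child in self.children:
            child.parent = self


def dominates(a: Node, b: Node) -> bool:
    """True if b is a proper descendant of a."""
    node = b.parent
    while node is not None:
        if node is a:
            return True
        node = node.parent
    return False


def c_commands(a: Node, b: Node) -> bool:
    """Simplified c-command (an assumption): neither node dominates the
    other, and the node immediately above a dominates b."""
    if a is b or dominates(a, b) or dominates(b, a):
        return False
    return a.parent is not None and dominates(a.parent, b)


def coindexed(a: Node, b: Node) -> bool:
    """Coindexation, which models covaluation on the Prevailing View."""
    return a.index is not None and a.index == b.index


def binds(a: Node, b: Node) -> bool:
    """Binding on the Prevailing View: coindexation plus c-command."""
    return coindexed(a, b) and c_commands(a, b)


# "Lucy saw her", with both DPs carrying index 1.
lucy = Node("DP Lucy", index=1)
her = Node("DP her", index=1)
sentence = Node("S", children=[lucy, Node("VP", children=[Node("V saw"), her])])

# "Lucy's mother saw her": Lucy is embedded inside the subject DP.
lucy2 = Node("DP Lucy", index=1)
her2 = Node("DP her", index=1)
subject = Node("DP", index=2, children=[lucy2, Node("NP mother")])
sentence2 = Node("S", children=[subject, Node("VP", children=[Node("V saw"), her2])])

print(binds(lucy, her))      # coindexed, and Lucy c-commands her
print(binds(lucy2, her2))    # coindexed, but Lucy does not c-command her
```

In the first tree the coindexed subject c-commands the pronoun, so the sketch counts the pronoun as bound; in the second the two DPs are still coindexed but there is no c-command, so on this way of modeling things they are covalued without binding.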
The Prevailing View has its classic expression in Noam Chomsky's Government and Binding theory (reprinted in ?). For a recent version of the Prevailing View, see Irene Heim's ?.

[6] The Prevailing View needn't prevail at present; I call it so for historical reasons.

Contrary to the Prevailing View, Reinhart's Thesis maintains that binding and covaluation are independent phenomena: binding pertains to syntax, covaluation does not. Naturally, the Prevailing View can explain Convergence and Reinhart's Thesis can explain Divergence; and conversely, the Prevailing View finds it harder to explain Divergence and Reinhart's Thesis finds it harder to explain Convergence.

Tanya Reinhart herself has proposed three explanations of Convergence: the most recent in ??; the oldest in ??; and the third, together with Yosef Grodzinsky, in ?. Nevertheless, I will argue in Section 2 that these explanations do not work.

Yet although these proposed explanations do not work, I will argue in Section 3 that we can still adopt Reinhart's Thesis to explain Convergence and Divergence. Toward this new explanation I will exploit the difference in the semantic structures of sentences such as (6a) and (6b):

(6) (a) [ Jane 1 [ t1 saw herself1 in the mirror ]]
    (b) [ Jane 1 [ t1 saw her2 in the mirror ]] (where her2 = Jane)

By a sentence's 'semantic structure' I mean the way the sentence's content is compositionally determined. Let me emphasize that when I talk of semantic structures I do not mean to talk of structured propositions. Indeed, I am entirely neutral as to whether semantic contents consist in structured propositions or in unstructured ones. By semantic structures I have in mind not the structures of the semantic contents of sentences, but rather the structures of the way these contents are determined. Let us call semantic structures such as (6a)'s and (6b)'s 'short' and 'long,' respectively.
Not that semantic structures have lengths; I choose these words merely because I find them concise and intuitive.

Before I define short and long structures, let me make two preliminary remarks. First, no structure is simply short or simply long; rather, a structure is shorter than some and longer than others. Second, we cannot compare the shortness of any two arbitrary structures; we can only compare structures such as (6a)'s and (6b)'s, which involve the same functions (λ-predicates) with the same arguments.

Take, now, two such semantic structures A and B. Assume that one and the same entity (e.g., Jane) plays two different argument roles in each of A and B. Then we say that A is shorter than B if and only if A carries the information that it represents the same entity twice, whereas B does not.

Notice, for instance, that the semantic structure of (6a) carries the information that Jane and herself1 take the same value. [7] In contrast, the structure of (6b) does not carry the information that Jane and her2 corefer. Hence we say that (6a)'s structure is shorter than (6b)'s.

[7] (6a) carries this information by means of binding; we shall find cases, however, where the information is carried in less direct ways.

Let me clarify the sense in which short semantic structures 'carry' such information. I am not claiming that short structures determine propositions that contain the information; in other words, I am not claiming that sentences with short structures semantically encode the information. Instead, I am claiming that if we grasp the short structures then ipso facto we possess the information. E.g., we cannot grasp the semantic structure of (6a) unless we understand in the same act that Jane and herself1 corefer. On the other hand, we can grasp the semantic structure of (6b) without implicitly understanding that Jane and her2 corefer. Indeed, we can even ignore this information altogether, and we do ignore it in Frege cases.
It is not accidental, therefore, that neo-Russellian philosophers have given great attention to sentences with long semantic structures, seeking in them the resolution to Frege's puzzle (??????). Since neo-Russellians maintain that coreferring names have the same semantic values, they must seek the solution not in the semantic values themselves but rather in the ways these values are derived. While neo-Russellians have been using long structures to illuminate the theory of belief, i.e., while they have been using language to illuminate cognition, I will turn things around and use cognition to illuminate language.

Linguists have long recognized the importance of short and long semantic structures, in connection with well-known cases of Divergence originating with Gareth Evans (?:356). According to Reinhart, these Divergence cases occur "when structured meaning matters" (I borrow the phrase from Heim ?:216). Unlike Reinhart, however, I will refrain from invoking structured meaning; I will nevertheless invoke semantic structure. And while Reinhart only uses structure to explain Divergence, I will argue that semantic structure always matters and that we can use it to explain both Divergence and Convergence at once.

2 Extant Explanations of the Convergence of Binding and Covaluation

If we adopt Reinhart's Thesis, as I propose, then we need to explain Convergence. Reinhart herself has proposed to do so thrice: once in ? and in ?, once together with Yosef Grodzinsky in ?, and once in ?/?. [8] As I will argue, however, these explanations of Convergence do not seem to work.

2.1 Reinhart 2006

Let us first examine Reinhart's most recent explanation. In ? and ?, Reinhart proposes to explain Convergence by invoking a principle she calls "minimize interpretative options," henceforth MIO (?:101–105, 181–86).
Reinhart motivates this principle by arguing that we can use it to (partially) explain the possibility of communication:

(MIO) An interpretation is blocked "if it is indistinguishable from an interpretation ruled out by principles of the [computational system]" (?:186). [9]

Here is the motivation behind MIO:

    It is easy to see why such a principle could be useful . . . . The problem . . . is how to minimize the set of possible interpretations of a given PF. The more options there are, the more mysterious is the fact that speakers manage to understand each other. In the specific case of anaphora resolution, the problem is how to restrict the set of potential antecedents for a given pronoun . . . . If the computational system provides a restriction of that set, it is not cooperative for users to overrule that . . . . (?:185)

[8] Daniel Büring (?) and Floris Roelofsen (?) have proposed further recent implementations of Reinhart's Thesis. In these papers, however, they are concerned to cover and systematize the data, rather than to explain Convergence.
[9] Eric Reuland proposes a closely related principle in ?.

I will argue that the motivation behind MIO makes faulty predictions. And since it is this very motivation that carries the explanatory load, Reinhart's ?/? explanation of Convergence is unsatisfactory.

First, let us review the explanation. Reinhart distils Convergence into the following version of Rule I:

    Rule I . . . α and β cannot be covalued in a derivation D, if
    a. α is in a configuration to A-bind β, and
    b. α cannot A-bind β in D, and
    c. The covaluation interpretation is indistinguishable from what would be obtained if α A-binds β. (?:185)

Here is how Reinhart uses principle MIO to explain Rule I and thereby Convergence. Take (7a):

(7) (a) * Elmer tricked him.
    (b) * Elmer 1 [ t1 tricked him1 ] (bound reading, i.e., Elmer tricked himself)
    (c) * Elmer 1 [ t1 tricked him2 ] (covalued reading: him2 refers to Elmer)

Notice that Clause b of Rule I obtains, i.e., Elmer cannot A-bind him. Let us assume, furthermore, that Clause c also obtains, i.e., let us assume that we get indistinguishable interpretations regardless of whether we interpret him as bound (as in 7b) or as covalued with Elmer (as in 7c). We now have the premises that (i) the covalued interpretation is indistinguishable from the bound one (Clause c), and (ii) the bound interpretation is ruled out by syntax (Clause b). If we now apply the principle MIO to Clauses b and c, we can conclude that the covalued interpretation is blocked, i.e., that him cannot be covalued with Elmer. And indeed, this is the same conclusion as Rule I's verdict on (7a). Hence Reinhart concludes that MIO explains Rule I and therefore Convergence.

Having seen how Reinhart uses MIO to explain Convergence, let me argue that the motivation behind MIO is faulty. The motivation is, in short, that we need MIO to explain the possibility of communication: "The problem . . . is how to minimize the set of possible interpretations of a given PF. The more options there are, the more mysterious is the fact that speakers manage to understand each other" (?:185). But if so, then consider:

(8) (a) Eustachio thinks he can do it.
    (b) Eustachio thinks Eustachio can do it.

Suppose there is in the context only one salient referent for Eustachio, that is, only one person so named. Moreover, suppose there are in the context ten salient candidate referents for he. Notice, then, that in order to minimize interpretative options, the speaker should choose to utter not (8a) but (8b). This is because (8b) has only one interpretation available in the context, whereas (8a) has ten. Moreover, let us now imagine that the speaker did utter (8b). If so, then the hearer would be puzzled and would perhaps try to update the context by thinking of a second Eustachio. But this is not what we predict from Reinhart's reasoning.
According to the reasoning, we should rather expect the hearer to interpret the two occurrences of Eustachio as coreferential, because this would be the best way to minimize interpretative options. Nevertheless, neither speaker nor hearer behaves as we would expect on this line of reasoning.

2.2 Grodzinsky and Reinhart 1993

In ?, Grodzinsky and Reinhart proposed to explain Convergence by assuming that speakers and hearers prefer binding to covaluation on grounds of economy. Later, however, Reinhart herself convincingly rejected the economy explanation (?:183–184, 211–212).

Here is why Reinhart rejected the economy explanation: if binding is more economical than covaluation and if this causes us to prefer binding, then we ought to always prefer binding. Nevertheless, "both binding and coreference are possible when binding is permitted" (?:212). We know this because we can give 'strict' readings to elliptical sentences like (9a):

(9) (a) Gala loves her guitar, and so does Harriet.
    (b) ('sloppy') Gala 1 [ t1 loves her1 guitar ] ∧ Harriet 2 [ t2 loves her2 guitar ]
    (c) ('strict') Gala 1 [ t1 loves her3 guitar ] ∧ Harriet 2 [ t2 loves her3 guitar ] (her3 = Gala)

If we had a preference for binding based on economy, then we should prefer the binding interpretation of (9a), and this should make the strict reading unavailable. The strict reading, however, is available. This seems to refute explanations based on an economy-driven preference for binding.

Note, finally, that Reinhart is assuming a certain mainstream view of ellipsis. In our terms, this view has it that we can only elide material with the same semantic structure as its antecedent. I am not relying on any view of ellipsis for most of this paper. Nevertheless, for this subsection and for subsection 4.4.3, I will assume the mainstream view.

2.3 Reinhart 1983

Finally, let us examine Reinhart's ?/? explanation of Convergence, which is a modified version of earlier accounts by David Dowty (?)
and Elisabet Engdahl (?). According to Reinhart as well as to the earlier accounts, when a speaker eschews binding the speaker pragmatically conveys that he or she does not intend covaluation either. [10] Dowty and Engdahl had derived this from the maxim 'be unambiguous'; they had maintained that binding achieves anaphora unambiguously, hence that if the speaker had intended anaphora then he or she would have chosen binding. To this, Reinhart objects in ? that binding is not unambiguous except for R-pronouns:

(10) (a) She saw herself in the mirror. (R-pronoun, binding unambiguous)
     (b) She thought she was in London. (non-R-pronoun, binding ambiguous)

[10] Although on this view the speaker pragmatically conveys how to interpret the DPs, and although the speaker does so by Grice-like mechanisms, it is important to note that this can't be a matter of conversational implicature. When hearers compute the conversational implicatures of an utterance, they proceed from the premise of 'what is said,' i.e., from the utterance's semantic content. This means that hearers only compute an utterance's implicatures after they determine its semantic content; therefore, hearers cannot depend on implicatures to compute the semantic content. But notice that, at the stage at which (17)'s hearer decides whether to interpret him as covalued with Bugs, the hearer is still involved in resolution and therefore in computing semantic content. Hence at this stage conversational implicature is out of place.

Reinhart instead explains the phenomenon not by the maxim 'be unambiguous' but by the maxim 'be explicit' (?:75–76). Unfortunately, Reinhart ultimately faces the same issue as Dowty and Engdahl: whenever we bind a non-R-pronoun, as in (10b), we could have uttered the self-same explicit words yet left the pronoun free. Hence just as binding is only unambiguous for R-pronouns (10a), that is also the only place where it is explicit.

3 Choosing Short: A Proposed Explanation of Convergence

3.1 Explaining Convergence

To adopt Reinhart's Thesis we must explain Convergence. Yet I have argued that extant explanations do not work. We can therefore only adopt Reinhart's Thesis if we find a new explanation of Convergence. To this end, consider again (6a) and (6b):

(6a) [ Jane 1 [ t1 saw herself1 in the mirror ]]
(6b) [ Jane 1 [ t1 saw her2 in the mirror ]] (her2 = Jane)

I have called the semantic structures of (6a) and (6b) short and long, respectively. Notice that although (6a) and (6b) have different semantic structures, they have the same truth conditions. According to Reinhart, truth-conditionally equivalent semantic structures are usually indistinguishable; more specifically, semantic structure only matters in cases of Divergence. Yet I maintain that if, instead, we adopt the view that semantic structure always matters, then we can use the distinction between short and long structures to explain not just Divergence but Convergence too.

Let me argue that we can explain Convergence if we adopt the principle Choose Short:

(Choose Short) When a cooperative speaker chooses whether to express a short semantic structure or a corresponding long one, the speaker will default to the short structure unless the long one is favored by contextually overriding purposes. [11]

When I say that speakers choose short semantic structures I am not implying that they do so consciously. Rather, they choose these structures unconsciously, as is the case with many other Grice-like phenomena.

Here is why we can explain Convergence if we adopt Choose Short. Suppose that a speaker unambiguously fails to express a short semantic structure, e.g.:

(11) She thought Else was happy. (notice that Else cannot be bound)
(12) He looked at him.
(notice that him cannot be bound)

[11] By a 'corresponding' semantic structure I mean one that involves the same functions applied to the same arguments.

By principle Choose Short, when a cooperative speaker chooses whether to express a short structure or a truth-conditionally equivalent long one, the speaker will default to the short structure (unless the long structure is favored by contextually overriding purposes). Hence if the speaker unambiguously fails to express a short structure, as in (11) or in (12), then the speaker pragmatically conveys that he or she did not express the truth-conditionally equivalent long structure either: had the speaker considered expressing the long structure, he or she would have defaulted to expressing the truth-conditionally equivalent short one, i.e.:

(13) She 1 [ t1 thought she1 was happy ]
(14) He 1 [ t1 looked at himself1 ]

(This, of course, is unless the speaker is visibly motivated to express the long structure by contextually overriding purposes.)

Furthermore, when speakers express short structures they use binding, whereas when they express long structures they use (unbound) covaluation:

(15) Bugs voted for himself. (short structure, binding)
(16) Bugs voted for Bugs. (long structure, covaluation)

Suppose, now, that Elmer utters this sentence:

(17) Bugs voted for him.

Notice that him cannot be bound; hence Elmer unambiguously eschews binding. Thus, Elmer unambiguously fails to express the short structure that he could have expressed by (15). Therefore, given principle Choose Short, Elmer pragmatically conveys that he did not express the truth-conditionally equivalent long structure either (unless he is visibly motivated by contextually overriding purposes). But if Elmer did not express this long structure, then he did not intend Bugs and him to be covalued. Therefore, when Elmer unambiguously eschews binding, he pragmatically conveys that he does not intend covaluation either.
This explains why covaluation is so often unavailable when binding is prohibited; that is, this explains Convergence.

Notice that, like Reinhart in ?/?, I explain Convergence by pragmatic inferences. Yet unlike Reinhart, I derive these inferences not from the maxim 'be explicit,' which we have seen in subsection 2.3 to be insufficient, but rather from the principle Choose Short.

If we are going to invoke pragmatic inferences, then we must acknowledge an objection presented recently by Pauline Jacobson (?:216–17). Here is the objection: if (17)'s hearer determines by pragmatic inference that Bugs and him do not corefer, then we expect this to be cancelable. According to Jacobson, this means that we should be able to say:

(18) Bugs voted for him, that is to say, for himself.

Yet we cannot felicitously say this, hence Jacobson concludes that (17)'s hearer does not work out non-coreference by pragmatic inference.

Yet I am not convinced by this objection. I agree that we cannot 'cancel' non-coreference by something like (18). But it seems that we can nevertheless cancel it by something like (19). Hence it seems that we shouldn't take Jacobson's worry as a crushing objection:

(19) Bugs voted for him. That is to say, Bugs voted for Bugs!

Notice, now, that just like Dowty (?) and Engdahl (?), I employ the concept of (non-)ambiguity. Yet unlike the two, I am not assuming that binding is always unambiguous: indeed, binding is not unambiguous for non-R-pronouns. Instead, I am relying on the more modest claim that speakers can eschew binding in some unambiguous ways. Compare:

(20) She said she was smart. (binding ambiguous)
(21) She said Mary was smart. (binding unambiguously eschewed)

Contrast the Dowty–Engdahl approach with the way we explain Convergence in case (21):

(Dowty–Engdahl-style explanation) If the speaker had intended she and Mary to be covalued, the speaker would have made this unambiguous by using a bound pronoun.
(But as we see in (10b), this explanation doesn't work.)

(Our explanation) If the speaker had intended she and Mary to be covalued, the speaker would not have unambiguously eschewed binding.

Let me now explain what is going on in simple Condition B and Condition C configurations. We distinguish three cases:

Case I. C-commanded R-expression [12]

(22) She said Carol was smart.

Notice, first, that R-expressions cannot be bound. This is because R-expressions are referential: they behave semantically like constants rather than variables, and therefore they cannot be evaluated freely as is needed for binding. (Notice, furthermore, that I am not invoking an independent Condition C. According to Condition C, R-expressions cannot be coindexed with certain other expressions. However, I am not claiming that R-expressions can or cannot be coindexed, but merely that they cannot be bound.)

Since R-expressions cannot be bound, the speaker of (22) unambiguously eschews binding. But the speaker would not have eschewed binding, had he or she intended she and Carol to be covalued. In that case, the speaker would have obeyed the principle Choose Short, i.e., the speaker would have expressed the truth-conditionally equivalent short structure thus:

(23) She said she was smart.
     She 1 [ t1 said she1 2 [ t2 was smart ] ]

Since the speaker did not do so, he or she pragmatically conveys that the two DPs are not covalued. This blocks the covalued reading and establishes Convergence for Case I.

[12] I am using 'R-expression' as an abbreviation for 'referential expression' and 'R-pronoun' as an abbreviation for 'reflexive or reciprocal pronoun.' Since this is standard practice, I hope it is not too confusing.

Case II. C-commanded non-R-pronoun with antecedent in the binding domain

(24) Dan saw him.

Since non-R-pronouns cannot be bound by antecedents in their binding domains, the speaker of (24) unambiguously eschews binding.
But the speaker would not have eschewed binding, had he or she intended Dan and him to be covalued. In that case, the speaker would have obeyed the principle Choose Short, i.e., the speaker would have expressed the truth-conditionally equivalent short structure thus:

(25) Dan saw himself.

Since the speaker did not do so, he or she pragmatically conveys that the two DPs are not covalued. This blocks the covalued reading and establishes Convergence for Case II.

Case III. C-commanded non-R-pronoun with antecedent outside the binding domain

(26) Greta thinks she is in Italy.

Since non-R-pronouns can be bound by antecedents outside their binding domains, the speaker of (26) does not unambiguously eschew binding. Therefore, the speaker does not pragmatically convey that the two DPs are not covalued. This is just as expected, since in this configuration covaluation is available.

3.2 Why Choose Short?

Thus far I have treated the principle Choose Short as a mere assumption; in this subsection I will try to make the principle plausible. I maintain that speakers and hearers alike prefer short structures because short structures are usually more informative than the truth-conditionally equivalent long ones. Recall that by definition a structure A is shorter than a structure B if both involve the same functions with the same arguments, yet A carries the information that it represents one and the same entity twice, whereas B does not (see Section 1). Since short structures carry this extra bit of information, we can plausibly expect both speakers and hearers to prefer them: if they can use short structures, then they have no reason to impair their cognition with clumsy, roundabout long ones.

This suggests two reasons why speakers will usually choose short structures over their long counterparts. For one thing, if speakers are cooperative, they will provide hearers with useful short structures instead of the less useful and potentially confusing long ones.
And for another thing, when speakers choose between short and long structures, they are likely to choose the ones that reflect their own propositional attitudes. And just as sentences have semantic structures, it is rather plausible that propositional attitudes have their own psychosemantic structures, which can themselves be short or long. Hence speakers will normally entertain propositional attitudes with short psychosemantic structures, attitudes they will express in sentences with equally short semantic structures.

Before I illustrate, let me clarify the sense in which short semantic structures 'carry' extra information. I am not claiming that short structures determine propositions that contain extra information; or in other words, I am not claiming that sentences with short structures semantically encode extra information. Rather more modestly, I am claiming that if we grasp the short structures then ipso facto we possess the extra information.

Let me now illustrate. Consider:

(27) (a) Felix thinks that he is a genius. (?:71)
     (b) Long: Felix 1 [ t1 thinks that he2 is a genius ] (he2 = Felix)
     (c) Short: Felix 1 [ t1 thinks that he1 is a genius ]

Notice, first, that neither of the two structures says that Felix believes de se that he is a genius. Nevertheless, Felix can only lack such a belief de se if he mistakes himself for somebody else, and while this is conceptually possible, it doesn't happen too often in real life. Hence if Felix thinks he is a genius, then it is extremely probable that he does believe de se that he is a genius. And notice that unlike the long structure, the short one makes it obvious that we are talking about one and the same person, hence it makes it obvious that Felix is very likely to believe himself de se a genius. And more often than not, we learn more about someone when we learn that he thinks himself (de se) a genius than when we learn that he thinks Felix a genius.
When people think themselves geniuses, they are likely to be self-infatuated, whereas when they think Felix a genius, they need not thereby show any character flaw. And even when it happens to be just Felix who thinks Felix a genius, we still learn more from the short structure directly: from the long structure, we must first derive the short one.

But perhaps one can object that there is also something that we learn directly from the long structure but not from the short one. For suppose that Felix is very far from genius. Then people who think Felix a genius show poor judgment; and if Felix thinks Felix a genius, then Felix, too, shows poor judgment. Hence we can learn directly from the long structure that Felix shows poor judgment. But notice that we can also learn this directly from the short one: the short and the long structure alike attribute to the ordered pair ⟨Felix, Felix⟩ the two-place relation of thinking someone a genius. Hence we can learn in the same number of steps, from the short and from the long structure alike, that Felix thinks Felix a genius. And since it is from here that we learn that Felix shows poor judgment, we can learn this just as well from the short structure as from the long one.

Having explained why I think cooperative, rational speakers will usually choose short, recall that we have in principle Choose Short one final clause: 'unless the long [structure] is favored by contextually overriding purposes.' Sometimes speakers and hearers pursue special purposes which they rank higher than keeping track of coreferring expressions, purposes they can only pursue if they choose long. In such special cases we should expect speakers to disobey Choose Short and hearers to disregard it. It is beyond the scope of our inquiry to give a general psychological characterization of such special purposes. Nevertheless, we shall see an example in the next subsection and others later.
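The definition of 'shorter than' recalled in this subsection, together with the principle Choose Short and its final clause, can be summarized in a small sketch. This is only an illustration under stated assumptions: a semantic structure is modeled (hypothetically) as a predicate name, a tuple of argument values, and a set of role pairs that the structure itself marks as identical; the names `Structure`, `shorter_than`, `choose_short` and `conveys_non_covaluation` are invented for the example, and the boolean `overriding_purpose` stands in for contextual purposes that the text deliberately leaves uncharacterized.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

RolePair = Tuple[int, int]


@dataclass(frozen=True)
class Structure:
    """Toy stand-in for a semantic structure (hypothetical, for illustration)."""
    predicate: str                              # the λ-predicate, e.g. "saw"
    arguments: Tuple[str, ...]                  # value filling each argument role
    linked: FrozenSet[RolePair] = frozenset()   # role pairs the structure itself marks as identical


def carries_identity(s: Structure) -> bool:
    """True if s itself marks two roles filled by the same entity as identical."""
    return any(s.arguments[i] == s.arguments[j] for (i, j) in s.linked)


def corresponding(a: Structure, b: Structure) -> bool:
    """Same functions applied to the same arguments."""
    return a.predicate == b.predicate and a.arguments == b.arguments


def shorter_than(a: Structure, b: Structure) -> bool:
    """A is shorter than B iff they correspond and A, but not B, carries the
    information that it represents the same entity twice."""
    return corresponding(a, b) and carries_identity(a) and not carries_identity(b)


def choose_short(short: Structure, long: Structure,
                 overriding_purpose: bool = False) -> Structure:
    """Choose Short: default to the short structure unless the long one is
    favored by contextually overriding purposes."""
    if not shorter_than(short, long):
        raise ValueError("expected a short structure and its corresponding long one")
    return long if overriding_purpose else short


def conveys_non_covaluation(binding_unambiguously_eschewed: bool,
                            overriding_purpose: bool = False) -> bool:
    """Hearer-side inference: unambiguously eschewing binding conveys
    non-covaluation, unless an overriding purpose is visibly in play."""
    return binding_unambiguously_eschewed and not overriding_purpose


# (6a) and (6b): the same relation applied to <Jane, Jane>, with and without
# the identity of the two roles being marked by the structure itself.
short_6a = Structure("saw", ("Jane", "Jane"), frozenset({(0, 1)}))
long_6b = Structure("saw", ("Jane", "Jane"))

print(shorter_than(short_6a, long_6b))            # (6a)'s structure is shorter than (6b)'s
print(choose_short(short_6a, long_6b) is short_6a)
print(conveys_non_covaluation(True))              # e.g. (17): binding unambiguously eschewed
print(conveys_non_covaluation(False))             # e.g. (26): binding not unambiguously eschewed
```

The sketch deliberately keeps the override as an unexplained flag: the principle says only that such purposes exist, not how they are recognized, so nothing here should be read as a theory of when the flag is set.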
3.3 Allowing Strict Readings

Recall that some explanations of Convergence wrongly block strict readings of sentences like (28):

(28) Gala likes her guitar, and so does Harriet.
     Gala 1 [ t1 likes her3 guitar ] ∧ Harriet 2 [ t2 likes her3 guitar ] (her3 = Gala)

We have seen in subsection 2.2 how this affects Grodzinsky and Reinhart's explanation of Convergence from ?. But we must make sure that our own explanation, based on principle Choose Short, does not suffer from the same problem.

Principle Choose Short states that speakers have a preference for short structures. This means that, all else being equal, speakers will prefer to express the short semantic structure (29 a) over the long one:

(29) (a) (short) Gala λx ( x likes x's guitar )
     (b) (long) Gala λx ( x likes y's guitar )

And if, in general, we expect the speaker to choose the short structure, then at first we might also expect the hearer to always assume that the speaker has chosen short. Nevertheless, the hearer clearly does not always assume so, because the hearer can access the strict reading of (28). So it seems at first that we might have the same problem as other explanations of Convergence, i.e., we might be wrongly blocking strict readings.

Let me nevertheless argue that we aren't blocking these readings. Recall that principle Choose Short provides an exception for cases when the speaker is motivated by contextually overriding purposes to choose long. To be sure, this does not license us to invoke the 'contextually overriding purposes' clause at will and without specifying what those purposes are. Yet in this case we can tell what the relevant purpose is. We have motivated Choose Short by arguing that short structures carry an extra bit of information, to wit, the information that they concern one and the same entity.
But suppose, now, that in our context it is irrelevant that Gala likes her own guitar; rather, the speaker only calls it 'her guitar' in order to single out the guitar referred to (recall Smith's murderer in Donnellan ?:286). Had the speaker already singled out the guitar in prior discourse, then the speaker could have made the same point by saying:

(30) Gala likes the guitar, and so does Harriet.

(Notice that we replaced 'her' with 'the'.) In this context, what is at issue is who likes the guitar; answering this question is a contextually overriding purpose that takes precedence over the default rule, hence the speaker is no longer expected to choose short.

4 Data Coverage

We have already seen in the previous section how to treat simple Condition B and Condition C configurations. In this section I will address a few other topics: language acquisition, only-constructions, the identity predicate, Dahl's puzzle, and the Partee/Bach/Higginbotham problem.

4.1 Acquisition

It is often thought that children can tell when binding occurs in the wrong place, at an age at which they cannot yet tell when covaluation does the same (??). This may suggest, as Kenneth Wexler and Yu-Chin Chien argue, that children master Binding Theory before they master Convergence.

In ?, Grodzinsky and Reinhart have proposed a well-known explanation of these acquisition data. To this end they have invoked a key assumption behind Reinhart's explanations of Convergence: on this assumption, short and long semantic structures are usually indistinguishable. On my approach, however, short and long structures are always distinguishable; it seems therefore that I am at a disadvantage unless I can argue either that (a) Grodzinsky and Reinhart have failed to explain the data, or that (b) I can explain the data too. And I could plausibly choose option (a) and argue that Grodzinsky and Reinhart have failed to explain the data, because there are recent arguments and experiments that seem to show this (??).
Nevertheless, I will err on the side of caution and choose option (b): I will argue that if Grodzinsky and Reinhart can explain the acquisition data, then so can I. To this end, I will argue that we can reuse, mutatis mutandis, Grodzinsky and Reinhart's explanation of acquisition from ?.

Let me first summarize Grodzinsky and Reinhart's explanation, after which I will argue that we can reuse its essential insight. According to Reinhart, when we determine whether to allow covaluation in a context, we do so thus: First, we construct two logical form representations, one of which involves binding, the other covaluation. Then, we compare the two representations to see whether they are indistinguishable. If and only if the two representations are indistinguishable, we disallow covaluation. It is because children lack the cognitive resources to compare the two representations that they do not master Convergence (Grodzinsky and Reinhart, ?:88).

At first it is not obvious how to reuse Grodzinsky and Reinhart's explanation. On their account, children must compare truth-conditionally equivalent logical forms (hence semantic structures) to tell whether they are indistinguishable; on our account, however, truth-conditionally equivalent structures are never indistinguishable. Nevertheless, I maintain that we can safely abstract from this difference. This is because Grodzinsky and Reinhart maintain, and argue rather convincingly, that children are unable to compare truth-conditionally equivalent structures because they lack knowledge of context (Grodzinsky and Reinhart, ?:88–90). But recall what principle Choose Short states: when a cooperative speaker chooses whether to express a short structure or the corresponding long one, the speaker will default to the short one unless the long one is favored by contextually overriding purposes.
In the latter case, Divergence is possible; hence children can only judge whether they are in a case of Divergence or Convergence if they can tell whether to apply the 'contextually overriding purposes' clause. And since children must tell whether to apply the 'contextually overriding purposes' clause, they can only apply principle Choose Short if they have knowledge of context. Hence if Grodzinsky and Reinhart can explain the acquisition data by arguing that children lack knowledge of context, then so can we.

4.2 only

Let me now address the Divergence cases we often encounter with only-constructions:

(31) Despite his assurances, I think Jack missed work today. Nobody saw him at the office. Only Jack saw him at the office. (where him refers to Jack)

Recall that principle Choose Short states that when a cooperative speaker chooses whether to express a short structure or a truth-conditionally equivalent long one, the speaker will default to the short one (unless the long one is favored by contextually overriding purposes). But notice that, in cases like (31), the speaker does not choose whether to express a long structure or the corresponding short one. This is because there is no corresponding short one: Were the speaker to replace the covalued non-R-pronoun him with the bound R-pronoun himself, that would yield a short structure truth-conditionally distinct from the original long one. For if Jack's colleague Ben has seen himself at the office too, then (32 a) is false yet (32 b) may still be true:

(32) (a) Only Jack saw himself at the office.
     (b) Only Jack saw him at the office. (where him refers to Jack)

I have argued that (31)'s speaker is not choosing between a short structure and a long one. This means that the speaker does not fall under the antecedent of principle Choose Short. The speaker, therefore, is not expected to obey the principle, and we do not get the usual pragmatic inferences. This is why, in cases like (31), we may allow Divergence.
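The truth-conditional difference between (32 a) and (32 b) can be checked against a toy model. The sketch below is only an illustration under stated assumptions: it treats 'only' as plain exhaustivity over a two-person domain (the subject satisfies the predicate and nobody else does), and the names `people`, `saw`, `only`, `saw_self`, and `saw_jack` are hypothetical helpers, not part of the analysis above.

```python
# Toy model for (32). Assumption: Jack and Ben each saw himself at the
# office, and Ben did not see Jack.
people = {"Jack", "Ben"}
saw = {("Jack", "Jack"), ("Ben", "Ben")}


def only(subject, predicate, domain):
    """Assumed exhaustive reading of 'only': `subject` satisfies
    `predicate`, and no other member of `domain` does."""
    others = (x for x in domain if x != subject)
    return predicate(subject) and all(not predicate(x) for x in others)


def saw_self(x):
    # Short (bound) structure, as in (32 a): λx ( x saw x )
    return (x, x) in saw


def saw_jack(x):
    # Long (covalued) structure, as in (32 b): λx ( x saw Jack )
    return (x, "Jack") in saw


only_jack_saw_himself = only("Jack", saw_self, people)  # (32 a)
only_jack_saw_him = only("Jack", saw_jack, people)      # (32 b)

print(only_jack_saw_himself)  # False: Ben saw himself too
print(only_jack_saw_him)      # True: nobody but Jack saw Jack
```

In this model the two predicates agree on Jack but come apart on Ben, which is why substituting the bound form for the covalued one changes the truth value of the only-sentence.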
4.3 The Identity Predicate

We often encounter Divergence in well-known cases involving the identity predicate:

(33) If Cicero speaks Latin and if Tully is Cicero, then Tully speaks Latin.

This is unsurprising in light of our motivation for principle Choose Short. I have argued that short structures are usually more interesting than the truth-conditionally equivalent long ones. Usually, but not always: in particular, short structures tend to be less interesting when they concern identity. When they concern identity it is long structures that tend to be more interesting, because they imply the extra bit of information that we are representing the same entity in two different ways. And note that we get this implication only with the identity predicate, i.e., 'Tully is Cicero,' and not with regular predicates, e.g., 'Tully admires Cicero.' Because of this special feature of the identity predicate, we can apply the 'contextually overriding purposes' clause of principle Choose Short to explain the Divergence that results.

4.4 Dahl's Puzzle and the Partee/Bach/Higginbotham Problem

4.4.1 Preliminaries

In this section I take up two difficult problems in binding theory, one of which stems from Östen Dahl (?), the other from Barbara Partee and Emmon Bach (?) and from James Higginbotham (?). I will propose a unitary solution to these two problems based on principle Choose Short. To this end I will introduce a more precise formulation of the principle.

Thus far I have formulated the principle in terms of semantic structures. But semantic structures can pertain to many things: relative clauses, DP's, VP's, etc. I maintain, however, that Choose Short applies only to those semantic structures that map onto propositions. This is because the point of Choose Short is to make utterances useful, and the point of utterances is to communicate propositional attitudes.
Hence we can expect Choose Short to apply not to arbitrary syntactic structures such as DP's, but only to those phrases that represent propositions. Let us therefore reformulate the principle to deal not with semantic structures in general but only with propositional representations:

(Choose Short) When a cooperative speaker chooses whether to express a short propositional representation or a corresponding long one, the speaker will default to the short representation unless the long one is favored by contextually overriding purposes.

Before we go ahead, let me mention that there is in the literature another promising, unitary approach to the two problems. Irene Heim (?) and Danny Fox (?) have proposed to solve the Partee/Bach/Higginbotham (henceforth PBH) problem and Dahl's puzzle, respectively, by positing a constraint that Fox dubs Rule H:13

A pronoun, α, can be bound by an antecedent, β, only if there is no closer antecedent, γ, such that it is possible to bind α by γ and get the same semantic interpretation. (?:115)

Rather plausibly, we can solve the two problems just as well by adopting Heim's and Fox's approach as by invoking principle Choose Short. Nevertheless, I think we should prefer the solution based on Choose Short. This is for two reasons. First, unlike Rule H, Choose Short isn't posited merely to solve Dahl's puzzle and the PBH problem. Rather, we can also use Choose Short to explain Convergence and Divergence, and we can motivate the principle independently through pragmatic considerations concerning cooperation and rationality. Second, as we shall see in the next section, if we adopted Rule H we would make false predictions about the harder versions of Dahl's puzzle proposed recently by Floris Roelofsen (?). This renders Rule H empirically inadequate.
13 Although Fox attributes Rule H to Heim, she doesn't put it this way; she does, however, seem to formulate a version of Condition B to the same (and further) effect. For additional discussion see ?.

In ?, Daniel Büring has proposed to merge Fox's Rule H and Reinhart's Rule I into a principle he calls 'Have Local Binding' (henceforth HLB):

For any two NPs α and β, if α could semantically bind β (i.e., if it c-commands β and β is not semantically bound in α's c-command domain already), α must semantically bind β, unless that changes the interpretation. (?:270)

Unlike the original Rule H, Büring's HLB can account not only for Dahl's puzzle and the PBH problem, but also for Convergence and Divergence. Nevertheless, we have two reasons to find Büring's approach unsatisfactory. As Roelofsen argues in ?:125–26, HLB seems to be too restrictive and to wrongly block strict readings. Moreover, as Roelofsen points out at the same place, even though HLB rather elegantly merges Rule I and Rule H, it does not explain why either rule should obtain, let alone why both together should. And this means that, as it stands, HLB can (at most) cover the data, but it cannot explain them.

4.4.2 The Partee/Bach/Higginbotham Problem

I will first show, in this subsection, how we can invoke principle Choose Short to solve PBH as a problem about binding; then, I will explain how to solve it as a problem about Convergence and Divergence. Let me first illustrate the PBH problem:

(34) Jane said she didn't realize she was looking at her in the mirror.

We cannot read (34) as meaning that Jane said she (i.e., Jane) didn't realize she (Jane) was looking at her (i.e., at Jane) in the mirror. If we read she as referring to Jane on its second occurrence, then we must read her as referring to somebody else.
Here, then, is the problem: It seems that if we judged based on Binding Theory alone, we would wrongly allow the impossible reading:

(35) Jane 1 [ t1 said she1 2 [ t2 didn't realize she2 3 [ t3 was looking at her2 in the mirror ]]]
(36) Jane 1 [ t1 said she1 2 [ t2 didn't realize she2 3 [ t3 was looking at her1 in the mirror ]]]

Here, now, is how I propose to solve the PBH problem. I have argued in 4.4.1 that principle Choose Short applies to the semantic structures of those phrases that represent propositions. Notice, now, that the speaker of (34) expresses not just one, but three propositions; although the speaker only asserts proposition (37 a), he or she also expresses the simpler propositions embedded in (37 a), i.e., (37 b) and (37 c):

(37) (a) that Jane said she didn't realize she was looking at her in the mirror,
     (b) that she didn't realize she was looking at her in the mirror,
     (c) that she was looking at her in the mirror.

Since (37 c) expresses a proposition and since propositional representations are what principle Choose Short applies to, we expect the speaker to choose a short structure not only with respect to (37 a), but also with respect to (37 c).14 This explains why we cannot read (34) either as (35) or as (36), for in each of these, (37 c) is implemented as a long structure.

Having seen how to solve the PBH problem, let us examine how Convergence and Divergence interact with it. We shall first take Convergence. Take a new look at (34) and notice that not only can her not be bound by the second occurrence of she, but that it cannot be covalued with it either. This means we are dealing with a case of Convergence.
We can easily explain this case in light of our prior discussion: Since principle Choose Short applies to the embedded propositional representation (37 c), if the speaker had intended her to be covalued with the second occurrence of she, then the speaker wouldn't have used her to begin with, but rather the bound R-pronoun herself:

(38) Jane said she didn't realize she was looking at herself in the mirror.

14 More generally, we expect principle Choose Short to apply to all the propositional arguments of intensional operators.

Having explained Convergence in PBH cases, notice, finally, that we can also encounter Divergence in similar configurations. Suppose, for instance, that Jane saw herself in the mirror but didn't realize it was her. Then Jane can later make this divergent utterance:

(39) I didn't realize I was looking at me in the mirror.

And our speaker can report:

(40) Jane said she didn't realize she was looking at her in the mirror.
     Jane λx ( x said x λy ( y didn't realize y λz ( z was looking at her in the mirror ) ) ) (where her refers to Jane)

Given Jane's unusual confusion, we may apply Choose Short's 'contextually overriding purposes' clause to explain why (40) is allowed.

4.4.3 Dahl's Puzzle

We do not, strictly speaking, need to solve Dahl's puzzle in order to explain Convergence and Divergence. This is because in solving Dahl's puzzle we explain not why covaluation is blocked or allowed, but rather why binding is. Nevertheless, I will argue that we can solve Dahl's puzzle by applying principle Choose Short to embedded propositional representations. Here is an illustration of the puzzle:

(41) Dahl said he loved his puzzle, and Frege did so too.

And here is the puzzle: We can only read (41) in three of the four prima facie possible ways.
To wit, we can only read it thus:

(42) (a) Dahl 1 [ t1 said he1 2 [ t2 loved his2 puzzle ] ] ∧ Frege 1 [ t1 said he1 2 [ t2 loved his2 puzzle ] ]
     (b) Dahl 1 [ t1 said he3 2 [ t2 loved his2 puzzle ] ] ∧ Frege 1 [ t1 said he3 2 [ t2 loved his2 puzzle ] ] (where he3 refers to Dahl)
     (c) Dahl 1 [ t1 said he1 2 [ t2 loved his3 puzzle ] ] ∧ Frege 1 [ t1 said he1 2 [ t2 loved his3 puzzle ] ] (where his3 refers to Dahl)

Here is how we cannot read (41):

(43) Dahl 1 [ t1 said he3 loved his1 puzzle ] ∧ Frege 1 [ t1 said he3 loved his1 puzzle ] (where he3 refers to Dahl)

Before I show how to solve the puzzle, recall, once more, that we do not need to solve the Dahl puzzle in order to explain Convergence and Divergence. Hence if it should turn out that we cannot solve Dahl's puzzle the way I propose, this would not affect our explanation of Convergence and Divergence.

With this in mind, let us address the puzzle. Notice that the speaker of (41) expresses not just the main proposition, but also the embedded proposition that he (Dahl) loved his puzzle. Since Choose Short applies to propositional representations in general, it also applies to the embedded propositional representation [TP he loved his puzzle ]. Hence the speaker is expected to choose short for the embedded propositional representation, i.e., to let his be bound by he. In (43), however, his is bound not by he, but rather by Dahl; hence in (43) the embedded propositional representation is not short but long. This is why we cannot read (41) as (43).

Here, though, is a potential objection to our line of reasoning: It may seem that while explaining why we cannot read (41) as (43), I have also predicted (wrongly) that we cannot read (41) as (42 c). This may seem so because in (42 c), his is not bound at all and therefore not bound by he, which means that the embedded propositional representation is long.
This seems at first to contradict principle Choose Short as applied to embedded propositional representations. Recall, however, that the principle states that when a cooperative speaker chooses whether to express a short propositional representation or a truth-conditionally equivalent long one, the speaker will default to the short representation unless the long one is favored by contextually overriding purposes. Hence if we can argue that, in (42 c), the long representation is favored by contextually overriding purposes, then we are no longer forced to wrongly predict that we cannot read (41) as (42 c).

Here is why I believe that, in (42 c), the long representation is favored by contextually overriding purposes: First, although (42 c) is an acceptable reading, it is also, as Dahl himself puts it, "rather dubious" (?:9). This suggests (but doesn't prove) that we are dealing with a special kind of context. And second, recall the phenomenon we have discussed in subsection 3.3. There, I argued that the speaker needn't choose short when in a certain kind of situation. I maintain that (42 c) is a situation of that kind. Notice that (42 c)'s speaker is only concerned with Dahl, with Frege, and with whether they like the puzzle. It seems to make no difference that Dahl happens to like his own puzzle. Rather, the speaker seems to only call the puzzle 'his puzzle' in order to single it out for reference. And indeed, had the speaker already referred to the puzzle in prior discourse, then he or she could have said just as well:

(44) Dahl said he loved the puzzle, and Frege did so too.

(Notice that we replaced 'his' with 'the'.) Recall, now, why I am arguing that speakers are usually expected to choose short propositional representations over the truth-conditionally equivalent long ones: I am arguing so because short propositional representations carry the extra bit of information that we are dealing with one and the same entity.
Nevertheless, in contexts such as ours, where it doesn't quite matter that Dahl and his refer to one and the same entity, the speaker need not choose the short representation and is therefore free to express the 'dubious' but nevertheless acceptable (42 c).

4.5 The Roelofsen Variations

In ?, Floris Roelofsen has introduced a number of harder versions of Dahl's puzzle, which I will call the Roelofsen Variations. Roelofsen has argued that Danny Fox's extant Rule H approach fails on these harder versions, and he has proposed to solve both the original Dahl puzzle and the variations by positing the principle of Free Variable Economy (FVE). By invoking FVE, Roelofsen can account for the Dahl puzzle as well as for most of the variations; with additional assumptions, he can account for all. I will argue, nevertheless, that FVE is insufficiently motivated and, more importantly, that it makes false predictions about very simple Condition B configurations; furthermore, I will argue that we can use Choose Short to account for the Roelofsen Variations ourselves.

4.5.1 Roelofsen's Free Variable Economy Approach

According to Roelofsen's principle FVE, "[a] logical form constituent is illicit if it has a more economical alternative," i.e., if it has an alternative that contains fewer free variables (?:688). In Roelofsen's terms, a variable is the same as a binding index, and a variable is free within a logical form constituent if it has no binder within that constituent. I will argue, first, that FVE is not well motivated, and second, that FVE makes the wrong predictions in Condition B configurations.

Let me first illustrate Roelofsen's concept of a free variable. Take (45):

(45) Bugs said he voted for himself.
     Bugs 1 [ t1 said he1 2 [ t2 voted for himself2 ]]

Consider the logical form constituents:

(46) [ t2 voted for himself2 ]
(47) [ he1 2 [ t2 voted for himself2 ]]

In Roelofsen's sense we have in (46) exactly one free variable, to wit, the binding index 2. In the larger logical form constituent (47) we also have just one free variable, this time the binding index 1. (Notice that we no longer have a free variable in the binding index 2.) Finally, in the matrix sentence (45) we have no free variables at all.

Let me now argue that FVE is insufficiently motivated. First, as I have mentioned, Roelofsen identifies a free variable with a free binding index. Moreover, Roelofsen does not assign binding indices to referential pronouns, hence he does not count referential pronouns as free variables. This is indeed what he needs to say in order to correctly cover the Dahl puzzle. But it is unclear why this should be so; if we have a good reason to economize free binding indices, then it is not clear why we shouldn't have the same reason to also economize referential pronouns. If, however, Roelofsen were to extend FVE to referential pronouns, this would undermine his solution to Dahl's puzzle.

And here is the second reason why FVE is insufficiently motivated. Recall that Roelofsen identifies variables not with pronouns and traces as such, but rather with their binding indices. Hence if in a constituent we have multiple pronouns and traces with the same binding index, they all count as one single variable. Again, this is what Roelofsen needs to say to cover the Dahl puzzle. But it is not clear why we should economize only binding indices and not the pronouns and traces that carry them. It is not clear why we should expect, say, three pronouns to be as cheap as one single pronoun with respect to cognitive load, to ambiguity, or in general to whatever cost it is that we pay for free variables.
Nevertheless, Roelofsen is not free to amend this point or to leave it unspecified. It seems, therefore, that FVE is tailor-made to systematize the data and lacks independent motivation and explanatory power.

Having seen why I find FVE insufficiently motivated, let me argue that the view also makes mistaken predictions about Condition B configurations. Take the sentence (48):

(48) Jane said she was hungry. (where she refers to Jane)

This sentence admits a 'sloppy' reading and a 'strict' reading; to wit, the pronoun she can either be bound by Jane or refer to Jane deictically:

(49) sloppy: Jane λx (x said x was hungry)
     strict: Jane λx (x said she was hungry) (she = Jane)

I will argue, nevertheless, that FVE would block the sloppy reading. Hence it seems that Roelofsen must either give up FVE or deny that sloppy readings are possible. The latter would be quite implausible, and a major departure from current theory.

Here is the argument. Take the following two LF's:

(50) [ Jane 1 [ t1 said she was hungry ]]
(51) [ Jane 1 [ t1 said she1 was hungry ]]

From (50), take the logical form constituent [ she was hungry ]; from (51), the corresponding constituent [ she1 was hungry ]. Notice that in Roelofsen's sense the former constituent contains zero free variables, whereas the latter contains one, namely the binding index 1. Even though the former constituent does contain a pronoun, this pronoun is not a bound one, and therefore according to Roelofsen it has no binding index and it does not count as a variable in the relevant sense. Recall, now, that according to FVE, "[a] logical form constituent is illicit if it has a more economical alternative," that is, an alternative with fewer free variables (?:688).15 This means that the relevant constituent from (51) is illicit, hence we cannot generate the sloppy reading from (51).

15 In more detail, "Let Σ and Π be alternatives.
Then we say that Π is more economical than Σ if and only if some subconstituent Π' of Π contains fewer free variables than the corresponding subconstituent Σ' of Σ" (?:688). Furthermore, "[t]wo logical form constituents are alternatives if and only if they are (a) semantically equivalent and (b) formally identical modulo binding indices on pronouns" (?:686). Notice, importantly, that it is only Σ and Π that must be alternatives in this sense, not Σ' and Π'. Moreover, this cannot be revised, because it is essential to Roelofsen's solution to Dahl's puzzle, as we can see, e.g., on p. 689.

Let us see whether we can generate the sloppy reading from other logical forms. Suppose we allowed the two pronouns to undergo QR:

(52) [ Jane 1 [ t1 said she 2 [ t2 was hungry ]]]
(53) [ Jane 1 [ t1 said she1 2 [ t2 was hungry ]]]

Even so, however, the sloppy reading is still blocked by the strict one. From (52) take the constituent [ she 2 [ t2 was hungry ]]; from (53), take [ she1 2 [ t2 was hungry ]]. Once again, we find that the former constituent contains zero free variables, whereas the latter contains one, namely the binding index 1. Hence by FVE the sloppy constituent is illicit and (53) is blocked by (52).

In order to generate a sloppy LF that doesn't get blocked by the corresponding strict one, we would need to QR the pronouns even higher up. I will argue, nevertheless, that the pronouns may not raise that high. Here is how the pronouns would need to raise:

(54) [ Jane 1 [ she 2 [ t1 said t2 was hungry ]]]
(55) [ Jane 1 [ she1 2 [ t1 said t2 was hungry ]]]

In (54) and (55) we can no longer find any logical form constituents that block each other. And since she1 cannot raise any higher than its binder, (55) is the only LF that will allow Roelofsen to uphold FVE without running afoul of sloppy readings. I maintain, however, that DP's cannot QR as in (55). If they could, they would generate readings that are in fact unavailable.
Let us replace 'she' with 'no one':

(56) Jane said no one was hungry.
(57)

If DP's could QR as in (55), then we could access a reading that corresponds to the following LF:

(58) [ Jane 1 [ [no one] 2 [ t1 said t2 was hungry ]]]

(58) is true if and only if Jane said of no one that he or she was hungry. Yet (56) is true if and only if Jane said that no one was hungry. This means that (58) gets the scope wrong. Here is a model to illustrate: In the universe there are n individuals, a1, a2, ..., an. Jane has never said that no one was hungry. At the same time, for every i ∈ {1, ..., n}, Jane has never said that ai was hungry. In this model, (56) is false yet (58) is true. QED.

We see thus that we cannot reconcile FVE with the possibility of binding in Condition B configurations. This is a strong reason to reject FVE.

4.5.2 Explaining the Roelofsen Variations

Recall that Roelofsen has motivated FVE as the solution to an array of harder versions of Dahl's puzzle. Let me argue, however, that we can solve the Roelofsen Variations in the framework of our principle Choose Short.

This passage is work in progress.

5 Conclusion

If semantic structure always matters and if speakers default to expressing what I have called short semantic structures, then we can deploy semantic structure to explain not only, like Reinhart, the divergence in the distribution patterns of binding and intended covaluation, but the convergence too. Hence if semantic structure always matters and if speakers default to short semantic structures, then we have a promising solution to the problem of anaphora. What we must still determine is primarily this: do speakers default to short semantic structures? I have argued that they do; nevertheless, although I hope to have made this plausible, we still need to investigate whether it is true.