Learning theory has frequently been applied to language acquisition, but discussion has largely focused on information theoretic problems, in particular on the absence of direct negative evidence. Such arguments typically neglect the probabilistic nature of cognition and learning in general. We argue first that these arguments, and analyses based on them, suffer from a major flaw: they systematically conflate the hypothesis class and the learnable concept class. As a result, they do not allow one to draw significant conclusions about the learner. Second, we claim that the real problem for language learning is the computational complexity of constructing a hypothesis from input data. Studying this problem allows for a more direct approach to the object of study (the language acquisition device) rather than the learnable class of languages, which is epiphenomenal and possibly hard to characterize. The learnability results informed by complexity studies are much more insightful. They strongly suggest that target grammars need to be objective, in the sense that the primitive elements of these grammars are based on objectively definable properties of the language itself. These considerations support the view that language acquisition proceeds primarily through data-driven learning of some form.
In this paper, we explore the possibility that machine learning approaches to natural language processing being developed in engineering-oriented computational linguistics may be able to provide specific scientific insights into the nature of human language. We argue that, in principle, machine learning results could inform basic debates about language, in one area at least, and that in practice, existing results may offer initial tentative support for this prospect. Further, results from computational learning theory can inform arguments carried on within linguistic theory as well.
The question of whether humans represent grammatical knowledge as a binary condition on membership in a set of well-formed sentences, or as a probabilistic property has been the subject of debate among linguists, psychologists, and cognitive scientists for many decades. Acceptability judgments present a serious problem for both classical binary and probabilistic theories of grammaticality. These judgments are gradient in nature, and so cannot be directly accommodated in a binary formal grammar. However, it is also not possible to simply reduce acceptability to probability. The acceptability of a sentence is not the same as the likelihood of its occurrence, which is, in part, determined by factors like sentence length and lexical frequency. In this paper, we present the results of a set of large-scale experiments using crowd-sourced acceptability judgments that demonstrate gradience to be a pervasive feature in acceptability judgments. We then show how one can predict acceptability judgments on the basis of probability by augmenting probabilistic language models with an acceptability measure. This is a function that normalizes probability values to eliminate the confounding factors of length and lexical frequency. We describe a sequence of modeling experiments with unsupervised language models drawn from state-of-the-art machine learning methods in natural language processing. Several of these models achieve very encouraging levels of accuracy in the acceptability prediction task, as measured by the correlation between the acceptability measure scores and mean human acceptability values. We consider the relevance of these results to the debate on the nature of grammatical competence, and we argue that they support the view that linguistic knowledge can be intrinsically probabilistic.
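The idea of an acceptability measure, a function that normalizes a language model's probability for sentence length and lexical frequency, can be illustrated with a minimal Python sketch. The abstract does not say which normalizing functions were used, so the two functions here are assumptions chosen for illustration: `mean_logprob` divides by sentence length, and `slor` additionally subtracts an add-one-smoothed unigram log probability as a stand-in for lexical frequency. The names, the smoothing scheme, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter


def unigram_logprob(tokens, counts, total, vocab_size):
    """Add-one-smoothed unigram log probability of a token sequence (assumed smoothing)."""
    return sum(math.log((counts[t] + 1) / (total + vocab_size)) for t in tokens)


def mean_logprob(model_logprob, tokens):
    """Length normalization only: model log probability per token."""
    return model_logprob / len(tokens)


def slor(model_logprob, tokens, counts, total, vocab_size):
    """Length- and frequency-normalized score.

    Subtracts the unigram log probability (a proxy for lexical frequency)
    from the model log probability, then divides by sentence length.
    """
    baseline = unigram_logprob(tokens, counts, total, vocab_size)
    return (model_logprob - baseline) / len(tokens)


# Toy corpus statistics (hypothetical data, for illustration only).
corpus = "the cat sat on the mat".split()
counts = Counter(corpus)
total = len(corpus)
vocab_size = len(counts)

sentence = ["the", "cat"]
baseline = unigram_logprob(sentence, counts, total, vocab_size)
# `model_logprob` would come from a trained language model; here it is made up.
score = slor(baseline + 2.0, sentence, counts, total, vocab_size)
```

Under these assumptions, a sentence scores above zero only to the extent that the model assigns it more probability than its word frequencies alone would predict. Scores of this kind could then be correlated with mean human ratings.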
In this paper we address an important issue in the development of an adequate formal theory of underspecified semantics. The tension between expressive power and computational tractability poses an acute problem for any such theory. Generating the full set of resolved scope readings from an underspecified representation produces a combinatorial explosion that undermines the efficiency of these representations. Moreover, Ebert (2005) shows that most current theories of underspecified semantic representation suffer from expressive incompleteness. In previous work we presented an account of underspecified scope representations within Property Theory with Curry Typing (PTCT), an intensional first-order theory for natural language semantics. We review this account, and we show that filters applied to the underspecified-scope terms of PTCT permit expressive completeness. While they do not solve the general complexity problem, they do significantly reduce the search space for computing the full set of resolved scope readings in non-worst cases. We explore the role of filters in achieving expressive completeness, and their relationship to the complexity involved in producing full interpretations from underspecified representations. This paper is dedicated to Jim Lambek.
The question of whether grammaticality is a binary categorical or a gradient property has been the subject of ongoing debate in linguistics and psychology for many years. Linguists have tended to use constructed examples to test speakers' judgements on specific sorts of constraint violation. We applied machine translation to randomly selected subsets of the British National Corpus (BNC) to generate a large test set which contains well-formed English source sentences, and sentences that exhibit a wide variety of grammatical infelicities. We tested a large number of speakers through (filtered) crowd sourcing, with three distinct modes of classification, one binary and two ordered scales. We found a high degree of correlation in mean judgements for sentences across the three classification tasks. We also did two visual image classification tasks to obtain benchmarks for binary and gradient judgement patterns, respectively. Finally, we did a second crowd source experiment on 100 randomly selected linguistic textbook example sentences. The sentence judgement distributions for individual speakers strongly resemble the gradience benchmark pattern. This evidence suggests that speakers represent grammatical well-formedness as a gradient property.
We formulate a Curry-typed logic with fine-grained intensionality within Turner’s typed predicate logic. This allows for an elegant presentation of a theory that corresponds to Fox and Lappin’s Property Theory with Curry Typing, but without the need for a federation of languages. We then consider how the fine-grained intensionality of this theory can be given an operational interpretation. This interpretation suggests itself because expressions in the theory can be viewed as terms in the untyped lambda-calculus, which provides a model of computation.
In this chapter we consider unsupervised learning from two perspectives. First, we briefly look at its advantages and disadvantages as an engineering technique applied to large corpora in natural language processing. While supervised learning generally achieves greater accuracy with less data, unsupervised learning offers significant savings in the intensive labour required for annotating text. Second, we discuss the possible relevance of unsupervised learning to debates on the cognitive basis of human language acquisition. In this context we explore the implications of recent work on grammar induction for poverty of stimulus arguments that purport to motivate a strong bias model of language learning, commonly formulated as a theory of Universal Grammar (UG). We examine the second issue both as a problem in computational learning theory, and with reference to empirical work on unsupervised Machine Learning (ML) of syntactic structure. We compare two models of learning theory and the place of unsupervised learning within each of them. Looking at recent work on part of speech tagging and the recognition of syntactic structure, we see how far unsupervised ML methods have come in acquiring different kinds of grammatical knowledge from raw text.
The tension between expressive power and computational tractability poses an acute problem for theories of underspecified semantic representation. In previous work we have presented an account of underspecified scope representations within Property Theory with Curry Typing (PTCT), an intensional first-order theory for natural language semantics. Here we show how filters applied to the underspecified-scope terms of PTCT permit both expressive completeness and the reduction of computational complexity in a significant class of non-worst case scenarios.
We present Property Theory with Curry Typing (PTCT), an intensional first-order logic for natural language semantics. PTCT permits fine-grained specifications of meaning. It also supports polymorphic types and separation types. We develop an intensional number theory within PTCT in order to represent proportional generalized quantifiers like “most.” We use the type system and our treatment of generalized quantifiers in natural language to construct a type-theoretic approach to pronominal anaphora that avoids some of the difficulties that undermine previous type-theoretic analyses of this phenomenon.
The question of whether it is possible to characterise grammatical knowledge in probabilistic terms is central to determining the relationship of linguistic representation to other cognitive domains. We present a statistical model of grammaticality which maps the probabilities of a statistical model for sentences in parts of the British National Corpus (BNC) into grammaticality scores, using various functions of the parameters of the model. We test this approach with a classifier on test sets containing different levels of syntactic infelicity. With appropriate tuning, the classifiers achieve encouraging levels of accuracy. These experiments suggest that it may be possible to characterise grammaticality judgements in probabilistic terms using an enriched language model.
Machine learning and statistical methods have yielded impressive results in a wide variety of natural language processing tasks. These advances have generally been regarded as engineering achievements. In fact it is possible to argue that the success of machine learning methods is significant for our understanding of the cognitive basis of language acquisition and processing. Recent work in unsupervised grammar induction is particularly relevant to this issue. It suggests that knowledge of language can be achieved through general learning procedures, and that a richly articulated language faculty is not required to explain its acquisition.
We present an approach to anaphora and ellipsis resolution in which pronouns and elided structures are interpreted by the dynamic identification in discourse of type constraints on their semantic representations. The content of these conditions is recovered in context from an antecedent expression. The constraints define separation types in Property Theory with Curry Typing, an expressive first-order logic with Curry typing that we have proposed as a formal framework for natural language semantics.
The paper presents Property Theory with Curry Typing (PTCT) where the language of terms and well-formed formulæ are joined by a language of types. In addition to supporting fine-grained intensionality, the basic theory is essentially first-order, so that implementations using the theory can apply standard first-order theorem proving techniques. Some extensions to the type theory are discussed, including type polymorphism and enriching the system with sufficient number theory to account for quantifiers of proportion, such as “most.”
I compare several types of knowledge-based and knowledge-poor approaches to anaphora and ellipsis resolution. The former are able to capture fine-grained distinctions that depend on lexical meaning and real world knowledge, but they are generally not robust. The latter show considerable promise for yielding wide coverage systems. However, they consistently miss a small but significant subset of cases that are not accessible to rough-grained techniques of interpretation. I propose a sequenced model which first applies the most computationally efficient and inexpensive methods to resolution and then progresses successively to more costly techniques to deal with cases not handled by previous modules. Confidence measures evaluate the judgements of each component in order to determine which instances of anaphora or ellipsis are to be passed on to the next, more fine-grained subsystem.
In this paper we investigate the use of machine learning techniques to classify a wide range of non-sentential utterance types in dialogue, a necessary first step in the interpretation of such fragments. We train different learners on a set of contextual features that can be extracted from PoS information. Our results achieve an 87% weighted f-score, a 25% improvement over a simple rule-based algorithm baseline.
The paper presents Property Theory with Curry Typing where the language of terms and well-formed formulæ are joined by a language of types. In addition to supporting fine-grained intensionality, the basic theory is essentially first-order, so that implementations using the theory can apply standard first-order theorem proving techniques. The paper sketches a system of tableau rules that implement the theory. Some extensions to the type theory are discussed, including type polymorphism, which provides a useful analysis of conjunctive terms. Such terms can be given a single polymorphic type that expresses the fact that they can conjoin phrases of any one type, yielding an expression of the same type.
Indirect negative evidence is clearly an important way for learners to constrain overgeneralisation, and yet a good learning theoretic analysis has yet to be provided for this, whether in a PAC or a probabilistic identification in the limit framework. In this paper we suggest a theoretical analysis of indirect negative evidence that allows the presence of ungrammatical strings in the input and also accounts for the relationship between grammaticality/acceptability and probability. Given independently justified assumptions about lower bounds on the probabilities of grammatical strings, we establish that a limited number of membership queries of some strings can be probabilistically simulated.
The paper presents Property Theory with Curry Typing (PTCT) where the language of terms and well-formed formulæ are joined by a language of types. In addition to supporting fine-grained intensionality, the basic theory is essentially first-order, so that implementations using the theory can apply standard first-order theorem proving techniques. The paper sketches a system of tableau rules that implement the theory. Some extensions to the type theory are discussed, including type polymorphism, which provides a useful analysis of conjunctive terms. Such terms can be given a single polymorphic type that expresses the fact that they can conjoin phrases of any one type, yielding an expression of the same type.
Much previous work on generation has focused on the general problem of producing lexical strings from abstract semantic representations. We consider generation in the context of a particular task, creating full sentential paraphrases of fragments in dialogue. When the syntactic, semantic and phonological information provided by a dialogue fragment resolution system is made accessible to a generation component, much of the indeterminacy of lexical selection is eliminated.
In Fox and Lappin (2005a) we propose Property Theory with Curry Typing (PTCT) as a formal framework for the semantics of natural language. PTCT allows fine-grained distinctions of meaning without recourse to modal notions like (im)possible worlds. It also supports a unified dynamic treatment of pronominal anaphora and VP ellipsis, as well as related phenomena such as gapping and pseudo-gapping.
Intensional logic (IL) and its application to natural language, which the present monograph addresses, was first developed by Richard Montague in the late 1960s (e.g., Montague 1970a, 1970b). Through the efforts of (especially) Barbara Partee (e.g., Partee 1975, 1976), and Richmond Thomason, who edited the posthumous collection of Montague’s works (Thomason 1974), this became the main framework for those who aspired to a formal semantic theory for natural language, and these included computational linguists as early as Jerry Hobbs in the late 1970s (e.g., Hobbs and Rosenschein 1977). In fact, until the advent of the current interest in statistical linguistics with its own conception of what semantics is, IL, or some variant of it, was perhaps the main theory of semantics within computational linguistics generally. And within current computational semantics it still is. But over the years, philosophers, linguists, and computational linguists have noted a variety of shortcomings in Montague’s version of IL. Montague defined intensions as functions from possible worlds to extensions in that world. But this had the effect of making logically equivalent expressions have the same intension, thus leading to the problem of “logical omniscience” (believing/knowing all the logical consequences of what is believed/known). Montague had based his IL on Church’s simple theory of types (Church 1940), supplemented with intensions of each type. But this implies that each natural language item accepts only arguments of some one fixed type. However, this is not true for natural language, where conjunctions, verbs, and pretty much any functional term that accepts arguments at all can accept arguments of different types. (For example, and can accept arguments that are of the sentence type, of the verb phrase type, of the adjective type, etc.; and indeed, it can accept arguments of differing types in its different argument..
The capacity to recognise and interpret sluices (bare wh-phrases that exhibit a sentential meaning) is essential to maintaining cohesive interaction between human users and a machine interlocutor in a dialogue system. In this paper we present a machine learning approach to sluice disambiguation in dialogue. Our experiments, based on solid theoretical considerations, show that applying machine learning techniques using a compact set of features that can be automatically identified from PoS markings in a corpus can be an efficient tool to disambiguate between sluice interpretations.
In previous work we have developed Property Theory with Curry Typing (PTCT), an intensional first-order logic for natural language semantics. PTCT permits fine-grained specifications of meaning. It also supports polymorphic types and separation types. We develop an intensional number theory within PTCT in order to represent proportional generalized quantifiers like "most", and we suggest a dynamic type-theoretic approach to anaphora and ellipsis resolution. Here we extend the type system to include product types, and use these to define a permutation function that generates underspecified scope representations within PTCT. We indicate how filters can be added to encode constraints on possible scope readings. Our account offers several important advantages over other current theories of underspecification.
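The combination of a permutation function with filters on scope readings can be pictured with a small Python sketch. This is only an analogy under stated assumptions: in PTCT the permutation function and filters are defined inside the logic over product types, whereas here quantifiers are plain strings, a reading is a tuple ordered from widest to narrowest scope, and `scope_readings` and `outscopes` are hypothetical names. The sketch also enumerates every permutation before filtering, so it illustrates what filters express and not how they might prune the search.

```python
from itertools import permutations


def scope_readings(quantifiers, filters=()):
    """Yield each ordering of `quantifiers` (widest scope first) that passes every filter."""
    for order in permutations(quantifiers):
        if all(keep(order) for keep in filters):
            yield order


def outscopes(wide, narrow):
    """Hypothetical filter: keep only readings where `wide` takes scope over `narrow`."""
    return lambda order: order.index(wide) < order.index(narrow)


quantifiers = ["every", "some", "most"]
all_readings = list(scope_readings(quantifiers))
constrained = list(scope_readings(quantifiers, [outscopes("every", "some")]))
```

With three quantifiers the unfiltered set has six orderings, and the single constraint above keeps three of them. The factorial growth of the unfiltered set is the combinatorial explosion that makes constraints on readings worth encoding.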
Abstract. We present Property Theory with Curry Typing, an intensional first-order logic for natural language semantics. PTCT permits fine-grained specifications of meaning. It also supports polymorphic types and separation types. We develop an intensional number theory within PTCT in order to represent proportional generalized quantifiers like most. We use the type system and our treatment of generalized quantifiers in natural language to construct a type-theoretic approach to pronominal anaphora that avoids some of the difficulties that undermine previous type-theoretic analyses of this phenomenon.
On the Fregean view of NP's, quantified NP's are represented as operator-variable structures while proper names are constants appearing in argument position. The Generalized Quantifier approach characterizes quantified NP's and names as elements of a unified syntactic category and semantic type. According to the Logicality Thesis, the distinction between quantified NP's, which undergo an operation of quantifier raising to yield operator-variable structures at Logical Form, and non-quantified NP's, which appear in situ at LF, corresponds to a difference in logicality status. The former are logical expressions while the latter are not. Using van Benthem's [2, 3] criterion for logicality, I extend the concept of logicality to GQ's. I argue that NP's modified by exception phrases constitute a class of quantified NP's which are heterogeneous with respect to logicality. However, all exception phrase NP's exhibit the syntactic and semantic properties which motivate May to treat quantified NP's as operators at LF. I present a semantic analysis of exception phrases as modifiers of GQ's, and I indicate how this account captures the central semantic properties of exception phrase NP's. I explore the consequences of the logically heterogeneous character of exception phrase NP's for proof theoretic accounts of quantifiers in natural language. The proposed analysis of exception phrase NP's provides support for the GQ approach to the syntax and semantics of NP's.
A major challenge for any grammar-driven text understanding system is the resolution of fragments. Basic examples include bare NP answers (1a), where the bare NP John is resolved as the assertion John saw Mary, and sluicing (1b), where the wh-phrase who is interpreted as the question Which student saw John.