In Sebastian Thrun, Lawrence K. Saul, and Bernhard Schölkopf (eds), Advances in Neural Information Processing Systems 16 (NIPS 2003). Cambridge, MA: MIT Press, pp. 497-504.
Dictionary-making is an increasingly important avenue for cultural preservation and maintenance for Aboriginal people. It is also one of the main jobs performed by linguists working in Aboriginal communities. However, current tools for making dictionaries are either not specifically designed for the purpose (Word, Nisus), with the result that dictionaries written in them are difficult to maintain, to keep consistent, and to manipulate automatically, or are too complex for many people to use (Shoebox), and are thereby wasted as potential resources. Moreover, neither of these sets of tools provides a suitable user interface for people who simply want to browse or find words in a dictionary. We set out to design a dictionary 'template', written in software that was easy (and fun!) for people to use, and that maintained a consistent relationship among the information in the dictionary.
The website Rotten Tomatoes, located at www.rottentomatoes.com, is primarily an online repository of movie reviews. For each movie review document, the site provides a link to the full review, along with a brief description of its sentiment. The description consists of a rating (“fresh” or “rotten”) and a short quotation from the review. Other research (Pang, Lee, & Vaithyanathan 2002) has predicted a movie review’s rating from its text. In this paper, we focus on the quotation, which is a main attraction to site users. A Rotten Tomatoes quotation is typically about one sentence in length and expresses concisely the reviewer’s opinion of the movie. To illustrate, Curtis Edmonds’s review of the documentary Spellbound is encapsulated, “Hitchcock couldn’t have asked for a more suspenseful situation.” A.O. Scott’s review of Once upon a Time in Mexico is encapsulated, “A noisy, unholy mess, with moments of wit and surprise that ultimately make its brutal tedium all the more disappointing.” A reader can infer from these statements whether or not the overall sentiment is favorable, and get an impression about why. Consequently, we refer to them as sentiment summaries.
The LinGO Redwoods initiative is a seed activity in the design and development of a new type of treebank. A treebank is a (typically hand-built) collection of natural language utterances and associated linguistic analyses; typical treebanks, such as the widely recognized Penn Treebank (Marcus, Santorini, & Marcinkiewicz, 1993), the Prague Dependency Treebank (Hajic, 1998), or the German TiGer Corpus (Skut, Krenn, Brants, & Uszkoreit, 1997), assign syntactic phrase structure or tectogrammatical dependency trees over sentences taken from a naturally occurring source, often newspaper text. Applications of existing treebanks fall into two broad categories: (i) use of an annotated corpus in empirical linguistics as a source of structured language data and distributional patterns and (ii) use of the treebank for the acquisition (e.g. using stochastic or machine learning approaches) and evaluation of parsing systems. While several medium- to large-scale treebanks exist for English (and some for other major languages), all pre-existing publicly available resources exhibit the following limitations: (i) the depth of linguistic information recorded in these treebanks is comparatively shallow, (ii) the design and format of linguistic representation in the treebank hard-wires a small, predefined range of ways in which information..
Dictionaries have long been seen as an essential contribution by linguists to work on endangered languages. We report on preliminary investigations of actual dictionary usage and usability by 76 speakers, semi-speakers and learners of Australian Aboriginal languages. The dictionaries include: electronic and printed bilingual Warlpiri-English dictionaries, a printed trilingual Alawa-Kriol-English dictionary, and a printed bilingual Warumungu-English dictionary. We examine competing demands for completeness of coverage and ease of access, and focus on the prospects of electronic dictionaries for solving many traditional problems, based in particular on observations on the usability of a prototype interface developed in our project. The flexibility of computer interfaces can help accommodate different needs including those of speakers with emerging literacy skills, but they are not useful in communities where computer access is generally unavailable.
Linguists have seen creating dictionaries of endangered languages as a key activity in language maintenance and revival work. However, like any approach to language engineering, there are concerns to address. The first is the tension between language documentation and language maintenance. The second is the role of literacy. A lot of effort has been put into vernacular literacy, on the assumption that it assists language maintenance, as well as language documentation. In some respects this is a dubious assumption, because writing a language does not necessarily lead to speaking it or maintaining the language. Moreover, in some cases putting effort into writing the language can detract from efforts to encourage learners to speak the language. It is certain that much more effort should be put into oral language development.
We present a machine learning approach to robust textual inference, in which parses of the text and the hypothesis sentences are used to measure their asymmetric “similarity”, and thereby to decide if the hypothesis can be inferred. This idea is realized in two different ways. In the first, each sentence is represented as a graph (extracted from a dependency parser) in which the nodes are words/phrases, and the links represent dependencies. A learned, asymmetric, graph-matching cost is then computed to measure the similarity between the text and the hypothesis. In the second approach, the text and the hypothesis are parsed into the logical formula-like representation used by Harabagiu et al. (2000). An abductive theorem prover (using learned costs for making different types of assumptions in the proof) is then applied to try to infer the hypothesis from the text, and the total “cost” of proving the hypothesis is used to decide if the hypothesis is entailed.
We initially describe a feature-rich discriminative Conditional Random Field (CRF) model for Information Extraction in the workshop announcements domain, which offers good baseline performance in the PASCAL shared task. We then propose a method for leveraging domain knowledge in Information Extraction tasks, scoring candidate document labellings as one-value-per-field templates according to domain feasibility after generating sample labellings from a trained sequence classifier. Our relational models evaluate these templates according to our intuitions about agreement in the domain: workshop acronyms should resemble their names, and workshop dates should occur after paper submission dates. These methods see a 5% f-score improvement in fields retrieved when sampling labellings from a Maximum-Entropy Markov Model; however, we do not observe improvement over a CRF model. We discuss reasons for this, including the problem of recovering all field instances from a best template, and propose future work in adapting such a model to the CRF, a better standalone system.
Phrase re-ordering is a well-known obstacle to robust machine translation for language pairs with significantly different word orderings. For Arabic-English, two languages that usually differ in the ordering of subject and verb, the subject and its modifiers must be accurately moved to produce a grammatical translation. This operation requires more than base phrase chunking and often defies current phrase-based statistical decoders. We present a conditional random field sequence classifier that detects the full scope of Arabic noun phrase subjects in verb-initial clauses at the 61.3% Fβ=1 level, a 5.0% absolute improvement over a statistical parser baseline. We suggest methods for integrating the classifier output with a statistical decoder and present preliminary machine translation results.
A significant portion of the world’s text is tagged by readers on social bookmarking websites. Credit attribution is an inherent problem in these corpora because most pages have multiple tags, but the tags do not always apply with equal specificity across the whole document. Solving the credit attribution problem requires associating each word in a document with the most appropriate tags and vice versa. This paper introduces Labeled LDA, a topic model that constrains Latent Dirichlet Allocation by defining a one-to-one correspondence between LDA’s latent topics and user tags. This allows Labeled LDA to directly learn word-tag correspondences. We demonstrate Labeled LDA’s improved expressiveness over traditional LDA with visualizations of a corpus of tagged web pages from del.icio.us. Labeled LDA outperforms SVMs by more than 3 to 1 when extracting tag-specific document snippets. As a multi-label text classifier, our model is competitive with a discriminative baseline on a variety of datasets.
How can the development of ideas in a scientific field be studied over time? We apply unsupervised topic modeling to the ACL Anthology to analyze historical trends in the field of Computational Linguistics from 1978 to 2006. We induce topic clusters using Latent Dirichlet Allocation, and examine the strength of each topic over time. Our methods find trends in the field including the rise of probabilistic methods starting in 1988, a steady increase in applications, and a sharp decline of research in semantics and understanding between 1978 and 2001, possibly rising again after 2001. We also introduce a model of the diversity of ideas, topic entropy, using it to show that COLING is a more diverse conference than ACL, but that both conferences as well as EMNLP are becoming broader over time. Finally, we apply Jensen-Shannon divergence of topic distributions to show that all three conferences are converging in the topics they cover.
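The two measures named in this abstract, topic entropy and Jensen-Shannon divergence, can be illustrated with a minimal sketch. It assumes each venue is summarized by a single discrete distribution over topics; the function names and the example numbers are hypothetical and are not taken from the paper.

```python
import math

def entropy(p):
    """Shannon entropy, in bits, of a discrete distribution given as a list."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence, in bits, between two distributions
    over the same topics: H(m) - (H(p) + H(q)) / 2, with m the average."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(m) - (entropy(p) + entropy(q)) / 2

# Hypothetical topic mixtures for two venues (illustrative numbers only).
venue_a = [0.7, 0.2, 0.1]
venue_b = [0.4, 0.3, 0.3]

print("entropy A:", entropy(venue_a))
print("entropy B:", entropy(venue_b))
print("JS divergence:", js_divergence(venue_a, venue_b))
```

Under this reading, a higher entropy means a venue spreads its papers over more topics, and a falling divergence between two venues over time would indicate convergence.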
We describe an approach to textual inference that improves alignments at both the typed dependency level and at a deeper semantic level. We present a machine learning approach to alignment scoring, a stochastic search procedure, and a new tool that finds deeper semantic alignments, allowing rapid development of semantic features over the aligned graphs. Further, we describe a complementary semantic component based on natural logic, which shows an added gain of 3.13% accuracy on the RTE3 test set.
While O(n³) methods for parsing probabilistic context-free grammars (PCFGs) are well known, a tabular parsing framework for arbitrary PCFGs which allows for bottom-up, top-down, and other parsing strategies has not yet been provided. This paper presents such an algorithm, and shows its correctness and advantages over prior work. The paper finishes by bringing out the connections between the algorithm and work on hypergraphs, which permits us to extend the presented Viterbi (best parse) algorithm to an inside (total probability) algorithm.
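For orientation, the kind of well-known cubic-time method this abstract takes as its starting point can be sketched as a Viterbi CKY parser. This is the textbook bottom-up algorithm for grammars in Chomsky normal form, not the paper's more general tabular framework; the grammar encoding and the toy grammar are assumptions made for illustration.

```python
def viterbi_cky(words, lexicon, rules, start="S"):
    """Probability of the best parse of `words` under a PCFG in Chomsky
    normal form (0.0 if there is no parse).

    lexicon: {(tag, word): prob}          for rules tag -> word
    rules:   {(parent, left, right): prob} for binary rules
    """
    n = len(words)
    if n == 0:
        return 0.0
    best = {}  # best[i, j][X] = max probability that X spans words[i:j]
    for i, w in enumerate(words):
        cell = {}
        for (tag, word), p in lexicon.items():
            if word == w:
                cell[tag] = max(cell.get(tag, 0.0), p)
        best[i, i + 1] = cell
    for width in range(2, n + 1):            # span length
        for i in range(n - width + 1):       # span start
            j = i + width
            cell = {}
            for k in range(i + 1, j):        # split point
                left, right = best[i, k], best[k, j]
                for (parent, l, r), p in rules.items():
                    if l in left and r in right:
                        score = p * left[l] * right[r]
                        if score > cell.get(parent, 0.0):
                            cell[parent] = score
            best[i, j] = cell
    return best[0, n].get(start, 0.0)

# Hypothetical toy grammar.
lexicon = {("NP", "dogs"): 0.5, ("NP", "cats"): 0.5, ("V", "chase"): 1.0}
rules = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
print(viterbi_cky(["dogs", "chase", "cats"], lexicon, rules))
```

The three nested loops over span length, start position, and split point give the cubic dependence on sentence length. Replacing `max` with a sum over derivations would turn this Viterbi computation into an inside computation, which is the extension the abstract mentions.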
We present a generative distributional model for the unsupervised induction of natural language syntax which explicitly models constituent yields and contexts. Parameter search with EM produces higher quality analyses than previously exhibited by unsupervised systems, giving the best published unsupervised parsing results on the ATIS corpus. Experiments on Penn treebank sentences of comparable length show an even higher F1 of 71% on nontrivial brackets. We compare distributionally induced and actual part-of-speech tags as input data, and examine extensions to the basic model. We discuss errors made by the system, compare the system to previous models, and discuss upper bounds, lower bounds, and stability for this task.
A* PCFG parsing can dramatically reduce the time required to find the exact Viterbi parse by conservatively estimating outside Viterbi probabilities. We discuss various estimates and give efficient algorithms for computing them. On Penn treebank sentences, our most detailed estimate reduces the total number of edges processed to less than 3% of that required by exhaustive parsing, and even a simpler estimate which can be pre-computed in under a minute still reduces the work by a factor of 5. The algorithm extends the classic A* graph search procedure to a certain hypergraph associated with parsing. Unlike best-first and finite-beam methods for achieving this kind of speed-up, the A* parser is guaranteed to return the most likely parse, not just an approximation. The algorithm is also correct for a wide range of parser control strategies and maintains a worst-case cubic time bound.
We demonstrate that an unlexicalized PCFG can parse much more accurately than previously shown, by making use of simple, linguistically motivated state splits, which break down false independence assumptions latent in a vanilla treebank grammar. Indeed, its performance of 86.36% (LP/LR F1) is better than that of early lexicalized PCFG models, and surprisingly close to the current state-of-the-art. This result has potential uses beyond establishing a strong lower bound on the maximum possible accuracy of unlexicalized models: an unlexicalized PCFG is much more compact, easier to replicate, and easier to interpret than more complex lexical models, and the parsing algorithms are simpler, more widely understood, of lower asymptotic complexity, and easier to optimize.
This paper separates conditional parameter estimation, which consistently raises test set accuracy on statistical NLP tasks, from conditional model structures, such as the conditional Markov model used for maximum-entropy tagging, which tend to lower accuracy. Error analysis on part-of-speech tagging shows that the actual tagging errors made by the conditionally structured model derive not only from label bias, but also from other ways in which the independence assumptions of the conditional model structure are unsuited to linguistic sequences. The paper presents new word-sense disambiguation and POS tagging experiments, and integrates apparently conflicting reports from other recent work.
Unsupervised grammar induction systems commonly judge potential constituents on the basis of their effects on the likelihood of the data. Linguistic justifications of constituency, on the other hand, rely on notions such as substitutability and varying external contexts. We describe two systems for distributional grammar induction which operate on such principles, using part-of-speech tags as the contextual features. The advantages and disadvantages of these systems are examined, including precision/recall trade-offs, error analysis, and extensibility.
We present a novel generative model for natural language tree structures in which semantic (lexical dependency) and syntactic (PCFG) structures are scored with separate models. This factorization provides conceptual simplicity, straightforward opportunities for separately improving the component models, and a level of performance comparable to similar, non-factored models. Most importantly, unlike other modern parsing models, the factored model admits an extremely effective A* parsing algorithm, which enables efficient, exact inference.
We present an improved method for clustering in the presence of very limited supervisory information, given as pairwise instance constraints. By allowing instance-level constraints to have space-level inductive implications, we are able to successfully incorporate constraints for a wide range of data set types. Our method greatly improves on the previously studied constrained K-means algorithm, generally requiring less than half as many constraints to achieve a given accuracy on a range of real-world data, while also being more robust when over-constrained. We additionally discuss an active learning algorithm which increases the value of constraints even further.
...agglomerative clustering. First, we show formally that the common heuristic agglomerative clustering algorithms (Ward’s method, single-link, complete-link, and a variant of group-average) are each equivalent to a hierarchical model-based method. This interpretation gives a theoretical explanation of the empirical behavior of these algorithms, as well as a principled approach to resolving practical issues, such as number of clusters or the choice of method. Second, we show how a model-based viewpoint can suggest variations on these basic agglomerative algorithms. We introduce adjusted complete-link, Mahalanobis-link, and line-link as variants, and demonstrate their utility.
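As a reminder of what one of the heuristic algorithms named here does, the following is a minimal sketch of naive single-link agglomerative clustering. It assumes one-dimensional points and absolute difference as the distance, and it does not implement the model-based interpretation or the new variants the abstract introduces; the function name is hypothetical.

```python
def single_link(points, k):
    """Merge clusters of 1-D points until only k remain, always joining
    the two clusters whose closest members are nearest (single-link)."""
    k = max(k, 1)
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None  # (distance, index_a, index_b) of the closest pair
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(abs(x - y) for x in clusters[a] for y in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # b > a, so removing cluster b does not shift the position of a.
        clusters[a].extend(clusters.pop(b))
    return clusters

print(single_link([1.0, 1.2, 5.0, 5.1, 9.0], 3))
```

Complete-link would differ only in using `max` instead of `min` for the inter-cluster distance, which is one way to see these methods as a family.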
This paper presents a novel approach to the unsupervised learning of syntactic analyses of natural language text. Most previous work has focused on maximizing likelihood according to generative PCFG models. In contrast, we employ a simpler probabilistic model over trees based directly on constituent identity and linear context, and use an EM-like iterative procedure to induce structure. This method produces much higher quality analyses, giving the best published results on the ATIS dataset.
While symbolic parsers can be viewed as deduction systems, this view is less natural for probabilistic parsers. We present a view of parsing as directed hypergraph analysis which naturally covers both symbolic and probabilistic parsing. We illustrate the approach by showing how a dynamic extension of Dijkstra’s algorithm can be used to construct a probabilistic chart parser with an O(n³) time bound for arbitrary PCFGs, while preserving as much of the flexibility of symbolic chart parsers as allowed by the inherent ordering of probabilistic dependencies.
This paper presents empirical studies and closely corresponding theoretical models of the performance of a chart parser exhaustively parsing the Penn Treebank with the Treebank’s own CFG grammar. We show how performance is dramatically affected by rule representation and tree transformations, but little by top-down vs. bottom-up strategies. We discuss grammatical saturation, including analysis of the strongly connected components of the phrasal nonterminals in the Treebank, and model how, as sentence length increases, the effective grammar rule size increases as regions of the grammar are unlocked, yielding super-cubic observed time behavior in some configurations.
This paper discusses ensembles of simple but heterogeneous classifiers for word-sense disambiguation, examining the Stanford-CS224N system entered in the SENSEVAL-2 English lexical sample task. First-order classifiers are combined by a second-order classifier, which variously uses majority voting, weighted voting, or a maximum entropy model. While individual first-order classifiers perform comparably to middle-scoring teams’ systems, the combination achieves high performance. We discuss trade-offs and empirical performance. Finally, we present an analysis of the combination, examining how ensemble performance depends on error independence and task difficulty.
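Two of the combination schemes mentioned, majority voting and weighted voting, can be sketched in a few lines. The sense labels and the weights below are hypothetical, and how the actual system derived its weights is not stated in the abstract; the maximum entropy combiner is not shown.

```python
from collections import Counter, defaultdict

def majority_vote(predictions):
    """Return the label predicted by the most first-order classifiers."""
    return Counter(predictions).most_common(1)[0][0]

def weighted_vote(predictions, weights):
    """Return the label with the largest total weight, where weights[i]
    is the (assumed) reliability of the classifier that made predictions[i]."""
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)

# Hypothetical outputs of three first-order classifiers for one instance.
votes = ["bank/river", "bank/money", "bank/money"]
print(majority_vote(votes))                   # two of three agree
print(weighted_vote(votes, [0.9, 0.3, 0.3]))  # one strong vote outweighs two weak ones
```

The example shows why the two schemes can disagree: a single highly weighted classifier can override a majority of weaker ones.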
We present a Chinese word segmentation system submitted to the closed track of Sighan bakeoff 2005. Our segmenter was built using a conditional random field sequence model that provides a framework to use a large number of linguistic features such as character identity, morphological and character reduplication features. Because our morphological features were extracted from the training corpora automatically, our system was not biased toward any particular variety of Mandarin. Thus, our system does not overfit the variety of Mandarin most familiar to the system's designers. Our final system achieved an F-score of..
We present a maximum-entropy based system for identifying Named Entities (NEs) in biomedical abstracts and present its performance in the only two biomedical Named Entity Recognition (NER) comparative evaluations that have been held to date, namely BioCreative and Coling BioNLP. Our system obtained an exact match f-score of 83.2% in the BioCreative evaluation and 70.1% in the BioNLP evaluation. We discuss our system in detail including its rich use of local features, attention to correct boundary identification, innovative use of external knowledge resources including parsing and web searches, and rapid adaptation to new NE sets. We also discuss in depth problems with data annotation in the evaluations which caused the final performance to be lower than the optimal.
We describe a machine learning system for the recognition of names in biomedical texts. The system makes extensive use of local and syntactic features within the text, as well as external resources including the web and gazetteers. It achieves an F-score of 70% on the Coling 2004 NLPBA/BioNLP shared task of identifying five biomedical named entities in the GENIA corpus.
Most current statistical natural language processing models use only local features so as to permit dynamic programming in inference, but this makes them unable to fully account for the long distance structure that is prevalent in language use. We show how to solve this dilemma with Gibbs sampling, a simple Monte Carlo method used to perform approximate inference in factored probabilistic models. By using simulated annealing in place of Viterbi decoding in sequence models such as HMMs, CMMs, and CRFs, it is possible to incorporate non-local structure while preserving tractable inference. We use this technique to augment an existing CRF-based information extraction system with long-distance dependency models, enforcing label consistency and extraction template consistency constraints. This technique results in an error reduction of up to 9% over state-of-the-art systems on two established information extraction tasks.
Part-of-speech tagging, like any supervised statistical NLP task, is more difficult when test sets are very different from training sets, for example when tagging across genres or language varieties. We examined the problem of POS tagging of different varieties of Mandarin Chinese (PRC-Mainland, PRC-Hong Kong, and Taiwan). An analytic study first showed that unknown words were a major source of difficulty in cross-variety tagging. Unknown words in English tend to be proper nouns. By contrast, we found that Mandarin unknown words were mostly common nouns and verbs. We showed these results are caused by the high frequency of morphological compounding in Mandarin; in this sense Mandarin is more like German than English. Based on this analysis, we propose a variety of new morphological unknown-word features for POS tagging, extending earlier work by others on unknown-word tagging in English and German. Our features were implemented in a maximum entropy Markov model. Our system achieves state-of-the-art performance in Mandarin tagging, including improving unknown-word tagging performance on unseen varieties in Chinese Treebank 5.0 from 61% to 80% correct.
The ontology of LFG. We need to get straight what is out there in the world and what our model objects are, what are denotations and what are descriptions that get interpreted. The title of Bresnan (1982a), The Mental Representation of Grammatical Relations, seems more likely to confuse us than help us. But in the introduction, there are some fairly clear statements of how their model of human use of language is to be constructed. Kaplan & Bresnan (1982, p. 173) adopt a Competence Hypothesis which postulates some form of grammar inside the mind of a human being: We assume that an explanatory model of human language performance will incorporate a theoretically justified representation of the native speaker’s linguistic knowledge (a grammar). Bresnan & Kaplan (1982, p. xxxi) explain how their model relates to this hypothesized grammar.
Michel Galley, Pi-Chuan Chang, Daniel Cer, Jenny R. Finkel, and Christopher D. Manning Computer Science and Linguistics Departments Stanford University..
Pi-Chuan Chang, Michel Galley, and Christopher D. Manning Computer Science Department, Stanford University Stanford, CA 94305 pichuan,galley,manning@cs.stanford.edu..
We show a number of improvements in the use of Hidden Conditional Random Fields (HCRFs) for phone classification on the TIMIT and Switchboard corpora. We first show that the use of regularization effectively prevents overfitting, improving over other methods such as early stopping. We then show that HCRFs are able to make use of non-independent features in phone classification, at least with small numbers of mixture components, while HMMs degrade due to their strong independence assumptions. Finally, we successfully apply Maximum a Posteriori adaptation to HCRFs, decreasing the phone classification error rate in the Switchboard corpus by around 1% to 5% given only small amounts of adaptation data.
We propose a general model for joint inference in correlated natural language processing tasks when fully annotated training data is not available, and apply this model to the dual tasks of word sense disambiguation and verb subcategorization frame determination. The model uses the EM algorithm to simultaneously complete partially annotated training sets and learn a generative probabilistic model over multiple annotations. When applied to the word sense and verb subcategorization frame determination tasks, the model learns sharp joint probability distributions which correspond to linguistic intuitions about the correlations of the variables. Use of the joint model leads to error reductions over competitive independent models on these tasks.
Most theories of binding in most syntactic frameworks assume that the same notion of surface obliqueness that identifies the subject of a clause is also used for obliqueness conditions on reflexive binding. For instance, in GB (Chomsky), binding theory is standardly defined on S-structure, so that in Nancy can bind herself due to the c-commanding configuration that also makes Nancy the subject of the sentence..
This paper presents a new method for producing a dictionary of subcategorization frames from unlabelled text corpora. It is shown that statistical filtering of the results of a finite state parser running on the output of a stochastic tagger produces high quality results, despite the error rates of the tagger and the parser. Further, it is argued that this method can be used to learn all subcategorization frames, whereas previous methods are not extensible to a general solution to the problem.
This paper shows that a simple two-stage approach to handle non-local dependencies in Named Entity Recognition (NER) can outperform existing approaches that handle non-local dependencies, while being much more computationally efficient. NER systems typically use sequence models for tractable inference, but this makes them unable to capture the long distance structure present in text. We use a Conbel.
1 Boolean retrieval
2 The term vocabulary and postings lists
3 Dictionaries and tolerant retrieval
4 Index construction
5 Index compression
6 Scoring, term weighting and the vector space model
7 Computing scores in a complete search system
8 Evaluation in information retrieval
9 Relevance feedback and query expansion
10 XML retrieval
11 Probabilistic information retrieval
12 Language models for information retrieval
13 Text classification and Naive Bayes
14 Vector space classification
15 Support vector machines and machine learning on documents
16 Flat clustering
17 Hierarchical clustering
18 Matrix decompositions and latent semantic indexing
19 Web search basics
20 Web crawling and indexes
21 Link analysis..
The alignment problem, establishing links between corresponding phrases in two related sentences, is as important in natural language inference (NLI) as it is in machine translation (MT). But the tools and techniques of MT alignment do not readily transfer to NLI, where one cannot assume semantic equivalence, and for which large volumes of bitext are lacking. We present a new NLI aligner, the MANLI system, designed to address these challenges. It uses a phrase-based alignment representation, exploits external lexical resources, and capitalizes on a new set of supervised training data. We compare the performance of MANLI to existing NLI and MT aligners on an NLI alignment task over the well-known Recognizing Textual Entailment data. We show that MANLI significantly outperforms existing aligners, achieving gains of 6.2% in F1 over a representative NLI aligner and 10.5% over GIZA++.
adjacent phrases, but they typically lack the ability to perform the kind of long-distance reorderings possible with syntax-based systems. In this paper, we present a novel hierarchical phrase reordering model aimed at improving non-local reorderings, which seamlessly integrates with a standard phrase-based system with little loss of computational efficiency. We show that this model can successfully handle the key examples often used to motivate syntax-based systems, such as the rotation of a prepositional phrase around a noun phrase. We contrast our model with reordering models commonly used in phrase-based systems, and show that our approach provides statistically significant BLEU point gains for two language pairs: Chinese-English (+0.53 on MT05 and +0.71 on MT08) and Arabic-English (+0.55 on MT05).
The correct locus (or loci) of binding theory has been a matter of much discussion. Theories can be seen as varying along at least two dimensions. The first is whether binding theory is configurationally determined (that is, the theory exploits the geometry of a phrase marker, appealing to such purely structural notions as c-command and government) or whether the theory depends rather on examining the relations between items selected by a predicate (where by selection I am intending to cover everything from semantic dependencies to syntactic subcategorization). The second is the level of grammar on which binding is defined. Attempting to roughly equate levels across different theories, suggestions have included the semantics/lexical conceptual structure (Jackendoff 1992), thematic structure (Jackendoff 1972, Wilkins 1988), argument structure/D-structure/initial grammatical relations (Manning 1994, Belletti and Rizzi 1988, Perlmutter 1984), surface syntax/grammatical relations, logical form, linear order, pragmatics (Levinson 1991), and discourse (Iida 1992). The data is sufficiently varied and complex that many theories end up as mixtures, variously employing a combination of elements along both dimensions (for instance, Chomsky (1986) relies purely on configurational notions for the relationship between an anaphor and its antecedent, but uses concepts from selection in the definition of the binding domain of an anaphor). LFG has always rejected a configurational account of binding. For instance, Simpson (1991) argues that a configurational theory of binding in Warlpiri cannot be maintained, among other reasons because finite clauses lack a VP.
We present a novel algorithm for the fast computation of PageRank, a hyperlink-based estimate of the “importance” of Web pages. The original PageRank algorithm uses the Power Method to compute successive iterates that converge to the principal eigenvector of the Markov matrix representing the Web link graph. The algorithm presented here, called Quadratic Extrapolation, accelerates the convergence of the Power Method by periodically subtracting off estimates of the nonprincipal eigenvectors from the current iterate of the Power Method. In Quadratic Extrapolation, we take advantage of the fact that the first eigenvalue of a Markov matrix is known to be 1 to compute the nonprincipal eigenvectors using successive iterates of the Power Method. Empirically, we show that using Quadratic Extrapolation speeds up PageRank computation by 50-300% on a Web graph of 80 million nodes, with minimal overhead.
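As a rough illustration of the baseline that this abstract builds on, the following is a minimal sketch of the plain Power Method for PageRank. It does not implement Quadratic Extrapolation. The damping factor of 0.85, the uniform treatment of dangling pages, the function name, and the toy graph are all assumptions made for the example, not details taken from the paper.

```python
def pagerank_power_method(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Plain Power Method on a small link graph.

    links maps every page to the list of pages it links to; every link
    target must itself be a key. damping=0.85 is an assumed default.
    """
    nodes = sorted(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Teleportation mass shared equally by all pages.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            # A page without out-links spreads its rank uniformly (assumption).
            targets = links[v] or nodes
            share = damping * rank[v] / len(targets)
            for t in targets:
                new_rank[t] += share
        # L1 distance between successive iterates is the stopping criterion.
        change = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank_power_method(toy_web)
```

Each pass through the loop is one multiplication by the Markov matrix; extrapolation schemes of the kind described above aim to reach the same fixed point in fewer such passes.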
Automatically clustering web pages into semantic groups promises improved search and browsing on the web. In this paper, we demonstrate how user-generated tags from large-scale social bookmarking websites such as del.icio.us can be used as a complementary data source to page text and anchor text for improving automatic clustering of web pages. This paper explores the use of tags in 1) K-means clustering in an extended vector space model that includes tags as well as page text and 2) a novel generative clustering algorithm based on latent Dirichlet allocation that jointly models text and tags. We evaluate the models by comparing their output to an established web directory. We find that the naive inclusion of tagging data improves cluster quality versus page text alone, but a more principled inclusion can substantially improve the quality of all models with a statistically significant absolute F-score increase of 4%. The generative model outperforms K-means with another 8% F-score increase.
Linking constructions involving 的 (DE) are ubiquitous in Chinese, and can be translated into English in many different ways. This is a major source of machine translation error, even when syntax-sensitive translation models are used. This paper explores how getting more information about the syntactic, semantic, and discourse context of uses of 的 (DE) can facilitate producing an appropriate English translation strategy. We describe a finer-grained classification of 的 (DE) constructions in Chinese NPs, construct a corpus of annotated examples, and then train a log-linear classifier, which contains linguistically inspired features. We use the DE classifier to preprocess MT data by explicitly labeling 的 (DE) constructions, as well as reordering phrases, and show that our approach provides significant BLEU point gains on MT02 (+1.24), MT03 (+0.88) and MT05 (+1.49) on a phrase-based system. The improvement persists when a hierarchical reordering model is applied.
Discriminative feature-based methods are widely used in natural language processing, but sentence parsing is still dominated by generative methods. While prior feature-based dynamic programming parsers have restricted training and evaluation to artificially short sentences, we present the first general, feature-rich discriminative parser, based on a conditional random field model, which has been successfully scaled to the full WSJ parsing data. Our efficiency is primarily due to the use of stochastic optimization techniques, as well as parallelization and chart prefiltering. On WSJ15, we attain a state-of-the-art F-score of 90.9%, a 14% relative reduction in error over previous models, while being two orders of magnitude faster. On sentences of length 40, our system achieves an F-score of 89.0%, a 36% relative reduction in error over a generative baseline.
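The "relative reduction in error" figures in this abstract can be unpacked with a little arithmetic. The sketch below assumes that error means 100 minus the F-score, which the abstract does not state explicitly; the function name is hypothetical, and the baseline scores it returns are implied by that assumption rather than reported in the source.

```python
def implied_baseline_f(new_f, relative_error_reduction):
    """Back out the baseline F-score implied by a relative error reduction.

    Assumes (hypothetically) that error is measured as 100 - F.
    """
    new_error = 100.0 - new_f
    # new_error = baseline_error * (1 - reduction), solved for baseline_error.
    baseline_error = new_error / (1.0 - relative_error_reduction)
    return 100.0 - baseline_error


# Figures quoted in the abstract: F-score and relative error reduction.
wsj15_baseline = implied_baseline_f(90.9, 0.14)
wsj40_baseline = implied_baseline_f(89.0, 0.36)
```

The point of the exercise is that a relative reduction is measured against the remaining error, so the same percentage corresponds to different absolute F-score gains depending on how high the baseline already is.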
A desirable quality of a coreference resolution system is the ability to handle transitivity constraints, such that even if it places high likelihood on a particular mention being coreferent with each of two other mentions, it will also consider the likelihood of those two mentions being coreferent when making a final assignment. This is exactly the kind of constraint that integer linear programming (ILP) is ideal for, but, surprisingly, previous work applying ILP to coreference resolution has not encoded this type of constraint. We train a coreference classifier over pairs of mentions, and show how to encode this type of constraint on top of the probabilities output from our pairwise classifier to extract the most probable legal entity assignments. We present results on two commonly used datasets which show that enforcement of transitive closure consistently improves performance, including improvements of up to 3.6% using the B³ scorer, and up to 16.5% using cluster f-measure.
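To make the transitivity constraint concrete, here is a minimal sketch that replaces the ILP solver with exhaustive search over link assignments, which is only feasible for a handful of mentions. The pairwise probabilities are made up for illustration, and the function name is hypothetical; the objective (sum of log-probabilities of the chosen link/no-link decisions) is one plausible reading of "most probable legal assignment", not necessarily the paper's exact formulation.

```python
from itertools import combinations, permutations, product
from math import log


def best_transitive_links(n_mentions, pair_prob):
    """Return the most probable set of coreference links that is transitively closed.

    pair_prob maps each pair (i, j) with i < j to a hypothetical pairwise
    classifier probability, strictly between 0 and 1, that i and j corefer.
    """
    pairs = list(combinations(range(n_mentions), 2))
    best_score, best_links = float("-inf"), set()
    for values in product((0, 1), repeat=len(pairs)):
        link = dict(zip(pairs, values))

        def linked(a, b):
            return link[(min(a, b), max(a, b))]

        # Transitivity: i~j and j~k together force i~k.
        if any(linked(i, j) and linked(j, k) and not linked(i, k)
               for i, j, k in permutations(range(n_mentions), 3)):
            continue
        # Log-probability of this joint assignment under independent pairs.
        score = sum(log(pair_prob[p]) if link[p] else log(1.0 - pair_prob[p])
                    for p in pairs)
        if score > best_score:
            best_score = score
            best_links = {p for p in pairs if link[p]}
    return best_links


probs = {(0, 1): 0.9, (0, 2): 0.8, (1, 2): 0.1}
links = best_transitive_links(3, probs)
```

On this toy input, thresholding each pair independently at 0.5 would link mentions 0-1 and 0-2 but not 1-2, which is not a legal clustering; the constrained search keeps only the 0-1 link.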
We propose an approach to natural language inference based on a model of natural logic, which identifies valid inferences by their lexical and syntactic features, without full semantic interpretation. We greatly extend past work in natural logic, which has focused solely on semantic containment and monotonicity, to incorporate both semantic exclusion and implicativity. Our system decomposes an inference problem into a sequence of atomic edits linking premise to hypothesis; predicts a lexical entailment relation for each edit using a statistical classifier; propagates these relations upward through a syntax tree according to semantic properties of intermediate nodes; and composes the resulting entailment relations across the edit sequence. We evaluate our system on the FraCaS test suite, and achieve a 27% reduction in error from previous work. We also show that hybridizing an existing RTE system with our natural logic system yields significant gains on the RTE3 test suite.
This paper presents the first use of a computational model of natural logic (a system of logical inference which operates over natural language) for textual inference. Most current approaches to the PASCAL RTE textual inference task achieve robustness by sacrificing semantic precision; while broadly effective, they are easily confounded by ubiquitous inferences involving monotonicity. At the other extreme, systems which rely on first-order logic and theorem proving are precise, but excessively brittle. This work aims at a middle way. Our system finds a low-cost edit sequence which transforms the premise into the hypothesis; learns to classify entailment relations across atomic edits; and composes atomic entailments into a top-level entailment judgment. We provide the first reported results for any system on the FraCaS test suite. We also evaluate on RTE3 data, and show that hybridizing an existing RTE system with our natural logic system yields significant performance gains.
Many named entities contain other named entities inside them. Despite this fact, the field of named entity recognition has almost entirely ignored nested named entity recognition, though for technological rather than ideological reasons. In this paper, we present a new technique for recognizing nested named entities, by using a discriminative constituency parser. To train the model, we transform each sentence into a tree, with constituents for each named entity (and no other syntactic structure). We present results on both newspaper and biomedical corpora which contain nested named entities. In three out of four sets of experiments, our model outperforms a standard semi-CRF on the more traditional top-level entities. At the same time, we improve the overall F-score by up to 30% over the flat model, which is unable to recover any nested entities.
I examine what would be necessary to move part-of-speech tagging performance from its current level of about 97.3% token accuracy (56% sentence accuracy) to close to 100% accuracy. I suggest that it must still be possible to greatly increase tagging performance and examine some useful improvements that have recently been made to the Stanford Part-of-Speech Tagger. However, an error analysis of some of the remaining errors suggests that there is limited further mileage to be had either from better machine learning or better features in a discriminative sequence classifier. The prospects for further gains from semisupervised learning also seem quite limited. Rather, I suggest and begin to demonstrate that the largest opportunity for further progress comes from improving the taxonomic basis of the linguistic resources from which taggers are trained. That is, from improved descriptive linguistics. However, I conclude by suggesting that there are also limits to this process. The status of some words may not be able to be adequately captured by assigning them to one of a small number of categories. While conventions can be used in such cases to improve tagging consistency, they lack a strong linguistic basis.
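The gap between the token accuracy and sentence accuracy quoted in this abstract can be related by a back-of-the-envelope calculation. The sketch below assumes that tagging errors are independent across tokens and that all sentences have the same length; both are simplifying assumptions introduced here, not claims made in the abstract, and the function name is hypothetical.

```python
from math import log


def implied_sentence_length(token_accuracy, sentence_accuracy):
    """Sentence length n for which token_accuracy ** n equals sentence_accuracy.

    Assumes independent per-token errors and uniform sentence length.
    """
    return log(sentence_accuracy) / log(token_accuracy)


# Accuracies quoted in the abstract: 97.3% per token, 56% per sentence.
n_tokens = implied_sentence_length(0.973, 0.56)
```

Under these assumptions the two figures are consistent with sentences of a little over twenty tokens, which shows how a small per-token error rate compounds into a large share of sentences containing at least one mistake.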
Abeillé and Godard (1994) seek to show that the rightward branching analysis of French tense auxiliaries shown in (1b), that I argued for in Manning (1992) and which is widely adopted in general, is wrong, and that rather we should adopt a flat analysis for this construction as shown in (1c), and they show how such an analysis can be realized within HPSG (Pollard and Sag 1994).
We present a system for deciding whether a given sentence can be inferred from text. Each sentence is represented as a directed graph (extracted from a dependency parser) in which the nodes represent words or phrases, and the links represent syntactic and semantic relationships. We develop a learned graph matching model to approximate entailment by the amount of the sentence’s semantic content which is contained in the text. We present results on the Recognizing Textual Entailment dataset (Dagan et al., 2005), and show that our approach outperforms Bag-of-Words and TF-IDF models.
The same categorical phenomena which are attributed to hard grammatical constraints in some languages continue to show up as statistical preferences in other languages, motivating a grammatical model that can account for soft constraints. The effects of a hierarchy of person (1st, 2nd, 3rd) on grammar are categorical in some languages, most famously in languages with inverse systems, but also in languages with person restrictions on passivization. In Lummi, for example, the person of the subject argument cannot be lower than the person of a nonsubject argument. If this would happen in the active, passivization is obligatory; if it would happen in the passive, the active is obligatory (Jelinek and Demers 1983). These facts follow from the theory of harmonic alignment in OT: constraints favoring the harmonic association of prominent person (1st, 2nd) with prominent syntactic function (subject) are hypothesized to be present as subhierarchies of the grammars of all languages, but to vary in their effects across languages depending on their interactions with other constraints (Aissen 1999). There is a statistical reflection of these hierarchies in English. The same disharmonic person/argument associations which are avoided categorically in languages like Lummi by making passives either impossible or obligatory, are avoided in the SWITCHBOARD corpus of spoken English by either depressing or elevating the frequency of passives relative to actives. The English data can be grammatically analyzed within the stochastic OT framework (Boersma 1998, Boersma and Hayes 2001) in a way which provides a principled and unifying explanation for their relation to the crosslinguistic categorical person effects studied by Aissen (1999).
Technology for local textual inference is central to producing a next generation of intelligent yet robust human language processing systems. One can think of it as Information Retrieval++. It is needed for a search on “male fertility may be affected by use of cell phones” to match a document saying “Startling new research into mobile phones suggests they can reduce a man’s sperm count up to 30%”, despite the fact that the only word overlap is “phones”. But textual inference is useful more broadly. It is an enabling technology for applications of document interpretation, such as customer response management, where one would like to conclude from the message “My Squeezebox regularly skips during music playback” that “Sender has set up Squeezebox” and “Sender can hear music through Squeezebox”, and information extraction, where from the text “Jorma Ollila joined Nokia in 1985 and held a variety of key management positions before taking the helm in 1992”, one wants to extract that Jorma Ollila has served as the CEO of Nokia, a relation that might be more formally denoted as role(CEO, Nokia, Jorma Ollila). Textual inference is a difficult problem (as the results from early evaluations have shown): current systems do statistically better than random guessing, but not by very much. Nevertheless, it is also an area where there is promising developing technology and a good deal of natural language community interest. In other words, it is an ideal research problem. To further this research agenda, data sets have been constructed to assess textual inference systems. This paper examines how the task of textual inference has been and should be defined and discusses what kind of evaluation data is appropriate for the task.
This paper examines the Stanford typed dependencies representation, which was designed to provide a straightforward description of grammatical relations for any user who could benefit from automatic text understanding. For such purposes, we argue that dependency schemes must follow a simple design and provide semantically contentful information, as well as offer an automatic procedure to extract the relations. We consider the underlying design principles of the Stanford scheme from this perspective, and compare it to the GR and PARC representations. Finally, we address the question of the suitability of the Stanford scheme for parser evaluation.
Many factors are thought to increase the chances of misrecognizing a word in ASR, including low frequency, nearby disfluencies, short duration, and being at the start of a turn. However, few of these factors have been formally examined. This paper analyzes a variety of lexical, prosodic, and disfluency factors to determine which are likely to increase ASR error rates. Findings include the following. (1) For disfluencies, effects depend on the type of disfluency: errors increase by up to 15% (absolute) for words near fragments, but decrease by up to 7.2% (absolute) for words near repetitions. This decrease seems to be due to longer word duration. (2) For prosodic features, there are more errors for words with extreme values than words with typical values. (3) Although our results are based on output from a system with speaker adaptation, speaker differences are a major factor influencing error rates, and the effects of features such as frequency, pitch, and intensity may vary between speakers.
This paper proposes a new architecture for textual inference in which finding a good alignment is separated from evaluating entailment. Current approaches to semantic inference in question answering and textual entailment have approximated the entailment problem as that of computing the best alignment of the hypothesis to the text, using a locally decomposable matching score. While this formulation is adequate for representing local (word-level) phenomena such as synonymy, it is incapable of representing global interactions, such as that between verb negation and the addition/removal of qualifiers, which are often critical for determining entailment. We propose a pipelined approach where alignment is followed by a classification step, in which we extract features representing high-level characteristics of the entailment problem, and give the resulting feature vector to a statistical classifier trained on development data.
Grammatical theory has long wrestled with the fact that causative constructions exhibit properties of both single words and complex phrases. However, as Paul Kiparsky has observed, the distribution of such properties of causatives is not arbitrary: ‘construal’ phenomena such as honorification, anaphor and pronominal binding, and quantifier ‘floating’ typically behave as they would if causatives were syntactically complex, embedding constructions; whereas case marking, agreement and word order phenomena all point to the analysis of causatives as single lexical items. Although an analysis of causatives in terms of complex syntactic structures has frequently been adopted in an attempt to simplify the mapping to semantic structure, we believe that motivating syntactic structure based on perceived semantics is questionable because in general a syntax/semantics homomorphism cannot be maintained without vitiating syntactic theory (Miller 1991). Instead, we sketch a strictly lexical theory of Japanese causatives that deals with the evidence offered for a complex phrasal analysis. Such an analysis makes the phonology, morphology and syntax parallel, while a mismatch occurs with the semantics. The conclusions we will reach are given in (1).
This paper examines feature selection for log linear models over rich constraint-based grammar (HPSG) representations by building decision trees over features in corresponding probabilistic context free grammars (PCFGs). We show that single decision trees do not make optimal use of the available information; constructed ensembles of decision trees based on different feature subspaces show significant performance gains (14% parse selection error reduction). We compare the performance of the learned PCFG grammars and log linear models over the same features.
Many NLP tasks rely on accurately estimating word dependency probabilities P(w1|w2), where the words w1 and w2 have a particular relationship (such as verb-object). Because of the sparseness of counts of such dependencies, smoothing and the ability to use multiple sources of knowledge are important challenges. For example, if the probability P(N|V) of noun N being the subject of verb V is high, and V takes similar objects to V′, and V′ is synonymous to V″, then we want to conclude that P(N|V″) should also be reasonably high, even when those words did not cooccur in the training data.
This report details experimental results of using stochastic disambiguation models for parsing sentences from the Redwoods treebank (Oepen et al., 2002). The goals of this paper are two-fold: (i) to report accuracy results on the more highly ambiguous latest version of the treebank, as compared to already published results achieved by the same stochastic models on a previous version of the corpus, and (ii) to present some newly developed models using features from the HPSG signs, as well as the MRS dependency graphs.
This paper discusses what is required from dictionary databases, and one approach, based on experience with Kirrkirr, a dictionary browser originally developed for Warlpiri, an Indigenous Australian language. The paper suggests that there is something of a disconnect between the data access needs of lexical databases and most work on semi-structured databases within the database community.
This paper studies the properties and performance of models for estimating local probability distributions which are used as components of larger probabilistic systems (history-based generative parsing models). We report experimental results showing that memory-based learning outperforms many commonly used methods for this task (Witten-Bell, Jelinek-Mercer with fixed weights, decision trees, and log-linear models). However, we can connect these results with the commonly used general class of deleted interpolation models by showing that certain types of memory-based learning, including the kind that performed so well in our experiments, are instances of this class. In addition, we illustrate the divergences between joint and conditional data likelihood and accuracy performance achieved by such models, suggesting that smoothing based on optimizing accuracy directly might greatly improve performance.
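For readers unfamiliar with the baselines named in this abstract, the following is a minimal sketch of Jelinek-Mercer smoothing with a fixed weight, applied to a toy bigram model rather than to a parsing model. The weight of 0.7, the function name, and the example sentence are assumptions chosen for illustration and are not taken from the paper.

```python
from collections import Counter


def train_jelinek_mercer(tokens, lam=0.7):
    """Return prob(word, prev): a bigram estimate interpolated with a unigram estimate.

    lam is a fixed interpolation weight (an assumed value, not tuned).
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    contexts = Counter(tokens[:-1])  # how often each word occurs as a context
    total = len(tokens)

    def prob(word, prev):
        p_uni = unigrams[word] / total
        if contexts[prev] == 0:
            # Unseen context: fall back entirely on the unigram estimate.
            return p_uni
        p_bi = bigrams[(prev, word)] / contexts[prev]
        return lam * p_bi + (1.0 - lam) * p_uni

    return prob


corpus = "the dog saw the cat".split()
prob = train_jelinek_mercer(corpus)
p_seen = prob("dog", "the")    # bigram observed in the corpus
p_unseen = prob("saw", "the")  # bigram never observed, still nonzero
```

Deleted interpolation generalizes this scheme by letting the weights depend on the context and estimating them on held-out data instead of fixing them in advance.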
Determining the semantic role of sentence constituents is a key task in determining sentence meanings lying behind a veneer of variant syntactic expression. We present a model of natural language generation from semantics using the FrameNet semantic role and frame ontology. We train the model using the FrameNet corpus and apply it to the task of automatic semantic role and frame identification, producing results competitive with previous work (about 70% role labeling accuracy). Unlike previous models used for this task, our model does not assume that the frame of a sentence is known, and is able to identify null-instantiated roles, which commonly occur in our corpus and whose identification is crucial to natural language interpretation.
… “fundamental rule” in an order-independent manner, such that the same basic algorithm supports top-down and bottom-up parsing, and the parser deals correctly with the difficult cases of left-recursive rules, empty elements, and unary rules, in a natural way. Most PCFG parsing work has used the bottom-up CKY algorithm (Kasami, 1965; Younger, 1967) with Chomsky Normal Form Grammars (Baker, 1979; Je…
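The bottom-up CKY algorithm mentioned in this fragment can be sketched as a simple recognizer for a grammar in Chomsky Normal Form. This is the textbook, unweighted version, not the parser the fragment describes; the grammar, lexicon, and function name are hypothetical examples.

```python
def cky_recognize(words, lexical, binary, start="S"):
    """Return True if the CNF grammar derives the word sequence.

    lexical maps a word to the set of categories that rewrite as it;
    binary maps a pair (B, C) to the set of categories A with a rule A -> B C.
    """
    n = len(words)
    if n == 0:
        return False
    # chart[i][k] holds the categories that span words[i:k].
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1] = set(lexical.get(word, ()))
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            k = i + span
            for j in range(i + 1, k):  # split point between the two children
                for b in chart[i][j]:
                    for c in chart[j][k]:
                        chart[i][k] |= binary.get((b, c), set())
    return start in chart[0][n]


toy_lexicon = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "saw": {"V"}}
toy_rules = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}, ("Det", "N"): {"NP"}}
accepted = cky_recognize("the dog saw the cat".split(), toy_lexicon, toy_rules)
rejected = cky_recognize("dog the saw".split(), toy_lexicon, toy_rules)
```

Because the chart is filled strictly by increasing span length, this version needs the grammar to be binarized and free of empty and unary rules, which is the restriction that order-independent chart parsers of the kind described above are designed to lift.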
We present a novel technique for speeding up the computation of PageRank, a hyperlink-based estimate of the “importance” of Web pages, based on the ideas presented in [7]. The original PageRank algorithm uses the Power Method to compute successive iterates that converge to the principal eigenvector of the Markov matrix representing the Web link graph. The algorithm presented here, called Power Extrapolation, accelerates the convergence of the Power Method by subtracting off the error along several nonprincipal eigenvectors from the current iterate of the Power Method, making use of known nonprincipal eigenvalues of the Web hyperlink matrix. Empirically, we show that using Power Extrapolation speeds up PageRank computation by 30% on a Web graph of 80 million nodes in realistic scenarios over the standard power method, in a way that is simple to understand and implement.
In Pollard and Sag (1987) and Pollard and Sag (1994:Ch. 1–8), the subcategorized arguments of a head are stored on a single ordered list, the subcat list. However, Borsley (1989) argues that there are various deficiencies in this approach, and suggests that the unified list should be split into separate lists for subjects, complements, and specifiers. This proposal has been widely adopted in what is colloquially known as HPSG3 (Pollard and Sag (1994:Ch. 9) and other recent work in HPSG). Such a move provides in HPSG an analog of the external/internal argument distinction generally adopted in GB, solves certain technical problems such as allowing prepositions to take complements rather than things identical in subcat list position to subjects, and allows recognition of the special features of subjects which have been noted in the LFG literature, where keyword grammatical relations are used. In HPSG3, it is these valence features subj, comps and spr whose values are ‘cancelled off’ (in a Categorial Grammar-like manner) as a head projects a phrase. A lexical head combines with its complements and subject or specifier (if any) according to the lexically inherited specification, as in (1).
Kristina Toutanova, Dept of Computer Science, Gates Bldg 4A, 353 Serra Mall, Stanford, CA 94305–9040, USA, kristina@cs.stanford.edu; Christopher D. Manning, Depts of Computer Science and Linguistics, Gates Bldg 4A, 353 Serra Mall, Stanford, CA 94305–9040, USA, manning@cs.stanford.edu.
Marie-Catherine de Marneffe, Anna N. Rafferty and Christopher D. Manning. Linguistics Department and Computer Science Department, Stanford University, Stanford, CA 94305. {rafferty,manning}@stanford.edu, mcdm@stanford.edu.
We present a novel generative model for natural language tree structures in which semantic (lexical dependency) and syntactic (PCFG) structures are scored with separate models. This factorization provides conceptual simplicity, straightforward opportunities for separately improving the component models, and a level of performance comparable to similar, non-factored models. Most importantly, unlike other modern parsing models, the factored model admits an extremely effective A* parsing algorithm, which enables efficient, exact inference.
We present a new part-of-speech tagger that demonstrates the following ideas: (i) explicit use of both preceding and following tag con… … first-order HMM, the current tag t0 is predicted based on the previous tag t−1 (and the current word). The backward interaction between t0 and the next tag t+1 shows up implicitly later, when t+1 is generated in turn. While unidirectional models are therefore able to capture both …
This paper describes a system for extracting typed dependency parses of English sentences from phrase structure parses. In order to capture inherent relations occurring in corpus texts that can be critical in real-world applications, many NP relations are included in the set of grammatical relations used. We provide a comparison of our system with Minipar and the Link parser. The typed dependency extraction facility described here is integrated in the Stanford Parser, available for download.
Nathanael Chambers, Daniel Cer, Trond Grenager, David Hall, Chloe Kiddon, Bill MacCartney, Marie-Catherine de Marneffe, Daniel Ramage, Eric Yeh, Christopher D. Manning. Computer Science Department, Stanford University, Stanford, CA 94305.
Jackendoff (1987, 1990) has brought up various problems with the current use of thematic roles (Kiparsky, 1987; Bresnan & Kanerva, 1989 and references cited therein) and suggested a different way of thinking of thematic roles as structural configurations in his semantic Lexical Conceptual Structures (LCSs). Conversely, Joshi (1989) has claimed that Jackendoff’s LCSs alone are insufficient, and that an analysis of certain facts in Marathi additionally requires the existence of a level of predicate-argument structure (PAS). Below we will mention a few of Jackendoff’s arguments against the current conception of thematic roles. We will then look at Joshi’s arguments about the necessity of a level of PAS in addition to LCS and conclude that, provided Jackendoff’s LCSs are integrated into a suitable syntactic theory, neither of her points is problematic for Jackendoff. From there we will go on to re-examine some of the facts of Marathi, and show that certain facts that have merely been stipulated or left unanalyzed when using thematic roles receive a rather elegant treatment when described via a combination of their syntax and LCS.
This paper presents a novel approach to the unsupervised learning of syntactic analyses of natural language text. Most previous work has focused on maximizing likelihood according to generative PCFG models. In contrast, we employ a simpler probabilistic model over trees based directly on constituent identity and linear context, and use an EM-like iterative procedure to induce structure. This method produces much higher quality analyses, giving the best published results on the ATIS dataset.
While symbolic parsers can be viewed as deduction systems, this view is less natural for probabilistic parsers. We present a view of parsing as directed hypergraph analysis which naturally covers both symbolic and probabilistic parsing. We illustrate the approach by showing how a dynamic extension of Dijkstra’s algorithm can be used to construct a probabilistic chart parser with an O(n³) time bound for arbitrary PCFGs, while preserving as much of the flexibility of symbolic chart parsers as allowed by the inherent ordering of probabilistic dependencies.
In this paper I will discuss a rather recondite phenomenon in the area of sequence of tense (SOT), exhibited by sentences like (1): (1) John said that Mary is pregnant. According to traditional grammar, this is a sentence where sequence of tense has failed to apply (i.e., concord has been broken): standard sequence of tense rules would dictate use of a past tense when embedding an event contemporaneous to the embedding verb under a past tense verb, giving the sentence John said that Mary was pregnant. For some verbs breaking concord is impossible (*Mary said that John builds a house) or can only have a present-as-future interpretation (John said that the last spaceship to Mars leaves tomorrow), but with stative verbs, as Enç (1987) and others have observed, this failure of sequence of tense to apply is associated with a rather special meaning, which we will try to elucidate below. For the moment, let us merely observe that the use of present tense seems to cause such sentences to end up saying something about a larger interval including both the time of utterance and the time of the event described in the main clause. For this reason Enç calls them “double access sentences”, but that seems a rather dubious name as the interpretation seems to rely on evaluation at a large interval, not just at two points.
“Everyone knows that language is variable.” This is the bald sentence with which Sapir (1921:147) begins his chapter on language as an historical product. He goes on to emphasize how two speakers’ usage is bound to differ “in choice of words, in sentence structure, in the relative frequency with which particular forms or combinations of words are used”. I should add that much sociolinguistic and historical linguistic research has shown that the same speaker’s usage is also variable (Labov 1966, Kroch 2001:722). However, the tradition of most syntacticians has been to ignore this thing that everyone knows.
In this paper I want to look at what the evidence from Complex Predicates can tell us about the design parameters of an empirically adequate theory of Universal Grammar (UG). This is a fertile field for investigation because, according to the standard assumptions of the field, complex predicates are monoclausal with respect to some properties and multiclausal with respect to others and this tension can only be resolved by giving up some cherished beliefs. After introducing the problem in Section 1, Sections 2–4 will lay out the basis of the dilemma. Sections 2 and 3 argue that Romance complex predicates have an articulated rightward-branching phrase structure, and cannot be analyzed as some sort of verb compound or verbal complex, while conversely Section 4 shows how in many respects a complex predicate does behave just like a single predicate. Hence we require a notion of monoclausality that these complex predicates satisfy despite their articulated phrase structure.
mentation for languages such as Chinese. Almost no NLP task is truly standalone. The end-to-end performance of natural Most current systems for higher-level, aggre-.
number of hidden categories is not fixed, but can grow with the amount of training data. when the number of hidden states is unknown (Beal et al., 2002; Teh et al., 2006).
We propose a model of natural language inference which identifies valid inferences by their lexical and syntactic features, without full semantic interpretation. We extend past work in natural logic, which has focused on semantic containment and monotonicity, by incorporating both semantic exclusion and implicativity. Our model decomposes an inference problem into a sequence of atomic edits linking premise to hypothesis; predicts a lexical semantic relation for each edit; propagates these relations upward through a semantic composition tree according to properties of intermediate nodes; and joins the resulting semantic relations across the edit sequence. A computational implementation of the model achieves 70% accuracy and 89% precision on the FraCaS test suite. Moreover, including this model as a component in an existing system yields significant performance gains on the Recognizing Textual Entailment challenge.
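The final step described above, joining per-edit relations across the edit sequence, can be illustrated with a deliberately reduced sketch. Everything in it is an assumption made for illustration: the relation names, the `join` and `join_sequence` functions, and the simplified join rules are not taken from the paper. Only containment relations are covered (the exclusion relations the abstract mentions are left out), and any combination that cannot be determined is collapsed to independence.

```python
from functools import reduce

# Hypothetical relation labels; a reduced set covering containment only.
EQ = "equivalence"
FWD = "forward_entailment"   # premise side entails hypothesis side
REV = "reverse_entailment"   # hypothesis side entails premise side
IND = "independence"         # nothing can be concluded

def join(r1, r2):
    """Join two relations under simplified, assumed rules."""
    if r1 == EQ:
        return r2            # equivalence is the identity for join
    if r2 == EQ:
        return r1
    if r1 == r2 and r1 in (FWD, REV):
        return r1            # containment is transitive
    return IND               # assumption: undetermined cases become independence

def join_sequence(relations):
    """Fold the per-edit relations, in order, into one overall relation."""
    return reduce(join, relations, EQ)

# Relations for three hypothetical atomic edits, given here rather than predicted.
print(join_sequence([EQ, FWD, FWD]))
print(join_sequence([FWD, REV]))
```

In this sketch a chain of forward entailments still yields forward entailment, while mixing forward and reverse entailment gives no usable conclusion.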
I wish to present a codification of syntactic approaches to dealing with ergative languages and argue for the correctness of one particular approach, which I will call the Inverse Grammatical Relations hypothesis. I presume familiarity with the term 'ergativity', but, briefly, many languages have ergative case marking, such as Burushaski in (1), in contrast to the accusative case marking of Latin in (2). More generally, if we follow Dixon (1979) and use A to mark the agent-like argument of a transitive verb, O to mark the patient-like argument of a transitive verb, and S to mark the single argument of an intransitive verb, then we can call ergative any subsystem of a language that groups S and O in contrast to A, as shown in (3).
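The A/S/O grouping criterion above can be written out as a small check. This is only an illustrative sketch: the function name `classify_alignment`, the marker strings, and the labels other than "ergative" and "accusative" are assumptions introduced here, not terms from the paper.

```python
def classify_alignment(a, s, o):
    """Classify a marking subsystem from the markers on A, S, and O.

    a: marker on the agent-like argument of a transitive verb
    s: marker on the single argument of an intransitive verb
    o: marker on the patient-like argument of a transitive verb
    """
    if s == o and s != a:
        return "ergative"      # S and O grouped, in contrast to A
    if s == a and s != o:
        return "accusative"    # S and A grouped, in contrast to O
    if a == s == o:
        return "neutral"       # assumed label: no contrast at all
    return "other"             # assumed label: any remaining pattern

# Abstract marker names used purely as placeholders.
print(classify_alignment(a="ERG", s="ABS", o="ABS"))
print(classify_alignment(a="NOM", s="NOM", o="ACC"))
```

The first call corresponds to an ergative pattern, the second to an accusative pattern of the Latin type.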