93 results found (showing 1–50).
  1. Christopher Manning, Ofer Dekel & Yoram Singer, Log-Linear Models for Label Ranking.
    In Sebastian Thrun, Lawrence K. Saul, and Bernhard Schölkopf (eds.), Advances in Neural Information Processing Systems 16 (NIPS 2003). Cambridge, MA: MIT Press, pp. 497–504.
  2. Brett Baker & Christopher Manning, A Dictionary Database Template For.
    Dictionary-making is an increasingly important avenue for cultural preservation and maintenance for Aboriginal people. It is also one of the main jobs performed by linguists working in Aboriginal communities. However, current tools for making dictionaries are either not specifically designed for the purpose (Word, Nisus), with the result that dictionaries written in them are difficult to maintain, to keep consistent, and to manipulate automatically, or are too complex for many people to use (Shoebox), and are thereby wasted as potential resources. (...)
  3. Philip Beineke & Christopher Manning, An Exploration of Sentiment Summarization.
    The website Rotten Tomatoes, located at www.rottentomatoes.com, is primarily an online repository of movie reviews. For each movie review document, the site provides a link to the full review, along with a brief description of its sentiment. The description consists of a rating (“fresh” or “rotten”) and a short quotation from the review. Other research (Pang, Lee, & Vaithyanathan 2002) has predicted a movie review’s rating from its text. In this paper, we focus on the quotation, which is a main (...)
  4. Ezra Callahan, Christopher D. Manning & Kristina Toutanova, LinGO Redwoods.
    The LinGO Redwoods initiative is a seed activity in the design and development of a new type of treebank. A treebank is a (typically hand-built) collection of natural language utterances and associated linguistic analyses; typical treebanks—as for example the widely recognized Penn Treebank (Marcus, Santorini, & Marcinkiewicz, 1993), the Prague Dependency Treebank (Hajic, 1998), or the German TiGer Corpus (Skut, Krenn, Brants, & Uszkoreit, 1997)—assign syntactic phrase structure or tectogrammatical dependency trees over sentences taken from a naturally-occurring source, often newspaper (...)
  5. Miriam Corris, Christopher Manning, Susan Poetsch & Jane Simpson, Bilingual Dictionaries for Australian Languages: User Studies on the Place of Paper and Electronic Dictionaries.
    Dictionaries have long been seen as an essential contribution by linguists to work on endangered languages. We report on preliminary investigations of actual dictionary usage and usability by 76 speakers, semi-speakers and learners of Australian Aboriginal languages. The dictionaries include: electronic and printed bilingual Warlpiri-English dictionaries, a printed trilingual Alawa-Kriol- English dictionary, and a printed bilingual Warumungu-English dictionary. We examine competing demands for completeness of coverage and ease of access, and focus on the prospects of electronic dictionaries for solving many (...)
  6. Miriam Corris, Christopher Manning, Susan Poetsch & Jane Simpson, Dictionaries and Endangered Languages.
    Linguists have seen creating dictionaries of endangered languages as a key activity in language maintenance and revival work. However, like any approach to language engineering, there are concerns to address. The first is the tension between language documentation and language maintenance. The second is the role of literacy. A lot of effort has been put into vernacular literacy, on the assumption that it assists language maintenance, as well as language documentation. In some respects this is a dubious assumption, because writing (...)
  7. Christopher Cox, Christopher D. Manning & Kristina Toutanova, Robust Textual Inference Using Diverse Knowledge Sources.
    We present a machine learning approach to robust textual inference, in which parses of the text and the hypothesis sentences are used to measure their asymmetric “similarity”, and thereby to decide if the hypothesis can be inferred. This idea is realized in two different ways. In the first, each sentence is represented as a graph (extracted from a dependency parser) in which the nodes are words/phrases, and the links represent dependencies. A learned, asymmetric, graph-matching cost is then computed to measure (...)
  8. Christopher Cox, Christopher Manning & Pat Langley, Template Sampling for Leveraging Domain Knowledge in Information Extraction.
    We initially describe a feature-rich discriminative Conditional Random Field (CRF) model for Information Extraction in the workshop announcements domain, which offers good baseline performance in the PASCAL shared task. We then propose a method for leveraging domain knowledge in Information Extraction tasks, scoring candidate document labellings as one-value-per-field templates according to domain feasibility after generating sample labellings from a trained sequence classifier. Our relational models evaluate these templates according to our intuitions about agreement in the domain: workshop acronyms should resemble (...)
  9. Spence Green & Christopher D. Manning, NP Subject Detection in Verb-Initial Arabic Clauses.
    Phrase re-ordering is a well-known obstacle to robust machine translation for language pairs with significantly different word orderings. For Arabic-English, two languages that usually differ in the ordering of subject and verb, the subject and its modifiers must be accurately moved to produce a grammatical translation. This operation requires more than base phrase chunking and often defies current phrase-based statistical decoders. We present a conditional random field sequence classifier that detects the full scope of Arabic noun phrase subjects in (...)
  10. David Hall & Christopher D. Manning, Labeled LDA: A Supervised Topic Model for Credit Attribution in Multi-Labeled Corpora.
    A significant portion of the world’s text is tagged by readers on social bookmarking websites. Credit attribution is an inherent problem in these corpora because most pages have multiple tags, but the tags do not always apply with equal specificity across the whole document. Solving the credit attribution problem requires associating each word in a document with the most appropriate tags and vice versa. This paper introduces Labeled LDA, a topic model that constrains Latent Dirichlet Allocation by defining a one-to-one (...)
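A minimal sketch of the constraint Labeled LDA adds to LDA, as described in the entry above: during collapsed Gibbs sampling, a token may only be assigned a topic drawn from its own document's label set. Everything here (function name, toy hyperparameters) is illustrative rather than taken from the paper.

```python
import random
from collections import defaultdict

def labeled_lda_gibbs(docs, labels, vocab_size, alpha=0.1, beta=0.01, iters=200):
    """Collapsed Gibbs sampling in which each document's tokens may only be
    assigned topics drawn from that document's observed label set."""
    n_tw = defaultdict(lambda: defaultdict(int))   # topic -> word -> count
    n_dt = [defaultdict(int) for _ in docs]        # doc -> topic -> count
    n_t = defaultdict(int)                         # topic -> total token count
    z = []                                         # topic assignment per token

    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            t = random.choice(labels[d])           # initialise inside the label set
            z[d].append(t)
            n_tw[t][w] += 1; n_dt[d][t] += 1; n_t[t] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                n_tw[t][w] -= 1; n_dt[d][t] -= 1; n_t[t] -= 1
                # The Labeled LDA restriction: sample only over labels[d].
                weights = [(n_dt[d][k] + alpha) *
                           (n_tw[k][w] + beta) / (n_t[k] + vocab_size * beta)
                           for k in labels[d]]
                t = random.choices(labels[d], weights=weights)[0]
                z[d][i] = t
                n_tw[t][w] += 1; n_dt[d][t] += 1; n_t[t] += 1
    return n_tw, n_dt
```

Dropping the restriction (sampling over all topics) recovers ordinary collapsed Gibbs sampling for LDA, which is what makes this a constrained form of LDA rather than a new model family.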
  11. David Hall & Christopher D. Manning, Studying the History of Ideas Using Topic Models.
    How can the development of ideas in a scientific field be studied over time? We apply unsupervised topic modeling to the ACL Anthology to analyze historical trends in the field of Computational Linguistics from 1978 to 2006. We induce topic clusters using Latent Dirichlet Allocation, and examine the strength of each topic over time. Our methods find trends in the field including the rise of probabilistic methods starting in 1988, a steady increase in applications, and a sharp decline of research (...)
  12. David Hall, Christopher D. Manning, Daniel Cer & Chloe Kiddon, Learning Alignments and Leveraging Natural Logic.
    We describe an approach to textual inference that improves alignments at both the typed dependency level and at a deeper semantic level. We present a machine learning approach to alignment scoring, a stochastic search procedure, and a new tool that finds deeper semantic alignments, allowing rapid development of semantic features over the aligned graphs. Further, we describe a complementary semantic component based on natural logic, which shows an added gain of 3.13% accuracy on the RTE3 test set.
  13. Dan Klein & Christopher D. Manning, An O(n³) Agenda-Based Chart Parser for Arbitrary Probabilistic Context-Free Grammars.
    While O(n³) methods for parsing probabilistic context-free grammars (PCFGs) are well known, a tabular parsing framework for arbitrary PCFGs which allows for bottom-up, top-down, and other parsing strategies, has not yet been provided. This paper presents such an algorithm, and shows its correctness and advantages over prior work. The paper finishes by bringing out the connections between the algorithm and work on hypergraphs, which permits us to extend the presented Viterbi (best parse) algorithm to an inside (total probability) (...)
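For orientation, the agenda-driven idea above can be reduced to a best-first (Dijkstra-style) Viterbi parser for a PCFG in Chomsky normal form. This is only a simplified sketch under that CNF assumption; the paper's algorithm handles arbitrary PCFGs and multiple traversal strategies, and the grammar encoding below (`unary`, `binary` dictionaries) is invented for illustration.

```python
import heapq
from collections import defaultdict
from math import log

def viterbi_parse_cost(words, unary, binary, root="S"):
    """Best-first Viterbi parsing for a CNF PCFG.
    unary:  {(A, word): prob}    binary: {(A, B, C): prob}
    Returns the -log probability of the best parse, or None if there is none."""
    n = len(words)
    best = {}                                   # finished edges: (A, i, j) -> cost
    agenda = []                                 # priority queue of (cost, edge)
    for i, w in enumerate(words):
        for (A, word), p in unary.items():
            if word == w:
                heapq.heappush(agenda, (-log(p), (A, i, i + 1)))
    by_start = defaultdict(list)                # finished edges indexed by start
    by_end = defaultdict(list)                  # finished edges indexed by end
    while agenda:
        cost, (A, i, j) = heapq.heappop(agenda)
        if (A, i, j) in best:
            continue                            # already finished more cheaply
        best[(A, i, j)] = cost
        if (A, i, j) == (root, 0, n):
            return cost
        by_start[i].append((A, j, cost))
        by_end[j].append((A, i, cost))
        # Combine the new edge with adjacent finished edges via rules X -> B C.
        for (X, B, C), p in binary.items():
            if B == A:
                for (C2, k, c2) in by_start[j]:
                    if C2 == C:
                        heapq.heappush(agenda, (cost + c2 - log(p), (X, i, k)))
            if C == A:
                for (B2, h, c2) in by_end[i]:
                    if B2 == B:
                        heapq.heappush(agenda, (cost + c2 - log(p), (X, h, j)))
    return None
```

Because edge costs are non-negative -log probabilities, the first time the goal edge leaves the agenda its Viterbi probability is exact, which is the agenda-based analogue of Dijkstra's termination argument.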
  14. Dan Klein & Christopher D. Manning, A Generative Constituent-Context Model for Improved Grammar Induction.
    We present a generative distributional model for the unsupervised induction of natural language syntax which explicitly models constituent yields and contexts. Parameter search with EM produces higher quality analyses than previously exhibited by unsupervised systems, giving the best published unsupervised parsing results on the ATIS corpus. Experiments on Penn treebank sentences of comparable length show an even higher F1 of 71% on nontrivial brackets. We compare distributionally induced and actual part-of-speech tags as input data, and examine extensions to the basic (...)
  15. Dan Klein & Christopher D. Manning, A∗ Parsing: Fast Exact Viterbi Parse Selection.
    A* PCFG parsing can dramatically reduce the time required to find the exact Viterbi parse by conservatively estimating outside Viterbi probabilities. We discuss various estimates and give efficient algorithms for computing them. On Penn treebank sentences, our most detailed estimate reduces the total number of edges processed to less than 3% of that required by exhaustive parsing, and even a simpler estimate which can be pre-computed in under a minute still reduces the work by a factor of 5. The algorithm (...)
  16. Dan Klein & Christopher D. Manning, Accurate Unlexicalized Parsing.
    We demonstrate that an unlexicalized PCFG can parse much more accurately than previously shown, by making use of simple, linguistically motivated state splits, which break down false independence assumptions latent in a vanilla treebank grammar. Indeed, its performance of 86.36% (LP/LR F1) is better than that of early lexicalized PCFG models, and surprisingly close to the current state-of-the-art. This result has potential uses beyond establishing a strong lower bound on the maximum possible accuracy of unlexicalized models: an unlexicalized PCFG is (...)
  17. Dan Klein & Christopher D. Manning, Conditional Structure Versus Conditional Estimation in NLP Models.
    This paper separates conditional parameter estimation, which consistently raises test set accuracy on statistical NLP tasks, from conditional model structures, such as the conditional Markov model used for maximum-entropy tagging, which tend to lower accuracy. Error analysis on part-of-speech tagging shows that the actual tagging errors made by the conditionally structured model derive not only from label bias, but also from other ways in which the independence assumptions of the conditional model structure are unsuited to linguistic sequences. The (...)
  18. Dan Klein & Christopher D. Manning, Distributional Phrase Structure Induction.
    Unsupervised grammar induction systems commonly judge potential constituents on the basis of their effects on the likelihood of the data. Linguistic justifications of constituency, on the other hand, rely on notions such as substitutability and varying external contexts. We describe two systems for distributional grammar induction which operate on such principles, using part-of-speech tags as the contextual features. The advantages and disadvantages of these systems are examined, including precision/recall trade-offs, error analysis, and extensibility.
  19. Dan Klein & Christopher D. Manning, Fast Exact Inference with a Factored Model for Natural Language Parsing.
    We present a novel generative model for natural language tree structures in which semantic (lexical dependency) and syntactic (PCFG) structures are scored with separate models. This factorization provides conceptual simplicity, straightforward opportunities for separately improving the component models, and a level of performance comparable to similar, non-factored models. Most importantly, unlike other modern parsing models, the factored model admits an extremely effective A* parsing algorithm, which enables efficient, exact inference.
  20. Dan Klein & Christopher D. Manning, From Instance-Level Constraints to Space-Level Constraints: Making the Most of Prior Knowledge in Data Clustering.
    We present an improved method for clustering in the presence of very limited supervisory information, given as pairwise instance constraints. By allowing instance-level constraints to have space-level inductive implications, we are able to successfully incorporate constraints for a wide range of data set types. Our method greatly improves on the previously studied constrained k-means algorithm, generally requiring less than half as many constraints to achieve a given accuracy on a range of real-world data, while also being more robust when over-constrained. (...)
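For context, the instance-level baseline that the entry above improves on (constrained k-means in the style of COP-KMeans) can be sketched as follows; taking the transitive closure of the must-links is the simplest way to give pairwise constraints some group-level effect. Function and variable names are assumptions for illustration, not the paper's code.

```python
import numpy as np

def constrained_kmeans(X, k, must_link, cannot_link, iters=20, seed=0):
    """Instance-level constrained k-means, for illustration only.
    X is an (n, d) array; must_link / cannot_link are lists of (i, j) index pairs."""
    rng = np.random.default_rng(seed)
    n = len(X)

    # Transitive closure of must-links via union-find, so linked points move together.
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for a, b in must_link:
        parent[find(a)] = find(b)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    centers = X[rng.choice(n, size=k, replace=False)].astype(float)
    labels = rng.integers(0, k, size=n)
    for _ in range(iters):
        for members in groups.values():
            # Clusters currently holding a cannot-linked partner are treated as infeasible.
            forbidden = {labels[b] for a, b in cannot_link if a in members and b not in members}
            forbidden |= {labels[a] for a, b in cannot_link if b in members and a not in members}
            dists = [np.sum((X[members] - c) ** 2) for c in centers]
            order = np.argsort(dists)
            choice = next((c for c in order if c not in forbidden), order[0])  # fallback if all forbidden
            labels[members] = choice
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels
```

The paper's contribution goes further, letting constraints affect the neighbourhoods of constrained points rather than only the constrained instances themselves; that space-level part is not reproduced here.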
  21. Dan Klein & Christopher D. Manning, Interpreting and Extending Classical Agglomerative Clustering Algorithms Using a Model-Based Approach.
    (...) agglomerative clustering. First, we show formally that the common heuristic agglomerative clustering algorithms – Ward’s method, single-link, complete-link, and a variant of group-average – are each equivalent to a hierarchical model-based method. This interpretation gives a theoretical explanation of the empirical behavior of these algorithms, as well as a principled approach to resolving practical issues, such as number of clusters or the choice of method. Second, we show how a model-based viewpoint can suggest variations on these basic agglomerative algorithms. We (...)
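The heuristic algorithms the entry above reinterprets (single-link, complete-link, group-average) share one greedy merge loop, shown here as a deliberately naive sketch with illustrative names; Ward's method differs only in the merge criterion and is omitted.

```python
import numpy as np

def agglomerate(X, n_clusters, linkage="single"):
    """Naive agglomerative clustering; linkage in {"single", "complete", "average"}."""
    clusters = [[i] for i in range(len(X))]
    # Pairwise squared Euclidean distances between points.
    d = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)

    def cluster_dist(a, b):
        pair = d[np.ix_(a, b)]
        if linkage == "single":
            return pair.min()
        if linkage == "complete":
            return pair.max()
        return pair.mean()                      # group-average

    while len(clusters) > n_clusters:
        # Find and merge the closest pair of clusters under the chosen linkage.
        i, j = min(((i, j) for i in range(len(clusters))
                           for j in range(i + 1, len(clusters))),
                   key=lambda ij: cluster_dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```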
  22. Dan Klein & Christopher D. Manning, Natural Language Grammar Induction Using a Constituent-Context Model.
    This paper presents a novel approach to the unsupervised learning of syntactic analyses of natural language text. Most previous work has focused on maximizing likelihood according to generative PCFG models. In contrast, we employ a simpler probabilistic model over trees based directly on constituent identity and linear context, and use an EM-like iterative procedure to induce structure. This method produces much higher quality analyses, giving the best published results on the ATIS dataset.
  23. Dan Klein & Christopher D. Manning, Parsing and Hypergraphs.
    While symbolic parsers can be viewed as deduction systems, this view is less natural for probabilistic parsers. We present a view of parsing as directed hypergraph analysis which naturally covers both symbolic and probabilistic parsing. We illustrate the approach by showing how a dynamic extension of Dijkstra’s algorithm can be used to construct a probabilistic chart parser with an O(n³) time bound for arbitrary PCFGs, while preserving as much of the flexibility of symbolic chart parsers as allowed by the inherent (...)
  24. Dan Klein & Christopher D. Manning, Parsing with Treebank Grammars: Empirical Bounds, Theoretical Models, and the Structure of the Penn Treebank.
    This paper presents empirical studies and closely corresponding theoretical models of the performance of a chart parser exhaustively parsing the Penn Treebank with the Treebank’s own CFG grammar. We show how performance is dramatically affected by rule representation and tree transformations, but little by top-down vs. bottom-up strategies. We discuss grammatical saturation, including analysis of the strongly connected components of the phrasal nonterminals in the Treebank, and model how, as sentence length increases, the effective grammar rule size increases as regions (...)
  25. Dan Klein, Christopher D. Manning & Kristina Toutanova, Combining Heterogeneous Classifiers for Word-Sense Disambiguation.
    This paper discusses ensembles of simple but heterogeneous classifiers for word-sense disambiguation, examining the Stanford-CS224N system entered in the SENSEVAL-2 English lexical sample task. First-order classifiers are combined by a second-order classifier, which variously uses majority voting, weighted voting, or a maximum entropy model. While individual first-order classifiers perform comparably to middle-scoring teams’ systems, the combination achieves high performance. We discuss trade-offs and empirical performance. Finally, we present an analysis of the combination, examining how ensemble performance depends on error independence (...)
  26. Christopher Manning, A Conditional Random Field Word Segmenter.
    We present a Chinese word segmentation system submitted to the closed track of Sighan bakeoff 2005. Our segmenter was built using a conditional random field sequence model that provides a framework to use a large number of linguistic features such as character identity, morphological and character reduplication features. Because our morphological features were extracted from the training corpora automatically, our system was not biased toward any particular variety of Mandarin. Thus, our system does not overfit the variety of Mandarin most (...)
  27. Christopher Manning, A System For Identifying Named Entities in Biomedical Text: How Results From Two Evaluations Reflect on Both the System and the Evaluations.
    We present a maximum-entropy based system for identifying Named Entities (NEs) in biomedical abstracts and present its performance in the only two biomedical Named Entity Recognition (NER) comparative evaluations that have been held to date, namely BioCreative and Coling BioNLP. Our system obtained an exact match f-score of 83.2% in the BioCreative evaluation and 70.1% in the BioNLP evaluation. We discuss our system in detail including its rich use of local features, attention to correct boundary identification, innovative use of external (...)
  28. Christopher Manning, Exploiting Context for Biomedical Entity Recognition: From Syntax to the Web.
    We describe a machine learning system for the recognition of names in biomedical texts. The system makes extensive use of local and syntactic features within the text, as well as external resources including the web and gazetteers. It achieves an F- score of 70% on the Coling 2004 NLPBA/BioNLP shared task of identifying five biomedical named entities in the GENIA corpus.
  29. Christopher Manning, Incorporating Non-Local Information Into Information Extraction Systems by Gibbs Sampling.
    Most current statistical natural language processing models use only local features so as to permit dynamic programming in inference, but this makes them unable to fully account for the long distance structure that is prevalent in language use. We show how to solve this dilemma with Gibbs sampling, a simple Monte Carlo method used to perform approximate inference in factored probabilistic models. By using simulated annealing in place of Viterbi decoding in sequence models such as HMMs, CMMs, and (...)
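The decoding recipe described above (swap Viterbi for Gibbs sampling over the whole label sequence, cooled with simulated annealing, so that non-local factors can be scored) might look roughly like this. `local_score` and `nonlocal_score` are hypothetical callables standing in for the paper's factored model.

```python
import math
import random

def annealed_gibbs_decode(words, labels, local_score, nonlocal_score,
                          sweeps=50, t0=2.0, t_min=0.05):
    """Sample label sequences from exp(score / T), lowering T toward 0 so the
    chain concentrates on a near-maximum-scoring sequence."""
    seq = [random.choice(labels) for _ in words]

    def total(s):
        # Full rescore for clarity; a real implementation would only rescore
        # the factors touching the resampled position.
        return sum(local_score(words, s, i) for i in range(len(words))) \
               + nonlocal_score(words, s)

    temp = t0
    for _ in range(sweeps):
        for i in range(len(words)):
            # Resample position i conditioned on all other positions.
            scores = [total(seq[:i] + [lab] + seq[i + 1:]) / temp for lab in labels]
            m = max(scores)
            weights = [math.exp(s - m) for s in scores]
            seq[i] = random.choices(labels, weights=weights)[0]
        temp = max(t_min, temp * 0.9)           # simulated-annealing schedule
    return seq
```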
  30. Christopher Manning, Language Varieties.
    Part-of-speech tagging, like any supervised statistical NLP task, is more difficult when test sets are very different from training sets, for example when tagging across genres or language varieties. We examined the problem of POS tagging of different varieties of Mandarin Chinese (PRC-Mainland, PRC- Hong Kong, and Taiwan). An analytic study first showed that unknown words were a major source of difficulty in cross-variety tagging. Unknown words in English tend to be proper nouns. By contrast, we found that (...)
  31. Christopher Manning, LFG Within King's Descriptive Formalism.
    The ontology of LFG. We need to get straight what is out there in the world and what our model objects are, what are denotations and what are descriptions that get interpreted. The title of Bresnan (1982a), The Mental Representation of Grammatical Relations, seems more likely to confuse us than help us. But in the introduction, there are some fairly clear statements of how their model of human use of language is to be constructed. Kaplan & Bresnan (1982, p. 173) (...)
  32. Christopher Manning, NIST Open Machine Translation 2008 Evaluation: Stanford University's System Description.
    Michel Galley, Pi-Chuan Chang, Daniel Cer, Jenny R. Finkel, and Christopher D. Manning. Computer Science and Linguistics Departments, Stanford University.
  33. Christopher Manning, Optimizing Chinese Word Segmentation for Machine Translation Performance.
    Pi-Chuan Chang, Michel Galley, and Christopher D. Manning. Computer Science Department, Stanford University, Stanford, CA 94305. pichuan,galley,manning@cs.stanford.edu
  34. Christopher Manning, Regularization, Adaptation, and Non-Independent Features Improve Hidden Conditional Random Fields for Phone Classification.
    We show a number of improvements in the use of Hidden Conditional Random Fields (HCRFs) for phone classification on the TIMIT and Switchboard corpora. We first show that the use of regularization effectively prevents overfitting, improving over other methods such as early stopping. We then show that HCRFs are able to make use of non-independent features in phone classification, at least with small numbers of mixture components, while HMMs degrade due to their strong independence assumptions. Finally, we successfully apply (...)
  35. Christopher Manning, Regularization and Search for Minimum Error Rate Training.
    (...) It is shown that the stochastic method obtains test set gains of +0.98 BLEU on MT03.
  36. Christopher Manning, Verb Sense and Subcategorization: Using Joint Inference to Improve Performance on Complementary Tasks.
    We propose a general model for joint inference in correlated natural language processing tasks when fully annotated training data is not available, and apply this model to the dual tasks of word sense disambiguation and verb subcategorization frame determination. The model uses the EM algorithm to simultaneously complete partially annotated training sets and learn a generative probabilistic model over multiple annotations. When applied to the word sense and verb subcategorization frame determination tasks, the model learns sharp joint probability distributions which (...)
  37. Christopher Manning, Valency Versus Binding on the Distinctness of Argument Structure.
    Most theories of binding in most syntactic frameworks assume that the same notion of surface obliqueness that identifies the subject of a clause is also used for obliqueness conditions on reflexive binding. For instance, in GB (Chomsky), binding theory is standardly defined on S-structure, so that Nancy can bind herself due to the c-commanding configuration that also makes Nancy the subject of the sentence.
  38. Christopher D. Manning, Automatic Acquisition of a Large Subcategorization Dictionary From Corpora.
    This paper presents a new method for producing a dictionary of subcategorization frames from unlabelled text corpora. It is shown that statistical filtering of the results of a finite state parser running on the output of a stochastic tagger produces high quality results, despite the error rates of the tagger and the parser. Further, it is argued that this method can be used to learn all subcategorization frames, whereas previous methods are not extensible to a general solution to the problem.
  39. Christopher D. Manning, An Effective Two-Stage Model for Exploiting Non-Local Dependencies in Named Entity Recognition.
    This paper shows that a simple two-stage approach to handle non-local dependencies in Named Entity Recognition (NER) can outperform existing approaches that handle non-local dependencies, while being much more computationally efficient. NER systems typically use sequence models for tractable inference, but this makes them unable to capture the long distance structure present in text. (...)
  40. Christopher D. Manning, An Introduction to Information Retrieval.
    Contents: 1 Boolean retrieval (p. 1); 2 The term vocabulary and postings lists (p. 19); 3 Dictionaries and tolerant retrieval (p. 49); 4 Index construction (p. 67); 5 Index compression (p. 85); 6 Scoring, term weighting and the vector space model (p. 109); 7 Computing scores in a complete search system (p. 135); 8 Evaluation in information retrieval (p. 151); 9 Relevance feedback and query expansion (p. 177); 10 XML retrieval (p. 195); 11 Probabilistic information retrieval (p. 219); 12 Language models for information retrieval (p. 237); 13 Text classification and Naive Bayes (p. 253) (...)
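As a taste of the book's opening chapter (Boolean retrieval over postings lists), a toy inverted index with conjunctive queries can be sketched in a few lines; the example documents are invented.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return {t: sorted(ids) for t, ids in index.items()}

def boolean_and(index, terms):
    """Intersect postings lists for an AND query, shortest list first."""
    postings = sorted((index.get(t.lower(), []) for t in terms), key=len)
    if not postings:
        return []
    result = set(postings[0])
    for plist in postings[1:]:
        result &= set(plist)
    return sorted(result)

docs = ["Brutus killed Caesar", "Caesar was ambitious", "Brutus was honourable"]
index = build_index(docs)
print(boolean_and(index, ["Brutus", "Caesar"]))   # -> [0]
```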
  41. Christopher D. Manning, A Phrase-Based Alignment Model for Natural Language Inference.
    The alignment problem—establishing links between corresponding phrases in two related sentences—is as important in natural language inference (NLI) as it is in machine translation (MT). But the tools and techniques of MT alignment do not readily transfer to NLI, where one cannot assume semantic equivalence, and for which large volumes of bitext are lacking. We present a new NLI aligner, the MANLI system, designed to address these challenges. It uses a phrase-based alignment representation, exploits external lexical resources, and capitalizes on (...)
  42. Christopher D. Manning, A Simple and Effective Hierarchical Phrase Reordering Model.
    (...) adjacent phrases, but they typically lack the ability to perform the kind of long-distance reorderings possible with syntax-based systems. In this paper, we present a novel hierarchical phrase reordering model aimed at improving non-local reorderings, which seamlessly integrates with a standard phrase-based system with little loss of computational efficiency. We show that this model can successfully handle the key examples often used to motivate syntax-based systems, such as the rotation of a prepositional phrase around a noun phrase. We contrast our (...)
  43. Christopher D. Manning, Argument Structure as a Locus for Binding Theory.
    The correct locus (or loci) of binding theory has been a matter of much discussion. Theories can be seen as varying along at least two dimensions. The first is whether binding theory is configurationally determined (that is, the theory exploits the geometry of a phrase marker, appealing to such purely structural notions as c-command and government) or whether the theory depends rather on examining the relations between items selected by a predicate (where by selection I am intending to cover (...)
  44. Christopher D. Manning, Computations.
    We present a novel algorithm for the fast computation of PageRank, a hyperlink-based estimate of the “importance” of Web pages. The original PageRank algorithm uses the Power Method to compute successive iterates that converge to the principal eigenvector of the Markov matrix representing the Web link graph. The algorithm presented here, called Quadratic Extrapolation, accelerates the convergence of the Power Method by periodically subtracting off estimates of the nonprincipal eigenvectors from the current iterate of the Power Method. In Quadratic Extrapolation, (...)
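The Power Method iteration that Quadratic Extrapolation accelerates, as described above, is itself short; this sketch runs it on an adjacency-list web graph and leaves out the extrapolation step. Parameter and function names are illustrative.

```python
import numpy as np

def pagerank_power(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power Method on the PageRank Markov matrix.
    links: dict node -> list of out-neighbours, nodes numbered 0..n-1."""
    n = len(links)
    x = np.full(n, 1.0 / n)
    v = np.full(n, 1.0 / n)                     # uniform teleport distribution
    for _ in range(max_iter):
        nxt = np.zeros(n)
        for node, outs in links.items():
            if outs:
                nxt[outs] += damping * x[node] / len(outs)
            else:
                nxt += damping * x[node] / n     # dangling node: spread uniformly
        nxt += (1 - damping) * v
        if np.abs(nxt - x).sum() < tol:          # L1 convergence check
            return nxt
        x = nxt
    return x

print(pagerank_power({0: [1], 1: [0, 2], 2: [0]}))
```

Quadratic Extrapolation would periodically use the last few iterates to estimate and subtract the contribution of the non-principal eigenvectors, reducing the number of Power Method iterations needed to converge.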
  45. Christopher D. Manning, Clustering the Tagged Web.
    Automatically clustering web pages into semantic groups promises improved search and browsing on the web. In this paper, we demonstrate how user-generated tags from largescale social bookmarking websites such as del.icio.us can be used as a complementary data source to page text and anchor text for improving automatic clustering of web pages. This paper explores the use of tags in 1) K-means clustering in an extended vector space model that includes tags as well as page text and 2) a novel (...)
  46. Christopher D. Manning, Disambiguating “DE” for Chinese-English Machine Translation.
    Linking constructions involving 的 (DE) are ubiquitous in Chinese, and can be translated into English in many different ways. This is a major source of machine translation error, even when syntax-sensitive translation models are used. This paper explores how getting more information about the syntactic, semantic, and discourse context of uses of 的 (DE) can facilitate producing an appropriate English translation strategy. We describe a finer-grained classification of 的 (DE) constructions in Chinese NPs, construct a corpus of annotated examples, and (...)
  47. Christopher D. Manning, Efficient, Feature-Based, Conditional Random Field Parsing.
    Discriminative feature-based methods are widely used in natural language processing, but sentence parsing is still dominated by generative methods. While prior feature-based dynamic programming parsers have restricted training and evaluation to artificially short sentences, we present the first general, feature-rich discriminative parser, based on a conditional random field model, which has been successfully scaled to the full WSJ parsing data. Our efficiency is primarily due to the use of stochastic optimization techniques, as well as parallelization and chart prefiltering. On WSJ15, (...)
  48. Christopher D. Manning, Enforcing Transitivity in Coreference Resolution.
    A desirable quality of a coreference resolution system is the ability to handle transitivity constraints, such that even if it places high likelihood on a particular mention being coreferent with each of two other mentions, it will also consider the likelihood of those two mentions being coreferent when making a final assignment. This is exactly the kind of constraint that integer linear programming (ILP) is ideal for, but, surprisingly, previous work applying ILP to coreference resolution has not encoded this type (...)
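The transitivity requirement described above has a standard integer-linear-programming encoding over pairwise coreference decisions; a sketch, where x_{ij} ∈ {0,1} says mentions i and j corefer and w_{ij} is an assumed pairwise score (symbols chosen for illustration, not taken from the paper):

```latex
\begin{aligned}
\max_{x}\;& \sum_{i<j} w_{ij}\, x_{ij} \\
\text{s.t.}\;& x_{ij} + x_{jk} - x_{ik} \le 1,\quad
              x_{ij} + x_{ik} - x_{jk} \le 1,\quad
              x_{ik} + x_{jk} - x_{ij} \le 1
              && \text{for all } i<j<k, \\
             & x_{ij} \in \{0,1\} && \text{for all } i<j.
\end{aligned}
```

Each triangle inequality rules out exactly one inconsistent pattern (two pairs linked, the third not), so any feasible solution corresponds to a valid partition of the mentions into entities.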
  49. Christopher D. Manning, Modeling Semantic Containment and Exclusion in Natural Language Inference.
    We propose an approach to natural language inference based on a model of natural logic, which identifies valid inferences by their lexical and syntactic features, without full semantic interpretation. We greatly extend past work in natural logic, which has focused solely on semantic containment and monotonicity, to incorporate both semantic exclusion and implicativity. Our system decomposes an inference problem into a sequence of atomic edits linking premise to hypothesis; predicts a lexical entailment relation for each edit using a statistical classifier; (...)
  50. Christopher D. Manning, Natural Logic for Textual Inference.
    This paper presents the first use of a computational model of natural logic—a system of logical inference which operates over natural language—for textual inference. Most current approaches to the PASCAL RTE textual inference task achieve robustness by sacrificing semantic precision; while broadly effective, they are easily confounded by ubiquitous inferences involving monotonicity. At the other extreme, systems which rely on first-order logic and theorem proving are precise, but excessively brittle. This work aims at a middle way. Our system finds (...)