Topics in Cognitive Science 3 (1):92-122 (2011)
The focus of this paper is two-fold. First, similarities generated from six semantic models were compared to human ratings of paragraph similarity on two datasets: 23 World Entertainment News Network paragraphs and 50 ABC newswire paragraphs. Contrary to findings on smaller textual units such as word associations (Griffiths, Tenenbaum, & Steyvers, 2007), our results suggest that when single paragraphs are compared, simple nonreductive models (word overlap and vector space) can provide better similarity estimates than more complex models (LSA, Topic Model, SpNMF, and CSM). Second, various methods of corpus creation were explored to facilitate the semantic models’ similarity estimates. Removing numeric and single characters and truncating document length both improved performance. Automated construction of smaller Wikipedia-based corpora proved to be very effective, even improving upon the performance of corpora that had been chosen for the domain. Model performance was further improved by augmenting corpora with dataset paragraphs.
|Keywords|Wikipedia corpora; Corpus preprocessing; Corpus construction; Semantic models; Paragraph similarity|
References found in this work
A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge. Thomas K. Landauer & Susan T. Dumais - 1997 - Psychological Review 104 (2):211-240.
Topics in Semantic Representation. Thomas L. Griffiths, Mark Steyvers & Joshua B. Tenenbaum - 2007 - Psychological Review 114 (2):211-244.
Sequential Sampling Models of Human Text Classification. Michael D. Lee & Elissa Y. Corlett - 2003 - Cognitive Science 27 (2):159-193.
Added to index: 2010-08-19