Topics in Cognitive Science 3 (1):92-122 (2011)
|Abstract||The focus of this paper is twofold. First, similarities generated from six semantic models were compared to human ratings of paragraph similarity on two datasets: 23 World Entertainment News Network paragraphs and 50 ABC newswire paragraphs. Contrary to findings on smaller textual units such as word associations (Griffiths, Tenenbaum, & Steyvers, 2007), our results suggest that when single paragraphs are compared, simple nonreductive models (word overlap and vector space) can provide better similarity estimates than more complex models (LSA, Topic Model, SpNMF, and CSM). Second, various methods of corpus creation were explored to facilitate the semantic models’ similarity estimates. Removing numeric and single characters and truncating document length both improved performance. Automated construction of smaller Wikipedia-based corpora proved to be very effective, even improving upon the performance of corpora that had been chosen for the domain. Model performance was further improved by augmenting corpora with dataset paragraphs.|
Similar books and articles
Joshua B. Tenenbaum & Thomas L. Griffiths (2001). Generalization, Similarity, and Bayesian Inference. Behavioral and Brain Sciences 24 (4):629-640.
Christopher Gauker (2007). A Critique of the Similarity Space Theory of Concepts. Mind and Language 22 (4):317–345.
Daniel Schoch (2001). Dimensional Characterization in Finite Quasi-Analysis. Erkenntnis 54 (1):121-131.
Jeff Mitchell & Mirella Lapata (2010). Composition in Distributional Models of Semantics. Cognitive Science 34 (8):1388-1429.
Dedre Gentner (2001). Exhuming Similarity. Behavioral and Brain Sciences 24 (4):669-669.
Emmanuel M. Pothos (2005). The Rules Versus Similarity Distinction. Behavioral and Brain Sciences 28 (1):1-14.
Added to index: 2010-08-19