Comparing Methods for Single Paragraph Similarity Analysis

Topics in Cognitive Science 3 (1):92-122 (2011)
The focus of this paper is twofold. First, similarities generated by six semantic models were compared with human ratings of paragraph similarity on two datasets: 23 World Entertainment News Network paragraphs and 50 ABC newswire paragraphs. Contrary to findings on smaller textual units such as word associations (Griffiths, Tenenbaum, & Steyvers, 2007), our results suggest that when single paragraphs are compared, simple nonreductive models (word overlap and vector space) can provide better similarity estimates than more complex models (LSA, Topic Model, SpNMF, and CSM). Second, various methods of corpus creation were explored to improve the semantic models' similarity estimates. Removing numbers and single characters, as well as truncating document length, improved performance. Automated construction of smaller Wikipedia-based corpora proved very effective, even outperforming corpora that had been chosen for the domain. Model performance was further improved by augmenting corpora with the dataset paragraphs.
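To make the two nonreductive baselines and the preprocessing steps concrete, here is a minimal Python sketch. The abstract does not give the exact formulas, so several choices here are assumptions rather than the authors' implementation: a simple regex tokenizer, word overlap measured as Jaccard similarity over word types, the vector space model taken as cosine similarity over raw term-frequency vectors (no weighting), and a hypothetical `max_tokens` parameter standing in for document truncation. The reductive models (LSA, Topic Model, SpNMF, CSM) are not shown.

```python
import math
import re
from collections import Counter


def preprocess(text, max_tokens=None):
    # Assumed tokenizer: lowercase, then take runs of ASCII letters/digits.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Drop purely numeric tokens and single-character tokens.
    tokens = [t for t in tokens if not t.isdigit() and len(t) > 1]
    # Optionally keep only the first max_tokens tokens (hypothetical truncation parameter).
    if max_tokens is not None:
        tokens = tokens[:max_tokens]
    return tokens


def word_overlap(a, b):
    # Assumed overlap measure: Jaccard similarity of word types, |A ∩ B| / |A ∪ B|.
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def cosine_similarity(a, b):
    # Assumed vector space measure: cosine between raw term-frequency vectors.
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


p1 = preprocess("The 3 ministers met in Canberra on Monday.")
p2 = preprocess("Ministers met again in Canberra.")
overlap = word_overlap(p1, p2)
cosine = cosine_similarity(p1, p2)
print(overlap, round(cosine, 3))
```

In this toy pair the token "3" is removed by preprocessing, four of the eight distinct words are shared (overlap 0.5), and the cosine is 4/√35. Either score would then be correlated with human paragraph-similarity ratings, which is how the abstract describes the models being compared.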
Keywords: Wikipedia corpora; Corpus preprocessing; Corpus construction; Semantic models; Paragraph similarity
DOI 10.1111/j.1756-8765.2010.01108.x