Clustering the Tagged Web


Abstract
Automatically clustering web pages into semantic groups promises improved search and browsing on the web. In this paper, we demonstrate how user-generated tags from large-scale social bookmarking websites such as del.icio.us can be used as a complementary data source to page text and anchor text for improving automatic clustering of web pages. This paper explores the use of tags in 1) K-means clustering in an extended vector space model that includes tags as well as page text and 2) a novel generative clustering algorithm based on latent Dirichlet allocation that jointly models text and tags. We evaluate the models by comparing their output to an established web directory. We find that the naive inclusion of tagging data improves cluster quality versus page text alone, but a more principled inclusion can substantially improve the quality of all models, with a statistically significant absolute F-score increase of 4%. The generative model outperforms K-means with another 8% F-score increase.
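
To make the first approach concrete, here is a minimal sketch of K-means over an extended vector space that holds both page words and tags. The abstract does not say how tags and words are weighted against each other, so normalizing each part separately before concatenating is only an assumption, not the paper's method. The toy pages, the helper names (`tf_vector`, `combined_vector`, `kmeans`), and the deterministic seeding are likewise hypothetical and chosen for illustration.

```python
import math
from collections import Counter

def tf_vector(tokens, vocab):
    """Term-frequency vector over a fixed vocabulary, L2-normalized."""
    counts = Counter(tokens)
    vec = [float(counts[t]) for t in vocab]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec

def combined_vector(words, tags, word_vocab, tag_vocab):
    """Concatenate separately normalized word and tag vectors.

    Assumption: giving words and tags equal total weight is one possible
    scheme; the abstract does not specify the actual weighting.
    """
    return tf_vector(words, word_vocab) + tf_vector(tags, tag_vocab)

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy, made-up pages: (page-text tokens, user tags).
pages = [
    ("python code tutorial loops".split(), "programming python".split()),
    ("pasta recipe tomato sauce".split(), "cooking recipes".split()),
    ("java code compiler guide".split(), "programming java".split()),
    ("bread recipe oven flour".split(), "cooking baking".split()),
]
word_vocab = sorted({w for words, _ in pages for w in words})
tag_vocab = sorted({t for _, tags in pages for t in tags})
vectors = [combined_vector(w, t, word_vocab, tag_vocab) for w, t in pages]
labels = kmeans(vectors, k=2)
print(labels)
```

In this toy data the shared tags ("programming", "cooking") pull pages together more strongly than the single shared word in each pair does, which is the intuition behind treating tags as a complementary signal to page text.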

Similar books and articles

Agency and the Semantic Web. Christopher Walton - 2006 - Oxford University Press.
The Web-Extended Mind. Paul R. Smart - 2012 - Metaphilosophy 43 (4):446-463.
Sense and Reference on the Web. Harry Halpin - 2011 - Minds and Machines 21 (2):153-178.
Defining Web Ethics. Marsha Woodbury - 1998 - Science and Engineering Ethics 4 (2):203-212.
Toward a Philosophy of The Web. Alexandre Monnin & Harry Halpin - 2012 - Metaphilosophy 43 (4):361-379.

Analytics

Added to PP index
2010-12-22
