I examine what would be necessary to move part-of-speech tagging performance from its current level of about 97.3% token accuracy (56% sentence accuracy) to close to 100% accuracy. I suggest that it must still be possible to greatly increase tagging performance and examine some useful improvements that have recently been made to the Stanford Part-of-Speech Tagger. However, an error analysis of some of the remaining errors suggests that there is limited further mileage to be had either from better machine learning or better features in a discriminative sequence classifier. The prospects for further gains from semisupervised learning also seem quite limited. Rather, I suggest and begin to demonstrate that the largest opportunity for further progress comes from improving the taxonomic basis of the linguistic resources from which taggers are trained. That is, from improved descriptive linguistics. However, I conclude by suggesting that there are also limits to this process. The status of some words may not be able to be adequately captured by assigning them to one of a small number of categories. While conventions can be used in such cases to improve tagging consistency, they lack a strong linguistic basis.
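The gap between token accuracy and sentence accuracy quoted above follows from how the two metrics are counted: a sentence is correct only if every one of its tokens is tagged correctly. The sketch below is purely illustrative and is not taken from the paper or from the Stanford tagger; the function name `tagging_accuracy` and the toy tag sequences are assumptions made for this example.

```python
from typing import List, Tuple


def tagging_accuracy(
    gold: List[List[str]], pred: List[List[str]]
) -> Tuple[float, float]:
    """Return (token accuracy, sentence accuracy) for parallel tag sequences.

    Hypothetical helper for illustration only.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")

    total_tokens = 0
    correct_tokens = 0
    correct_sentences = 0
    for gold_tags, pred_tags in zip(gold, pred):
        if len(gold_tags) != len(pred_tags):
            raise ValueError("each sentence pair must have the same length")
        matches = sum(g == p for g, p in zip(gold_tags, pred_tags))
        total_tokens += len(gold_tags)
        correct_tokens += matches
        # A sentence counts as correct only when every token matches.
        if matches == len(gold_tags):
            correct_sentences += 1

    return correct_tokens / total_tokens, correct_sentences / len(gold)


# Toy data (invented for this sketch): one wrong tag out of seven tokens.
gold = [["DT", "NN", "VBZ"], ["PRP", "VBD", "DT", "NN"]]
pred = [["DT", "NN", "VBZ"], ["PRP", "VBN", "DT", "NN"]]
token_acc, sentence_acc = tagging_accuracy(gold, pred)
```

In this toy case a single tagging error leaves token accuracy at 6/7 but halves sentence accuracy, which is the same effect, at a smaller scale, as the contrast between the two figures in the abstract.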
Added to index: 2011-02-10