Abstract
Text categorization is a key task in information retrieval and natural language processing. Attaching a reliability measure to the assignment of a document to a category can improve the recognition rate and tell the user how much confidence to place in the output. A novel reliability measure is proposed, derived from the outputs of different binary classifiers run in the Error-Correcting Output Codes (ECOC) framework. A document assigned to a category is considered more reliably classified the larger the ECOC-computed distance separating that category from the next-ranked one. This is the main idea behind the proposed ECOC-based text classifier with a reject option. Experiments on several commonly used text categorization benchmark datasets demonstrate the potential of the proposed method.
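The idea of rejecting a document when the best and next-ranked categories are too close can be illustrated with a minimal sketch. It is not the authors' implementation: the Hamming-distance decoding, the toy code matrix, the function name ecoc_decode_with_reject and the threshold parameter are all assumptions made for illustration, since the abstract does not specify how the distance or the rejection rule is computed.

```python
def hamming(a, b):
    """Number of positions where two equal-length bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def ecoc_decode_with_reject(code_matrix, bits, threshold):
    """Decode binary-classifier outputs against class codewords.

    code_matrix: one codeword (list of 0/1) per category (hypothetical layout).
    bits: outputs of the binary classifiers for one document.
    threshold: assumed minimum margin between the two closest codewords.
    Returns (label, reliability); label is None when the document is rejected.
    """
    distances = [hamming(row, bits) for row in code_matrix]
    # Rank categories from closest to farthest codeword.
    ranked = sorted(range(len(distances)), key=lambda k: distances[k])
    best, runner_up = ranked[0], ranked[1]
    # Assumed reliability: gap between the next-ranked and the best distance.
    reliability = distances[runner_up] - distances[best]
    label = best if reliability >= threshold else None
    return label, reliability


# Toy 3-category, 6-bit code matrix (illustrative only).
CODE_MATRIX = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
]

confident = ecoc_decode_with_reject(CODE_MATRIX, [1, 1, 1, 0, 0, 0], threshold=2)
ambiguous = ecoc_decode_with_reject(CODE_MATRIX, [1, 1, 0, 0, 0, 0], threshold=2)
```

In this toy setting the first bit string matches a codeword exactly and is accepted, while the second lies almost as close to another codeword and is rejected. Raising the threshold trades coverage for fewer accepted errors.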
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Armano, G., Chira, C., Hatami, N. (2012). Ensemble of Binary Learners for Reliable Text Categorization with a Reject Option. In: Corchado, E., Snášel, V., Abraham, A., Woźniak, M., Graña, M., Cho, SB. (eds) Hybrid Artificial Intelligent Systems. HAIS 2012. Lecture Notes in Computer Science, vol 7208. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28942-2_13
DOI: https://doi.org/10.1007/978-3-642-28942-2_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28941-5
Online ISBN: 978-3-642-28942-2
eBook Packages: Computer Science, Computer Science (R0)