Published by De Gruyter Mouton, July 20, 2023

An integrated explicit and implicit offensive language taxonomy

  • Barbara Lewandowska-Tomaszczyk
  • Anna Bączkowska
  • Chaya Liebeskind
  • Giedre Valunaite Oleskeviciene
  • Slavko Žitnik
From the journal Lodz Papers in Pragmatics

Abstract

The current study presents an integrated model of explicit and implicit offensive language taxonomy. First, it undertakes a definitional revision and enrichment of the explicit offensive language taxonomy by reviewing the collection of available corpora and comparing the tagging schemas applied there. The study relies mainly on the offensive language categorization schema originally proposed by Zampieri et al. (2019). After an explanation of the semantic differences between particular concepts used in the tagging systems and an analysis of the theoretical frameworks, a finite set of classes is presented which covers aspects of offensive language representation, along with linguistically sound explanations (Lewandowska-Tomaszczyk et al. 2021). In the analytic procedure, offensive discourse is first distinguished from non-offensive discourse, followed by the question of the offence Target and the subsequent categorization levels and sublevels. Based on relevant data generated from Sketch Engine (https://www.sketchengine.eu/ententen-english-corpus/), we propose the concept of offensive language as the superordinate category in our system, with 17 hierarchically arranged subcategories. The categories are taxonomically structured into four levels and verified with the use of neural-based (lexical) embeddings. Together with a taxonomy of implicit offensive language and its subcategorization levels, which has received little scholarly attention until now, the categorization is exemplified in samples of offensive discourse in selected English social media materials, i.e., 25 publicly available web-based hate speech datasets (see Appendix 1 for a complete list). The offensive category levels (types of offence, targets, etc.) and aspects (offensive language property clusters), as well as the categories of explicitness and implicitness, are discussed, and a computationally verified integrated explicit and implicit offensive language taxonomy is proposed.
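The embedding-based verification mentioned in the abstract can be illustrated with a toy sketch: a proposed subcategory label should sit closer, in embedding space, to its superordinate category than an unrelated control word does. The vectors below are hand-picked stand-ins, not trained embeddings, and the labels are illustrative only; in practice one would use pretrained word vectors for the actual category labels.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Illustrative 3-d "embeddings" for a superordinate label and candidate
# subcategory labels (hypothetical values, not real trained vectors).
vectors = {
    "abuse":   np.array([0.9, 0.1, 0.0]),   # superordinate label
    "insult":  np.array([0.8, 0.2, 0.1]),   # plausible subcategory
    "slur":    np.array([0.7, 0.3, 0.0]),   # plausible subcategory
    "weather": np.array([0.0, 0.2, 0.9]),   # unrelated control word
}

# A well-placed subcategory scores higher against its superordinate
# than the control word does.
for label in ("insult", "slur", "weather"):
    print(label, round(cosine(vectors["abuse"], vectors[label]), 3))
```

With real embeddings the same comparison can be run over every parent-child pair in the taxonomy to flag subcategories that do not cluster with their proposed superordinate.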

About the authors

Barbara Lewandowska-Tomaszczyk

Barbara Lewandowska-Tomaszczyk is Professor Ordinarius Dr Habil. in Linguistics and English Language at the Department of Language and Communication at the University of Applied Sciences in Konin (Poland). Her research focuses on cognitive semantics and the pragmatics of language contrasts, corpus linguistics, and their applications in translation studies, lexicography and online discourse analysis. She is regularly invited to read papers at international conferences and to lecture and conduct seminars at universities. She publishes extensively, supervises dissertations, and organizes international conferences and workshops.

Anna Bączkowska

Anna Bączkowska, Dr Habil. Prof. UG, holds an MA in English Philology from Adam Mickiewicz University in Poznań, as well as a PhD in Linguistics and a D.Litt. in English Linguistics from the University of Łódź. Her research interests revolve around translation studies (film subtitles), cognitive semantics, corpus and computational linguistics, and discourse studies (media discourse). She has guest-lectured in Italy, Spain, Portugal, the UK, Norway, Kazakhstan and Slovakia, and has conducted research during scientific stays in Ireland, Iceland, Norway, Austria and Luxembourg.

Chaya Liebeskind

Chaya Liebeskind is a lecturer and researcher in the Department of Computer Science at the Jerusalem College of Technology. Her research interests span both natural language processing and data mining. In particular, they include semantic similarity, language technology for cultural heritage, morphologically rich languages (MRL), multi-word expressions (MWEs), information retrieval (IR) and text classification (TC). Much of her recent work has focused on analysing offensive language. She has published a variety of studies, and several of her articles are under review or in preparation. She is a member of several international research actions funded by the EU.

Giedre Valunaite Oleskeviciene

Giedrė Valūnaitė Oleškevičienė is Vice-Dean for Scientific Research of the Faculty of Public Governance and Business and a professor at the Institute of Humanities, Mykolas Romeris University. Her interests in the humanities include discourse analysis, professional English, legal English, linguistics and translation research; in the social sciences, they include social research methodology, modern education, philosophical issues, creativity development in the modern education system, and second language teaching and learning. She has coordinated international research projects funded by the EU, publishes scientific articles, and presents at scientific conferences.

Slavko Žitnik

Slavko Žitnik is Assistant Professor and Vice-Dean for Education at the Faculty of Computer and Information Science, University of Ljubljana. His research focuses on natural language processing, information extraction, databases, semantic technologies and information systems. He actively collaborates with Université Paris 1 Sorbonne, Harvard University, the University of South Florida and the University of Belgrade. He is engaged in multiple research and professional projects. As chairman of the Slovenian Language Technologies Society, he organizes lectures on language technologies and provides grants for students to attend summer schools. He is also chairman of the Slovene Society INFORMATIKA, organizes national conferences on informatics, and is the editor of a scientific journal.

Acknowledgements

The present study has been conducted within Use Case WG 4.1.1 "Incivility in Media and Social Media" of COST Action CA18209, "European network for Web-centred linguistic data science" (NexusLinguarum).

Appendix 1

English datasets used in the present study

Types:

Level A (offensive vs. non-offensive)

Level B Offensive (subtypes)

Level C (implicit vs. explicit)

Level D (morphosyntactic features)

Size indicates the number of posts in a dataset.
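The four annotation levels above can be read as fields of a single record: a post is first judged offensive or not (Level A), then subtyped (Level B), marked implicit or explicit (Level C), and optionally described morphosyntactically (Level D). A minimal sketch of such a record follows; the field names and example values are our own illustrations, not taken from any of the listed datasets.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class AnnotatedPost:
    text: str
    offensive: bool                       # Level A: offensive vs. non-offensive
    subtype: Optional[str] = None         # Level B: e.g. "insult", "threat"
    implicit: Optional[bool] = None       # Level C: implicit vs. explicit offence
    morphosyntax: List[str] = field(default_factory=list)  # Level D features

# Hypothetical annotated example: offence carried implicitly,
# with no explicit slur or profanity present.
post = AnnotatedPost(
    text="You people never learn.",
    offensive=True,
    subtype="insult",
    implicit=True,
    morphosyntax=["2nd-person pronoun", "generic 'you people'"],
)
print(post.offensive, post.subtype, post.implicit)
```

Only datasets tagged with the corresponding type letter in the table below supply annotations for that field; the remaining fields stay at their defaults.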

Project | Source | Size | Tags | Reference | Type
Automated Hate Speech Detection and the Problem of Offensive Language | Twitter | 24 802 | Hierarchy (Hate, Offensive, Neither) | Davidson, T., Warmsley, D., Macy, M. and Weber, I., 2017. Automated Hate Speech Detection and the Problem of Offensive Language. ArXiv. | A, B
Hate Speech Dataset from a White Supremacy Forum | Stormfront (forum) | 9 916 | Ternary (Hate, Relation, Not) | de Gibert, O., Perez, N., García-Pablos, A. and Cuadros, M., 2018. Hate Speech Dataset from a White Supremacy Forum. In: Proceedings of the 2nd Workshop on Abusive Language Online (ALW2). Brussels, Belgium: Association for Computational Linguistics, pp. 11-20. | A
Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter | Twitter | 16 914 | 3-topic (Sexist, Racist, Not) | Waseem, Z. and Hovy, D., 2016. Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter. In: Proceedings of the NAACL Student Research Workshop. San Diego, California: Association for Computational Linguistics, pp. 88-93. | A
Detecting Online Hate Speech Using Context Aware Models | Fox News posts | 1 528 | Binary (Hate/Not) | Gao, L. and Huang, R., 2018. Detecting Online Hate Speech Using Context Aware Models. ArXiv. | A
Are You a Racist or Am I Seeing Things? Annotator Influence on Hate Speech Detection on Twitter | Twitter | 4 033 | Multi-topic (Sexist, Racist, Neither, Both) | Waseem, Z., 2016. Are You a Racist or Am I Seeing Things? Annotator Influence on Hate Speech Detection on Twitter. In: Proceedings of the 2016 EMNLP Workshop on Natural Language Processing and Computational Social Science. Copenhagen, Denmark: Association for Computational Linguistics, pp. 138-142. | A
When Does a Compliment Become Sexist? Analysis and Classification of Ambivalent Sexism Using Twitter Data | Twitter | 712 | Hierarchy of sexism (Benevolent sexism, Hostile sexism, None) | Jha, A. and Mamidi, R., 2017. When does a Compliment become Sexist? Analysis and Classification of Ambivalent Sexism using Twitter Data. In: Proceedings of the Second Workshop on Natural Language Processing and Computational Social Science. Vancouver, Canada: Association for Computational Linguistics, pp. 7-16. | A
Overview of the Task on Automatic Misogyny Identification at IberEval 2018 | Twitter | 3 977 | Binary (misogyny/not); 5 categories (stereotype, dominance, derailing, sexual harassment, discredit); target of misogyny (active or passive) | Fersini, E., Rosso, P. and Anzovino, M., 2018. Overview of the Task on Automatic Misogyny Identification at IberEval 2018. In: Proceedings of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018). | A
CONAN - COunter NArratives through Nichesourcing: a Multilingual Dataset of Responses to Fight Online Hate Speech | Synthetic / Facebook posts | 1 288 | Binary (Islamophobic/not); multi-topic (Culture, Economics, Crimes, Rapism, Terrorism, Women Oppression, History, Other/generic) | Chung, Y., Kuzmenko, E., Tekiroglu, S. and Guerini, M., 2019. CONAN - COunter NArratives through Nichesourcing: A Multilingual Dataset of Responses to Fight Online Hate Speech. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy: Association for Computational Linguistics, pp. 2819-2829. | A
Characterizing and Detecting Hateful Users on Twitter | Twitter | 4 972 | Binary (Hateful/Not) | Ribeiro, M., Calais, P., Santos, Y., Almeida, V. and Meira, W., 2018. Characterizing and Detecting Hateful Users on Twitter. ArXiv. | A
A Benchmark Dataset for Learning to Intervene in Online Hate Speech | Gab posts | 33 776 | Binary (Hateful/Not) | Qian, J., Bethke, A., Belding, E. and Yang Wang, W., 2019. A Benchmark Dataset for Learning to Intervene in Online Hate Speech. ArXiv. | A
A Benchmark Dataset for Learning to Intervene in Online Hate Speech | Reddit | 22 324 | Binary (Hateful/Not) | Qian, J., Bethke, A., Belding, E. and Yang Wang, W., 2019. A Benchmark Dataset for Learning to Intervene in Online Hate Speech. ArXiv. | A
Multilingual and Multi-Aspect Hate Speech Analysis | Twitter | 5 647 | Hostility, Directness, Target attribute and Target group | Ousidhoum, N., Lin, Z., Zhang, H., Song, Y. and Yeung, D., 2019. Multilingual and Multi-Aspect Hate Speech Analysis. ArXiv. | A, B, C
Exploring Hate Speech Detection in Multimodal Publications | Twitter | 149 823 | Six primary categories (No attacks to any community, Racist, Sexist, Homophobic, Religion-based attack, Attack to other community) | Gomez, R., Gibert, J., Gomez, L. and Karatzas, D., 2019. Exploring Hate Speech Detection in Multimodal Publications. ArXiv. | A
Predicting the Type and Target of Offensive Posts in Social Media | Twitter | 14 100 | Branching structure of tasks: Binary (Offensive, Not); within Offensive (Target, Not); within Target (Individual, Group, Other) | Zampieri, M., Malmasi, S., Nakov, P., Rosenthal, S., Farra, N. and Kumar, R., 2019. SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media (OffensEval). ArXiv. | A, C
hatEval, SemEval-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter | Twitter | 13 000 | Branching structure of tasks: Binary (Hate, Not); within Hate (Group, Individual); within Hate (Aggressive, Not) | Basile, V., Bosco, C., Fersini, E., Nozza, D., Patti, V., Pardo, F., Rosso, P. and Sanguinetti, M., 2019. SemEval-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter. In: Proceedings of the 13th International Workshop on Semantic Evaluation. Minneapolis, Minnesota: Association for Computational Linguistics, pp. 54-63. | A, B, C
Peer to Peer Hate: Hate Speech Instigators and Their Targets | Twitter | 27 330 | Binary (Hate/Not), only for tweets which have both a hate instigator and a hate target | ElSherief, M., Nilizadeh, S., Nguyen, D., Vigna, G. and Belding, E., 2018. Peer to Peer Hate: Hate Speech Instigators and Their Targets. In: Proceedings of the Twelfth International AAAI Conference on Web and Social Media (ICWSM 2018). Santa Barbara, California: University of California, pp. 52-61. | A
Overview of the HASOC track at FIRE 2019: Hate Speech and Offensive Content Identification in Indo-European Languages | Twitter and Facebook | 7 005 | Branching structure of tasks. A: Hate/Offensive or Neither; B: Hate Speech, Offensive, or Profane; C: Targeted or Untargeted | Modha, S., Mandl, T., Majumder, P. and Patel, D., 2019. Overview of the HASOC track at FIRE 2019. In: Proceedings of the 11th Forum for Information Retrieval Evaluation. | A, B
Large Scale Crowdsourcing and Characterization of Twitter Abusive Behavior | Twitter | 80 000 | Multi-thematic (Abusive, Hateful, Normal, Spam) | Founta, A., Djouvas, C., Chatzakou, D., Leontiadis, I., Blackburn, J., Stringhini, G., Vakali, A., Sirivianos, M. and Kourtellis, N., 2018. Large Scale Crowdsourcing and Characterization of Twitter Abusive Behavior. ArXiv. | A, B
A Large Labeled Corpus for Online Harassment Research | Twitter | 35 000 | Binary (Harassment, Not) | Golbeck, J., Ashktorab, Z., Banjo, R., Berlinger, A., Bhagwan, S., Buntain, C., Cheakalos, P., Geller, A., Gergory, Q., Gnanasekaran, R., Gnanasekaran, R., Hoffman, K., Hottle, J., Jienjitlert, V., Khare, S., Lau, R., Martindale, M., Naik, S., Nixon, H., Ramachandran, P., Rogers, K., Rogers, L., Sarin, M., Shahane, G., Thanki, J., Vengataraman, P., Wan, Z. and Wu, D., 2017. A Large Labeled Corpus for Online Harassment Research. In: Proceedings of the 2017 ACM on Web Science Conference. New York: Association for Computing Machinery, pp. 229-233. | A
Ex Machina: Personal Attacks Seen at Scale, Personal attacks | Wikipedia posts | 115 737 | Binary (Personal attack, Not) | Wulczyn, E., Thain, N. and Dixon, L., 2017. Ex Machina: Personal Attacks Seen at Scale. ArXiv. | A
Ex Machina: Personal Attacks Seen at Scale, Toxicity | Wikipedia posts | 100 000 | Toxicity/healthiness judgement (very toxic, neutral, very healthy) | Wulczyn, E., Thain, N. and Dixon, L., 2017. Ex Machina: Personal Attacks Seen at Scale. ArXiv. | A
Detecting cyberbullying in online communities | World of Warcraft posts | 16 975 | Binary (Harassment, Not) | Bretschneider, U. and Peters, R., 2016. Detecting Cyberbullying in Online Communities. Research Papers, 61. | A
Detecting cyberbullying in online communities | League of Legends posts | 17 354 | Binary (Harassment, Not) | Bretschneider, U. and Peters, R., 2016. Detecting Cyberbullying in Online Communities. Research Papers, 61. | A
A Quality Type-aware Annotated Corpus and Lexicon for Harassment Research | Twitter | 24 189 | Multi-topic harassment detection | Rezvan, M., Shekarpour, S., Balasuriya, L., Thirunarayan, K., Shalin, V. and Sheth, A., 2018. A Quality Type-aware Annotated Corpus and Lexicon for Harassment Research. ArXiv. | A
Ex Machina: Personal Attacks Seen at Scale, Aggression and Friendliness | Wikipedia posts | 160 000 | Aggression/friendliness judgement on a 5-point scale (very aggressive, neutral, very friendly) | Wulczyn, E., Thain, N. and Dixon, L., 2017. Ex Machina: Personal Attacks Seen at Scale. ArXiv. | A, B
OffensEval 2019, OffensEval 2020 | Twitter, except for Danish: Facebook, Reddit, and comments in a local newspaper, Ekstra Bladet | over nine million; 10 000; 3 600; 10 287; 35 000 | Three-level hierarchy. Level A - Offensive Language Detection: NOT (content that is neither offensive nor profane), OFF (content containing inappropriate language, insults, or threats). Level B - Categorization of Offensive Language: TIN (targeted insult or threat towards a group or an individual), UNT (text containing untargeted profanity or swearing). Level C - Offensive Language Target Identification: IND (the target is an individual explicitly or implicitly mentioned in the conversation), GRP (hate speech targeting a group of people based on ethnicity, gender, sexual orientation, religious belief, or other common characteristic), OTH (targets that do not fall into any of the previous categories, e.g., organizations, events, and issues). | Zampieri, M., Nakov, P., Rosenthal, S., Atanasova, P., Karadzhov, G., Mubarak, H., ... and Çöltekin, Ç., 2020. SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020). ArXiv preprint arXiv:2006.07235. | A, C, D
Illegal is not a Noun: Linguistic Form for Detection of Pejorative Nominalizations | Twitter, Reddit, news articles and interviews, political debates, and video and written blogs | 56 237 | Four target adjectives (Illegal, Female, Gay and Poor); two categories: linguistic form and pejorative meaning | Palmer, A., Robinson, M. and Phillips, K. K., 2017. Illegal is not a noun: Linguistic form for detection of pejorative nominalizations. In: Proceedings of the First Workshop on Abusive Language Online, pp. 91-100. | D
Detecting and Monitoring Hate Speech in Twitter | Twitter | 6 000 | Binary | Pereira-Kohatsu, J. C., Quijano-Sánchez, L., Liberatore, F. and Camacho-Collados, M., 2019. Detecting and monitoring hate speech in Twitter. Sensors, 19(21), 4654. | A
HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection | Twitter and Gab | 9 055; 11 093 | 3-class classification (hate, offensive or normal); the target community (the community that has been the victim of hate/offensive speech in the post); and the rationales (the portions of the post on which the labelling decision is based) | Mathew, B., Saha, P., Yimam, S. M., Biemann, C., Goyal, P. and Mukherjee, A., 2020. HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection. ArXiv preprint arXiv:2012.10289. | A, B, C
Automatic detection of cyberbullying in social media text | Social networking site ASKfm | 113 698; 78 387 | Four roles are distinguished in the annotation scheme (victim, bully, and two types of bystanders), plus a number of textual categories often inherent to a cyberbullying event, such as threats, insults, defensive statements from a victim, encouragements to the harasser, etc. | Van Hee, C., Jacobs, G., Emmery, C., Desmet, B., Lefever, E., Verhoeven, B., ... and Hoste, V., 2018. Automatic detection of cyberbullying in social media text. PloS one, 13(10), e0203794. | A, B, C

References

Adams, Maurianne, Lee Anne Bell & Pat Griffin. 2007. Teaching for diversity and social justice. London: Routledge/Taylor & Francis Group10.4324/9780203940822Search in Google Scholar

Alhujailli, Ashraf, Waldemar Karwowski, Thomas Wan & Peter Hancock. 2020. Affective and stress consequences of cyberbullying. Symmetry 12.9. 153610.3390/sym12091536Search in Google Scholar

Allan, Keith. 2015. When is a slur not a slur? the use of nigger in ‘pulp fiction’. Language Sciences 52. 187–19910.1016/j.langsci.2015.03.001Search in Google Scholar

Allan, Keith & Kate Burridge. 2006. Forbidden Words: Taboo and the Censoring of Language. Cambridge: Cambridge University Press10.1017/CBO9780511617881Search in Google Scholar

Anderson, Luvell & Ernie Lepore. 2013. A brief essay on slurs. Alessandro Capone, Franco Lo Piparo & Marco Carapezza (eds.), Perspectives on Pragmatics and Philosophy, 507–514. Cham: Springer10.1007/978-3-319-01011-3_23Search in Google Scholar

Andersson, Lars-Gunnar & Peter Trudgill. 1990. Bad Language. London: Penguin Books LtdSearch in Google Scholar

Austin, James. 1962. How to do things with words. Oxford: Oxford University PressSearch in Google Scholar

Bach, Kent. 1994. Conversational implicature. Mind and Language 9. 124–16210.1111/j.1468-0017.1994.tb00220.xSearch in Google Scholar

Bach, Kent & Robert Harnish. 1979. Linguistic Communication and Speech Acts. Cambridge, MA: MIT PressSearch in Google Scholar

Baider, Fabienne & Monika Kopytowska. 2018. Narrating hostility, challenging hostile narratives. Lodz Papers in Pragmatics 14.1–2410.1515/lpp-2018-0001Search in Google Scholar

Bączkowska, Anna. 2022. Explicit and implicit offensiveness in dialogical film discourse in Bridgit Jones films. International Review of Pragmatics 14. 198–22510.1163/18773109-01402003Search in Google Scholar

Bączkowska, Anna, Barbara Lewandowska-Tomaszczyk, Slavko Žitnik, Chaya Liebeskind, Giedre Oleskeviciene Valunaite & Marcin Trojszczak. 2022. Implicit offensive language taxonomy and its application to automatic extraction and ontology. Paper presented at LLOD Approaches to Language Data Research and Management, Mykolas Romeris University in Vilnius, 21–22 SeptemberSearch in Google Scholar

Bhattacharya, Shiladitya, Siddharth Singh, Ritesh Kumar, Akanksha Bansal, Akash Bhagat, Yogesh Dawer, Bornini Lahiri & Atul Kr. Ojha, 2020. Developing a multilingual annotated corpus of misogyny and aggression. In Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying, 158–168, Marseille, France. European Language Resources Association (ELRA)Search in Google Scholar

Bretschneider, Uwe & Ralf Peters. 2016. Detecting cyberbullying in online communities. Research Papers, Paper 61Search in Google Scholar

Bretschneider, Uwe & Ralf Peters. 2017. Detecting offensive statements towards foreigners in social media. In Proceedings of the 50th Hawaii International Conference on System Sciences (HICSS), Hawaii, USA10.24251/HICSS.2017.268Search in Google Scholar

Brown, Penelope & Stephen, C. Levinson. 1987. Politeness: Some universals in language usage. Cambridge: Cambridge University Press10.1017/CBO9780511813085Search in Google Scholar

Cachola, Isabel, Eric Holgate, Daniel Preoţiuc-Pietro & Junyi Jessy Li. 2018. Expressively vulgar: The socio- dynamics of vulgarity and its effects on sentiment analysis in social media. In Proceedings of the 27th International Conference on Computational Linguistics, 2927–2938Search in Google Scholar

Carston, Robyn. 2009. The explicit/implicit distinction in pragmatics and the limits of explicit communication. International Review of Pragmatics 1(1). 35–6210.1163/187731009X455839Search in Google Scholar

Caselli, Tommaso, Valerio Basile, Jelena Mitrović & Michael Granitzer. 2021. HateBERT: Retraining BERT for abusive language detection in English. In Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021), 17–25. Association for Computational Linguistics10.18653/v1/2021.woah-1.3Search in Google Scholar

Cepollaro, Bianca. 2015. In defense of a presuppositional account of slurs. Language Sciences 52. 36–4510.1016/j.langsci.2014.11.004Search in Google Scholar

Chalkidis, Ilias, Manos Fergadiotis, Prodromos Malakasiotis, Nikolaos Aletras, & Ion Androutsopoulos. 2020. Legalbert: “preparing the muppets for court’”. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, 2898–2904Search in Google Scholar

Chandrasekharan, Eshwar, Mattia Samory, Shagun Jhaver, Hunter Charvat, Amy Bruckman, Clif Lampe, Jacob Eisenstein & Eric Gilbert. 2018. The internet’s hidden rules: An empirical study of reddit norm violations at micro, meso, and macro scales. Proceedings of the ACM on Human-Computer Interaction, 2(CSCW): 3210.1145/3274301Search in Google Scholar

Chung, Yi-Ling, Elizaveta Kuzmenko, Serra Sinem Tekiroglu, & Marco Guerini. 2019. CONAN - COunter NArratives through Nichesourcing: a multilingual dataset of responses to fight online hate speech. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2819–2829, Florence, Italy. Association for Computational Linguistics10.18653/v1/P19-1271Search in Google Scholar

Cousens, Chris. 2020. Are ableist insults secretly slurs? Language Sciences 77.10125210.1016/j.langsci.2019.101252Search in Google Scholar

Croom, Adam. 2011. Slurs. Language Sciences 33(3). 343–35810.1016/j.langsci.2010.11.005Search in Google Scholar

Croom, Adam. 2014. The semantics of slurs: A refutation of pure expressivism. Language Sciences 41. 227–24210.1016/j.langsci.2013.07.003Search in Google Scholar

Culpeper, Jonathan. 2005. Impoliteness and entertainment in the television quiz show: The weakest link. Journal of Politeness Research 1(1). 35–7210.1515/jplr.2005.1.1.35Search in Google Scholar

Culpeper, Jonathan. 2011. Impoliteness: Using language to cause offence. Cambridge: Cambridge University Press10.1017/CBO9780511975752Search in Google Scholar

Culpeper, Jonathan & Michael Haugh. 2014. Pragmatics and the English Language. London: Red Globe Press10.1007/978-1-137-39391-3Search in Google Scholar

Culpeper, Jonathan & Michael Haugh. 2021. The metalinguistics of offence in (British) English: a corpus-based metapragmatic approach. Journal of Language Aggression and Conflict 9(2). 185–21410.1075/jlac.00035.culSearch in Google Scholar

Davidson, Thomas, Dana Warmsley, Michael Macy & Ingmar Weber. 2017. Automated hate speech detection and the problem of offensive language. In Proceedings of the International AAAI Conference on Web and Social Media, 512–51510.1609/icwsm.v11i1.14955Search in Google Scholar

de Gibert, Ona, Naiara Perez, Aitor García-Pablos & Montse Cuadros. 2018. Hate Speech Dataset from a White Supremacy Forum. In Proceedings of the 2nd Workshop on Abusive Language Online (ALW2). 11–20, Brussels, Belgium. Association for Computational Linguistics10.18653/v1/W18-5102Search in Google Scholar

Eelen, Gino. 2014. A Critique of Politeness Theory: Volume 1. London: Routledge10.4324/9781315760179Search in Google Scholar

Erjavec, Karmen & Melita Poler Kovacic. 2012. “You don’t understand, this is a new war!” analysis of hate speech in news web sites’ comments. Mass Communication and Society 15. 899–92010.1080/15205436.2011.619679Search in Google Scholar

Founta, Antigoni-Maria, Djouvas, Constantinos, Chatzakou, Despoina, Leontiadis, Ilias, Blackburn, Jeremy, Stringhini, Gianluca, Vakali, Athena, Sirivianos, Michael & Nicolas Kourtellis. 2018a. Large scale crowdsourcing and characterization of twitter abusive behavior. In 11th International Conference on Web and Social Media, ICWSM 2018. AAAI PressSearch in Google Scholar

Founta, Antigoni-Maria, Constantinos Djouvas, Despoina Chatzakou, Ilias Leontiadis, Jeremy Blackburn, Gianluca Stringhini, Athena Vakali, Michael Sirivianos & Nicolas Kourtellis. 2018b. Large scale crowdsourcing and characterization of Twitter abusive behavior. In Proceedings of the Twelfth International AAAI Conference on Web and Social Media, 491–500. https://doi.org/10.1609/icwsm.v12i1.14991

Frege, Gottlob. 1956. I.—The thought: A logical inquiry. Mind 65. 289–311. https://doi.org/10.1093/mind/65.1.289

Gao, Lei & Ruihong Huang. 2017. Detecting online hate speech using context aware models. In Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, 260–266. Varna, Bulgaria: INCOMA Ltd. https://doi.org/10.26615/978-954-452-049-6_036

Goffman, Erving. 1955. On face-work: An analysis of ritual elements in social interaction. Psychiatry 18. 213–231. https://doi.org/10.1080/00332747.1955.11023008

Golbeck, Jennifer, Zahra Ashktorab, Rashad O. Banjo, Alexandra Berlinger, Siddharth Bhagwan, Cody Buntain, Paul Cheakalos, Alicia A. Geller, Quint Gergory, Rajesh Kumar Gnanasekaran, Raja R. Gunasekaran, Kelly M. Hoffman, Jenny Hottle, Vichita Jienjitlert, Shivika Khare, Ryan Lau, Marianna J. Martindale, Shalmali M. Naik, Heather L. Nixon, Riyush Ramachandran, Kristine M. Rogers, Lisa Rogers, Meghna S. Sarin, Gaurav Shahane, Jayanee Thanki, Priyanka Vengataraman, Zijian Wan & Derek Wu. 2017. A large labeled corpus for online harassment research. In Proceedings of the 2017 ACM on Web Science Conference (WebSci '17), 229–233. https://doi.org/10.1145/3091478.3091509

Gomez, Raul, Jaume Gibert, Lluis Gomez & Dimosthenis Karatzas. 2020. Exploring hate speech detection in multimodal publications. In 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), 1459–1467. https://doi.org/10.1109/WACV45572.2020.9093414

Goossens, Louis. 1990. Metaphtonymy: The interaction of metaphor and metonymy in expressions for linguistic actions. Cognitive Linguistics 1(3). 323–340. https://doi.org/10.1515/cogl.1990.1.3.323

Graumann, Carl Friedrich & Margret Wintermantel. 2015. Diskriminierende Sprechakte: Ein funktionaler Ansatz [Discriminatory speech acts: A functional approach]. In Steffen Herrmann, Sybille Krämer & Hannes Kuch (eds.), Verletzende Worte: Die Grammatik sprachlicher Missachtung, 147–178. Bielefeld: transcript Verlag. https://doi.org/10.1515/9783839405659-006

Grice, Paul H. 1968. Utterer's meaning, sentence meaning, and word-meaning. Foundations of Language 4. 225–242. https://doi.org/10.1007/978-94-009-2727-8_2

Grice, Paul H. 1989. Studies in the way of words. Cambridge, MA: Harvard University Press.

Haugh, Michael & Jonathan Culpeper. 2018. Integrative pragmatics and (im)politeness theory. In Pragmatics and its interfaces, 213–239. Amsterdam: John Benjamins. https://doi.org/10.1075/pbns.294.10hau

Haugh, Michael & Valerie Sinkeviciute. 2019. Offence and conflict talk. In Matthew Evans, Lesley Jeffries & Jim O'Driscoll (eds.), The Routledge Handbook of Language in Conflict, 196–214. London: Routledge. https://doi.org/10.4324/9780429058011-12

Hess, Leopold. 2020. Slurs and expressive commitments. Acta Analytica 36. 263–290. https://doi.org/10.1007/s12136-020-00445-x

Hom, Christopher. 2008. The semantics of racial epithets. The Journal of Philosophy 105. 416–440. https://doi.org/10.5840/jphil2008105834

Hornsby, Jennifer. 2001. Meaning and uselessness: How to think about derogatory words. Midwest Studies in Philosophy 25(1). 128–141. https://doi.org/10.1111/1475-4975.00042

Hudson, David L. 2012. The First Amendment: Freedom of speech. West, a Thomson Reuters business.

Jeshion, Robin. 2013. Expressivism and the offensiveness of slurs. Philosophical Perspectives 27. 231–259. https://doi.org/10.1111/phpe.12027

Jha, Akshita & Radhika Mamidi. 2017. When does a compliment become sexist? Analysis and classification of ambivalent sexism using Twitter data. In Proceedings of the Second Workshop on NLP and Computational Social Science, 7–16. https://doi.org/10.18653/v1/W17-2902

Jigsaw & Google. 2018. Toxic Comment Classification Challenge.

Jucker, Andreas H. 2000. Slanders, slurs and insults on the road to Canterbury: Forms of verbal aggression in Chaucer's Canterbury Tales. In Irma Taavitsainen, Terttu Nevalainen, Päivi Pahta & Matti Rissanen (eds.), Placing Middle English in Context, 369–389. Berlin & New York: Mouton de Gruyter. https://doi.org/10.1515/9783110869514.369

Kampf, Zohar. 2015. The politics of being insulted: The uses of hurt feelings in Israeli public discourse. Journal of Language Aggression and Conflict 3(1). 107–127. https://doi.org/10.1075/jlac.3.1.05kam

Kecskes, Istvan. 2017. Implicitness in the use of situation bound utterances. In Piotr Cap & Marta Dynel (eds.), Implicitness: From Lexis to Discourse, 201–215. Amsterdam: John Benjamins. https://doi.org/10.1075/pbns.276.09kec

Kennedy, Randall. 2002. Nigger: The strange career of a troublesome word. New York: Knopf Doubleday Publishing.

Koller, Pavel & Petr Darida. 2020. Emotional behavior with verbal violence: Problems and solutions. Interdisciplinary Journal Papier Human Review 1(2). 1–6. https://doi.org/10.47667/ijphr.v1i2.41

Kunupundi, Deepti, Shamtanu Godbole, Pankaj Kumar & Suhas Pai. 2020. Toxic language using robust filters. SMU Data Science Review 3(2). Available at: https://scholar.smu.edu/datasciencereview/vol3/iss2/12 (accessed 20 July 2022).

Lakoff, George. 1987a. Women, Fire, and Dangerous Things. Chicago: University of Chicago Press. https://doi.org/10.7208/chicago/9780226471013.001.0001

Lakoff, George. 1987b. Cognitive models and prototype theory. In Ulric Neisser (ed.), Concepts and Conceptual Development: Ecological and Intellectual Factors in Categorization, 63–100. Cambridge: Cambridge University Press.

Lakoff, George & Mark Johnson. 1980. Metaphors We Live By. Chicago: University of Chicago Press.

Langacker, Ronald. 1987. Foundations of cognitive grammar, Volume I: Theoretical prerequisites. Stanford: Stanford University Press.

Lee, Jinhyuk, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So & Jaewoo Kang. 2020. BioBERT: A pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36(4). 1234–1240. https://doi.org/10.1093/bioinformatics/btz682

Leets, Laura. 2001. Explaining perceptions of racist speech. Communication Research 28. 676–706. https://doi.org/10.1177/009365001028005005

Lepore, Ernie & Matthew Stone. 2018. Explicit indirection. In Daniel Fogal, Daniel W. Harris & Matt Moss (eds.), New Work on Speech Acts, 165–184. Oxford: Oxford University Press. https://doi.org/10.1093/oso/9780198738831.003.0007

Lewandowska-Tomaszczyk, Barbara. 2017. Incivility and confrontation in online conflict discourses. Lodz Papers in Pragmatics 13. 347–367. https://doi.org/10.1515/lpp-2017-0017

Lewandowska-Tomaszczyk, Barbara. 2020. Culture-driven emotional profiles and online discourse extremism. Pragmatics and Society 11. 262–291. https://doi.org/10.1075/ps.18069.lew

Lewandowska-Tomaszczyk, Barbara, Slavko Žitnik, Anna Bączkowska, Chaya Liebeskind, Jelena Mitrović & Giedre Valunaite Oleskeviciene. 2021. LOD-connected offensive language ontology and tagset enrichment. In Sara Carvalho & Renato Rocha Souza (eds.), Proceedings of the Workshops and Tutorials held at LDK 2021, co-located with the 3rd Language, Data and Knowledge Conference, 135–150. Zaragoza, Spain: CEUR Workshop Proceedings.

Liu, Zhuang, Degen Huang, Kaiyu Huang, Zhuang Li & Jun Zhao. 2020. FinBERT: A pre-trained financial language representation model for financial text mining. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, Special Track on AI in FinTech, 4513–4519. https://doi.org/10.24963/ijcai.2020/622

Ljung, Magnus. 2010. Swearing: A Cross-Cultural Linguistic Study. London: Palgrave Macmillan. https://doi.org/10.1057/9780230292376

Martínez, José M. & Francisco Yus. 2013. Towards a cross-cultural pragmatic taxonomy of insults. Journal of Language Aggression and Conflict 1(1). 87–114. https://doi.org/10.1075/jlac.1.1.05mat

Mathew, Binny, Punyajoy Saha, Seid M. Yimam, Chris Biemann, Pawan Goyal & Animesh Mukherjee. 2021. HateXplain: A benchmark dataset for explainable hate speech detection. In Proceedings of the AAAI Conference on Artificial Intelligence 35(17). 14867–14875. https://doi.org/10.1609/aaai.v35i17.17745

McEnery, Tony. 2004. Swearing in English: Bad language, purity and power from 1586 to the present (Routledge Advances in Corpus Linguistics). London: Routledge.

Mills, Sara. 2003. Gender and politeness. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511615238

Modha, Sandip, Thomas Mandl, Prasenjit Majumder & Daksh Patel. 2019. Overview of the HASOC track at FIRE 2019: Hate speech and offensive content identification in Indo-European languages. In Proceedings of the 11th Forum for Information Retrieval Evaluation (FIRE '19), 14–17.

Nobata, Chikashi, Joel Tetreault, Achint Thomas, Yashar Mehdad & Yi Chang. 2016. Abusive language detection in online user content. In Proceedings of the 25th International Conference on World Wide Web (WWW '16), 145–153. https://doi.org/10.1145/2872427.2883062

Nunberg, Geoffrey. 2018. The social life of slurs. In Daniel Fogal, Daniel W. Harris & Matt Moss (eds.), New Work on Speech Acts, 237–295. Oxford: Oxford University Press. https://doi.org/10.1093/oso/9780198738831.003.0010

Ousidhoum, Nedjma, Zizheng Lin, Hongming Zhang, Yangqiu Song & Dit-Yan Yeung. 2019. Multilingual and multi-aspect hate speech analysis. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 4675–4684. Hong Kong, China: Association for Computational Linguistics. https://doi.org/10.18653/v1/D19-1474

Pamungkas, Endang W. & Viviana Patti. 2019. Cross-domain and cross-lingual abusive language detection: A hybrid approach with deep learning and a multilingual lexicon. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, 363–370. https://doi.org/10.18653/v1/P19-2051

Partington, Alan. 2006. The Linguistics of Laughter: A Corpus-assisted Study of Laughter-talk. London: Routledge. https://doi.org/10.4324/9780203966570

Poletto, Fabio, Valerio Basile, Manuela Sanguinetti, Cristina Bosco & Viviana Patti. 2020. Resources and benchmark corpora for hate speech detection: A systematic review. Language Resources and Evaluation. 1–47. https://doi.org/10.1007/s10579-020-09502-8

Pruksachatkun, Yada, Jason Phang, Haokun Liu, Phu M. Htut, Xiaoyi Zhang, Richard Y. Pang, Clara Vania, Katharina Kann & Samuel Bowman. 2020. Intermediate-task transfer learning with pretrained language models: When and why does it work? In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 5231–5247. https://doi.org/10.18653/v1/2020.acl-main.467

Qian, Jing, Anna Bethke, Yinyin Liu, Elizabeth Belding & William Y. Wang. 2019a. A benchmark dataset for learning to intervene in online hate speech. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 4755–4764. Hong Kong, China: Association for Computational Linguistics.

Qian, Jing, Anna Bethke, Yinyin Liu, Elizabeth Belding & William Y. Wang. 2019b. A benchmark dataset for learning to intervene in online hate speech. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 4755–4764. Hong Kong, China: Association for Computational Linguistics. https://doi.org/10.18653/v1/D19-1482

Ramirez, Artemio, Kellie E. Palazzolo & Matthew W. Savage. 2010. New directions in understanding cyberbullying. In Rotimi Taiwo (ed.), Handbook of Research on Discourse Behavior and Digital Communication: Language Structures and Social Interaction, 729–744. Hershey, Pennsylvania: IGI Global. https://doi.org/10.4018/978-1-61520-773-2.ch047

Razavi, Amir H., Diana Inkpen, Sasha Uritsky & Stan Matwin. 2010. Offensive language detection using multi-level classification. In Atefeh Farzindar & Vlado Kešelj (eds.), Advances in Artificial Intelligence, Canadian AI 2010 (Lecture Notes in Computer Science 6085), 16–27. Berlin: Springer. https://doi.org/10.1007/978-3-642-13059-5_5

Reynolds, Kelly, April Kontostathis & Lynne Edwards. 2011. Using machine learning to detect cyberbullying. In 2011 10th International Conference on Machine Learning and Applications and Workshops, vol. 2, 241–244. https://doi.org/10.1109/ICMLA.2011.152

Sai, Siva & Yashvardhan Sharma. 2020. Siva@HASOC-Dravidian-CodeMix-FIRE-2020: Multilingual offensive speech detection in code-mixed and Romanized text. In FIRE: Forum for Information Retrieval Evaluation, 16–20 December, Hyderabad, India, 336–343.

Tenchini, Maria P. & Aldo Frigerio. 2020. The impoliteness of slurs and other pejoratives in reported speech. Corpus Pragmatics 4(1). 1–19. https://doi.org/10.1007/s41701-019-00073-w

Tirrell, Lynne. 2018. Toxic speech: Inoculations and antidotes. Southern Journal of Philosophy 56. 116–144. https://doi.org/10.1111/sjp.12297

Vidgen, Bertie, Tristan Thrush, Zeerak Waseem & Douwe Kiela. 2021. Learning from the worst: Dynamically generated datasets to improve online hate detection. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 1667–1682. Online: Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.acl-long.132

Waseem, Zeerak. 2016. Are you a racist or am I seeing things? Annotator influence on hate speech detection on Twitter. In Proceedings of the First Workshop on NLP and Computational Social Science, 138–142. Austin, Texas: Association for Computational Linguistics. https://doi.org/10.18653/v1/W16-5618

Waseem, Zeerak & Dirk Hovy. 2016. Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter. In Proceedings of the NAACL Student Research Workshop, 88–93. San Diego, California: Association for Computational Linguistics. https://doi.org/10.18653/v1/N16-2013

Wellsby, Michele, Paul D. Siakaluk, Penny Pexman & William J. Owen. 2010. Some insults are easier to detect: The embodied insult detection effect. Frontiers in Psychology 1. https://doi.org/10.3389/fpsyg.2010.00198

Zadeh, Lotfi. 1965. Fuzzy sets. Information and Control 8(3). 338–353. https://doi.org/10.1016/S0019-9958(65)90241-X

Zampieri, Marcos, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra & Ritesh Kumar. 2019a. Predicting the type and target of offensive posts in social media. In Proceedings of NAACL. https://doi.org/10.18653/v1/N19-1144

Zampieri, Marcos, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra & Ritesh Kumar. 2019b. SemEval-2019 Task 6: Identifying and categorizing offensive language in social media (OffensEval). In Jonathan May, Ekaterina Shutova, Aurelie Herbelot, Xiaodan Zhu, Marianna Apidianaki & Saif M. Mohammad (eds.), Proceedings of the Thirteenth Workshop on Semantic Evaluation. Minneapolis, Minnesota: Association for Computational Linguistics. https://doi.org/10.18653/v1/S19-2010

Zampieri, Marcos, Preslav Nakov, Sara Rosenthal, Pepa Atanasova, Georgi Karadzhov, Hamdy Mubarak, Leon Derczynski, Zeses Pitenis & Cagri Coltekin. 2020. SemEval-2020 Task 12: Multilingual offensive language identification in social media (OffensEval 2020). In Aurelie Herbelot, Xiaodan Zhu, Alexis Palmer, Nathan Schneider, Jonathan May & Ekaterina Shutova (eds.), Proceedings of the Fourteenth Workshop on Semantic Evaluation. Barcelona (online): International Committee for Computational Linguistics. https://doi.org/10.18653/v1/2020.semeval-1.188

Žitnik, Slavko, Chaya Liebeskind & Jelena Mitrović. 2021. Offensive language organization. Available at: https://github.com/UL-FRI-Zitnik/offensive-language-organization (accessed 10 April 2023).

Published Online: 2023-07-20
Published in Print: 2023-05-25

© 2023 Walter de Gruyter GmbH, Berlin/Boston
