Published by De Gruyter Mouton, December 12, 2023

Clickbait detection in Hebrew

  • Talya Natanya

    Talya Natanya graduated with honors with a bachelor’s degree in computer science. Natanya is currently pursuing a master’s degree in data mining at the Jerusalem College of Technology and conducting research in Natural Language Processing (NLP) under the supervision of Dr Chaya Liebeskind.

    and Chaya Liebeskind

    Chaya Liebeskind is a lecturer and researcher in the Department of Computer Science at the Jerusalem College of Technology. Her research interests span both Natural Language Processing and data mining. In particular, her scientific interests include Semantic Similarity, Language Technology for Cultural Heritage, Morphologically Rich Languages (MRL), Multi-word Expressions (MWEs), Information Retrieval (IR), and Text Classification (TC). Much of her recent work has focused on analysing offensive language. She has published a variety of studies, and a few of her articles are under review or in preparation. She is a member of several international research actions funded by the EU.

From the journal Lodz Papers in Pragmatics

Abstract

The prevalence of sensationalized headlines and deceptive narratives in online content has created a need for effective clickbait detection methods. This study examines clickbait in Hebrew, a language that has received relatively little attention in this context, analysing features such as linguistic and structural ones and exploring the various types of clickbait found in Hebrew. Using a range of machine learning models, the research aims to identify the linguistic features that are instrumental in accurately classifying Hebrew headlines as either clickbait or non-clickbait. The findings underscore the critical role of linguistic attributes in improving the performance of the classification model. Notably, a machine learning model achieved an accuracy of 0.87 in clickbait detection. Moving forward, our research plan includes expanding the dataset through labelling assisted by the best-performing machine learning model, with the aim of optimizing deep learning models for more robust results. This study not only advances clickbait detection in Hebrew but also emphasizes the fundamental importance of linguistic features for accurate clickbait classification.

Published Online: 2023-12-12
Published in Print: 2023-12-15

© 2023 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 3.5.2024 from https://www.degruyter.com/document/doi/10.1515/lpp-2023-0021/html