Preliminary Evaluation of Gaussian Naive Bayes for Multi-Label Hate Speech and Abusive Language Detection on Indonesian Twitter

Tri Pratiwi Handayani; Wahyudin Hasyim; Nursetia Wati

doi:10.62504/jimr532

Authors

Tri Pratiwi Handayani, Universitas Muhammadiyah Gorontalo, Gorontalo, Indonesia
Wahyudin Hasyim, Universitas Muhammadiyah Gorontalo, Gorontalo, Indonesia
Nursetia Wati, Politeknik Negeri Gorontalo, Indonesia

Keywords:

Gaussian Naive Bayes, Hate speech, Cyberbullying, TF-IDF, BERT

Abstract

Automatic detection of hate speech and abusive language is crucial for combating online toxicity. This study evaluates Gaussian Naive Bayes for multi-label classification of hate speech on Indonesian Twitter, where the labels cover the target, category, and level of hate speech. Features combine TF-IDF vectors with contextual BERT embeddings. The model achieved balanced performance on general hate speech and detected non-abusive language well. However, it handled imbalanced data and specific hate speech types poorly: across labels, the classifier consistently favored the majority class (non-hateful or non-abusive), and it struggled most with fine-grained labels such as HS_Gender and HS_Physical. This suggests difficulty detecting less frequent but potentially severe hate speech, likely because of limited training data. Overall accuracy and F1-scores indicate that Gaussian Naive Bayes, while efficient, lacks robustness for nuanced multi-label classification on imbalanced datasets. Alternative approaches are therefore needed to detect specific and less frequent types of hate speech effectively.
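
The pipeline summarized in the abstract (TF-IDF features joined with sentence embeddings, then Gaussian Naive Bayes per label) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. Several parts are assumptions made for illustration: the toy sentences and their labels are invented, the label names HS and Abusive are assumed (only HS_Gender and HS_Physical appear in the abstract), `placeholder_embedding` is a crude stand-in for a real BERT encoder, the labels are handled with one independent binary classifier each (the abstract does not say how the multi-label problem was decomposed), and the variance smoothing constant is arbitrary.

```python
import math
from collections import Counter

LABELS = ["HS", "Abusive", "HS_Gender"]  # "HS" and "Abusive" are assumed names

# Toy, invented examples and labels; NOT taken from the paper's dataset.
texts = [
    "good morning everyone",
    "nice weather today",
    "you are a stupid idiot",
    "stupid people from that group should leave",
    "women from that group should leave",
    "lovely food today",
]
Y = [[0, 0, 0], [0, 0, 0], [0, 1, 0], [1, 1, 0], [1, 0, 1], [0, 0, 0]]


def tfidf_matrix(docs):
    """Dense TF-IDF rows (raw term count times smoothed IDF)."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    df = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        rows.append([tf[t] * idf[t] for t in vocab])
    return rows


def placeholder_embedding(doc):
    """Hypothetical stand-in for a BERT sentence embedding (not a real encoder)."""
    words = doc.split() or [""]
    return [float(len(words)), sum(len(w) for w in words) / len(words)]


class TinyGaussianNB:
    """Binary-or-multiclass Gaussian Naive Bayes with per-class mean and variance."""

    def __init__(self, smoothing=1e-3):  # smoothing value is an assumption
        self.smoothing = smoothing
        self.stats = {}

    def fit(self, X, y):
        for c in sorted(set(y)):
            rows = [x for x, label in zip(X, y) if label == c]
            cols = list(zip(*rows))
            means = [sum(col) / len(rows) for col in cols]
            variances = [
                sum((v - m) ** 2 for v in col) / len(rows) + self.smoothing
                for col, m in zip(cols, means)
            ]
            # The log prior is where class imbalance enters the decision.
            self.stats[c] = (math.log(len(rows) / len(X)), means, variances)
        return self

    def predict_one(self, x):
        def score(c):
            log_prior, means, variances = self.stats[c]
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                for xi, m, v in zip(x, means, variances)
            )
        return max(self.stats, key=score)


def f1(y_true, y_pred):
    """F1 for the positive class; defined as 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Concatenate the sparse-style TF-IDF vector with the dense embedding.
X = [tf + placeholder_embedding(d) for tf, d in zip(tfidf_matrix(texts), texts)]

# One independent classifier per label (binary relevance).
models = {
    name: TinyGaussianNB().fit(X, [row[j] for row in Y])
    for j, name in enumerate(LABELS)
}
predictions = {name: [m.predict_one(x) for x in X] for name, m in models.items()}

for j, name in enumerate(LABELS):
    print(name, "training F1:", f1([row[j] for row in Y], predictions[name]))
```

Reporting F1 per label, rather than a single pooled accuracy, is what exposes the behavior the abstract describes: a rare label such as HS_Gender has a small prior and few positive examples from which to estimate its mean and variance. In practice a library implementation such as scikit-learn's `GaussianNB` and a real pretrained encoder would replace the hand-written pieces here.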

Published

29-11-2023

Issue

Vol. 1 No. 1 (2023): November 2023

Section

Articles

License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

How to Cite

Preliminary Evaluation of Gaussian Naive Bayes for Multi-Label Hate Speech and Abusive Language Detection on Indonesian Twitter. (2023). Journal of International Multidisciplinary Research, 1(1), 159-165. https://doi.org/10.62504/jimr532
