Preliminary Evaluation of Gaussian Naive Bayes for Multi-Label Hate Speech and Abusive Language Detection on Indonesian Twitter

Tri Pratiwi Handayani; Wahyudin Hasyim; Nursetia Wati

doi:10.62504/jimr532

Authors

Tri Pratiwi Handayani, Universitas Muhammadiyah Gorontalo, Gorontalo, Indonesia
Wahyudin Hasyim, Universitas Muhammadiyah Gorontalo, Gorontalo, Indonesia
Nursetia Wati, Politeknik Negeri Gorontalo, Indonesia

DOI:

https://doi.org/10.62504/jimr532

Keywords:

Gaussian Naïve Bayes, Hate speech, Cyberbullying, TF-IDF, BERT

Abstract

Automatic detection of hate speech and abusive language is crucial for combating online toxicity. This study evaluates Gaussian Naive Bayes for multi-label classification of hate speech and abusive language on Indonesian Twitter, with labels covering the target, category, and level of hate speech. TF-IDF features were combined with contextual BERT embeddings. The model achieved balanced performance on general hate speech and detected non-abusive language well. However, it handled imbalanced data and specific hate speech types poorly. Across labels, the classifier consistently favored the majority class (non-hateful/non-abusive) and struggled in particular with labels such as HS_Gender and HS_Physical. This suggests difficulty detecting less frequent but potentially severe hate speech, likely because of limited training data. Overall accuracy and F1-scores confirm that Gaussian Naive Bayes, while efficient, lacks robustness for nuanced multi-label classification on imbalanced datasets. Alternative approaches are therefore needed to detect specific and less frequent types of hate speech effectively.
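
The abstract does not say how the multi-label problem was decomposed or how the features were built, so the following is only a minimal sketch under stated assumptions: one Gaussian Naive Bayes model is fitted per label (a binary relevance setup), and each tweet is represented by a short numeric vector standing in for the concatenated TF-IDF and BERT features. The toy vectors, the `fit_multilabel` helper, the `HS` label name, and the smoothing value are all hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


class GaussianNB:
    """Minimal Gaussian Naive Bayes for a single binary label."""

    def __init__(self, var_smoothing=1e-3):
        # Added to every variance so a feature with zero spread inside a
        # class does not cause a division by zero (value chosen arbitrarily).
        self.var_smoothing = var_smoothing
        self.stats = {}  # class -> (log prior, per-feature means, variances)

    def fit(self, X, y):
        rows_by_class = defaultdict(list)
        for row, label in zip(X, y):
            rows_by_class[label].append(row)
        for label, rows in rows_by_class.items():
            n = len(rows)
            columns = list(zip(*rows))
            means = [sum(col) / n for col in columns]
            variances = [
                sum((v - m) ** 2 for v in col) / n + self.var_smoothing
                for col, m in zip(columns, means)
            ]
            self.stats[label] = (math.log(n / len(X)), means, variances)
        return self

    def _log_posterior(self, x, label):
        # Log prior plus the sum of per-feature Gaussian log densities.
        log_prior, means, variances = self.stats[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
            for xj, m, v in zip(x, means, variances)
        )

    def predict(self, X):
        return [
            max(self.stats, key=lambda label: self._log_posterior(x, label))
            for x in X
        ]


def fit_multilabel(X, Y, label_names):
    """Hypothetical helper: fit one independent classifier per label."""
    return {
        name: GaussianNB().fit(X, [row[k] for row in Y])
        for k, name in enumerate(label_names)
    }


# Toy stand-ins for concatenated TF-IDF + BERT feature vectors (assumption).
X_train = [
    [0.90, 0.10], [0.80, 0.20], [0.85, 0.15],
    [0.10, 0.90], [0.20, 0.80], [0.15, 0.70],
]
# Columns: HS (balanced), HS_Gender (a single positive example).
label_names = ["HS", "HS_Gender"]
Y_train = [[1, 1], [1, 0], [1, 0], [0, 0], [0, 0], [0, 0]]

models = fit_multilabel(X_train, Y_train, label_names)

X_new = [[0.88, 0.12], [0.12, 0.85]]
predictions = {name: model.predict(X_new) for name, model in models.items()}
print(predictions)
```

Because each label gets its own class priors, a rare label such as HS_Gender starts with a strong prior toward the negative class, and its positive-class mean and variance rest on very few examples. That is one plausible route to the majority-class bias the abstract reports, though the paper's actual setup may differ.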

Published

29-11-2023

Issue

Vol. 1 No. 1 (2023): November 2023

Section

Articles

License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

How to Cite

Preliminary Evaluation of Gaussian Naive Bayes for Multi-Label Hate Speech and Abusive Language Detection on Indonesian Twitter. (2023). Journal of International Multidisciplinary Research, 1(1), 159-165. https://doi.org/10.62504/jimr532
