Saranyanath K P, Wei Shi and Jean-Pierre Corriveau, Carleton University, Canada
Cyberbullying is a form of bullying that occurs across social media platforms through electronic messages. This paper proposes three approaches and five models to identify cyberbullying on a generated social media dataset derived from multiple online platforms. Our initial approach consists of enhancing Support Vector Machines. Our second approach is based on DistilBERT, a lighter and faster Transformer model than BERT. Stacking the first three models, we obtain two more ensemble models. Contrasting the ensemble models with the three others, we observe that the ensemble models outperform the base models on all evaluation metrics except precision. The highest accuracy, 89.6%, was obtained with an ensemble model, and the lowest, 85.53%, with the SVM model. The DistilBERT model exhibited the highest precision, at 91.17%. The model developed using features of different granularity outperformed the one based on simple TF-IDF features.
Machine Learning, Natural Language Processing, Support Vector Machine, DistilBERT, Cyberbullying.