Enhancing cyberbullying detection with advanced text preprocessing and machine learning
Abstract
The use of social media and the internet has increased dramatically in recent years. Cyberbullying refers to the misuse of social media to post threatening comments. It has a devastating influence on people's lives, especially those of children and teenagers, and can lead to depression and suicidal thoughts. The methodology proposed in this paper identifies cyberbullying in four steps: preprocessing, feature extraction, classification, and evaluation. It begins by creating a labeled, varied dataset. In feature extraction, Word2Vec and term frequency-inverse document frequency transform text into high-dimensional vectors. Word2Vec creates word embeddings using the skip-gram and continuous bag-of-words models, while term frequency-inverse document frequency assesses the relevance of each term in the text. The model uses a support vector machine classifier, whose effectiveness is compared with that of logistic regression and naïve Bayes. The support vector classifier achieved the maximum accuracy: 95% with skip-gram and 93% with continuous bag-of-words. Precision, recall, and F1-scores were computed for the sentiment categories. The average precision and recall were 0.77 and 0.79, respectively.
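The feature-extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (tfidf, skipgram_pairs, cbow_pairs), the toy sentences, the window size, and the specific weighting (relative term frequency times the unsmoothed logarithm of inverse document frequency) are assumptions chosen for illustration. The sketch shows one common TF-IDF variant and how skip-gram and continuous bag-of-words frame their training examples differently. It does not train embeddings.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


def skipgram_pairs(tokens, window=1):
    """(center, context) pairs: skip-gram predicts each context word from the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((center, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs


def cbow_pairs(tokens, window=1):
    """(context words, center) pairs: CBOW predicts the center word from its context."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        pairs.append((context, center))
    return pairs


# Hypothetical toy corpus, already lowercased and tokenized.
docs = [["you", "are", "so", "stupid"], ["you", "are", "so", "kind"]]
weights = tfidf(docs)
```

In this toy corpus, terms shared by both documents receive zero weight, while the distinguishing terms ("stupid", "kind") receive positive weight. This is the sense in which TF-IDF assesses term relevance. In practice, a library such as gensim or scikit-learn would likely be used instead of hand-written functions.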
Keywords
Cyberbullying; Detection; Online threats; Social media; Social media misuse; Support vector machines; Text classification
DOI: http://doi.org/10.11591/ijece.v15i3.pp3139-3148
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578
This journal is published by the Institute of Advanced Engineering and Science (IAES).