Artificial intelligence for automatic moderation of textual content in online chats and social networks
Abstract
This article examines fundamental techniques for converting text into numerical data suitable for machine learning algorithms. It covers word vector representations learned by neural networks such as Word2Vec and explains the principles behind linear models such as logistic regression and support vector machines. Convolutional neural networks (CNN) and long short-term memory (LSTM) networks are also discussed, including their components, mechanisms, and training processes. The research extends to developing and testing software for spam detection, hate speech identification, and offensive language recognition. Two datasets are used, one of labeled text messages and one of Twitter posts, and the data are analyzed to address challenges such as imbalanced data. A comparative analysis among linear models, deep neural networks, and single-layer models, using a pre-trained bidirectional encoder representations from transformers (BERT) network, reveals promising results. The convolutional neural network stands out with an accuracy of 0.95. The study also adapts the neural network architectures to hate speech and offensive language classification.
Keywords
Artificial intelligence; Language classification; Machine learning algorithms; Neural networks; Spam detection
DOI: http://doi.org/10.11591/ijece.v15i3.pp3396-3409
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578
This journal is published by the Institute of Advanced Engineering and Science (IAES).