Improving misspelled word solving for human trafficking detection in online advertising data
Abstract
Pimps use social media to advertise adult services because it is easily accessible. This calls for computational models that help law enforcement authorities detect human trafficking activities. The machine learning (ML) models used to detect these activities rely mostly on text classification and often omit the correction of misspelled words, which increases the risk of prediction errors. Improving data processing is therefore one strategy for making human trafficking detection more effective. This paper presents a novel approach to correcting spelling mistakes. The approach selects misspelled words and then replaces them with popular words of the same meaning, based on an estimate of word probabilities and of the context used in human trafficking advertisements. The applicability of the proposed approach was demonstrated on a labeled human trafficking dataset using three classification models: k-nearest neighbor (KNN), naive Bayes (NB), and multilayer perceptron (MLP). With the proposed method, the models achieved higher prediction accuracy than with the other methods compared, indicating improved alerting on human trafficking. The proposed approach shows potential applicability to other datasets and domains of online advertising.
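The abstract does not specify the exact algorithm. The following is therefore only a minimal sketch of the general idea of probability- and context-based spelling correction. It assumes a Norvig-style generator that proposes known words one edit away, together with an interpolated bigram language model that stands in for "the probability of words and context". The class `ContextSpeller`, the weight `lam`, and the toy `DEMO_CORPUS` are hypothetical illustrations. They are not the authors' implementation or dataset.

```python
import re
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

# Hypothetical toy corpus standing in for in-domain advertisement text.
DEMO_CORPUS = [
    "new in town available tonight",
    "new listing available now",
    "new photos posted",
    "new in town",
    "call now for details",
]


def tokenize(text):
    """Lowercase and keep alphabetic tokens only (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


class ContextSpeller:
    """Hypothetical context-aware corrector built from an in-domain corpus."""

    def __init__(self, corpus_texts, lam=0.5):
        self.lam = lam                # weight of the bigram (context) term
        self.unigrams = Counter()     # word frequencies
        self.context = Counter()      # how often a word is followed by another word
        self.bigrams = Counter()      # (previous word, word) frequencies
        for text in corpus_texts:
            tokens = tokenize(text)
            self.unigrams.update(tokens)
            self.context.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.total = max(sum(self.unigrams.values()), 1)

    @staticmethod
    def edits1(word):
        """All strings one delete, transpose, replace, or insert away from word."""
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [l + r[1:] for l, r in splits if r]
        transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
        replaces = [l + c + r[1:] for l, r in splits if r for c in ALPHABET]
        inserts = [l + c + r for l, r in splits for c in ALPHABET]
        return set(deletes + transposes + replaces + inserts)

    def candidates(self, word):
        """Keep in-vocabulary words; otherwise propose known words at edit distance 1."""
        if word in self.unigrams:
            return {word}
        known = {w for w in self.edits1(word) if w in self.unigrams}
        return known or {word}

    def score(self, prev, word):
        """Interpolated bigram probability: lam * P(word | prev) + (1 - lam) * P(word)."""
        p_uni = self.unigrams[word] / self.total
        ctx = self.context[prev]
        p_bi = self.bigrams[(prev, word)] / ctx if ctx else 0.0
        return self.lam * p_bi + (1 - self.lam) * p_uni

    def correct(self, text):
        """Replace each token with its highest-scoring candidate, left to right."""
        out = []
        for tok in tokenize(text):
            prev = out[-1] if out else None
            # sorted() makes tie-breaking deterministic
            best = max(sorted(self.candidates(tok)), key=lambda w: self.score(prev, w))
            out.append(best)
        return " ".join(out)


if __name__ == "__main__":
    speller = ContextSpeller(DEMO_CORPUS)
    print(speller.correct("availble tonigt"))  # available tonight
    print(speller.correct("nww"))              # new  (no context: more frequent word wins)
    print(speller.correct("call nww"))         # call now  (context "call" favors "now")
```

In this toy corpus, "new" is more frequent than "now". The preceding word "call" nevertheless flips the choice, which is the kind of context effect the abstract describes. Because the sketch treats every out-of-vocabulary token as misspelled, a rare but correctly spelled word one edit away from a frequent one would also be changed. A larger in-domain vocabulary would reduce this. The corrected text could then be passed to any of the KNN, NB, or MLP classifiers mentioned above.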
Keywords
Artificial intelligence; bidirectional encoder representations from transformers; data preparation; human trafficking; text classification
DOI: http://doi.org/10.11591/ijece.v13i6.pp6558-6567
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).