Indonesian multilabel classification using IndoBERT embedding and MBERT classification

Ghinaa Zain Nabiilah, Islam Nur Alam, Eko Setyo Purwanto, Muhammad Fadlan Hidayat


The rapid increase in social media activity has triggered various discussion spaces and information exchanges on social media. Social media users can easily tell stories or comment on many things without limits. However, this often triggers open debates that lead to fights on social media. This is because many social media users use toxic comments that contain elements of racism, radicalism, pornography, or slander to argue and corner individuals or groups. These comments can easily spread and trigger users vulnerable to mental disorders due to unhealthy and unfair debates on social media. Thus, a model is needed to classify comments, especially toxic ones, in Indonesian. Transformer-based model development and natural language processing approaches can be applied to create classification models. Some previous research related to the classification of toxic comments has been done, but the classification results of the model still require exploration to get optimal results. So, this research uses the proposed model by using different pre-trained models at the embedding and classification stages, in the embedding stage using Indonesia bidirectional encoder representations from transformers (IndoBERT), and classification using multilingual bidirectional encoder representations from transformers (MBERT). The proposed model provides optimal results with an F1 value of 0.9032.


IndoBERT embedding; Indonesian social media; MBERT; Text classification; Toxic comment;

Full Text:



Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578

This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).