Flagging clickbait in Indonesian online news websites using fine-tuned transformers

Muhammad Noor Fakhruzzaman, Sa'idah Zahrotul Jannah, Ratih Ardiati Ningrum, Indah Fahmiyah

Abstract


Click counts are related to the amount of money that online advertisers paid to news sites. Such business models forced some news sites to employ a dirty trick of click-baiting, i.e., using hyperbolic and interesting words, sometimes unfinished sentences in a headline to purposefully tease the readers. Some Indonesian online news sites also joined the party of clickbait, which indirectly degrade other established news sites' credibility. A neural network with a pre-trained language model multilingual bidirectional encoder representations from transformers (BERT) that acted as an embedding layer is then combined with a 100 node-hidden layer and topped with a sigmoid classifier was trained to detect clickbait headlines. With a total of 6,632 headlines as a training dataset, the classifier performed remarkably well. Evaluated with 5-fold cross-validation, it has an accuracy score of 0.914, an F1-score of 0.914, a precision score of 0.916, and a receiver operating characteristic-area under curve (ROC-AUC) of 0.92. The usage of multilingual BERT in the Indonesian text classification task was tested and is possible to be enhanced further. Future possibilities, societal impact, and limitations of clickbait detection are discussed.

Keywords


adult literacy; clickbait; natural language processing; online news;

Full Text:

PDF


DOI: http://doi.org/10.11591/ijece.v13i3.pp2921-2930

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578

This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).