Evaluating the effectiveness of machine learning methods for keyword coverage using semantic data analysis

Anargul Shaushenova, Aigulim Bayegizova, Gulnaz Baidrakhmanova, Zhanargul Abuova, Akmaral Kassymova, Dana Bakirova, Yekaterina Golenko

Abstract


This article presents a comprehensive comparative analysis of two advanced hybrid machine learning approaches for keyword extraction: bidirectional encoder representations from transformers (BERT) combined with autoencoder (AE) and term frequency-inverse document frequency (TF-IDF) combined with autoencoder. The research targets the task of semantic analysis in text data to evaluate the effectiveness of these methods in ensuring adequate keyword coverage across diverse text corpora. The study delves into the architecture and operational principles of each method, with a particular focus on the integration with autoencoders to enhance the semantic integrity and relevance of the extracted keywords. The experimental section provides a detailed performance analysis of both methods on various text datasets, highlighting how the structure and semantic richness of the source data influence the outcomes. The evaluation methodology includes precision, recall, and F1-score metrics. The paper discusses the advantages and disadvantages of each approach and their suitability for specific keyword extraction tasks. The findings offer valuable insights for the scientific community, aiding in the selection of the most appropriate text processing method for applications requiring deep semantic understanding and high accuracy in information extraction.

Keywords


Bidirectional encoder representations from transformers; Hybrid methods; Inverse document frequency; Keyword extraction; Semantic data analysis; Term frequency

Full Text:

PDF


DOI: http://doi.org/10.11591/ijece.v15i1.pp559-568

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578

This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).