Enhancing sentiment analysis in Kannada texts by feature selection
Abstract
In recent years, research on sentiment analysis in the Kannada language has grown noticeably. However, existing work suffers from a lack of labelled datasets and limited exploration of feature selection for Kannada sentiment analysis, which hinders accurate sentiment classification. To address this gap, this study introduces a new Kannada dataset and develops an effective classifier for sentiment analysis of Kannada texts. The dataset is created by translating the SemEval 2014 Task 4 dataset into Kannada using Google Translate. The study then introduces Kannada-BERT (K-BERT), a bidirectional encoder representations from transformers (BERT) model modified for the Kannada dataset. Further, a probability-clustering (PC) approach is presented to extract topics and their related aspects. The K-BERT classifier and the PC approach are combined into a single K-BERT-PC classifier that integrates the modified BERT model with probability clustering. Experimental results show that K-BERT-PC outperforms existing classifiers in polarity and sentiment analysis, achieving an accuracy of 91%. This work thus addresses the scarcity of labelled datasets for Kannada sentiment analysis and contributes an effective classifier, K-BERT-PC, for sentiment analysis of Kannada texts.
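The abstract does not say how the probability-clustering step is formulated. Purely as an illustration of what probabilistic grouping of aspects under topics can look like, the following sketch assumes a naive-Bayes-style soft clustering. Each topic has a prior and a word distribution, and an aspect term is assigned to the topic with the highest posterior probability given its context words. The topics, probabilities, Kannada example words, and function names (`topic_posteriors`, `cluster_aspects`) are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List

# Hypothetical topic model: P(word | topic) for two restaurant-domain topics.
# ಊಟ = meal, ರುಚಿ = taste, ಸೇವೆ = service (illustrative values, not paper data).
TOPIC_WORD_PROBS: Dict[str, Dict[str, float]] = {
    "food": {"ಊಟ": 0.4, "ರುಚಿ": 0.5, "ಸೇವೆ": 0.1},
    "service": {"ಊಟ": 0.2, "ರುಚಿ": 0.1, "ಸೇವೆ": 0.7},
}
TOPIC_PRIORS: Dict[str, float] = {"food": 0.5, "service": 0.5}
SMOOTHING = 1e-6  # probability floor for words unseen under a topic


def topic_posteriors(tokens: List[str]) -> Dict[str, float]:
    """Return P(topic | tokens), assuming tokens are independent given the topic."""
    log_scores: Dict[str, float] = {}
    for topic, word_probs in TOPIC_WORD_PROBS.items():
        score = math.log(TOPIC_PRIORS[topic])
        for tok in tokens:
            score += math.log(word_probs.get(tok, SMOOTHING))
        log_scores[topic] = score
    # Normalise in log space (log-sum-exp style) for numerical stability.
    max_log = max(log_scores.values())
    exp_scores = {t: math.exp(s - max_log) for t, s in log_scores.items()}
    total = sum(exp_scores.values())
    return {t: v / total for t, v in exp_scores.items()}


def cluster_aspects(aspects: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Group each aspect term under the topic with the highest posterior."""
    clusters: Dict[str, List[str]] = {t: [] for t in TOPIC_WORD_PROBS}
    for aspect, context in aspects.items():
        post = topic_posteriors(context)
        clusters[max(post, key=post.get)].append(aspect)
    return clusters


if __name__ == "__main__":
    # Each aspect term is paired with (hypothetical) context tokens from a review.
    print(cluster_aspects({"ಊಟ": ["ಊಟ", "ರುಚಿ"], "ಸೇವೆ": ["ಸೇವೆ"]}))
```

With these toy values, the aspect ಊಟ (meal) falls under "food" and ಸೇವೆ (service) under "service". Soft posteriors, rather than hard labels, would also let a downstream classifier weight an aspect by how confidently it belongs to each topic. Whether K-BERT-PC uses its cluster output this way is not stated in the abstract.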
Keywords
BERT; Kannada; Probability clustering; SemEval 2014; Sentiment analysis
DOI: http://doi.org/10.11591/ijece.v14i6.pp6572-6582
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).