Convolutional neural network-based model for web-based text classification

Satyabrata Aich, Sabyasachi Chakraborty, Hee-Cheol Kim

Abstract


The amount of text data available on the web is increasing, and it spans multiple topical granularities; proper categorization/classification of this text is therefore needed to help users obtain the information they need. Traditional approaches such as bag-of-words and bag-of-n-grams models provide good results for text classification. However, text currently available on the web has high event-related granularity across different topics and levels, which may adversely affect the accuracy of traditional approaches. Deep learning models, which already achieve good accuracy in image processing and speech recognition, can overcome the problems inherent in traditional text classification models. Several deep learning models, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), and long short-term memory (LSTM) networks, are widely used for text-related tasks; among them, the CNN model is popular because it is simple to use and has high accuracy for text classification. In this study, we classify random texts from the web into categories using a CNN-based model, varying the hyperparameters and the sequence of text vectors. We tune every hyperparameter that is unique to the classification task, along with the sequences of word vectors, to obtain the desired accuracy; the resulting accuracy is in the range of 85–92%. The model can be considered reliable and can be applied to solve real-world problems or to extract useful information in various text mining applications.
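
The abstract does not describe the network architecture, so the following is only a minimal sketch of how a CNN typically turns a sequence of word vectors into features for classification: a filter slides over windows of consecutive word vectors, a ReLU is applied, and the maximum response over the whole text is kept. The function names (`conv1d_max_pool`, `text_features`), the two-dimensional toy embeddings, and the hand-picked filters are illustrative assumptions, not the authors' implementation; in a trained model the embeddings and filters would be learned, and the filter width and number of filters would be among the hyperparameters being tuned.

```python
def conv1d_max_pool(embedded, kernel, bias=0.0):
    """Apply one filter to a sequence of word vectors, then ReLU and
    max-over-time pooling. `kernel` has one weight row per word in the window."""
    width = len(kernel)  # filter width in words (a tunable hyperparameter)
    if len(embedded) < width:
        raise ValueError("sequence is shorter than the filter width")
    best = 0.0  # ReLU output is never below zero, so start the max at 0
    for start in range(len(embedded) - width + 1):
        window = embedded[start:start + width]
        activation = bias + sum(
            w * x
            for kernel_row, word_vec in zip(kernel, window)
            for w, x in zip(kernel_row, word_vec)
        )
        best = max(best, activation)  # combines ReLU and max pooling
    return best


def text_features(tokens, embeddings, filters):
    """Map tokens to word vectors and return one pooled feature per filter."""
    embedded = [embeddings[token] for token in tokens]
    return [conv1d_max_pool(embedded, kernel) for kernel in filters]


# Toy, hand-written values (assumptions for illustration only).
embeddings = {
    "stock": [1.0, 0.0],
    "prices": [0.0, 1.0],
    "fell": [1.0, 1.0],
}
filters = [
    [[1.0, 0.0], [0.0, 1.0]],      # responds to the bigram "stock prices"
    [[-1.0, -1.0], [-1.0, -1.0]],  # never fires on these inputs
]
features = text_features(["stock", "prices", "fell"], embeddings, filters)
print(features)
```

The resulting feature vector would normally be passed to a dense layer with a softmax over the target categories; that step is omitted here to keep the sketch short.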



DOI: http://doi.org/10.11591/ijece.v9i6.pp5185-5191

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578

This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).