Detecting spam e-mails using stop word TF-IDF and stemming algorithm with Naïve Bayes classifier on the multicore GPU

Manjit Jaiswal, Sukriti Das, Khushboo Khushboo

Abstract


A spam detector is a program that identifies unwanted e-mails and prevents them from reaching a user's mailbox. This paper discusses stop word removal, TF-IDF, and a stemming algorithm as preprocessing steps for detecting spam e-mails with the Naïve Bayes classifier. The study focuses on how these algorithms can be applied to a set of e-mails containing both ham and spam. First, the process, algorithms, and steps followed to implement stop word removal, TF-IDF, and stemming on NVIDIA’s Tesla P100 GPU are discussed; the e-mails are then classified with the Naïve Bayes algorithm, which yields the accuracy figures. After training and testing on a spam e-mail dataset taken from Kaggle, the proposed method achieved a training accuracy of 99.67% and a testing accuracy of about 99.03% on the Tesla P100 GPU. These are improvements of around 0.22 and 0.18 percentage points, respectively, over the conventional method, i.e. Naïve Bayes alone applied to the same dataset, which gave training and testing accuracies of 99.45% and 98.85%. Running on the GPU also reduced both training and testing time: training took 1.361 seconds on the GPU, about 1.49X faster than the 2.029 seconds taken on the CPU, and testing took 1.978 seconds on the GPU, about 1.15X faster than the 2.280 seconds taken on the CPU.
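The pipeline described in the abstract (stop word removal, stemming, TF-IDF weighting, then Naïve Bayes classification) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The toy e-mails, the short stop word list, the `crude_stem` suffix stripper (a simple stand-in for Porter's algorithm), the smoothed IDF formula, and all function names are assumptions made for illustration. The sketch runs on the CPU only and does not reproduce the GPU execution or the reported accuracies.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stop word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and",
              "you", "your", "for", "in", "on", "at"}

def crude_stem(word):
    """Strip a few common suffixes. A stand-in for Porter's algorithm, not an implementation of it."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, tokenize on letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]

def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the training documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(doc, idf):
    """Term frequency times IDF; terms unseen during training are ignored."""
    counts = Counter(doc)
    total = len(doc) or 1
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}

def train_nb(vectors, labels, vocab, alpha=1.0):
    """Multinomial Naive Bayes on TF-IDF weights with Laplace smoothing."""
    class_docs = Counter(labels)
    weight = defaultdict(lambda: defaultdict(float))
    for vec, y in zip(vectors, labels):
        for t, w in vec.items():
            weight[y][t] += w
    model = {}
    for y in class_docs:
        total = sum(weight[y].values()) + alpha * len(vocab)
        model[y] = {
            "prior": math.log(class_docs[y] / len(labels)),
            "loglik": {t: math.log((weight[y][t] + alpha) / total) for t in vocab},
        }
    return model

def predict(model, vec):
    """Return the class with the highest log posterior score."""
    def score(y):
        return model[y]["prior"] + sum(w * model[y]["loglik"][t] for t, w in vec.items())
    return max(model, key=score)

# Hypothetical toy training set; the paper uses a Kaggle dataset instead.
TRAIN = [
    ("Win a free prize now, claim your cash", "spam"),
    ("Free cash offer, click to win", "spam"),
    ("Meeting scheduled for Monday morning", "ham"),
    ("Please review the attached project report", "ham"),
]

docs = [preprocess(text) for text, _ in TRAIN]
labels = [label for _, label in TRAIN]
idf = fit_idf(docs)
model = train_nb([tfidf(d, idf) for d in docs], labels, set(idf))

def classify(text):
    """Run the full pipeline on one raw e-mail string."""
    return predict(model, tfidf(preprocess(text), idf))

print(classify("Claim your free cash prize"))
print(classify("Project meeting on Monday"))
```

Feeding TF-IDF weights into a multinomial Naïve Bayes model as fractional counts is one common design choice; the abstract does not state which Naïve Bayes variant the authors used.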

Keywords


Google Colab; GPU; Naïve Bayes; NVIDIA; Porter’s algorithm; stemming; Tesla; TF-IDF



DOI: http://doi.org/10.11591/ijece.v11i4.pp%25p


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

ISSN 2088-8708, e-ISSN 2722-2578