Optimizing breast cancer diagnosis: combining hybrid architectures through Apache Spark
Abstract
Early detection and diagnosis of breast cancer are critical for saving lives. This paper addresses two major challenges associated with this task: the large volume of data that must be processed and the need for early detection. To tackle these issues, we developed thirty hybrid architectures by combining five deep learning models (Xception, Inception-V3, ResNet50, VGG16, and VGG19) as feature extractors with six classifiers (random forest, logistic regression, naive Bayes, gradient-boosted tree, decision tree, and support vector machine), all implemented on the Apache Spark framework. We evaluated the performance of these architectures using four classification criteria. The results, analyzed with the Scott-Knott statistical test, demonstrated the effectiveness of combining deep learning feature extraction with traditional classifiers for classifying breast tumors as malignant or benign. Notably, the hybrid architecture that uses ResNet50 for feature extraction and logistic regression as the classifier (RESLR) was the top performer. It achieved accuracies of 98.20%, 96.59%, 96.64%, and 94.84% on the Break-His dataset at magnifications of 40X, 100X, 200X, and 400X, respectively. RESLR also achieved an accuracy of 97.05% on the ICIAR dataset and 95.31% on the FNAC dataset.
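The count of thirty follows from pairing each of the five feature extractors with each of the six classifiers. The sketch below enumerates that grid in plain Python as an illustration only. The short codes are hypothetical, chosen so that ResNet50 with logistic regression yields "RESLR", the only abbreviation the abstract gives. The paper's actual naming scheme and its Spark implementation are not shown here.

```python
from itertools import product

# Feature extractors and classifiers named in the abstract.
# The short codes are assumptions for illustration; only "RESLR"
# (ResNet50 + logistic regression) appears in the abstract itself.
EXTRACTORS = {
    "Xception": "XCE",
    "Inception-V3": "INC",
    "ResNet50": "RES",
    "VGG16": "VGG16",
    "VGG19": "VGG19",
}
CLASSIFIERS = {
    "random forest": "RF",
    "logistic regression": "LR",
    "naive Bayes": "NB",
    "gradient-boosted tree": "GBT",
    "decision tree": "DT",
    "support vector machine": "SVM",
}


def hybrid_grid():
    """Return {code: (extractor, classifier)} for every extractor/classifier pair."""
    return {
        ext_code + clf_code: (ext, clf)
        for (ext, ext_code), (clf, clf_code) in product(
            EXTRACTORS.items(), CLASSIFIERS.items()
        )
    }


grid = hybrid_grid()
print(len(grid))       # 30
print(grid["RESLR"])   # ('ResNet50', 'logistic regression')
```

In a Spark setting, each entry of such a grid would presumably correspond to one feature-extraction stage followed by one classifier stage, trained and scored independently so that the resulting accuracies can be compared across all thirty pairs.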
Keywords
Big data; Deep learning; Hybrid architectures; Machine learning; Pre-processing; Spark
DOI: http://doi.org/10.11591/ijece.v14i4.pp4261-4272
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).